Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them.
"There’s almost no story of the future going well that doesn’t have a part that’s like '…and no evil person steals the AI weights and goes and does evil stuff.' So it has highlighted the importance of information security: 'You’re training a powerful AI system; you should make it hard for someone to steal' has popped out to me as a thing that just keeps coming up in these stories, keeps being present. It’s hard to tell a story where it’s not a factor. It’s easy to tell a story where it is a factor." — Holden Karnofsky
What happens when a USB cable can secretly control your system? Are we hurtling toward a security nightmare as critical infrastructure connects to the internet? Is it possible to secure AI model weights from sophisticated attackers? And could AI actually make computer security better rather than worse?
With AI security concerns becoming increasingly urgent, we bring you insights from 15 top experts across information security, AI safety, and governance, examining the challenges of protecting our most powerful AI models and digital infrastructure — including a sneak peek from an episode that hasn’t yet been released with Tom Davidson, where he explains how we should be more worried about “secret loyalties” in AI agents.
You’ll hear:
Check out the full transcript on the 80,000 Hours website.
Chapters:
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Content editing: Katy Moore and Milo McGuire
Transcriptions and web: Katy Moore
The 20th century saw unprecedented change: nuclear weapons, satellites, the rise and fall of communism, third-wave feminism, the internet, postmodernism, game theory, genetic engineering, the Big Bang theory, quantum mechanics, birth control, and more. Now imagine all of it compressed into just 10 years.
That’s the future Will MacAskill — philosopher, founding figure of effective altruism, and now researcher at the Forethought Centre for AI Strategy — argues we need to prepare for in his new paper “Preparing for the intelligence explosion.” Not in the distant future, but probably in three to seven years.
Links to learn more, highlights, video, and full transcript.
The reason: AI systems are rapidly approaching human-level capability in scientific research and intellectual tasks. Once AI exceeds human abilities in AI research itself, we’ll enter a recursive self-improvement cycle — creating wildly more capable systems. Soon after, by improving algorithms and manufacturing chips, we’ll deploy millions, then billions, then trillions of superhuman AI scientists working 24/7 without human limitations. These systems will collaborate across disciplines, build on each discovery instantly, and conduct experiments at unprecedented scale and speed — compressing a century of scientific progress into mere years.
Will compares the resulting situation to a mediaeval king suddenly needing to upgrade from bows and arrows to nuclear weapons to deal with an ideological threat from a country he’s never heard of, while simultaneously grappling with learning that he descended from monkeys and his god doesn’t exist.
What makes this acceleration perilous is that while technology can speed up almost arbitrarily, human institutions and decision-making are much more fixed.
In this conversation with host Rob Wiblin, recorded on February 7, 2025, Will maps out the challenges we’d face in this potential “intelligence explosion” future, and what we might do to prepare. They discuss:
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore
When OpenAI announced plans to convert from nonprofit to for-profit control last October, it likely didn’t anticipate the legal labyrinth it now faces. A recent court order in Elon Musk’s lawsuit against the company suggests OpenAI’s restructuring faces serious legal threats, which will complicate its efforts to raise tens of billions in investment.
As nonprofit legal expert Rose Chan Loui explains, the court order set up multiple pathways for OpenAI’s conversion to be challenged. Though Judge Yvonne Gonzalez Rogers denied Musk’s request to block the conversion before a trial, she expedited proceedings to the fall so the case could be heard before the conversion is likely to go ahead. (See Rob’s brief summary of developments in the case.)
And if Musk’s donations to OpenAI are enough to give him the right to bring a case, Rogers sounded very sympathetic to his objections to the OpenAI foundation selling the company, benefiting the founders who forswore “any intent to use OpenAI as a vehicle to enrich themselves.”
But that’s just one of multiple threats. The attorneys general (AGs) in California and Delaware both have standing to object to the conversion on the grounds that it is contrary to the foundation’s charitable purpose and therefore wrongs the public — which was promised all the charitable assets would be used to develop AI that benefits all of humanity, not to win a commercial race. Some, including Rose, suspect the court order was written as a signal to those AGs to take action.
And, as she explains, if the AGs remain silent, the court itself, seeing that the public interest isn’t being represented, could appoint a “special interest party” to take on the case in their place.
This places the OpenAI foundation board in a bind: proceeding with the restructuring despite this legal cloud could expose them to the risk of being sued for a gross breach of their fiduciary duty to the public. The board is made up of respectable people who didn’t sign up for that.
And of course it would cause chaos for the company if all of OpenAI’s fundraising and governance plans were brought to a screeching halt by a federal court judgment landing at the eleventh hour.
Host Rob Wiblin and Rose Chan Loui discuss all of the above as well as what justification the OpenAI foundation could offer for giving up control of the company despite its charitable purpose, and how the board might adjust their plans to make the for-profit switch more legally palatable.
This episode was originally recorded on March 6, 2025.
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions: Katy Moore
A casino offers you a game. A coin will be tossed until it comes up heads. If that happens on the first flip you win $2. If it first happens on the second flip you win $4. On the third you win $8, on the fourth $16, and so on. How much should you be willing to pay to play?
The standard way of analysing gambling problems, ‘expected value’ — in which you multiply probabilities by the value of each outcome and then sum them up — says your expected earnings are infinite. You have a 50% chance of winning $2, for '0.5 * $2 = $1' in expected earnings. A 25% chance of winning $4, for '0.25 * $4 = $1' in expected earnings, and on and on. A never-ending series of $1s added together comes to infinity. And that's despite the fact that you know with certainty you can only ever win a finite amount!
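The sum described above can be checked with a minimal Python sketch. It assumes the game is cut off after a fixed number of tosses; the function name `truncated_expected_value` and the cutoff parameter `max_flips` are illustrative and do not come from the episode.

```python
from fractions import Fraction


def truncated_expected_value(max_flips: int) -> Fraction:
    """Expected winnings (in dollars) if the game stops after max_flips tosses."""
    total = Fraction(0)
    for k in range(1, max_flips + 1):
        probability = Fraction(1, 2 ** k)  # chance the first heads lands on flip k
        payout = 2 ** k                    # dollars won in that case
        total += probability * payout      # each term contributes exactly $1
    return total


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"cut off after {n} flips: expected value = ${truncated_expected_value(n)}")
```

Every extra permitted flip adds exactly one more dollar of expected value, so the total grows without bound as the cutoff is lifted, even though any single play pays out a finite amount.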
Today's guest — philosopher Alan Hájek of the Australian National University — thinks of much of philosophy as “the demolition of common sense followed by damage control” and is an expert on paradoxes related to probability and decision-making rules like “maximise expected value.”
Rebroadcast: this episode was originally released in October 2022.
Links to learn more, highlights, and full transcript.
The problem described above, known as the St. Petersburg paradox, has been a staple of the field since the 18th century, with many proposed solutions. In the interview, Alan explains how very natural attempts to resolve the paradox — such as factoring in the low likelihood that the casino can pay out very large sums, or the fact that money becomes less and less valuable the more of it you already have — fail to work as hoped.
We might reject the setup as a hypothetical that could never exist in the real world, and therefore of mere intellectual curiosity. But Alan doesn't find that objection persuasive. If expected value fails in extreme cases, that should make us worry that something could be rotten at the heart of the standard procedure we use to make decisions in government, business, and nonprofits.
These issues regularly show up in 80,000 Hours' efforts to try to find the best ways to improve the world, as the best approach will arguably involve long-shot attempts to do very large amounts of good.
Consider which is better: saving one life for sure, or three lives with 50% probability? Expected value says the second, which will probably strike you as reasonable enough. But what if we repeat this process and evaluate the chance to save nine lives with 25% probability, or 27 lives with 12.5% probability, or after 17 more iterations, 3,486,784,401 lives with a roughly 0.000095% chance? Expected value says this final offer is better than the others: around 1,000 times better than the 27-lives gamble, and over 3,000 times better than saving one life for sure.
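The arithmetic behind this sequence of gambles can be reproduced with a short Python sketch. It assumes each iteration triples the number of lives at stake and halves the probability of success, starting from one life saved for sure; the helper name `gamble` is illustrative only.

```python
from fractions import Fraction


def gamble(iterations: int):
    """Return (lives at stake, probability of success, expected lives saved)."""
    lives = 3 ** iterations                     # stakes triple each iteration
    probability = Fraction(1, 2 ** iterations)  # chance of success halves each iteration
    return lives, probability, lives * probability


if __name__ == "__main__":
    for n in (0, 1, 2, 3, 20):
        lives, p, expected = gamble(n)
        print(f"{lives:>13,} lives at {float(p) * 100:.6f}% chance "
              f"-> {float(expected):,.2f} expected lives saved")
```

Because each step multiplies the expected number of lives saved by 1.5, the expected value keeps climbing even as the chance of saving anyone at all shrinks toward zero.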
Ultimately Alan leans towards the view that our best choice is to “bite the bullet” and stick with expected value, even with its sometimes counterintuitive implications. Where we want to do damage control, we're better off looking for ways our probability estimates might be wrong.
In this conversation, originally released in October 2022, Alan and Rob explore these issues and many others:
Producer: Keiran Harris
Audio mastering: Ben Cordell and Ryan Kessler
Transcriptions: Katy Moore
America aims to avoid nuclear war by relying on the principle of 'mutually assured destruction,' right? Wrong. Or at least... not officially.
As today's guest — Jeffrey Lewis, founder of Arms Control Wonk and professor at the Middlebury Institute of International Studies — explains, in its official 'OPLANs' (military operation plans), the US is committed to 'dominating' in a nuclear war with Russia. How would they do that? "That is redacted."
Rebroadcast: this episode was originally released in December 2022.
Links to learn more, highlights, and full transcript.
We invited Jeffrey to come on the show to lay out what we and our listeners are most likely to be misunderstanding about nuclear weapons, the nuclear posture of major powers, and his field as a whole, and he did not disappoint.
As Jeffrey tells it, 'mutually assured destruction' was a slur used to criticise those who wanted to limit the 1960s arms buildup, and was never accepted as a matter of policy in any US administration. But isn't it still the de facto reality? Yes and no.
Jeffrey is a specialist on the nuts and bolts of bureaucratic and military decision-making in real-life situations. He suspects that at the start of their term presidents get a briefing about the US' plan to prevail in a nuclear war and conclude that "it's freaking madness." They say to themselves that whatever these silly plans may say, they know a nuclear war cannot be won, so they just won't use the weapons.
But Jeffrey thinks that's a big mistake. Yes, in a calm moment presidents can resist pressure from advisors and generals. But that idea of ‘winning’ a nuclear war is in all the plans. Staff have been hired because they believe in those plans. It's what the generals and admirals have all prepared for.
What matters is the 'not calm moment': the 3AM phone call to tell the president that ICBMs might hit the US in eight minutes, in the same week that Russia invades a neighbour or China invades Taiwan. Is it a false alarm? Should they retaliate before their land-based missile silos are hit? There are only minutes to decide.
Jeffrey points out that in emergencies, presidents have repeatedly found themselves railroaded into actions they didn't want to take because of how information and options were processed and presented to them. In the heat of the moment, it's natural to reach for the plan you've prepared — however mad it might sound.
In this spicy conversation, Jeffrey fields the most burning questions from Rob and the audience, in the process explaining:
Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Katy Moore
Technology doesn’t force us to do anything — it merely opens doors. But military and economic competition pushes us through.
That’s how today’s guest Allan Dafoe — director of frontier safety and governance at Google DeepMind — explains one of the deepest patterns in technological history: once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t. Those who resist too much can find themselves taken over or rendered irrelevant.
Links to learn more, highlights, video, and full transcript.
This dynamic played out dramatically in 1853 when US Commodore Perry sailed into Tokyo Bay with steam-powered warships that seemed magical to the Japanese, who had spent centuries deliberately limiting their technological development. With far greater military power, the US was able to force Japan to open itself to trade. Within 15 years, Japan had undergone the Meiji Restoration and transformed itself in a desperate scramble to catch up.
Today we see hints of similar pressure around artificial intelligence. Even companies, countries, and researchers deeply concerned about where AI could take us feel compelled to push ahead — worried that if they don’t, less careful actors will develop transformative AI capabilities at around the same time anyway.
But Allan argues this technological determinism isn’t absolute. While broad patterns may be inevitable, history shows we do have some ability to steer how technologies are developed, by whom, and what they’re used for first.
As part of that approach, Allan has been promoting efforts to make AI more capable of sophisticated cooperation, and improving the tests Google uses to measure how well its models could do things like mislead people, hack and take control of their own servers, or spread autonomously in the wild.
As of mid-2024 they didn’t seem dangerous at all, but we’ve learned that our ability to measure these capabilities is good, but imperfect. If we don’t find the right way to ‘elicit’ an ability we can miss that it’s there.
Subsequent research from Anthropic and Redwood Research suggests there’s even a risk that future models may play dumb to avoid their goals being altered.
That has led DeepMind to a “defence in depth” approach: carefully staged deployment starting with internal testing, then trusted external testers, then limited release, then watching how models are used in the real world. By not releasing model weights, DeepMind is able to back up and add additional safeguards if experience shows they’re necessary.
But with much more powerful and general models on the way, individual company policies won’t be sufficient by themselves. Drawing on his academic research into how societies handle transformative technologies, Allan argues we need coordinated international governance that balances safety with our desire to get the massive potential benefits of AI in areas like healthcare and education as quickly as possible.
Host Rob and Allan also cover:
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions: Katy Moore
On Monday Musk made the OpenAI nonprofit foundation an offer they want to refuse, but might have trouble doing so: $97.4 billion for its stake in the for-profit company, plus the freedom to stick with its current charitable mission.
For a normal company takeover bid, this would already be spicy. But OpenAI’s unique structure — a nonprofit foundation controlling a for-profit corporation — turns the gambit into an audacious attack on the plan OpenAI announced in December to free itself from nonprofit oversight.
As today’s guest Rose Chan Loui — founding executive director of UCLA Law’s Lowell Milken Center for Philanthropy and Nonprofits — explains, OpenAI’s nonprofit board now faces a challenging choice.
Links to learn more, highlights, video, and full transcript.
The nonprofit has a legal duty to pursue its charitable mission of ensuring that AI benefits all of humanity to the best of its ability. And if Musk’s bid would better accomplish that mission than the for-profit’s proposal — that the nonprofit give up control of the company and change its charitable purpose to the vague and barely related “pursue charitable initiatives in sectors such as health care, education, and science” — then it’s not clear the California or Delaware Attorneys General will, or should, approve the deal.
OpenAI CEO Sam Altman quickly tweeted “no thank you” — but that was probably a legal slipup, as he’s not meant to be involved in such a decision, which has to be made by the nonprofit board ‘at arm’s length’ from the for-profit company Sam himself runs.
The board could raise any number of objections: maybe Musk doesn’t have the money, or the purchase would be blocked on antitrust grounds, seeing as Musk owns another AI company (xAI), or Musk might insist on incompetent board appointments that would interfere with the nonprofit foundation pursuing any goal.
But as Rose and Rob lay out, it’s not clear any of those things is actually true.
In this emergency podcast recorded soon after Elon’s offer, Rose and Rob also cover:
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions: Katy Moore
Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?
With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.
Check out the full transcript on the 80,000 Hours website.
You can decide whether the views we and our guests expressed back then have held up over these last two busy years. You’ll hear:
And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore
If someone said a global health and development programme was sustainable, participatory, and holistic, you'd have to guess that they were saying something positive. But according to today's guest Karen Levy — deworming pioneer and veteran of Innovations for Poverty Action, Evidence Action, and Y Combinator — each of those three concepts has become so fashionable that they're at risk of being seriously overrated and applied where they don't belong.
Rebroadcast: this episode was originally released in March 2022.
Links to learn more, highlights, and full transcript.
Such concepts might even cause harm — trying to make a project embody all three is as likely to ruin it as help it flourish.
First, what do people mean by 'sustainability'? Usually they mean something like the programme will eventually be able to continue without needing further financial support from the donor. But how is that possible? Governments, nonprofits, and aid agencies aim to provide health services, education, infrastructure, financial services, and so on — and all of these require ongoing funding to pay for materials and staff to keep them running.
Given that someone needs to keep paying, Karen tells us that in practice, 'sustainability' is usually a euphemism for the programme at some point being passed on to someone else to fund — usually the national government. And while that can be fine, the national government of Kenya only spends $400 per person to provide each and every government service — just 2% of what the US spends on each resident. Incredibly tight budgets like that are typical of low-income countries.
'Participatory' also sounds nice, and inasmuch as it means leaders are accountable to the people they're trying to help, it probably is. But Karen tells us that in the field, ‘participatory’ usually means that recipients are expected to be involved in planning and delivering services themselves.
While that might be suitable in some situations, it's hardly something people in rich countries always want for themselves. Ideally we want government healthcare and education to be high quality without us having to attend meetings to keep it on track — and people in poor countries have as many or more pressures on their time. While accountability is desirable, an expectation of participation can be as much a burden as a blessing.
Finally, making a programme 'holistic' could be smart, but as Karen lays out, it also has some major downsides. For one, it means you're doing lots of things at once, which makes it hard to tell which parts of the project are making the biggest difference relative to their cost. For another, when you have a lot of goals at once, it's hard to tell whether you're making progress, or really put your mind to focusing on making one thing go extremely well. And finally, holistic programmes can be impractically expensive — Karen tells the story of a wonderful 'holistic school health' programme that, if continued, was going to cost 3.5 times the entire school's budget.
In this in-depth conversation, originally released in March 2022, Karen Levy and host Rob Wiblin chat about the above, as well as:
Producer: Keiran Harris
Audio mastering: Ben Cordell and Ryan Kessler
Transcriptions: Katy Moore
“I want everyone to understand that I am, in fact, a person.” Those words were produced by the AI model LaMDA as a reply to Blake Lemoine in 2022. Based on the Google engineer’s interactions with the model as it was under development, Lemoine became convinced it was sentient and worthy of moral consideration — and decided to tell the world.
Few experts in machine learning, philosophy of mind, or other relevant fields have agreed. And for our part at 80,000 Hours, we don’t think it’s very likely that large language models like LaMDA are sentient (that is, we don’t think they can have good or bad experiences) in a significant way.
But we think you can’t dismiss the issue of the moral status of digital minds, regardless of your beliefs about the question. There are major errors we could make in at least two directions:
And we’re currently unprepared to face this challenge. We don’t have good methods for assessing the moral status of AI systems. We don’t know what to do if millions of people or more believe, like Lemoine, that the chatbots they talk to have internal experiences and feelings of their own. We don’t know if efforts to control AI may lead to extreme suffering.
We believe this is a pressing world problem. It’s hard to know what to do about it or how good the opportunities to work on it are likely to be. But there are some promising approaches. We propose building a field of research to understand digital minds, so we’ll be better able to navigate these potentially massive issues if and when they arise.
This article narration by the author (Cody Fenwick) explains in more detail why we think this is a pressing problem, what we think can be done about it, and how you might pursue this work in your career. We also discuss a series of possible objections to thinking this is a pressing world problem.
You can read the full article, Understanding the moral status of digital minds, on the 80,000 Hours website.
If a business has spent $100 million developing a product, it’s a fair bet that they don’t want it stolen in two seconds and uploaded to the web where anyone can use it for free.
This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncanny human text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.
Today’s guest, the computer scientist and polymath Nova DasSarma, works on computer and information security as part of the security team at the AI company Anthropic. One of her jobs is to stop hackers exfiltrating Anthropic’s incredibly expensive intellectual property, as recently happened to Nvidia.
Rebroadcast: this episode was originally released in June 2022.
Links to learn more, highlights, and full transcript.
As she explains, given models’ small size, the need to store such models on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.
The worries aren’t purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we’ll develop so-called artificial ‘general’ intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.
If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.
If unaligned with the goals of their owners or humanity as a whole, such broadly capable models would naturally ‘go rogue,’ breaking their way into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can’t be shut off.
As Nova explains, in either case, we don’t want such models disseminated all over the world before we’ve confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic — perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly, something we can only speculate on at this point.
If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.
We’ll soon need the ability to ‘sandbox’ (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.
Producer: Keiran Harris
Audio mastering: Ben Cordell and Beppe Rådvik
Transcriptions: Katy Moore