80,000 Hours Podcast

Rob, Luisa, Keiran, and the 80,000 Hours team

Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them.

2 hours 16 minutes

#214 – Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway
Most AI safety conversations centre on alignment: ensuring AI systems share our values and goals. But despite progress, we’re unlikely to know we’ve solved the problem before the arrival of human-level and superhuman systems in as little as three years.
So some are developing a backup plan to safely deploy models we fear are actively scheming to harm us — so-called “AI control.” While this may sound mad, given the reluctance of AI companies to delay deploying anything they train, not developing such techniques is probably even crazier.
Today’s guest — Buck Shlegeris, CEO of Redwood Research — has spent the last few years developing control mechanisms, and for human-level systems they’re more plausible than you might think. He argues that given companies’ unwillingness to incur large costs for security, accepting the possibility of misalignment and designing robust safeguards might be one of our best remaining options.
Links to learn more, highlights, video, and full transcript.
As Buck puts it: "Five years ago I thought of misalignment risk from AIs as a really hard problem that you’d need some really galaxy-brained fundamental insights to resolve. Whereas now, to me the situation feels a lot more like we just really know a list of 40 things where, if you did them — none of which seem that hard — you’d probably be able to not have very much of your problem."
Of course, even if Buck is right, we still need to do those 40 things — which he points out we’re not on track for. And AI control agendas have their limitations: they aren’t likely to work once AI systems are much more capable than humans, since greatly superhuman AIs can probably work around whatever limitations we impose.
Still, AI control agendas seem to be gaining traction within AI safety. Buck and host Rob Wiblin discuss all of the above, plus:
- Why he’s more worried about AI hacking its own data centre than escaping
- What to do about “chronic harm,” where AI systems subtly underperform or sabotage important work like alignment research
- Why he might want to use a model he thought could be conspiring against him
- Why he would feel safer if he caught an AI attempting to escape
- Why many control techniques would be relatively inexpensive
- How to use an untrusted model to monitor another untrusted model
- What the minimum viable intervention in a “lazy” AI company might look like
- How even small teams of safety-focused staff within AI labs could matter
- The moral considerations around controlling potentially conscious AI systems, and whether it’s justified
Chapters:
- Cold open (00:00:00)
- Who’s Buck Shlegeris? (00:01:27)
- What's AI control? (00:01:51)
- Why is AI control hot now? (00:05:39)
- Detecting human vs AI spies (00:10:32)
- Acute vs chronic AI betrayal (00:15:21)
- How to catch AIs trying to escape (00:17:48)
- The cheapest AI control techniques (00:32:48)
- Can we get untrusted models to do trusted work? (00:38:58)
- If we catch a model escaping... will we do anything? (00:50:15)
- Getting AI models to think they've already escaped (00:52:51)
- Will they be able to tell it's a setup? (00:58:11)
- Will AI companies do any of this stuff? (01:00:11)
- Can we just give AIs fewer permissions? (01:06:14)
- Can we stop human spies the same way? (01:09:58)
- The pitch to AI companies to do this (01:15:04)
- Will AIs get superhuman so fast that this is all useless? (01:17:18)
- Risks from AI deliberately doing a bad job (01:18:37)
- Is alignment still useful? (01:24:49)
- Current alignment methods don't detect scheming (01:29:12)
- How to tell if AI control will work (01:31:40)
- How can listeners contribute? (01:35:53)
- Is 'controlling' AIs kind of a dick move? (01:37:13)
- Could 10 safety-focused people in an AGI company do anything useful? (01:42:27)
- Benefits of working outside frontier AI companies (01:47:48)
- Why Redwood Research does what it does (01:51:34)
- What other safety-related research looks best to Buck? (01:58:56)
- If an AI escapes, is it likely to be able to beat humanity from there? (01:59:48)
- Will misaligned models have to go rogue ASAP, before they're ready? (02:07:04)
- Is research on human scheming relevant to AI? (02:08:03)
This episode was originally recorded on February 21, 2025.
Video: Simon Monsour and Luke Monsour
Audio engineering: Ben Cordell, Milo McGuire, and Dominic Armstrong
Transcriptions and web: Katy Moore
4 April 2025, 11:59 am
2 hours 35 minutes

15 expert takes on infosec in the age of AI
"There’s almost no story of the future going well that doesn’t have a part that’s like '…and no evil person steals the AI weights and goes and does evil stuff.' So it has highlighted the importance of information security: 'You’re training a powerful AI system; you should make it hard for someone to steal' has popped out to me as a thing that just keeps coming up in these stories, keeps being present. It’s hard to tell a story where it’s not a factor. It’s easy to tell a story where it is a factor." — Holden Karnofsky
What happens when a USB cable can secretly control your system? Are we hurtling toward a security nightmare as critical infrastructure connects to the internet? Is it possible to secure AI model weights from sophisticated attackers? And could AI actually make computer security better rather than worse?
With AI security concerns becoming increasingly urgent, we bring you insights from 15 top experts across information security, AI safety, and governance, examining the challenges of protecting our most powerful AI models and digital infrastructure — including a sneak peek from an episode that hasn’t yet been released with Tom Davidson, where he explains how we should be more worried about “secret loyalties” in AI agents.
You’ll hear:
- Holden Karnofsky on why every good future relies on strong infosec, and how hard it’s been to hire security experts (from episode #158)
- Tantum Collins on why infosec might be the rare issue everyone agrees on (episode #166)
- Nick Joseph on whether AI companies can develop frontier models safely with the current state of information security (episode #197)
- Sella Nevo on why AI model weights are so valuable to steal, the weaknesses of air-gapped networks, and the risks of USBs (episode #195)
- Kevin Esvelt on what cryptographers can teach biosecurity experts (episode #164)
- Lennart Heim on Rob’s computer security nightmares (episode #155)
- Zvi Mowshowitz on the insane lack of security mindset at some AI companies (episode #184)
- Nova DasSarma on the best current defences against well-funded adversaries, politically motivated cyberattacks, and exciting progress in infosecurity (episode #132)
- Bruce Schneier on whether AI could eliminate software bugs for good, and why it’s bad to hook everything up to the internet (episode #64)
- Nita Farahany on the dystopian risks of hacked neurotech (episode #174)
- Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (episode #194)
- Nathan Labenz on how even internal teams at AI companies may not know what they’re building (episode #176)
- Allan Dafoe on backdooring your own AI to prevent theft (episode #212)
- Tom Davidson on how dangerous “secret loyalties” in AI models could be (episode to be released!)
- Carl Shulman on the challenge of trusting foreign AI models (episode #191, part 2)
- Plus lots of concrete advice on how to get into this field and find your fit
Check out the full transcript on the 80,000 Hours website.
Chapters:
- Cold open (00:00:00)
- Rob's intro (00:00:49)
- Holden Karnofsky on why infosec could be the issue on which the future of humanity pivots (00:03:21)
- Tantum Collins on why infosec is a rare AI issue that unifies everyone (00:12:39)
- Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI (00:16:23)
- Nova DasSarma on the best available defences against well-funded adversaries (00:22:10)
- Sella Nevo on why AI model weights are so valuable to steal (00:28:56)
- Kevin Esvelt on what cryptographers can teach biosecurity experts (00:32:24)
- Lennart Heim on the possibility of an autonomously replicating AI computer worm (00:34:56)
- Zvi Mowshowitz on the absurd lack of security mindset at some AI companies (00:48:22)
- Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices (00:49:54)
- Bruce Schneier on why it’s bad to hook everything up to the internet (00:55:54)
- Nita Farahany on the possibility of hacking neural implants (01:04:47)
- Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (01:10:48)
- Nova DasSarma on exciting progress in information security (01:19:28)
- Nathan Labenz on how even internal teams at AI companies may not know what they’re building (01:30:47)
- Allan Dafoe on backdooring your own AI to prevent someone else from stealing it (01:33:51)
- Tom Davidson on how dangerous “secret loyalties” in AI models could get (01:35:57)
- Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology (01:52:45)
- Nova DasSarma on politically motivated cyberattacks (02:03:44)
- Bruce Schneier on the day-to-day benefits of improved security and recognising that there’s never zero risk (02:07:27)
- Holden Karnofsky on why it’s so hard to hire security people despite the massive need (02:13:59)
- Nova DasSarma on practical steps to getting into this field (02:16:37)
- Bruce Schneier on finding your personal fit in a range of security careers (02:24:42)
- Rob's outro (02:34:46)
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Content editing: Katy Moore and Milo McGuire
Transcriptions and web: Katy Moore
28 March 2025, 4:53 pm
3 hours 57 minutes

#213 – Will MacAskill on AI causing a “century in a decade” – and how we're completely unprepared
The 20th century saw unprecedented change: nuclear weapons, satellites, the rise and fall of communism, third-wave feminism, the internet, postmodernism, game theory, genetic engineering, the Big Bang theory, quantum mechanics, birth control, and more. Now imagine all of it compressed into just 10 years.
That’s the future Will MacAskill — philosopher, founding figure of effective altruism, and now researcher at the Forethought Centre for AI Strategy — argues we need to prepare for in his new paper “Preparing for the intelligence explosion.” Not in the distant future, but probably in three to seven years.
Links to learn more, highlights, video, and full transcript.
The reason: AI systems are rapidly approaching human-level capability in scientific research and intellectual tasks. Once AI exceeds human abilities in AI research itself, we’ll enter a recursive self-improvement cycle — creating wildly more capable systems. Soon after, by improving algorithms and manufacturing chips, we’ll deploy millions, then billions, then trillions of superhuman AI scientists working 24/7 without human limitations. These systems will collaborate across disciplines, build on each discovery instantly, and conduct experiments at unprecedented scale and speed — compressing a century of scientific progress into mere years.
Will compares the resulting situation to a mediaeval king suddenly needing to upgrade from bows and arrows to nuclear weapons to deal with an ideological threat from a country he’s never heard of, while simultaneously grappling with learning that he descended from monkeys and his god doesn’t exist.
What makes this acceleration perilous is that while technology can speed up almost arbitrarily, human institutions and decision-making are much more fixed.
In this conversation with host Rob Wiblin, recorded on February 7, 2025, Will maps out the challenges we’d face in this potential “intelligence explosion” future, and what we might do to prepare. They discuss:
- Why leading AI safety researchers now think there’s dramatically less time before AI is transformative than they’d previously thought
- The three different types of intelligence explosions that occur in order
- Will’s list of resulting grand challenges — including destructive technologies, space governance, concentration of power, and digital rights
- How to prevent ourselves from accidentally “locking in” mediocre futures for all eternity
- Ways AI could radically improve human coordination and decision making
- Why we should aim for truly flourishing futures, not just avoiding extinction
Chapters:
- Cold open (00:00:00)
- Who’s Will MacAskill? (00:00:46)
- Why Will now just works on AGI (00:01:02)
- Will was wrong(ish) on AI timelines and hinge of history (00:04:10)
- A century of history crammed into a decade (00:09:00)
- Science goes super fast; our institutions don't keep up (00:15:42)
- Is it good or bad for intellectual progress to 10x? (00:21:03)
- An intelligence explosion is not just plausible but likely (00:22:54)
- Intellectual advances outside technology are similarly important (00:28:57)
- Counterarguments to intelligence explosion (00:31:31)
- The three types of intelligence explosion (software, technological, industrial) (00:37:29)
- The industrial intelligence explosion is the most certain and enduring (00:40:23)
- Is a 100x or 1,000x speedup more likely than 10x? (00:51:51)
- The grand superintelligence challenges (00:55:37)
- Grand challenge #1: Many new destructive technologies (00:59:17)
- Grand challenge #2: Seizure of power by a small group (01:06:45)
- Is global lock-in really plausible? (01:08:37)
- Grand challenge #3: Space governance (01:18:53)
- Is space truly defence-dominant? (01:28:43)
- Grand challenge #4: Morally integrating with digital beings (01:32:20)
- Will we ever know if digital minds are happy? (01:41:01)
- “My worry isn't that we won't know; it's that we won't care” (01:46:31)
- Can we get AGI to solve all these issues as early as possible? (01:49:40)
- Politicians have to learn to use AI advisors (02:02:03)
- Ensuring AI makes us smarter decision-makers (02:06:10)
- How listeners can speed up AI epistemic tools (02:09:38)
- AI could become great at forecasting (02:13:09)
- How not to lock in a bad future (02:14:37)
- AI takeover might happen anyway — should we rush to load in our values? (02:25:29)
- ML researchers are feverishly working to destroy their own power (02:34:37)
- We should aim for more than mere survival (02:37:54)
- By default the future is rubbish (02:49:04)
- No easy utopia (02:56:55)
- What levers matter most to utopia (03:06:32)
- Bottom lines from the modelling (03:20:09)
- People distrust utopianism; should they distrust this? (03:24:09)
- What conditions make eventual eutopia likely? (03:28:49)
- The new Forethought Centre for AI Strategy (03:37:21)
- How does Will resist hopelessness? (03:50:13)
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions and web: Katy Moore
11 March 2025, 5:22 pm
36 minutes 50 seconds

Emergency pod: Judge plants a legal time bomb under OpenAI (with Rose Chan Loui)
When OpenAI announced plans to convert from nonprofit to for-profit control last October, it likely didn’t anticipate the legal labyrinth it now faces. A recent court order in Elon Musk’s lawsuit against the company suggests OpenAI’s restructuring faces serious legal threats, which will complicate its efforts to raise tens of billions in investment.
As nonprofit legal expert Rose Chan Loui explains, the court order set up multiple pathways for OpenAI’s conversion to be challenged. Though Judge Yvonne Gonzalez Rogers denied Musk’s request to block the conversion before a trial, she expedited proceedings to the fall so the case could be heard before it’s likely to go ahead. (See Rob’s brief summary of developments in the case.)
And if Musk’s donations to OpenAI are enough to give him the right to bring a case, Rogers sounded very sympathetic to his objections to the OpenAI foundation selling the company, benefiting the founders who forswore “any intent to use OpenAI as a vehicle to enrich themselves.”
But that’s just one of multiple threats. The attorneys general (AGs) in California and Delaware both have standing to object to the conversion on the grounds that it is contrary to the foundation’s charitable purpose and therefore wrongs the public — which was promised all the charitable assets would be used to develop AI that benefits all of humanity, not to win a commercial race. Some, including Rose, suspect the court order was written as a signal to those AGs to take action.
And, as she explains, if the AGs remain silent, the court itself, seeing that the public interest isn’t being represented, could appoint a “special interest party” to take on the case in their place.
This places the OpenAI foundation board in a bind: proceeding with the restructuring despite this legal cloud could expose them to the risk of being sued for a gross breach of their fiduciary duty to the public. The board is made up of respectable people who didn’t sign up for that.
And of course it would cause chaos for the company if all of OpenAI’s fundraising and governance plans were brought to a screeching halt by a federal court judgment landing at the eleventh hour.
Host Rob Wiblin and Rose Chan Loui discuss all of the above as well as what justification the OpenAI foundation could offer for giving up control of the company despite its charitable purpose, and how the board might adjust their plans to make the for-profit switch more legally palatable.
This episode was originally recorded on March 6, 2025.
Chapters:
- Intro (00:00:11)
- More juicy OpenAI news (00:00:46)
- The court order (00:02:11)
- Elon has two hurdles to jump (00:05:17)
- The judge's sympathy (00:08:00)
- OpenAI's defence (00:11:45)
- Alternative plans for OpenAI (00:13:41)
- Should the foundation give up control? (00:16:38)
- Alternative plaintiffs to Musk (00:21:13)
- The 'special interest party' option (00:25:32)
- How might this play out in the fall? (00:27:52)
- The nonprofit board is in a bit of a bind (00:29:20)
- Is it in the public interest to race? (00:32:23)
- Could the board be personally negligent? (00:34:06)
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions: Katy Moore
7 March 2025, 6:48 pm
3 hours 41 minutes

#139 Classic episode – Alan Hájek on puzzles and paradoxes in probability and expected value
A casino offers you a game. A coin will be tossed until it comes up heads. If it comes up heads on the first flip you win $2. If it first comes up heads on the second flip you win $4. If on the third you win $8, on the fourth $16, and so on. How much should you be willing to pay to play?
The standard way of analysing gambling problems, ‘expected value’ — in which you multiply probabilities by the value of each outcome and then sum them up — says your expected earnings are infinite. You have a 50% chance of winning $2, for '0.5 * $2 = $1' in expected earnings. A 25% chance of winning $4, for '0.25 * $4 = $1' in expected earnings, and on and on. A never-ending series of $1s added together comes to infinity. And that's despite the fact that you know with certainty you can only ever win a finite amount!
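The sum described above can be checked with a minimal sketch, assuming the game is cut off after a fixed number of flips. The function name `truncated_expected_value` and the cutoff parameter are illustrative assumptions, not something from the episode. Each possible flip contributes exactly $1, so the total grows without bound as the cutoff rises.

```python
from fractions import Fraction


def truncated_expected_value(max_flips: int) -> Fraction:
    """Expected winnings in dollars if the game is cut off after max_flips tosses.

    Hypothetical helper for illustration only: the real game has no cutoff.
    """
    total = Fraction(0)
    for k in range(1, max_flips + 1):
        probability = Fraction(1, 2 ** k)  # chance the first heads lands on flip k
        payout = 2 ** k                    # prize in dollars for that outcome
        total += probability * payout      # every term is exactly $1
    return total


if __name__ == "__main__":
    # The expected value equals the number of flips allowed, so it never settles.
    for cutoff in (1, 10, 100, 1000):
        print(cutoff, truncated_expected_value(cutoff))
```

Exact fractions are used instead of floats so that each term stays exactly 1 even when the probabilities become extremely small.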
Today's guest — philosopher Alan Hájek of the Australian National University — thinks of much of philosophy as “the demolition of common sense followed by damage control” and is an expert on paradoxes related to probability and decision-making rules like “maximise expected value.”
Rebroadcast: this episode was originally released in October 2022.
Links to learn more, highlights, and full transcript.
The problem described above, known as the St. Petersburg paradox, has been a staple of the field since the 18th century, with many proposed solutions. In the interview, Alan explains how very natural attempts to resolve the paradox — such as factoring in the low likelihood that the casino can pay out very large sums, or the fact that money becomes less and less valuable the more of it you already have — fail to work as hoped.
We might reject the setup as a hypothetical that could never exist in the real world, and therefore of mere intellectual curiosity. But Alan doesn't find that objection persuasive. If expected value fails in extreme cases, that should make us worry that something could be rotten at the heart of the standard procedure we use to make decisions in government, business, and nonprofits.
These issues regularly show up in 80,000 Hours' efforts to try to find the best ways to improve the world, as the best approach will arguably involve long-shot attempts to do very large amounts of good.
Consider which is better: saving one life for sure, or three lives with 50% probability? Expected value says the second, which will probably strike you as reasonable enough. But what if we repeat this process and evaluate the chance to save nine lives with 25% probability, or 27 lives with 12.5% probability, or, after 17 more iterations, 3,486,784,401 lives with a roughly 0.000095% chance? Expected value says this final offer is better than the others — more than 1,000 times better, in fact.
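The arithmetic behind this sequence of offers can be reproduced with a short sketch. It assumes, as the example implies, that each iteration triples the lives at stake and halves the probability of success; the function name `offer` is hypothetical. Under that assumption the expected number of lives saved at step n is 1.5 to the power n, which keeps growing even as the chance of saving anyone shrinks towards zero.

```python
def offer(iteration: int) -> tuple[int, float, float]:
    """Return (lives at stake, probability of success, expected lives saved).

    Illustrative assumption: each iteration triples the lives and halves the odds.
    """
    lives = 3 ** iteration
    probability = 0.5 ** iteration
    return lives, probability, lives * probability


if __name__ == "__main__":
    # Steps 0 to 3 are the offers spelled out in the text; step 20 is the final one.
    for n in (0, 1, 2, 3, 20):
        lives, probability, expected = offer(n)
        print(f"step {n:>2}: {lives:>13,} lives, "
              f"chance {probability:.7%}, expected lives saved {expected:,.2f}")
```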
Ultimately Alan leans towards the view that our best choice is to “bite the bullet” and stick with expected value, even with its sometimes counterintuitive implications. Where we want to do damage control, we're better off looking for ways our probability estimates might be wrong.
In this conversation, originally released in October 2022, Alan and Rob explore these issues and many others:
- Simple rules of thumb for having philosophical insights
- A key flaw that hid in Pascal's wager from the very beginning
- Whether we have to simply ignore infinities because they mess everything up
- What fundamentally is 'probability'?
- Some of the many reasons 'frequentism' doesn't work as an account of probability
- Why the standard account of counterfactuals in philosophy is deeply flawed
- And why counterfactuals present a fatal problem for one sort of consequentialism
Chapters:
- Cold open (00:00:00)
- Rob's intro (00:01:05)
- The interview begins (00:05:28)
- Philosophical methodology (00:06:35)
- Theories of probability (00:40:58)
- Everyday Bayesianism (00:49:42)
- Frequentism (01:08:37)
- Ranges of probabilities (01:20:05)
- Implications for how to live (01:25:05)
- Expected value (01:30:39)
- The St. Petersburg paradox (01:35:21)
- Pascal’s wager (01:53:25)
- Using expected value in everyday life (02:07:34)
- Counterfactuals (02:20:19)
- Most counterfactuals are false (02:56:06)
- Relevance to objective consequentialism (03:13:28)
- Alan’s best conference story (03:37:18)
- Rob's outro (03:40:22)
Producer: Keiran Harris
Audio mastering: Ben Cordell and Ryan Kessler
Transcriptions: Katy Moore
25 February 2025, 3:06 pm
2 hours 40 minutes

#143 Classic episode – Jeffrey Lewis on the most common misconceptions about nuclear weapons
America aims to avoid nuclear war by relying on the principle of 'mutually assured destruction,' right? Wrong. Or at least... not officially.
As today's guest — Jeffrey Lewis, founder of Arms Control Wonk and professor at the Middlebury Institute of International Studies — explains, in its official 'OPLANs' (military operation plans), the US is committed to 'dominating' in a nuclear war with Russia. How would they do that? "That is redacted."
Rebroadcast: this episode was originally released in December 2022.
Links to learn more, highlights, and full transcript.
We invited Jeffrey to come on the show to lay out what we and our listeners are most likely to be misunderstanding about nuclear weapons, the nuclear posture of major powers, and his field as a whole, and he did not disappoint.
As Jeffrey tells it, 'mutually assured destruction' was a slur used to criticise those who wanted to limit the 1960s arms buildup, and was never accepted as a matter of policy in any US administration. But isn't it still the de facto reality? Yes and no.
Jeffrey is a specialist on the nuts and bolts of bureaucratic and military decision-making in real-life situations. He suspects that at the start of their term presidents get a briefing about the US' plan to prevail in a nuclear war and conclude that "it's freaking madness." They say to themselves that whatever these silly plans may say, they know a nuclear war cannot be won, so they just won't use the weapons.
But Jeffrey thinks that's a big mistake. Yes, in a calm moment presidents can resist pressure from advisors and generals. But that idea of ‘winning’ a nuclear war is in all the plans. Staff have been hired because they believe in those plans. It's what the generals and admirals have all prepared for.
What matters is the 'not calm moment': the 3AM phone call to tell the president that ICBMs might hit the US in eight minutes — the same week Russia invades a neighbour or China invades Taiwan. Is it a false alarm? Should they retaliate before their land-based missile silos are hit? There are only minutes to decide.
Jeffrey points out that in emergencies, presidents have repeatedly found themselves railroaded into actions they didn't want to take because of how information and options were processed and presented to them. In the heat of the moment, it's natural to reach for the plan you've prepared — however mad it might sound.
In this spicy conversation, Jeffrey fields the most burning questions from Rob and the audience, in the process explaining:
- Why inter-service rivalry is one of the biggest constraints on US nuclear policy
- Two times the US sabotaged nuclear nonproliferation among great powers
- How his field uses jargon to exclude outsiders
- How the US could prevent the revival of mass nuclear testing by the great powers
- Why nuclear deterrence relies on the possibility that something might go wrong
- Whether 'salami tactics' render nuclear weapons ineffective
- The time the Navy and Air Force switched views on how to wage a nuclear war, just when it would allow *them* to have the most missiles
- The problems that arise when you won't talk to people you think are evil
- Why missile defences are politically popular despite being strategically foolish
- How open source intelligence can prevent arms races
- And much more.
Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Katy Moore
19 February 2025, 1:56 pm
2 hours 44 minutes

#212 – Allan Dafoe on why technology is unstoppable & how to shape AI development anyway
Technology doesn’t force us to do anything — it merely opens doors. But military and economic competition pushes us through.
That’s how today’s guest Allan Dafoe — director of frontier safety and governance at Google DeepMind — explains one of the deepest patterns in technological history: once a powerful new capability becomes available, societies that adopt it tend to outcompete those that don’t. Those who resist too much can find themselves taken over or rendered irrelevant.
Links to learn more, highlights, video, and full transcript.
This dynamic played out dramatically in 1853 when US Commodore Perry sailed into Tokyo Bay with steam-powered warships that seemed magical to the Japanese, who had spent centuries deliberately limiting their technological development. With far greater military power, the US was able to force Japan to open itself to trade. Within 15 years, Japan had undergone the Meiji Restoration and transformed itself in a desperate scramble to catch up.
Today we see hints of similar pressure around artificial intelligence. Even companies, countries, and researchers deeply concerned about where AI could take us feel compelled to push ahead — worried that if they don’t, less careful actors will develop transformative AI capabilities at around the same time anyway.
But Allan argues this technological determinism isn’t absolute. While broad patterns may be inevitable, history shows we do have some ability to steer how technologies are developed, by whom, and what they’re used for first.
As part of that approach, Allan has been promoting efforts to make AI more capable of sophisticated cooperation, and improving the tests Google uses to measure how well its models could do things like mislead people, hack and take control of their own servers, or spread autonomously in the wild.
As of mid-2024 they didn’t seem dangerous at all, but we’ve learned that our ability to measure these capabilities is good, but imperfect. If we don’t find the right way to ‘elicit’ an ability we can miss that it’s there.
Subsequent research from Anthropic and Redwood Research suggests there’s even a risk that future models may play dumb to avoid their goals being altered.
That has led DeepMind to a “defence in depth” approach: carefully staged deployment starting with internal testing, then trusted external testers, then limited release, then watching how models are used in the real world. By not releasing model weights, DeepMind is able to back up and add additional safeguards if experience shows they’re necessary.
But with much more powerful and general models on the way, individual company policies won’t be sufficient by themselves. Drawing on his academic research into how societies handle transformative technologies, Allan argues we need coordinated international governance that balances safety with our desire to get the massive potential benefits of AI in areas like healthcare and education as quickly as possible.
Host Rob and Allan also cover:
- The most exciting beneficial applications of AI
- Whether and how we can influence the development of technology
- What DeepMind is doing to evaluate and mitigate risks from frontier AI systems
- Why cooperative AI may be as important as aligned AI
- The role of democratic input in AI governance
- What kinds of experts are most needed in AI safety and governance
- And much more
Chapters:
- Cold open (00:00:00)
- Who's Allan Dafoe? (00:00:48)
- Allan's role at DeepMind (00:01:27)
- Why join DeepMind over everyone else? (00:04:27)
- Do humans control technological change? (00:09:17)
- Arguments for technological determinism (00:20:24)
- The synthesis of agency with tech determinism (00:26:29)
- Competition took away Japan's choice (00:37:13)
- Can speeding up one tech redirect history? (00:42:09)
- Structural pushback against alignment efforts (00:47:55)
- Do AIs need to be 'cooperatively skilled'? (00:52:25)
- How AI could boost cooperation between people and states (01:01:59)
- The super-cooperative AGI hypothesis and backdoor risks (01:06:58)
- Aren’t today’s models already very cooperative? (01:13:22)
- How would we make AIs cooperative anyway? (01:16:22)
- Ways making AI more cooperative could backfire (01:22:24)
- AGI is an essential idea we should define well (01:30:16)
- It matters what AGI learns first vs last (01:41:01)
- How Google tests for dangerous capabilities (01:45:39)
- Evals 'in the wild' (01:57:46)
- What to do given no single approach works that well (02:01:44)
- We don't, but could, forecast AI capabilities (02:05:34)
- DeepMind's strategy for ensuring its frontier models don't cause harm (02:11:25)
- How 'structural risks' can force everyone into a worse world (02:15:01)
- Is AI being built democratically? Should it? (02:19:35)
- How much do AI companies really want external regulation? (02:24:34)
- Social science can contribute a lot here (02:33:21)
- How AI could make life way better: self-driving cars, medicine, education, and sustainability (02:35:55)
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Camera operator: Jeremy Chevillotte
Transcriptions: Katy Moore
14 February 2025, 5:05 pm
57 minutes 29 seconds

Emergency pod: Elon tries to crash OpenAI's party (with Rose Chan Loui)
On Monday Musk made the OpenAI nonprofit foundation an offer they want to refuse, but might have trouble doing so: $97.4 billion for its stake in the for-profit company, plus the freedom to stick with its current charitable mission.
For a normal company takeover bid, this would already be spicy. But OpenAI’s unique structure — a nonprofit foundation controlling a for-profit corporation — turns the gambit into an audacious attack on the plan OpenAI announced in December to free itself from nonprofit oversight.
As today’s guest Rose Chan Loui — founding executive director of UCLA Law’s Lowell Milken Center for Philanthropy and Nonprofits — explains, OpenAI’s nonprofit board now faces a challenging choice.
Links to learn more, highlights, video, and full transcript.
The nonprofit has a legal duty to pursue its charitable mission of ensuring that AI benefits all of humanity to the best of its ability. And if Musk’s bid would better accomplish that mission than the for-profit’s proposal — that the nonprofit give up control of the company and change its charitable purpose to the vague and barely related “pursue charitable initiatives in sectors such as health care, education, and science” — then it’s not clear the California or Delaware Attorneys General will, or should, approve the deal.
OpenAI CEO Sam Altman quickly tweeted “no thank you” — but that was probably a legal slipup, as he’s not meant to be involved in such a decision, which has to be made by the nonprofit board ‘at arm’s length’ from the for-profit company Sam himself runs.
The board could raise any number of objections: maybe Musk doesn’t have the money, or the purchase would be blocked on antitrust grounds, seeing as Musk owns another AI company (xAI), or Musk might insist on incompetent board appointments that would interfere with the nonprofit foundation pursuing any goal.
But as Rose and Rob lay out, it’s not clear any of those things is actually true.
In this emergency podcast recorded soon after Elon’s offer, Rose and Rob also cover:
- Why OpenAI wants to change its charitable purpose and whether that’s legally permissible
- On what basis the attorneys general will decide OpenAI’s fate
- The challenges in valuing the nonprofit’s “priceless” position of control
- Whether Musk’s offer will force OpenAI to up their own bid, and whether they could raise the money
- Whether other tech giants might now jump in with competing offers
- How politics could influence the attorneys general reviewing the deal
- What Rose thinks should actually happen to protect the public interest
Chapters:
- Cold open (00:00:00)
- Elon throws a $97.4b bomb (00:01:18)
- What was craziest in OpenAI’s plan to break free of the nonprofit (00:02:24)
- Can OpenAI suddenly change its charitable purpose like that? (00:05:19)
- Diving into Elon’s big announcement (00:15:16)
- Ways OpenAI could try to reject the offer (00:27:21)
- Sam Altman slips up (00:35:26)
- Will this actually stop things? (00:38:03)
- Why does OpenAI even want to change its charitable mission? (00:42:46)
- Most likely outcomes and what Rose thinks should happen (00:51:17)
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions: Katy Moore
12 February 2025, 6:31 pm
3 hours 12 minutes

AGI disagreements and misconceptions: Rob, Luisa, & past guests hash it out
Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?
With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.
Check out the full transcript on the 80,000 Hours website.
You can decide whether the views we expressed (and those from guests) then have held up these last two busy years. You’ll hear:
- Ajeya Cotra on overrated AGI worries
- Holden Karnofsky on the dangers of aligned AI, why unaligned AI might not kill us, and the power that comes from just making models bigger
- Ian Morris on why the future must be radically different from the present
- Nick Joseph on whether his company's internal safety policies are enough
- Richard Ngo on what everyone gets wrong about how ML models work
- Tom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn’t
- Carl Shulman on why you’ll prefer robot nannies over human ones
- Zvi Mowshowitz on why he’s against working at AI companies except in some safety roles
- Hugo Mercier on why even superhuman AGI won’t be that persuasive
- Rob Long on the case for and against digital sentience
- Anil Seth on why he thinks consciousness is probably biological
- Lewis Bollard on whether AI advances will help or hurt nonhuman animals
- Rohin Shah on whether humanity’s work ends at the point it creates AGI
And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.
Chapters:
- Cold open (00:00:00)
- Rob's intro (00:00:58)
- Rob & Luisa: Bowerbirds compiling the AI story (00:03:28)
- Ajeya Cotra on the misalignment stories she doesn’t buy (00:09:16)
- Rob & Luisa: Agentic AI and designing machine people (00:24:06)
- Holden Karnofsky on the dangers of even aligned AI, and how we probably won’t all die from misaligned AI (00:39:20)
- Ian Morris on why we won’t end up living like The Jetsons (00:47:03)
- Rob & Luisa: It’s not hard for nonexperts to understand we’re playing with fire here (00:52:21)
- Nick Joseph on whether AI companies’ internal safety policies will be enough (00:55:43)
- Richard Ngo on the most important misconception in how ML models work (01:03:10)
- Rob & Luisa: Issues Rob is less worried about now (01:07:22)
- Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy (01:14:08)
- Michael Webb on why he’s sceptical about explosive economic growth (01:20:50)
- Carl Shulman on why people will prefer robot nannies over humans (01:28:25)
- Rob & Luisa: Should we expect AI-related job loss? (01:36:19)
- Zvi Mowshowitz on why he thinks it’s a bad idea to work on improving capabilities at cutting-edge AI companies (01:40:06)
- Holden Karnofsky on the power that comes from just making models bigger (01:45:21)
- Rob & Luisa: Are risks of AI-related misinformation overblown? (01:49:49)
- Hugo Mercier on how AI won’t cause misinformation pandemonium (01:58:29)
- Rob & Luisa: How hard will it actually be to create intelligence? (02:09:08)
- Robert Long on whether digital sentience is possible (02:15:09)
- Anil Seth on why he believes in the biological basis of consciousness (02:27:21)
- Lewis Bollard on whether AI will be good or bad for animal welfare (02:40:52)
- Rob & Luisa: The most interesting new argument Rob’s heard this year (02:50:37)
- Rohin Shah on whether AGI will be the last thing humanity ever does (02:57:35)
- Rob's outro (03:11:02)
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore
10 February 2025, 3:55 pm
3 hours 10 minutes

#124 Classic episode – Karen Levy on fads and misaligned incentives in global development, and scaling deworming to reach hundreds of millions
If someone said a global health and development programme was sustainable, participatory, and holistic, you'd have to guess that they were saying something positive. But according to today's guest Karen Levy — deworming pioneer and veteran of Innovations for Poverty Action, Evidence Action, and Y Combinator — each of those three concepts has become so fashionable that they're at risk of being seriously overrated and applied where they don't belong.
Rebroadcast: this episode was originally released in March 2022.
Links to learn more, highlights, and full transcript.
Such concepts might even cause harm — trying to make a project embody all three is as likely to ruin it as help it flourish.
First, what do people mean by 'sustainability'? Usually they mean something like the programme will eventually be able to continue without needing further financial support from the donor. But how is that possible? Governments, nonprofits, and aid agencies aim to provide health services, education, infrastructure, financial services, and so on — and all of these require ongoing funding to pay for materials and staff to keep them running.
Given that someone needs to keep paying, Karen tells us that in practice, 'sustainability' is usually a euphemism for the programme at some point being passed on to someone else to fund — usually the national government. And while that can be fine, the national government of Kenya only spends $400 per person to provide each and every government service — just 2% of what the US spends on each resident. Incredibly tight budgets like that are typical of low-income countries.
'Participatory' also sounds nice, and inasmuch as it means leaders are accountable to the people they're trying to help, it probably is. But Karen tells us that in the field, ‘participatory’ usually means that recipients are expected to be involved in planning and delivering services themselves.
While that might be suitable in some situations, it's hardly something people in rich countries always want for themselves. Ideally we want government healthcare and education to be high quality without us having to attend meetings to keep it on track — and people in poor countries have as many or more pressures on their time. While accountability is desirable, an expectation of participation can be as much a burden as a blessing.
Finally, making a programme 'holistic' could be smart, but as Karen lays out, it also has some major downsides. For one, it means you're doing lots of things at once, which makes it hard to tell which parts of the project are making the biggest difference relative to their cost. For another, when you have a lot of goals at once, it's hard to tell whether you're making progress, or really put your mind to focusing on making one thing go extremely well. And finally, holistic programmes can be impractically expensive — Karen tells the story of a wonderful 'holistic school health' programme that, if continued, was going to cost 3.5 times the entire school's budget.
In this in-depth conversation, originally released in March 2022, Karen Levy and host Rob Wiblin chat about the above, as well as:
- Why it pays to figure out how you'll interpret the results of an experiment ahead of time
- The trouble with misaligned incentives within the development industry
- Projects that don't deliver value for money and should be scaled down
- How Karen accidentally became a leading figure in the push to deworm tens of millions of schoolchildren
- Logistical challenges in reaching huge numbers of people with essential services
- Lessons from Karen's many-decades career
- And much more
Chapters:
- Cold open (00:00:00)
- Rob's intro (00:01:33)
- The interview begins (00:02:21)
- Funding for effective altruist–mentality development projects (00:04:59)
- Pre-policy plans (00:08:36)
- ‘Sustainability’, and other myths in typical international development practice (00:21:37)
- ‘Participatoriness’ (00:36:20)
- ‘Holistic approaches’ (00:40:20)
- How the development industry sees evidence-based development (00:51:31)
- Initiatives in Africa that should be significantly curtailed (00:56:30)
- Misaligned incentives within the development industry (01:05:46)
- Deworming: the early days (01:21:09)
- The problem of deworming (01:34:27)
- Deworm the World (01:45:43)
- Where the majority of the work was happening (01:55:38)
- Logistical issues (02:20:41)
- The importance of a theory of change (02:31:46)
- Ways that things have changed since 2006 (02:36:07)
- Academic work vs policy work (02:38:33)
- Fit for Purpose (02:43:40)
- Living in Kenya (03:00:32)
- Underrated life advice (03:05:29)
- Rob’s outro (03:09:18)
Producer: Keiran Harris
Audio mastering: Ben Cordell and Ryan Kessler
Transcriptions: Katy Moore
7 February 2025, 1:00 pm
1 hour 14 minutes

If digital minds could suffer, how would we ever know? (Article)
“I want everyone to understand that I am, in fact, a person.” Those words were produced by the AI model LaMDA as a reply to Blake Lemoine in 2022. Based on the Google engineer’s interactions with the model as it was under development, Lemoine became convinced it was sentient and worthy of moral consideration — and decided to tell the world.
Few experts in machine learning, philosophy of mind, or other relevant fields have agreed. And for our part at 80,000 Hours, we don’t think it’s very likely that large language models like LaMDA are sentient — that is, we don’t think they can have good or bad experiences — in a significant way.
But we think you can’t dismiss the issue of the moral status of digital minds, regardless of your beliefs about the question. There are major errors we could make in at least two directions:
- We may create many, many AI systems in the future. If these systems are sentient, or otherwise have moral status, it would be important for humanity to consider their welfare and interests.
- It’s possible the AI systems we will create can’t or won’t have moral status. If so, it could be a huge mistake to worry about the welfare of digital minds, and doing so might contribute to an AI-related catastrophe.
And we’re currently unprepared to face this challenge. We don’t have good methods for assessing the moral status of AI systems. We don’t know what to do if millions of people or more believe, like Lemoine, that the chatbots they talk to have internal experiences and feelings of their own. We don’t know if efforts to control AI may lead to extreme suffering.
We believe this is a pressing world problem. It’s hard to know what to do about it or how good the opportunities to work on it are likely to be. But there are some promising approaches. We propose building a field of research to understand digital minds, so we’ll be better able to navigate these potentially massive issues if and when they arise.
This article narration by the author (Cody Fenwick) explains in more detail why we think this is a pressing problem, what we think can be done about it, and how you might pursue this work in your career. We also discuss a series of possible objections to thinking this is a pressing world problem.
You can read the full article, Understanding the moral status of digital minds, on the 80,000 Hours website.
Chapters:
- Introduction (00:00:00)
- Understanding the moral status of digital minds (00:00:58)
- Summary (00:03:31)
- Our overall view (00:04:22)
- Why might understanding the moral status of digital minds be an especially pressing problem? (00:05:59)
- Clearing up common misconceptions (00:12:16)
- Creating digital minds could go very badly - or very well (00:14:13)
- Dangers for digital minds (00:14:41)
- Dangers for humans (00:16:13)
- Other dangers (00:17:42)
- Things could also go well (00:18:32)
- We don't know how to assess the moral status of AI systems (00:19:49)
- There are many possible characteristics that give rise to moral status: Consciousness, sentience, agency, and personhood (00:21:39)
- Many plausible theories of consciousness could include digital minds (00:24:16)
- The strongest case for the possibility of sentient digital minds: whole brain emulation (00:28:55)
- We can't rely on what AI systems tell us about themselves: Behavioural tests, theory-based analysis, animal analogue comparisons, brain-AI interfacing (00:32:00)
- The scale of this issue might be enormous (00:36:08)
- Work on this problem is neglected but seems tractable: Impact-guided research, technical approaches, and policy approaches (00:43:35)
- Summing up so far (00:52:22)
- Arguments against the moral status of digital minds as a pressing problem (00:53:25)
- Two key cruxes (00:53:31)
- Maybe this problem is intractable (00:54:16)
- Maybe this issue will be solved by default (00:58:19)
- Isn't risk from AI more important than the risks to AIs? (01:00:45)
- Maybe current AI progress will stall (01:02:36)
- Isn't this just too crazy? (01:03:54)
- What can you do to help? (01:05:10)
- Important considerations if you work on this problem (01:13:00)
4 February 2025, 1:58 pm