LessWrong Curated Podcast

LessWrong

Audio version of the posts shared in the LessWrong Curated newsletter.

  • 46 minutes 26 seconds
    “What Is The Alignment Problem?” by johnswentworth
    So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like e.g. corrigibility.

    That problem description all makes sense on a hand-wavy intuitive level, but once we get concrete and dig into technical details… wait, what exactly is the goal again? When we say we want to “align AGI”, what does that mean? And what about these “human values” - it's easy to list things which are importantly not human values (like stated preferences, revealed preferences, etc), but what are we talking about? And don’t even get me started on corrigibility!

    Turns out, it's surprisingly tricky to explain what exactly “the alignment problem” refers to. And there's good reasons for that! In this post, I’ll give my current best explanation of what the alignment problem is (including a few variants and the [...]

    ---

    Outline:

    (01:27) The Difficulty of Specifying Problems

    (01:50) Toy Problem 1: Old MacDonald's New Hen

    (04:08) Toy Problem 2: Sorting Bleggs and Rubes

    (06:55) Generalization to Alignment

    (08:54) But What If The Patterns Don't Hold?

    (13:06) Alignment of What?

    (14:01) Alignment of a Goal or Purpose

    (19:47) Alignment of Basic Agents

    (23:51) Alignment of General Intelligence

    (27:40) How Does All That Relate To Today's AI?

    (31:03) Alignment to What?

    (32:01) What are a Human's Values?

    (36:14) Other targets

    (36:43) Paul!Corrigibility

    (39:11) Eliezer!Corrigibility

    (40:52) Subproblem!Corrigibility

    (42:55) Exercise: Do What I Mean (DWIM)

    (43:26) Putting It All Together, and Takeaways

    The original text contained 10 footnotes which were omitted from this narration.

    ---

    First published:
    January 16th, 2025

    Source:
    https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem

    ---

    Narrated by TYPE III AUDIO.

    17 January 2025, 6:15 pm
  • 5 minutes 55 seconds
    “Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes
    Traditional economics thinking has two strong principles, each based on abundant historical data:

    • Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human labor will retain high value. Indeed, if anything, the value of labor will go up, not down—for example, dense cities are engines of economic growth!
    • Principle (B): “Experience curves”: If the demand for some product goes up, there might be some price increase in the very short term, because the supply curve slopes up. But in the longer term, people will ramp up manufacturing of that product to catch up with the demand. Indeed, if anything, the cost per unit will go down, not up, because of economies of [...]
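    Principle (B) above is stated qualitatively; as a minimal sketch (my own formulation, not from the post), the standard experience-curve / Wright's-law model captures it: each doubling of cumulative production multiplies unit cost by a fixed factor. The learning rate and starting cost below are assumed purely for illustration.

    ```python
    import math

    # Illustrative experience curve (Wright's law): each doubling of cumulative
    # output multiplies unit cost by (1 - learning_rate). Parameters are assumed
    # for illustration, not taken from the post.
    def unit_cost(cumulative_units: float, first_unit_cost: float, learning_rate: float = 0.20) -> float:
        doublings = math.log2(cumulative_units)
        return first_unit_cost * (1 - learning_rate) ** doublings

    for n in [1, 10, 100, 1_000, 10_000]:
        print(f"unit {n:>6}: cost ≈ {unit_cost(n, first_unit_cost=100.0):.2f}")
    ```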
    The original text contained 1 footnote which was omitted from this narration.

    The original text contained 1 image which was described by AI.

    ---

    First published:
    January 13th, 2025

    Source:
    https://www.lesswrong.com/posts/TkWCKzWjcbfGzdNK5/applying-traditional-economic-thinking-to-agi-a-trilemma

    ---

    Narrated by TYPE III AUDIO.

    14 January 2025, 4:30 pm
  • 57 minutes 25 seconds
    “Passages I Highlighted in The Letters of J.R.R. Tolkien” by Ivan Vendrov
    All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R. Tolkien: Revised and Expanded Edition. All emphases mine.

    Machinery is Power is Evil

    Writing to his son Michael in the RAF:

    [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind, it attempts to actualize desire, and so to create power in this World; and that cannot really be done with any real satisfaction. Labour-saving machinery only creates endless and worse labour. And in addition to this fundamental disability of a creature, is added the Fall, which makes our devices not only fail of their desire but turn to new and horrible evil. So we come inevitably from Daedalus and Icarus to the Giant Bomber. It is not an advance in wisdom! This terrible truth, glimpsed long ago by Sam [...]

    ---

    Outline:

    (00:17) Machinery is Power is Evil

    (03:45) On Atomic Bombs

    (04:17) On Magic and Machines

    (07:06) Speed as the root of evil

    (08:11) Altruism as the root of evil

    (09:13) Sauron as metaphor for the evil of reformers and science

    (10:32) On Language

    (12:04) The straightjacket of Modern English

    (15:56) Argent and Silver

    (16:32) A Fallen World

    (21:35) All stories are about the Fall

    (22:08) On his mother

    (22:50) Love, Marriage, and Sexuality

    (24:42) Courtly Love

    (27:00) Women's exceptional attunement

    (28:27) Men are polygamous; Christian marriage is self-denial

    (31:19) Sex as source of disorder

    (32:02) Honesty is best

    (33:02) On the Second World War

    (33:06) On Hitler

    (34:04) On aerial bombardment

    (34:46) On British communist-sympathizers, and the U.S.A as Saruman

    (35:52) Why he wrote the Legendarium

    (35:56) To express his feelings about the First World War

    (36:39) Because nobody else was writing the kinds of stories he wanted to read

    (38:23) To give England an epic of its own

    (39:51) To share a feeling of eucatastrophe

    (41:46) Against IQ tests

    (42:50) On Religion

    (43:30) Two interpretations of Tom Bombadil

    (43:35) Bombadil as Pacifist

    (45:13) Bombadil as Scientist

    (46:02) On Hobbies

    (46:27) On Journeys

    (48:02) On Torture

    (48:59) Against Communism

    (50:36) Against America

    (51:11) Against Democracy

    (51:35) On Money, Art, and Duty

    (54:03) On Death

    (55:02) On Children's Literature

    (55:55) In Reluctant Support of Universities

    (56:46) Against being Photographed

    ---

    First published:
    November 25th, 2024

    Source:
    https://www.lesswrong.com/posts/jJ2p3E2qkXGRBbvnp/passages-i-highlighted-in-the-letters-of-j-r-r-tolkien

    ---

    Narrated by TYPE III AUDIO.

    14 January 2025, 3:58 am
  • 14 minutes 50 seconds
    “Parkinson’s Law and the Ideology of Statistics” by Benquo
    The anonymous review of The Anti-Politics Machine published on Astral Codex Ten focuses on a case study of a World Bank intervention in Lesotho, and tells a story about it:

    The World Bank staff drew reasonable-seeming conclusions from sparse data, and made well-intentioned recommendations on that basis. However, the recommended programs failed, due to factors that would have been revealed by a careful historical and ethnographic investigation of the area in question. Therefore, we should spend more resources engaging in such investigations in order to make better-informed World Bank style resource allocation decisions. So goes the story.

    It seems to me that the World Bank recommendations were not the natural ones an honest well-intentioned person would have made with the information at hand. Instead they are heavily biased towards top-down authoritarian schemes, due to a combination of perverse incentives, procedures that separate data-gathering from implementation, and an ideology that [...]

    ---

    Outline:

    (01:06) Ideology

    (02:58) Problem

    (07:59) Diagnosis

    (14:00) Recommendation

    ---

    First published:
    January 4th, 2025

    Source:
    https://www.lesswrong.com/posts/4CmYSPc4HfRfWxCLe/parkinson-s-law-and-the-ideology-of-statistics-1

    ---

    Narrated by TYPE III AUDIO.

    13 January 2025, 6:30 am
  • 25 minutes 11 seconds
    “Capital Ownership Will Not Prevent Human Disempowerment” by beren
    Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future elicited.

    When discussing the future of AI, I semi-often hear an argument along the lines that in a slow takeoff world, despite AIs automating increasingly more of the economy, humanity will remain in the driving seat because of its ownership of capital. This world posits one where humanity effectively becomes a rentier class living well off the vast economic productivity of the AI economy where despite contributing little to no value, humanity can extract most/all of the surplus value created due to its ownership of capital alone.

    This is a possibility, and indeed is perhaps closest to what a ‘positive singularity’ looks like from a purely human perspective. However, I don’t believe that this will happen by default in a competitive AI [...]

    The original text contained 3 footnotes which were omitted from this narration.

    ---

    First published:
    January 5th, 2025

    Source:
    https://www.lesswrong.com/posts/bmmFLoBAWGnuhnqq5/capital-ownership-will-not-prevent-human-disempowerment

    ---

    Narrated by TYPE III AUDIO.

    11 January 2025, 9:45 pm
  • 15 minutes 56 seconds
    “Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq
    TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activation spaces in isolation: It seems likely to find features of the activations - features that help explain the statistical structure of activation spaces, rather than features of the model - the features the model's own computations make use of.

    Written at Apollo Research

    Introduction

    Claim: Activation space interpretability is likely to give us features of the activations, not features of the model, and this is a problem.

    Let's walk through this claim.

    What do we mean by activation space interpretability? Interpretability work that attempts to understand neural networks by explaining the inputs and outputs of their layers in isolation. In this post, we focus in particular on the problem of decomposing activations, via techniques such as sparse autoencoders (SAEs), PCA, or just by looking at individual neurons. This [...]
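    As a minimal illustration (my own toy example, not the authors' setup) of what "decomposing an activation space in isolation" means: below, PCA via an SVD is run on a stand-in activation matrix alone, so the directions recovered describe the statistics of those activations, not necessarily the features the model's computation uses.

    ```python
    import numpy as np

    # Toy stand-in for one layer's activations; a real analysis would collect
    # these from a network run over a dataset.
    rng = np.random.default_rng(0)
    n_samples, d_model = 10_000, 64
    activations = rng.normal(size=(n_samples, d_model))

    # PCA by SVD of the centered activation matrix: rows of `directions` are the
    # principal directions of the activation space, found without any reference
    # to the rest of the model.
    centered = activations - activations.mean(axis=0, keepdims=True)
    _, singular_values, directions = np.linalg.svd(centered, full_matrices=False)

    explained = singular_values**2 / (n_samples - 1)
    print("variance explained by top 5 directions:", (explained / explained.sum())[:5].round(3))
    ```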

    ---

    Outline:

    (00:33) Introduction

    (02:40) Examples illustrating the general problem

    (12:29) The general problem

    (13:26) What can we do about this?

    The original text contained 11 footnotes which were omitted from this narration.

    ---

    First published:
    January 8th, 2025

    Source:
    https://www.lesswrong.com/posts/gYfpPbww3wQRaxAFD/activation-space-interpretability-may-be-doomed

    ---

    Narrated by TYPE III AUDIO.

    10 January 2025, 2:58 pm
  • 8 minutes 40 seconds
    “What o3 Becomes by 2028” by Vladimir_Nesov
    Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces, and 80% on ARC-AGI. These systems will be built in 2026-2027 and enable pretraining models for 5e28 FLOPs, while o3 itself is plausibly based on an LLM pretrained only for 8e25-4e26 FLOPs. The natural text data wall won't seriously interfere until 6e27 FLOPs, and might be possible to push until 5e28 FLOPs. Scaling of pretraining won't end just yet.

    Reign of GPT-4

    Since the release of GPT-4 in March 2023, subjectively there was no qualitative change in frontier capabilities. In 2024, everyone in the running merely caught up. To the extent this is true, the reason might be that the original GPT-4 was probably a 2e25 FLOPs MoE model trained on 20K A100. And if you don't already have a cluster this big, and experience [...]
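    As a rough illustration of where compute figures like these come from (the assumptions are mine, not the post's), training FLOPs can be estimated as accelerator count × peak throughput × utilization × wall-clock time; with plausible A100 numbers this lands near the ~2e25 FLOPs the excerpt attributes to the original GPT-4 run.

    ```python
    # Back-of-envelope training-compute estimate. The peak throughput, utilization,
    # and duration below are assumed values, not figures from the post.
    def training_flops(n_gpus: int, peak_flops_per_gpu: float, utilization: float, days: float) -> float:
        return n_gpus * peak_flops_per_gpu * utilization * days * 86_400

    # ~20K A100s (≈3.12e14 BF16 FLOP/s peak each), ~35% utilization, ~100 days:
    estimate = training_flops(20_000, 3.12e14, 0.35, 100)
    print(f"estimated training compute: {estimate:.1e} FLOPs")  # ≈1.9e25, in line with ~2e25
    ```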

    ---

    Outline:

    (00:52) Reign of GPT-4

    (02:08) Engines of Scaling

    (04:06) Two More Turns of the Crank

    (06:41) Peak Data

    The original text contained 3 footnotes which were omitted from this narration.

    ---

    First published:
    December 22nd, 2024

    Source:
    https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/what-o3-becomes-by-2028

    ---

    Narrated by TYPE III AUDIO.

    9 January 2025, 5:45 pm
  • 25 minutes 26 seconds
    “What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman
    (Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.)

    No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opinions. Have we entered into what will in hindsight be not even the early stages, but actually the middle stage, of the mad tumbling rush into singularity? Or are we just witnessing the exciting early period of a new technology, full of discovery and opportunity, akin to the boom years of the personal computer and the web?

    AI is approaching elite skill at programming, possibly barreling into superhuman status at advanced mathematics, and only picking up speed. Or so the framing goes. And yet, most of the reasons for skepticism are still present. We still evaluate AI only on neatly encapsulated, objective tasks [...]

    ---

    Outline:

    (02:49) The Slow Scenario

    (09:13) The Fast Scenario

    (17:24) Identifying The Requirements for a Short Timeline

    (22:53) How To Recognize The Express Train to AGI

    The original text contained 14 footnotes which were omitted from this narration.

    The original text contained 3 images which were described by AI.

    ---

    First published:
    January 6th, 2025

    Source:
    https://www.lesswrong.com/posts/auGYErf5QqiTihTsJ/what-indicators-should-we-watch-to-disambiguate-agi

    ---

    Narrated by TYPE III AUDIO.

    9 January 2025, 12:58 pm
  • 1 hour 18 minutes
    “How will we update about scheming?” by ryan_greenblatt
    I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, I (and co-authors) released "Alignment Faking in Large Language Models", which provides empirical evidence for some components of the scheming threat model.

    One question that's really important is how likely scheming is. But it's also really important to know how much we expect this uncertainty to be resolved by various key points in the future. I think it's about 25% likely that the first AIs capable of obsoleting top human experts[1] are scheming. It's really important for me to know whether I expect to make basically no updates to my P(scheming)[2] between here and the advent of potentially dangerously scheming models, or whether I expect to be basically totally confident one way or another by that point (in the same way that, though I might [...]
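    As a minimal sketch of the kind of update being discussed (the ~25% prior is from the excerpt; the likelihood ratios are invented for illustration), a Bayesian update shows how strongly different kinds of evidence would move P(scheming):

    ```python
    # Toy Bayesian update on P(scheming). Only the 0.25 prior comes from the text;
    # the likelihood ratios are assumptions for illustration.
    def bayes_update(prior: float, likelihood_ratio: float) -> float:
        prior_odds = prior / (1 - prior)
        posterior_odds = prior_odds * likelihood_ratio
        return posterior_odds / (1 + posterior_odds)

    prior = 0.25
    print("after a hypothetical 'smoking gun' (LR = 20):", round(bayes_update(prior, 20.0), 2))      # ~0.87
    print("after hypothetical strong null results (LR = 0.2):", round(bayes_update(prior, 0.2), 2))  # ~0.06
    ```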

    ---

    Outline:

    (03:20) My main qualitative takeaways

    (04:56) It's reasonably likely (55%), conditional on scheming being a big problem, that we will get smoking guns.

    (05:38) It's reasonably likely (45%), conditional on scheming being a big problem, that we won't get smoking guns prior to very powerful AI.

    (15:59) My P(scheming) is strongly affected by future directions in model architecture and how the models are trained

    (16:33) The model

    (22:38) Properties of the AI system and training process

    (23:02) Opaque goal-directed reasoning ability

    (29:24) Architectural opaque recurrence and depth

    (34:14) Where do capabilities come from?

    (39:42) Overall distribution from just properties of the AI system and training

    (41:20) Direct observations

    (41:43) Baseline negative updates

    (44:35) Model organisms

    (48:21) Catching various types of problematic behavior

    (51:22) Other observations and countermeasures

    (52:02) Training processes with varying (apparent) situational awareness

    (54:05) Training AIs to seem highly corrigible and (mostly) myopic

    (55:46) Reward hacking

    (57:28) P(scheming) under various scenarios (putting aside mitigations)

    (01:05:19) An optimistic and a pessimistic scenario for properties

    (01:10:26) Conclusion

    (01:11:58) Appendix: Caveats and definitions

    (01:14:49) Appendix: Capabilities from intelligent learning algorithms

    The original text contained 15 footnotes which were omitted from this narration.

    ---

    First published:
    January 6th, 2025

    Source:
    https://www.lesswrong.com/posts/aEguDPoCzt3287CCD/how-will-we-update-about-scheming

    ---

    Narrated by TYPE III AUDIO.

    8 January 2025, 4:58 pm
  • 20 minutes 22 seconds
    “OpenAI #10: Reflections” by Zvi
    This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the interview about past events that I won’t mention or have to condense a lot here, such as his going over his calendar and all the meetings he constantly has, so consider reading the whole thing.

    Table of Contents

    1. The Battle of the Board.
    2. Altman Lashes Out.
    3. Inconsistently Candid.
    4. On Various People Leaving OpenAI.
    5. The Pitch.
    6. Great Expectations.
    7. Accusations of Fake News.
    8. OpenAI's Vision Would Pose an Existential Risk To Humanity.
    The Battle of the Board

    Here is what he says about the Battle of the Board in Reflections:

    Sam Altman: A little over a year ago, on one particular Friday, the main thing that had gone wrong that day was [...]

    ---

    Outline:

    (00:25) The Battle of the Board

    (05:12) Altman Lashes Out

    (07:48) Inconsistently Candid

    (09:35) On Various People Leaving OpenAI

    (10:56) The Pitch

    (12:07) Great Expectations

    (12:56) Accusations of Fake News

    (15:02) OpenAI's Vision Would Pose an Existential Risk To Humanity

    ---

    First published:
    January 7th, 2025

    Source:
    https://www.lesswrong.com/posts/XAKYawaW9xkb3YCbF/openai-10-reflections

    ---

    Narrated by TYPE III AUDIO.

    8 January 2025, 7:15 am
  • 2 minutes 15 seconds
    “Maximizing Communication, not Traffic” by jefftk
    As someone who writes for fun, I don't need to get people onto my site:



    • If I write a post and some people are able to get the core idea just from the title or a tweet-length summary, great!

    • I can include the full contents of my posts in my RSS feed and on FB, because so what if people read the whole post there and never click through to my site?

    It would be different if I funded my writing through ads (maximize time on site to maximize impressions) or subscriptions (get the chance to pitch, probably want to tease a paywall).

    Sometimes I notice myself accidentally copying what makes sense for other writers. For example, because I can't put full-length posts on Bluesky or Mastodon I write short intros and [...]

    ---

    First published:
    January 5th, 2025

    Source:
    https://www.lesswrong.com/posts/ZqcC6Znyg8YrmKPa4/maximizing-communication-not-traffic

    ---

    Narrated by TYPE III AUDIO.

    7 January 2025, 4:58 am