1 hour 12 minutes
State of Enterprise AI 2026: Aaron Levie on Tokenmaxxing, Rise of Headless, and AI-Proofing Your Job
Aaron Levie, co-founder and CEO of Box, returns to the MAD Podcast with the clearest read in tech on what AI is actually doing inside the world's largest enterprises right now - not the hype version, the real one. After hundreds of Fortune 500 CIO conversations this year, Aaron explains why we're still in "day one" of the agent era, why one badly written agent run can now cost $1,000 in compute, and why progress at the AI labs is paradoxically slowing enterprise deployment. We get into the token cost shock now reshaping IT budgets, why coding agents have reached escape velocity while the rest of knowledge work hasn't, the rise of headless software and what replaces per-seat pricing, the emergence of the forward-deployed engineer as the hottest job in tech, why Aaron thinks the AI doomers are wrong about jobs, and where startups can still win as the labs move up the stack.
(00:00) Intro
(01:18) Silicon Valley engineering vs. everyone else
(05:35) Are enterprise CIOs actually bullish on AI?
(08:51) Tokenmaxxing & why your AI bill is about to explode
(11:34) The myth of falling token costs and AI spend escaping IT budgets
(17:37) The $5B startup hiding in AI compute
(18:14) The mosaic of models inside every enterprise
(21:28) Why coding works and the rest of knowledge work doesn't
(25:53) The Bob and Sally problem: access control breaks agents
(30:31) Will enterprise AI really take 10 years to roll out?
(32:24) The capability overhang: why faster models slow diffusion
(34:23) Data is the bottleneck (it always was)
(39:02) The rise of internal forward-deployed engineers
(41:23) Why the AI doomers are wrong about jobs
(43:43) Headless software is inevitable
(46:14) What replaces per-seat pricing
(47:37) How Box itself is going headless
(49:42) How the org chart actually evolves
(1:00:33) Future-proofing yourself as an enterprise employee
(1:06:40) Are we all just going to work for OpenAI and Anthropic?
(1:07:11) Where startups can still win as the labs move up
28 May 2026, 11:30 am - 1 hour 13 minutes
OpenAI's Yann Dubois: Why AI Progress Suddenly Feels Real
AI suddenly feels like it has crossed a threshold, and Yann Dubois, co-lead of the Post-training Frontiers team at OpenAI, joins Matt Turck to explain why. Yann’s team has led the post-training behind the company's reasoning models, including the recent GPT-5.5 release. In this conversation, we go inside the shift from raw model capability to useful, reliable systems: what changed with GPT-5.5, why reinforcement learning is moving beyond math and coding competitions into messy real-world work, how reasoning models like GPT-5.5 actually work, the difference between GPT-5.5 Thinking and GPT-5.5 Pro, why post-training has become one of the most important frontiers in AI, and why evals, model-as-judge, hallucinations, agentic workflows, GDPval, and continual learning are now central to the next phase of frontier models. Yann also shares why continual learning remains one of AI's biggest unsolved problems three years after ChatGPT, and where startups still have massive room to build as frontier models race ahead.
(00:00) - Cold open
(00:34) - Intro
(01:30) - Why recent AI progress feels like a step function
(04:13) - Model reliability & the rollercoaster of shipping 5.5
(07:33) - How OpenAI structures vertical and horizontal teams
(09:49) - Improving model efficiency and test-time compute
(12:32) - Yann Dubois' journey from Switzerland to OpenAI
(15:37) - Reasoning in 2026: Real-world utility vs verifiable rewards
(18:34) - GPT-5.5 Thinking vs Pro: Scaling test-time compute
(20:09) - How reasoning models become more efficient
(23:23) - Pre-training scaling and overcoming the data wall
(27:03) - Multimodal data, synthetic data, and embodied AI
(31:05) - Demystifying mid-training and post-training
(37:21) - Does RL create new capabilities in AI?
(38:53) - The challenges and frontier of scaling RL
(43:09) - Is building AI models a craft or a strict science?
(48:21) - How AI models generalize across different domains
(54:18) - How reinforcement learning cures AI hallucinations
(56:04) - Negative generalization and conflicting instructions
(58:05) - Can RL scale to law, medicine, and the broader economy?
(1:00:19) - The evaluation bottleneck and Model as a Judge
(1:04:21) - Continuous AI progress & continual learning
(1:08:49) - Will foundation models eat the agent harness?
(1:11:23) - Why startups should focus on the last mile of AI
21 May 2026, 11:30 am - 1 hour 5 minutes
Why AWS and Azure Cannot Run Autonomous AI – Ivan Burazin (Daytona)
If AI agents are the new digital knowledge workers, where exactly do they do their work? In this episode of the MAD Podcast, Ivan Burazin joins us to unpack the emerging infrastructure stack for AI agents and explain why every agent needs its own secure, stateful "computer." We explore the technical realities of sandboxes, dive into why legacy, stateless hyperscalers weren't built for these new workloads, and break down the mechanics of microVMs and custom schedulers alongside a contrarian prediction on an impending CPU shortage. Finally, Ivan delivers an absolute masterclass on product-led growth, community building, and go-to-market strategy for technical founders.
(00:40) Intro
(02:13) What is an AI agent sandbox?
(03:17) Security risks of running agents locally
(05:17) Stateful vs. stateless hyperscalers
(07:04) The history of cloud IDEs and the end of localhost
(09:45) Do all AI agents need a sandbox?
(12:26) Sandbox use cases: RL evals & background agents
(14:10) Unpacking the emerging AI Agent Stack
(16:20) The unsolved problem of agent memory and learning
(19:37) Where sandboxes fit in the agent harness
(21:35) OpenAI, Anthropic, and agent SDKs
(23:06) Ivan's founder journey: From CodeAnywhere to Daytona
(26:59) GTM strategies and building developer communities
(33:48) Why customer support is your best GTM strategy
(35:34) Leveraging Twitter during the AI super cycle
(40:50) The technical anatomy of a sandbox
(41:53) Why fast spin-up speeds maximize GPU efficiency
(46:09) Firecracker, QEMU, and isolation primitives
(49:58) Why sandbox snapshots and state forking matter
(51:40) Why Daytona built a custom scheduler from scratch
(55:24) The challenge of long-running stateful sandboxes
(58:10) The build your own sandbox trap
(1:01:03) Why AI agents might trigger a global CPU shortage
(1:02:46) The future of the AI Agent Stack
14 May 2026, 11:30 am - 1 hour 16 minutes
OpenAI Board Member Zico Kolter on the Real Risks of Frontier AI
What actually happens before a frontier AI model gets released — and who decides whether it is safe enough? In this episode of The MAD Podcast, Matt Turck sits down with Zico Kolter — OpenAI board member, Head of the Machine Learning Department at Carnegie Mellon, and co-founder of Gray Swan — for a deep conversation on the real risks of frontier AI. They discuss how OpenAI’s safety oversight works before major model releases, why more powerful models do not automatically become safer, how jailbreaks and prompt injection expose real weaknesses in AI systems, why AI agents dramatically expand the attack surface, and where frontier AI is headed next. A clear, practical discussion on OpenAI, AI safety, AI security, AI agents, frontier models, red teaming, reinforcement learning, and the future of AI governance.
(00:00) Intro
(01:32) OpenAI board role and Safety & Security Committee
(03:53) How OpenAI reviews major model releases
(05:33) OpenAI’s preparedness framework explained
(09:46) Are frontier AI models getting safer?
(12:33) Why AI safety does not come from scale
(15:23) The four categories of AI risk
(19:38) Doomerism vs accelerationism in AI
(24:11) The six-month AI pause debate
(26:20) AI safety as a global effort
(28:04) How Zico Kolter got into machine learning
(31:05) OpenAI in the early days
(34:14) Why Carnegie Mellon became an AI powerhouse
(38:43) What Gray Swan does in AI security
(40:44) AI safety vs AI security
(43:15) The GCG jailbreak paper
(49:19) How AI labs responded to jailbreak research
(50:19) State-of-the-art AI defenses
(52:32) State-of-the-art AI attacks
(54:22) Why AI agents expand the attack surface
(58:39) Are AI agents ready for production?
(59:40) Mechanistic interpretability explained
(1:02:31) Will AI be safer in two years?
(1:03:46) Reinforcement learning and self-improving models
(1:08:09) Do post-transformer architectures matter?
(1:09:29) Best research directions in AI now
(1:11:00) Zico Kolter’s Intro to Modern AI course
(1:14:53) Why modern AI is simpler than people think
7 May 2026, 11:30 am - 58 minutes
Anthropic’s Felix Rieseberg: Claude Cowork, Mythos, and the SaaS Extinction
Felix Rieseberg leads engineering for Claude Cowork at Anthropic, one of the most important new agentic AI products in the market today. In this episode of The MAD Podcast, Matt Turck sits down with Felix to discuss Anthropic’s newly announced Claude Mythos Preview, why Felix sees it as a genuine step-function change, and what it means when frontier AI starts showing outsized cybersecurity capabilities.
The conversation then goes deep on Claude Cowork: how it emerged from Claude Code, what the famous “10-day” story really means, why Anthropic believes AI needs access to the local computer, and how Cowork actually works under the hood. Felix explains why skills are just text files, why memory is often just text files too, and how Anthropic thinks about building trust in AI agents.
They also explore some of the biggest questions in AI product design and the future of software: why UX may matter as much as the model itself, why execution is becoming dramatically cheaper, what that means for product management and startups, and why Felix believes taste, alignment, and understanding humans may matter more than ever.
(00:00) Intro
(01:53) Claude Mythos Preview and the “step-function change”
(06:16) Why Anthropic is treating Mythos differently
(11:19) The real story behind Claude Cowork’s “10-day” build
(12:42) Why Anthropic realized Claude Code needed a non-technical version
(15:44) What Claude Cowork actually is
(17:03) Under the hood: virtual machines, tools, skills
(18:36) Where Cowork’s memory actually lives
(19:26) How Cowork connects to files, apps, and the internet
(20:45) Why Felix thinks the local computer is under-appreciated
(24:49) Trust: how do you get users comfortable with AI agents?
(28:45) What UX actually means for AI agents
(31:27) Anthropic Cowork's roadmap is only one month long
(34:12) Building 100 prototypes
(35:10) If execution is free, what becomes the bottleneck?
(37:25) Does it come down to taste?
(40:12) The hardest part of building Claude Cowork
(41:43) Advice for founders building AI agents
(44:21) SaaSpocalypse: what’s left for software startups?
(49:30) Where AI agents are going next
(51:20) Regulated industries and enterprise adoption
(54:15) Hot takes: what's underrated, overrated, and what Felix would build today
10 April 2026, 11:30 am - 1 hour 4 minutes
AI is Already Building AI | Google DeepMind’s Mostafa Dehghani
Are we truly on the verge of AI automating its own research and development? In this deep-dive episode of the MAD Podcast, Matt Turck sits down with Mostafa Dehghani, a pioneering AI researcher at Google DeepMind whose work on Universal Transformers and Vision Transformers (ViT) helped lay the groundwork for today's frontier models.
Moving past the hype, Mostafa breaks down the actual mechanics of "thinking in loops" and Recursive Self-Improvement (RSI). He explores the critical bottlenecks holding back true AGI—from evaluation limits and formal verification to the brutal math of long-horizon reliability.
Mostafa and Matt also discuss the shift from pre-training to post-training, how Gemini's Nano Banana 2 processes pixels and text simultaneously, and why the "frozen" nature of today's models means Continual Learning is the next massive frontier for enterprise AI and data pipelines.
(00:00) Intro
(01:17) What “loops” in AI actually mean
(05:04) Self-improvement as the next chapter of machine learning
(07:32) Are Karpathy’s autoresearch agents an early form of AI self-improvement?
(08:56) AI building AI: how close are we?
(10:02) The biggest bottlenecks: evals, automation, and long horizons
(12:36) Can formal verification unlock recursive self-improvement?
(14:06) What is model collapse?
(15:33) Generalization vs specialization in AI
(18:04) What is a specialized model today?
(20:57) Could top AI researchers themselves be automated?
(24:02) If AI builds AI, does data matter less than compute?
(26:22) Post-training vs pre-training: where will progress come from?
(28:14) Why pre-training is not dead
(29:45) What is continual learning?
(31:53) How real is continual learning today?
(33:43) Mostafa Dehghani’s background and path into AI
(36:13) The story behind Universal Transformers
(39:56) How Vision Transformers changed AI
(43:47) Gemini, multimodality, and Nano Banana
(47:46) Why multimodality helps build a world model
(52:44) Why image generation is getting faster and more efficient
(54:44) Hot takes
(54:53) What the AI field is getting wrong
(56:17) Why continual learning is underrated
(57:26) Does RAG go away over time?
(58:21) What people are too confident about in AI
(59:56) If he were starting from scratch today
2 April 2026, 11:30 am - 1 hour 1 minute
Benedict Evans: OpenAI’s Moat Problem & the Future of Software
Is OpenAI trapped without a defensible moat? World-renowned independent tech analyst Benedict Evans returns to the MAD Podcast and argues that foundation models have zero network effects, making them closer to commodity infrastructure than the next iOS. We unpack OpenAI’s "mile wide, inch deep" usage problem, why simply having a "better model" does not solve the core UX challenge, and whether the hyperscalers' massive CapEx spending is a sustainable strategy or a fast track to financial gravity.
We also explore the reality behind the recent "SaaSpocalypse", the structural shift from traditional enterprise systems to "improvised" and "ephemeral" software, and where the actual white space lies for founders and investors navigating the artificial intelligence hype cycle.
(00:00) Intro
(01:06) OpenAI's Focus Shift
(03:12) ChatGPT usage: "a mile wide, an inch deep"
(09:03) Why better models do not solve the real problem
(13:58) Why AI product teams are strategy takers, not strategy setters
(15:38) Do agents help create defensibility?
(20:06) OpenClaw and the "Desktop Linux" moment for AI
(25:52) Why "everyone will build their own software" is completely wrong
(28:09) Improvised software vs. institutionalized software
(29:23) The Jevons Paradox: Why there will be more software, not less
(36:15) Are we heading toward value destruction before value creation?
(38:03) Circular revenue, leverage, and AI bubble dynamics
(38:53) Big Tech's Trillion-Dollar CapEx Crisis & Financial Gravity
(45:23) Why AI job exposure charts can be misleading
(52:15) How Fortune 500 Execs are actually deploying AI today
(56:45) The White Space: What this means for founders and investors
19 March 2026, 11:30 am - 46 minutes 57 seconds
Everything Gets Rebuilt: The New AI Agent Stack | Harrison Chase, LangChain
Harrison Chase, co-founder and CEO of LangChain, joins the MAD Podcast to explain why everything in AI is getting rebuilt. As agents evolve from simple prompt-based systems into software that can plan, use tools, write code, manage files, and remember things over time, the real frontier is shifting from the model itself to the stack around the model. In this conversation, we go deep on harnesses, subagents, filesystems, sandboxes, observability, memory, and the new infrastructure required to make AI agents actually work in the real world.
(00:00) Intro - meet Harrison Chase
(01:32) What changed in agents over the last year
(03:57) Why coding agents are ahead
(06:26) Do models commoditize the framework layer?
(08:27) Harnesses, in plain English
(10:11) Why system prompts matter so much
(13:11) The upside — and downside — of subagents
(15:31) Why a useful agent needs a filesystem
(18:13) The core primitives of modern agents
(19:12) Skills: the new primitive
(20:19) What context compaction actually means
(23:02) How memory works in agents
(25:16) One mega-agent or many specialized agents?
(27:46) Has MCP won?
(29:38) Why agents need sandboxes
(32:35) How sandboxes help with security
(33:32) How Harrison Chase started LangChain
(37:24) LangChain vs LangGraph vs Deep Agents
(40:17) Why observability matters more for agents
(41:48) Evals, no-code, and continuous improvement
(44:41) What LangChain is building next
(45:29) Where the real moat in AI lives
12 March 2026, 11:30 am - 1 hour 3 minutes
AI That Can Prove It’s Right: Verification as the Missing Layer in AI — Carina Hong
What if AI didn’t just sound right — but could prove it? In this episode of the MAD Podcast, Matt Turck sits down with Carina Hong, a 24-year-old former math olympiad competitor and Rhodes Scholar, and the founder/CEO of Axiom Math, to unpack how AxiomProver earned a perfect 12/12 on the Putnam 2025 and why formal verification (via Lean) may be the missing layer for reliable reasoning. Carina argues we’re entering a “math renaissance” where verified reasoning systems can tackle problems that currently take researchers months — and potentially push beyond math into verified code, hardware, and high-stakes software. They go inside the “generation + verification” loop, what it means to build AI that can be trusted, and what this approach could unlock on the road to superintelligent reasoning.
(00:00) Intro
(01:25) Why the World Needs an AI Mathematician
(02:57) Scoring 12/12 on the World's Hardest Math Test (Putnam)
(04:05) The First AI to Solve Open Research Conjectures
(06:59) Does AI Solve Math in "Alien" Ways? (The Move 37 Effect)
(08:59) "Lean": The Programming Language of Proofs Explained
(10:51) How Axiom's Approach Differs from DeepMind & OpenAI
(16:06) Formal vs. Informal Reasoning (And Auto-Formalization)
(17:37) The AI "Reward Hacking" Problem
(20:18) Building an AI That is 100% Correct, 100% of the Time
(23:23) Beyond Math: Verified Code & Hardware Verification
(25:12) The Brutal Reality of Competitive Math Olympiads
(29:30) From Neuroscience to Stanford Law to Dropout Founder
(33:57) How Axiom Actually Works Under the Hood (The Architecture)
(37:51) The Secret to Generating Perfect Synthetic Data
(40:14) Tokens, Proof Length, and Inference Cost
(42:58) The "Everest" of Mathematics: Scaling Reasoning Trees
(46:32) Can an AI Win a Fields Medal?
(47:25) "Math Renaissance": What Changes if This Works
(55:47) How Mathematicians React to AI (And Why Proof Certificates Matter)
(57:30) Becoming a CEO: Dropping Ego and Building Culture
(1:00:42) Recruiting World-Class Talent & Building the Axiom "Tribe"
26 February 2026, 12:30 pm - 1 hour 22 minutes
Voice AI’s Big Moment: Why Everything Is Changing Now (ft. Neil Zeghidour, Gradium AI)
Voice used to be AI’s forgotten modality — awkward, slow, and fragile. Now it’s everywhere. In this reference episode on all things Voice AI, Matt Turck sits down with Neil Zeghidour, a top AI researcher and CEO of Gradium AI (ex-DeepMind/Google, Meta, Kyutai), to cover voice agents, speech-to-speech models, full-duplex conversation, on-device voice, and voice cloning.
We unpack what actually changed under the hood — why voice is finally starting to feel natural, and why it may become the default interface for a new generation of AI assistants and devices.
Neil breaks down today’s dominant “cascaded” voice stack — speech recognition into a text model, then text-to-speech back out — and why it’s popular: it’s modular and easy to customize. But he argues it has two key downsides: chaining models adds latency, and forcing everything through text strips out paralinguistic signals like tone, stress, and emotion. The next wave, he suggests, is combining cascade-like flexibility with the more natural feel of speech-to-speech and full-duplex conversation.
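To make the cascade's two downsides concrete, here is a minimal Python sketch. It is not Gradium's or anyone else's actual stack: the stage functions, the `[angry]` tone annotation, and the latency figures are all hypothetical stand-ins, chosen only to show that per-stage latencies add up and that anything not expressible as plain text is dropped at the first hop.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    latency_ms: float          # hypothetical per-stage latency, for illustration only
    run: Callable[[str], str]  # each stage maps one string payload to the next


def run_cascade(stages: List[Stage], payload: str) -> Tuple[str, float]:
    """Run the stages in order; return the final payload and the summed latency."""
    total_ms = 0.0
    for stage in stages:
        payload = stage.run(payload)
        total_ms += stage.latency_ms  # chained stages pay every latency in sequence
    return payload, total_ms


# Hypothetical stand-ins for real models.
def fake_asr(audio: str) -> str:
    # Speech recognition keeps the words and drops the tone annotation,
    # e.g. "hello [angry]" becomes "hello".
    return audio.split(" [")[0]


def fake_llm(text: str) -> str:
    return f"reply to: {text}"


def fake_tts(text: str) -> str:
    return f"<audio:{text}>"


stages = [
    Stage("speech-to-text", 150.0, fake_asr),
    Stage("text model", 300.0, fake_llm),
    Stage("text-to-speech", 120.0, fake_tts),
]

output, latency_ms = run_cascade(stages, "hello [angry]")
print(output)      # the tone annotation never reaches the text model
print(latency_ms)  # sum of the three stage latencies
```

Because every stage only sees a string, each one can be swapped independently, which is the modularity the cascade is valued for. The same text-only interface is what discards tone, stress, and emotion.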
We go deep on full-duplex interaction (ending awkward turn-taking), the hardest unsolved problems (noisy real-world environments and multi-speaker chaos), and the realities of deploying voice at scale — including why models must be compact and when on-device voice is the right approach.
Finally, we tackle voice cloning: where it’s genuinely useful, what it means for deepfakes and privacy, and why watermarking isn’t a silver bullet.
If you care about voice agents, real-time AI, and the next generation of human-computer interaction, this is the episode to bookmark.
Neil Zeghidour
LinkedIn - https://www.linkedin.com/in/neil-zeghidour-a838aaa7/
X/Twitter - https://x.com/neilzegh
Gradium
Website - https://gradium.ai
X/Twitter - https://x.com/GradiumAI
Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
FirstMark
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
(00:00) Intro
(01:21) Voice AI’s big moment — and why we’re still early
(03:34) Why voice lagged behind text/image/video
(06:06) The convergence era: transformers for every modality
(07:40) Beyond Her: always-on assistants, wake words, voice-first devices
(11:01) Voice vs text: where voice fits (even for coding)
(12:56) Neil’s origin story: from finance to machine learning
(18:35) Neural codecs (SoundStream): compression as the unlock
(22:30) Kyutai: open research, small elite teams, moving fast
(31:32) Why big labs haven’t “won” voice AI
(34:01) On-device voice: where it works, why compact models matter
(36:37) The last mile: real-world robustness, pronunciation, uptime
(41:35) Benchmarking voice: why metrics fail, how they actually test
(47:03) Cascades vs speech-to-speech: trade-offs + what’s next
(54:05) Hardest frontier: noisy rooms, factories, multi-speaker chaos
(1:00:50) New languages + dialects: what transfers, what doesn’t
(1:02:54) Hardware & compute: why voice isn’t a 10,000-GPU game
(1:07:27) What data do you need to train voice models?
(1:09:02) Deepfakes + privacy: why watermarking isn’t a solution
(1:12:30) Voice + vision: multimodality, screen awareness, video+audio
(1:14:43) Voice cloning vs voice design: where the market goes
(1:16:32) Paris/Europe AI: talent density, underdog energy, what’s next
19 February 2026, 12:30 pm - 58 minutes 20 seconds
Mistral AI vs. Silicon Valley: The Rise of Sovereign AI
While Silicon Valley obsesses over AGI, Timothée Lacroix and the team at Mistral AI are quietly building the industrial and sovereign infrastructure of the future. In his first-ever appearance on a US podcast, the Mistral AI Co-Founder & CTO reveals how the company has evolved from an open-source research lab into a full-stack sovereign AI power—backed by ASML, running on their own massive supercomputing clusters, and deployed in nation-state defense clouds to break the dependency on US hyperscalers.
Timothée offers a refreshing, engineer-first perspective on why the current AI hype cycle is misleading. He explains why "Sovereign AI" is not just a geopolitical buzzword but a necessity for any enterprise that wants to own its intelligence rather than rent it. He also provides a contrarian reality check on the industry's obsession with autonomous agents, arguing that "trust" matters more than autonomy and explaining why he prefers building robust "workflows" over unpredictable agents.
We also dive deep into the technical reality of competing with the US giants. Timothée breaks down the architecture of the newly released Mistral 3, the "dense vs. MoE" debate, and the launch of Mistral Compute—their own infrastructure designed to handle the physics of modern AI scaling. This is a conversation about the plumbing, the 18,000-GPU clusters, and the hard engineering required to turn AI from a magic trick into a global industrial asset.
Timothée Lacroix
LinkedIn - https://www.linkedin.com/in/timothee-lacroix-59517977/
Google Scholar - https://scholar.google.com.do/citations?user=tZGS6dIAAAAJ&hl=en&oi=ao
Mistral AI
Website - https://mistral.ai
X/Twitter - https://x.com/MistralAI
(00:00) — Cold Open
(01:27) — Mistral vs. The World: From Research Lab to Sovereign Power
(03:48) — Inside Mistral Compute: Building an 18,000 GPU Cluster
(08:42) — The Trillion-Dollar Question: Competing Without a Big Tech Parent
(10:37) — The Reality of Enterprise AI: Escaping "POC Purgatory"
(15:06) — Why Mistral Hires Forward Deployed Engineers (FDEs)
(16:57) — The Contrarian Take: Why "Agents" are just "Workflows"
(19:35) — Trust > Autonomy: The Truth About Agent Reliability
(21:26) — The Missing Stack: Governance and Versioning for AI
(26:24) — When Will AI Actually Work? (The 2026 Timeline)
(30:33) — Beyond Chat: The "Banger" Sovereign Use Cases
(35:46) — Mistral 3 Architecture: Mixture of Experts vs. Dense
(43:12) — Synthetic Data & The Post-Training Bottleneck
(45:12) — Reasoning Models: Why "Thinking" is Just Tool Use
(46:22) — Launching DevStral 2 and the Vibe CLI
(50:49) — Engineering Lessons: How to Build Frontier AI Efficiently
(56:08) — Timothée’s View on AGI & The Future of Intelligence
12 February 2026, 12:30 pm