Vanishing Gradients

Hugo Bowne-Anderson

a data podcast with hugo bowne-anderson

  • 51 minutes 53 seconds
    The Rise of Agentic Search

    We’re really moving from a world where humans are authoring search queries and humans are executing those queries and humans are digesting the results to a world where AI is doing that for us.

    Jeff Huber, CEO and co-founder of Chroma, joins Hugo to talk about how agentic search and retrieval are changing the very nature of search and software for builders and users alike.

    We Discuss:

    * “Context engineering”, the strategic design and engineering of what context gets fed to the LLM (data, tools, memory, and more), which is now essential for building reliable, agentic AI systems;

    * Why simply stuffing large context windows is no longer feasible due to “context rot” as AI applications become more goal-oriented and capable of multi-step tasks;

    * A framework for precisely curating and providing only the most relevant, high-precision information to ensure accurate and dependable AI systems;

    * The “agent harness”, the collection of tools and capabilities an agent can access, and how to construct these advanced systems;

    * Emerging best practices for builders, including hybrid search as a robust default (sketched below), creating “golden datasets” for evaluation, and leveraging sub-agents to break down complex tasks;

    * The major unsolved challenge of agent evaluation, emphasizing a shift towards iterative, data-centric approaches.
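    Since hybrid search comes up as a robust default, here is a minimal sketch of one common way to fuse keyword and vector results, reciprocal rank fusion; the doc ids and ranked lists are hypothetical, and this is an illustration rather than Chroma’s implementation.

    ```python
    # Reciprocal rank fusion (RRF): merge ranked lists from a keyword
    # index and a vector index into a single ranking.

    def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
        """Fuse multiple ranked lists of doc ids into one ranking."""
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from BM25
    vector_hits = ["doc1", "doc4", "doc3"]   # e.g. from embedding search
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # -> ['doc1', 'doc3', 'doc4', 'doc7'] (doc1 ranks high in both lists)
    ```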

    You can also find the full episode on Spotify, Apple Podcasts, and YouTube.

    You can also interact directly with the transcript here in NotebookLM. If you do, let us know what you find in the comments!

    👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈

    Oh! One more thing: we’ve just announced a Vanishing Gradients livestream for January 21 that you may dig:

    * A Builder’s Guide to Agentic Search & Retrieval with Doug Turnbull and John Berryman (register to join live or get the recording afterwards).

    Show notes

    * Jeff Huber on Twitter

    * Jeff Huber on LinkedIn

    * Try Chroma!

    * Context Rot: How Increasing Input Tokens Impacts LLM Performance by The Chroma Team

    * AI Agent Harness, 3 Principles for Context Engineering, and the Bitter Lesson Revisited

    * From Context Engineering to AI Agent Harnesses: The New Software Discipline

    * Generative Benchmarking by The Chroma Team

    * Effective context engineering for AI agents by The Anthropic Team

    * Making Sense of Millions of Conversations for AI Agents by Ivan Leo (Manus) and Hugo

    * How we built our multi-agent research system by The Anthropic Team

    * Upcoming Events on Luma

    * Watch the podcast video on YouTube

    👉 Want to learn more about Building AI-Powered Software? Check out our Building AI Applications course. It’s a live cohort with hands-on exercises and office hours. Our final cohort is in Q1 2026. Here is a 35% discount code for readers. 👈

    https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgch



    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    19 December 2025, 4:36 am
  • 1 hour 2 minutes
    Episode 64: Data Science Meets Agentic AI with Michael Kennedy (Talk Python)
    We have been sold a story of complexity. Michael Kennedy (Talk Python) argues we can escape this by relentlessly focusing on the problem at hand, reducing costs by orders of magnitude in software, data, and AI.
    In this episode, Michael joins Hugo to dig into the practical side of running Python systems at scale. They connect these ideas to the data science workflow, exploring which software engineering practices allow AI teams to ship faster and with more confidence. They also detail how to deploy systems without unnecessary complexity and how Agentic AI is fundamentally reshaping development workflows.
    We talk through:
    - Escaping complexity hell to reduce costs and gain autonomy
    - The specific software practices, like the "Docker Barrier", that matter most for data scientists
    - How to replace complex cloud services with a simple, robust $30/month stack
    - The shift from writing code to "systems thinking" in the age of Agentic AI
    - How to manage the people-pleasing psychology of AI agents to prevent broken code
    - Why struggle is still essential for learning, even when AI can do the work for you
    LINKS
    Talk Python In Production, the Book! (https://talkpython.fm/books/python-in-production)
    Just Enough Python for Data Scientists Course (https://training.talkpython.fm/courses/just-enough-python-for-data-scientists)
    Agentic AI Programming for Python Course (https://training.talkpython.fm/courses/agentic-ai-programming-for-python)
    Talk Python To Me (https://talkpython.fm/) and a recent episode with Hugo as guest: Building Data Science with Foundation LLM Models (https://talkpython.fm/episodes/show/526/building-data-science-with-foundation-llm-models)
    Python Bytes podcast (https://pythonbytes.fm/)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtube.com/live/jfSRxxO3aRo?feature=share)
    Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (35% off for listeners) (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    3 December 2025, 4:00 am
  • 1 hour 13 seconds
    Episode 63: Why Gemini 3 Will Change How You Build AI Agents with Ravin Kumar (Google DeepMind)
    Gemini 3 is a few days old, and the massive leap in performance and model reasoning has big implications for builders: as models begin to self-heal, builders are tearing out functionality they shipped just months ago, ripping out defensive code and reshipping their agent harnesses entirely.
    Ravin Kumar (Google DeepMind) joins Hugo to break down exactly why the rapid evolution of models like Gemini 3 is changing how we build software. They detail the shift from simple tool calling to building reliable "Agent Harnesses", explore the architectural tradeoffs between deterministic workflows and high-agency systems, the nuance of preventing context rot in massive windows, and why proper evaluation infrastructure is the only way to manage the chaos of autonomous loops.
    They talk through:
    - The implications of models that can "self-heal" and fix their own code
    - The two cultures of agents: LLM workflows with a few tools versus high-agency, autonomous systems, and when to unleash each
    - Inside NotebookLM: moving from prototypes to viral production features like Audio Overviews
    - Why Needle in a Haystack benchmarks often fail to predict real-world performance
    - How to build agent harnesses that turn model capabilities into product velocity (a minimal harness loop is sketched after this list)
    - The shift from measuring latency to managing time-to-compute for reasoning tasks
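    To make "agent harness" concrete, here is a minimal, model-agnostic sketch of the pattern: a loop that feeds context to a model, dispatches any tool call it proposes, and appends the result back into context. The `fake_model` and `search_docs` stubs are hypothetical stand-ins, not Gemini APIs.

    ```python
    # A toy agent harness: the loop, tool dispatch, and context threading.

    def search_docs(query: str) -> str:
        return f"(stub) top results for {query!r}"  # pretend tool

    TOOLS = {"search_docs": search_docs}

    def fake_model(messages: list[dict]) -> dict:
        # Scripted stand-in for a real LLM call: ask for one search,
        # then answer. Replace with your model API of choice.
        if not any(m["role"] == "tool" for m in messages):
            return {"tool": "search_docs", "args": {"query": messages[0]["content"]}}
        return {"content": f"final answer, grounded in: {messages[-1]['content']}"}

    def run_agent(task: str, model=fake_model, max_steps: int = 10) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = model(messages)
            if "tool" not in reply:            # model produced a final answer
                return reply["content"]
            result = TOOLS[reply["tool"]](**reply["args"])  # dispatch tool call
            messages.append({"role": "tool", "content": result})
        return "stopped: step budget exhausted"

    print(run_agent("what is context rot?"))
    ```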
    LINKS
    From Context Engineering to AI Agent Harnesses: The New Software Discipline, a podcast Hugo did with Lance Martin, LangChain (https://high-signal.delphina.ai/episode/context-engineering-to-ai-agent-harnesses-the-new-software-discipline)
    Context Rot: How Increasing Input Tokens Impacts LLM Performance (https://research.trychroma.com/context-rot)
    Effective context engineering for AI agents by Anthropic (https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/CloimQsQuJM)
    Join the final cohort of our Building AI Applications course starting Jan 12, 2026 (https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav): https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgrav

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    22 November 2025, 7:30 am
  • 59 minutes 4 seconds
    Episode 62: Practical AI at Work: How Execs and Developers Can Actually Use LLMs
    Many leaders are trapped between chasing ambitious, ill-defined AI projects and the paralysis of not knowing where to start. Dr. Randall Olson argues that the real opportunity isn't in moonshots, but in the "trillions of dollars of business value" available right now. As co-founder of Wyrd Studios, he bridges the gap between data science, AI engineering, and executive strategy to deliver a practical framework for execution.
    In this episode, Randy and Hugo lay out how to find and solve what might be considered "boring but valuable" problems, like an EdTech company automating 20% of its support tickets with a simple retrieval bot instead of a complex AI tutor. They discuss how to move incrementally along the "agentic spectrum" and why treating AI evaluation with the same rigor as software engineering is non-negotiable for building a disciplined, high-impact AI strategy.
    They talk through:
    How a non-technical leader can prototype a complex insurance claim classifier using just photos and a ChatGPT subscription.
    The agentic spectrum: Why you should start by automating meeting summaries before attempting to build fully autonomous agents.
    The practical first step for any executive: Building a personal knowledge base with meeting transcripts and strategy docs to get tailored AI advice.
    Why treating AI evaluation with the same rigor as unit testing is essential for shipping reliable products.
    The organizational shift required to unlock long-term AI gains, even if it means a short-term productivity dip.
    LINKS
    Randy on LinkedIn
    Wyrd Studios (https://thewyrdstudios.com/)
    Stop Building AI Agents (https://www.decodingai.com/p/stop-building-ai-agents)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
    šŸŽ“ Learn more:
    In Hugo's course: Building AI Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20
    Next cohort starts November 3: come build with us!

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    31 October 2025, 7:00 am
  • 28 minutes 4 seconds
    Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
    Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring production reality. In this episode, he draws on insights from the LLMOps Database (750+ real-world deployments then; now nearly 1,000!) to systematically measure and engineer constraints, turning unreliable prototypes into robust, enterprise-ready AI.
    Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
    We talk through:
    - Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
    - The essential MLOps hygiene (tracing and continuous evals) that most teams skip
    - The optimal (and very low) limit for the number of tools an agent can reliably use
    - How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
    - The principle of using simple Python/RegEx checks before resorting to costly LLM judges (see the sketch below)
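    As a rough illustration of that last principle, here is a minimal sketch: run cheap deterministic checks first and only escalate ambiguous outputs to an LLM judge. The specific rules and the `llm_judge` callable are hypothetical.

    ```python
    import re

    # Cheap deterministic checks run first; only ambiguous outputs are
    # escalated to a (costly) LLM judge. The rules here are illustrative.

    APOLOGY = re.compile(r"\b(sorry|as an ai)\b", re.IGNORECASE)

    def cheap_checks(output: str) -> str | None:
        """Return a verdict if a deterministic rule fires, else None."""
        if not output.strip():
            return "fail: empty output"
        if APOLOGY.search(output):
            return "fail: boilerplate apology"
        if len(output) > 4000:
            return "fail: over length budget"
        return None  # ambiguous: escalate

    def grade(output: str, llm_judge) -> str:
        verdict = cheap_checks(output)
        return verdict if verdict is not None else llm_judge(output)

    print(grade("Sorry, as an AI I cannot help.", llm_judge=lambda s: "pass"))
    # -> 'fail: boilerplate apology' (the costly judge is never called)
    ```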
    LINKS
    The LLMOps Database: 925 entries as of today... submit a use case to help it get to 1K! (https://www.zenml.io/llmops-database)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/-YQjKH3wRvc)
    🎓 Learn more:
    - This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20) — https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20
    Next cohort starts November 3: come build with us!

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    16 October 2025, 3:00 am
  • 1 hour 13 minutes
    Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
    Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
    Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.
    We talk through:
    The 10(+1) critical mistakes that cause teams to waste time on evals
    Why "hallucination scores" are a waste of time (and what to measure instead)
    The manual review process that finds major issues in hours, not weeks
    A step-by-step method for building LLM judges you can actually trust (a minimal agreement check is sketched after this list)
    How to use domain experts without getting stuck in endless review committees
    Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
    If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.
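    One concrete step from the data-centric playbook discussed here: before trusting an LLM judge, measure its agreement with human labels on a small hand-labeled set of traces. A minimal sketch with made-up labels:

    ```python
    # Compare judge verdicts against human labels on a labeled sample.
    # The labels below are made up for illustration.

    human_labels = ["pass", "fail", "pass", "fail", "pass"]
    judge_labels = ["pass", "fail", "pass", "pass", "pass"]

    pairs = list(zip(human_labels, judge_labels))
    agree = sum(h == j for h, j in pairs)
    print(f"judge/human agreement: {agree}/{len(pairs)} ({agree / len(pairs):.0%})")

    # Check agreement on failures specifically: a judge that passes
    # everything looks fine overall but misses the cases you care about.
    fails = [(h, j) for h, j in pairs if h == "fail"]
    caught = sum(h == j for h, j in fails)
    print(f"failures caught: {caught}/{len(fails)}")
    ```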
    LINKS
    Hamel's website and blog (https://hamel.dev/)
    Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise (https://vanishinggradients.fireside.fm/51)
    Hamel Husain on Lenny’s podcast, which includes a live demo of error analysis (https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill)
    The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era (https://vanishinggradients.fireside.fm/9)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtube.com/live/QEk-XwrkqhI?feature=share)
    Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME
    🎓 Learn more:
    Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    30 September 2025, 7:30 am
  • 47 minutes 37 seconds
    Episode 59: Patterns and Anti-Patterns For Building with AI
    John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling.
    From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an ā€œAI internā€ rather than an all-knowing oracle.
    We talk through:
    - Why chasing perfect accuracy is a dead end
    - How to use agents without losing control
    - Context engineering: fitting the right information in the window
    - Starting simple instead of over-orchestrating
    - Separating retrieval from generation in RAG (a minimal seam is sketched after this list)
    - Splitting complex extractions into smaller checks
    - Knowing when frameworks help — and when they slow you down
    A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.
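    To make the retrieval/generation split concrete, here is a minimal sketch of the seam; `retrieve` and `generate` are hypothetical stubs standing in for a vector store and a model call.

    ```python
    # Keep retrieval and generation behind separate functions so each
    # can be inspected and debugged on its own. Both stubs are placeholders.

    def retrieve(query: str, k: int = 3) -> list[str]:
        return [f"(stub chunk {i} for {query!r})" for i in range(k)]

    def generate(query: str, chunks: list[str]) -> str:
        return f"(stub answer to {query!r} using {len(chunks)} chunks)"

    def answer(query: str) -> str:
        chunks = retrieve(query)
        # Debugging seam: eyeball or log the chunks *before* generation,
        # so retrieval misses aren't misdiagnosed as model failures.
        for i, chunk in enumerate(chunks):
            print(f"[retrieved {i}] {chunk}")
        return generate(query, chunks)

    print(answer("how do I rotate my API keys?"))
    ```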
    LINKS:
    Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo (https://maven.com/p/4485aa/context-engineering-for-ai-agents)
    The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo (https://maven.com/p/a8195d/the-hidden-simplicity-of-gen-ai-systems)
    Roaming RAG – RAG without the Vector Database, by John (https://arcturus-labs.com/blog/2024/11/21/roaming-rag--rag-without-the-vector-database/)
    Cut the Chit-Chat with Artifacts, by John (https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/)
    Prompt Engineering for LLMs by John and Albert Ziegler (https://amzn.to/4gChsFf)
    Relevant Search by John and Doug Turnbull (https://amzn.to/3TXmDHk)
    Arcturus Labs (https://arcturus-labs.com/)
    Watch the podcast on YouTube (https://youtu.be/mKTQGKIUq8M)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    šŸŽ“ Learn more:
    Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    23 September 2025, 11:30 pm
  • 1 hour 45 seconds
    Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
    While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy.
    Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
    We talk through:
    Using LLMs as “synthetic consumers” to simulate surveys and test product ideas (a toy version is sketched after this list)
    How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making
    Building closed-loop systems where AI generates and critiques ideas
    Guardrails for multi-agent workflows in marketing mix modeling
    Where generative AI breaks (and how to detect failure modes)
    The balance between useful models and “correct” models
    If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes.
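    As a toy version of the synthetic-consumer idea (not PyMC Labs’ actual pipeline), the sketch below renders a survey prompt per persona and aggregates structured answers; the personas are made up and `ask_llm` is a random stub so the code runs as-is.

    ```python
    import random

    # Render a survey prompt per persona, collect 1-5 purchase-intent
    # scores, and report the mean per persona.

    PERSONAS = [
        "budget-conscious parent of two, shops weekly",
        "urban professional who buys premium brands",
    ]

    PROMPT = (
        "You are a consumer: {persona}.\n"
        "On a scale of 1-5, how likely are you to buy: {concept}?\n"
        "Answer with a single digit."
    )

    def ask_llm(prompt: str) -> str:
        return str(random.randint(1, 5))  # stub: replace with a model call

    def survey(concept: str, n_per_persona: int = 50) -> dict[str, float]:
        results = {}
        for persona in PERSONAS:
            prompt = PROMPT.format(persona=persona, concept=concept)
            scores = [int(ask_llm(prompt)) for _ in range(n_per_persona)]
            results[persona] = sum(scores) / len(scores)  # mean intent
        return results

    print(survey("toothpaste in a recyclable tube"))
    ```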
    LINKS:
    The AI MMM Agent, An AI-Powered Shortcut to Bayesian Marketing Mix Insights (https://www.pymc-labs.com/blog-posts/the-ai-mmm-agent)
    AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs) (https://youtube.com/live/2Auc57lxgeU)
    The Podcast livestream on YouTube (https://youtube.com/live/so4AzEbgSjw?feature=share)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more:
    Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    9 September 2025, 5:45 pm
  • 41 minutes 28 seconds
    Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
    While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.
    Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.
    We talk through:
    - Treating LLM workflows as ETL pipelines for unstructured text
    - Error analysis: why you need humans reviewing the first 50–100 traces
    - Guardrails like retries, validators, and “gleaning” (a retry-plus-validator loop is sketched after this list)
    - How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs
    - Cheap vs. expensive models: when to swap for savings
    - Where agents fit in (and where they don’t)
    If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.
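    A minimal sketch of the retry-plus-validator guardrail pattern mentioned above; the expected fields and the `call_llm` callable are hypothetical, and real pipelines add more structure.

    ```python
    import json

    # Validate each LLM output against the schema this pipeline expects,
    # and re-ask (with feedback) on failure. `call_llm` is a stand-in.

    def validate(raw: str) -> dict | None:
        """Accept only JSON containing the required fields."""
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return None
        return data if {"entity", "date"} <= data.keys() else None

    def extract(document: str, call_llm, max_retries: int = 3) -> dict:
        prompt = f"Extract entity and date as JSON from:\n{document}"
        for _ in range(max_retries):
            parsed = validate(call_llm(prompt))
            if parsed is not None:
                return parsed
            prompt += "\nYour last answer was invalid; return JSON with keys entity and date."
        raise ValueError(f"no valid output after {max_retries} attempts")

    # Toy usage with a "model" that fails once, then complies:
    replies = iter(['not json', '{"entity": "Acme Corp", "date": "2024-07-01"}'])
    print(extract("Acme Corp filed the report on July 1, 2024.", lambda p: next(replies)))
    ```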
    LINKS
    Shreya's website (https://www.sh-reya.com/)
    DocETL, A system for LLM-powered data processing (https://www.docetl.org/)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk)
    Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)
    🎓 Learn more:
    Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    29 August 2025, 11:00 am
  • 45 minutes 41 seconds
    Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters
    While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.
    We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”
    We talk through:
    - Where 270M fits into the Gemma 3 lineup — and why it exists
    - On-device use cases where latency, privacy, and efficiency matter
    - How smaller models open up rapid, targeted fine-tuning
    - Running multiple models in parallel without heavyweight hardware
    - Why “small” models might drive the next big wave of AI adoption
    If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.
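    If you want to kick the tires, here is a minimal sketch of running the model locally with Hugging Face transformers (model id from the links below); the prompt and generation settings are illustrative, and you may need to accept the Gemma license on the Hub first.

    ```python
    # Load Gemma 3 270M from the Hugging Face Hub and generate a few
    # tokens locally; the model is small enough to run comfortably on CPU.

    from transformers import pipeline

    generator = pipeline("text-generation", model="google/gemma-3-270m")
    out = generator(
        "Label the sentiment of this review as positive or negative: "
        "'the battery died in two hours'. Sentiment:",
        max_new_tokens=16,
    )
    print(out[0]["generated_text"])
    ```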
    LINKS
    Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog) (https://developers.googleblog.com/en/introducing-gemma-3-270m/)
    Full Model Fine-Tune Guide using Hugging Face Transformers (https://ai.google.dev/gemma/docs/core/huggingface_text_full_finetune)
    The Gemma 270M model on HuggingFace (https://huggingface.co/google/gemma-3-270m)
    The Gemma 270M model on Ollama (https://ollama.com/library/gemma3:270m)
    Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (https://www.youtube.com/live/-IWstEStqok) (Code here (https://github.com/canyon289/ai_agent_basics))
    From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (https://www.youtube.com/live/FNlM7lSt8Uk)(Code here (https://github.com/canyon289/ai_image_agent))
    Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo (https://lu.ma/ezgny3dl)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    Watch the podcast video on YouTube (https://youtu.be/VZDw6C2A_8E)
    🎓 Learn more:
    Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16)

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    14 August 2025, 4:00 pm
  • 38 minutes 9 seconds
    Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy
    Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic; imperfect pass rates are a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades.
    You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!).
    We talk through:
    • The three personas — and the blind spots each has when shipping AI systems
    • Why ā€œperfectā€ tests can be a sign you’re testing the wrong thing
    • Development vs. production observability loops — and why you need both
    • How curiosity about failing data separates good builders from great ones
    • Ways large organizations can create space for experimentation without losing delivery focus
    If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you.
    LINKS
    Eric's website (https://ericmjl.github.io/)
    More about the workshops Eric and Hugo taught at SciPy (https://hugobowne.substack.com/p/stress-testing-llms-evaluation-frameworks)
    Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
    🎓 Learn more:
    Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — https://maven.com/s/course/d56067f338 ($600 off early bird discount for November cohort available until August 16)

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit hugobowne.substack.com
    12 August 2025, 3:00 pm