Today, we're joined by Chip Huyen, independent researcher and writer, to discuss her new book, "AI Engineering." We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines AI agents, their current limitations and capabilities, and the critical role of effective planning and tool utilization in these systems. Additionally, Chip shares insights on the importance of evaluation in AI systems, highlighting the need for systematic processes, human oversight, and rigorous metrics and benchmarks. Finally, we touch on the impact of open-source models, the potential of synthetic data, and Chip's predictions for the year ahead.
The complete show notes for this episode can be found at https://twimlai.com/go/715.
Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One, to discuss the evolution of the company's approach and insights on generative AI and platform best practices. In this episode, we dig into the company's platform-centric approach to AI, and how they've been evolving their existing MLOps and data platforms to support the new challenges and opportunities presented by generative AI workloads and AI agents. We explore their use of cloud-based infrastructure (in this case, on AWS) to provide a foundation upon which they then layer open-source and proprietary services and tools. We cover their use of Llama 3 and open-weight models, their approach to fine-tuning, their observability tooling for GenAI applications, their use of inference optimization techniques like quantization, and more. Finally, Abhijit shares his perspective on the future of agentic workflows in the enterprise, the application of OpenAI o1-style reasoning in models, and the new roles and skill sets required in the evolving GenAI landscape.
The complete show notes for this episode can be found at https://twimlai.com/go/714.
Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI, to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other similar uses of LLMs, explore various use cases for them, and discuss ways to create smarter agentic systems. Dan shares his "big brain, little brain, tool brain" approach to tackling real-world challenges in agents, the trade-offs in leveraging general-purpose vs. task-specific models, and his take on LLM reasoning. We also cover the way he thinks about model selection for agents, along with the need for new tools and platforms for deploying them. Finally, Dan emphasizes the importance of open source in advancing AI, shares the new products they're working on, and explores future directions in the agentic era.
The complete show notes for this episode can be found at https://twimlai.com/go/713.
Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS, to dig into the underlying technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore recent advancements in the field of automated reasoning, as well as some of the ways it is applied both broadly and across AWS, where it is used to enhance security, cryptography, virtualization, and more. We discuss how the new feature helps users generate, refine, validate, and formalize policies, and how those policies can be deployed alongside LLM applications to ensure the accuracy of generated text. Finally, Byron shares the benchmarks they've applied, the use of techniques like "constrained coding" and "backtracking," and the future co-evolution of automated reasoning and generative AI.
The complete show notes for this episode can be found at https://twimlai.com/go/712.
Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research, to discuss the papers and workshops Qualcomm will be presenting at this year's NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that ties conformal prediction to information theory, yielding a novel approach to incorporating uncertainty quantification directly into machine learning models. Finally, we review several papers enabling the efficient use of LoRA (Low-Rank Adaptation) on mobile devices (Hollowed Net, ShiRA, FouRA). Arash also previews the demos Qualcomm will be hosting at NeurIPS, including new video editing diffusion and 3D content generation models running on-device, Qualcomm's AI Hub, and more!
The complete show notes for this episode can be found at https://twimlai.com/go/711.
Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks, to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper's customers, including diagnosing cable degradation, proactive monitoring for coverage gaps, and real-time fault detection. We also dig into the complexities of integrating data science into networking, the trade-offs between traditional methods and ML-based solutions, the role of feature engineering and data in networking, the applicability of large language models, and Juniper's approach to using smaller, specialized ML models to optimize speed, latency, and cost. Finally, Shirley shares some future directions for Juniper Mist, such as proactive network testing and end-user self-service.
The complete show notes for this episode can be found at https://twimlai.com/go/710.
Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library, to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG systems, the different signs Jason looks for to identify looming problems, the issues he most commonly encounters, and the steps he takes to diagnose these issues. We also cover the significance of building out robust test datasets, data-driven experimentation, evaluation tools, and metrics for different use cases. We touch on fine-tuning strategies for RAG systems, the effectiveness of different chunking strategies, the use of collaboration tools like Braintrust, and how future models will change the game. Lastly, we cover Jason's interest in teaching others how to capitalize on their own AI experience via his AI consulting course.
The complete show notes for this episode can be found at https://twimlai.com/go/709.
Today, we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip's incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel "CoMELT" observability dataset, which combines traditional MELT data (metrics, events, logs, and traces) with code to efficiently identify root failure causes in complex software systems. We discuss the challenges of integrating time-series data with LLMs and their multi-decoder architecture designed for this purpose. Sunil describes their system's agent-based design, focusing on clear roles and boundaries to ensure reliability. We examine their "chaos gym," a reinforcement learning environment used for testing and improving the system's robustness. Finally, we discuss the practical considerations of deploying such a system at scale in diverse environments and much more.
The complete show notes for this episode can be found at https://twimlai.com/go/708.
Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram, to discuss voice AI agents. We explore the importance of perception, understanding, and interaction, and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs as well as speech-to-text and text-to-speech models in building AI voice agents, and dig into the benefits and limitations of text-based approaches to voice interactions. We also explore what's required to deliver real-time voice interactions and the promise of closed-loop, continuously improving, federated learning agents. Finally, Scott shares practical applications of AI voice agents at Deepgram and provides an overview of their newly released agent toolkit.
The complete show notes for this episode can be found at https://twimlai.com/go/707.
Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book, "Artificial Intelligence: 10 Things You Should Know." We dig into the attainability of artificial superintelligence and the path to achieving generalized superhuman capabilities across multiple domains. We discuss the importance of open-endedness in developing autonomous and self-improving systems, as well as the role of evolutionary approaches and algorithms. Additionally, we cover Tim's recent research projects such as "Promptbreeder," "Debating with More Persuasive LLMs Leads to More Truthful Answers," and more.
The complete show notes for this episode can be found at https://twimlai.com/go/706.
Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks, to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the "W" adaptation that's been proposed for incorporating ML models. Next, we discuss the complexities of applying deep learning neural networks in safety-critical applications, using the aviation industry as an example, and talk through the importance of factors such as data quality, model stability, robustness, interpretability, and accuracy. We also explore formal verification methods, abstract transformer layers, transformer-based architectures, and the application of various software testing techniques. Lucas also introduces the field of constrained deep learning and convex neural networks, along with their benefits and trade-offs.
The complete show notes for this episode can be found at https://twimlai.com/go/705.