This Week in Machine Learning & AI Podcast - Podcast en iVoox

Trending Misterio

Explorar





Descargar app  Subir

Inicio

Explorar

Escoge un idioma

Español

Ver contenido de

España

Escoge un idioma

Español

English

iVoox Podcast & radio

Descargar app gratis

Descargar app

This Week in Machine Learning & AI Podcast

Podcast

Por Sam Charrington

470

209

 



Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders. Hosted by Sam Charrington, a sought after industry analyst, speaker, commentator and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science and more. 2q22j

Descargar en Play Store Escuchar en App

470

209



Comunidad

Google I/O 2025 Special Edition - #733

Episodio en This Week in Machine Learning & AI Podcast

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and creator of the Pipecat open source project. We cover all the highlights from the event, including enhancements to the Gemini models like thinking budgets and thought summaries, native audio output for expressive voice AI, and the new URL Context tool for research agents. The discussion also digs into the Gemini Live API, covering its architecture, the challenges of building real-time voice applications (such as latency and voice activity detection), and new features like proactive audio and asynchronous function calling. Finally, don’t miss our guests’ wish lists for next year’s I/O! The complete show notes for this episode can be found at https://twimlai.com/go/733.

Internet y tecnología 5 días

 0

 0

 7



26:37

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples of unsafe outputs that can emerge from these systems, different approaches to evaluating these safety risks, and the potential reasons behind this counterintuitive behavior. Shifting to the application of generative AI in financial services, Sebastian outlines a domain-specific safety taxonomy designed for the industry's unique needs. We also explore the critical role of governance and regulatory frameworks in addressing these concerns, the role of prompt engineering in bolstering safety, Bloomberg’s multi-layered mitigation strategies, and vital areas for further work in improving AI safety within specialized domains. The complete show notes for this episode can be found at https://twimlai.com/go/732.

Internet y tecnología 1 semana

 0

 0

 10



57:37

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting, and how it can improve multi-step tool use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies they’ve used, and Bespoke Labs’ open-source libraries like Curator. We also touch on the models MiniCheck for hallucination detection and MiniChart for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731.

Internet y tecnología 2 semanas

 0

 0

 9



01:01:53

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step tasks through reinforcement learning, and how that enables agents to more easily recover from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as with "vibe coding," the integration of tools through the Model Control Protocol (M), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous. The complete show notes for this episode can be found at https://twimlai.com/go/730.

Internet y tecnología 3 semanas

 0

 0

 8



01:07:42

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world tasks of the cybersecurity analyst. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729.

Internet y tecnología 1 mes

 0

 0

 6



55:48

Generative Benchmarking with Kelly Hong - #728

Episodio en This Week in Machine Learning & AI Podcast

In this episode, Kelly Hong, a researcher at Chroma, s us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual behavior. Kelly shares insights from applying this approach to Weights & Biases' technical bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.

Internet y tecnología 1 mes

 0

 0

 9



53:47

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

Episodio en This Week in Machine Learning & AI Podcast

In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately s Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727.

Internet y tecnología 1 mes

 0

 0

 9



01:33:36

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.

Internet y tecnología 1 mes

 0

 0

 6



51:15

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machine learning, including vision-language models and generative AI techniques to improve perception, planning, and simulation for its self-driving vehicles. The conversation explores the evolution of Waymo’s research stack, their custom “Waymo Foundation Model,” and how they’re incorporating multimodal sensor data like lidar, radar, and camera into advanced AI systems. Drago also discusses how Waymo ensures safety at scale with rigorous validation frameworks, predictive world models, and realistic simulation environments. Finally, we touch on the challenges of generalization across cities, freeway driving, end-to-end learning vs. modular architectures, and the future of AV testing through ML-powered simulation. The complete show notes for this episode can be found at https://twimlai.com/go/725.

Internet y tecnología 2 meses

 0

 0

 5



01:08:37

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Julie Kallini, PhD student at Stanford University to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models—including inefficient compression rates for under-resourced languages—and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its performance and efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the creation of impossible language training datasets, and explore the bias of language model architectures towards natural language. The complete show notes for this episode can be found at https://twimlai.com/go/724.

Internet y tecnología 2 meses

 0

 0

 8



50:02

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture which uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” versus “verbalized reasoning”—analogous to non-verbalized and verbalized thinking in humans, and discuss how the model searches in latent space to predict the next token and dynamically allocates more compute based on token difficulty. We also explore how the recurrent depth architecture simplifies LLMs, the parallels to diffusion models, the model's performance on reasoning tasks, the challenges of comparing models with varying compute budgets, and architectural advantages such as zero-shot adaptive exits and natural speculative decoding. The complete show notes for this episode can be found at https://twimlai.com/go/723.

Internet y tecnología 2 meses

 0

 0

 10



58:08

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments—maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design. The complete show notes for this episode can be found at https://twimlai.com/go/722.

Internet y tecnología 2 meses

 0

 0

 9



41:41

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “S1: Simple Test-Time Scaling.” We explore the motivations behind S1, as well as how it compares to OpenAI's O1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as S1’s data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We explore the novel "budget forcing" technique developed in the paper, allowing it to think longer for harder problems and optimize test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of S1 and its future directions. The complete show notes for this episode can be found at https://twimlai.com/go/721.

Internet y tecnología 3 meses

 0

 0

 7



48:59

Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We also dig into the various ways Trainum2 is offered, including Trn2 instances, UltraServers, and UltraClusters, and access through managed services like AWS Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, for Mixture of Experts (MoE) models, and what’s next for Trainium. The complete show notes for this episode can be found at https://twimlai.com/go/720.

Internet y tecnología 3 meses

 0

 0

 6



01:07:29

π0: A Foundation Model for Robotics with Sergey Levine - #719

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team’s new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research. The complete show notes for this episode can be found at https://twimlai.com/go/719.

Internet y tecnología 3 meses

 0

 0

 10



52:00

AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718

Episodio en This Week in Machine Learning & AI Podcast

Today we’re ed by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems–reasoning, acting, communicating, and adapting. We also examine the rise of agentic foundation models, the emergence of interface agents like Claude with Computer Use and OpenAI Operator, the shift from simple task chains to complex workflows, and the growing range of enterprise use cases. Victor shares insights into emerging design patterns for autonomous multi-agent systems, including graph and message-driven architectures, the advantages of the “actor model” pattern as implemented in Microsoft’s AutoGen, and guidance on how s should approach the ”build vs. buy” decision when working with AI agent frameworks. We also address the challenges of evaluating end-to-end agent performance, the complexities of benchmarking agentic systems, and the implications of our reliance on LLMs as judges. Finally, we look ahead to the future of AI agents in 2025 and beyond, discuss emerging HCI challenges, their potential for impact on the workforce, and how they are poised to reshape fields like software engineering. The complete show notes for this episode can be found at https://twimlai.com/go/718.

Internet y tecnología 3 meses

 0

 0

 8



01:44:29

Speculative Decoding and Efficient LLM Inference with Chris Lott - #717

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language model inference. We explore the challenges presented by the LLM encoding and decoding (aka generation) and how these interact with various hardware constraints such as FLOPS, memory footprint and memory bandwidth to limit key inference metrics such as time-to-first-token, tokens per second, and tokens per joule. We then dig into a variety of techniques that can be used to accelerate inference such as KV compression, quantization, pruning, speculative decoding, and leveraging small language models (SLMs). We also discuss future directions for enabling on-device agentic experiences such as parallel generation and software tools like Qualcomm AI Orchestrator. The complete show notes for this episode can be found at https://twimlai.com/go/717.

Internet y tecnología 3 meses

 0

 0

 7



01:16:00

Ensuring Privacy for Any LLM with Patricia Thaine - #716

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Patricia Thaine, co-founder and CEO of Private AI to discuss techniques for ensuring privacy, data minimization, and compliance when using 3rd-party large language models (LLMs) and other AI services. We explore the risks of data leakage from LLMs and embeddings, the complexities of identifying and redacting personal information across various data flows, and the approach Private AI has taken to mitigate these risks. We also dig into the challenges of entity recognition in multimodal systems including OCR files, documents, images, and audio, and the importance of data quality and model accuracy. Additionally, Patricia shares insights on the limitations of data anonymization, the benefits of balancing real-world and synthetic data in model training and development, and the relationship between privacy and bias in AI. Finally, we touch on the evolving landscape of AI regulations like GDPR, RA, and the EU AI Act, and the future of privacy in artificial intelligence. The complete show notes for this episode can be found at https://twimlai.com/go/716.

Internet y tecnología 4 meses

 0

 0

 7



51:03

AI Engineering Pitfalls with Chip Huyen - #715

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Chip Huyen, independent researcher and writer to discuss her new book, “AI Engineering.” We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines AI agents, their current limitations and capabilities, and the critical role of effective planning and tool utilization in these systems. Additionally, Chip shares insights on the importance of evaluation in AI systems, highlighting the need for systematic processes, human oversight, and rigorous metrics and benchmarks. Finally, we touch on the impact of open-source models, the potential of synthetic data, and Chip’s predictions for the year ahead. The complete show notes for this episode can be found at https://twimlai.com/go/715.

Internet y tecnología 4 meses

 0

 0

 7



57:07

Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - #714

Episodio en This Week in Machine Learning & AI Podcast

Today, we're ed by Abhijit Bose, head of enterprise AI and ML platforms at Capital One to discuss the evolution of the company’s Generative AI platform. In this episode, we dig into the company’s platform-centric approach to AI, and how they’ve been evolving their existing MLOps and data platforms to the new challenges and opportunities presented by generative AI workloads and AI agents. We explore their use of cloud-based infrastructure—in this case on AWS—to provide a foundation upon which they then layer open-source and proprietary services and tools. We cover their use of Llama 3 and open-weight models, their approach to fine-tuning, their observability tooling for Gen AI applications, their use of inference optimization techniques like quantization, and more. Finally, Abhijit shares the future of agentic workflows in the enterprise, the application of OpenAI o1-style reasoning in models, and the new roles and skillsets required in the evolving GenAI landscape. The complete show notes for this episode can be found at https://twimlai.com/go/714.

Internet y tecnología 4 meses

 0

 0

 9



57:38

También te puede gustar Ver más

TechEmergence Podcast: Innovation at the Intersect The AI in Business Podcast is for non-technical business leaders who need to find AI opportunities, align AI capabilities with strategy, and deliver ROI. Each week, Emerj AI Research CEO Daniel Faggella and team interview top AI executives from Fortune 2000 firms and unicorn startups - uncovering trends, use-cases, and best practices for practical AI adoption. Visit our advertising page to learn more about reaching our executive audience of Fortune 2000 AI adopters: https://emerj.com/ Actualizado

Ubuntu Podcast 3 experienced Open Source professionals as they discuss the impact Linux has in their daily lives. Upbeat family-friendly banter, conversation and discussion for Linux enthusiasts and casual observers of all ages. A new episode every two weeks covering terminal productivity, desktop experience, development, gaming, hosting, hardware, community, cloud-native and all the Linux Matters that matter. Actualizado

Gradient Dissent - Weights & Biases Lukas Biewald on Gradient Dissent, an AI-focused podcast brought to you by Weights & Biases. Dive into fascinating conversations with industry giants from NVIDIA, Meta, Google, Lyft, OpenAI, and more. Explore the cutting-edge of AI and learn the intricacies of bringing models into production. Actualizado

Ir a Internet y tecnología