From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731

14/05/2025

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, explains why RL offers a more robust alternative to prompting, and describes how it can improve multi-step tool use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies the team has used, and Bespoke Labs' open-source libraries like Curator. Finally, we touch on their models MiniCheck, for hallucination detection, and MiniChart, for chart-based QA.
The complete show notes for this episode can be found at https://twimlai.com/go/731.
