Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727

15/04/2025

In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the line leading up to it), perform arithmetic using their own idiosyncratic algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed across the network's MLPs and attention mechanisms. The discussion highlights both the capabilities and the limitations of LLMs, showing how hallucinations arise from separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work.
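To make the core idea concrete, here is a minimal sketch, assuming a PyTorch-style setup, of what "replacing dense components with sparse, interpretable alternatives" can look like: a wide transcoder trained to imitate a dense MLP block, with an L1 penalty pushing feature activations toward sparsity. This is a simplified illustration, not Anthropic's released code; all names and sizes (Transcoder, d_model, d_features, the clamped feature index) are assumptions for the example.

```python
# Minimal sketch (assumed PyTorch-style, not Anthropic's released code) of the
# transcoder idea: train a wide, sparse layer to imitate a dense MLP block so
# that each active unit becomes a candidate interpretable feature.
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    """Sparse stand-in for a dense MLP: a wide ReLU encoder produces
    mostly-zero feature activations; a decoder maps them back."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(feats), feats

d_model, d_features = 512, 8192               # illustrative sizes
mlp = nn.Sequential(                          # stand-in for a frozen dense block
    nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
)
transcoder = Transcoder(d_model, d_features)
opt = torch.optim.Adam(transcoder.parameters(), lr=1e-4)
l1_coeff = 1e-3                               # sparsity pressure on features

for _ in range(100):                          # toy loop on random activations
    x = torch.randn(64, d_model)              # stand-in residual-stream inputs
    with torch.no_grad():
        target = mlp(x)                       # what the dense block computes
    recon, feats = transcoder(x)
    loss = (recon - target).pow(2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Intervention sketch: clamp a single (hypothetical) feature on and decode --
# the kind of causal test behind "manipulating specific neural pathways".
with torch.no_grad():
    _, feats = transcoder(torch.randn(1, d_model))
    feats[:, 123] = 10.0                      # feature index 123 is purely illustrative
    steered = transcoder.decoder(feats)
```

Once a faithful sparse replacement exists, tracing how active features feed into one another yields the interpretable computational graphs the episode discusses, and clamping individual features as above is one way to test whether a given pathway actually drives the model's behavior.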

The complete show notes for this episode can be found at https://twimlai.com/go/727.
