Episode #490

State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Published: January 31, 2026 · Runtime: ~3 hours · Rating: Highly Credible

Guests: Nathan Lambert (Post-Training Lead, AI2) & Sebastian Raschka (Author, "Build an LLM From Scratch")

Quick Take

Two working ML researchers deliver the most technically grounded AI landscape discussion in recent memory. Nathan Lambert leads post-training at the Allen Institute for AI and authored the definitive RLHF book; Sebastian Raschka is the author of "Build a Large Language Model From Scratch" and "Build a Reasoning Model From Scratch." Unlike CEO interviews filled with marketing speak, this is practitioners comparing notes on the actual state of the field—from the DeepSeek moment to why Claude Opus 4.5 hype has become "a meme" to Google's structural TPU advantage over NVIDIA's insane margins.

Key Claims Examined

🇨🇳 "DeepSeek Kicked Off a Movement in China Like ChatGPT Did in the US"

"DeepSeek kicked off a movement within China similar to how ChatGPT kicked off a movement in the US where everything had a chatbot. There are now tons of tech companies in China that are releasing very strong frontier open weight models."

Our Analysis

This is an accurate characterization of the post-DeepSeek R1 landscape. Nathan Lambert names multiple Chinese companies that have emerged or accelerated since DeepSeek's January 2025 release:

  • Z.ai (GLM models) — Has filed IPO paperwork, actively seeking Western mindshare
  • MiniMax — Also pursuing IPO, releasing frontier models
  • Moonshot (Kimi K2 Thinking) — Praised for creative writing and coding capabilities
  • Qwen (Alibaba) — Qwen 3 cited as a standout model for performance

Sebastian adds important nuance: "It's not that DeepSeek got worse, it's just like the other ones are using the ideas from DeepSeek. For example, you mentioned Kimi, same architecture, they're training it."

Verdict: Accurate — documented industry trend

💻 "Claude Code Is Way Better" for Programming

"You can have Claude Code open, you can have Cursor open, you can have VS Code open, and you can select the same models on all of them… and ask questions, and it's very interesting. Claude Code is way better in that domain. It's remarkable."

Our Analysis

This reflects the current practitioner consensus on AI coding tools. The guests offer a nuanced breakdown of preferences:

  • Nathan Lambert: Uses Claude Code for philosophical discussions, code, and agentic tasks. "Claude Code makes it fun to build things, particularly from scratch."
  • Sebastian Raschka: Prefers Codeium plugin for VS Code. "I'm not quite there yet where I'm comfortable with [Claude Code] because maybe I'm a control freak, but I still like to see what's going on."
  • Lex Fridman: Uses "half-and-half Cursor and Claude Code" — finds them "fundamentally different experiences and both useful."

The claim that Claude Code + Claude Opus 4.5 is the best combination for agentic coding is well-supported by current benchmarks and developer sentiment. However, this is a rapidly changing space.

Verdict: Credible — reflects current practitioner consensus

💰 "NVIDIA Chips Margin Is Insane" — Google's TPU Advantage

"The margin on NVIDIA chips is insane and Google can develop everything from top to bottom to fit their stack and not have to pay this margin, and they've had a head start in building data centers."

Our Analysis

This is financially accurate. NVIDIA's gross margins have ranged from 60-75% in recent quarters—extraordinarily high for a hardware company. Google's vertical integration provides structural cost advantages:

  • TPU development: Google designs its own AI accelerators, avoiding NVIDIA's margin premium
  • Data center infrastructure: Decades of investment in custom data center design
  • Software stack: TensorFlow/JAX optimized for their hardware

Nathan's related observation: "Google has the ability to separate research and product a bit better, whereas you hear so much about OpenAI being chaotic operationally." This organizational point matches reporting from The Information and other tech outlets.

Verdict: Accurate — verified by public financial data

🔓 Chinese Open Models Have "Friendlier Licenses"

"The appeal of the open weight models from China is that the licenses are even friendlier. I think they are just unrestricted open source licenses, whereas if we use something like Llama or Gemma, there are some strings attached."

Our Analysis

This is accurate and a meaningful distinction for commercial users:

  • Meta's Llama: Community license requires reporting if you exceed 700M monthly active users
  • Google's Gemma: Custom terms of use with a prohibited-use policy attached, rather than a standard open-source license
  • DeepSeek, Qwen, etc.: Generally Apache 2.0 or MIT licenses with no commercial restrictions

Sebastian explains the commercial appeal: "You can customize them, you can train them, you can add more data post-training, like specialize them into, let's say, law, medical models, whatever you have."

Nathan adds the geopolitical context: "A lot of top US tech companies and other IT companies won't pay for an API subscription to Chinese companies for security concerns... these companies then see open weight models as an ability to influence and take part in a huge growing AI expenditure market in the US."

Verdict: Accurate — verified by license comparison

🧠 "gpt-oss Is First Open Model Trained for Tool Use"

"I think with gpt-oss, what's interesting about it is it's kind of the first open-weight model that was really trained with tool use in mind, which I do think is a bit of a paradigm shift."

Our Analysis

Sebastian highlights OpenAI's open-weight release (gpt-oss-120b) as significant for native tool-calling capabilities. This claim requires context:

  • What's true: gpt-oss was explicitly trained for tool use from the ground up, not fine-tuned afterward
  • Prior art: Other models (like Llama 3) had tool-calling capabilities, but typically via post-training rather than native training
  • The significance: Native tool training can reduce hallucinations by encouraging the model to search rather than fabricate answers

Sebastian elaborates: "If I ask the LLM, 'Who won the soccer World Cup in 1998?' instead of just trying to memorize, it could go do a search... It would get you that information reliably instead of just trying to memorize it."
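The search-instead-of-memorize pattern Sebastian describes can be sketched as a minimal tool-dispatch loop. Everything here is hypothetical (the `search_tool` function, the `fake_model` stub, the message format); real tool-calling APIs differ by provider, but the control flow is the same: the model emits a tool call, the harness executes it, and the result is fed back for a grounded answer.

```python
# Minimal sketch of a tool-use loop: the model either answers directly
# or emits a tool call, which the harness executes and feeds back.
# All names here (search_tool, fake_model) are illustrative, not a real API.

def search_tool(query: str) -> str:
    """Stand-in for a web search; a real system would hit a search API."""
    knowledge = {"soccer world cup 1998": "France won the 1998 FIFA World Cup."}
    return knowledge.get(query.lower(), "no results")

TOOLS = {"search": search_tool}

def fake_model(messages):
    """Stand-in for the LLM. A tool-trained model decides when to call
    a tool rather than answering from parametric memory."""
    last = messages[-1]
    if last["role"] == "user":
        # The model chooses to search instead of relying on memorized facts.
        return {"tool": "search", "args": {"query": "soccer world cup 1998"}}
    # After seeing the tool result, produce a grounded answer.
    return {"answer": f"Based on the search: {last['content']}"}

def run(user_question: str) -> str:
    messages = [{"role": "user", "content": user_question}]
    while True:
        out = fake_model(messages)
        if "tool" in out:
            result = TOOLS[out["tool"]](**out["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return out["answer"]

print(run("Who won the soccer World Cup in 1998?"))
```

The key design point is that the loop, not the model, executes the tool: the model only proposes calls, which is what makes the final answer verifiable against the tool output rather than the model's memory.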

Verdict: Largely accurate — some prior tool-calling models existed, but distinction is meaningful

🏗️ Transformer Architecture Is "Surprisingly Similar" Across Models

"You can still start with GPT-2, and you can add things to that model to make it into this other model. So it's all still kind of like the same lineage... I put them all together in an article once where I just compared them; they are surprisingly similar."

Our Analysis

As the author of "Build an LLM From Scratch," Sebastian speaks with authority here. His key technical insights:

  • Core architecture unchanged: Still decoder-only transformers derived from "Attention Is All You Need" (2017)
  • Key innovations are tweaks: Mixture of Experts, multi-head latent attention, group query attention, sliding window attention
  • Efficiency focus: "Most of them focused on the attention mechanism... different tweaks to make inference or KV cache size more economical"
  • Emerging trend: Linear attention mechanisms like Qwen3-Next's "Gated DeltaNet," inspired by state space models

This matches the consensus in the ML research community. The transformer is now 8+ years old, and while there have been significant efficiency improvements, the core architecture remains remarkably stable.
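The KV-cache economics behind tweaks like grouped-query attention are easy to quantify. The sketch below compares per-token cache size for standard multi-head attention versus GQA; the dimensions (32 layers, 32 heads, head size 128) are illustrative 7B-class assumptions for the example, not measurements from any model the guests named.

```python
# Per-token KV-cache size: each layer stores a key and a value vector per
# KV head. Grouped-query attention (GQA) shrinks the cache by sharing each
# KV head across several query heads. Dimensions below are illustrative
# (roughly 7B-class), not taken from any specific model.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Factor of 2 covers keys and values; fp16 => 2 bytes per element.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

mha = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)  # full MHA
gqa = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)   # 8 KV groups

print(f"MHA: {mha} bytes/token, GQA: {gqa} bytes/token, reduction: {mha // gqa}x")
```

With these numbers, sharing KV heads 4-to-1 cuts the cache from 512 KiB to 128 KiB per token, which is exactly the kind of inference-cost tweak Sebastian describes layered on top of an otherwise GPT-2-like architecture.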

Verdict: Accurate — reflects research consensus

What Should We Believe?

This episode stands out for its practitioner credibility. Both guests work directly with these models daily:

  1. Trust the technical claims: When Nathan and Sebastian discuss architecture (MoE, attention mechanisms, KV cache), they're speaking from hands-on experience. Sebastian literally wrote the book on building LLMs from scratch.
  2. The model comparison insights are valuable: Their breakdown of when to use ChatGPT vs Claude vs Gemini vs Grok reflects real usage patterns, not marketing hype. The admission that they rarely use Chinese models despite praising them is notably honest.
  3. Take predictions with appropriate uncertainty: Nathan's prediction that "Gemini will continue to make progress on ChatGPT" and that "there will be more open model builders throughout 2026" are informed speculation, not guarantees.
  4. The business model insights are sharp: The discussion of why Chinese companies release open models (influence, distribution, circumventing API security concerns) is sophisticated industry analysis.
  5. Sam Altman's GPU constraints quote is revealing: "We're releasing this because we can use your GPUs. We don't have to use our GPUs and OpenAI can still get distribution out of this" — this candid admission about OpenAI's resource constraints is notable.

The Bottom Line

This is one of the most technically substantive AI conversations in recent memory. Unlike interviews with CEOs who have products to sell, Nathan Lambert and Sebastian Raschka are researchers and educators whose incentives align with accuracy. They freely admit uncertainty ("I don't think there will be a clear winner"), acknowledge their own biases ("We use OpenAI GPT-5 Pro consistently... these models from the US are better"), and provide technical depth without losing accessibility.

Key takeaways: The AI landscape is more competitive than ever, with Chinese open models forcing US companies to respond. Claude Opus 4.5 and Claude Code are currently winning the developer mindshare war. Google's TPU advantage and organizational stability may matter more than model benchmarks. And the transformer architecture, for all its innovations, remains fundamentally GPT-2 with better attention mechanisms and smarter training.

Listen if you want: An honest practitioner's view of the AI landscape without the hype, specific tool recommendations from people who actually use them, and accessible explanations of technical concepts like MoE and attention mechanisms.

Guest Credentials

Nathan Lambert

Post-Training Lead at the Allen Institute for AI (AI2), where he leads work on OLMo and open language models. Author of the definitive book on Reinforcement Learning from Human Feedback (RLHF). Active on X @natolambert and his Substack "Interconnects."

Sebastian Raschka

Machine learning researcher and author of "Build a Large Language Model From Scratch" and "Build a Reasoning Model From Scratch." Previously a professor at UW-Madison. Known for educational content making complex ML concepts accessible. Active on X @rasbt.