Anthropic Uses Pokémon Red to Benchmark New AI Model



March 4, 2025


Anthropic has used the classic Game Boy game Pokémon Red to test its latest AI model, Claude 3.7 Sonnet. Its predecessor, Claude 3.0 Sonnet, struggled to leave the starting area, whereas the updated model successfully battled three gym leaders. Equipped with basic memory, screen pixel input, and function calls, Claude 3.7 Sonnet used “extended thinking” to perform 35,000 actions and reach significant milestones. The company said that within hours the model defeated Brock and later beat Misty, which it presented as evidence of improved problem-solving. Pokémon Red joins a range of games now used to assess AI performance.

