Anthropic Uses Pokémon Red to Benchmark New AI Model

March 4, 2025

Anthropic has used the classic Game Boy game Pokémon Red to test its latest AI model, Claude 3.7 Sonnet. Whereas the earlier Claude 3.0 Sonnet struggled to leave the starting area, the updated model defeated three gym leaders, a marked improvement. Equipped with basic memory, screen pixel input, and function calls, Claude 3.7 Sonnet used “extended thinking” to perform 35,000 actions and reach those milestones. The company said that the model defeated Brock within hours and later beat Misty. Pokémon Red joins a range of games now used to assess AI performance.
