Anthropic announced two new models, Claude 4 Opus and Claude Sonnet 4, during its first developer conference in San Francisco on Thursday. The pair will be immediately available to paying Claude subscribers.
The new models, which jump in name from 3.7 straight to 4, have a number of strengths, including their ability to reason, plan, and remember the context of conversations over long periods, according to the company. Claude 4 Opus is also even better at playing Pokémon than its predecessor.
“It was able to work agentically on Pokémon for 24 hours,” said Anthropic chief product officer Mike Krieger in an interview with Wired. Previously, the longest the model could play was just 45 minutes, a company spokesperson added.
A few months ago, Anthropic launched a Twitch stream called “Claude Plays Pokémon,” which showcases Claude 3.7 Sonnet playing Pokémon Red live. The demo is intended to show how Claude is able to analyze the game and make decisions step by step, with minimal direction.
The lead behind the Pokémon research is David Hershey, a member of Anthropic’s technical staff. In an interview with Wired, Hershey says he chose Pokémon Red because it is “a simple playground,” meaning that the game is turn-based and does not require real-time reactions, which Anthropic’s current models struggle with. It was also the first video game he ever played, on the original Game Boy, after getting it for Christmas in 1997. “It has a pretty special place in my heart,” said Hershey.
Hershey’s overarching goal with this research was to study how Claude could be used as an agent, working independently to perform complex tasks on behalf of a user. Although it is unclear how much prior knowledge Claude has about Pokémon from its training data, its system prompt is minimal by design: You are Claude, you are playing Pokémon, here are the tools you have, and you can press buttons on the screen.
“Over time, I have gone through and deleted all the Pokémon-specific things that I can, just because I think it is really interesting to see how much the model can figure out on its own,” says Hershey, adding that he hopes to build a game that Claude has never seen before in order to truly test its limits.
When Claude 3.7 Sonnet played the game, it ran into certain challenges: It spent “dozens of hours” stuck in one city and had trouble identifying non-player characters, which drastically slowed its progress in the game. With Claude 4 Opus, Hershey noticed an improvement in Claude’s long-term memory and planning capabilities: Looking ahead, the model spent time improving its skills before continuing to play. That kind of planning, without immediate feedback, shows a new level of coherence, which means that the model has a better ability to stay on track.
“This is one of my favorite ways to get to know a model. Like, this is how I understand what its strengths are, what its weaknesses are,” explains Hershey. “This is my way of simply coming to understand this new model that we are about to put out, and how to work with it.”
Everyone wants an agent
Anthropic’s Pokémon research is a new approach to a preexisting problem: How do we understand what decisions an AI is making when approaching complex tasks, and nudge it in the right direction?
The answer to this question is integral to advancing the industry’s much-hyped AI agents, meaning AI that can tackle complex tasks with relative independence. In Pokémon, it is important that the model does not lose context or forget the task at hand. This also applies to AI agents asked to automate a workflow, even one that takes hundreds of hours.