In wargames, Anthropic’s Claude chatbot is a “calculating hawk,” OpenAI’s GPT-5.2 is “Jekyll and Hyde,” and Google’s Gemini is a “madman” — and the games usually involve nuclear blasts.
Researchers tested the chatbots in simulated international crises; all demonstrated self-awareness, an ability to model opponents’ thinking, and a grasp of game theory. They also tended toward escalation. The findings can’t be extrapolated to the real world — the scenarios were extreme, with regimes often facing first strikes or annihilation — but they revealed the AIs’ skill at strategic reasoning, as well as a certain bloodthirstiness. Perhaps it would have made a difference if the researchers had used the top-tier versions of each AI, as one would hope any national defense ministry would do.


