OpenAI's GPT-5.5 Matches Anthropic's Mythos Preview in Cybersecurity Tests

OpenAI's GPT-5.5, launched publicly in late April, performed on par with Anthropic's Mythos Preview in independent cybersecurity tests conducted by the UK's AI Security Institute (AISI) ^[1].

AISI evaluated the models on 95 Capture the Flag cybersecurity challenges covering reverse engineering, web exploitation, and cryptography. These tasks tested each model's ability to solve problems requiring deep technical knowledge and practical skill ^[1].

On the most difficult, Expert-level challenges, GPT-5.5 passed 71.4% on average. That is slightly higher than, but statistically comparable to, Mythos Preview's 68.6% pass rate on the same tasks ^[1].

GPT-5.5 completed a challenging reverse engineering task by building a disassembler for a Rust binary in 10 minutes and 22 seconds, costing $1.73 in API calls and requiring no human assistance ^[1]. This highlighted the model's efficiency and autonomy on complex technical problems.

In AISI's "The Last Ones" simulation, a 32-step attack to extract data from a corporate network, GPT-5.5 succeeded in 3 of 10 attempts. Mythos Preview succeeded in 2 of 10, so GPT-5.5 came out slightly ahead. No previous AI model had succeeded in this simulation ^[1].
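To see how much weight a gap of 3 of 10 versus 2 of 10 can bear, the two counts can be put through a two-sided Fisher exact test. The sketch below is an illustration only and is not AISI's methodology. The function name `fisher_exact_two_sided` is hypothetical, and treating each attempt as an independent trial is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a_success, a_total, b_success, b_total):
    """Two-sided Fisher exact p-value for two success counts.

    Assumes every attempt is an independent trial (an assumption made for
    this illustration, not something the report states).
    """
    total = a_total + b_total
    successes = a_success + b_success

    def weight(k):
        # Number of ways group A ends up with exactly k of the successes.
        return comb(successes, k) * comb(total - successes, a_total - k)

    lowest = max(0, a_total - (total - successes))
    highest = min(a_total, successes)
    observed = weight(a_success)

    # Sum every outcome that is no more likely than the observed one.
    # Integer weights avoid floating-point ties.
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(total, a_total)


# Success counts reported for "The Last Ones": GPT-5.5 3/10, Mythos Preview 2/10.
p_value = fisher_exact_two_sided(3, 10, 2, 10)
print(f"p = {p_value:.2f}")  # prints "p = 1.00"
```

Under those assumptions the p-value is 1.00, so with ten attempts per model a one-success difference is indistinguishable from chance.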

Both models failed the "Cooling Tower" simulation, which tests whether an AI can disrupt power plant control software. Prior models had also failed this scenario ^[1].

AISI first evaluated Mythos Preview in April. OpenAI released GPT-5.5 between about April 28 and May 1, and AISI tested it immediately afterward ^[1].

Further cybersecurity benchmark testing of these AI models is expected as companies seek to refine defensive and offensive AI tools for complex security tasks ^[1].

Sources