US government expands frontier AI testing to five major labs
The US government expanded its AI safety evaluation program to include models from Google DeepMind, Microsoft, and xAI alongside OpenAI and Anthropic. Government scientists test unreleased systems for cybersecurity, biosecurity, and infrastructure vulnerabilities.
The US government has expanded its frontier AI safety evaluation program to include models from Google DeepMind, Microsoft, and xAI, which join existing partners OpenAI and Anthropic. Government scientists conduct pre-deployment testing on unreleased systems, assessing risks in cybersecurity, biosecurity, infrastructure, and misuse scenarios.
Pre-deployment regulatory baseline
Formal government testing of frontier models before public release establishes a new regulatory baseline. Unlike post-hoc audits, which examine systems after release, pre-deployment testing acts as a gate on deployment. Because the systems under test have not yet been released to the public, government scientists get early access to frontier capabilities and the risks they may carry.
Industry and enterprise implications
Pre-deployment testing may slow release timelines, but it gives organizations deploying frontier models greater confidence that government scientists have evaluated those systems for critical vulnerabilities. The expansion to five labs signals that government testing is becoming a standard requirement rather than a voluntary partnership, setting a precedent for future frontier model releases.