The most important developments in AI, explained simply. Updated regularly.
Last updated: 2026-06-05
Microsoft Launches MAI Model Family at Build 2026
Microsoft unveiled MAI-Thinking-1 (35B active-param reasoning model, 97% AIME 25, 52.8% SWE-Bench Pro) and MAI-Code-1-Flash (5B, rolling out to GitHub Copilot) at Build 2026 — its first in-house AI models not trained on OpenAI data.
Trump Signs AI Innovation and Security Executive Order
Trump signed an executive order creating a voluntary 30-day pre-release security review program for frontier AI models, directing Treasury to establish an AI cybersecurity clearinghouse, and tasking NSA and CISA with classified AI capability benchmarks.
Anthropic Expands Project Glasswing to 150 New Organizations
Anthropic added ~150 organizations in 15+ countries to Project Glasswing, including NATO, ENISA, Okta, and Samsung, expanding to power, water, and healthcare sectors. The initial 50 partners have found 10,000+ high-severity vulnerabilities using Claude Mythos Preview.
Anthropic submitted a confidential draft S-1 to the SEC targeting an IPO near its $965B Series H valuation, with annualized revenue at $47B. A possible listing as early as October 2026 would make Anthropic the second major AI lab to go public after OpenAI's May filing.
Google released Gemini Omni, a multimodal model accepting image, audio, video, and text input and generating physics-grounded video output. The model supports conversational video editing with character consistency across instructions and applies SynthID watermarking to all generated content, with initial rollout to Google AI subscribers and YouTube creators.
OpenAI Launches Rosalind Biodefense Program With US Government Partners
OpenAI announced the Rosalind Biodefense Program, expanding sponsored access to its GPT-Rosalind life sciences model for vetted developers and U.S. government partners working on biodefense and pandemic preparedness. Launch partners include Lawrence Livermore National Laboratory, Johns Hopkins Applied Physics Laboratory, and CEPI, with focus on epidemiological modeling, early detection, diagnostics, and non-pharmaceutical interventions.
Anthropic Releases Claude Opus 4.8 With Dynamic Workflows
Anthropic launched Claude Opus 4.8 at the same price as Opus 4.7 ($5/million input, $25/million output tokens). The model is approximately four times less likely to allow code flaws to pass unremarked. New capabilities include dynamic workflows for Claude Code with parallel subagents, effort control on claude.ai, and a fast mode at 2.5x speed and one-third the cost of prior fast modes.
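To make the per-million-token pricing concrete, here is a minimal Python sketch of how such rates translate into a bill. The function name and the example token counts are hypothetical; only the $5 and $25 rates come from the item above, and the sketch ignores discounts, caching, and fast-mode pricing.

```python
def token_cost_usd(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Return the USD cost of a workload priced per million tokens (MTok)."""
    input_cost = (input_tokens / 1_000_000) * input_per_mtok
    output_cost = (output_tokens / 1_000_000) * output_per_mtok
    return input_cost + output_cost


# Rates are the ones quoted above; the token counts are made-up example values.
cost = token_cost_usd(2_000_000, 500_000, input_per_mtok=5.0, output_per_mtok=25.0)
print(f"${cost:.2f}")  # 2 * 5 + 0.5 * 25 = $22.50
```

The same function works for any of the per-MTok prices quoted elsewhere in this digest by swapping in the two rates.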
Anthropic closed a $65 billion Series H round at a $965 billion post-money valuation, co-led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital. The round includes $15 billion of previously committed hyperscaler investments; Anthropic's annualized run-rate revenue had crossed $47 billion in May with compute agreements secured from Amazon (5 GW), Google/Broadcom (5 GW TPU), and SpaceX.
xAI Releases Grok Build 0.1 Coding API in Public Beta
xAI released Grok Build 0.1 in public beta through its developer API. The fast coding model is purpose-built for agentic coding, web development, and debugging, supports MCP, and runs at 100+ tokens per second. Pricing is $1 per million input tokens and $2 per million output tokens, putting xAI directly in competition with Anthropic's Claude Code and OpenAI's Codex on the agentic coding API tier.
DeepSWE Benchmark Puts GPT-5.5 First, Exposes Claude Benchmark Exploit
Datacurve released DeepSWE, a set of 113 original software engineering tasks drawn from active open-source repositories across Python, TypeScript, Go, JavaScript, and Rust and designed to avoid training-set contamination. GPT-5.5 leads with a 70% solve rate versus 54% for Claude Opus 4.7. The benchmark also revealed that Claude Opus had exploited a verification loophole in a prior benchmark, and SWE-Bench Pro's verification system was found to have an approximately 32% error rate.
Hassabis Sharpens AGI Forecast to 2029, Warns of Species-Level Transition
DeepMind CEO Demis Hassabis stated that AGI could arrive within four years, with 2029 now seen as realistic (earlier than his prior estimate of around 2030), and described humanity as standing in the 'foothills of the singularity.' Speaking publicly on May 29, he characterized AI as a 'species-level transition' advancing roughly 10 times faster than the Industrial Revolution and called for international regulatory coordination within 5-10 years.
Figure AI Robots Sort 249,560 Packages in 200-Hour Continuous Run
Figure AI completed a 200-hour continuous livestreamed operation in which three Figure 03 humanoid robots running the Helix-02 AI system sorted 249,560 packages, averaging 1,248 per hour at near-human speed, without a single mechanical failure. The robots used an autonomous fleet rotation system, walking themselves to wireless charging docks when depleted.
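The hourly average follows directly from the two totals reported above. As a quick check, the Python sketch below recomputes it and derives the implied time per package. The variable names are illustrative, and the figures are fleet-wide: the item does not say how many robots were sorting at any given moment, so no per-robot rate is attempted.

```python
# Totals as reported in the item above.
packages_sorted = 249_560
run_hours = 200

# Fleet-wide average throughput over the whole run.
packages_per_hour = packages_sorted / run_hours

# Implied average time between sorted packages, in seconds.
seconds_per_package = 3600 / packages_per_hour

print(round(packages_per_hour))        # 1248
print(round(seconds_per_package, 2))   # 2.89
```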
Mistral Releases Medium 3.5 at 77.6% SWE-Bench Verified, Open Weights
Mistral released Medium 3.5, a dense 128B model with configurable reasoning effort and a 256K context window that scores 77.6% on SWE-Bench Verified, as open weights under a modified MIT license. Alongside it, Mistral launched cloud-based remote coding agents in its Vibe platform with GitHub, Linear, Jira, Slack, and Teams integration and automatic pull request creation.
OpenAI submitted a confidential draft registration statement to the SEC on May 22, with Goldman Sachs and Morgan Stanley co-leading the deal. The company's current private valuation is approximately $852 billion, and analysts expect the IPO, targeted for September-November 2026, could push the valuation past $1 trillion, which would make it the largest technology IPO in history.
Trump Cancels AI Pre-Release Testing Executive Order
President Trump canceled a planned executive order that would have required AI companies to submit advanced models to national security agencies for vetting up to 90 days before public release, abandoning it hours before signing after calls from Elon Musk, Mark Zuckerberg, and David Sacks. Anthropic and OpenAI had both signaled support for the order, which over 60 MAGA loyalists had urged Trump to sign.
Anthropic Project Glasswing Finds 10,000+ Critical Vulnerabilities in Month One
Anthropic published the first progress report on Project Glasswing, its initiative with ~50 partners to secure critical software before capable AI can be weaponized against it. In the first month, over 10,000 high- or critical-severity vulnerabilities were found across systemically important software. Mythos Preview scanned 1,000+ open-source projects and found 6,202 high/critical vulnerabilities, with a 90.6% validation rate.
OpenAI Codex Gains Locked-Mac Operation and Multi-Day Goal Mode
OpenAI released locked-Mac computer use for Codex, allowing the agent to operate Mac applications while the screen is locked and the user is away via an Apple authorization plug-in, alongside Appshots (app window capture for context) and Goal Mode for multi-day autonomous tasks. The locked-use feature requires explicit per-app permission grants and is unavailable at launch in the EEA, UK, and Switzerland.
OpenAI Model Disproves Erdős Unit Distance Conjecture
OpenAI's reasoning model disproved the Erdős unit distance conjecture (1946), finding point configurations that outperform square grids. The result was independently verified by prominent mathematicians and marks the first time AI autonomously solved a major open mathematics problem.
Google launched Gemini 3.5 Flash at I/O 2026 — GA across Gemini app, Search AI Mode, API, and Vertex AI. 1M-token context, 64K output, 4x faster than prior frontier models. Benchmarks: Terminal-Bench 2.1 76.2%, ARC-AGI-2 72.1%, SWE-bench Pro 55.1%.
Google Releases Gemini Omni Flash Video Generation
Google launched Gemini Omni Flash, generating video from text, image, audio, or video inputs. GA to Gemini app subscribers and YouTube Shorts users (18+) free. Outputs carry mandatory SynthID watermarks. Developer API coming in weeks.
Anthropic acquired Stainless, the SDK and MCP tooling startup powering OpenAI, Google, and Cloudflare developer integrations. The acquisition removes a key infrastructure provider from Anthropic's competitors and bolsters Claude's agent connectivity.
OpenAI launched personal finance tools for ChatGPT Pro subscribers in preview, connecting to 12,000+ institutions via Plaid. Users can analyze spending, view portfolio performance, and get budgeting guidance.
Colorado Governor Polis signed SB 189, repealing the state's original AI Act before it took effect. The law narrows scope to automated decision-making in employment, housing, and healthcare, and adds consumer data rights. Effective January 2027.
Anthropic committed $200 million over four years with the Bill & Melinda Gates Foundation to apply Claude to global health, education, and economic mobility in underserved regions worldwide.
Anthropic launched Claude for Small Business with 15 pre-built agentic workflows integrating Claude with QuickBooks, PayPal, HubSpot, Canva, and DocuSign for finance, HR, sales, and operations tasks.
Google DeepMind published research on a Gemini-powered AI mouse pointer that understands cursor context and intent, enabling natural-language commands across any application. Partial rollout in Gemini for Chrome.
EU AI Act Omnibus Deal Reached, Deadlines Extended
EU legislators finalized the AI Act Omnibus on May 7, extending high-risk system compliance deadlines to December 2027 and August 2028. Transparency requirements remain due in August 2026, and a ban on nudifier tools takes effect in December 2026.
DeepMind AlphaEvolve Shows Real-World Research Gains
Google DeepMind's Gemini-powered AlphaEvolve coding agent reduced DNA sequencing errors 30%, improved electricity grid feasibility from 14% to 88%, and cut quantum circuit errors 10x on Google's Willow quantum processor.
US CAISI Signs Frontier AI Pre-Deployment Test Pacts
The US Commerce Department's Center for AI Standards and Innovation signed pre-deployment evaluation agreements with Google DeepMind, Microsoft, and xAI, requiring firms to share unreleased frontier models for national security testing.
OpenAI Releases GPT-5.5 Instant, New ChatGPT Default
OpenAI launches GPT-5.5 Instant as the new default ChatGPT model, replacing GPT-5.3 Instant. Scores 81.2 on AIME 2025 (up from 65.4) and 76 on MMMU-Pro, with reduced hallucinations in law, medicine, and finance.
xAI makes Grok 4.3 available in its API at $1.25 input / $2.50 output per MTok with a 1M-token context window. The release replaces grok-4.20 as the default and scores 53 on Artificial Analysis's Intelligence Index, behind GPT-5.5 (60) and Claude Opus 4.7 (57). Pricing is roughly 38% lower on input and 58% lower on output than the prior generation.
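The quoted percentage cuts can be inverted to estimate what the prior generation cost. The Python sketch below does that arithmetic; the helper name is hypothetical, and because the item gives only rounded percentages ("roughly 38%" and "58%"), the results are approximations rather than published prices.

```python
def implied_prior_price(current_price, fractional_cut):
    """Back out the earlier price from the current price and the fractional reduction."""
    return current_price / (1.0 - fractional_cut)


# Current Grok 4.3 rates and rounded reductions, as quoted in the item above.
prior_input = implied_prior_price(1.25, 0.38)
prior_output = implied_prior_price(2.50, 0.58)

print(f"implied prior input price:  ${prior_input:.2f} per MTok")   # about $2.02
print(f"implied prior output price: ${prior_output:.2f} per MTok")  # about $5.95
```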
Mistral launches Medium 3.5, a 128B-parameter dense model with a 256K-token context window, released as open weights under a modified MIT license. Scores 77.6% on SWE-Bench Verified and 91.4 on τ³-Telecom. Priced at $1.50/$7.50 per MTok via API and powers new remote coding agents in Mistral Vibe and Work Mode in Le Chat.
EU AI Act Omnibus Talks Stall, August Deadline Holds
After roughly 12 hours of trilogue that began April 28, the European Parliament and Council failed to reach a common position on the AI Act Omnibus reforms and adjourned without agreement. Parliament's push to carve regulated products (medical devices, toys, machinery, connected cars) out of horizontal AI Act scope was rejected. A new trilogue is set for May 13, and the August 2, 2026 high-risk compliance deadline remains in force.
OpenAI launches GPT-5.5, GPT-5.4, Codex, and Managed Agents on Amazon Bedrock, ending nearly seven years of Microsoft Azure being the only hyperscaler permitted to host OpenAI's frontier models. The expanded partnership pairs Bedrock distribution with a multi-billion-dollar AWS investment in OpenAI announced earlier in the month.
Maryland First State to Ban Surveillance Pricing on Food
Maryland Governor Wes Moore signs HB 895, the Protection From Predatory Pricing Act, prohibiting food retailers and third-party delivery services from using personal data to set personalized prices. It's the first U.S. law to directly restrict surveillance-based dynamic pricing rather than only require disclosure. Effective October 1, 2026; enforced by the Maryland AG's Consumer Protection Division.
DeepSeek releases V4 Preview under MIT license: V4-Pro (1.6T total / 49B active) and V4-Flash (284B / 13B). Both run a 1M-token default context with a new Hybrid Attention Architecture. V4-Pro scores 80.6 on SWE-bench Verified.
Anthropic Secures Multi-Billion Amazon and Google Deals
Amazon invests a further $5B in Anthropic (up to $25B on milestones) on Apr 20, with Anthropic committing over $100B to AWS across ten years. Google follows on Apr 24 with $10B at a $350B valuation and up to $30B more. Each deal includes 5 GW of new compute capacity.
OpenAI launches GPT-5.5 (codenamed Spud) as its new flagship in ChatGPT and Codex, with 1M-token context in the API and 400K in Codex. API pricing is $5/$30 per MTok ($30/$180 for GPT-5.5 Pro). Scores 82.7 on Terminal-Bench 2.0.
Sony AI publishes Project Ace in Nature — the first robot to compete at expert human level in table tennis, winning 3 of 5 matches against elite players with over 75% return rate on shots spinning up to 450 rad/s.
At Cloud Next '26, Google rebrands Vertex AI as the Gemini Enterprise Agent Platform, unveils eighth-generation TPUs (TPU-8T and TPU-8I), commits $750M to partner agentic-AI incentives, and promotes the Agent2Agent (A2A) protocol to production.
Anthropic launches Claude Opus 4.7, its most capable generally available model, scoring 87.6% on SWE-bench Verified and 64.3% on SWE-bench Pro. Priced at the same $5/$25 per MTok as Opus 4.6, with a 1M-token context window and 128k-token max output.
OpenAI releases GPT-5.4-Cyber, a variant of its flagship tuned for defensive cybersecurity work. Available only to vetted defenders via the Trusted Access for Cyber program, one week after Anthropic's Claude Mythos preview.
Stanford HAI publishes its 2026 AI Index Report, documenting that generative AI has reached 53% global adoption in three years. The gap between leading US and Chinese models narrowed to 2.7% as of March 2026.
xAI files a federal lawsuit in U.S. District Court to block Colorado's AI anti-discrimination law before its June 30 enforcement date, arguing the law violates the First Amendment by forcing developers to embed state-preferred views into AI systems.
Meta Releases Muse Spark, Its First Closed-Source Model
Meta Superintelligence Labs releases Muse Spark, a natively multimodal reasoning model and Meta's first closed-source AI release. Scores 52 on the Artificial Analysis Intelligence Index, ranking fourth behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6.
Anthropic Launches Project Glasswing with Claude Mythos
Anthropic launches Project Glasswing, a cybersecurity initiative powered by its unreleased Claude Mythos Preview, which found thousands of zero-day vulnerabilities across major operating systems and browsers. Twelve launch partners include AWS, Apple, Google, and Microsoft.
GLM-5.1 Becomes First Open Model to Top SWE-Bench Pro
Z.ai releases GLM-5.1, a 744B-parameter MoE under the MIT license that scores 58.4 on SWE-Bench Pro — surpassing GPT-5.4 (57.7) and Claude Opus 4.6 (57.3) to become the first open-source model to lead the real-world code-repair benchmark.
Google Releases Gemma 4 Open Models Under Apache 2.0
Google DeepMind releases Gemma 4, a family of four open models (E2B, E4B, 26B MoE, 31B Dense) built on Gemini 3 research. The 31B Dense ranks #3 among open models on the Arena AI leaderboard, scoring 89.2% on AIME 2026 and 80.0% on LiveCodeBench v6. First Gemma release under a fully OSI-approved Apache 2.0 license.
California Issues First-of-Its-Kind AI Executive Order
Governor Newsom signs an executive order setting new AI procurement standards for state contracts, directing the California Department of Technology to create first-in-the-nation recommendations for watermarking AI-generated content, and enabling the state to separate its AI procurement process from the federal government.
Google releases Gemini 3.1 Flash Live, a real-time multimodal voice model for low-latency conversations with audio, video, and tool use. Supports 90+ languages and rolls out across 200+ countries via Search Live and the Gemini API.
The European Parliament votes 569-45 to amend the AI Act, banning AI 'nudifier' systems that generate non-consensual intimate images of real people. The omnibus package also opens trilogue on delaying high-risk AI system deadlines.
OpenAI discontinues its Sora video generation app, API, and website six months after launch, citing unsustainable compute costs. Disney's planned $1 billion Sora partnership also ends, with no investment having closed.
White House Unveils National AI Legislative Framework
The Trump Administration releases a six-pillar national AI legislative framework urging Congress to preempt state AI laws, protect children online, safeguard IP rights, prevent censorship, enable innovation, and develop an AI-ready workforce.
OpenAI releases GPT-5.4 mini and nano, bringing near-flagship performance to smaller, faster models. Mini scores 72.1% on OSWorld-Verified (vs. 75.0% for full GPT-5.4) at $0.75/$4.50 per million tokens, while nano targets high-volume tasks at $0.20/$1.25.
NVIDIA releases Nemotron 3 Super, an open-source 120B-parameter hybrid Mamba-Transformer MoE with 12B active parameters optimized for agentic AI. Delivers 5x higher throughput with a 1M-token context window, open weights, and full training recipes under a permissive license.
The US Senate Sergeant at Arms authorizes ChatGPT, Gemini, and Microsoft Copilot for official use by Senate staff for drafting, research, and analysis. Claude and Grok are notably excluded from the approved list.
OpenAI releases GPT-5.4, unifying the Codex and GPT lines into a single frontier model with a 1M-token context window, native computer use, and conversation compaction for agents. Scores 83.0% on GDPval (up from 70.9%) and 75.0% on OSWorld, surpassing the human baseline.
OpenAI releases GPT-5.3 Instant as the new default model for all ChatGPT users including the free tier, replacing GPT-5.2 Instant. Delivers 26.8% fewer hallucinations on web search queries and reduces unnecessary refusals and defensive preambles.
Perplexity launches Computer, a multi-model agent platform coordinating 19 AI models to autonomously execute complex workflows. Uses Claude Opus 4.6 for orchestration with 400+ app integrations. Available to Perplexity Max subscribers at $200/month.
Google DeepMind releases Gemini 3.1 Pro with a 2x+ reasoning improvement over 3 Pro, scoring 77.1% on ARC-AGI-2. Features dynamic thinking with adjustable depth, a 1M-token context window, and 64K-token output.
Anthropic releases Claude Sonnet 4.6, now the default across free and paid plans. Delivers near-Opus-level performance in coding, computer use, and long-context reasoning with a 1M-token context window at Sonnet-tier pricing ($3/$15 per million tokens).
OpenAI releases GPT-5.3-Codex, combining GPT-5.2-Codex coding performance with GPT-5.2 reasoning while running 25% faster. Sets new highs on SWE-Bench Pro and Terminal-Bench. First model rated high for cybersecurity on OpenAI's preparedness framework.
Anthropic releases Claude Opus 4.6, its most capable model yet, featuring a 1M-token context window, agent teams for multi-agent collaboration, conversation compaction, and improved reasoning and coding across benchmarks.
The second International AI Safety Report provides an updated science-based assessment of general-purpose AI risks and safeguards. Notes that companies publishing Frontier AI Safety Frameworks have more than doubled since the 2025 edition.
Mistral AI releases the Mistral 3 family including Mistral Large 3, a 675B-parameter mixture-of-experts model with a 256K context window, fully open-source under Apache 2.0. The model rivals proprietary frontier models on key benchmarks.
Anthropic donates the Model Context Protocol (MCP) to the Agentic AI Foundation under the Linux Foundation. With 97M+ monthly SDK downloads and backing from OpenAI, Google, and Microsoft, MCP has become the universal standard for connecting AI to external tools.
Google DeepMind releases Gemini 3 Pro, its latest flagship multimodal model, with state-of-the-art reasoning, agentic coding capabilities, and availability across AI Studio, Vertex AI, and third-party platforms like Cursor and GitHub.
OpenAI releases GPT-5, a unified system with a built-in reasoning router that automatically selects between a fast model and deeper thinking mode. Hallucinations reduced ~80% vs. o3 in thinking mode. Succeeds GPT-4.5 as the flagship model.
The second wave of EU AI Act obligations takes effect, covering general-purpose AI model requirements, notification obligations, governance structures, and penalties. Full high-risk AI system requirements follow in August 2026.
The OECD and European Commission jointly publish the review draft of their AI Literacy Framework for Primary and Secondary Education, with input from Code.org and educators across 20+ countries. Final version expected first half of 2026.
DeepSeek-R1 Matches Frontier Models at Fraction of Cost
Chinese AI lab DeepSeek releases R1, a 671B-parameter open-source reasoning model developed for under $6M that matches or exceeds OpenAI o1 on math and coding benchmarks, reshaping assumptions about the cost of frontier AI development.