We don't usually write news. There's enough of that. But the last two weeks broke our usual rule — every major lab shipped a coding model or a meaningful coding update almost back to back, and the questions started piling up in our inbox: "Do we switch?" "Is Cursor dead?" "Should we wait?" So here's the version we'd give a founder over coffee — what actually shipped, what it changes for teams building real software, and what we're doing about it on client work.
The two-week scoreboard
Microsoft launched MAI-Code-1-Flash, its first coding model. Google shipped Gemini 3.5 Flash and folded it into Antigravity. OpenAI made Codex generally available on AWS via Bedrock with 60+ new plugins. Anthropic updated Claude Opus 4.8 with dynamic, agent-orchestrating workflows. Four labs, one fortnight. The era of one obvious default is over.
1. Microsoft finally stepped into the ring
At Build, Microsoft announced MAI-Code-1-Flash — its first in-house coding model — alongside MAI-Thinking-1, a reasoning model. The headline isn't raw capability; it's the pitch: cheaper, faster, and less dependent on OpenAI. For Microsoft that's a strategic story about margins and control. For the rest of us it means a fourth serious vendor and downward pressure on price.
Our read: MAI-Code-1-Flash is not where you'd move a production agent today, but it's a shot across the bow. When the company that owns GitHub, VS Code and Azure decides to build its own coding model, the whole stack you depend on is going to shift underneath you over the next year. Worth watching, not worth migrating to this week.
2. Google: Gemini 3.5 Flash lands inside Antigravity
Google shipped Gemini 3.5 Flash — "frontier intelligence with action" — generally available via Antigravity, the Gemini API, AI Studio and Android Studio. They also rolled out a $100 AI Ultra plan aimed squarely at developers, with 5x the usage limits of AI Pro. We covered Antigravity when it launched; this update is what makes it genuinely usable for agentic work rather than a demo.
Flash is the operative word. The play here is speed and cost per action for agent loops where you fire hundreds of small calls. If your workflow is agent-heavy and latency-sensitive, this is the launch from the fortnight most worth a serious trial.
3. OpenAI: Codex goes enterprise on AWS
OpenAI made its frontier models and Codex generally available on AWS through Amazon Bedrock, including GovCloud — with AWS-native security, governance, billing and compliance. They also added dozens of new plugins (Databricks, Salesforce, Hex, Clay and more). Codex now reportedly sees over 5 million weekly users.
This one isn't about a smarter model — it's about distribution and procurement. For an enterprise already standardized on AWS, "Codex is in Bedrock" removes the single biggest blocker we hit on client projects: security and procurement sign-off. That's a bigger deal for shipping than any benchmark.
4. Anthropic: Claude Opus 4.8 and dynamic workflows
Anthropic's Claude Opus 4.8 update introduced dynamic workflows — the model automatically generates orchestration scripts and spins up multiple subagents for large tasks. Flip on the ultracode setting and it'll take on codebase migrations or security audits by fanning the work out across agents rather than grinding through one context.
This is the most architecturally interesting launch of the four. It's the same shift we wrote about in our spec-driven development piece, pushed one level up: you stop orchestrating the agents by hand and let the model decide the orchestration. For the kind of multi-day refactors and audits we get hired for, that's a real change in how the work gets done.
“A year ago you picked a model. Now you pick an architecture — and the model is a swappable part inside it.”
So what actually changed?
- There is no single "best" coding model anymore. There are three production-ready ones and a fourth worth watching, each strongest at a different job.
- Price and speed are now competitive levers, not afterthoughts. Expect cost per token to keep falling.
- Distribution beats benchmarks: Codex-on-Bedrock and Gemini-in-Antigravity matter because of where they live, not their leaderboard scores.
- The frontier moved from "write this function" to "orchestrate this whole task" — agents managing agents.
How we're choosing tools right now
We don't bet a client's codebase on whatever shipped this week. Here's the actual rule of thumb we're running in June 2026:
1. Default to Claude Opus 4.8 for complex, multi-file work and migrations: the dynamic-workflow orchestration is ahead for the heavy stuff we do most.
2. Reach for Gemini 3.5 Flash when the loop is agentic and high-volume, where speed and cost per action dominate.
3. Use Codex when the client is AWS-native and procurement is the real bottleneck: meeting them in Bedrock is worth more than a few benchmark points.
4. Keep MAI-Code-1-Flash on the watchlist. Microsoft owns too much of the stack to ignore, but it's early.
5. Pin model versions in production. Never let "latest" silently change under a shipping feature.
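The pinning rule is the easiest one to enforce in code rather than in a wiki page. Here is a minimal sketch of how the rules above could look as a routing table, not our actual tooling. The model identifiers, the `name@version` format, the `Task` fields and the order in which the rules win are all assumptions made up for illustration; real vendor model IDs look different, so check each provider's docs.

```python
from dataclasses import dataclass

# Hypothetical pinned identifiers in a made-up "name@version" format.
# These are placeholders, not real vendor model IDs.
PINNED_MODELS = {
    "complex_refactor": "claude-opus-4.8@2026-06-01",
    "high_volume_agent": "gemini-3.5-flash@2026-06-01",
    "aws_native": "codex@2026-06-01",
}

# Version strings that float, i.e. can change without a code change.
FLOATING_ALIASES = ("latest", "stable", "default")


@dataclass(frozen=True)
class Task:
    """Hypothetical task profile used to pick a model."""
    high_volume: bool = False  # agentic loop firing many small calls
    aws_native: bool = False   # client is on AWS and procurement is the blocker


def assert_pinned(model_id: str) -> str:
    """Return model_id unchanged, or raise if it is not pinned to a version."""
    _name, sep, version = model_id.partition("@")
    if not sep or not version or version.lower() in FLOATING_ALIASES:
        raise ValueError(f"model id {model_id!r} is not pinned to a version")
    return model_id


def choose_model(task: Task) -> str:
    """Apply the rule of thumb. The precedence here is our assumption."""
    if task.aws_native:
        key = "aws_native"
    elif task.high_volume:
        key = "high_volume_agent"
    else:
        key = "complex_refactor"  # the default for heavy multi-file work
    return assert_pinned(PINNED_MODELS[key])


if __name__ == "__main__":
    print(choose_model(Task(high_volume=True)))
```

Running the check at startup or in CI means a floating alias fails loudly before it reaches a shipping feature, and swapping a model becomes a one-line, reviewable diff.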
The boring truth
The model is the easy part to swap. Your spec, your tests, your review process and your architecture are what actually determine output quality. Teams that built those last year are the ones calmly trying four models this month instead of panicking about which one to marry.
Confused about which of these to actually build on — or stuck choosing between three vendors a client's board keeps asking about? That's most of our conversations lately. Send us what you're building and we'll send back a candid, no-pitch take on the right tool for it, free.