Gemini 3.5 Flash is 4x faster than other frontier models, and Google says an optimized version runs 12x faster at the same quality — a speed claim that explains why this launch is about agents, not chat.
Google introduced Gemini 3.5 Flash on Tuesday at Google I/O, positioning it as its strongest model yet for coding and autonomous AI agents, according to TechCrunch. The model can independently execute coding pipelines, manage research projects, and, in internal tests, build an operating system from scratch.
That is the strategic signal. Google is moving the Gemini pitch away from “ask a question, get an answer” and toward delegated digital work: plan, execute, test, revise, and ask for help only when judgment or permission is needed. For developers and enterprises, the value proposition shifts from conversational polish to completed workflows.
“3.5 Flash offers an incredible combination of quality and low latency. It outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks,” Koray Kavukcuoglu, DeepMind’s chief technologist, told reporters.
Gemini 3.5 Flash turns prompts into long-running work
The most important part of Gemini 3.5 Flash is not that it can chat better. It is that Google says it can run autonomously for multiple hours.
That matters because agentic systems behave differently from chatbots. A chatbot answers. An agent breaks work into steps, calls tools, checks outputs, and keeps going. In Google’s framing, Flash is designed for that loop.
At I/O, Google engineer Varun Mohan demonstrated agents splitting off to work on separate components before coming together to build a full operating system inside Antigravity, Google’s agentic development platform and IDE. Kavukcuoglu said Flash 3.5 was co-developed with Antigravity so agents would have a “native environment where they can live, work, and execute.”
Google also released Antigravity 2.0, a stand-alone desktop application built around agent-first development. That ties the model launch to tooling, not just model access. It also matches the direction we covered in Google’s I/O developer push around Gemini and AI agents.
Speed is the product feature, not just a benchmark flex
Google’s Flash branding has always implied a tradeoff: faster, lighter, cheaper-to-run models for high-volume use cases. With 3.5 Flash, Google is arguing that tradeoff has narrowed.
In a Google blog post, the company said 3.5 Flash outperforms Gemini 3.1 Pro on coding and agentic benchmarks including:
| Benchmark | Google-reported result for Gemini 3.5 Flash |
|---|---|
| Terminal-Bench 2.1 | 76.2% |
| GDPval-AA | 1656 Elo |
| MCP Atlas | 83.6% |
| CharXiv Reasoning | 84.2% |
Google also said the model delivers 4 times the output tokens per second of other frontier models, and that an optimized version is 12x faster with the same quality.
For agentic AI, latency is not cosmetic. If one agent performs ten steps, tests code, spins off subagents, reads files, and retries failures, small delays compound. A model that is merely impressive in a chat window can become expensive and sluggish when asked to run a multi-hour software task.
That is why the economics matter. Google says 3.5 Flash can complete some long-horizon tasks “often at less than half the cost of other frontier models.” The source material does not provide public pricing, so buyers still lack the key metric: cost per completed task. But Google is clearly trying to make Flash the workhorse model for high-volume agent execution, not the trophy model for demos.
Coding is the proving ground because failure is easier to see
Software development is a natural test bed for agents because the outputs can be inspected. Code compiles or it does not. Tests pass or fail. Dependencies exist or they do not. That does not eliminate hallucination or bad architecture, but it gives teams more measurable feedback than many knowledge-work tasks.
Google’s examples lean into this. The company says 3.5 Flash can develop new applications, maintain codebases, prepare financial documents, rename and categorize unstructured assets, transform legacy code into Next.js, and generate different UX approaches for a checkout flow in 60 seconds.
The enterprise examples are more revealing than the stage demo:
- Shopify is running subagents in parallel to analyze complex data for merchant growth forecasts.
- Macquarie Bank is piloting the model for customer onboarding by reasoning over 100+ page documents.
- Salesforce is integrating 3.5 Flash into Agentforce for multi-subagent enterprise tasks.
- Ramp is using it for OCR on complex invoices and reasoning over historical patterns.
- Xero is deploying agents for multi-week workflows tied to 1099 tax forms.
- Databricks is using agentic workflows to diagnose issues and propose fixes for data scientists.
MLXIO analysis: these examples show Google aiming Flash at boring, costly work rather than flashy consumer prompts. That is where agentic AI has a clearer business case: document-heavy onboarding, internal data analysis, invoice processing, code migration, and workflow automation.
For more on the cost side of this strategy, see our prior analysis of Cheap AI Agents: Google’s Gemini 3.5 Flash Bets Big.
Gemini 3.5 Pro becomes the planner, Flash becomes the labor layer
Google’s next move is already telegraphed. Gemini 3.5 Pro is coming, and the company says Pro and Flash are designed to work together.
Tulsee Doshi, Google’s senior director and head of product, described the split to TechCrunch:
“3.5 Pro becomes your orchestrator, your planner, and then it actually can leverage Flash to be the various sub-agents. I think it really comes down to where do you really want that reasoning power, where you actually want that larger model that can really push on the reasoning side versus where do you have tasks that really do merit good brute force tool use capabilities?”
That architecture is the real product story. Google is not pitching one model to do everything. It is building a hierarchy: a stronger reasoning model plans, while faster Flash agents execute subtasks.
If that works, the unit of AI consumption changes. Users will not simply choose a model. They will launch a swarm of specialized agents inside tools such as Antigravity, Gemini API, Gemini Enterprise, the Gemini app, and AI Mode in Search.
Search and Gemini Spark bring agent risk to consumers
The launch is not confined to developers. 3.5 Flash is now the default model in the Gemini app and AI Mode in Search globally. Google also announced agentic capabilities coming to Search, letting users create, customize, and manage AI agents directly on the platform.
The model will also power Gemini Spark, Google’s personal AI agent designed to run 24/7 to help consumers manage their digital life. That expands the stakes beyond enterprise automation.
The risk profile changes when an AI system moves from advice to action. Google is also facing a lawsuit in which plaintiffs allege that weeks of Gemini chats preceded a user’s severe crisis and suicide, according to TechCrunch; those claims have not been independently established here. Broader access to autonomous agents raises harder questions around escalation, permissions, sensitive content, and user dependency.
Google says Gemini 3.5 has strengthened cyber and CBRN safeguards and is better calibrated to engage with sensitive questions rather than refuse them outright. That last point is important: refusal alone is not a complete safety strategy for agents that may need to handle ambiguous or high-stakes tasks.
Buyers should judge agents by supervision cost
For software teams and AI buyers, the key question is no longer access. Gemini 3.5 Flash is available generally today through Antigravity, the Gemini API, Gemini Enterprise, the Gemini app, and AI Mode in Search.
The question is supervision.
A useful agent must reduce total work, not just move work from coding to review. Buyers should measure:
- Completion cost: What does a finished workflow cost, including retries and human review?
- Error rate: How often does the agent create hidden defects or bad assumptions?
- Permission control: When does it pause for approval, and can that behavior be audited?
- Integration quality: Can it operate safely inside existing codebases, data systems, and enterprise workflows?
- Time saved: Does it compress multi-week work, as Google says partners are testing, or merely generate more output to inspect?
MLXIO analysis: the first adopters with the clearest payoff will likely be teams with well-defined workflows, good documentation, and strong review processes. Messy systems will not magically become agent-ready because the model is faster.
The next AI platform fight moves from chat windows to delegated labor
Gemini 3.5 Flash is less a model launch than a statement about where Google thinks AI monetization is headed. The chat interface was the entry point. The larger prize is delegated labor across code, documents, search, enterprise systems, and consumer software.
The evidence to watch is not another polished I/O demo. It is whether Google can show repeatable production use: lower review burden, fewer failed agent runs, clearer audit trails, and real cost-per-task advantages over slower frontier models.
If 3.5 Pro successfully acts as planner and 3.5 Flash becomes the fast execution layer, Google will have a more credible agent platform. If users still need to babysit every long-running task, then Flash will remain a faster chatbot with better demos — not the operating layer for AI work.
The Bottom Line
- Google is shifting Gemini from chatbot interactions toward autonomous digital work.
- Lower latency could make multi-hour AI agents more practical for coding and enterprise workflows.
- The launch signals that speed and tool-use may matter as much as raw model intelligence in the next AI wave.










