Google Rolls Out Gemini 3 Flash as Its New Fast Default Across Search's AI Mode and the Gemini App
Image Source: Google Blog
Google has released Gemini 3 Flash, a new model in the Gemini 3 family aimed at delivering frontier-level reasoning at lower latency and lower cost. The company says it is rolling out globally as the default model in AI Mode in Google Search, bringing stronger reasoning and tool use without sacrificing speed.
Multiple outlets also report that Google is making Gemini 3 Flash the default model in the Gemini app, replacing Gemini 2.5 Flash as the fast option, while still allowing users to switch to Pro models when needed.
What Google Actually Shipped
Gemini 3 itself was introduced on November 18, 2025, with Google positioning Gemini 3 Pro as its most capable multimodal model and announcing broader deployment across Google products.
Gemini 3 Flash arrived on December 17, 2025 as the speed-focused sibling in that family, designed for high-frequency workflows where response time and cost matter.
Where You Can Use Gemini 3 Flash Right Now
Google’s own product notes say Gemini 3 Flash is available across consumer, developer, and enterprise surfaces, including:
Google Search AI Mode, rolling out globally as the default model
Gemini app, reported as the new default fast model
Gemini API via Google AI Studio, plus Google’s agent development platform Google Antigravity
Gemini CLI for developer workflows
Android Studio and Android developer tooling
Vertex AI and Gemini Enterprise for business customers
A key caveat for readers: several of these rollouts are labelled "preview" or "rolling out", which usually means availability and defaults can vary by region, account type, and product surface while Google stages deployment.
Capabilities and Limits
Google’s Gemini API model documentation lists Gemini 3 Flash Preview (model code gemini-3-flash-preview) as a multimodal input model with the following specifications (a minimal API sketch follows the list):
Inputs: text, image, video, audio, and PDF
Output: text only
Limits: up to 1,048,576 input tokens and 65,536 output tokens
Supported features: batch API, caching, code execution, file search, function calling, structured outputs, “thinking”, URL context, and search grounding
Not supported in this Flash preview configuration: the Live API and image generation
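For orientation, here is a minimal sketch of calling the preview model through the Gemini API with Google's google-genai Python SDK. The model code comes from the documentation above; the API key, prompt, and config values are placeholders.

```python
# Minimal sketch using the google-genai SDK (pip install google-genai).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-3-flash-preview",      # preview model code from Google's docs
    contents="Summarise this report in ten bullet points.",
    config=types.GenerateContentConfig(
        max_output_tokens=65_536,        # documented output-token ceiling
    ),
)
print(response.text)
```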
Two practical implications for everyday users and IT readers:
Long context plus multimodal input means Flash is built for “lots of stuff at once” tasks, like summarising big documents, analysing mixed media, or working through long coding threads, provided you are happy with text output.
Search grounding support is central to why Google is comfortable pushing Flash into Search AI Mode, where users expect live links and web context rather than static knowledge.
Google also states Gemini 3 models have a knowledge cutoff of January 2025, which bounds the facts baked into the model itself. Search AI Mode can still surface up-to-date results with citations because it sits inside Search and has live web access, not just the raw model.
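For API users, grounding is an opt-in tool rather than automatic behaviour. A minimal sketch, assuming the google-genai Python SDK and its published GoogleSearch tool pattern (field names are worth verifying against the current SDK reference):

```python
# Sketch of search grounding via the google-genai SDK; the Tool/GoogleSearch
# pattern follows Google's published grounding examples.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What did Google announce about Gemini this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # enable grounding
    ),
)
print(response.text)  # grounded answers carry citation metadata alongside the text
```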
Performance Claims
Google frames Gemini 3 Flash as a “Pareto frontier” move, balancing performance against cost and latency, and it explicitly references LMArena Elo as one of the performance measures used in its positioning.
It also makes two headline claims worth treating as vendor reported unless independently replicated:
About 3 times faster than Gemini 2.5 Pro, based on benchmarking from Artificial Analysis
78 percent on SWE-bench Verified, which Google describes as outperforming both the 2.5 series and Gemini 3 Pro on that specific coding-agent benchmark
Google is signalling it wants Flash to be the default “do most things quickly” model, while keeping heavier Pro options for users who want deeper reasoning or specialised capabilities.
Model Pricing
Google has published token pricing for Gemini 3 Flash on Vertex AI that lines up with Flash's positioning as a cheaper, faster default (a worked cost example follows the list):
Input (text, image, video): USD 0.50 per 1M tokens
Input (audio): USD 1.00 per 1M tokens
Text output (response and reasoning): USD 3.00 per 1M tokens
Cached input is priced lower, and batch API pricing is lower again
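To make those rates concrete, here is a back-of-envelope calculation. The per-token prices are the Vertex AI figures above; the workload numbers are hypothetical.

```python
# Back-of-envelope request cost using the Vertex AI list rates (USD per 1M tokens).
INPUT_PER_M = 0.50   # text/image/video input
OUTPUT_PER_M = 3.00  # text output, including reasoning tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one text-in/text-out request at list price."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Hypothetical example: a 20,000-token document summarised into a 1,000-token answer.
print(f"${request_cost(20_000, 1_000):.4f}")  # -> $0.0130, about 1.3 cents
```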
For comparison inside Google’s own stack, Vertex AI lists Gemini 3 Pro Preview at USD 2 to USD 4 per 1M input tokens (depending on context tier) and USD 12 to USD 18 per 1M output tokens, which helps explain why Google is pushing Flash as the broad default.
Grounding With Google Search Billing
Google’s published pricing indicates Gemini 3 grounding billing starts January 5, 2026, and pricing is described in terms of search queries, not prompts. Both the Gemini API pricing page and Vertex AI pricing page state an allowance of 5,000 search queries per month at no charge aggregated across Gemini 3 models, then USD 14 per 1,000 search queries, with the note that a single request can trigger one or more billable search queries.
That is important for IT cost planning because grounding spend can scale differently from token spend, especially if an application aggressively verifies facts with multiple searches per user request.
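A rough sketch of how that scaling plays out, using the published allowance and rate with hypothetical traffic figures:

```python
# Illustrative grounding spend: 5,000 free search queries per month,
# then USD 14 per 1,000 queries. Traffic figures below are hypothetical.
FREE_QUERIES = 5_000
PRICE_PER_1K = 14.0

def monthly_grounding_cost(requests: int, searches_per_request: int) -> float:
    billable = max(0, requests * searches_per_request - FREE_QUERIES)
    return billable / 1_000 * PRICE_PER_1K

# 100,000 requests a month, each triggering ~3 billable searches:
print(monthly_grounding_cost(100_000, 3))  # -> 4130.0 USD
```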
How It Stacks Up Against Other Current “Fast Default” Models
The broader market trend is that major labs are separating “fast default” from “slow thinker”, and then making the fast default the one most people actually touch day to day.
A quick pricing reality check, using official pricing pages (a worked monthly comparison follows the list):
Gemini 3 Flash on Vertex AI: USD 0.50 per 1M input tokens and USD 3 per 1M output tokens
OpenAI GPT 5.2: USD 1.75 per 1M input tokens and USD 14 per 1M output tokens (with discounted cached input)
Anthropic Claude Sonnet 4.5: USD 3 per 1M input tokens and USD 15 per 1M output tokens (plus caching and batch options)
xAI Grok 4.1 Fast: xAI lists very low token pricing for its “Fast” tier, with separate cached input pricing
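As an illustration, here is the same hypothetical monthly workload priced at the list rates quoted above, with no caching or batch discounts applied; xAI's Fast tier is omitted because no concrete figure is cited here.

```python
# The same hypothetical monthly workload priced at the list rates quoted above.
PRICES = {  # model: (USD per 1M input tokens, USD per 1M output tokens)
    "Gemini 3 Flash":    (0.50, 3.00),
    "GPT 5.2":           (1.75, 14.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}

IN_TOKENS, OUT_TOKENS = 10_000_000_000, 2_000_000_000  # 10B in, 2B out (hypothetical)

for model, (p_in, p_out) in PRICES.items():
    cost = IN_TOKENS / 1e6 * p_in + OUT_TOKENS / 1e6 * p_out
    print(f"{model}: ${cost:,.0f}/month")
# Gemini 3 Flash: $11,000 · GPT 5.2: $45,500 · Claude Sonnet 4.5: $60,000
```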
This does not prove "best model wins", but it does show why Google can afford to push Flash deep into Search and the Gemini app: cheap inference makes default deployment financially viable, especially when usage volumes are Search-sized.
