Google Rolls Out Gemini 3 Flash as Its New Fast Default Across Search AI Mode and the Gemini App

Image Source: Google Blog

Google has released Gemini 3 Flash, a new model in the Gemini 3 family aimed at delivering frontier-level reasoning at lower latency and lower cost. The company says it is rolling out globally as the default model in AI Mode in Google Search, bringing stronger reasoning and tool use without sacrificing speed.

Multiple outlets also report Google is making Gemini 3 Flash the default model in the Gemini app, replacing the earlier Gemini 2.5 Flash fast option, while still allowing users to switch to Pro models when needed.

What Google Actually Shipped

Gemini 3 itself was introduced on November 18, 2025, with Google positioning Gemini 3 Pro as its most capable multimodal model and announcing broader deployment across Google products.

Gemini 3 Flash arrived on December 17, 2025 as the speed-focused sibling in that family, designed for high-frequency workflows where response time and cost matter.

Where You Can Use Gemini 3 Flash Right Now

Google’s own product notes say Gemini 3 Flash is available across consumer, developer, and enterprise surfaces, including:

  • Google Search AI Mode, rolling out globally as the default model

  • Gemini app, reported as the new default fast model

  • Gemini API via Google AI Studio, plus Google’s agent development platform Google Antigravity

  • Gemini CLI for developer workflows

  • Android Studio and Android developer tooling

  • Vertex AI and Gemini Enterprise for business customers

A key point for readers is that several of these rollouts are described as “preview” or “rolling out”, which usually means availability and defaults can vary by region, account type, and product surface as Google stages deployment.

Capabilities and Limits

Google’s Gemini API model documentation lists Gemini 3 Flash Preview (model code gemini-3-flash-preview) as a multimodal model with the following capabilities; a brief usage sketch follows the list:

  • Text, image, video, audio, and PDF as inputs

  • Text as output

  • Up to 1,048,576 input tokens and 65,536 output tokens

  • Support for batch API, caching, code execution, file search, function calling, structured outputs, “thinking”, URL context, and search grounding

  • No support for the Live API or image generation (in this Flash preview configuration)
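
For developers, here is a minimal sketch of what a call looks like through the google-genai Python SDK, using the gemini-3-flash-preview model code from the documentation above. The prompt and the output-token cap are illustrative placeholders, not values Google recommends.

```python
# Minimal sketch: a text-in, text-out request to Gemini 3 Flash Preview
# via the google-genai Python SDK (pip install google-genai).
# The prompt and the output-token cap are illustrative placeholders.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Summarise the trade-offs between fast and heavyweight models.",
    config=types.GenerateContentConfig(
        max_output_tokens=1024,  # well under the documented 65,536 cap
    ),
)
print(response.text)
```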

Two practical implications for everyday users and IT readers:

  • Long context plus multimodal input means Flash is built for “lots of stuff at once” tasks, like summarising big documents, analysing mixed media, or working through long coding threads, provided you are happy with text output.

  • Search grounding support is central to why Google is comfortable pushing Flash into Search AI Mode, where users expect live links and web context rather than static knowledge.

Google also states Gemini 3 models have a knowledge cutoff of January 2025, which matters for offline facts inside the model. Search AI Mode can still provide up to date results through web access and citations because it is part of Search, not just the raw model.
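
Grounding is also exposed directly to developers: in the google-genai SDK, Google Search can be attached as a tool on a request. The sketch below illustrates the “search grounding” capability listed above; it is not a description of how Search AI Mode itself is wired.

```python
# Sketch: enabling Search grounding on a Gemini 3 Flash Preview request
# through the google-genai SDK. Illustrative only; Search AI Mode
# handles this wiring internally.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What is the default model in Google Search's AI Mode today?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
# When the tool fires, source links and the issued queries ride along
# as grounding metadata on the response.
print(response.candidates[0].grounding_metadata)
```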

Performance Claims

Google frames Gemini 3 Flash as a “Pareto frontier” move, balancing performance against cost and latency, and it explicitly references LMArena Elo as one of the performance measures used in its positioning.

It also makes two headline claims worth treating as vendor-reported unless independently replicated:

  • About 3 times faster than Gemini 2.5 Pro, based on benchmarking from Artificial Analysis

  • 78 percent on SWE-bench Verified, which Google describes as outperforming both the 2.5 series and Gemini 3 Pro on that specific coding-agent benchmark

Google is signalling it wants Flash to be the default “do most things quickly” model, while keeping heavier Pro options for users who want deeper reasoning or specialised capabilities.

Model Pricing

Google has published token pricing for Gemini 3 Flash on Vertex AI that lines up with the positioning of Flash as a cheaper, faster default:

  • Input (text, image, video): USD 0.50 per 1M tokens

  • Input (audio): USD 1.00 per 1M tokens

  • Text output (response and reasoning): USD 3.00 per 1M tokens

  • Cached input is priced lower, and batch API pricing is lower again

For comparison inside Google’s own stack, Vertex AI lists Gemini 3 Pro Preview at USD 2 to USD 4 per 1M input tokens (depending on context tier) and USD 12 to USD 18 per 1M output tokens, which helps explain why Google is pushing Flash as the broad default.
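
To put those rates in context, here is a back-of-envelope monthly cost comparison using the Flash prices above and the low end of the Pro Preview range. The workload figures are invented for illustration.

```python
# Back-of-envelope cost comparison at the Vertex AI list prices quoted
# above (USD per 1M tokens). The workload figures are hypothetical.
PRICES = {
    "gemini-3-flash": {"input": 0.50, "output": 3.00},
    "gemini-3-pro":   {"input": 2.00, "output": 12.00},  # low context tier
}

# Hypothetical workload: 1M requests/month, 2,000 input and 500 output
# tokens per request.
requests, in_tok, out_tok = 1_000_000, 2_000, 500

for model, p in PRICES.items():
    cost = requests * (in_tok * p["input"] + out_tok * p["output"]) / 1e6
    print(f"{model}: ${cost:,.0f}/month")
# gemini-3-flash: $2,500/month
# gemini-3-pro: $10,000/month
```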

Grounding With Google Search Billing

Google’s published pricing indicates Gemini 3 grounding billing starts January 5, 2026, and pricing is described in terms of search queries, not prompts. Both the Gemini API pricing page and Vertex AI pricing page state an allowance of 5,000 search queries per month at no charge aggregated across Gemini 3 models, then USD 14 per 1,000 search queries, with the note that a single request can trigger one or more billable search queries.

That is important for IT cost planning because grounding spend can scale differently from token spend, especially if an application aggressively verifies facts with multiple searches per user request.
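
A short worked example makes the point. The rates below are the published ones; the request volume and searches-per-request figure are hypothetical.

```python
# Worked example: monthly grounding spend at the published rates
# (5,000 free queries/month across Gemini 3 models, then USD 14 per
# 1,000 queries). Volume and searches-per-request are hypothetical.
FREE_QUERIES = 5_000
USD_PER_1K_QUERIES = 14.0

requests_per_month = 500_000
searches_per_request = 1.2  # one request can trigger multiple searches

queries = requests_per_month * searches_per_request   # 600,000
billable = max(0.0, queries - FREE_QUERIES)           # 595,000
cost = billable / 1_000 * USD_PER_1K_QUERIES
print(f"${cost:,.0f}/month in grounding charges")     # $8,330
```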

How It Stacks Up Against Other Current “Fast Default” Models

The broader market trend is that major labs are separating “fast default” from “slow thinker”, and then making the fast default the one most people actually touch day to day.

A quick pricing reality check, using official pricing pages, with a worked comparison after the list:

  • Gemini 3 Flash on Vertex AI: USD 0.50 per 1M input tokens and USD 3 per 1M output tokens

  • OpenAI GPT 5.2: USD 1.75 per 1M input tokens and USD 14 per 1M output tokens (with discounted cached input)

  • Anthropic Claude Sonnet 4.5: USD 3 per 1M input tokens and USD 15 per 1M output tokens (plus caching and batch options)

  • xAI Grok 4.1 Fast: xAI lists very low token pricing for its “Fast” tier, with separate cached input pricing
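
The sketch below prices one hypothetical exchange (1,000 input tokens, 300 output tokens) at each vendor's listed rates, ignoring caching and batch discounts; Grok 4.1 Fast is left out because only a qualitative price is cited.

```python
# Cost of one hypothetical exchange (1,000 input / 300 output tokens)
# at the list prices cited above, ignoring caching and batch discounts.
PRICES = {  # (USD per 1M input tokens, USD per 1M output tokens)
    "Gemini 3 Flash":    (0.50, 3.00),
    "GPT 5.2":           (1.75, 14.00),
    "Claude Sonnet 4.5": (3.00, 15.00),
}

in_tok, out_tok = 1_000, 300

for model, (p_in, p_out) in PRICES.items():
    cost = (in_tok * p_in + out_tok * p_out) / 1e6
    print(f"{model}: ${cost:.5f} per exchange")
# Gemini 3 Flash: $0.00140 per exchange
# GPT 5.2: $0.00595 per exchange
# Claude Sonnet 4.5: $0.00750 per exchange
```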

This does not prove “best model wins”, but it does show why Google can afford to push Flash deep into Search and the Gemini app: cheap inference makes default deployment financially viable, especially when usage volumes are Search sized.
