Google Gemini 2.5 Flash Beats DeepSeek R1 While Slashing Costs by 72%
Google has announced Gemini 2.5 Flash, a new artificial intelligence model aimed at delivering high performance, low latency, and affordability for developers and businesses. Designed for real-time applications like chatbots, virtual assistants, and data processing, Gemini 2.5 Flash builds on the Gemini 2.0 family, enhancing reasoning capabilities while emphasizing cost efficiency.
[Read More: Snapchat and Google Cloud Team Up to Supercharge My AI with Gemini’s Multimodal Magic]
Model Overview
Described as a "workhorse model", Gemini 2.5 Flash is optimized for high-volume, real-time operations, offering a balance between speed, performance, and affordability. It complements the more advanced Gemini 2.5 Pro, launched in March 2025, by targeting less computationally intensive tasks.
The model supports multimodal inputs—text, images, video, and audio—with a 1-million-token context window. This allows processing of extensive datasets in a single prompt, including 3,000 images (7MB each), 45-minute videos with audio, 1-hour videos without audio, or 8.4 hours of audio. Google plans to expand this window to 2 million tokens soon.
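As a rough illustration of what a 1-million-token window can hold, the sketch below estimates whether a text payload fits, using the common heuristic of about four characters per token. The heuristic is an assumption for illustration only; actual token counts depend on the tokenizer and on the media type (text, image, video, or audio).

```python
# Rough feasibility check against Gemini 2.5 Flash's 1M-token context window.
# Assumes ~4 characters per token, a common heuristic; real token counts
# vary by tokenizer and content type.

CONTEXT_WINDOW = 1_000_000  # tokens; Google plans to expand this to 2M

def estimated_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // 4

def fits_context(text: str, window: int = CONTEXT_WINDOW) -> bool:
    """True if the estimated token count fits within the context window."""
    return estimated_tokens(text) <= window
```

Under this estimate, a 2 MB text document fits comfortably, while a 5 MB one would exceed the window.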
Currently available in preview via Google AI Studio and Vertex AI, Gemini 2.5 Flash is accessible for experimentation and enterprise-grade deployment, with general availability expected later this year.
[Read More: Google's Gemini AI Chatbot App Now Available on iPhone]
Hybrid Reasoning and Thinking Budgets
A key innovation in Gemini 2.5 Flash is its hybrid reasoning capability. Developers can adjust a "thinking budget"—ranging from 0 to 24,576 tokens—controlling the computational resources allocated for reasoning.
For simple tasks like factual lookups, reasoning can be minimized or disabled to reduce costs and response times. For complex operations like multi-step problem-solving, more resources can be allocated for deeper analysis. This functionality is managed via a slider or configured through the Gemini API.
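Since the budget is simply an integer cap on reasoning tokens, a caller can map its own estimate of task complexity onto the documented 0–24,576 range. The helper below is a hypothetical illustration, not part of any Google SDK; the resulting value would be supplied as the thinking-budget setting in an API request.

```python
# Hypothetical helper: map a 0.0-1.0 task-complexity estimate onto
# Gemini 2.5 Flash's documented thinking-budget range of 0-24,576 tokens.
# This heuristic is illustrative and not part of the Gemini API itself.

MAX_THINKING_BUDGET = 24_576  # documented upper bound

def thinking_budget(complexity: float) -> int:
    """Return a token budget: 0 disables reasoning, 24,576 is the maximum."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    return round(complexity * MAX_THINKING_BUDGET)
```

A simple factual lookup might map to a budget of 0 (reasoning disabled), while a multi-step problem maps to the full 24,576-token budget.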
When reasoning is disabled, Gemini 2.5 Flash matches the fast response times and cost profile of its predecessor, Gemini 2.0 Flash, while offering improved accuracy, with a 12.1% score on Humanity’s Last Exam compared to 5.1% for 2.0 Flash.
[Read More: Google Gemini AI: Search History and Calendar Data in Focus?]
Performance and Benchmarks
Gemini 2.5 Flash demonstrates strong performance across mathematics, science, code generation, code editing, and visual reasoning tasks. Benchmark results show:
AIME 2025 (math): 78.0%
GPQA Diamond (science): 78.3%
On independent evaluations like the LMArena leaderboard, Gemini 2.5 Flash ties for second place alongside OpenAI’s GPT-4.5 Preview and xAI’s Grok-3, excelling particularly in hard prompts, coding, and longer queries.
Notably, Flash is approximately 5–10 times cheaper than Gemini 2.5 Pro, priced at US$0.15 per million input tokens and US$0.60 per million output tokens without reasoning, compared with Pro’s US$1.25 per million input tokens and US$10 per million output tokens.
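The per-request savings follow directly from the per-token rates. A minimal sketch, using the prices quoted above and a hypothetical input-heavy request (200,000 input tokens, 5,000 output tokens):

```python
# Compare Gemini 2.5 Flash (without reasoning) and Gemini 2.5 Pro costs
# using the published per-million-token rates. The workload is hypothetical.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in US$ given per-million-token input and output prices."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

flash = request_cost(200_000, 5_000, input_price=0.15, output_price=0.60)
pro = request_cost(200_000, 5_000, input_price=1.25, output_price=10.00)

print(f"Flash: ${flash:.3f}, Pro: ${pro:.3f}, ratio: {pro / flash:.1f}x")
```

For this mix, Flash costs about US$0.033 versus US$0.30 for Pro, roughly a 9x difference; the exact ratio depends on the input/output split.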
[Read More: New AI Flaw Lets Hackers Trick Chatbots Like Google Gemini, Study Finds]
Gemini 2.5 Flash vs. OpenAI o4-mini
Compared with OpenAI’s o4-mini, Gemini 2.5 Flash offers strong performance for its price, but o4-mini leads slightly in several important areas:
Humanity’s Last Exam (no tools): Flash 12.1% vs. o4-mini 14.3%
GPQA Diamond (science): Flash 78.3% vs. o4-mini 81.4%
AIME 2025 (math): Flash 78.0% vs. o4-mini 92.7%
Aider Polyglot (code editing): Flash 51.5% vs. o4-mini 68.9%
MMMU (visual reasoning across subjects): Flash 76.7% vs. o4-mini 81.6%
Overall, Gemini 2.5 Flash is highly capable and cost-efficient, but o4-mini generally posts stronger results, particularly in advanced reasoning and visual understanding tasks.
[Read More: OpenAI's 12 Days of AI: Innovations from o1 Model to o3 Preview and Beyond]
Gemini 2.5 Flash vs. DeepSeek R1
Against DeepSeek’s R1 model, Gemini 2.5 Flash shows stronger performance on several major benchmarks, with mixed results elsewhere:
Humanity’s Last Exam (no tools): Flash 12.1% vs. R1 8.6%
GPQA Diamond (science): Flash 78.3% vs. R1 71.5%
AIME 2025 (math): Flash 78.0% vs. R1 70.0%
LiveCodeBench v5 (coding): Flash 63.5% vs. R1 64.3%
Aider Polyglot (code editing): Flash 51.5% vs. R1 56.9%
SimpleQA (factual question answering): Flash 29.7% vs. R1 30.1%
Despite these mixed results, Gemini 2.5 Flash holds a major pricing advantage: US$0.15 per million input tokens and US$0.60 per million output tokens, versus DeepSeek R1’s US$0.55 and US$2.19. This makes Flash the more budget-friendly option for businesses managing large-scale AI deployments.
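The headline savings figure can be reproduced directly from the per-token rates above. A minimal check, assuming a balanced workload of one million input and one million output tokens:

```python
# Reproduce the cost savings of Gemini 2.5 Flash (without reasoning) vs.
# DeepSeek R1, using the quoted per-million-token rates and an assumed
# balanced workload of 1M input + 1M output tokens.

FLASH_IN, FLASH_OUT = 0.15, 0.60  # US$ per million tokens
R1_IN, R1_OUT = 0.55, 2.19

flash_cost = FLASH_IN + FLASH_OUT  # cost for 1M input + 1M output tokens
r1_cost = R1_IN + R1_OUT

savings = 1 - flash_cost / r1_cost
print(f"Flash: ${flash_cost:.2f}, R1: ${r1_cost:.2f}, savings: {savings:.1%}")
```

This works out to roughly 72.6% lower cost, consistent with the approximately 72% reduction in the headline; the exact figure shifts with the input/output mix.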
[Read More: Repeated Server Errors Raise Questions About DeepSeek's Stability]
Applications and Accessibility
Gemini 2.5 Flash is well-suited for:
Chatbots and Virtual Assistants: Fast, responsive conversational agents for customer service.
Data Extraction and Summarization: Efficiently handling large documents or datasets with up to 95% accuracy.
Enterprise-Scale AI Deployments: Supporting automated workflows and internal tools with high scalability.
The model is available via Google AI Studio’s free tier (50 messages/day) and through Vertex AI for enterprise users. It is also offered as "2.5 Flash (Experimental)" in the standalone Gemini app for all users, including free-tier access.
For industries requiring on-premises solutions, such as finance and healthcare, Google plans to offer Flash on Google Distributed Cloud and Nvidia Blackwell systems starting in Q3 2025.
In a related initiative, Google is providing U.S. college students free access to Gemini Advanced, which includes Gemini 2.5 Pro, until June 2026—part of its broader strategy to expand AI accessibility.
[Read More: Top 10 AI Chatbots You Need to Know in 2025]
Safety and Transparency Concerns
Unlike Gemini 2.5 Pro, Gemini 2.5 Flash has not been accompanied by a detailed technical or safety report. While a model card offers high-level information, experts have raised concerns about the lack of full transparency, especially given Google's previous commitment to the Frontier Safety Framework introduced in 2024.
This gap could impact developer trust, as comprehensive documentation is vital for evaluating model strengths, limitations, biases, and safety measures.
[Read More: 10 US Stocks in AI Worth Watching for Growth in 2025]
Market Context
Gemini 2.5 Flash enters a competitive field dominated by OpenAI’s ChatGPT, which reportedly has over 800 million weekly users, compared to Gemini’s estimated 250–350 million monthly users.
Google’s pricing of US$0.15 per million input tokens and US$0.60 per million output tokens (rising to US$3.50 with reasoning enabled) positions Flash as an attractive option for enterprises managing large-scale deployments. The flexible reasoning options align with industry trends toward more task-specific, cost-efficient AI models, as seen with OpenAI’s o3-mini and DeepSeek’s R1.
Source: Google DeepMind, TechCrunch, VentureBeat