Is Google reclaiming its throne? Gemini 3.1 Pro doubles its reasoning score, further reduces hallucination rates, and maintains the same pricing.

Tencent Technology · Feb 20 19:30

In November 2025, Google's Gemini 3 Pro briefly claimed the top position but was soon dethroned by new models from OpenAI and Anthropic.

The throne of the AI world changes hands faster than the release of new smartphones.

On the night of February 19, 2026, Google made a comeback with a new model called "Gemini 3.1 Pro." The official data looks quite appealing: on a test measuring AI's ability to solve entirely new logic problems, known as ARC-AGI-2, Gemini 3.1 Pro more than doubled its score, reaching 77.1%.

Tests conducted by the third-party organization Artificial Analysis also showed that Gemini 3.1 Pro has quietly climbed to the top of its overall Intelligence Index, surpassing Claude Opus 4.6.

With its emphasis on hard reasoning, coding ability, and cost control, the release takes a pragmatic stance aimed squarely at developers and enterprise users.

The hallucination rate also continues to fall: on AA-Omniscience, a test that measures whether models 'pretend to know,' Gemini 3.1 Pro's hallucination rate dropped by 38 percentage points compared with its predecessor.

Most importantly, performance has improved while the price remains unchanged. It seems that Google is determined to use a 'more for the same price' strategy to reclaim its lost crown. In US pre-market trading today, Alphabet-A (GOOGL.US) rose by approximately 1.5%.

01 'Three-Level Thinking' Mode

The previous Gemini 3 Pro might have seemed fast and powerful, but sometimes the answers were still somewhat 'off.' With this iteration, Gemini 3.1 Pro, Google focused on enhancing 'core reasoning capabilities,' in other words, making it better at 'thinking.'

This was most evident in a test called ARC-AGI-2, which does not assess rote memorization but consists entirely of unseen new logic problems specifically designed to evaluate the true reasoning abilities of AI.

Gemini 3.1 Pro scored higher than competing products across all standard tests.

The previous score of Gemini 3 Pro was 31.1%, while Gemini 3.1 Pro surged to 77.1%. Demis Hassabis, the head of Google DeepMind, also emphasized that this marks a significant improvement in the model’s core reasoning and problem-solving capabilities.
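
The jump can be read two ways: as a gain in percentage points or as a multiple of the old score. A minimal sketch of that arithmetic, using only the two ARC-AGI-2 scores quoted above (the function name is ours, for illustration):

```python
def score_change(old: float, new: float) -> tuple[float, float]:
    """Return (gain in percentage points, new/old ratio) for two scores given in percent."""
    return new - old, new / old

# ARC-AGI-2 scores quoted in the article: Gemini 3 Pro vs. Gemini 3.1 Pro
points, ratio = score_change(31.1, 77.1)
print(f"+{points:.1f} percentage points, {ratio:.2f}x the previous score")
```

By this reading, 'more than doubled' means a gain of 46 percentage points, or roughly 2.5 times the earlier score.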

However, the true game-changer is not just the score. Gemini 3.1 Pro has introduced a 'three-level thinking' mode—low, medium, and high. This can be understood as equipping the model with an adjustable 'computational power knob.' Simply put, users can decide how much time the model should spend thinking based on the task difficulty.

The previous Gemini 3 Pro only had two levels: low and high. This time, Gemini 3.1 Pro added a medium level and adjusted the meaning of the 'high' mode. When set to high, the model enters a state similar to Deep Think. Deep Think is the reasoning model updated by Google last week, characterized by spending more time handling complex problems. Now, Gemini 3.1 Pro can do this on its own without needing to switch separately.

This feature primarily addresses a practical issue. In the past, developers often needed to prepare multiple models for tasks of varying difficulty—one for simple conversations and another for complex reasoning. Different interfaces, billing systems, and custom logic were required to determine which model to invoke. Over time, maintaining this setup became cumbersome.

Now, a single model suffices. Use the low setting for routine tasks to get quick responses; use the high setting for complex tasks, allowing it to spend more time processing. There’s no need to switch back and forth or maintain multiple models.
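
To make the 'one model, three levels' idea concrete, here is a rough sketch of how an application might choose a level per task. Everything in it is an assumption for illustration: the task flags, the model identifier, and the payload field names are placeholders and not the actual Gemini API schema, which should be checked in Google's documentation.

```python
from dataclasses import dataclass

# The three levels named in the article. How they are spelled in a real
# request is an assumption; consult the official API reference.
LEVELS = ("low", "medium", "high")

@dataclass
class Task:
    prompt: str
    needs_multistep_reasoning: bool = False  # hypothetical flag set by the caller
    latency_sensitive: bool = False          # hypothetical flag set by the caller

def pick_thinking_level(task: Task) -> str:
    """Map a task to one of the three thinking levels (illustrative heuristic)."""
    if task.needs_multistep_reasoning:
        return "high"    # let the model spend more time on hard problems
    if task.latency_sensitive:
        return "low"     # routine task, favor a quick response
    return "medium"      # default middle ground

def build_request(task: Task, model: str = "gemini-3.1-pro") -> dict:
    """Assemble a hypothetical payload; the model id and keys are placeholders."""
    return {
        "model": model,
        "thinking_level": pick_thinking_level(task),
        "prompt": task.prompt,
    }

print(build_request(Task("Summarize this email", latency_sensitive=True)))
```

The point of the sketch is that the routing decision shrinks to a single parameter on one model, instead of a choice between separately billed models with different interfaces.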

02 'Seizing the Throne,' Victory in Benchmarking

Since it aims to 'seize the throne,' it inevitably has to compete against old rivals such as OpenAI’s GPT-5.2 and Anthropic’s Claude Opus 4.6.

Based on the data, Gemini 3.1 Pro proves to be quite formidable. In Artificial Analysis’ intelligence index test, it ranked first in six out of ten evaluations, including Terminal-Bench Hard (coding), GPQA Diamond (scientific knowledge), and Humanity's Last Exam (reasoning knowledge).

In Artificial Analysis’ intelligence index test, Gemini 3.1 Pro overwhelmingly outperformed its competitors.

The AA-Omniscience hallucination rate, which tests whether models 'pretend to know what they don't,' stands out in particular: Gemini 3.1 Pro achieved a 38-percentage-point reduction compared to its predecessor. This indicates that it is now better at recognizing what it 'does not know' rather than fabricating answers.

In the AA-Omniscience test, the hallucination rate of Gemini 3.1 Pro significantly decreased.

In a CritPt test targeting research-grade physics reasoning problems, Gemini 3.1 Pro scored 18%, surpassing the second-place model by more than five percentage points. Artificial Analysis commented that this indicates Google has indeed put substantial effort into advancing fundamental intelligence this time.

However, competition in the AI field is never just about 'scoring high.' On the Arena leaderboard, which is closer to user experience, the situation is not so one-sided.

This leaderboard ranks models based on user votes for their responses, focusing not on logical correctness but on which answers appear more 'pleasing.' Currently, in text-only tasks, Claude Opus 4.6 still leads Gemini 3.1 Pro by four points, while in coding tasks, the Opus series and GPT-5.2 maintain a slight edge.

The Arena rankings may favor models whose responses 'seem correct' but are not necessarily truly accurate, whereas Gemini 3.1 Pro’s improvement in reducing hallucinations this time is precisely aimed at achieving 'genuine correctness.'
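
To put the four-point Arena gap mentioned above in perspective, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the leaderboard's points behave like classic Elo ratings on a 400-point scale; the article does not say how the scores are computed, so treat this as a rough illustration only.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes won by the higher-rated model.

    Uses the standard Elo expectation formula. Applying it to Arena
    scores is an assumption made here for illustration.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

print(f"{expected_win_rate(4):.1%}")  # four-point lead
```

Under that assumption, a four-point lead corresponds to winning about 50.6% of pairwise votes, which is close to a coin flip.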

Of course, not all aspects are perfect.

Data from Artificial Analysis shows that Gemini 3.1 Pro has made progress on real-world agent tasks, with its score rising from 56.9% to 68.5%, but competitors such as Claude Sonnet 4.6 and GPT-5.2 remain ahead in this domain.

03 Not Just Coding, but Capable of Understanding the 'Atmosphere' of Wuthering Heights

Benchmark scores and rankings are ultimately just numbers. What can Gemini 3.1 Pro actually do?

Most impressive is its 'creative programming' capability. For instance, when tasked with designing a modern-style personal portfolio website for Wuthering Heights, Gemini 3.1 Pro does more than simply summarize the book's content; it can 'infer' the novel's gloomy and wild atmosphere and transform it into a stylish, contemporary interface design.

Another example is 3D interaction. Gemini 3.1 Pro can directly generate the code for a complex 3D simulation of a murmuration of starlings. Users can even steer the flock through hand tracking, and as the birds fly, a background score is generated that shifts with their movement.

Andrew Carr, co-founder of the startup Cartwheel, noticed after testing that this model has significantly improved its understanding of 3D spatial transformations. The recurring issue with rotation sequences in 3D animation, which was often mishandled before, has been perfectly resolved in Gemini 3.1 Pro.
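
Why rotation sequences trip up code generators is easy to show: 3D rotations do not commute, so applying the same two turns in a different order gives a different result. The sketch below is a generic illustration of that property and is not taken from Carr's tests; the helper names are ours.

```python
import math

Vec = tuple[float, float, float]

def rotate_x(v: Vec, degrees: float) -> Vec:
    """Rotate v counterclockwise about the X axis (right-handed coordinates)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    x, y, z = v
    return (x, c * y - s * z, s * y + c * z)

def rotate_z(v: Vec, degrees: float) -> Vec:
    """Rotate v counterclockwise about the Z axis (right-handed coordinates)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def rounded(v: Vec) -> Vec:
    """Round away floating-point noise so results are easy to compare."""
    return tuple(round(c, 6) + 0.0 for c in v)

start = (1.0, 0.0, 0.0)
x_then_z = rounded(rotate_z(rotate_x(start, 90), 90))
z_then_x = rounded(rotate_x(rotate_z(start, 90), 90))
print(x_then_z, z_then_x)
```

Starting from the same vector, turning about X and then Z ends up pointing along the Y axis, while turning about Z and then X ends up pointing along the Z axis. An animation that mixes up the order will therefore pose a limb or camera incorrectly.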

For ordinary users, one of the most practical features might be generating animated SVGs. In the past, creating a small web animation required design knowledge and editing skills. Now, given only a description, Gemini 3.1 Pro generates an animation built purely from code, which stays sharp at any screen size and has an exceptionally small file size. Many consider this the beginning of 'vibe coding.'

Gemini 3.1 Pro's powerful reasoning capabilities have also broken down the barrier between complex APIs and user-friendly design. In one demonstration by Google, the model directly built a real-time aerospace data dashboard, seamlessly integrating with publicly available telemetry streams to vividly display the real-time trajectory of the International Space Station, transforming cold data interfaces into an accessible interactive experience for the average person.

Gemini 3.1 Pro directly connects to telemetry streams to build an aerospace data interface.

Notably, Shunyu Yao, who previously participated in the Gemini 3 Deep Think research, introduced this new breakthrough on social media. He specifically mentioned that this upgrade is just the beginning, stating, "Even better models will continue to be released in the future."

04 Price Unchanged

After all this, the key question arises: When can we start using Gemini 3.1 Pro? And how much will it cost?

It is already available, and there will be no price increase. Starting February 19, Gemini 3.1 Pro began rolling out gradually in preview form.

Regular users can access Gemini applications or NotebookLM (currently limited to Pro and Ultra subscribers) to try it out, while developers can invoke the Gemini API through Google AI Studio, Gemini CLI, or directly within Android Studio. For enterprise customers, Gemini 3.1 Pro is now available in Vertex AI and Gemini Enterprise.

The most surprising aspect is the pricing. Gemini 3.1 Pro maintains the exact same price as Gemini 3 Pro: starting at $2 per million tokens for input and $12 per million tokens for output. Artificial Analysis calculated that running its entire Intelligence Index test suite on Gemini 3.1 Pro costs less than half as much as running it on Claude Opus 4.6.
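
For a sense of what those rates mean per request, here is a back-of-the-envelope estimator. It uses only the two 'starting at' prices quoted above and assumes a flat rate; any higher tiers, caching discounts, or other billing details are not covered by the article and are ignored here. The function and constant names are ours.

```python
# Base rates quoted in the article (USD per one million tokens).
INPUT_USD_PER_MILLION = 2.0
OUTPUT_USD_PER_MILLION = 12.0

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the flat base rates (an assumption)."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000

# Example: a 50,000-token prompt that produces a 4,000-token answer
print(f"${estimate_cost_usd(50_000, 4_000):.3f}")
```

At these rates the example request comes to about 15 cents, with roughly a third of that coming from the much shorter output.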

Jeff Dean, Chief Scientist of Google DeepMind, also came forward to support the release. He shared a side-by-side comparison video showing that animations generated by Gemini 3.1 Pro are significantly clearer and smoother than those produced by the previous generation.

Google CEO Sundar Pichai personally emphasized the doubling of core reasoning capabilities this time and stated that the new model is highly suitable for handling complex tasks that 'turn creative projects into reality.'

Finally, it is worth noting that this release is only version '3.1' rather than '3.5' or '4.0'.

Editor/Lambor

The translation is provided by third-party software.
