On Tuesday, Google announced Gemini 3, its latest generative AI model. Rolling out to millions of users through Search, the Gemini app, AI Studio, Vertex AI, and the new agent-focused platform Google Antigravity, Gemini 3 is said to bring major improvements in reasoning, context awareness, and multimodal AI. Google even claims the model understands and processes text, images, video, audio, and code better than any model released to date.
According to Google executives, Gemini 3 outperforms its predecessors on AI leaderboards, reaching an Elo score of 1501 on LMArena and posting what the company calls PhD-level results: 37.5% on Humanity’s Last Exam and 91.9% on GPQA Diamond.
The new model ships with a Deep Think mode that delivers even higher reasoning scores on complex problems. Gemini 3 also performs strongly in mathematics and multimodal reasoning, with Google citing results of 23.4% on MathArena Apex, 81% on MMMU-Pro for multimodal reasoning, and 87.6% on Video-MMMU.
Google is now opening Gemini 3 to developers and users across multiple platforms, including coding agents built directly into Google Antigravity. The model is also said to offer improved factual accuracy, with a reported score of 72.1% on SimpleQA Verified.
But how does the new model compare with the old? Google says Gemini 3 “outperforms Gemini 2.5 Pro on every major AI benchmark,” with most improvements falling in the 20–50% range, though some areas, such as visual reasoning and math, show gains of 500% or more.
The model is said to have undergone a comprehensive safety and security assessment, and Google adds that feedback from outside experts shaped its release.