xAI's Grok-4 Outperforms OpenAI and Google in AI Benchmarks

Date: 10.07.2025

xAI, Elon Musk's AI company, is releasing Grok-4 on July 10th. The model has already surpassed OpenAI's and Google's models in AI benchmark testing, and it demonstrates superior reasoning, coding capabilities, and multimodal functionality.

Grok-4 has achieved the highest score, 73, on the Artificial Analysis Intelligence Index, surpassing OpenAI's o3-pro (71) and Google's Gemini 2.5 Pro (70). The index incorporates the MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500 evaluations.

Elon Musk highlighted Grok-4's capabilities in a livestream, stating, "Grok4 scores perfect scores on SAT, and near-perfect scores on graduate exams like the GRE. It is smarter than all graduate students in all fields, simultaneously." He also noted that Grok-4 was trained with nearly the same amount of pretraining compute as Grok-3 but with nearly as much reasoning compute.

Grok-4's enhanced reasoning abilities are evident in its performance on Humanity's Last Exam, where it scored 26.9%, compared with 21% for Google's Gemini 2.5 Pro. With access to tools, Grok-4 scored 41%.

Industry experts predict this level of advancement will significantly impact various sectors, including education, research, and technology development. The launch of Grok-4 signals a new era in AI capabilities focused on advanced reasoning and problem-solving, potentially accelerating innovation across multiple domains.