Did xAI lie about Grok 3’s benchmarks?

MTHANNACH
Last updated: February 23, 2025 12:50 am
MTHANNACH Published February 23, 2025

Debates over AI benchmarks, and over how AI labs report them, are spilling into public view.

This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of xAI's co-founders, Igor Babushkin, insisted that the company was in the right.

The truth is somewhere between the two.

In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.

xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI's graph did not include o3-mini-high's AIME 2025 score at "cons@64".

What is cons@64, you might ask? It is short for "consensus@64": the model gets 64 tries at each problem in a benchmark, and the answer it generates most frequently is taken as its final answer for that problem. As you can imagine, cons@64 tends to boost a model's benchmark score quite a bit, and omitting it from a graph can make one model appear to surpass another when, in reality, it does not.
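To make the difference concrete, here is a minimal sketch of the two scoring modes described above. It is an illustration only, not the evaluation code used by xAI or OpenAI: the function names are hypothetical, the toy attempts and reference answers are invented, and it uses 5 attempts per problem instead of 64 to stay readable.

```python
from collections import Counter

def consensus_answer(samples):
    """Return the answer that appears most often among the sampled attempts.

    Ties go to whichever answer Counter.most_common returns first.
    """
    return Counter(samples).most_common(1)[0][0]

def score_at_1(attempts_per_problem, reference_answers):
    """Fraction of problems where the first attempt alone is correct."""
    correct = sum(
        attempts[0] == ref
        for attempts, ref in zip(attempts_per_problem, reference_answers)
    )
    return correct / len(reference_answers)

def score_cons_at_k(attempts_per_problem, reference_answers, k):
    """Fraction of problems where the majority answer over k attempts is correct."""
    correct = sum(
        consensus_answer(attempts[:k]) == ref
        for attempts, ref in zip(attempts_per_problem, reference_answers)
    )
    return correct / len(reference_answers)

# Invented toy data: 4 problems, 5 sampled answers each (hypothetical values).
toy_attempts = [
    [42, 42, 17, 42, 8],      # first attempt right, majority right
    [10, 25, 25, 25, 3],      # first attempt wrong, majority right
    [300, 113, 113, 5, 113],  # first attempt wrong, majority right
    [1, 2, 3, 4, 5],          # no attempt is right
]
toy_references = [42, 25, 113, 9]

print("@1     :", score_at_1(toy_attempts, toy_references))
print("cons@5 :", score_cons_at_k(toy_attempts, toy_references, k=5))
```

On this invented data the same set of attempts scores 0.25 at @1 and 0.75 at cons@5, which is why a chart that mixes the two settings across models is not a like-for-like comparison.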

The AIME 2025 scores of Grok 3 Reasoning Beta and Grok 3 mini Reasoning at "@1", meaning the first score the models got on the benchmark, fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails ever so slightly behind OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI".

Babushkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:

Hilarious how some people see my plot as an attack on OpenAI and others as an attack on Grok, while in reality it's DeepSeek propaganda
(I actually believe Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) pic.twitter.com/3wh8foufic

– Teortaxes ▶ ️ (Deepseek 推特🐋铁粉 2023 – ∞) (@teortaxestex) February 20, 2025

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations, and about their strengths.
