Benchmark Model - Search News

PewDiePie reveals months-long AI experiment that he says tops ChatGPT

PewDiePie has revealed that he trained his own AI model and claims it outperformed ChatGPT on a coding benchmark.

OpenAI Says Benchmark Used to Measure AI Coding Skill Is 'Contaminated'—Here's Why

OpenAI wants to retire the leading AI coding benchmark—and the reasons reveal a deeper problem with how the whole industry measures itself.

Google’s Latest Gemini 3.1 Pro Model Is a Benchmark Beast

Google just released its most capable Gemini 3.1 Pro AI model that beats all frontier models on Humanity's Last Exam and ...

Google debuts Gemini 3.1 Pro: New frontier model sets benchmark records

Google has unveiled Gemini 3.1 Pro, an upgraded AI model that outperforms its predecessor and competitors on major logic and ...

9d on MSN

Google releases Gemini 3.1 Pro: Benchmark performance, how to try it

Google says that its most advanced thinking model yet outperforms Claude and ChatGPT on Humanity's Last Exam and other key ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

8d on MSN

Google’s new Gemini Pro model has record benchmark scores—again

Gemini 3.1 Pro promises a Google LLM capable of handling more complex forms of work.

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Microsoft has unveiled a groundbreaking artificial intelligence model, ...
