LLM Model Leaderboard

Ping An's Financial LLM Ranks First in CNFinBench Evaluation

The latest CNFinBench evaluation included a range of models representing the forefront of global artificial intelligence (AI) capabilities, including GPT-4o and Claude Sonnet 4, as well as mainland ...

Ars Technica

“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time

On Tuesday, Anthropic’s Claude 3 Opus large language model (LLM) surpassed OpenAI’s GPT-4 (which powers ChatGPT) for the first time on Chatbot Arena, a popular crowdsourced leaderboard used by AI ...

Virtualization Review

Meta's Llama 3 Cracks Top 5 of AI Leaderboard, Only Non-Proprietary Model

Meta's brand-new Llama 3 large language model (LLM) debuted among the top 5 on an AI leaderboard, being the only non-proprietary model. Meta yesterday (April 18) announced the new open-source model ...

Business Wire

H2O-Danube2-1.8B Achieves Top Ranking on Hugging Face Open LLM Leaderboard for 2 Billion (2B) Parameters Range

MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--H2O.ai, the open source leader in Generative AI and machine learning, is proud to announce that its latest open-weights (Apache v2.0) small language model, ...

Virtualization Review

AI's Heavy Hitters: Best Models for Every Task

In today's crowded AI landscape, organizations looking to leverage AI models are faced with an overwhelming number of options. But how to choose? An obvious starting point is all the various AI ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range ...

Ars Technica

New Grok 3 release tops LLM leaderboards despite Musk-approved “based” opinions

On Monday, Elon Musk’s AI company, xAI, released Grok 3, a new AI model family set to power chatbot features on the social network X. This latest release adds image analysis and simulated reasoning ...

Which company has the #2 AI model end of March? (Style Control On)

Live pricing, charts, and volume for the "Which company has the #2 AI model end of March? (Style Control On)" prediction market from Yahoo Finance.

TechSpot

GPT-4 loses its position as "best" LLM to Claude-3 in LMSYS benchmark

In context: It seems as if everyone who is anyone has thrown their hats and their money into developing large language models. This AI explosion prompted a need to benchmark them for comparison. So, ...
