Programming Language Benchmarks

How-To Geek on MSN

Long before ChatGPT, this programming language only worked if you said "please"

Politeness isn't even the weirdest thing about the language.

Autonomous AI Coding Clears 60,000-Line Ceiling: MirrorCode Benchmark Released

AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...

Slator

AI Translation’s Key Benchmark Takes Aim at Low-Resource Languages

The Eleventh Conference on Machine Translation (WMT26) has moved into its active evaluation phase, with test data releases and submission windows now opening across several of the conference’s shared ...

Morning Overview on MSN

Alibaba’s Qwen released three AI models built to drive robots

Alibaba’s Qwen team published three separate AI models designed to give robots the ability to see, manipulate objects, and move through physical spaces. The models, called Qwen-RobotWorld, ...

Crypto Briefing

Kimi AI releases open-source K2.7 Code model with 1 trillion parameters on APIs and Hugging Face

Moonshot AI just dropped Kimi-K2.7-Code, an open-source coding model that wants to make AI-assisted programming less wasteful and more capable. The Beijing-based company claims the model cuts ...

Geeky Gadgets

DeepSWE AI Coding Model Benchmark Finally Solves AI Training Data Contamination

DeepSWE, created by DataCurve, offers a benchmark for assessing AI coding models by focusing on real-world programming challenges rather than synthetic test cases. According to Matthew Berman, one of ...

VentureBeat

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude Opus, and ...


Is there systematic religious bias in AI models? What new research says

ChatGPT, Claude, Grok, Gemini and other AI models display systematic religious bias, according to scientific research from computer scientists at a new group of four major faith-based universities.

CIO

AI is ready to take over Python programming, but not much else

Tests of how well 19 large language models (LLMs) complete complicated multi-step tasks have shown that the models are both error-prone and, in many cases, unreliable. They said that the ...

SiliconANGLE

AWS brings OpenAI’s AI models and Codex programming assistant to its cloud

Amazon Web Services Inc. today made OpenAI Group PBC’s large language models available on its cloud platform. The algorithms are accessible through Amazon Bedrock alongside Codex, the ChatGPT ...
