JavaScript by Chai Our Code

Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery

ResearchClawBench is a benchmark that measures whether AI coding agents can independently conduct scientific research — from reading raw data to producing publication-quality reports — and then ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery

Trending now