Mass General Brigham's BRIDGE benchmark found top AI models scored 92 on medical exams but just 44.8% on real-world clinical tasks.