When 4 of 40 Models Beat Coin Flip: Measuring Claims About Anthropic Opus and Claude Upgrades
Only 4 of 40 Models Beat Coin Flip on Hard Questions About Anthropic Opus Improvements

The data suggests a surprising gap between vendor claims and real-world discriminative power on narrowly targeted technical questions.