There are other small single-digit differences, but I doubt that the benchmark is that unreliable...?
MCP-Atlas: The Opus 4.6 score has been updated to reflect revised grading methodology from Scale AI.