Surely they are testing their optimizations against common benchmarks internally...

		taylorfinley 2 days ago | on: Claude Opus 4.7

		Surely they are testing their optimizations against common benchmarks internally? I bet the "real world task" degradation is larger by some multiple than it appears when measured through a benchmark that is part of the target.
