Hacker News

> It’s my experience that models that perform very well on benchmarks do not necessarily perform well in real life

Well, yeah... Like Opus 4.5, 4.6, 4.7. Top of the benchmarks and yet it's a pile of crap at the moment and has been for months.


