If SWE-bench is public, then Anthropic is, at a minimum, probably also looking at its SWE-bench scores when making changes. I'd put more trust in a tracker that runs a private benchmark not known to Anthropic.