Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Maybe include in a prompt a threat of legal punishment? Sure somebody has already tried that and tabulated how much it improves scores on different benchmarks)


I suspect the big AI companies try to adversarially train that out as it could be used to "jailbreak" their AI.

I wonder though, what would be considered a meaningful punishment/reward to an AI agent? More/less training compute? Web search rate limits? That assumes that what the AI "wants" is to increase its own intelligence.


Maybe legal threat for the company operating it? Would that help?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: