Maybe include in a prompt a threat of legal punishment? Sure somebody has alread...

bick_nyers · on Dec 27, 2024

I suspect the big AI companies try to adversarially train that out as it could be used to "jailbreak" their AI.

I wonder though, what would be considered a meaningful punishment/reward to an AI agent? More/less training compute? Web search rate limits? That assumes that what the AI "wants" is to increase its own intelligence.

timeon · on Dec 27, 2024

Maybe legal threat for the company operating it? Would that help?