
There is room for plan adaptation but the agent has to justify and highlight it in the PR.

Defining the plan and acceptance criteria for a long-running task is the hard part.

We recently added a Ralph loop mode in that spirit. The implementation won't start until the human and agent align on verifiable criteria, and a different agent judges whether the criteria are met at the end of each run.
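
To make that concrete, here is a rough sketch of how such a loop could be wired up. It is not our actual implementation: `Criterion`, `judge`, `ralph_loop`, and the `human_approved` flag are all hypothetical names for illustration, and the "judge" is a plain function standing in for a separate agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    """One verifiable acceptance criterion (hypothetical structure)."""
    description: str
    is_met: Callable[[str], bool]  # check applied to a run's output


def judge(output: str, criteria: List[Criterion]) -> List[str]:
    """Stand-in for the separate judging agent: list the unmet criteria."""
    return [c.description for c in criteria if not c.is_met(output)]


def ralph_loop(
    criteria: List[Criterion],
    human_approved: bool,
    implement: Callable[[List[str]], str],
    max_runs: int = 5,
) -> Tuple[Optional[int], str]:
    """Run the implementer until the judge reports no unmet criteria.

    Returns (run_number, output) on success, or (None, last_output)
    if max_runs is exhausted.
    """
    # Gate: no implementation before criteria exist and the human signed off.
    if not criteria or not human_approved:
        raise ValueError("criteria must be agreed before implementation starts")

    feedback: List[str] = []
    output = ""
    for run in range(1, max_runs + 1):
        output = implement(feedback)       # implementer sees the last verdict
        feedback = judge(output, criteria)  # judged at the end of each run
        if not feedback:
            return run, output
    return None, output
```

The point of the split is that the implementer never grades its own work: it only receives the list of unmet criteria from the previous run.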

Overall, I think this problem is not yet completely solved, and improvements to both the UX and model judgment are needed.


