There is room for plan adaptation but the agent has to justify and highlight it ...

There is room for plan adaptation but the agent has to justify and highlight it in the PR.

Defining the plan/acceptance criteria for long running task is the hard part.

We recently added a Ralph loop mode in that spirit. The implementation won't start until the human and agent align on verifiable criteria and a different agent judges if criteria are met at the end of each run.

Overall I think this problem is not yet completely solved and improvement on both the UX and model judgement are needed