There is room for plan adaptation, but the agent has to justify and highlight it in the PR.
Defining the plan and acceptance criteria for long-running tasks is the hard part.
We recently added a Ralph loop mode in that spirit. Implementation won't start until the human and the agent align on verifiable criteria, and a different agent judges whether those criteria are met at the end of each run.
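To make the gating concrete, here is a rough Python sketch of the flow described above. It is only an illustration under my own assumptions, not the actual implementation: `Criterion`, `run_loop`, `toy_implement`, `toy_judge`, and the `max_runs` cap are all hypothetical names, and the "agents" are stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]  # hypothetical stand-in for the workspace state


@dataclass(frozen=True)
class Criterion:
    """A verifiable acceptance criterion (hypothetical structure)."""
    name: str
    check: Callable[[State], bool]


def run_loop(
    criteria: List[Criterion],
    approved_by_human: bool,
    implement: Callable[[State], State],
    judge: Callable[[List[Criterion], State], List[str]],
    max_runs: int = 5,
) -> Tuple[Optional[int], State]:
    """Return (run number that satisfied all criteria, final state).

    The run number is None if the criteria were never met within max_runs.
    """
    # Gate: no implementation work before human and agent agree on criteria.
    if not criteria or not approved_by_human:
        raise ValueError("criteria must be agreed on before implementation starts")

    state: State = {}
    for run in range(1, max_runs + 1):
        state = implement(state)        # implementing agent does one run
        unmet = judge(criteria, state)  # a separate judge reviews the result
        if not unmet:
            return run, state
    return None, state


def toy_implement(state: State) -> State:
    # Pretend each run gets one more test passing.
    return {"tests_passing": state.get("tests_passing", 0) + 1}


def toy_judge(criteria: List[Criterion], state: State) -> List[str]:
    # The judge only reports which criteria are still unmet.
    return [c.name for c in criteria if not c.check(state)]


criteria = [Criterion("three tests pass", lambda s: s.get("tests_passing", 0) >= 3)]
runs, final_state = run_loop(criteria, True, toy_implement, toy_judge)
print(runs, final_state)  # 3 {'tests_passing': 3}
```

The point of keeping `implement` and `judge` as separate callables is that the implementing agent never gets to grade its own work; the cap on runs is just one way to stop a loop that never converges.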
Overall, I think this problem is not yet completely solved, and improvements in both the UX and model judgement are needed.