Do you have any good resources on how to work like that? I made the move from "auto complete on steroids" to "agents write most of my code". But I can't imagine running agents unchecked (and in parallel!) for any significant amount of time.
Right now, I'm finding a decent rhythm in running 10-20 prompts and then kind of checking the results a few different ways. I'll ask the agent to review the code, I'll go through myself, I'll do some usability and gut checks.
This seems to be a good window where I can implement a pretty large feature, and then go through and address structural issues. Goofy things like the agent adding an extra database, weird fallback logic where it ends up building multiple systems in parallel, etc.
Currently, I find multiple agents in parallel on the same project to be not super functional. There's just a lot of weird stuff: agents get confused about worktrees, git conflicts abound, and I found the administrative overhead to be too heavy. I think plenty of people are working on streamlining the orchestration issue.
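For anyone who wants to try the parallel setup anyway, here is a minimal sketch of giving each agent its own worktree and branch, so they at least don't share a working directory. The sibling-directory layout and the `agent/<name>` branch naming are assumptions made up for illustration; `git worktree add -b` is the only real git feature used. It does nothing about merge conflicts when the branches come back together.

```python
import subprocess
from pathlib import Path


def worktree_command(repo, agent):
    """Build the `git worktree add` call that gives one agent its own checkout."""
    repo = Path(repo)
    # Hypothetical layout: a sibling directory "<repo>-<agent>"
    # on a new branch "agent/<agent>".
    path = repo.parent / f"{repo.name}-{agent}"
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{agent}",
        str(path),
    ]


def add_worktree(repo, agent):
    """Create the worktree; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_command(repo, agent), check=True)
```

Calling `add_worktree("path/to/project", "alpha")` would create `path/to/project-alpha` on branch `agent/alpha`, and you point the agent at that directory instead of the main checkout.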
In the meantime, I combat the ADD by working on a few projects in parallel. This seems to work pretty well for now.
It's still cat herding, but the thing is that refactors are now pretty quick. You just have to have awareness of them.
I was thinking it'd be cool to have an IDE that did coloring of, say, the last 10 git commits to a project so you could see what has changed. I think robust static analysis and code as data tools built into an IDE would be powerful as well.
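A rough sketch of the data such a coloring feature would need, assuming you lean on `git blame --line-porcelain` rather than anything IDE-specific: collect the hashes of the last N commits, then keep the line numbers whose blame entry points at one of them. The function names are made up for illustration, and the parser assumes 40-character SHA-1 hashes.

```python
import subprocess

HEX = set("0123456789abcdef")


def recent_commit_ids(repo, n=10):
    """Full hashes of the last n commits reachable from HEAD."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={n}", "--format=%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())


def blame_porcelain(repo, path):
    """Raw `git blame --line-porcelain` output for one file."""
    return subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout


def lines_from_commits(porcelain, commit_ids):
    """1-based line numbers whose last change came from one of commit_ids."""
    touched = []
    for row in porcelain.splitlines():
        if row.startswith("\t"):
            continue  # the file's own content, prefixed with a tab
        parts = row.split(" ")
        # Entry header: <sha> <original line> <final line> [<group size>]
        is_header = (
            len(parts) >= 3
            and len(parts[0]) == 40
            and set(parts[0]) <= HEX
            and parts[1].isdigit()
            and parts[2].isdigit()
        )
        if is_header and parts[0] in commit_ids:
            touched.append(int(parts[2]))
    return touched
```

An editor plugin could call `lines_from_commits(blame_porcelain(repo, path), recent_commit_ids(repo))` when a file is opened and tint the returned lines. Blame only reports the most recent commit per line, so deleted lines would not show up.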
The agents basically see your codebase fresh every time you prompt. And with code changes happening much more regularly, I think devs have to build tools with the same perspective.
It will come naturally! I started with autocomplete as well. I kept stumbling on different problems and fixed them by adopting best practices. My current stack is:
1/ Claude Code with yolo mode
2/ superpowers plugin
3/ red/green tdd
4/ a lot of planning and requirements before writing any code
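To make item 3 concrete, here is a tiny sketch of one red/green cycle: the test is written first and fails against a stub (red), then the smallest implementation makes it pass (green). `slugify` and the helper names are hypothetical, picked only for the example; in practice you would run the test with your normal test runner.

```python
def check_slugify(slugify):
    """The test, written before any implementation exists."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Many   spaces ") == "many-spaces"


def slugify_stub(title):
    """Red step: nothing implemented yet, so the test must fail."""
    raise NotImplementedError


def slugify(title):
    """Green step: the smallest implementation that passes the test."""
    return "-".join(title.lower().split())


def passes(impl):
    """Run the test against one implementation and report pass/fail."""
    try:
        check_slugify(impl)
        return True
    except (AssertionError, NotImplementedError):
        return False


if __name__ == "__main__":
    print("red:", passes(slugify_stub))
    print("green:", passes(slugify))
```

The point with agents is the ordering: you confirm the test fails first, so a later pass actually means something.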
It feels like you're always touching the edge of what the models and your current workflow can do. Delegate a more complex task, and the system fails. Delegate a simpler one, and it works great. Improve your workflow, and you move that complexity to a higher level.
But... I've been an LLM power user for more than a year and a half now. I can't delegate, precisely because I've reviewed a lot of LLM code, and it is never good enough for me to step down from reviewing everything manually. I can understand how you can vibe code a dashboard or tests, but vibe code your entire backend without checking it through carefully? Madness.
For me, you open a markdown editor and draft a code plan with the details of what you'd do as a coder at a high level. Then bust into whatever tool in planning mode (I usually fire this into the opus 4.5 model) and have it break the plan down into concise steps, then hand it off to a simpler model (gpt spark, sonnet, composer or whatever) to execute. When I feel frisky I'll just have opus one-shot it, and it can be done in a few minutes.
I use Claude “on the web” or Google Jules. Essentially everything happens in a sandbox - so yolo isn’t a huge risk. You can even box its network access. You review the PR at the end or steer it if it’s veering off course.