While the speed of prototyping and even shipping to production has increased, I have been asking myself: at what cost? I see a lot of garbage being shipped. Not because the code quality is bad, but because execution has become cheap. Ideas, even crappy ones, are getting prototyped. Things which look effective on the surface, but has real UX problems in the underneath, are getting prioritised because someone in the room can talk better and enrol a leader to align with the idea. Good old user research or talking to users to validate ideas, iron out issues in the user flows has become too slow for the new process!!
This, I believe, is lowkey one of the core ways design broke at tech companies. There are other big ones (design, really product, is broken deeply), but once mockups became easy we stopped having discussions about information architecture and UX. Instead we talk about whether this looks nicer in blue or green. It happened before Figma, but Figma really accelerated it. The designers who tried to hold onto wireframes (or moved to mockups but still tried to have architecture discussions with them) fought an uphill battle: what were these guys even talking about? Even the other designers thought this.
Once mockups became easy, that little bit of vocational gate (as in gatekeeping) that was holding the wall for UX work went away. Execs and PMs could make decent-looking rectangles, so designers became the people who could make especially nice-looking rectangles. So you got a lot of product/UX designers who were much more visual designers. That matters, but the bigger part was that product processes and sprints often started to have little design in them at all. What was a two- or three-phase process became one: the designer got requirements and made a design. Often they didn't even really get requirements before designing. The design was the impetus for the requirements, not the other way around.
This is what became standard: the leader would give something vague because they didn't have much of an idea or vision yet. They probably had something blue-sky-ish, meaning a bunch of ideas that come together in amorphous, abstract blue skies. Once those things appear side by side on paper or screen, they're off-putting and contradictory. There are problems not just with how the pieces fit, but with the pieces themselves. The visual the designer provides triggers this. Designers, being visual people, can see a lot of that in their heads beforehand, but won't be heard until they show "bad work." It's pretty common, though, to see the PM or the leader look at it and say the problem wasn't the vague requirements, it's that the designer didn't get it. Anyway, it's that design that then kicks off some assessment against reality. Only then do you have a bit of a shot at real requirements starting to leak out.
It’s a bit like having a 1:1 scale architectural model made of cardboard. You walk people through it and everything looks good on the surface, so they start asking “when can we move in?”, skipping all of the hidden engineering that still needs to be done and in some cases pressuring you to just waterproof the exterior so they can live in the model instead of building the real thing.
I've noticed and felt this trend, but I've never seen it put so well or really connected the dots between Figma and pretty rectangles.
I remember discussions over the relative ease of use of gray-box wireframes... and that led to better products.
Now I've got designers vibing monstrosities that would have fit right in during the Flash era, I guess in order to draw even nicer rectangles now that execs can wave at AI and get a design.
Yes, though this was a problem long before Figma or AI. Photoshop enabled pretty pictures of what a website / app / service might look like, and then these often became set in stone.
One solution is to keep all mockups / prototypes strictly greyscale using bare-bones vectors until every stakeholder has weighed in / signed off.
> Good old user research or talking to users to validate ideas, iron out issues in the user flows has become too slow for the new process
I haven't seen these in at least a decade in the industry!! Everywhere I used to work was always "PM wanted" or similar and the validation was always just QA making sure the thing works/does the bare minimum!!! Customer input was just for bugs.
I hope that with AI speeding up prototyping we can actually go the other way in the long term, back to ACTUALLY talking to a customer and then quickly prototyping something to see if it is what they wanted. Figuring out what the customer wants remains the hardest part of software engineering, but at least right now it's mainly because we just don't talk to the customer.
Prototypes aren't only for UX, though. Sometimes they're for exploring whether something is technically possible, or what the unknown unknowns in a particular area are.
For example, for personal projects, I've been wondering if it's possible to automatically create RSS feeds for pages that don't have them (yes), what are the challenges when building an archive-style page dumping system (need to dump CSSOM alongside getOuterHTML, remove/rewrite remote content, walk iframes, automate Chrome, scroll to load lazily loaded content, etc.), and if training a model to remove native ads from markdown coming from readability is possible (no, at least not with my current approach, but using the dom might work).
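The RSS half of the first question is the easy part once you have a list of entries. A minimal sketch, assuming the (title, URL) pairs have already been extracted from the page by some other means (that extraction, the genuinely hard part, is not shown); `build_rss` is a hypothetical helper using only Python's standard library:

```python
# Turn already-extracted (title, url) pairs into an RSS 2.0 document.
# Assumption: `items` stands in for whatever scraper found the article links.
import xml.etree.ElementTree as ET


def build_rss(channel_title: str, channel_link: str, items: list[tuple[str, str]]) -> str:
    """Return an RSS 2.0 feed as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Generated feed for {channel_link}"
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        # Using the URL as the guid lets feed readers de-duplicate across runs.
        ET.SubElement(item, "guid").text = url
    # ElementTree escapes &, < and > in text nodes.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(
        "Example blog",
        "https://example.com/blog",
        [("First post", "https://example.com/blog/1"),
         ("Second & last", "https://example.com/blog/2")],
    ))
```

Re-running this on a schedule and serving the output as a static file is enough for most feed readers; deciding which links on the page count as "entries" is where the real work goes.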
A few reasons. Learning is one of them, since I don't normally deal much with browser and web related technologies, so it's a good way to learn more about them.
I also think there are a few interesting things you can explore that go beyond a simple carbon copy of what's on the Internet. Ideas that I've implemented are things like automatic extraction of audio tracks, transcription, and summarization; loading a page or podcast transcript into the context window of an LLM to discuss the arguments or factuality of the claims being made; automatically turning articles into reader view using readability/trafilatura; etc.
Directions I'd like to explore would be things like multimodal search ("that page I read six months ago about computer security with neon green text on a black background", or give me a list of fitness related pages I've read in the last twelve months), personal statistics (how is the mix of topics I've been reading about changing over time), annotating pages instead of just passively reading them, maybe even P2P archiving or discussions about pages, and all kinds of other things.
One of the second order effects of AI collapsing the cost of building things is that product management is much more important now. A Product Owner/Manager who lacks the taste and insight (or data) to know what they should put in front of users and what they should just put in the bin will cause a company real harm, especially if the company moves to a "there's zero effort in building something, so we'll try everything!" model.
The only part that's really collapsed in effort is the translation from requirements into code. If you're using AI to generate requirements you're effectively building things based on what a 'random' requirements generator says. If that's as good as the requirements a Product Owner was writing then that person needs to improve.
Can you help me understand what the "cost" of other people producing garbage is? Prototypes are generally shop jigs. You'd feel weird gold-plating a stop block.
> the "cost" of other people producing garbage is?
Sure can! It's a well-known phenomenon, won the researchers a Nobel, and explains a lot of the American economy and "lack of taste": The Market for Lemons[0].
Lemon Markets really require one important thing: at time of purchase, the average consumer is unable to differentiate the quality of the product.
Consumers are "rational"[1], so, "all other things being equal"[2], they will make their purchases based on price. Therefore, the product that is cheaper but also _in reality_ lower quality wins. This then pushes out any competitor trying to differentiate their product through quality. Thus all products in that category decrease in quality and it becomes a race to the bottom, maximizing profits.
I want to stress that this doesn't require the quality of products to be indistinguishable to experts, only to the average consumer. You can probably look around at tech and notice this pretty quickly. The average consumer is not really tech literate[3]. They can't tell the difference. Hell, my parents don't even know the difference between the internet speeds from their ISP, even with the numbers displayed. The numbers mean nothing to them. Do they want 1 Gbps? 100 Mbps? They don't know!
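The unraveling can be made concrete with a toy model. This is a sketch of Akerlof's adverse-selection mechanism rather than the exact price-competition story above, and every number in it is an assumption: each seller knows their item's quality `q` and will sell at any price of at least `q`, while buyers, who can't tell items apart, offer the average quality of whatever is still for sale. `unravel` is a hypothetical helper name.

```python
# Toy "market for lemons": quality is visible to sellers only, so buyers pay
# the average quality on offer, and above-average sellers keep withdrawing.
from statistics import mean


def unravel(qualities: list[float], max_rounds: int = 100) -> tuple[list[float], list[float]]:
    """Return (qualities still for sale at equilibrium, price offered each round)."""
    offered = sorted(qualities)
    prices = []
    for _ in range(max_rounds):
        price = mean(offered)                            # buyers pay expected quality
        prices.append(price)
        remaining = [q for q in offered if q <= price]   # better sellers withdraw
        if remaining == offered:                         # nobody else withdraws: equilibrium
            break
        offered = remaining
    return offered, prices


if __name__ == "__main__":
    left, prices = unravel([float(q) for q in range(1, 11)])
    print("price per round:", prices)
    print("still for sale:", left)
```

With qualities 1 through 10, the offered price falls 5.5, 3.0, 2.0, 1.5, 1.0 round by round, and only the lowest-quality item is left on the market, even though every buyer was "rational" given what they could see.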
> Prototypes are generally shop jigs
The problem is people are shipping prototypes. We may disagree what is a prototype and what is a shippable product, but that disagreement in itself is worth noting as part of the problem. I mean FFS we in the tech industry love selling things with the promise of future improvements. The last few iPhones shipped with the promise that they were going to get better with AI (did Apple Intelligence really pan out? Did it pan out anywhere near what they promised? My Google Pixel phone still can't schedule a haircut for me or book a reservation at a restaurant, despite multiple promises).
[1] Economics uses this term differently from how we use it colloquially. Read "consumers make decisions based on the information available to them", not "consumers are geniuses making perfect decisions".
[2] i.e. the only distinguishing purchase criteria is price
[3] If you think I'm wrong, please go spend a month outside Silicon Valley. Hell, go try a different country, and not in the major metro areas. We're nerds here. Every single person on HN is above average in this respect.
The premise here is that people are selling these prototypes, and they are being bought. I mean, fine, that's bad, but when we discuss "prototypes", I assume uninformed cash transactions are off the table.
I gave the example of Apple and Google for a good reason. Because these big companies are selling products that don't even exist yet. You don't consider that selling prototypes? Fair, they're selling stuff that isn't even a prototype. I'm not sure that's any better.
Or maybe you're making a very different point, which I have entirely missed.
> I gave the example of Apple and Google for a good reason. Because these big companies are selling products that don't even exist yet.
I guess I'm curious what you mean by this. I don't particularly see either of those companies doing this, certainly not in the way this article describes, and not really in any way that's impacted substantially by AI.
What "product that doesn't exist" is Apple selling? Google? Who is paying for it?
> What "product that doesn't exist" is Apple selling? Google? Who is paying for it?
>>> My Google Pixel phone still can't schedule a haircut for me or book a reservation at a restaurant, despite multiple promises
This example has been in how many Android announcements?
Before, if you had a crap idea, you at least had to face the social back-pressure of explaining it to someone at a local hackers meetup and trying to convince them to build it for you.
I'm having a hard time seeing your point. Faster iteration = easier to fix UX issues. That's all the LLM is providing here. Problems with UX = bad decisions. Those happen with or without LLMs.
I'm in an industry where I can really see this, executed by honestly talented people able to interpret what the LLMs produce. It's bikeshedding hell. If you pursue every possible idea and get to implement all of them and it actually works, in the best possible scenario with no technical debt because you're able to stay on top of it (presumably in the window you have before you just burn out), you end up with all the ideas at once.
The project has tracked your imaginative state, and perhaps the states of your beta testers as they imagine things. It's a power armor suit tailored to specifically you. Nobody else will ever fit it because it's evolving too fast, all to implement your every whim.
I've seen this take 1.0 projects that are intentionally wildly scope-limited and great at that, and balloon until the project is the Everything Machine, doing everything but send email. I guess in the new era, every project expands until it becomes alive and devotes itself to your service… or at least, does its level best to be that for you and your beta team.
These things are not approachable. They're fever dreams, unparsable by outsiders. Discipline is lacking.
The same thing happens because of tools like the Unity/Unreal engines. Lots of low-quality, barely-more-than-a-demo "games" get uploaded to Steam. However, those games rightly fail to make any decent money, so it's probably not a problem long term.
> Things which look effective on the surface, but has real UX problems in the underneath, are getting prioritised because someone in the room can talk better and enrol a leader to align with the idea
This has always existed. The ability to rapidly prototype has not changed it in any way.
An extremely experienced UX researcher, who has been doing field research and user research for three decades, once told me that every time it's a Fortune 500 company, after presenting mountains of research, it comes down to what color the CEO liked in the moment.
I don't understand the proclivity to latch onto whatever the new thing is and blame it for shitty decision-making that has existed as long as humans have existed.
Similar experience here, however my feeling is that this isn't necessarily a bad thing. Garbage being made is indicative of a gap in the currently-available tools. User research should shift towards analyzing these prototypes and enhancing existing tools to fill this need.
I think there are costs beyond having to sacrifice writing code yourself.
When prototyping yourself you learn a lot about the problem and see which design decisions lead to which tradeoffs.
While you write code your brain is always running in the background, giving you thoughts about how things could break, where the structure could be simplified, where the code could be extended.
I feel this is lost or at least reduced a lot when an LLM writes code because you have a lot less contact with the software.
The approach I've been trying to use at work is making heavy use of AI for the generation of experimental prototypes, but with little intention of keeping all that code verbatim. I can get something to the demo stage much more quickly than before, allowing me to show it to coworkers and users to get feedback on random possibilities. From there, I make a case to properly allocate time for "doing it the right way" as a slotted piece of work. That's resulted in a number of high-impact features being added to our app in the last few months that in the past would have taken much longer or never come to fruition.
I'm using this specifically in the context of concepts/features that are hard to explain/sell without some working visual or prototype, but which aren't immediately evident needs or features requested by our users. Some of them go nowhere, but I think the net result is an increased ability for me to get my experiments from the "lab" to production.
Productivity has increased only for people who know what they are doing. I have been able to increase my productivity, building and turning things around faster and in a much more polished manner. One problem is that too many things go on in your head, and I find tools like Jira or Notion very handy for capturing all the edge cases and integrations. Taking breaks from AI is very, very essential for this to work for me.
I think it is mostly accurate to say that the cost of execution has dropped to zero.
In this new world, the ability to say "no" is more important than ever. It has never been so easy to burn time, money and energy. The fact that you can try anything now can be modeled as a disadvantage. The space of possible solutions got a lot bigger. Unless you have good taste you could wander and get lost very quickly in this vast new expanse. A true expert that has more paths to work with can arrive at higher quality solutions faster. A novice will get into trouble faster.
It often takes 10k hours suffering through an idea with a live customer before we deeply learn why something is a bad/good approach. None of this painful wisdom is available in the models. You can easily change the mind of ChatGPT with a single adjective. You cannot so easily persuade the person who has successfully cast the ring into the volcano already. They know what it actually feels like to get there.
For the past few months, I’ve tried this workflow many times:
1. Ask a coding agent to think through and implement a non-trivial feature
2. This lets me really understand the pros and cons of many possible solutions and see them play out end to end
3. Revert all changes and implement it myself once I’ve settled on a solution I’m satisfied with
4. At this point the agent is just an iterative reviewer
I’ve felt that any non-trivial amount of code I didn’t write myself tends to be hard to own. And, like the author said, I also need to keep my skills sharp.
I'm truly hopeful that AI will open a new era of prototyping. Back in the day, prototyping was how you figured out what to build; you'd very deliberately toss the entire first (or second!) version, and you'd plan to do that.
Might be the opposite in some orgs. Higher-ups I'm working with get visibly annoyed when you start talking about prototypes or trying something out in isolation; they don’t see why you wouldn’t just work with the real codebase and end the project with a PR.
Also seeing a lot of the managerial class bypassing the PR system entirely and just committing to main “because it’s faster”.
Most places I've worked, devs were basically afraid to prototype
Either you would get chastised for wasting time with prototypes, or worse, your prototype would end up in production
I think the software industry really needs a cultural reset to embrace slower and deliberate development to build quality, but unfortunately AI has us racing recklessly in the wrong direction
I am so tired of it. Are there any companies out there that actually give devs time to build quality software anymore? I'm so burned out of the "move fast and break everything" grind
Quality must come from engineering. If you’re waiting for a product manager to tell you that you can improve the quality of the code, you’ve already lost.
So it requires soft skills, proper framing, and the ability to iterate quickly on quality-related tasks without leaving junk and multiple versions behind.
But I completely understand the pushback on “doing improvements developers want to do”: a lot of developers confuse quality with familiarity or even complexity/verbosity. So business people have a reason to be reluctant.
And as an engineering manager I also had to push back several times. The thing that makes money is not the place to learn new skills, for example.
Prototyping is a lot easier indeed. I've experienced this as well. And many of the prototypes are kind of shippable after only a bit of iteration. Mostly it's super easy to go from prototype to something shippable. I hate the term vibe coding, actually. Is it still vibe coding when the thing has end-to-end tests and I've been trying to break it for a few days?
The flip side is, nobody cares. I've put some of these things up on GitHub and ... nothing. It seems even my pre-AI projects have dropped sharply in eyeballs, judging by issues, PRs, stars, etc. People are too busy doing their own things to bother looking at other people's stuff. And rightfully so. There's nothing magical about my prompting compared to what people can prompt themselves. The value of these prototypes just dropped. Except, of course, for people still doing things the old-fashioned way.
So, you can ship your prototype. But there's very little point to doing so. Even if it isn't slop, it's just very hard to stand out from the masses of other people's prototypes. The value of custom applications just dropped by an order of magnitude. Everybody is going to expect things to be tailored to them now.
What are people doing with prototypes afterward? Do you end up shipping it as is to production? What about at work? Are the prototypes useful in that context?
I've started making most of my prototypes single HTML documents with inline CSS and JavaScript, because a single file is a lot easier to store somewhere and share and will probably keep on working forever (browsers are really good at backwards compatibility).
I chuck some of them on my public tools.simonwillison.net collection, others in their own GitHub repos with GitHub Pages enabled, and some I just share in a Gist (served via gisthost.github.io) or stick in a public S3 bucket so I have a URL for them.
If a prototype is against an existing project sometimes I'll leave it to silently rot in a branch on GitHub.
The single file is interesting; we’ve been observing something similar too. Do you have a specific prompt that you load in by default to make this work? Are these React files, or just pure HTML/JS/CSS? In other words, do you compile it with esbuild or webpack or something, or are you asking the model to generate something that works out of the box?
We’ve been seeing Claude artifacts sometimes come out as JSX or TSX
I have been doing the same thing, creating small one file "apps". The problem that I have currently is that I often want my Agent to be able to present me with something like a report on a code change, have me mark it up (comments, choices), and then present those interactions back to the model.
I'm experimenting with different ways to standardize some aspect of this process in a lightweight way so an Agent and I can "communicate with each other over rendered html".
A simple script or CLI that the agent can run to serve an HTML app, act as a sink for interactions (the page can just submit a button+form to the runner's port), and then close the page when done would work.
A little farther out in this direction would be something like a persistent client+server via web or Electron. It's always on and you iterate in a loop, streaming diffs/file edits back and forth to each other.
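A minimal sketch of such a runner, assuming Python's standard library only and a page whose form posts ordinary URL-encoded fields back to the runner (any path is accepted); `collect_one_submission` is a hypothetical name, not an existing tool. It serves the HTML, blocks until one form submission arrives (or an optional timeout passes), shuts the server down, and returns the fields. Actually closing the browser tab is left out, since browsers generally only let scripts close windows that scripts opened.

```python
# Serve one HTML page, wait for a single form POST, shut down, return the fields.
# Assumptions: stdlib only; the page's <form method="post"> sends URL-encoded data.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Optional
from urllib.parse import parse_qs


def collect_one_submission(html: str, port: int = 0,
                           on_ready: Optional[Callable[[int], None]] = None,
                           timeout: Optional[float] = None) -> dict[str, list[str]]:
    result: dict[str, list[str]] = {}
    done = threading.Event()

    class Handler(BaseHTTPRequestHandler):
        def _send(self, body: bytes) -> None:
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):  # any path serves the page
            self._send(html.encode("utf-8"))

        def do_POST(self):  # the submission: record the fields, then signal done
            length = int(self.headers.get("Content-Length", 0))
            result.update(parse_qs(self.rfile.read(length).decode("utf-8")))
            self._send(b"<p>Thanks, you can close this tab now.</p>")
            done.set()

        def log_message(self, format, *args):  # keep the agent's output clean
            pass

    server = HTTPServer(("127.0.0.1", port), Handler)
    if on_ready is not None:
        on_ready(server.server_address[1])  # the real port, even when port=0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    done.wait(timeout)  # returns early if the timeout elapses
    server.shutdown()
    server.server_close()
    return result  # empty if nothing was submitted before the timeout
```

An agent could call it with `on_ready=lambda port: webbrowser.open(f"http://127.0.0.1:{port}/")` and feed the returned dict back into its context. Binding to port 0 lets the OS pick a free port, so overlapping runs don't collide.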
A little farther out and you can load extensions that contain templates to generate the html, custom server code to serve htmx interactivity, and agent functionality.
1. "highlight a web page à la Obsidian Web Clipper and then intake that information into a personal wiki of concepts" (my third prototype in a row of this concept)
2. "visualize the code review process and organize discussion in a non-linear branching conversation"
And I realized both of them are basically chat pane on the left, agent with custom tools, and html "app" pane on the right to support interactivity.
The project as of this comment doesn't have any functionality yet; it's basically just the panes and a simple agent messaging channel, but if you (or anyone) are interested in the idea, comment here and I will reach out when it's a bit farther along and actually useful for building things.
Likewise please share your experiences with this concept, I would love to learn what others are doing with this type of workflow!
Never use React in artifacts - always plain HTML and vanilla JavaScript and CSS with minimal dependencies.
CSS should be indented with two spaces and should start like this:
```
<style>
* {
  box-sizing: border-box;
}
```
Inputs and textareas should be font size 16px. Font should always prefer Helvetica.
JavaScript should be two space indents and start like this:
```
<script type="module">
// code in here should not be indented at the first level
```
Prefer Sentence case for headings.
I've been using those for a couple of years, there's a good chance they're not necessary against more recent models. I've found that just saying "use Vanilla JavaScript" is enough to skip React / other build steps.
I avoid any build steps because those make it harder to copy and paste code in and out of LLMs.
You know how when you finish prompting some code generator to build something, and you look over what it has built and feel a sense of emptiness even if it does what you want? I think about what I wish the prototype looked like, and basically start describing details that I expect to exist (think longer versions of e.g. “this should be using our internal graph library, and I figure we can model this task as a traversal, how far have you strayed from this and why?”) and let the agent analyze what it built against my expectations. I’ve spent hours in conversation just “refining the context” this way, and then I channel that into an update process. I figure the prototype is just about proving out behavior, and this next phase is about refining it into the pieces I’ll use elsewhere. It’s kinda fun, I’d absolutely burn out a coworker if I grilled their PRs the way I roast AI contributions :P
I use AI mostly to prototype features in my existing projects. If I have an idea, I use the AI to implement it and try out different ways in which it could be implemented. Then I throw away the code and mostly write the code manually, with AI used primarily for review or docs.
In my day job, we commonly create prototypes to sell the idea/concept to the higher ups, then if we get the green light, we throw the prototype in the bin and start from scratch to build it out properly.
I find this is where AI is genuinely useful: it lets us prototype an idea a lot faster. We make no bones about the fact that it is a buggy proof of concept, but it lets people see the potential and get an idea of what the final product might look like.
I used Claude extensively during an internal hackweek at my company to prototype a new data analysis application. Probably would have never attempted the project without AI. Now it’s in production with more than 20k weekly users. Almost never use Claude to dev on it now, but it definitely helped me get off the ground.
> What are people doing with prototypes afterward?
I think what people AREN'T doing is interesting.
Usability isn't even in your list, it is not something most people even think of.
The best thing you can do with a prototype is give it to (potential) users and observe what they do with it. Just because you think that you're clearly communicating the intent of your system does not mean you are.
But is it really any faster than using an already existing code generator/scaffolding tool? How do you know your project isn’t just a regurgitation of another repository? Would it be just as fast to clone some existing project and hack on it?
These are the questions everyone seems to be ignoring, saying “only LLMs can make projects quickly” while ignoring everything those LLMs are built on (your LLM is probably calling a code gen tool).
For the at work side, I personally haven’t experienced any disadvantages or missed any project deadlines because I didn’t use an LLM, so what does velocity get me? Thumb twiddling time?
I was thinking the other day how much better Drupal is. Want an online store? A few commands and bam, online store. Want a newspaper? A few commands and bam, newspaper with publishing workflows, user management, and caching.
Using coding agents isn't much different. There are several things the models are trained to do very well and a few commands will get something. If the developer wants to move the project beyond that, it requires domain knowledge and a lot of hacking.
I wonder if the coding agents will move towards the Drupal model, where they create interchangeable components with common interfaces. Like Drupal, the coding agents never provide anything truly innovative that hasn't been done before.
Drupal, WP, etc. all have plugins to switch stuff on in minutes; however, customising them to do what your client wants takes a lot of time. WP shops we work with for clients (we sometimes need to integrate with them) take weeks to get a plugin to do what they want by adding tags and config options.
It might centralize around a specific framework but I think part of the problem is that people want to generate their own framework or at least not care about what the framework is/does/can do. They treat the LLM as the framework which can be non-deterministic and structureless.
> But is it really any faster than using an already existing code generator/scaffolding tool?
Yes, very much so. Our team was fast with those tools and created many of our own before LLMs (though we used other kinds of AI to go faster), but it still took weeks to months from idea to launch; the same complexity now takes days, including everything. We already had rigorous processes, and those really help now that we're moving at speed. No way anyone can beat this except with better AI.
But “are you really moving at speed after you generate the majority of your application?” is my other point. If you were to start working somewhere with an existing product the changes you would apply are more than likely incremental. What is the advantage of using LLMs to change 1-10 lines of code on average? How do you measure the ROI for that?
What did the time savings gain you? A quicker release date? How can you prove that? “This would have taken weeks” is the old problem of project time estimation. How can I take any engineer seriously that they think they know it saved weeks?
> How can I take any engineer seriously that they think they know it saved weeks
Because we have experience doing things pre-AI.
For example, most projects had no tests before AI because tests are very time intensive and take a lot of forethought since you also need to engineer the code to be testable. Yet now tests are trivial. Every project I delegate to AI has tests. Good tests too. How do I know? Because I have 20 years of experience and I looked.
Or, fixing a bug in my codebase is as simple as copy and pasting the user's bug report email into Claude Code. The LLM verifies that the bug exists, writes a red test to ensure the bug exists, then proposes the best fix that will turn the test green. Meanwhile it did all this while I was doing something else.
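The red/green loop can be shown in miniature. The bug report and both functions below are hypothetical (this isn't the commenter's code or what any particular agent produces); the bug itself is a real Python pitfall, though: `items[-n:]` returns the whole list when `n` is 0.

```python
# Hypothetical bug report: "Setting 'show the last 0 entries' shows every entry."
# Red: reproduce the report as a failing test. Green: fix until that test passes.


def last_n_buggy(items: list, n: int) -> list:
    return items[-n:]  # when n == 0 this is items[0:], i.e. the whole list


def last_n_fixed(items: list, n: int) -> list:
    # Slice from an explicit, non-negative start index instead of -n.
    return items[max(len(items) - n, 0):]


def regression_test(last_n) -> bool:
    """The test written from the bug report; True means it passes (green)."""
    return last_n([1, 2, 3], 0) == [] and last_n([1, 2, 3], 2) == [2, 3]


if __name__ == "__main__":
    print("before fix:", "green" if regression_test(last_n_buggy) else "red")
    print("after fix:", "green" if regression_test(last_n_fixed) else "red")
```

The point of writing the test first is that it proves the report is reproducible before any fix is attempted, and then stays in the suite so the same bug can't quietly return.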
Or take the projects I've built that would have taken me a lot of time because they're in domains I don't have much experience in. I built a macOS app around libghostty, which involved bridging Swift to Zig, something I'd never done before. And when Ghostty has new versions, I ask AI to look at the diff and find new features and APIs I can take advantage of in my own project. I didn't write any code in this project myself. It would have taken me a lot of time because everything takes a lot of time. The agent makes progress on the project while I'm doing something else, yet I use the project as my main terminal every day; it's so good.
It's very wishful thinking to assume that nobody knows whether AI is helping them or not, probably coming from an understandable place where you hope all of this is a fever dream and we'll go back to the old way any day now. But you owe it to yourself to believe others here so that you can take it seriously.
Usually I know exactly what I want beforehand. What structs. What protocols. How I want the event bus layered and what threads need to exist. And what make targets I want. So generally the generated code is strictly bound to my design pattern. Then it's all a matter of running it. To put it bluntly, I'm running benchmarks and testing it while you're still deciding what to name your files.
And? What advantage does that have for you over me when it comes to a personal project? What does that velocity get me? More time to foolishly rewrite/regenerate the already built software from scratch? I don’t spend a lot of time naming my files personally.
So you do no validation of the code that’s generated? Just asking because you didn’t state that as a step in your process. If you go straight from prototyping to running, you’re missing a big step that will most likely cost you later.
Why does it matter how much time I spent writing code for a project I’m most likely either not sharing, or, if I am sharing it, can be obtained for free? Which market am I rushing to? Bluntness doesn't seem to be an advantage other than for bragging.
Your tone makes me think you already decided that agents aren't worth your time, but I'll give it a try anyways.
I work as a DevOps engineer and have been using agents exclusively to code since the beginning of the year. Agents are really nice to quickly craft utilities to speed up planning. For instance, I had one create a small CLI for me that pulls my cards from Azure DevOps, loads them as JSON, Markdown and CSV, and pushes updates once I'm done. Then I load meeting transcripts and other written requirements into context and cross them with the current state of the repos, so I get meaningfully contextualized work items without having to write them up myself. I just have a long chat with the agent exploring these cards and defining the necessary refinements to descriptions and acceptance criteria, then I push them all at once. Anything you can think of, you just ask the agent for. For me, since I don't trust code, all my CLIs are no-op by default: they first print everything they'll do, and if the changes make sense I approve them and let the script commit to the canonical board.
Working with cloud consoles like AWS is a huge hassle in general, but crafting quick inventory utilities and tools for correlating data is a breeze.
Now, the work itself is mainly CI pipelines, Terraform files and automation. For these, I base the agents on the specified work items and enrich them with my own understanding of the problem. I then launch the agents and read the agent output attentively. This is very important. You can't just prompt and leave; you need to be present all the time so you can steer the agent into solving the right problems. At the very least, you need to review all the changes when you come back from making coffee after an implementation session. Many times it tries to create meaningless abstractions or very complicated solutions that I know can be done better. Or I have a different idea of how to organize the project, so I do many follow-up sessions to refactor code.
In my personal projects I do a lot of small utilities. I spent some weeks designing and polishing a replacement for zurg and debridmediamanager the way I like it, simple and to the point, and tightly integrated with Jellyfin: https://gitlab.com/gabriel.chamon/buzz
I have my own micro desktop environment on top of Hyprland called Archie, which I've recently been redesigning and improving a lot with agents: https://gitlab.com/gabriel.chamon/archie
I have been improving my fork of gamma-launcher so that installing and managing the game on Bazzite is simpler and more automated than relying on workarounds for workflows intended for Windows: https://gitlab.com/gabriel.chamon/gamma-launcher
Now for how I approach developing with agents. I think it's really important to get your constraints sorted out as soon as possible, so have your agent create a CI pipeline for code quality checks, e.g. with ruff, pyright and pytest, to enforce style, code consistency and cyclomatic complexity limits. Put explicit instructions in AGENTS.md that the agent must run these tools at the end of every coding session. If adopting an existing project, use the agent to explore the code and see which refactoring points are worth tackling. Agents really thrive on good codebases, so this first code quality improvement pass is a must.
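The no-op-by-default pattern can be sketched as a small Python CLI. This is a minimal sketch, not the commenter's tool: `CardUpdate`, `plan_updates`, `run`, the `--apply` flag and the sample data are all hypothetical, and the real Azure DevOps calls are replaced by a stub `push` function.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CardUpdate:
    card_id: int
    field: str
    old: str
    new: str


def plan_updates(local: dict[int, dict[str, str]],
                 remote: dict[int, dict[str, str]]) -> list[CardUpdate]:
    """Diff locally edited cards against the board's current state."""
    updates: list[CardUpdate] = []
    for card_id, fields in local.items():
        current = remote.get(card_id, {})
        for field, value in fields.items():
            if current.get(field) != value:
                updates.append(CardUpdate(card_id, field, current.get(field, ""), value))
    return updates


def run(updates: list[CardUpdate], apply: bool,
        push: Callable[[CardUpdate], None]) -> int:
    """Always print the plan; only call push() when apply is True. Returns the pushed count."""
    for u in updates:
        print(f"card {u.card_id}: {u.field}: {u.old!r} -> {u.new!r}")
    if not apply:
        print(f"dry run: {len(updates)} change(s) not pushed; re-run with --apply")
        return 0
    for u in updates:
        push(u)
    return len(updates)


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Push edited cards back to the board.")
    parser.add_argument("--apply", action="store_true",
                        help="actually push the changes (default: dry run, print only)")
    args = parser.parse_args(argv)
    # Hypothetical in-memory data standing in for the exported cards and the live board.
    local = {101: {"title": "Add retry to deploy job", "state": "Active"}}
    remote = {101: {"title": "Deploy job", "state": "Active"}}

    def push(u: CardUpdate) -> None:
        # Stand-in for the real board API call, which is not shown here.
        print(f"pushed card {u.card_id} {u.field}")

    return run(plan_updates(local, remote), args.apply, push)


if __name__ == "__main__":
    main()
```

Making the destructive path opt-in (`--apply`) means the safe behaviour is what you get when you forget a flag, which suits scripts that were generated rather than hand-written.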
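One way to make "run these tools at the end of every session" concrete is a single gate script that both CI and the agent call. This is a minimal sketch assuming ruff, pyright and pytest are installed and that complexity limits (e.g. ruff's C901 rule) are configured in the project's own config; `CHECKS`, `run_checks` and `main` are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands the agent is told (via AGENTS.md) to run at the end of every session.
CHECKS: list[list[str]] = [
    ["ruff", "check", "."],              # lint rules (incl. complexity, if enabled in config)
    ["ruff", "format", "--check", "."],  # formatting/style consistency
    ["pyright"],                         # static type checking
    ["pytest", "-q"],                    # test suite
]

Runner = Callable[[Sequence[str]], int]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the tool is missing)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        print(f"missing tool: {cmd[0]}", file=sys.stderr)
        return 127


def run_checks(checks: Sequence[Sequence[str]],
               runner: Runner = subprocess_runner) -> list[str]:
    """Run every check (not stopping at the first failure) and return the failed ones."""
    failures: list[str] = []
    for cmd in checks:
        if runner(cmd) != 0:
            failures.append(" ".join(cmd))
    return failures


def main() -> int:
    failures = run_checks(CHECKS)
    for name in failures:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failures else 0
```

Saved as a script ending in `raise SystemExit(main())`, it gives the CI job and the AGENTS.md instruction the same entry point. Running every check instead of stopping at the first failure hands the agent the full list of problems in one pass.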
To sum it up, with agents you give up writing code manually for reading lots of code, exploring the domain with the help of the agent and architecting the solution at a strategic level. You trust the agent but you also verify. And lots and lots of manual testing. My personal take is that I'm infinitely productive now, only constrained by how much code and agent terminal output I can read, and also by the rate limits of the model providers and mental fatigue.
> Agents are really nice to quickly craft utilities to speed up planning.
Reminds me of a conversation I had with Kelsey Hightower where he suggested that using agents to build utilities and software was a smarter way to proceed than using agents to do the work. It is almost like the software artifacts are a cached version of your understanding of the problem and can be used over and over again until the problem (or your understanding of it) changes.
Your tone makes me think you have already fallen in love with agents and you think they are the best thing since sliced bread, but let me give you my experience.
I am in a similar professional position to you, and I make a lot of small things in my spare time. I have found agents very tedious and frustrating to work with. Initial prototyping can be OK, but when you start to get serious with code it falls apart quickly. If you don't tell the agent exactly what to do, literally to the letter, it will guess some things. Usually some of those guesses are wrong and don't match the functionality you expected. I find this a very frustrating place to be, trying to explain to the agent which functionality is wrong and what I expect instead. Usually at this point I enter what I refer to as a doom spiral, where everything I tell the agent just takes me further from what I want, until I eventually have to revert everything it has done and try again.
This gets worse with bugs: inevitably a bug will appear, and trying to tell the agent what the bug is and what is expected instead usually results in more broken functionality elsewhere. When I have written the codebase manually myself, I can usually pinpoint and fix bugs in a few minutes after diagnosing them. I have literally spent hours trying to get an agent to fix a bug without breaking something else.
I thought refactoring might be a strong point for LLMs, so I tested this by taking a monolithic codebase and asking various agents to refactor it into reusable modules with exposed API endpoints, so that I could split functions apart into modular chunks whilst retaining full functionality. They all failed miserably, breaking everything and never managing to produce a working example.
LLMs and their agents certainly are cool, and they are great at writing emails for people and summarising meeting notes. They can even write very small programs well. But let loose on serious production codebases, they can cause much more frustration than they solve. I will come back and try another day when LLMs have evolved to the next level, but for now they can stay coding my toy projects and dictating my team's meeting notes.
My general experience is that LLMs are both really good and extremely bad. It's so easy to get into a hole of "No, not like that, like this" where it just never gets better (including with new sessions).
I find the wildly different experiences people have with LLMs fascinating, and honestly I think it's a good thing. We will need both code crafters and technomancers; I don't think having only one or the other is healthy, which is why I'm very critical of mandatory LLM use in corporations.
And I don't doubt you've had your aggro with LLMs, because I've also had my fair share of issues with them; I just think we have different emotional responses to the agent workflow. They don't work the first time, and they aren't very good at sweeping large sets of loosely related changes. They need to focus on one feature only and crunch it to the end.
Honestly, though, I haven't had the chance to work in large codebases, but with those projects I had lots of success and found the workflow very stimulating: reading the solutions the LLM comes up with, some very interesting and some comically bad, but more often than not I pick up a technique or an approach I hadn't thought of. Worst case, it's something I can bounce ideas off.
About bugs, I have the opposite impression. When a functionality comes out wrong, I find it really interesting to provide the agent with the logs and context, explain the issue in detail, and have it help me explore the codebase to identify and fix it. So far I've never had a case where I couldn't fix the bug or where I left the session in a worse mental state than I entered it.
Take buzz, for instance. Before using zurg I had to use Plex, because Jellyfin would only detect a single file in a folder with multiple files. Codex created the presentation layer I described in a single go, and it worked the first time. That was really impressive, I have to say. The project also has its own WebDAV server, integrates with debrid, and has a persistent catalogue of media that is independent of debrid and can be used to restore previously deleted media. It has a logging UI, a config UI and a nice event system for waiting on the different independent services it needs to orchestrate. I don't think it's a large codebase, but it's nowhere near a toy project. It also has a very capable CI pipeline that supports development. The only part I couldn't get the agent to do well at all was the frontend, maybe because I refused to use a framework and defaulted to plain JavaScript and CSS embedded in Jinja2-templated HTML files. I had picked up a couple of techniques doing full-stack work as an intern, so I was capable of using the browser to inspect and refine the DOM elements. One thing it did poorly, for instance, was give every element block display; however, planning a refactor to use flexbox throughout the code really improved the UI's resilience, and it was effortless to deploy. In buzz I haven't touched most of the code, apart from some adjustments in the HTML files to serve as examples for the agent of how to do it correctly (prompts aren't the only way to interact with agents), but I read most of the code and validated most of the functionality in merge requests, just like you would when working in a team.
In a nutshell, I think agents have been really capable of working in large codebases since November last year, but I don't trust them to just be let loose. They need lots of hand-holding and steering, but once I got the hang of it I really feel extremely productive.
My hypothesis is that people are more likely to have success with agents the more they enjoy writing in natural language and reading code, while people that prefer coding and dislike writing text will usually prefer handcrafting their programs.
Yes... and in fact I'm a professional prototypist (as in I get paid to do that) and this is 100% the process.
You never, EVER, start from a blank slate.
Step 0 is to actually challenge the value. Before you even start, you spend a LOT of time with the person who has the need, narrowing down what they actually need. Not what they think they want, but what's genuinely problematic for them. Again, NOT how to solve it, but what thorn is painful for them, not what you imagine it might be. Honestly, this often gets uncomfortable quickly because you have to ask "Ok but have you tried this? What about this quirky thing?" which challenges their own attempts. If you don't spend a significant amount of time in that space you WILL implement faster, that's obvious, but you are very likely to efficiently "solve" the wrong problem. You will solve what you can solve easily. Think of it like looking for your keys where there is light, not where you lost them.
So... that's before one has even started to code; it's mostly uncomfortable discussions. Only once both parties have some confidence that the problem is identified does implementing start to make sense. Then you don't! You do NOT touch a line of code. Instead you take whatever you can, post-it notes, Lego bricks, existing software, tape all that together and ask "Would THAT (very ugly, barely working monster) kind of solve it for you?" So you do not build anything new; you ONLY stick together the BIGGEST existing parts.
Only then might you eventually build something, but STILL you don't start from a blank slate. You find the highest level of abstraction you can. They want something related to the Web? You don't freaking build a Web browser, or even a PWA; you paste a code snippet into the console.
You always look for the way with the MINIMUM amount of new code. It's never about implementing faster. It's about NOT implementing faster.
Now the fun part (arguably) begins when you have done all that and it's still not enough. You use a CMS or a browser or whatever large, existing, verified codebase, and the need is not solved. Then you rely on the built-in extension system of that code! Guess what, there might even be an existing plugin that does what you thought was novel.
Finally, you find no extension for that existing, quality open-source codebase, so you HAVE to build it. Well, not so fast: is there a piece of code in some other software that already does it? Does it have an API? Can you connect to that API to get that functionality into your extension?
Then you have done it, you brought together 2 large pieces of software but you only coded the connector between the two.
Your prototype is, in practice, 10 lines of code.
TL;DR: yes, good prototypists code very little and yet still end up with genuinely novel work.
Standard disclaimer from me that if you are forced to do it by people who have power over you then that is different from doing things voluntarily. You can still “just quit your job” but you have less agency.
I also use AI, not for muh agents but for asking questions. Even when I’m not forced to, unfortunately.
> I still don't think AI is magic, and I'm still cautious about the broader picture; the environmental, financial, and social questions haven't gone anywhere. But for me, right now, the day-to-day reality is that I can move faster, think bigger, and ship more than I could before. And that's been genuinely fun.
Three categories of concern, two of which are relevant for the well-being of commoners, and you still go ahead with it? Why? Because being productive and having fun is more important than the environment and driving people into social crises?
Well the worst thing is it allows you to make a really convincing pile of junk and get people to pay for it. Then you can worry about the details when you have your first paying customers.
That's where everyone is overconfident. And that's where mature companies like ours are starting to see customers come back after jumping ship for some vibe-coded startup and getting screwed.
LLMs, minimal advantages aside, are merely amplifiers for the worst characteristics of the human race.