
Is this meaningful at all, without a control?

How often does software fail in production with human-written code? How many times has a production failure been avoided because an LLM didn't make a typo or mistake that a human would have?

This is pushing an agenda. It's not measuring anything meaningful.



Half this list is bad attribution. LiteLLM was a supply chain attack — stolen PyPI credentials, nothing to do with vibe coding. The Amazon outage number comes from a vendor blog pushing their own product. Nobody else reported it.

But the "where's your control group" take bugs me too. It's not that AI writes buggier code line for line. The gaps are just in different places. Devs who've shipped real apps add rate limiting, auth middleware, proper CORS — because they got burned before. AI skips all of it because nobody prompted for it.

I read through about 80 AI-generated repos a few weeks ago. Code looked decent. The missing stuff was always the same list — no auth on admin routes, API keys hardcoded in client JS, CORS wide open, debug endpoints still live in prod. Over and over.

None of it makes a wall of shame, because nothing's exploded yet. But it's the kind of stuff that eventually does.
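
To make the recurring list concrete, here is a rough sketch of how those four gaps could be checked mechanically. It is not the process I used on the repos. `Route`, `AppConfig`, `audit`, the `/admin` and `/debug` path prefixes, and the key regex are all hypothetical names and heuristics made up for illustration, so a real project would need to adapt them.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heuristic: something like apiKey = "long-token" in shipped JS.
KEY_PATTERN = re.compile(
    r"api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.IGNORECASE
)


@dataclass
class Route:
    path: str
    requires_auth: bool = False


@dataclass
class AppConfig:
    # Simplified, made-up description of an app; not any framework's real API.
    routes: list = field(default_factory=list)
    cors_allowed_origins: list = field(default_factory=list)
    client_bundle: str = ""  # text of the JavaScript sent to the browser
    environment: str = "production"


def audit(config):
    """Return a human-readable finding for each of the four gaps."""
    findings = []
    for route in config.routes:
        if route.path.startswith("/admin") and not route.requires_auth:
            findings.append(f"no auth on admin route: {route.path}")
        if config.environment == "production" and route.path.startswith("/debug"):
            findings.append(f"debug endpoint live in prod: {route.path}")
    if "*" in config.cors_allowed_origins:
        findings.append("CORS allows any origin")
    if KEY_PATTERN.search(config.client_bundle):
        findings.append("possible API key hardcoded in client JS")
    return findings


if __name__ == "__main__":
    risky = AppConfig(
        routes=[Route("/admin/users"), Route("/debug/vars")],
        cors_allowed_origins=["*"],
        client_bundle='const API_KEY = "sk_live_0123456789abcdef";',
    )
    for finding in audit(risky):
        print(finding)
```

A check like this only catches the obvious cases. It says nothing about rate limiting or whether the auth that does exist is any good.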


This is definitely the right question. A list of failures without any baseline won't tell you anything. You would need the same exercise for human-written code at a comparable scale before drawing any conclusions at all. Without it, it's just confirmation bias.


A control? This is just a list of incidents, not an experiment.


The "Why this matters" section at the bottom is clearly drawing conclusions as if it were an experiment.


Not really, no.


Yes really, yes.



