Oh, this is cute and a great first game! I'm working on my first game as well (a top-down 2D tower defense game).
What engine or framework did you end up going with? I looked into Unity, tried Godot for a few weeks, but landed on just making a Typescript-powered canvas game with PixiJS for graphics rendering. Found it much easier doing it this way instead of having to learn a game engine.
I agree with you on everything you said here except:
> when you know how the thing works and have that mental context, you will always be faster than an AI
That's just plain false, honestly. No one can type at the speed AI can code, even factoring in the time you need to spend to properly write out the spec & design rules the AI needs to follow when implementing your app/feature/whatever. And that gap will only increase as LLMs get more intelligent.
Some of us do actually have intimate knowledge in certain areas where guiding an AI takes longer than doing it yourself. It's not about typing speed: when you know something really, really well, the solution/code is already known to you, or the very act of thinking about the problem makes the solution known to you in full. When that happens, it's less text to write the solution itself than to write a sufficient description of it for the AI (not even counting the back and forth of reviewing the AI's output and correcting it).
This is actually my biggest gripe with vibecoding. The single best feature of any programming language is that it is precise. And that is what we throw out?! In favor of natural language, of all things?! We're insane!
It turns out an awful lot of precision (plenty for many things) lives in library and web APIs, documentation, header files and dependency manifests. Language can literally just point at it without repeating it all. Avoiding mistakes by eliminating manual copying of things like actuarial and ballistics tables is what the original computers were built for.
API Glue is the easy and boring part in programming. Nobody really enjoys wiring API A to API B, combining the results and using API C to push it forwards.
Any semi-competent AI Agent can do that with a plan you've written in 5 minutes.
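To make that concrete, here is a minimal sketch of the kind of glue being described, assuming Python with the requests library; the services, URLs, and field names are entirely made up for illustration:

```python
# Hypothetical glue: pull from API A, join with API B, push to API C.
import requests

def sync_orders() -> None:
    # API A: fetch the raw orders (illustrative endpoint)
    orders = requests.get("https://api-a.example.com/orders", timeout=10).json()

    # API B: fetch customers and index them by id (illustrative endpoint)
    customers = {
        c["id"]: c
        for c in requests.get("https://api-b.example.com/customers", timeout=10).json()
    }

    # Combine the two results
    enriched = [{**o, "customer": customers.get(o["customer_id"])} for o in orders]

    # API C: push the combined result forwards (illustrative endpoint)
    resp = requests.post("https://api-c.example.com/enriched-orders", json=enriched, timeout=10)
    resp.raise_for_status()

if __name__ == "__main__":
    sync_orders()
```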
I would love to see an AI try to make sense of the GTK API.
I may be wrong, but it seems when people talk about easy glue code, they're talking about web service APIs, not OS APIs, not graphics or sound APIs, not file-format libraries, …
I used Sonnet 3.5 over a year ago to decrypt a notoriously shitty local government API to get data out of meetings, votes and discussions.
I know it's a piece of shit API done in the worst possible way on purpose (they don't want openness, but had to fulfill a law that mandates "openness") because I had previously tried to do it manually - twice. I ran out of whisky before I got anything done.
Sonnet _3.5_ almost one-shotted it with just the API "documentation" they had and access to Python and curl.
People have also hooked stuff into proprietary APIs on "smart" devices with zero documentation, just by having an Agent tirelessly run through thousands of permutations to figure it out.
Historically we almost entirely moved from ASM to C, a language with lots of undefined behavior, because precision is not the most valued feature of languages.
It's the existence of UB that is the reduction in precision. A language without UB is more precise, in my view, than one that has UB. I don't know if this is a conventional view. But being able to write parsable, compiler-accepted code that does 'uh, whatever' feels like a reduction in precision to me compared to a language that does not have that property.
Otherwise, we're just saying that the precise parts of the language are precise, which isn't much of a differentiator since it's similarly true for all languages.
UB is about edge cases that a compiler should not be forced to check against, and any occurrence is always a bug. You don't necessarily need a precise description of the actual faulty behavior.
Right. The language has well-formed expressions with no defined meaning in terms of machine instructions. My claim is that this is a reduction in precision compared to assembly language.
Grandparent said:
> The single best feature of any programming language is that it is precise.
C overtook a more precise language family because it has features other than precision that people cared about. Perhaps a better tradeoff of expressiveness and readability with precision.
Grandparent could be correct, and precision is the best feature of C, despite being less precise than ASM. And its better expressiveness nets out to a better overall programmer experience. I just wanted to point out that precision is something we do trade away for other things we want.
I don't completely follow the analogy, but I do follow the argument. High precision regarding the requirements often is not needed and that's exactly where LLMs shine.
That's also where engineers come into play. They (and often only they) can judge how much precision is needed depending on the part of the system they are working on.
Could you please explain why you feel that having UB makes C less precise than asm?
To me, the notion of precision isn't in any way related to whether any given statement is sound. It's about the behavior of the language for sound programs.
There are syntactically well-formed C programs that are not sound programs because their behavior is undefined. Or, rephrasing: a subset of all parseable C programs contain 'do whatever, I dunno'. I interpret this as a lack of precision.
One could take the position that specifying precisely 'do whatever, I dunno' counts as perfectly precise. But then a language that was entirely UB would count as precise, which would be an odd position to hold, since you can't specify any behavior at all with it.
Nobody seriously interprets "the C programming language" as "parseable C". Of course there's parseable, undefined C, and of course it's very imprecise. It's not relevant.
Now consider sound, in-spec C. Versus natural language.
Ok I’ll do the same move and show why it doesn’t persuade me.
Consider the subset of natural language that has strictly defined semantics. This would include, for example, talking about the arithmetic of real numbers. The rest is not relevant to evaluating the precision of natural language.
Does that exclusion feel different in the natural language case? Why?
Perhaps it’s a matter of degree, not a categorical difference.
The exceptions that prove the rule. When your programming language is built up of singular Unicode characters with specific meanings, of course that's faster than typing out in English what you want.
What do you use them for? For most AI users it's usually CRUD and I've never seen a web server or frontend in APL like languages.
The reason why programming is hard is because most languages force you to use a hammer when you need a screw driver. LLMs are very good at misusing hammers and most people find them useful for that reason.
If you use a sane DSL instead, the natural-language description of a problem is always more complex and much longer than the equivalent description in the DSL. It's also usually wrong to boot.
I don't think you will find anyone who can do better than an LLM at one shotting the prose version of the problem. Both will of course be wrong.
But I also don't think you will find an LLM that can solve the problem faster than a human with Prolog when you have to use the prose description of the problem.
Using esoteric programming languages doesn’t suddenly make it true for the majority of development, which is web apps, CRUD stuff, some data science, etc.
Who is using APL and J these days? I guarantee 90+% of Claude users are developing CRUD web apps, or something similar. Your point about algebra is a non sequitur to what people are actually developing for these days.
The volume of people successfully adopting agentic engineering practices suggests this stuff isn't rocket science, but it is a learned skill and takes setup.
A year into heavy AI coding, my experience is that what you're describing is exactly what lets you run 5+ agents simultaneously on a project: you know what you're doing, you set it up right, and you know how to tell the agents to leverage that properly.
More LOC committed per day is probably the only one that's guaranteed when you let spicy autocomplete take the wheel.
I don't think it's at all possible to reason about the other more meaningful metrics in software development, because we simply don't have the context of what each human is working on, and as with the WYSIWYG fad of 3 decades ago, "success" is generally self-reported, by people who don't know what they don't know, and thus they don't know what spicy autocomplete is getting woefully wrong.
"But it {compiles,runs,etc}" isn't a meaningful metric when a large portion of the code in question is dynamic/loosely typed in a non-compiled language (JavaScript, Python, Ruby, PHP, etc).
If you are on the right team with the right professionals, you can measure. When we first started using LLMs we decided to run the same process as if they did not exist: same sprint planning meetings, same estimation. We did this for 6 months and saw roughly a 55% increase in output compared to pre-LLM usage. There are biases in what we tried to achieve; it is not easy to estimate that something will take XX hours when you know there are portions (for example, writing documentation or parts of the test coverage) you won't have to write, but we did our best. After we convinced ourselves of the productivity gains we stopped doing this. Saying you can't measure something is typical SWE BS, like "we can't estimate" and the other lies we managed to convince everyone of.
Maybe you're the exception and are actually doing it right and actually getting good results, but every time I have heard this, it has been an ignorance-is-bliss scenario where the person saying it is generating massive amounts of code that they don't understand, not because they're incapable but because they don't care to, and immediately wiping their hands of it afterward.
To give an example of where I hear this, it is indistinguishable from the things I hear from my coworkers: "You just need the right setup!" (IMO the actual difference is I need to turn off the part of my brain that cares about what the code actually does or considers edge cases at all)
What I actually see, in practice, are constant bugs where nobody ever actually addresses the root cause, and instead just paves over it with a new Claude mass-edit that inevitably introduces another bug where we'll have to repeat the same process when we run into another production issue.
We end up making no actual progress, but boy do we close tickets, push PRs, and move fast and oh man do we break things. We're just doing it all in-place. But at least we're sucking ourselves off for how fast we're moving and how cutting edge we are, I guess.
I dunno, maybe I'm doing it wrong, maybe my team is all doing it wrong. But like I said the things they say are indistinguishable from the common HN comment that insists how this stuff is jet fuel for them, and I see the actual results, not just the volume of output, and there's no way we're occupying the same reality.
I've seen productivity surveys of senior programmers that show the reverse, and that matches our experience. A common finding is that gardening projects are a lot cheaper now when they're just a few extra terminal tabs running in parallel - security, refactoring, more testing, etc. Non-feature backlog items that senior developers value around tech debt are less of a discussion now. They're often essential now: to make AI coding work well, there is an effective automation poverty line around verification, testing, and specification that needs to be reached.
The understanding-code thing is tough. E.g., when a non-senior fullstack developer manually edits frontend CSS and didn't start from pixel-perfect designs across all form factors, do they really understand what they did? I wrote the first formal mechanized specification of the CSS standard, and would claim 95%+ of web developers do not understand core CSS layout rules to begin with: it was a struggle to semantically formalize even a tiny core of the box model as soon as you have floats. If the AI generates live storybooks and in-tool screenshots of all these things as part of the review process, and the code review "looks good", what's the difference?
I don't truly think this way - my point is to challenge basic claims of manual coding to be good to begin with and whether AI coding is being held to an artificial standard. What I see in commercial and defense software is a joke compared to what we do in the verification world. AI coding automating review iteration fixes in areas like security engineering and test coverage+amplification has been a blessing for quality improvement.
More fundamentally, we require developers by default to be responsible for knowing what the code does and having tested it. Every case of relaxing that rule has to be explicit, eg, clear that something is a prototype, or an area is vibed with what alternate review/test flow, and we are learning as a team what that means in different situations. In practice, our senior ai coders are doing more quality engineering work than the manual coders, both per-pr and in broader gardening contributions.
I know you said you don't truly think that way, but to counter anyway since some people seem to legitimately hold this viewpoint:
I take issue with the implication that not necessarily having a full understanding of what the code/library/driver/compiler/abstraction is doing is somehow justification/permission to embrace and celebrate having basically no understanding of what any of the code is doing. The in-between space there is the vast majority of the surface area where nuance can and should exist.
>my point is to challenge basic claims of manual coding to be good to begin with and whether AI coding is being held to an artificial standard
That's fair, and I can only speak for myself here; I don't have any inherent philosophical issue with manual vs AI, but my personal experience is that AI coding is just straight-up a frustrating nightmare to deal with, IMO orders of magnitude worse than manual. It's faster, sure, but I end my rage-filled LLM debugging session walking away knowing I learned pretty much nothing and that there's no compounding knowledge or outcome that will keep me from experiencing the same thing tomorrow, and I hate that. I am Sisyphus rolling prompts into a terminal.
But I'm not gonna sit here and act like manual coding makes you morally virtuous or pure or whatever. IMO it's a great forcing function to better (even if not completely) understand what is going on in your system(s) and I think most everyone would agree with that. What's up for debate is probably whether that's worth the time tradeoff now that we have a magic time compressor machine available to us.
Maybe I only find that knowledge tradeoff valuable because I'm a lowly IC and not some super turbo chad 10x principal who built a distributed database in brainfuck 10 years ago for fun and has nothing left to learn, or a technical founder of 5 concurrent startups who is optimizing for business value. It's possible that a heavy bias for learning/skill acquisition blinds me here.
>we require developers by default to be responsible for knowing what the code does and having tested it. Every case of relaxing that rule has to be explicit
1. If what you're replying to were a thing, wouldn't there be an open source project where I could see this in action? Or some sort of example I could watch on YouTube somewhere. 2. The people who talk like this in my company spin up new projects all the time and then just hand them off for other teams to clean up the mess and decode what the heck is going on.
1. Probably most of https://github.com/simonw , but take care to separate adopted / semi-professional work from exploratory personal work
2. That sounds like your company has a weak engineering culture and is early on its upskilling journey. We explicitly separate projects into prototypes vs production, where vibes are fine for the former, e.g., demos by designers / data scientists / sales engineers, but traditional code review standards apply to whatever is going into production. That mirrors my qualifier in #1.
I find that success here is a combination of engineering seniority, prompting experience, and domain experience. Anything lacking breaks the automation loop, like not knowing how and what to automate. Ex: all of our team finds value in AI coding, but junior engineers struggle on these dimensions, so they are not running the 3+ agents that senior ones are.
You seem to have missed OP's point: some things are only encoded in our brains when you are sufficiently experienced.
Translating that into code can happen directly by you, or into prompt iterations that need to result in the same/similar coded representation.
In other words, when it matters how something works and it is full of intricate details, you do not need to specify it, you just do it (e.g., knowing how to avoid an N+1 query performance issue: you do not need a ticket or spec to spell it out, you just do it at no extra effort. Models are probably OK at this one since it is such a pervasive gotcha, but there are many more like it).
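For readers who haven't hit that particular gotcha, here is a minimal sketch of the N+1 pattern versus the batched fix, using sqlite3 and a made-up posts/authors schema purely for illustration:

```python
import sqlite3

conn = sqlite3.connect("app.db")  # illustrative database with a posts(author_id, id, title) table

def posts_by_author_n_plus_1(author_ids):
    # N+1: one query per author -- fine for 3 authors, painful for 30,000
    result = {}
    for aid in author_ids:
        result[aid] = conn.execute(
            "SELECT id, title FROM posts WHERE author_id = ?", (aid,)
        ).fetchall()
    return result

def posts_by_author_batched(author_ids):
    # One query: fetch everything at once, then group in memory
    placeholders = ",".join("?" * len(author_ids))
    rows = conn.execute(
        f"SELECT author_id, id, title FROM posts WHERE author_id IN ({placeholders})",
        list(author_ids),
    ).fetchall()
    result = {aid: [] for aid in author_ids}
    for author_id, post_id, title in rows:
        result[author_id].append((post_id, title))
    return result
```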
That's the failure to automate. The AI isn't telepathic, so agentic engineers not automating this stuff is skipping out on the engineering part.
You set up the environment and then you do the work. Unless you are switching employers every week, you invest in writing that stuff down so the generation is right-ish, and you generate validation tooling so it auto-detects the mistakes and self-repairs.
Sometimes you write the feature, and write it well so it's reusable.
Imagine you have to implement a specific algorithm for a quantum computer.
There's no value in setting up AI to do the writing for you. That might be orders of magnitude harder than writing the algorithm directly.
For highly specialized one-off features, it doesn't always pay off.
On the other hand, if all you do are some generic items that AI can do well... then I'm not sure you're going to have a job long term, your prompts and automation will be useful for the new junior hires that will be specialized in using these and cost effective.
That feels true in theory, but in practice we see the reverse for advanced projects where AI is helping us a lot. A decent chunk of our core IP falls into the bucket you're describing:
We have been building a GPU-accelerated graph investigation platform that has grown over 10+ years with fancy stuff all over the place - think accelerated query languages, layout kernels, distribution, etc. R&D-grade high performance engineering projects and kernels end up needing a lot of iterations to make a prototype and initial release. Likewise, they're more devilish to maintain when they need a small tweak later because of the sophistication and bus factor. Both phases benefit.
AI coding helps automate investigation, testing, measurement, patching, etc. The immediate effect is we can squeeze in many more experimental iterations with more fidelity and reach. Having an AI help automatically explore the design space and the details helps a LOT. And later, maintaining a wide surface area of code here that is delicate to touch and infrequently edited is traditionally stressful for teammates, and AI editing + AI-generated automation is helping destress that a LOT. We very much invest in upgrading our team, processes, and tooling here.
I think there's a level above that where the words to describe such structure are familiar and readily available and hey guess what? The model understands those too. Just about every pattern has a name. Or a shape. Or an analog or metaphor in other languages or codebases. All work as descriptors.
This presumes that most of this stays encoded as words in our brains: the effort to translate some of these into words might be similar to translating it into code (still words, just very precise).
It's like talking legalese vs plain English; or formal logic vs English. Some people have the formal stuff come more naturally, and then spitting code out is not a burden.
No, it really doesn't presume anything about brains or information encoding. Just points out that there is a level of mastery in which all the techniques and all the forms have names or adequate descriptions. Teachers often attempt to achieve this, to facilitate education.
It's no accident there is an adage from Aristotle in the vein of: "Those who can, do. Those who understand, teach."
So yes, there is a level of mastery that is beyond being able to do a good job of designing and evolving complex systems which enables people to teach others the same skill set.
However, this is a smaller number of practitioners, and most have learned through practice and looking over how more experienced engineers apply their knowledge.
Where I disagree is that this means everybody is equally capable of teaching with words, or that there are no experts who are bad at teaching (humans or directing AI) — this clearly indicates it is not encoded as words for said experts.
It's been pretty clear in my experience that experts tend to be capable of working with the same ideas in many different forms. That's what I would call mastery. It implies "complete" knowledge, which probably means several interrelated encodings with loci in different parts of the brain. Those interrelated encodings will be highly associated, and discerning in an expert. Which implies a high degree of usefulness and specificity in communication. This matches my experience.
Yes, there are still many areas where skilled humans are faster than AI (meaning faster coding yourself, than providing so much context and guidance that the AI can do it on its "own").
But in general the statement is really not true anymore, generic projects/problems have a pretty good chance that the AI can one shot a working solution from a lazily typed vague prompt.
Yeah it’s when you go off the happy path that it gets difficult. Like there’s a weird behaviour in your vibe-coded app that you don’t quite know how to describe succinctly and you end up in some back-and-forth.
But man AI is phenomenal for getting stuff out of your head and working quick.
That doesn't matter. The statement wasn't "faster than AI right now", it was "will always be faster than AI". And that's just nonsense.
Current AI systems are extremely serial, in that very little of the inherent parallelism of the problem is utilized. Current-gen AI systems run at most a few hundreds of thousands of operations in parallel, while for frontier models, billions of operations could be run in parallel. Or in other words, what currently takes AI 8 hours will take it barely long enough for you to perceive the delay after you release the enter key.
For a demo, play around with https://chatjimmy.ai/ , the AI chatbot of Taalas, where they etched the model into silicon in a distributed way, instead of storing it in RAM and sucking it into the execution units through a straw. It's an 8B-parameter model, so it's unsuitable for complex problems, but the techniques used for it will work for larger models too, and they are working to get there.
And even Taalas is very far from the limits. Modern better quality LLM chatbots operate at ~40 tokens per second. The Taalas chatbot operates at 17000 tokens/s. If you took full advantage of parallelism, you should be able to have a latency of low hundreds of clock cycles per token, or single request throughput of tens of millions of tokens per second. (With a fully pipelined model able to serve one token per clock cycle, from low hundreds of requests.) Why doesn't everyone do it like that right now? Because to do this, you need to etch your model into silicon, which on modern leading edge manufacturing is a very involved process that costs hundreds of millions+ in development and mask costs (we are not talking about single chips here, you can barely fit that 8B model into one), and will take around a year. So long as the models keep improving so much that a year-old model is considered too old to pay back the capital costs, the investment is not justified. But when it will be done, it will not just make AI faster, it will also make it much more energy-efficient per token. Most of the energy costs are caused by moving data around and loading/storing it in memory.
And I want to stress that none of the above is dependent on any kind of new developments or inventions. We know how to do it; it's held back only by the pace of model improvement and economics. When models reach a state of truly "good enough", it will happen. It feels perverse to me that people are treating this situation as "there was a pre-AI period that worked like X, now we are in a post-AI period and we have figured out that it will work like Y". No. We are at the very bottom of a very steep curve, and everything will be very different when it's over.
Huh, I have to say that I am impressed with Chat Jimmy. No doubt the hardware running this model operates faster than any human. If this were possible to scale (and I'm not saying it isn't, I just don't think it's likely right now), LLMs have a real shot at replacing real-time graphics, frontend UIs, and all sorts of interactive media, if the market allows it.
I still think regardless of how fast a model outputs tokens, it still benefits the person responsible for that output to be well informed and knowledgeable about the abstractions they're piling on top of. If you have deep knowledge, you can operate faster than other people, and make those important decisions in a more intelligent manner than any model.
Maybe in the end we do get superintelligence and my point will finally break, but at that point I don't think I'll be worried about being wrong on the internet.
Ok sorry about that. I seriously don't believe him. The Agent is so fast there's literally no way you can be faster.
Telling the agent a high-level plan you are extremely familiar with and then having the agent execute on 2000 lines of code is FASTER than executing on those 2000 lines of code yourself. There is no reality where that can be physically beaten, even by someone typing really quickly with zero pauses. Physically impossible.
Less boring or not? Another way to put it... although my answer is boring, I think I'm right. He is either a liar or, like many other people, lacks skill in using AI... because the transition to AI is happening so fast, not many people are fully utilizing AI to its maximum potential. Many still use IDEs, many still interact with the terminal. Many people still don't use it to configure infrastructure, do database administration, deploy code... etc.
Why are you starting the clock at the time when you already have a "high level plan that you are extremely familiar with"? I think it's fairer to start from "I received a bug report/feature request" or similar.
Also, haven't you ever had a situation where the prompt you started with ends up being longer than the final code diff? Perhaps a subtle bug that's hard to describe/trigger, but ended up having a simple root cause like an off-by-one error?
Also also, coding agents are infamous for generating way more code than is strictly necessary. The 2000 lines of code that the agent generated may well have been only 200 lines had you written it yourself.
>Why are you starting the clock at the time when you already have a "high level plan that you are extremely familiar with"? I think it's fairer to start from "I received a bug report/feature request" or similar.
Done both. We tag the LLM on Slack in a reply, and the ticket gets created and forwarded to an agent that automatically works on it. The only time a human is in the loop is review or queries for changes.
>Also, haven't you ever had a situation where the prompt you started with ends up being longer than the final code diff? Perhaps a subtle bug that's hard to describe/trigger, but ended up having a simple root cause like an off-by-one error?
Sometimes. Getting rarer and rarer.
>Also also, coding agents are infamous for generating way more code than is strictly necessary. The 2000 lines of code that the agent generated may well have been only 200 lines had you written it yourself.
Depends on the agent and it's random. This was mostly true probably 5 months ago. It's much less true now.
AI can write 2000 lines faster than you, but you can write the 2000 lines correctly first shot faster than having AI do 10 iterations on these 2000 lines with your guidance to finally get it right
I know that a better plan could mean fewer iterations, but again that extends the time you need to spend on that plan => the total time of the AI solution
Right but those 10 iterations only take up prompt writing time. When the agent is executing I move onto other tasks in parallel. AI is faster when you parallelize your work flow.
Prompt writing and parsing the AI output is still work you have to do. Not sure why you bring up parallelism, since you can't do other things while you're writing the prompts.
Other agents can be working while you're writing prompts.
Let me put it more explicitly. For one project I have 10 folders clones of the same project on my local computer. Each one of those folders is responsible for working on a different ticket/feature. I prompt one folder, move on to the next. It takes practice to get used to this style.
Again it's not about typing speed. High level plans simply don't work very well, especially for big tasks where the optimal solution actually would take 2k lines. Unless you are building something that is extremely generic, AI coming up with the optimal solution rarely ever happens.
> He is either a liar or like many other people lacks skill in using AI
Not a liar, and I'm sorry to say, but AI really doesn't take much skill to use. People who say such statements give me the impression that their ceiling for skills is quite low.
There are areas where I do and will continue to use AI, and it works well enough. Giving me prototypes for projects I don't have a lot of knowledge about is one thing. But I use those prototypes to learn.
> configure infrastructure
I make templates I can copy and tweak to do this faster than it takes to tell an agent what to do.
> database administration
Don't do that... Sure get it to write you some SQL to update a table, but don't give it DB admin access for fucks sake.
> deploy code
Tell me, how is your agent able to deploy code more effectively than hitting merge on a PR? Or do you simply mean setting up CI/CD for you? That's usually a set and forget thing that doesn't take much time, so I'd rather do it myself.
>Again it's not about typing speed. High level plans simply don't work very well, especially for big tasks where the optimal solution actually would take 2k lines. Unless you are building something that is extremely generic, AI coming up with the optimal solution rarely ever happens.
Nope. Not universally true. It depends on the randomness of the RNG, the type of task, the agent, and also the current state of AI. Right now, for frontier models, what you're saying is generally true only a minority of the time, IME.
>Not a liar, and I'm sorry to say, but AI really doesn't take much skill to use. People who say such statements give me the impression that their ceiling for skills is quite low.
It does take a little skill. Very little and it requires new habits that are harder to pick up. For example. I never work on one project at a time anymore. I work on 5 projects and context switch between all of them. Prompt, switch, come back, prompt, switch, prompt switch, review... etc. That takes getting used to.
>I make templates I can copy and tweak to do this faster than it takes to tell an agent what to do.
I have a huge change, and within that change the agent does this automatically.
>Don't do that... Sure get it to write you some SQL to update a table, but don't give it DB admin access for fucks sake.
You can fuck off prick, don't fucking talk like that to my face. I do it and I have no problems with it. If you don't want to, that's your own fucking prerogative.
>Tell me, how is your agent able to deploy code more effectively than hitting merge on a PR? Or do you simply mean setting up CI/CD for you? That's usually a set and forget thing that doesn't take much time, so I'd rather do it myself.
Because the agent merges for me. Prompt: "Complete task A". Agent: "Task completed", Me: "reviewed and good to go"
The agent then does its thing. Of course there's always some adjustment and more conversation than this, but that's the gist of it.
I interpret "faster than AI" to include writing the prompt. For me (scientific computing) it is more often than not faster to write out a simulation or design in a language I know inside out like fortran or mathematica than explicate the requirements to an LLM to request the code. Obviously if someone wrote out a prompt to me and the LLM it would be way faster, but I don't think that's what the commenter had in mind.
If you're good at SQL, or SQL-like languages like Linq, it might be more efficient precisely writing a reasonably complex query than trying to explain it in detail to an AI.
I am very good at SQL; I have worked with SQL for half my life, taught it, and know all kinds of SQL flavours. But good luck getting ahead of AI on a complex query with recursive CTEs, left outer joins, 625-column tables that change semantics conditional on certain properties, and then some obscure Oracle package APIs.
No way you beat an LLM on this, even on trivial ones. LLMs have been better at that since at least 2024; if you haven't noticed, perhaps you're not doing enough SQL.
But, of course it took years for people to realize they cannot outpace Visual Studio in the 90s by being very good at x86 assembly.
Not the parent but I've had this happen when debugging for sure. Sometimes I ask Claude Code to help me debug something and it makes a wrong assumption and just churns in circles burning tokens. While it's doing that I realize the problem and fix it.
What I meant is that only sometimes I am faster than Claude with debugging. When it's a standalone problem, a report in Sentry, and I just know immediately where I need to go to fix it. Then it's faster to do myself, than telling Claude what's the problem and where to look and wait.
Bugs happen during feature development, as you say, but then Claude is in the context, and I don't need to tell it where to go, it sees the bug with failing tests, or smth similar.
BTW. One thing that helps my Claude with debugging harder problems is that I tell it to apply scientific method to debugging. Generate hypotheses, gather pros/cons evidence, write to a journal file debug-<problem>.md, design minimal experiments to debunk hypotheses.
You can add that as a skill, and sometimes it will pick it up automatically, but it works wonders just as a single sentence in the input.
But then you ignore all the other times CC got it right. Statistically, I would bet that CC (or Codex, or PI) gets it right, and is right more often than it is not.
Besides, it is a system that you query, and it responds. I'm sure your DBs are not always 'right' either, particularly when you ask the wrong questions.
In my experience AI can write _something_ from scratch, but often edge cases won't be handled until I go through and read the results or test it. Usually when I'm writing by hand I will naturally find the majority of edge cases as I go.
By the time I've read through the results and fixed said edge cases, I usually would have been faster just doing it myself.
My experience is the opposite: AI takes too many edge cases into account and guards against even the most unlikely things. The upside is that it often handles edge cases that I either didn't think about or was too lazy to implement.
I can with full confidence say that the code AI writes is more robust and safe than if I would have done it myself. The code definitely becomes more bloated though.
My experience has been that it wraps all the obvious things, and even some obscure things, in error handling. In this sense it is safer.
It also fails to write abstractions unless they're carbon copies of a well established pattern, and when abstractions already exist, it needs babysitting to ensure it will use them appropriately. It won't introspect about its current direction unless forced to by the user or by an error, and when forced it will happily "fix" non-issues just because you pointed them out, since it's a happy little yes-man.
Because of this, code written by a good engineer is more likely to start out broken but converges towards correctness as more abstractions get built, while code written by AI duplicates abstraction layers, leaks between them, and never converges towards anything.
I've definitely had a lot of these same experiences (in fact I've been fighting it on one particular issue the past couple of days and I'm pretty much just giving up and going back to solving it manually now).
But it still seems to get it right (or at least close enough to right that I keep using it) more often than it gets into these traps.
This has been my experience thus far. Yes, a complete prototype can be made, but.. you don't really know until you read the code and test it. Just yesterday, small things came up in terms of Qt screen focus that wouldn't have come up otherwise save for initial testing.
I think, and I recognize it is mostly against the 'agentic' push, I will stick with slow iteration.
That is not true in startups, where people are getting work done. Maybe in later stage companies where 'stakeholders' are 'synergizing' in meetings over the Q2 roadmap.
Which is still false and not serious. It's one of the dumbest rationalizations I've seen. AI has many flaws but pretending that it's useless because of that is not it.
You can definitely be faster than frontier models. The number of tokens per second is not that high and they require a lot of tokens for thinking and navigating things.
Especially if you use auto-complete AI, ironically. You type a few characters, the line fills out in less than a second, as opposed to a reasoning model that takes maybe a second per 2-3 lines it writes out.
if you've never had the experience of handing something off to someone else being more laborious and slower than doing it yourself due to having to set constraints and define success, then you simply haven't held a senior enough position to comment on this with any authority
As I understood it, he's referring to the overall time it takes to build a complete, finished piece of software, accounting for the refactoring and bug fixes and all that. Because had you not understood the tools you're using, you would be running into roadblocks, and that adds up.
Your views might carry more weight if the crux of your rebuttal wasn't manufactured outrage that I used a laughably accurate nickname for a type of software.
> I like thinking, solving problems and typing out code myself.
I get this, I totally do, and I kind of hate relegating myself to doing "project manager" work instead of "software engineer" work, but the productivity gains make it no contest on whether to use AI here. Once I comprehensively validate the spec for a new feature, Codex just one-shots it basically every time. I'm talking thousands of lines of code in a single 3-hour session, with much of my time being spent browsing the internet while I wait for Codex to run in 15-20 minute sessions.
I'd estimate at least a 20x speedup in my ability to ship.
(and before you say it, yes, I review every single line of code before merging anything, so no - it's not AI slop)
Sure, it's been done before, and I'm sure not just limited to SGI, but no one does this for regular apps these days - never heard of it before. I just find it neat that Codex came up with this - not something I ever would have.
Edit: I'm not saying no one does checksums to compare files (lol). I'm saying no one takes screenshots at specific timestamps within an app or game's lifecycle and then compares them to ensure they're identical.
Edit 2: Whoops, looks like I'm wrong and this is apparently a pretty common thing (but not at the startups I've worked at, /shrug). I still think it's cool that Codex is doing it without being told to, though.
> but no one does this for regular apps these days - never heard of it before
Everyone does this to match files as identical, be it SHA, MD5, or something else. I cannot imagine any other method for checking whether two files are the same that would come to mind first.
I don't mean to offend but I quite literally mean everyone does this. Every software updater, game patcher, checking if two binary files are identical (pixel perfect/lossless in this case: BMP, PNG created by same encoder off same inputs would qualify, JPG would likely not), all of them do exactly this.
GPT analysis, or similarity and image-chunk hashing, would not be the first thing you turn to if what you wanted was exact, pixel-perfect identity. I am curious what your background is if this is the case.
No one that I've seen takes automated screenshots of webapps or games or what have you at pre-determined timestamps to make sure the app looks pixel-identical with every change.
(regardless of the method; the SHA'ing isn't the point here, the point is that it's a shortcut instead of "inspect the image for any regressions", since we don't need to inspect the image at all if it is identical)
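Roughly, the idea is the sketch below. Playwright, the URL, and the baseline-file path are placeholders for illustration, not what I actually run:

```python
# Sketch: hash a screenshot taken at a fixed point in the app's lifecycle and
# compare it against a stored baseline; only inspect the image if it changed.
import hashlib
from pathlib import Path
from playwright.sync_api import sync_playwright

BASELINE = Path("baselines/home.sha256")   # hypothetical baseline location

def screenshot_hash(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1280, "height": 720})
        page.goto(url)
        page.wait_for_timeout(2000)        # pre-determined timestamp in the run
        digest = hashlib.sha256(page.screenshot()).hexdigest()
        browser.close()
    return digest

current = screenshot_hash("http://localhost:3000")
if BASELINE.exists() and BASELINE.read_text().strip() == current:
    print("pixel-identical with the baseline, nothing to inspect")
else:
    print("no baseline or the image changed; review the screenshot")
```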
> No one takes automated screenshots of webapps or games or what have you at pre-determined timestamps to make sure the app looks identical with every change.
I'm confused. We have done this at every place I have ever worked, it's very standard. Set timestamps, post-action, pre-action & on dozens to hundreds of combinations of OS and rendering engines. This includes pre LLM, using similarity and perceptual hashing, screenshot-ing single DOM elements during hover and off hover, both fuzzy and pixel perfect.
Huh! Well, I stand corrected. I've never seen that done (but I've only worked at startups with < 20 headcount for my entire software career so far, so that might be why).
Huh. Were they anywhere that pixel perfection was necessary, such as games, or that required constant cross-browser testing for compliance, accessibility, or cross-platform support?
Have any of your places used a service such as Saucelabs or Browserstack or rolled their own similar inhouse, or seen such as https://percy.io/how-it-works (random example; not affiliated or recommending this)?
I hope I was not being too rude about it, not my intent; it was mostly surprising to me because a service like Browserstack is already a decade and a half old, and the concept predates that.
I was wrong & you called me out on it, not rude, all good.
My first software job out of college was actually a QA Automation / SDET position, wrote an automated framework with Ruby + Selenium + Browserstack which did take screenshots of the app, but the app loaded dynamic content and there were frequent feature adjustments so no two screenshots were ever identical.
All other jobs I've had since then have been writing smart contracts for Ethereum apps - 100% backend, (I hate having to deal with frontend) so all our tests were just units & coverage & what have you.
I suppose if your environment holds constant and your features don't change frontend structure or behavior (eg refactors), then this is what you should expect.
Though, do note that this only works because my app is based on a tick/game-loop system without callbacks; if this was the standard game-development pattern of callbacks & message handling (especially w/ React / JS) to invoke events, it wouldn't work, because the timing would be slightly different each time, and an enemy would be a few pixels to the left/right of its position in the past run.
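In case the tick-loop point isn't obvious, here is a tiny sketch of why a fixed-timestep simulation is screenshot-reproducible: the world state after N ticks depends only on N, never on wall-clock time. The Enemy class and its speed are made up for illustration:

```python
DT = 1 / 60  # fixed simulation step; real elapsed time never enters the update

class Enemy:
    def __init__(self) -> None:
        self.x = 0.0

    def update(self, dt: float) -> None:
        self.x += 30.0 * dt  # moves 30 units per simulated second, every run

def state_at_tick(n: int) -> float:
    enemy = Enemy()
    for _ in range(n):
        enemy.update(DT)     # same inputs in the same order -> same state
    return enemy.x

# A screenshot "at tick 600" captures the same world state on every run, so its
# hash matches; a callback-driven loop keyed to real time would not guarantee this.
assert state_at_tick(600) == state_at_tick(600)
```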
Obligatory https://m.xkcd.com/1053/ reference, but you're taking this in good stride and that's excellent. :)
If you want to go further down this direction there are all kinds of cool things you can do. There are ways to like XOR bitmaps so pixels which aren't identical show up as white and the rest are black, and the like; if you're working with something else you can look into perceptual hashing although that's a lot more computationally expensive.
Oh! And edge detection! Canny edge detectors are cheap and deterministic and wonderful for all manner of things like this.
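If you want a concrete starting point, here is a minimal sketch with Pillow (an assumption on tooling; it uses per-channel absolute difference rather than literal XOR, but gives the same "changed pixels show up" picture). File names are placeholders, and both screenshots are assumed to have the same dimensions:

```python
from PIL import Image, ImageChops

before = Image.open("baseline.png").convert("RGB")
after = Image.open("current.png").convert("RGB")

# Per-channel absolute difference: zero where pixels match, non-zero where they differ
diff = ImageChops.difference(before, after)

# Collapse to a mask: white where anything changed, black elsewhere
mask = diff.convert("L").point(lambda v: 255 if v else 0)
mask.save("diff_mask.png")

# getbbox() is None when every pixel matched, so it doubles as a pass/fail check
box = diff.getbbox()
print("identical" if box is None else f"changed region: {box}")
```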
Oh yeah, I did a deep-dive into neural networks (both artificial and human) for vision processing, it's super dope stuff. The human vision processing system is remarkably similar to some of the AI stuff we've built for image processing!
> the SpåraKoff's seats are not comfortable, but its scheduled route covers most of Helsinki's usual sights, its drinks are reasonably priced, and there is no irritating commentary
You're right - the odds of success are incredibly slim. But we may only have one shot, because the first mid-air collision in this analogy could literally be the end of the world.
How exactly? Explain how an AI would cause the end of the world? Are you suggesting we would turn over all of the world's nuclear arsenals to AI to deal with? Maybe it's just me but lately it seems like everything is being labeled as "dangerous" to the point of absurdity. It seems to be following the same line as US political rhetoric where everyone is either a Nazi or communist bent on destroying the country depending on what side you generally align with.
And the monkeys thought, “how could a human be dangerous?” “Would they clobber us with stones? Surely we are stronger!”
The problem is that the toolset available to someone who is much more intellectually capable is beyond what we can think of.
People are not afraid of AGI because it will behave like a very smart human, people are afraid of AGI because the capability gap will be more like the one we experience between humans and other animals.
The toolset available to humans when dealing with monkeys is literally incomprehensible to the monkeys.
The toolset available to AGI is similarly incomprehensible to humans.
Part of the issue is that taking AI alignment seriously does require some level of intellectual humility — a quality that the HN comment section famously lacks.
This has been written about in numerous places. There are multiple possible ways an AI might go about this if it saw that as its task; the probability of any one specific method being used is of course lower than the total probability of the whole set, so any one method would be an unlikely and speculative scenario. The method in question could range from nuclear, chemical, or biological, to sabotaging agriculture, mass-producing CFCs or other pollutants, triggering wars, or other unforeseen approaches. Most scenarios assume that (a) the AGI is very smart, deceptive, creative, and resourceful, and can pose as a human or corporation to execute transactions; (b) the AGI is able to gain control over some means of funding, either legitimately or illegitimately, and thereby pay unsuspecting humans to perform seemingly innocuous tasks like protein synthesis or package delivery; (c) you wouldn't see it coming, any more than you see the checkmate approaching several moves ahead, because the AGI would appear friendly and helpful along the way, perhaps earning you lots of money, while secretly outsmarting you for its own ends.
For a nuclear approach, the AI would only have to hijack the least-hackproof of US, Russian, or Chinese arsenals in order to trigger an exchange from all sides. But it would probably opt for a different method that would do less collateral damage to its own resources.
This has been an issue raised since at least the early 2010s if not before, and so (arguably) predates the most recent round of US political polarization. The core arguments are unchanged, but became more urgent as AIs broke through several milestones thought to be decades out, such as defeating top human Go players, cracking the protein folding problem, and passing the Turing test with flying colors.
Forget the idea of an "AI" then, because the idea of "intelligence" makes the argument harder. Just think of a "new technology."
Is it possible that a new technology could destroy the world? Of course. It could've turned out that nuclear weapons would incinerate the atmosphere upon detonation, as some worried they would. It could be that the next technological innovation will kill us; there's nothing preventing it in the laws of physics.
AGI is a specific technology we are worried about, because the whole premise is "once we build something that is extremely capable at a variety of things, one thing it will be capable of is destroying the world. Even by accident."
We're already using AI techniques to help with problems in biology like protein folding. Take it a few dozen iterations forward, and these systems will be helping design medicines and vaccines that no human can do by themselves. At that point, what's to stop the system from creating a super-flu that kills everyone? Forget about intent here, how about a bug?
ChatGPT often misunderstands queries, take something like ChatGPT but 100x more capable, do you really think people won't be using it to do things? And given that they will, it could easily have a bug that "oops, incinerates the atmosphere" as a side effect.
The only training required in our method is to construct linear models that map fMRI signals to each LDM component, and no training or fine-tuning of deep-learning models is needed.

...

To construct models from fMRI to the components of LDM, we used L2-regularized linear regression, and all models were built on a per subject basis. Weights were estimated from training data, and regularization parameters were explored during the training using 5-fold cross-validation.
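Not the authors' code, but a rough sketch of the setup that passage describes, assuming scikit-learn and placeholder array shapes: L2-regularized (ridge) regression from fMRI signals to an LDM component, with the regularization strength chosen by 5-fold cross-validation on the training data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Placeholder data for one subject: rows are training stimuli/trials
n_trials, n_voxels, n_latent = 1200, 2000, 77
X_train = np.random.randn(n_trials, n_voxels)   # fMRI signals
Y_train = np.random.randn(n_trials, n_latent)   # target LDM component (e.g. a latent vector)

model = RidgeCV(
    alphas=np.logspace(1, 5, 9),                # regularization grid (an assumption)
    cv=5,                                       # 5-fold cross-validation
)
model.fit(X_train, Y_train)                     # weights estimated from training data only
print("selected regularization strength:", model.alpha_)
```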