It’s relatively simple to use llama.cpp/server to spin up a local LLM to work with Claude Code or Codex-CLI. The required llama server settings are often scattered all over so I maintain a set of instructions here for several popular open LLMs:
Do you use that as a daily driver? Claude Code's prompt is huge, so with local models you spend a long, long time on prompt processing and then run out of context shortly after.
Yes, the CC prompt can be ~30K tokens. I definitely do not use this as a daily driver. I did use it a few times for sensitive document work with Qwen3.6 MOE.
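To put rough numbers on that, here is a back-of-the-envelope sketch. Only the ~30K prompt size comes from the comment above; the prefill speeds and context windows are made-up illustrative values, not measurements of any particular model or machine.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_sec: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    return prompt_tokens / prefill_tokens_per_sec


def remaining_context(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for the actual conversation once the prompt is loaded."""
    return max(context_window - prompt_tokens, 0)


PROMPT_TOKENS = 30_000  # the ~30K figure mentioned above

# Hypothetical prefill speeds (tokens/sec); real values depend on hardware and model.
for speed in (200.0, 1000.0):
    print(f"{speed:>6.0f} tok/s -> {prefill_seconds(PROMPT_TOKENS, speed):.0f} s before the first token")

# Hypothetical context windows; a small one leaves almost nothing after the prompt.
for window in (32_768, 131_072):
    print(f"{window:>7} ctx -> {remaining_context(window, PROMPT_TOKENS)} tokens left")
```

Prompt caching can hide the prefill cost on later turns, but the first turn and the shrunken context budget still apply.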
A related idea is to have the LLM quiz you, Socratic-style about a topic of interest. It persists in asking questions at deeper levels until you arrive at the answer yourself. This forces you to think hard about a problem, and this effort helps with understanding, learning and retention. Of course I made a Socratic-quiz skill for this, to use with any coding agent or similar:
For example I’ve used this to better understand counter-intuitive things about diabetes/insulin, dopamine and motivation, Claude’s implementations, etc (to combat so-called cognitive debt).
Strong LLMs are surprisingly good at this type of quizzing, they display a semblance of “theory of mind”.
I think this might be useful if you are supplementing your learning from actual sources. Like you casually said you understood counter-intuitive things about diabetes/insulin, dopamine and motivation etc., but these are very complex topics and require a lot of study to fully "understand". It's okay if you just see it as curiosity-driven learning, but I don't see this as a way to learn anything that is actually important or serious.
Traditional methods like looking stuff up, going through guided lessons, etc. are just more streamlined and faster than this method.
This is really good! Thank you for this skill, it seems like this method of learning is working really well for me, it is much more engaging, and I have just learned new things about my team's project by using it.
If it's a well-known concept (like pretty much anything you can find in an undergraduate textbook), the LLM doesn't need the whole context to teach you.
If it's something actually novel, no matter how much context you provide it'll still hallucinate.
Yes, it can be used to run any script in another pane, so it can spawn Claude or Codex for example. Also see the Tmux-cli “launch” sub-command: it will create a new pane and run a command.
I try to counteract this "comprehension-debt" by having the code-agent quiz me, Socratic-style, about the core problems, possible solutions, why certain approaches wouldn't work, and why the specific approach was implemented. When I don't answer correctly, the agent drills down and asks more questions, until I am led to the answer myself.
I find this surprisingly useful. The quiz forces me to put in effort in thinking through the problem and solutions. And this effort likely helps in learning, understanding and retention. I also find frontier LLMs are very good at this type of Socratic quiz; they give a very good semblance of having a "theory of mind".
I made a Socratic Quiz skill as part of a suite of code-agent productivity tools:
Most of the narrative is about how AI is writing all/most code, but I’d wager that the fraction of human reviewed code is approaching zero far faster than anyone is realizing or willing to admit.
Very true. Last year I at least glanced at every line of AI generated code. Now if some AI makes a 10k line program for some one-off tasks, I run the program, glance only over the output, and move on.
Especially if you're having an LLM write non-interactive scripts to calculate complex things from large datasets, glancing at the output is not enough to know if the output is remotely accurate (unless the output is so trivial you could literally do it in your head).
Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats to test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script was showing could have been achieved from certain small tweaks. I only noticed after pushing a total bet that the push registered on the output as a win - and only because I was carefully staying on top of it. A single stupid recursively operating >= instead of > had caused completely nonsensical results that looked plausible.
Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.
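A minimal sketch of how that kind of bug hides in plain sight. This is not the original script: the function names, the -110 payout, and the sample totals are all assumptions for illustration. The only point is that `>=` silently turns every push into a win and inflates ROI while the output still looks plausible.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    PUSH = "push"
    LOSS = "loss"


def settle_over(total: float, line: float) -> Outcome:
    """Settle an 'over' bet on a game total; landing exactly on the line is a push."""
    if total > line:
        return Outcome.WIN
    if total == line:
        return Outcome.PUSH
    return Outcome.LOSS


def settle_over_buggy(total: float, line: float) -> Outcome:
    """Same bet, but >= instead of > counts every push as a win."""
    return Outcome.WIN if total >= line else Outcome.LOSS


def roi(outcomes: list[Outcome], win_profit: float = 100 / 110) -> float:
    """Profit per unit staked, assuming 1-unit bets at -110 (an assumption); pushes return the stake."""
    profit = 0.0
    for outcome in outcomes:
        if outcome is Outcome.WIN:
            profit += win_profit
        elif outcome is Outcome.LOSS:
            profit -= 1.0
    return profit / len(outcomes)


LINE = 8
totals = [9, 8, 8, 7]  # hypothetical game totals; two land exactly on the line

correct = [settle_over(t, LINE) for t in totals]
buggy = [settle_over_buggy(t, LINE) for t in totals]

print(f"correct ROI: {roi(correct):+.3f}")
print(f"buggy ROI:   {roi(buggy):+.3f}")

# A boundary test like this is what catches it; glancing at the output does not.
assert settle_over(LINE, LINE) is Outcome.PUSH
```

The single boundary-case assertion at the end is the kind of check that would have flagged the problem before any ROI numbers were printed.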
Code needs to be tested. I'm glad that the bar of entry has been lowered, but now we just have a huge number of people who haven't yet learned anything about how to test and verify that the code meets the expected requirements.
Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
Calculate the engine power of a 2015 VW polo when travelling 70 mph on a flat road behind a box truck. Draw a chart of drag Vs follow distance. How significant is humidity on the result?
This is fine for one-off tools and I do the same. But for building long-lived "professional grade" production software, this fails real quickly.
My team is using AI for most of the code, but the human review layer is crucial and unavoidable if you're interested in things like reliability, uptime, controlled feature rollouts, the integrity of your users' data, etc.
Pretty much. For my home IT projects I have been playing around with various means of implementing agents.
I’ve looked at the outputs here and there - and holy hell would it never pass review if I were trying to make something robust and anti-fragile. But since I can just have AI spit out a fix for the horrific “code” when it breaks in a totally predictable manner it’s just not worth my time to try to actually sit down and get it done right. Or even fight with AI by providing a good specification and design guidelines.
I imagine this is how things are going in the real world, given 30 years of working with various levels of humans. So long as the output is “good enough” it is the extreme minority of folks who care about much else. And that’s for mid-level to senior folks who have the experience to know better. Juniors wouldn’t even be able to pick out most of even the most obvious anti-patterns AI tends to spit out such as putting configuration within code, etc.
Refactoring is just in a new world too, that us olds probably have a hard time with. It’s no longer examine the code, identify design gaps, find high leverage places to start fixing, etc. It’s now “this is broken, rewrite from scratch” when it eventually turns into too much spaghetti.
In some ways being entirely focused on the outcomes is freeing in a way. But man under the hood is crazy and a whole new world.
The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s). But now even that's unreliable because libraries are being slopified at an unreviewable pace too.
> The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer
I don't know many serious software engineers who'd take that approach. The convention was always to actually open up the code, evaluate the quality, see if the authors seem to know what they're doing, then choose the libraries you know work and can be adjusted to fit whatever you want. At least for professional development inside companies, not a single library would be included unless you had at least checked that the top-level dependency you pull in actually had code worth pulling in in the first place.
And this approach works just as well today as it used to: you literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".
It's weird that you think humans weren't slopifying code until LLMs came along. At least now they are implementing tests and CI and far more documentation, updating API versions, etc. OOMs above the amount they did before.
I'd also wager that far more % of code gets more coverage of review, via prompting AI to do it, than it did before.
Most PRs pass as long as they A. pass checks, B. don't introduce regressions, C. fix a bug or implement a feature. People talk about this era of humans reviewing code with nostalgia... but that never existed at scale.
I’d say the increased scrutiny has merely exposed the difference in care between the different groups in the industry. Seems to explain pretty well why both sides are equally confounded by the other’s expectations.
Which people? I’ve never worked at a place where reviews weren’t taken seriously. For small changes a cursory glance, sure, but anything medium-sized meant checkout+local test. If anything we’d spend too much time on code reviews or pair programming?
People keep saying this like it’s some meaningful point, but the reality is many people in different projects have a shared need for that code to work correctly, and there is social proof involved in using open source libraries. That is why people look at downloads and dependent projects as heuristics of stability and correctness. That is not the case with (and cannot be obtained with) code authored by generative AI.
Yes it can, the code will be run and you will have the proof that it ran well. Or it won't run well and you'll re-do it. Same as with some imported library.
Running your code once or even dozens of times is not proof of correctness, see decades of software engineering practices around testing.
I'm sorry, but all you guys pretending like you don't have to review code now are going to get seriously burned by this. I also have to ask: do you think you provide any value to your company?
I can see where it's coming from. Putting it starkly, at a high level, the broad effect of AI is this:
devaluation of expertise,
whether in coding, or drawing, or music composition, or writing, or translation, or so many other areas.
College students working hard to gain expertise in specific areas are faced with the prospect that this very expertise is being "democratized" by AI, putting it in the hands of literally anyone. Sure, true expertise is still needed to "validate" (and train) the AI, etc, etc, but that's a small consolation.
Relatedly, a year ago I was excited to learn the Rust language. Now I don't see the point (And I'm building tools with Rust). I'm sure this sentiment extends across fields.
No, expertise is more important when working with AI, because it can make mistakes. Expertise is the ability to predict, understand, and mitigate such mistakes.
In science for example, anyone can do an experiment about gravity. In fact millions of high school kids do every year. What makes an expert scientist is the ability to understand all the many ways such experiments can fail to accurately measure the underlying reality.
Or consider an AI writing a press release. A PR expert will catch nuances of wording that will confuse readers, or leave fodder for others to attack or mischaracterize the announcement.
College students know this because they are working with AI. And what makes them mad is the human-driven false notion that AI devalues expertise. AI looks like magic to non-experts. But it’s not, it’s more like a “junior engineer” or “PR intern” to people with actual expertise to evaluate its output.
You and I know this. The people making hiring decisions do not. Managers and CEOs are too enamored by the thought of reduced labor costs to see reason.
Facts don't matter, only what the person making the hiring decision believes to be true, or has been fed.
College grads are angry because their job prospects are bad due to AI hysteria. It has nothing to do with how good AI is, the hysteria is what is causing problems.
> College grads are angry because their job prospects are bad due to AI hysteria. It has nothing to do with how good AI is
I doubt it. If there was nothing behind the hysteria then there would be nothing to be afraid of.
If I was entry level I would be genuinely worried, because hysteria or not, I now have to compete with AI and prove I'm worth hiring. Not an easy thing to do.
So I don't think the anger is about not being able to find a job in the field today, it's about not being able to find one ever.
I agree with this (and the earlier comment about perceived expertise vs actual expertise), and I think it goes beyond hiring managers.
The core demoralizing fact is that when people perceive that AI can give results at least as good as human experts, they choose AI, because it is faster and/or cheaper.
Expertise is more important if you care about a good end result. People pushing for AI often don't care about the end result at all. They care about quantity over quality.
This can be really frustrating for someone who spent time getting experienced. They get hit twice. First they don't get a chance to do a job because "AI replaced you, sorry". Then they look at the result and what they see is low quality slop.
> Relatedly, a year ago I was excited to learn the Rust language. Now I don't see the point (And I'm building tools with Rust). I'm sure this sentiment extends across fields.
I'm in a very similar boat! I've had rust on my to-do list for a very long time, but never found the bandwidth in the personal life to actually dig in enough to get proficient. Since AI has come around, I've been able to write a lot of tools in rust and just learn little pieces as I need to. My first couple results were not very great as I didn't know what I was doing, but I've learned enough about structuring good rust apps from the experimentations that I can crank out something pretty decent now.
The AI is so good at holding my hand that it has fundamentally changed how I approach unknown languages and stacks. I used to pick the best stack that I was proficient with for the job. Now I pick the best stack for the job, and become proficient in it. Pretty wild times we live in.
That's fair, I should have defined proficient a little better. By proficient I mean, I can read Rust code and roughly understand what it's doing. I understand idiomatic patterns, and can identify when something, especially an AI, has gone against those. I am familiar with the tooling's capabilities and limitations, and I can make use of them directly. I can write Rust code without having to use AI, though I do still need to lean on documentation. I don't consider that to be non-proficient: I am definitely at least proficient in C++, Elixir, Ruby, JavaScript/TypeScript, and a few others, having written many non-trivial applications in all of those over the last 20 years, and I still reference documentation all the time. I can look at a Rust project's organization and infer details from it, and spot areas where things look janky. I can read and understand the details and code examples in the Rust book without having to look up earlier sections on syntax and things like that. The point where I would consider myself proficient was when I was able to read the ownership sections and understand them.
Note that when I say proficient, I in no way mean mastery. It will take years to get to that point. Rust is still one of my weakest languages overall, but I've been surprised at how quickly I've been able to get up to speed with AI assistance.
At this point, nothing I've written with AI is something I don't think I could have written by hand if I had significantly more time to do so.
On a side note, one thing that I have not enjoyed about the Rust community, is a general attitude that rust is hard. I personally find rust to be a whole lot easier than c++ was/is. There's definitely a lot to learn around the ownership model, but it's not rocket science. One of the things I love about Rust is how expressive it is, without compromising on performance and developer empowerment. I'm not implying that this is what you did with your comment as I have no idea what your intentions or thoughts were, just making an observation that this is something I haven't liked.
AI is definitely not a silver bullet for anything, but it has bridged a gap that kept me from diving fully into rust in the past, which is that at the end of the day I need to actually ship something. I learn languages for fun, but also for practical use. A theoretical language that I never use is not interesting to me because it's not useful to me because I can't ship anything with it. AI lets me ship actually useful things for just myself as part of the learning process, and it also gives me a great opportunity to debug Rust code that I know the exact intention of. When trying to clone someone else's project and review or debug that, there's a massive upfront step of understanding what it's supposed to do. When it's my code AI generated, I know what it's supposed to do because the requirements/prompts came from my own mind. That's hugely powerful and something that a lot of other old school developers don't seem to understand yet.
I would frame it more that AI will cause the value of knowledge to plummet.
College provides knowledge but never provided expertise. That comes from experience in the real world. Capturable value has always been in the application of knowledge.
Experience will possibly become more valuable as a pipeline of people stop entering many industries. Some of that will be very industry specific in terms of market forces and has still to play out.
I see your point but it's the wrong framing I think. The etymology of education is “to train, mold, nurture”, “to draw out”. Task output can be emulated more cheaply, sure.
I don’t understand how this, in the context of people like Eric Schmidt lecturing people about AI, is putting it in stark terms. Putting it starkly would be to call out these millionaires’/billionaires’ ambitions to put them out of a job, permanently. But as usual the Silicon Valley tech disruption is put in terms of “democratizing” X (scare quotes or not), just like taxi side hustling has been democratized I guess.
People aren’t afraid of being out of a job, they say. It’s the usual jealously guarded guild expertise, by people who haven’t even entered any professions yet.
In the context of coding agents, one thing I find surprisingly useful is this: rather than have the code-agent explain to me what it did and why, I have it quiz me Socratic-style. It presents a scenario or problem, and asks me why a certain idea would not work. It forces me to put effort into thinking it through, and even if I answer wrong, it avoids giving away the answer and persists in drilling down with further questions, until I eventually arrive at the answer. This type of effortful thinking likely helps both learning and retention. I find the frontier LLMs very good at this type of quizzing. I made this Socratic Quiz as part of my suite of plugins for Claude-Code or Codex:
https://pchalasani.github.io/claude-code-tools/integrations/...