I feel like I'm using Claude Opus pretty effectively and I'm honestly not running up against limits in my mid-tier subscriptions. My workflow is more "copilot" than "autopilot", in that I craft prompts for contained tasks and review nearly everything, so it's pretty light compared to people doing vibe coding.
The market-leading technology is pretty close to "good enough" for how I'm using it. I look forward to the day when LLM-assisted coding is commoditized. I could really go for an open source model based on properly licensed code.
I also use it this way and I'm overall pretty happy with it, but it feels like they really want us to use it in "autopilot" mode. It's like they have two conflicting priorities of "make people use more tokens so we can bill them more" and "people are using more tokens than expected, our pricing structure is no longer sustainable"
(but I guess they're not really conflicting, if the "solution" involves upgrading to a higher plan)
I feel like they are making it harder to use it this way. Encouraging autonomous use is one thing, but it really feels more like they are handicapping engaged use. I suspect it reflects their own development practices and needs.
This is something I've thought of as well. The way the caps are implemented really disincentivizes engaged use. The 5-hour window especially is very awkward and disruptive. The net result is that I have to somewhat plan my day around when the 5-hour window will affect it. That by itself is a powerful disincentive to using Claude. It has also caused me to use different tools for things I previously would have used Claude for. For example, for detailed plans I now use codex rather than Claude, because I hit the limit way too fast when doing documentation work. It certainly doesn't hurt that codex seems to be better at it, but I wouldn't even have a codex subscription if it wasn't for Claude's usage limits.
Wow, weird to see someone mirror my experience so closely. At the $100 plan my day was being warped around how to maximise multiple 5 hour sessions so that it felt worth it. Dropped down to the $20 plan and stopped playing the game as I know I'll just consume the weekly usage in the few days I have free. Meanwhile codex gave me a free month, their 5HourUsageWindow:WeeklyUsageWindow ratio feels way better balanced and I get way more work done from it. Similar to you, any task involving reading/reviewing docs [or code reviews] now insta-nukes claude's usage. My record is 12 minutes so far...
Another big one for me is that they dropped the cache TTLs. It is normal for me to come back to a session an hour later, but someone "autopilot"-ing won't have such gaps.
not just the cache though. every time you stop and come back, it basically reloads the whole session. if you just let it keep going, it counts like one smooth run. you hit the wall faster for actually checking its work.
It was probably the bug about cache getting purged after 5min rather than 1hour. You can review things pretty well within an hour. 5min is a real crunch. 5min doesn't mix with multitasking or getting interrupted.
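To put rough numbers on the TTL point, here is a minimal sketch. It assumes a simplified model where a prompt hits the cache only if it arrives within the TTL of the previous prompt. The function name and the gap values are made up for illustration, and this is not a description of how any provider actually meters usage.

```python
# Hypothetical model: a prompt is a cache hit only if it arrives
# within `ttl_minutes` of the previous prompt; otherwise it is a miss.
def count_cache_misses(gaps_minutes, ttl_minutes):
    """Count prompts that arrive after the cache has already expired."""
    return sum(1 for gap in gaps_minutes if gap > ttl_minutes)


# Assumed gaps (in minutes) between prompts in a review-heavy session.
review_gaps = [2, 12, 45, 8, 20, 3]

print(count_cache_misses(review_gaps, ttl_minutes=5))   # 4 misses
print(count_cache_misses(review_gaps, ttl_minutes=60))  # 0 misses
```

Under these assumed gaps, most "stop and check the work" pauses fall outside a 5-minute TTL but comfortably inside a 1-hour one.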
I think the culty element of AI development is really blinding a lot of these companies to what their tools are actually useful for. They’re genuinely great productivity enhancers, but the boosters are constantly going on about how it’s going to replace all your employees and it’s just. . .not good for that! And I don’t mean “not yet” I mean I don’t see it ever getting there barring some major breakthrough on the order of inventing a room-temp superconductor.
I agree with you, the "replacing people" narrative is not only wrong, it's inflammatory and brand suicide for these AI companies who don't seem to realize (or just don't care) the kind of buzz saw of public opinion they're walking straight towards.
That said, looking at the way things work in big companies, AI has definitely made it so one senior engineer with decent opinions can outperform a mediocre PM plus four engineers who just do what they're told.
Do you have any good resources on how to work like that? I made the move from "auto complete on steroids" to "agents write most of my code". But I can't imagine running agents unchecked (and in parallel!) for any significant amount of time.
Right now, I'm finding a decent rhythm in running 10-20 prompts and then kind of checking the results a few different ways. I'll ask the agent to review the code, I'll go through myself, I'll do some usability and gut checks.
This seems to be a good window where I can implement a pretty large feature, and then go through and address structural issues. Goofy things like the agent adding an extra database, weird fallback logic where it ends up building multiple systems in parallel, etc.
Currently, I find multiple agents in parallel on the same project to be not super functional. There are just a lot of weird things: agents get confused about work trees, git conflicts abound, and I found the administrative overhead to be too heavy. I think plenty of people are working on streamlining the orchestration issue.
In the meantime, I combat the ADD by working on a few projects in parallel. This seems to work pretty well for now.
It's still cat herding, but the thing is that refactors are now pretty quick. You just have to be aware of them.
I was thinking it'd be cool to have an IDE that did coloring of, say, the last 10 git commits to a project so you could see what has changed. I think robust static analysis and code as data tools built into an IDE would be powerful as well.
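A rough sketch of the coloring idea, under stated assumptions: the hashes of the last N commits could come from something like `git log -n 10 --format=%H`, and the per-line commit hashes from `git blame --line-porcelain`. The function name and the sample hashes below are hypothetical, and parsing the git output is left out.

```python
# Sketch only: the names below are hypothetical, not from any real IDE plugin.
def recency_index(line_commits, recent_commits):
    """For each line's commit hash, return its position in recent_commits
    (0 = newest), or None if the line predates those commits."""
    rank = {sha: i for i, sha in enumerate(recent_commits)}
    return [rank.get(sha) for sha in line_commits]


# Made-up hashes: "c3" is the newest commit, "c1" the oldest of the recent ones.
recent = ["c3", "c2", "c1"]
blame = ["c1", "old", "c3", "c3"]
print(recency_index(blame, recent))  # [2, None, 0, 0]
```

An editor could then map index 0 to the strongest highlight and fade toward None, so the freshest changes stand out.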
The agents basically see your codebase fresh every time you prompt. And with code changes happening much more regularly, I think devs have to build tools with the same perspective.
It will come naturally! I have started with autocomplete as well. I was stumbling upon different problems and was fixing them by implementing best practices. Current stack is:
1/ Claude Code with yolo mode
2/ superpowers plugin
3/ red/green tdd
4/ a lot of planning and requirements before writing any code
It feels like you are always touching the edge of what the models and your current workflow can handle. Delegate a more complex task, and the system fails. Delegate a simpler one, and the system works great. Improve your workflow and you move that complexity to a higher level.
But... I've been an LLM power user for more than a year and a half now. I can't delegate, precisely because I've reviewed a lot of LLM code, and it is never good enough for me to step down from reviewing everything manually. I can understand how you can vibe code a dashboard or tests, but vibe coding your entire backend without checking it through carefully? Madness.
For me, you open a markdown editor and draft a code plan with the details of what you'd do as a coder at a high level. Then bust into whatever tool in planning mode (I usually fire this into the opus 4.5 model), have it break the plan down into concise steps, and hand it off to a simple model (gpt spark, sonnet, composer or whatever) to execute. When I feel frisky I'll just have opus one-shot it, and it can be done in a few minutes.
I use Claude “on the web” or Google Jules. Essentially everything happens in a sandbox - so yolo isn’t a huge risk. You can even box its network access. You review the PR at the end or steer it if it’s veering off course.
I have Max 5x and use only Claude Opus on xhigh mode. I don't use agents, or even MCPs, and stick to Claude Code.
I find it incredibly difficult to saturate my usage. I'm ending the average week at 30-ish percentage, despite this thing doing an enormous amount of work for (with?) me.
Now I will say that with pro I was constantly hitting the limit -- like comically so, and single requests would push me over 100% for the session and into paying for extra usage -- and max 5x feels like far more than 5x the usage, but who knows. Anthropic is extremely squirrely about things like surge rates, and so on.
I'm super skeptical of the influx of "DAE think Opus sucks now. Let's all move to Codex!" nonsense that has flooded HN. A part of it is the ex-girlfriend thing where people are angry about something and try to force-multiply their disagreement, but some of it legitimately smells like astroturfing. Like OpenAI got done paying $100M for some unknown podcaster and started hiring people to write this stuff online.
I was in the same boat until the last few days, when just a handful of queries were enough to saturate my 5h session in about 30 mins.
Recently I've gotten Qwen 3.6 27b working locally and it's pretty great, but still doesn't match Opus; I've got to check out that new Deepseek model sometime.
Yea, I never got how people are even able to hit the weekly limits so consistently. Maybe it's because they use it for work? But in that case, you would expect the employer to cover it so idk.
>I'm super skeptical of the influx of "DAE think Opus sucks now. Let's all move to Codex!" nonsense that has flooded HN. A part of it is the ex-girlfriend thing where people are angry about something and try to force-multiply their disagreement, but some of it legitimately smells like astroturfing. Like OpenAI got done paying $100M for some unknown podcaster and started hiring people to write this stuff online.
A lot of people are angry about the whole openclaw situation. They are especially bitter that when they attempted to justify exfiltrating the OAuth token to use for openclaw, nobody agreed with them that they had the right to do so, and sided with Claude that different limits for first-party use is standard. So they create threads like this, and complain about some opaque reason why Anthropic is finished (while still keeping their subscription, of course).
If only OpenAI spent a significant amount of money on some kind of generative software that was predominantly trained on internet comments that'd be able to do all the astroturfing for them...
A bunch of green accounts would be a bit of a tell. They need to use established accounts, ideally pre-llm, for astroturfing. This is going to be increasingly true.
> the day when LLM-assisted coding is commoditized
Like yesterday? LLM-assisted coding is $100/mo. It looks very commoditized when most houses in the developed world pay more for electricity than that.
My definition of LLM-assisted coding is that you fully understand every change and every single line of the code. Otherwise it's vibe coding. And I believe if one is true to this principle, it's very hard to deplete the quota of the $100 tier.
> Like yesterday? LLM-assisted coding is $100/mo. It looks very commoditized when most houses in developed world pay more for electricity than that.
But, it's not $100/mo. I think the best showcase of where AI is at is on the generative video side. Look at players like Higgsfield. Check out their pricing and then go look at Reddit for actual experiences. With video generation the results are very easy to see. With code generation the results are less clear for many users. Especially when things "just work".
Again, it's not $100/month for Anthropic to serve most uses. These costs are still being subsidized, and as more expensive plans roll out with access to "better" models and "more" tokens and context, the true cost per user is slowly starting to be exposed. I routinely hit limits with Anthropic that I hadn't been hitting for the same (and even less) utilization. I dumped the Pro Max account recently because the value wasn't there anymore. I am convinced that Opus 3 was Anthropic's pinnacle at this point, and while the SotA models of today are good, they're tuned to push people towards paying for overages at a significantly faster consumption rate than a right-sized plan for usage.
The reality is that nobody can afford to continue to offer these models at the current price points and be profitable at any time in the near future. And it's becoming more and more clear that Google is in a great position to let Anthropic and OAI duke it out with other people's money while they have the cash, infrastructure and reach to play the waiting game of keeping up but not having to worry about all of the constraints their competitors do.
But I'd argue that nothing has been commoditized as we have no clue what LLMs cost at scale and it seems that nobody wants to talk about that publicly.
> I think the best showcase of where AI is at is on the generative video side. Look at players like Higgsfield. Check out their pricing and then go look at Reddit for actual experiences. With video generation the results are very easy to see
Video is a different ballgame entirely, it's less than realtime on _large_ gpus. Moreover, because of the inter-frame consistency it's really hard to transfer and keep context.
Running inference on text is, or can be, very profitable. It's research and dev that's expensive.
My point wasn't the delta in work between video and text generation. It was that the degradation of a prompt is much more visible (because: literal). But, generally agree on the research/dev part.
> fully understand every change and every single line of the code.
I'm probably just not being charitable enough to what you mean, but that's an absurd bar that almost nobody conforms to even if it's fully handwritten. Nothing would get done if they did. But again, my emphasis is on the fact that I'm probably just not being charitable to what you mean.
You're most likely being pedantic, like when someone says they understand every single line of this code:
x = 0
for i in range(1, 10):
    x += i
print(x)
They don't mean they understand the silicon substrate of the microprocessor executing microcode or the CMOS sense amplifiers reading the SRAM cells caching the loop variable.
They just mean they can more or less follow along with what the code is doing. You don't need to be very charitable in order to understand what he genuinely meant, and understanding code that one writes is how many (but not all) professional software developers who didn't just copy and paste stuff from Stackoverflow used to carry out their work.
How is that an absurd bar? If you're handwriting code, you'd need to know what you actually want to write in the first place, hence you understand all the code you write. Therefore the code the AI produces should also be understood by you. Anything else than that is indeed vibe coding.
A lot of developers don't actually understand the code they write. Sure nowadays a lot of code is generated by LLMs, but in the past people just copied and pasted stuff off of blogs, Stack Overflow, or whatever other resources they could find without really understanding what it did or how it worked.
Jeff Atwood, along with numerous others (whom Atwood cites on his blog [1]), was not exaggerating when he observed that the majority of candidates who had existing professional experience, and even MSc degrees, were unable to code very simple solutions to trivial problems.
It's an absurd bar if you are being an uncharitable jerk like I was. The layers go deep, and technically I can claim I have never fully grasped any of my code. It is likely just a dumb point to bring up tbh.
I saw your reply to another comment [0], I see what you mean now. By "understand each line of code" I meant that one would know how that for loop works, not the underlying levels of the implementation of the language. I replied initially because lots of vibe coding devs in fact do not read all the code before submitting, much less actually review it line by line and understand each line.
Well, that is how it mostly worked until recently... unless the developer copied and pasted from Stack Overflow without understanding much. Which did happen.
I do. If you don't, maybe you shouldn't be writing software professionally. And yes, I've written both DBs and compilers so I do understand what is happening down to the CMOS. I think what you are doing is just cope.
nah, you're kinda encapsulating what i viewed in my mind:
at what level of abstraction can you claim to actually "understand" the code?
You're claiming to understand down to the CMOS, but you are failing to even engage with what level of understanding should be accepted. Is "down to the CMOS" the bar? Because then you're gonna be fighting an uphill battle as potentially the only human who traces a simple hello world python script down to it, because that's not how people develop software with high level languages.
Is understanding the print()'s underlying code the bar? Seems fairly gatekeepy. It's kinda intuitive what a print does; everyone trusts it's gonna do what it's designed to do in the same way we trust the water that comes out of our faucets.
>LLM-assisted coding is $100/mo. It looks very commoditized when most houses in the developed world pay more for electricity than that.
this is a small nit, but you still have to pay your electric bill, the $100/mo is on top of that. if you're doing cost accounting you don't want to neglect any costs. Just because you can afford to lease a car, doesn't mean you can afford to lease a 2nd car.
Commoditization will be complete for my purposes when an LLM trained on a legitimately licensed corpus can achieve roughly what Opus 4.5+ or the highest powered GPTs can today.
I anticipate a Napster-style reckoning at some point when there's a successful high-profile copyright suit around obviously derivative output. It will probably happen in video or imagery first.
In industry, the cost is more than 100/mo for engineers. With increased adoption and what I know now, I expect full time devs to rack up $500-$2000 usage bills if they're going full parallel agentic dev. Personal usage for projects and non-production software is not a benchmark IMO
I work with a lot of full-time devs, and it is very hard to go beyond the $200 max plan. If you use API credits, and I think the enterprise plan kind of forces you to do this, you can definitely incur this much, particularly if you're not using prompt caching and things like that.
But I and others in my company have very heavy usage. We only rarely, with parallel agentic processes, run out of the $200 a month plan.
And what do I mean by "hard"? I mean, it requires a lot of active thinking to think about how you can actively max it out. I'm sure there's some use cases where maybe it is not hard to do this, but in general, I find most devs can't even max out the $100 a month plan, because they haven't quite figured out how to leverage it to that degree yet.
(Again, if someone is using the API instead of subscription, I wouldn't be surprised to see $2,000 bills.)
Business/Enterprise accounts are billed at $20/seat + API prices, not subscription prices. You can give them a monthly dollar quota or let them go unlimited, but they're not being subsidized like in team. And team can't get a 20x plan from what I can tell.
I do. Do you? A company providing a cheaper subscription plan is not a subsidy.
I assume you meant loss-leader. We can’t know that without knowing their financials. The actual marginal cost of inference is demonstrably less than $200/mo though, so it’s not clear whether they are operating at a loss. Without seeing their books we can’t know.
Fascinating, but it's a bummer that DeepSeek does not offer a DPA or an opt-out for training. This renders it unusable for my use cases unfortunately. At least z.ai GLM has somewhat of a DPA in Singapore.
The provider is a massive issue. People moving off Claude tend to assume this is solved.
Claude's uptime is terrible. The uptime of most other providers is even worse...and you get all the quantization, don't know what model you are actually getting, etc.
OpenRouter and I'm toying around with Hermes. Seems good so far, but haven't really gotten into anything heavy yet. Though the "freedom" of not sweating the token pause and the costs not being too high is real.
Thx. I'll try it with my personal projects (because, due to the data collection and ToS, most providers are forbidden at my company), if I can opt out of training on my input.
I'm just getting a bit tired of using Opus 2.6, which eats my whole allowance and then some £££ going through the 4kB prompt to review a ~13 kB text file twice - and that's on top of the sometimes utterly bonkers, bad, lazy answers that I'm not getting even from the local Gemma 4 E4B.
I don’t have the prompt at hand but basically I told Kimi (paraphrasing): I have these Claude code skills, and I know it uses different tool calls than you but read them and re-write them as your own tools.
I also created a mini framework so it can test that the skills are actually working after implementation.
Same. Never hit a limit. Use it heavily for real work. Never even thought of firing off an LLM for hours of...something. Seems like a recipe for wasting my time figuring out what it did and why.
Similar with the copilot and not autopilot usage. I find it's the best of them all. Mostly I just use it as an occasional search engine. I've never found LLMs to be efficient at actually doing work. I do miss the day when tech docs were usable. Claude seems like a crutch for gaps in developer experience more than anything.
Honestly, it sounds like, assuming you have no ethical qualms, you could get by with a Mac or AMD 395+ and the newest models, specifically QWEN3.5-Coder-Next. It does exactly as you describe. It maxes out around 85k context, which if you do a good job providing guard rails, etc, is the length of a small-medium project.
It does seem like the sweet spot between WallE and the destroyed earth in WallE.
I have ethical qualms to varying degrees with most LLMs, primarily because of copyright laundering.
I'm a BSD-style Open Source advocate who has published a lot of Apache-licensed code. I have never accepted that AI companies can just come in and train their models on that code without preserving my license, just allowing their users to claim copyright on generated output and take it proprietary or do whatever.
I would actually not mind licensing my work in an LLM-friendly way, contributing towards a public pool from which generated output would remain in that pool. Perhaps there is opportunity for Open Source organizations to evolve licenses to facilitate such usage.
For what it's worth, I would be happy to pay for a commercial LLM trained on public domain or other properly licensed works whose output is legitimately public domain.
that's pessimistic. do the calc assuming Cloud provider X changes your nondeterministic output every Y months by Z probability and increases prices by 10% every 6 months.
slow and steady is worth exponentials. keep slopppping it my boid.
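For the price half of that calc, a minimal sketch. The $100/mo starting price is taken from elsewhere in the thread, the 10% bump every 6 months is the figure suggested above, and the function name is made up. The "Y months by Z probability" part is left out because no values were given for it.

```python
def price_after(months, start=100.0, bump=0.10, every=6):
    """Monthly price after compounding `bump` once per `every` months.

    All defaults are assumptions for illustration, not real pricing data.
    """
    return start * (1 + bump) ** (months // every)


# Print the assumed monthly price after 1, 2 and 5 years of compounding.
for years in (1, 2, 5):
    print(years, "years:", round(price_after(12 * years), 2))
```

Under these assumptions the monthly price is roughly 2.6x the starting price after five years, since ten bumps of 10% compound to about 1.1^10 ≈ 2.59.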