Hacker News | tyingq's comments

I see what you mean, though ITAR restricted software has been around for decades. It classifies some software as "munitions" :)

Once the end-user fail-safe of "use a real heuristic ad blocker that's hard to get around" is gone, the incentive for ad platforms to get around the relatively easy hostname-based blockers goes WAY up. They know it won't drive people to more sophisticated ad blockers if the technical barriers for those are high.
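
To make the "relatively easy hostname based blockers" point concrete, here is a minimal sketch of how such a blocker decides, and why it is simple to sidestep. The blocklist entries and hostnames are made up for illustration, and real blockers are more involved than this.

```python
# Hypothetical blocklist entries, for illustration only.
BLOCKED_HOSTS = {"ads.example.net", "tracker.example.org"}


def is_blocked(hostname: str) -> bool:
    """Return True if the hostname or any parent domain is on the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in BLOCKED_HOSTS for i in range(len(labels)))


# A known third-party ad host is caught, subdomains included.
print(is_blocked("cdn.ads.example.net"))    # True
# The same ad served from the publisher's own hostname passes straight through.
print(is_blocked("www.publisher.example"))  # False
```

The only signal is the hostname, so moving ad delivery onto a first-party or rotating hostname defeats it. That is the gap a heuristic blocker covers.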

Google's playbook of slowly eking this stuff out so that you don't notice you're in the boiling pot has played out several times.


Much, much, earlier if you squint a little. 1998.

http://infolab.stanford.edu/~backrub/google.html

"The Anatomy of a Large-Scale Hypertextual Web Search Engine"

-> "Appendix A: Advertising and Mixed Motives"

-> "The goals of the advertising business model do not always correspond to providing quality search to users."

Not browser specific, but they understood the problem well.


Thank you for sharing this. I imagine at first it wasn't clear that the browser would become our primary interface with the web. Microsoft hoped it would be the OS itself with Windows 98, if I'm not mistaken.

It will work fine until Google and others stomp out all easy paths to smart, heuristic based ad blockers.

The ad folks don't work around uBO Lite now, because they understand it drives people to the full uBO.

Once that option is not really reasonable for anybody not highly technical... That's when the fun begins.


Were things like "300 employees" and descriptions of the deliberately low-key HQ out there before? That counts as actual information to me.

Low-key HQ is normal, as they often share office buildings and external signs need approval/permits. I expect that is also the case in the US, right? I am sure they have a name badge on the first floor, as that is common.

> 300 people

What did you expect, thousands?


It mentioned the lobby also, not just external signage. Yes, it's unusual to be that low profile.

I did not have expectations about the number, but now there is a number.


> it's unusual to be that low profile.

Perhaps I got too used to this ;-)


It will absolutely cause some non-trivial number of customers to shift their configs away from Anthropic.

It's worthwhile to remember that this is only true of Mythos/Fable and other future models of "similar or higher capability levels" (ant is treating this as a new tier of model above Opus). Anyone who's already been happy using Haiku/Sonnet/Opus on Bedrock will not be affected by this at all.

Yes and no. Anthropic controls what is determined to be "similar or higher" and when models are deprecated. Will Sonnet 4.7 be "too powerful"? Because once it's released, 4.6's days are numbered.

This created a huge future risk for our org and we're already scheduling meetings over it. Regulated industry, we can't lose control over our data governance or residency controls, let alone the lack of visible audit trails that could reveal customer or PII.

Just an absolute bomb of a release


>Anyone who's already been happy using Haiku/Sonnet/Opus on Bedrock will not be affected by this at all

It is still adding operational overhead because we now need to vet all models and deny access to any retaining data

Previously it was "use and experiment with anything Bedrock offers--the data stays in AWS so we are not concerned"


So basically all models going forward?

I don't think anyone currently thinks the Haiku/Sonnet/Opus models are "good enough" such that they would not want improvements. Users may be cost conscious, but almost every task could be done better.


+1 to other commenters here.

* They forced Bedrock for instance to change the existing settings for ZDR / ZOA. It used to be enough to have a default. Now we must set to 'none' and pray it does what it says.

* And then there is that BS about "contact your account manager, we will decide account/model retention and sharing individually" Just this creates so much uncertainty that Bedrock has become "glowing in the dark".

* We have already moved everything to Gemini on Vertex.

PS: this is what you should see as an error from Bedrock. Anything else is not enough today: "AWS Bedrock Error: An error occurred (ValidationException) when calling the ConverseStream operation: The model returned the following errors: data retention mode 'none' is not available for this model"
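
If you want to automate that check, a rough sketch might look like the following. The function name and the second test case are invented for illustration. The only grounded part is the message text quoted above. With boto3, the code and message would normally be read from `ClientError.response["Error"]`, but how you obtain them is up to your client.

```python
# Marker text copied from the error quoted above; everything else is an assumption.
RETENTION_REJECTED = "data retention mode 'none' is not available for this model"


def is_retention_rejection(error_code: str, message: str) -> bool:
    """True only for the fail-closed rejection quoted in the comment."""
    return error_code == "ValidationException" and RETENTION_REJECTED in message


quoted = (
    "The model returned the following errors: "
    "data retention mode 'none' is not available for this model"
)
print(is_retention_rejection("ValidationException", quoted))               # True
print(is_retention_rejection("ThrottlingException", "Too many requests"))  # False
```

Treating anything other than this rejection as "not verified" keeps the check fail-closed, which matches the spirit of "anything else is not enough".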


Which will work for the several weeks it takes for the other commercial providers to follow suit.

The tides are turning. AI companies are IPO'ing. They've gotten where they are by selling $5 bills for $1, to update the old VC adage. I think we can look forward to them rewriting the contracts, both literal and social, on AI going forward to capture a lot more of the value. Or, to put it in more HN-friendly terms, it may not be immediately obvious on a casual viewing, but you're looking at the beginning of the enshittification process hitting AI. The term is a bit deceptive in some sense, because it's not like anyone ever sets out with a terminal goal of making something shitty. It's downstream of trying to capture more value in the customer/vendor relationship by not giving the customer any more value than is barely necessary.

How's coding with qwen doing? The only thing that's going to stop the AI providers from extracting all the value until it's just barely worth using is the free competition.


Bedrock supports many models. Open weights models aren't far behind, maybe a year, 18 months.

That they could have done this with data residency rules respected, and chose not to, tells me all I need to know: this is for Anthropic's IPO, not for user safety.


>Open weights models aren't far behind, maybe a year, 18 months.

No, open weights are always a year or more behind. By the time that year passes Anthropic/OpenAI/Google will have some new model that is ahead of the open models by a year.

Looking at computer security for the last 30 years, no one gives a fuck about user safety. Companies care about profits, and individuals don't care enough for strong laws.

We'll be back here in another year on HN talking about why we should give our retina sample and blood to Anthropic to use the model with a ton of people doing it. It's just the way humans are.


Surely some provider will see the then open opportunity and offer something to capture it.

You’re underestimating how much companies are willing to bend over backwards if they can “get ahead with a god model” compared to their competitors.

No, I'm not. Yes, those companies exist. And so do many companies on the other end, where they bend over backwards to ensure their data only lands in places where they have the exact contractual language they want. Any stodgy F500 typically falls in that category. They would not likely be using Anthropic through the AWS "bridge" in the first place if they were chasing latest/greatest.

Would be really curious what the internal market share for Kiro is. Not a really good look for it if it's just smattering use here and there.

It's the worst harness I've ever used. And they are selling it like crazy to all their enterprise customers who don't know any better. A true money printer at this point.

I like it a lot. Why don't you?

The abrupt swing in many non-technology company IT departments from "hey developer, you aren't using enough tokens" to this is just too funny.

And I'm seeing almost no self-awareness from leaders. They are making decisions about things that they just don't understand. And are completely unworried about it. Just blindly following whatever the news cycle is about AI.


The closer people live to the consequences of their decisions, the more rational they become. Until leaders (and I use that term loosely) are held accountable, the insanity will continue.

Their only accountability is to the stock price. The insanity will continue.

As long as our stock price continues to... Continues to rise... Which... Hmm... I'm just now reading our balance sheet. Is this number right? Great, thanks.

As I was saying, you're all fired.


I’m willing to bet that most of us here are capable of acquiring pitchforks and torches.

I predict that will be their comeuppance; it will begin a new era in history.


In addition to being true, this observation is profound. When designing any multi-step system that relies on humans making decisions, whether in governance, organizations or economies, placing root causes as close to end effects as possible is almost always better.

I’m sorry you are used to working with out of touch leadership. Not all companies are like that. Even big ones can have smart, empathetic leaders. Although very often money gets in the way of empathy.

Money also has the problematic tendency to warp the people around you; it's its own kind of gravity. The more powerful you are, the more you attract yesmannerism and the more you lose touch with what's going on.

Also notably these attributes don’t make one infallible. I see a lot of engineers judging from the sidelines without any sense of how to run large orgs and how you have to make tough calls with imperfect info all the time.

Being out of touch is the default state for leadership. They mostly just parrot the news with a multi-month lag.

You hiring?

I've been enjoying journalist Ed Zitron's recent diatribes about how impossible it is to find a business leader who had a plan for measuring their ROI from adopting AI coding.

What he says he's consistently hearing from them mirrors what I saw at my own employer: they thought they had ROI metrics, but they actually only had usage metrics such as "lines of code committed" or "number of pull requests". The only way those could possibly work as an ROI measure is if your business charges customers by the line of code.


What that really means is they had no valid metric for measuring developer productivity before either, AI or not.

Measuring productivity of developers isn’t really in line with what needs to happen, either. A team can be incredibly productive and still generate negative 100% ROI if what they are building so industriously is stuff that nobody wants to buy.

Which reflects another thing I’ve seen at work. A lot of what AI coding has enabled is diving headfirst into quagmires. Our costs have spiked - not just because of the token spend, also because we gotta pay the cloud platform to run all these new services, operators to operate them, marketers to market them, etc. - but revenue hasn’t budged.
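
For anyone who wants the "negative 100% ROI" arithmetic spelled out, here is a small sketch using the textbook definition of ROI. The dollar figures are invented. Notice that lines of code and pull request counts never appear in the formula.

```python
def roi(revenue_gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue_gain - cost) / cost


# Hypothetical: a very busy team ships features nobody buys.
busy_but_unsold = roi(revenue_gain=0.0, cost=250_000.0)
print(f"{busy_but_unsold:.0%}")  # -100%

# Hypothetical: same spend, but it brings in 300k of new revenue.
print(f"{roi(revenue_gain=300_000.0, cost=250_000.0):.0%}")  # 20%
```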


But at least pre-AI, most managers presumably subjectively measured devs on relevant performance. Using systems where employees who burn the most tokens ($) per week ‘win’ is crazy: just ask the AI to spin up subagents to implement every conceivable approach to a task, then spin up an agent judge to pick the winner, and repeat. You've immediately got 50x or whatever your previous usage from that alone.

I had cynically done this sort of tokenmaxxing for a while as a burnt offering to the token-hungry non-leadership.

Eventually I got tired of it and got back to work.


During ZIRP they discovered that the way to lead companies nowadays is to become a maxxer of whatever the current fad is, and the more you maxx the better. Then, when things change and you're wrong, you get to be a strong leader and, in ZIRP's case, fire everyone you over-hired. With AI it will be similar.

Why be a normal guy that waits to see what happens and is measured and pragmatic when you can get attention basically through the whole cycle by being the earliest adopter, adopt it to the maxx, then also be the loudest big brain when the tide changes and be praised for "taking hard decisions" when you revert everything you said so far?

The fakemaxxing economy.


A special case of the more general cringe economy we're in. The dumbest, most outrageous ideas win, amplified by social media. Say stupid sh*t loudly, be wrong, profit.

Groups resist change: the bigger the group, the more resistance there is.

As a leader, pushing for rapid change cannot really be nuanced lest the push dissipates into the organization's entropy.


Perhaps, but the change you get (if any) is most likely to be what you push for and reward/punish.

It's irrational to push for tokenmaxxing (literally "please increase our AI spending") and not expect that this is the result you are going to get. You won't get productivity increase, since that is not what you are pushing for - you will get token usage maximization (engineers running inane agentic tasks against your code base to increase usage, using company paid AI for their side projects, etc, etc).


The evidence suggests that many tech leaders do not realize that an immediate result of heavy handed uninformed top down decision making is transforming the “work together, succeed together, giving quality” ethos into a cynical game theory minimax effort to game whatever stupid arbitrary metrics are used to implement the top down fad of the quarter; do it consistently and you get a work force that can be given a metric and immediately, instinctively, tell you how the work flow will be adjusted for the new metric, and where the difficult problems will be shunted to.

I'm not sure the leaders would disagree with what you're saying. They tokenmaxxed to understand what it looks like when AI gets into every corner of the business; now they feel they've gotten enough info (or at least that more info wouldn't be worth the cost), so they're adding in cost controls. As the article says, this is not great for AI model providers trying to predict what their future revenue is going to be, but it's not obvious that there's any mistake here for AI users.

> They tokenmaxxed to understand what it looks like when AI gets into every corner of the business

Perhaps that is what they were trying to do, but the reality is that all they will have got is a large token bill. The decision makers may have hoped that tokens would be used in most productive fashion possible so they could evaluate if the cost was worth it, but what they will have actually got is what they asked for and measured, high token usage (applied to whatever people needed to do to get their usage stats up, regardless of productivity).

The other business-as-usual factor is that there will be false reporting up the chain, so if the company understands the CEO wants to see high AI usage and productivity gains, then s/he will see high AI usage (a large token bill) and will be fed success reports of corresponding productivity gains.

In a typical corporate environment, if all your peers are reporting success, achieving what the CEO wants, do you want to be the only one reporting failure? So - everyone reports high AI usage (easy for the employees to make happen), and most everyone also reports productivity gains if they understand this is the expectation.


I’m imagining a lot of programmers suddenly being given the impossible task of reporting what worked and what didn’t, and middle management making up some retrospective evaluation with fat PowerPoint decks and meaningless graphs in an effort to present to C-levels some measures of success other than token use.

As the saying goes "figures can't lie, but liars can figure".

If you want to report productivity gains or cost savings from some initiative (increased AI usage or whatever) and need some stats to point to, then you just point to whatever is working, for whatever reason, and attribute the success to the new initiative.

In a company I used to work for, one manager, when pushed to increase machine learning usage (a few years back, before ML became AI), just renamed his product from foo to foo-ML (with ZERO ML usage), and reported how well it is working. He has since been promoted twice.


It’s not clear companies were measuring anything but token usage. What information could leadership have collected to determine what worked, what didn’t, and what needs more data? Other than the balance sheet and revenue, do companies actually have sufficient information to understand the results?

Were they trying to measure other things? Definitely. The COO at Uber, one of the examples in the source article, has talked publicly about how they've searched for (and so far failed to find) a link between micro-level metrics driven by AI and concrete improvements in high level project velocity.

Do these measurements have sufficient information? As much as any, I'd guess. It sounds like you already know that it's pretty hard in general to measure the productive output of software development organizations.


I have no doubt a few companies, like Uber, were measuring other things and had applicable metrics in place before adopting Clod or CoPilot or whatever automation. I'm speaking in the general sense of companies adopting the latest hype without reflection.

I feel like most successful businesses have such a moat of required capital to compete with them that, even though in theory poor decisions like this are supposed to give entrepreneurs an opening when the big dogs make a wrong move, it doesn't end up happening.

> leaders

Don’t play their game and call them leaders. They are management, bosses, executives.

> They are making decisions about things that they just don't understand. And are completely unworried about it.

Clowns, even.

> Just blindly following whatever the news cycle is about AI.

But followers might be most apt.

——

This is such a huge pet peeve of mine. Describing management goofs using their language that makes them sound all-so-brilliant. We constantly watch these people do the dumbest shit and then they go around describing themselves as “thought leaders” and “servant leaders”. When, really, most are just clowns with fragile egos.

And, while I’m rambling, they’ve tried to take away the fact we are workers by calling us individual contributors. Using language to attempt to hide the hierarchy and power dynamic at play. It just…bothers me so much.


I don't hear them refer to themselves as "job creators" much these days.

And many of them still claim they are "risk takers", but have effectively insulated themselves from risk by socializing losses.


> Don’t play their game and call them leaders. They are management, bosses, executives.

You're falling into a common trap here: the ambiguity of the English language.

"Leader" means multiple different things. Yes, it means someone who has leadership qualities—who genuinely inspires those around them to do better, or who boldly marches into the unknown and gets people to follow them.

But it also means "someone in charge of a thing."

Now it's certainly true that many people in charge of things who are also really bad at actually inspiring or getting people to follow them (aside from with threats of destitution) also play on that ambiguity to try to convince people that because they're in charge of things, they must also be Good Leaders, and that's crappy...but yelling at others for using the term casually is very much an "old man yells at cloud" situation.


The worst part is that techies can still work around the insanity if they keep their opinions private. For the serious average Joe the AI mandates must be feeling like hell on earth.

I once worked in a company that had soviet-level efforts to push LLMs into everything, someone eventually made the classic "Natural Language -> SQL Query -> Magic Result in webpage" and got promoted, the tool got mandated for every non-tech employee as part of an AI-boosting effort (people pushing metrics up).

One day I wake up with a product person in despair because the tool couldn't handle what looked like a very simple aggregation, I stopped what I was doing, crafted a 30-line SQL query over HORRIBLE TABLES, a couple CTEs and window functions here and there got him what he wanted. I found out later that single query that took 30 minutes to make saved him from inheriting a 6-month effort to create a microservice dedicated to patching said tool.
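
For readers who haven't met the combination, here is a toy version of the "CTE plus window function" pattern. The `orders` table and its data are entirely hypothetical, since the real schema isn't described, and the sketch assumes the SQLite bundled with Python is 3.25 or newer (needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.executescript("""
    CREATE TABLE orders (region TEXT, month TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('north', '2024-01', 100), ('north', '2024-01', 50),
        ('north', '2024-02', 70),  ('south', '2024-01', 30),
        ('south', '2024-02', 90);
""")

query = """
WITH monthly AS (              -- CTE: one row per region and month
    SELECT region, month, SUM(amount) AS total
    FROM orders
    GROUP BY region, month
)
SELECT region, month, total,
       -- Window function: running total within each region
       SUM(total) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An aggregation over an aggregation like this is simple to write by hand but is the sort of thing a text-to-SQL tool can stumble on.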


I've never seen self-awareness from leaders. They always lead on vibes.

Understanding this was one of the most important things in my career.


That's nothing new though. It's just very obvious this time.

Having studied control theory, I think it makes perfect sense. When trying to make a system target a new level, it's quite natural for there to be overshoot that needs to be reined in. It's also natural for the correction to go too far and need to be corrected in turn. This is not indicative of stupidity; it's completely normal.

It would only be laughable if they waited way too long to reverse course, but I don't think that's the case.


Suppose I'm driving at 20 kph, and I set my cruise control to 40 kph. My car then goes WOT, overshoots my target speed and hits 120 kph, at which point it slams on the brakes[0], dropping my speed to 15 kph. It repeats until it finally settles at my target speed. (Rhetorical question) would that be considered "completely normal"?

Over/undershoots and corrections are of course unavoidable and normal; the absurdity is in the magnitude and rate of change. Furthermore, this is giving it the benefit of the doubt, that measuring AI spend is a good indicator; that's arguably also in dispute. To stretch my car analogy a bit more: it would be like the cruise control system has to hit the target speed, but it only has data from the O2 sensors.

[0] I know that the "classic" cruise control system cannot apply the brakes, but hey no analogy's perfect.
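
The overshoot argument in this subthread can be played with numerically. The sketch below is not a model of any real cruise control. It assumes a generic second-order loop (the textbook damped-oscillator form), with made-up values for the natural frequency and damping, and reports the highest speed reached on the way from 20 to 40 km/h.

```python
def peak_speed(damping: float, start: float = 20.0, target: float = 40.0,
               omega: float = 1.0, dt: float = 0.001,
               duration: float = 30.0) -> float:
    """Simulate speed'' = -2*damping*omega*speed' - omega**2*(speed - target)
    with semi-implicit Euler steps and return the highest speed reached."""
    speed, rate = start, 0.0
    peak = speed
    for _ in range(int(duration / dt)):
        accel = -2.0 * damping * omega * rate - omega**2 * (speed - target)
        rate += accel * dt       # update the rate of change first
        speed += rate * dt       # then the speed, using the new rate
        peak = max(peak, speed)
    return peak


# Assumed damping ratios: 0.2 is a lightly damped loop, 1.0 is critically damped.
print(f"lightly damped peak:    {peak_speed(0.2):.1f} km/h")
print(f"critically damped peak: {peak_speed(1.0):.1f} km/h")
```

In this setup the lightly damped loop peaks around 50 km/h and the critically damped one never passes 40. So some overshoot is normal, and its size depends on how the loop is tuned, which is where the two comments above disagree.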


It's not like they accidentally overshot. They were telling people to tokenmax; they didn't even know you could overshoot, because they thought it was exponential gains all the way. Subtle ideas like balance were not on their minds.

Intentionally overshooting can be a legitimate strategy.

The actual cost is going to drop 99% in ~4 years.

How much of that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.

Almost all businesses are ahead of the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.

No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.

It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see it having success, and wonder why they can't have just as much, or more. They're bigger, so they should have better and more success, right???

Wrong...

But when it gets ~99% cheaper for local inference over the next 4 years, at the same time as the price per watt improves 4x, a lot of those cases will start to pencil out.


Going from Opus 4.5 to 4.7 secretly required 6x more compute to run. 4.8 is apparently 30% more on top. I haven't seen any optimizations lately aside from distillation. Nobody's optimizing, they're just scaling up.

> Nobody's optimizing

The Chinese, since they lack computing hardware due to US export controls, are.


And our export controls are going to turn China into a winner in the AI arms race if we're not careful.

I retired a few years ago, but I still write a fair bit of code. I was using Copilot's code completion before I retired, but coding agents hadn't come around yet. I've been wanting to try them, but I kept putting it off, and now the price increases make it hard to justify.

So I just started trying CodeWhale (https://github.com/Hmbown/CodeWhale) with DeepSeek V4. I expected to be impressed by the abilities (which still require plenty of oversight). I didn't expect to be completely shocked by how cheap it is. After most of a week of using it 4-8 hours a day, which would amount to a full week of coding in many jobs after you account for non-coding activities, I'm about to hit $3 in total usage. So we're talking $10-20 per month for single-agent use by a full time software developer? And I'm sure some of my usage is waste as I'm still getting my head around things like compaction. If I take a break for a few weeks, I pay nothing because there is no subscription.
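
As a quick sanity check on the "$10-20 per month" estimate, here is the extrapolation written out. It assumes the roughly $3 figure covers one full working week and that usage stays flat, which is of course a simplification.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks in an average month


def monthly_cost(weekly_cost: float) -> float:
    """Scale a weekly spend to an average month, assuming flat usage."""
    return weekly_cost * WEEKS_PER_MONTH


print(round(monthly_cost(3.0), 2))  # 13.0
```

About $13 a month sits inside the quoted range, with headroom for heavier weeks.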

If DeepSeek and Xiaomi MiMo stay within a few months of the US-based models in terms of capabilities and US companies don't figure out how to drastically cut prices, I can't see how China hasn't already won. Protectionism would be one reason, but that might be ceding 50-90% of the total addressable market, and bring us closer to moving knowledge work out of the US the same way we did with manufacturing because it's too expensive in the US.


Holy F.. $3 .. once I'm done with my base cursor allocation, each nontrivial question costs $5 . And yes, I'm now switching to a mix of codex and ds4pro

How are you using it? More to complete specific functions or scripts, or for larger architectural design and longer implementation runs?

My initial use was in a repo where I create models for 3d printing using a library called build123d. There are a handful of parametric models and then many instances of those models with parameters (one that's 24 mm in diameter with a cutout, another that's 42 mm in diameter but no cutout, etc.). I tend to be in a hurry when I want a new parametric model, so I've ended up just copying the one that's the most similar and changing what I want to be different.

The first big task was to find the common bits and abstract them out. It did a great job of creating a plan, summarized in a table, that gave a name to shared chunks, the line numbers in various files where they appeared, line counts of new functions vs. removed bits, and some pros/cons about splitting out each chunk. It was very well "thought out", so I told it to go ahead. It did a nice job other than straying from my coding conventions. That gave me a chance to build out my AGENTS.md file (it helped with that, too).

Once that was done, I had it create automated tests for the newly abstracted parts. I think this is probably a bad practice... I believe humans should at least define what the tests are testing so that there is a deeper understanding of what oversight is in place. But I was just trying things. It surprised me how well it did. The biggest surprise was that the tests seemed quite inspired by vision. It would try different parameters and then have comments about making sure the shape protruded in a certain way, then code that did that. I expected it to refactor a bunch of the code to make it more testable. It found a way to not touch the code while testing everything I asked it to with just two simple mocks - I hadn't foreseen that, but it felt quite practical. It was passing around several opaque tuples in the tests and accessing items in them by index. I prompted it to replace the first one with a frozen, kw-only dataclass. Then a second. On the second request, it saw the pattern and did the rest without me asking. It created 44 tests across a handful of files.

The next part is where I was the least happy. I use ruff and ty to check my code with almost all checks enabled. It was mostly good about the ruff issues. But for the type checking, it just wanted to disable 6-8 rules for the entire repo in pyproject.toml, or at least for all the tests. I had to repeatedly tell it not to and it kept telling me it wasn't recommended. When it finally gave in, it fixed most of the type issues (build123d has lots of types specified, but many operations result in type conflicts because things are so deeply overloaded). The things it didn't fix, it just left a comment to ignore type checking altogether on that line. After I did a little more brow beating, it finally changed the comments to only disable specific rules. To be fair, and unlike most of my other repos, I've had to spend way too much time getting types right in this repo myself.

My last task involved a small library management system for our little town library (tracking library cards, books, DVDs, check-outs/check-ins, etc.). I inherited it from someone who had built the entire web app out of bash/awk/troff scripts with the data in text files burdened by a lot of schema changes that he didn't really know how to deal with. I'm halfway through moving it to Python/FastAPI/SQLite. I asked it to do a security audit of the entire code base, both the newer parts and the old parts that are still in bash/awk/troff. It found everything I knew about and a few things I didn't know about. It made a decent assessment of the risks/impact of each issue. It also called out design decisions that were good security practices. One of the next big tasks will be to see how it does at continuing the migration - it has enough examples of how I've done it that I suspect it can do something fairly consistent with my thinking. I'll probably have it do one or two web pages. When I feel like it understands what I'm after, I'll tell it to use sub-agents to do the rest. I'll be very happy if I don't have to tease apart any more troff scripts that are generating PDF files!


DeepSeek and Alibaba would like to have a word.

Hasn't everything DeepSeek and Alibaba created thus far been distilled from the results of many, many accounts logging into Claude and ChatGPT? And that's why there's so much bot detection now at US frontier labs? Doesn't that make the Chinese labs dependent until some unknown point in the future on advancements of US frontier labs? While what they currently provide is cheap, it seems like it's artificially cheap and somewhat static because they took others' intellectual property (no comment needed about US frontier labs stealing the world's knowledge... that's a separate topic).

> Hasn't everything DeepSeek and Alibaba created thus far been distilled from the results of many, many accounts logging into Claude and ChatGPT?

I doubt it is really any different to what the US labs do [1]. I never really bought the "they were basically all just distilling from us" shtick from Anthropic, I just assumed they were either comparing or also creating training data as basically any lab is doing.

[1]: https://www.reddit.com/r/ClaudeCode/comments/1tqaist/opus_48...


> The actual cost is going to drop 99%

Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.


DeepSeek V4 Pro is 99% cheaper than similarly performing models were 2 years ago (if such a model even existed).

Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.

My day to day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all on a time span of 80 years.
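
The percentages in that comparison check out, taking the comment's figures exactly as given:

```python
def percent_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return (1 - new / old) * 100


power = percent_reduction(150_000, 35)    # watts: ENIAC vs laptop
price = percent_reduction(500_000, 1_000)  # dollars, as quoted in the comment
print(f"{power:.2f}% less power, {price:.1f}% cheaper")
# 99.98% less power, 99.8% cheaper
```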


Moore’s law is dead.

It died before AI came around, and today's coding agents are somewhere upwards of twice as competent as whatever the state of the art of automatic coding was in 2020.

A good chunk of that was one-time gains from shifting GPU and memory architectures to better match what LLMs need at scale as well as some algorithmic improvements. Most of the low-hanging architecture optimization has already been harvested. We'll certainly have more algorithmic gains but the consensus is they'll generally be smaller and less frequent.

There's always a chance we'll have some dramatic gains far larger than DeepSeek's optimizations a year ago, but it hasn't happened again yet at even that scale. It would be nice but I certainly wouldn't count on it.


I don't see how this is even remotely true. Unless there's some super breakthrough into a fundamentally different architecture, there's not really a path to a 50% reduction in price, much less a 99% reduction.

In fairness, I think _current_ capabilities will be cheaper. So the models of today will be run drastically cheaper in 4 years.

And yet 90% drops for the same level of quality every 18 months have happened like clockwork...

And the technology already exists on the algorithmic front TODAY to lock in another 10x gain -> when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.

Just look at DeepSeek's pricing...


What makes you think prices will drop? Everyone I’ve spoken to believes they will only skyrocket. Genuinely curious

The technology already exists now on the algorithmic front for the next 10x drop between everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, and Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58b (this may be a nothing burger).

Historical trend: every 18 months, the price for the same level of quality has gone down 90%.

See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...

And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...

And here: https://epoch.ai/data-insights/llm-inference-price-trends
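To see what a 90% drop every 18 months implies when compounded, here is a rough sketch. It assumes the rate stays constant, which is exactly the point under dispute in this thread; the function names are illustrative only.

```python
import math


def remaining_fraction(drop_per_period: float, period_months: float,
                       horizon_months: float) -> float:
    """Fraction of the original price left after compounding the drop."""
    periods = horizon_months / period_months
    return (1 - drop_per_period) ** periods


def months_to_reach(target_fraction: float, drop_per_period: float,
                    period_months: float) -> float:
    """Months needed for the price to fall to `target_fraction` of today's."""
    periods = math.log(target_fraction) / math.log(1 - drop_per_period)
    return periods * period_months


# Assumption: a steady 90% drop every 18 months at constant quality.
for years in (4, 6):
    left = remaining_fraction(0.90, 18, years * 12)
    print(f"{years} years: {(1 - left) * 100:.2f}% cheaper")

print(f"months to a 99% drop: {months_to_reach(0.01, 0.90, 18):.0f}")
```

Under that assumption a 99% drop takes two periods (36 months), and four years gives roughly 99.8%. If the trend slows, the same function shows how sensitive the result is: try `remaining_fraction(0.50, 18, 48)`.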

Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.

Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).

Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.

If you look at the "cost" of inference. People think it's electricity - but it's currently almost ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.

The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...

Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.


Sure, the price will come down a lot, even if we can argue about the timeline.

I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.

Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in increase in revenue and/or profitability.


This is great food for thought, thank you

Additionally, on the context front -> all the labs are aware that for many tasks you can get 10x+ increases in output quality by feeding better context.

See https://arxiv.org/abs/2604.04364.

This won't really show up in benchmarks, but it will impact real world usage on the most common use cases.

I'm doing a study right now on the impacts of better context for small models to fix bugs.

A very dumb algorithm can make small models perform at 10x+ model sizes. I'll be surprised if it can't get to 20x+


This is mostly slop. But you may be directionally correct

I didn't take you seriously initially but after reading this, I think you are the real deal.

Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.


Prices have been very obviously trending up, not down. Even open weights models are becoming more expensive with every release. Computer hardware is ballooning in price.

Prices are going up for BETTER quality -> not for the SAME level of quality.

People are willing to pay more for BETTER quality.

You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...


Maybe so, but that becomes irrelevant when you consider that the new, better quality instantly becomes the expected baseline. So the price of the "baseline" quality is going up regardless.

Let's look at GPU prices as an example. Around 12 years ago, I bought a GTX 970 for around $350. That was considered a very good GPU at the time. Today, the "equivalent" GPU model (RTX 5070) costs almost double. Of course, the newer GPU is much more powerful (more than double, in fact), but all the things you'd use a GPU for have also advanced and now expect an entirely new level of performance as a baseline, such that the older GPU is fairly worthless today. So most people agree that GPUs in general have become more expensive.

Regarding DeepSeek's price: it's obviously subsidized, and unlikely to match the actual inference cost right now.


Just wait for the next model and the next model architecture. Just wait for it, bro.

Gemini 3.5 flash is 25% cheaper than 3.1 pro, and outperforms it on almost every benchmark, most by a pretty wide margin...

It's still 5x more expensive than 2.5 flash

Cool.

There has never yet been a new model which actually improved over the previous ones. They suck just as much, and in the same ways, as the models of 3 years ago.

Grab a 5090 and run Qwen 3.6 35b on it (6 parameter seems to work best for me).

Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.

Whilst you're at it spring for a Claude subscription too and GPT.

Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.

Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.

It is ridiculous how cheap this stuff is now. It's affordable at third world prices.


None of that is cheap.

> spring for a Claude subscription too and GPT.

You started with some random pricing then veered off into impractical hand waving. Far above third world prices...unless you count the USA as third world, I guess.


The extra subscriptions are optional. You can do nearly all of it with just a DeepSeek subscription and switch between Flash and Pro.

If you have the $$, do the extra stuff. People who like to play video games often have a very fancy graphics card that sits idle during their work day.


> The actual cost is going to drop 99% in ~4 years.

And fusion power is just 2 decades into the future!


Full self driving guaranteed here before the end of the year (every year).

> The actual cost is going to drop 99% in ~4 years.

We have little visibility into current frontier model costs at mass scale. As a broad historical trend, tech costs tend to fall over longer time periods but your claim far exceeds Moore's Law rates in its heyday - and that heyday is long gone.

In 2021 TSMC announced it was increasing its price per gate for new nodes for the first time in its history. In the past five years cutting edge nodes have delivered ~8-15% real-world performance gains on average at costs at least 10-20% more than the last node. If you're positing a string of unprecedented efficiency breakthroughs in LLM algorithms - such extraordinary claims require extraordinary evidence.
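A back-of-the-envelope sketch of what those node figures mean for performance per dollar, set against what a 99% drop in 4 years would require. It treats the quoted ranges as per-node changes and covers hardware only, so it says nothing about algorithmic gains; the function names are illustrative.

```python
def perf_per_cost_change(perf_gain: float, cost_increase: float) -> float:
    """Relative change in performance per dollar for one node step."""
    return (1 + perf_gain) / (1 + cost_increase) - 1


def required_yearly_factor(total_drop: float, years: float) -> float:
    """Yearly cost-efficiency multiple needed to achieve `total_drop`."""
    return (1 / (1 - total_drop)) ** (1 / years)


# Ranges quoted above: ~8-15% performance gain, 10-20% higher cost.
worst = perf_per_cost_change(0.08, 0.20)
best = perf_per_cost_change(0.15, 0.10)
needed = required_yearly_factor(0.99, 4)

print(f"perf per dollar, worst case: {worst * 100:+.1f}% per node")
print(f"perf per dollar, best case:  {best * 100:+.1f}% per node")
print(f"needed for 99% in 4 years:   {needed:.2f}x per year")
```

On these numbers a new node moves performance per dollar by somewhere between about -10% and +5%, while a 99% cost drop over 4 years needs a bit over 3x per year. Nearly all of that would have to come from somewhere other than process nodes.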


The pro-MCP arguments sound a lot like the same ones for SOAP, J2EE, "Enterprise Service Bus" and other "once-dominant, now dead in favor of dev driven simpler solutions" tech.

You could borrow the output of the perl scripts from openssl.

https://github.com/openssl/openssl/blob/master/crypto/aes/as...

