Hacker News | antirez's comments

In this game, who wins - in the long term - is who has the best model: so far OpenAI is ahead, so in the long term this is what matters. However, for the same reason, if in the future open weight models will be very near the quality of frontier labs, Anthropic and OpenAI will be out of business very soon. The game they play only makes sense if their SOTA models do things that other models can't do at a comparable level.

> if in the future open weight models will be very near the quality of frontier labs, Anthropic and OpenAI will be out of business very soon

> Why would a business pay for Slack when IRC exists?

> Why would a business pay for Dropbox when FTP exists?


AI is not a product per se, it is a technology you can decline into a product, and the product has a lot less value than the technology itself. Who has the best LLM can copy any product idea and make it a lot better. Similarly if open weight LLMs are everywhere and powerful, open source products in the space of agents are too simple to replicate for people to pay big money to a few companies: not everything is alike, not every parallel makes sense. The pi agent is good as a replacement for Codex and Claude Code if you wire frontier models to it. And when products are complex and matter a lot, like complicated AI-powered design suites for instance, there is no reason why OpenAI / Anthropic will win this space instead of a random startup. So either a few companies retain frontier AI, or those companies will die.

About IRC / Slack: other than the fact IRC was abandoned, Slack is about control, not product. The product is terrible.

FTP / Dropbox: this comparison does not make sense.


OpenAI and Anthropic have the know-how for building much larger models that will be a lot smarter and run on datacenter-scale compute. This is a natural 'moat' that will be inherently hard to replicate for on-prem compute or small neoclouds running open-weight/local AI. They can easily coexist with a robust local AI scene.

IMO bad take.

You can theoretically do most things AWS does most of the time, yet people pay premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant.

I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.

You might have a subpar product (for the price) but the reputation and history is what makes people open their wallets.


> I'd bet you that after 20 years OpenAI and Anthropic would still be around and kicking.

Depends. The bigger the bubble, the bigger the pop.

Only a few unicorns from the dot-com bust came out the other side (Amazon, Google, ... anyone else?), and that was a piddling affair compared to this one.


Yahoo is still around and kicking. Even Lycos' corpse is still warm.

> You can theoretically do most things AWS does most of the time, yet people pay premium for it and keep paying for it, even though alternatives are cheaper, simpler and more performant

It's going to be debated forever whether wiring your own open source tech has a lower development cost than the equivalent AWS bill. For me, that's too broad a statement, as I have seen it go both ways. What is true: There is only some knowledge overlap between maintaining an AWS stack and having your own Prometheus logged, ceph backed set of boxes.

That is not the case with LLMs. At least, not right now. They roughly work the same and are easy to pick up. They are about as straightforward of an interface as it gets, and using them in "advanced" ways could be summarized on an index card. They are relatively fungible.

I don't see a world where OpenAI runs on brand recognition alone. It needs to be more convenient to run than local LLMs. They've done that by buying so much of the world's hardware that it becomes more expensive to run these things locally.


this is like saying the car with the better engine wins, but all we're doing is commuting to work

Comparisons like that give the impression of reasoning about things, but it's a weak tool to understand reality of very different things.

I have the same impression. Strange to see this being downvoted, and it was only after reading the comment that I read the username and found out it's antirez!

Now, I think that with these companies IPO'ing, and Nasdaq and others bending themselves and their rules to cater to them (as in the case of SpaceX), these companies are very close to an IPO.

So for the employees, they are probably gonna get good evaluations, at least in the short term, and perhaps they are having a problem which is worth having.

But as you have suggested, I feel like the whole thing might be flaky especially given open source models. I believe that OSS models are at worst close to literal SOTA ~6 months ago.

So OpenAI & Anthropic have to somehow always be on the edge to get better models to not lose this (imo) very small time grip that they have, all while losing billions of dollars and having to worry about profitability & so many other concerns in and of itself.

I don't think there is anything else in CS, or in any industry, where trillions of dollars float on two pieces of software that are almost comparable, with not much of a moat except a difference of 6 months at best. We don't know how things will pan out, but if I have to guess, it might not be looking good for OAI and Anthropic, especially over the longer horizon.


I really want Europe to be part of AI development and research. And I strongly cheered for Mistral. But they are accumulating too much technological delay. This needs to be fixed, otherwise it will turn into yet another proof that we are not able to run large tech with good results. Basically any Chinese lab is doing much better. It's not Mistral that created, I don't want to say DeepSeek, but MiMo 2.5, Minimax 2.7, and so forth. There are only weaker and/or larger and slower (no MoE) models. Not good.

https://en.wikipedia.org/wiki/Artificial_Intelligence_Act#Pe...

Europe shot itself in the dick with this hastily implemented at the height of mass hysteria bullshit and now no sane company will build anything there. an AI startup in the US or China can be a boy and his computer. in Europe, the boy needs a dozen lawyers.

Mistral's sinking into irrelevancy despite the head start they had, the very promising early models they released, and the funding they receive, might very well be the consequence of trying to comply with all that crap.


You don't compete with Anthropic from the basement. For that you need either a shitload of money, or a government which is not afraid of getting very, very involved.

There are a lot of Europeans working on AI, it's just that a lot of them work for American companies. Because of money.


I think both of you are correct.

Possibly yes, but let me remind you that France, Italy, and Germany were against the AI Act, so something very odd is happening here: the EU funding nations are getting marginalized by the countries they welcomed on key topics for our future, and I believe corruption could be a big part of what is happening, both internal to those three countries and at an even more alarming rate in other countries.

EU big nations getting marginalized: haha. The only reason there’s no US-like tariff on Chinese cars is because Germany was too scared it would lose its access to the Chinese market.

> the EU funding nations are getting marginalized by the countries they welcomed

Thank you for reminding us that all animals are equal, but some are more equal


It's not a matter of importance, but that bad actors, as they tried to do in Italy, corrupting the EU parliament, may be doing the same with countries that have less visibility. A weak EU is also not in the best interest of the countries that wanted the AI Act, and surely not in the interest of their citizens, but there could be pressures.

I understand your point, I just object to the language and dividing the EU into more and less important blocks. If a voting mechanism is broken, that's where the issue is.

Who put a nepo-baby lawyer in charge of the big €95bn AI fund? EU bureaucrats living the 6-figure high life with chauffeurs and private jets in a bubble completely isolated from reality.

I hate the fake European foreign-backed right-wing parties but they didn't cause the current situation.

But I'm afraid it might be too late as the cancer spread and did too much damage. Insane regulations, no energy, looming demographic/pension crisis, tax hell, and collapsing industries.


Way more important than this act are the police raids. Someone used your SaaS to send phishing (see today's front page HN)? They'll just take all your servers away. Goodbye business. Unless they think the general public would riot, so established companies are okay. You can't build a castle on a foundation of quicksand.

Well, isn’t there also the opposite take from TechCrunch, where they say: Why Paris may be the most important AI city outside Silicon Valley. [0]

While the EU loves its regulation, I still feel it’s too early to write it off in the AI race. It will not replace Anthropic or OpenAI any time soon, but even Google and Meta fail to do that.

If AI continues to grow and expand, there is enough space for many more unicorns.

[0] https://techcrunch.com/2026/05/28/why-paris-may-be-the-most-...


As someone who has actually experienced the hiring market in Paris, I have a hard time believing this. The salaries are, unfortunately, pathetic.

Did you read even a summary of the AI Act?

The gist of it is very simple - depending on the risk of what you're doing with AI, you have to document why it did what it did, and be able to explain it; or you can't use it at all. So if you're using AI for mass surveillance, you can't; if you're using it for treating loan applications you need to be able to explain why it approved/denied; if it's a customer service chatbot, do whatever, nobody cares.

Not only is the burden of the legislation fairly low (and a lot of it hasn't come into force yet), it is extremely reasonable. No, sorry, we don't want a UnitedHealthcare using a broken algorithm on purpose to deny as much care as possible and hiding behind "computer says no".


It's yet another time when the EU is killing our own possibilities to build real competition to US or Chinese tech.

And yet another time they will be thinking aloud in a few years: "what happened that we are fully dependent on the USA?"


So you're saying AI models should be allowed to freely "manipulate human behavior"?

That is almost a meaningless sentence. Cats and traffic lights both manipulate human behavior.

You know exactly what the intent of the lawmakers was.

No, I definitely don't. And neither do they, it's hundreds of lawmakers. Meaning is usually sussed out through case law.

The problem is that statement is a bit too open to interpretation. Ever had Claude piss you off by being stupid and talking in circles? Sounds like manipulation of human behavior!

When it comes to MoE, I remember the Mixtral model, which showed the viability of MoE for the first time. I was impressed by their technical report. To be clear, the MoE idea was already out there, if I am not mistaken. If they had pushed the Mixtral model family further, who knows, they might have achieved the reputation the current Qwen family has. A missed opportunity.

> But they are accumulating too much technological delay.

How so? Catching up is easier and cheaper than spearheading the lead.


Compared to the UK Government which recently announced 10 million GBP for AI research, which will likely be scooped up by consultants. I think Europe is doing fine considering.

The first step would indeed be to join forces with the UK, in order not to be two entities, which is very unnatural to me.

That Brexit ship sailed. It’s very difficult to do anything with the UK currently.

No, we don’t need US’s Trojan horse in the EU

Interesting. Could you elaborate. As a pro Europe Brit I'm interested to understand this viewpoint. Is it a widely held perspective do you know?

I think that while y'all were appreciated members and definitely had a lot to offer, you also had a lot of annoying carve-outs and kept stalling needed measures to federalize and strengthen the EU more so we can be a proper superpower in our own right.

Maybe it's good you left for now, maybe we can finally get these things done. And once that's accomplished and enough of the gammon has died off, you can always rejoin :-)


Jumping in: most people in Germany wouldn't see the UK as an American Trojan horse. I don't think anti-American countries like France and Denmark have a problem with the UK being in the EU per se.

I can see that most people would want the UK not to get special treatment any more.


This is by far the best definition of AI slop I ever read, and the blog post itself is the contrary of AI slop: a short post where each word matters. The creation of an output that is at the same time large and lacks fundamental motivation/understanding is what creates AI slop, not the use of AI itself. This distinction gives us a mental model for blaming not AI itself but its continuous misuse. This also creates a formal model to understand why continuous AI steering during AI-assisted coding is so important. The sum of all the prompts provided, if they form a cohesive view of the software intent, constitutes the seed and specification that can generate good, useful code. Try to put together instead the sum of all the short prompts that beg the AI to retry "it does not work, retry", and see what you obtain.

I agree. I have gotten frustrated by a lot of recent anti-AI rhetoric, not necessarily because I entirely disagree with the premise but because it is too generic in its form. It has started to sound to me like the people who complain about "chemicals" in their food and water.

The real complaints are about specific aspects of AI and its use, and this essay does a really good job of articulating one of them. It is something we can actually discuss and address.


> The sum of all the prompts provided, if they form a cohesive view of the software intent, constitutes the seed and specification that can generate good, useful code. Try to put together instead the sum of all the short prompts that beg the AI to retry "it does not work, retry", and see what you obtain.

What are you driving at with this statement? I think there is value in both types of prompts so I'm unclear.


See also Hank Green's take on the definition of slop: https://youtu.be/dT5IJExTUR4?si=mjkHK024MUqCId0k

The tl;dr is pretty similar. Intent and care are the functional variables. A human can produce slop without AI and they can produce art with AI. AI just enables slop at an industrial scale.


Anthropic made a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

Not sure I follow. Anthropic included benchmarks where GPT 5.5 outperforms Claude 4.8. Sure, maybe that is a strategic error, but that doesn't seem to indicate benchmarks can't be trusted (I personally don't trust them, but not because of this).

Sorry, how does their addition of GPT 5.5 in their blog post invalidate benchmarks? Also, whether or not the marketing department decided to put it in a table, benchmarks are an easy thing to measure independently.

> LLMs generate statistically plausible continuations of text

Jesus, it's fucking 2026. Even LeCun would never say this again.


LLMs are Markov Chains [1]. "Emergent abilities" of LLMs can be explained by decrease of perplexity in text prediction [2].

  [1] https://arxiv.org/abs/2410.02724
  [2] https://arxiv.org/abs/2304.15004

Odd not including French and Italian recipes.

As soon as you start adding our beloved french recipes, frogs, snails and other oddities might substantially increase the 1,790 ingredients count

French and Italian languages. There are many recipes from both cuisines written in English which, I assume, will have been included.

Indeed, but I bet many were never translated.

it's a clickbait title

I have the feeling posts like that should be 1/4 the size, at max. At this point I don't care if it is AI-slop or human-slop: they are surprisingly alike. Information must be more dense, each sentence must carry some truth.

This lacks the sharp idea the Zero had. I have the feeling that in order to do something different, and not an evolution, the result will be borderline useless: a portable ARM computer with Wifi / satellite connection / ... And, then? What can I do with it? The evolution that I could like is a Zero with more CPU power, SDR and LoRa. Then let's implement all the cool protocols that it is possible to implement.

I agree, but on the other hand I think most people who bought a Flipper Zero didn't really have a use for that either. The most commonly cited use case is doing something with RFID tags, which was already achievable with much cheaper hardware.

There's a big category of tools that people buy because they're cool and feel like they come with limitless possibilities, but then end up in a drawer. Raspberry Pi became this for a lot of people. It took a lot of years and a lot of market saturation before everyone realized that they're not a good deal if you have a specific need for a general purpose computer, despite their usefulness for specific applications.

The Flipper Zero felt like a tool with infinite possibilities, but it takes a while for most people to admit that they don't have infinite use cases, or that application-specific hardware can often do a better job for less. Exactly like when everyone was buying Raspberry Pis as general purpose computers. But it's a cool product and it had a lot of viral marketing going in its favor.


I wish they took the Zero, added the Linux, SDR and 5G and the fancy case upgrade. Leave the AI out. That'd be sweet.

The AI is effectively free with the NPU/GPU just sitting there. I could see the possibility of models that are tuned to network/radio analysis that could enhance the value proposition.

That sounds amazing to me

Do you look at your desktop and think what can I do with it? What an odd statement to make about a computer

So they made a phone?

More evidence of the Smartphonification theory. Much like how all life trends towards crab, or all software towards reading mail and including a bespoke Scheme implementation, I posit that all hardware eventually becomes a smartphone.

Examples:
- the cellphone (obviously)
- my TV
- my refrigerator
- my oven
- music players
- tablet computers
- laptops (well on their way)
- cash registers->PoS sales machines
- handheld game consoles


> Much like how all life trends towards crab

Carcinisation (https://en.wikipedia.org/wiki/Carcinisation)

The same seems to be true for trees too (arborescence).


A full Linux phone with an M.2 slot, Ethernet, hardware buttons, SDcard etc.

This is what you've all been asking for, right?


I have to agree baby steps.

They design a completely new product and suddenly announce a collaboration?

Not a fan but the new project looks cool.


Token/sec only makes sense once you tell me four things:

1. decoding t/s, that is, when the model is generating text in the autoregressive fashion.

2. prefill t/s, that is, prompt processing speed.

3. What is the slope of those two numbers as the context size increases. An implementation that decodes at 50t/s with 2k context but decodes at 7t/s at 100k context is going to be a lot less useful than it seems at a first glance for a big number of real world use cases.

4. What's your use case? Reading a huge text and then having a small output like, fraud probability=12%? Or reading a small question and generating a lot of text? This substantially changes whether a model is usable, based on its prefill/decoding speed.

For instance my DS4F inference on the DGX Spark does prefill at 350 t/s and at 200 t/s on already large contexts. But decodes at 13 t/s.

On the Mac Ultra the prefill is like 400 t/s and decoding 35 t/s.

The two systems can perform dramatically differently or almost the same based on the use case. In general for local inference to be acceptable, even if slow, you want at least 100 t/s prefill, at least 10 t/s generation. To be ok-ish from 200 to 400 t/s prefill, 15-25 t/s generation. To be a wonderful experience thousands of t/s prefill, 100 t/s generation.
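To make the "dramatically differently or almost the same" point concrete, here is a rough back-of-the-envelope sketch using the prefill and decode figures quoted above. The two workloads (50k tokens in / 20 out, and 100 in / 2k out) are made-up illustrations, and the model assumes constant rates, so it deliberately ignores the slowdown with growing context described in point 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    prefill_tps: float  # prompt processing speed, tokens/sec
    decode_tps: float   # autoregressive generation speed, tokens/sec


def request_seconds(m: Machine, prompt_tokens: int, output_tokens: int) -> float:
    """Estimated wall time for one request, assuming constant rates.

    Simplification: real speeds degrade as the context grows.
    """
    return prompt_tokens / m.prefill_tps + output_tokens / m.decode_tps


# Rates taken from the comment above.
SPARK = Machine("DGX Spark", prefill_tps=350.0, decode_tps=13.0)
MAC = Machine("Mac Ultra", prefill_tps=400.0, decode_tps=35.0)

# Hypothetical workloads: (prompt tokens, output tokens).
WORKLOADS = {
    "read-heavy (big input, tiny answer)": (50_000, 20),
    "write-heavy (short question, long answer)": (100, 2_000),
}

if __name__ == "__main__":
    for label, (p, o) in WORKLOADS.items():
        t_spark = request_seconds(SPARK, p, o)
        t_mac = request_seconds(MAC, p, o)
        print(f"{label}: {SPARK.name} {t_spark:.0f}s, "
              f"{MAC.name} {t_mac:.0f}s, ratio {t_spark / t_mac:.2f}x")
```

Under these assumptions the read-heavy case is dominated by prefill, where the two machines are close, while the write-heavy case is dominated by decode, where the gap is much larger.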


> For instance my DS4F inference on the DGX Spark does prefill at 350 t/s and at 200 t/s on already large contexts. But decodes at 13 t/s.

You should run a multi-session batched decode on that DGX unless your 13 t/s decode is already running into thermal or power limits, which I don't believe it is. (To be clear, this is a real issue on Apple Silicon machines: batched decode does not seem to unlock higher aggregate tok/s unless you're specifically trying to mitigate the drawbacks of slow streamed inference. Especially on the M5 laptops, thermal/power throttling places an early limit on your total compute.

The jury is still out on Strix Halo, but I think batched decode may turn out to be quite useful there since the bandwidth bottleneck is even more constraining there.)


Agreed. Prefill kills me for local model work. The model reads much faster than it writes, but I'd love to get a sense for how fast the model can read large source conversations.

How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open weights model. The RL improvement may be both marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be generalist, there is a tension between being good at one thing and the other, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report for instance, but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not run the model you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open weight model like most could do, and the RL story could be partially a marketing move to call attention to Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.
