Hacker News | sigmoid10's comments

The chances are actually often way better than 1/4. For the words I didn't know, I was almost always able to exclude one or two options. Sometimes even three, finding the solution by exclusion.

Does the Swiss rail not receive public funding? It seems to me that undercharging would only necessitate more public funding, not some fundamental change where taxpayers suddenly have to pay for something they didn't before.

You should note that while a single use (like to kill mould) may be fine, regular use on stone, metal or wood (i.e. most stuff in a bathroom) is not recommended, because it is a powerful oxidizer that will considerably damage these surfaces over time. That's because it releases hydroxyl radicals that not only destroy molecular bonds in stains and microorganism cell walls, but also attack treated surfaces and corrode metals.

Duly noted.

Being able to skim, filter and comprehend large amounts of text is much rarer than you might think. More than half of Americans read below a sixth grade level and a fifth are functionally illiterate, struggling even with the most basic reading tasks. Videos are the only way for these people to consume any kind of information.

Well, the people who bought the SpaceX IPO essentially footed the entire bill here. And they might still make money on it, depending on how the stock goes from here on. I don't see anyone who could lose here, even if the bubble bursts, apart from the Cursor people. And they are likely still going to make a huge amount of money.

> Well, the people who bought the SpaceX IPO essentially footed the entire bill here.

It's hard to say that they footed the bill here, but they basically gave SpaceX a number to point to and say "well, our stock went public and it's at this price, so here's 60B at this price."

A good tactic from SpaceX, as after the initial surge of a big IPO the stock price usually comes down and finds its correct balance, which is usually lower. So if they had waited out a 'cooling off' period of a year, for example, and the stock price went down to its 'correct' valuation, then they would have had to issue a higher number of shares. At least that's my thinking, but I'm terrible with money.


Interesting how well a panel of Fable 5 + GPT 5.5 beats the frontier of either one, but if you add Gemini into the mix the panel of three performs worse, not better. To me that sounds like Gemini is worse at the given tasks but better at convincing judges of its solutions. Oh and a Panel of 2 Opus 4.8 models is almost exactly as good as one Fable 5. That smells suspicious. Do we know if that might simply be what Anthropic is doing behind the curtain?

> Oh and a Panel of 2 Opus 4.8 models is almost exactly as good as one Fable 5. That smells suspicious. Do we know if that might simply be what Anthropic is doing behind the curtain?

I wouldn't be surprised if Fable/Mythos is a model distilled from a Panel/Council of Claude instances. Recursive self improvement is something all AI labs must be working on in some way or another.


Yeah, GPT 5.5 + Fable beating either individually is believable, but 2x Opus > Fable is what makes me a bit dubious about the whole thing. They might be measuring skills that are too specific or benefit a lot from more tokens being thrown at them. Also, Claude Code (the harness) is not the best at the moment, so that might be part of it as well?

What throws me off is DeepSeek beating both Opus 4.8 and GPT 5.5.

That definitely doesn't sound right.


> Interesting how well a panel of Fable 5 + GPT 5.5 beats the frontier of either one, but if you add Gemini into the mix the panel of three performs worse

I'm not seeing that? Did you maybe misread the #2 ranked one as Fable + GPT + Gemini? It's actually Opus + GPT + Gemini.


Yeah, for law I imagine these "nice" beginnings were 2000 years ago at best. If they even existed at all. But all these jobs where talking to other humans is paramount will be dominated by extroverted quacks by default. Same goes for the capital raising college dropout pseudo-tech-bros. They were never nice, they just weren't so engaged in public discourse before, when billion dollar net worths still meant you actually had revenue and not just a vague trendy idea.

Not that far. Lawyers had a great deal of influence in the creation of all modern nation-states, human rights, international law and the maintenance of the core social contract in modern society.

Similarly, lawyers/bankers were the ones who built trust in capital, contracts, businesses and the protection of investor rights. The Delaware C corp is not the product of bad guys.


That’s a system though that seems to be at the breaking point.

Yes. It might be a general property of all human organizational structures, to degrade over time in terms of intent drift and erosion of public goodwill.

It offers some predictive power if so, like OBAFGKM + luminosity is enough to determine where a star is on its lifecycle. Maybe there's a similar domain that maps some human coordination structure onto a deterministic trajectory from birth to death.

If that were the case, I wouldn't be surprised to see venture capital--as an organizing principle for the tech industry--reaching a later stage of life.


Hey it's the one that Turchin managed to have not named... psychohistory. Fair enough, that's not absolutely about a one-way life-cycle (From not overly much to sometimes less than zero, or its counterpart There and back again).

For VC, go back to the Medicis? Obsession with Roman over Athenian (ie, not even Macedonian or ahem Florentinian, whom, as you know, were blessed-- by themselves-- with nonTrumpian^W (-"anti"-)papist perturbations) can only push them closer to all of extant theory. They don't get to have Germanics explain their early policy failures, eg

Update: for the record, I hold the opinion that libertarianism and liberalism (social, political but not to forget the far less trendy "fiscal") should be right next to each other in parameter space. One may be allowed to define a metric that puts a firewall, or, if you are less amused, an event-horizon _around_ them. I'd suggest "make things people want"


Sure, but Turchin's overproduction of elites doesn't suggest a birth and death cycle right?

It's a system that gets out of balance and needs to adjust over time. He calls the process that moves the system out of whack the "wealth pump". I don't think tech oligarchs are responsible for the wealth pump, they just benefit from it.

Whatever causes organizational structures to decay seems like something more general than that. Or maybe it doesn't exist at all. Or maybe it's just some Nth level effect of entropy itself. Except so far removed from simple physical measurements that it feels intellectually lazy to just label whatever is happening as "entropy" and move on.

Unknown. Fun to think about. It also makes aging a little more interesting, because it gives me a framework within which to place the world events I live through, even if it's all bs in the end.


https://youtu.be/B5cMfyFqKmM?t=25m37s

You can play with his model

To your other points, I need to clear cache for them overnight, not now.. maybe I can have a response tomorrow

I haven't played with the model myself, it's possible that there's energy getting pumped into the model (unlikely to be the same thing as the wealth pump--- that seems to be internal. This energy pump _could_ take care of entropy but in all likelihood it does not

Eyeballing

https://youtu.be/B5cMfyFqKmM?t=24m8s it seems to be buried in the g(S) term but I'll have to get into the weeds)

Whatever is not (vacuously) true about T's model should be something he hasn't yet anticipated. Ime with other proven-relevant models

what about Seldon's :)?


Louisiana (or, say, Crimea) X GPT

Asimov trashed that word!

Foundation is so disappointing because the science of civilization turned out to be a front for mind controllers armed with the psychic powers that

https://en.wikipedia.org/wiki/John_W._Campbell

insisted stories in Astounding Science Fiction had to have. (Also disappointing because it wasn't really finished but filled in with 1980s sequels that revealed more lies behind the lies instead of following the thread through the 1000 year interregnum)



You don’t have to go back that far. Read To Kill a Mockingbird for an example of a really nice lawyer.

I mean, it is a work of fiction.

A work of fiction which has been revealed to be semi-autobiographical.

I have worked in this field since long before LLMs. Nobody outside of the field really cared about GPT2, and even insiders knew the "too dangerous" part was a PR gag at best and the first dig of the moat at worst. After all, they released smaller versions of it along with detailed instructions on training it in the paper, so anyone with a lot of compute and a bunch of internet scrapers could try to recreate it. But basically no one did, even though it would have only cost ~50k back then (and less than 3k today). A few normal users started to take notice with GPT 3, but even then it was super limited. Even InstructGPT didn't cause real shockwaves, despite being very close to the final product. Only ChatGPT/3.5 finally lit the fuse and people suddenly cared about having this too.

Since we’re doing anecdotes I definitely agree GPT2 lit the fuse. It woke up a sizable chunk of people paying attention. GPT3 is when I and many others got into a full blown existential crisis - it was the bang after the fuse. Then we got a long tail of laggards and people without vision. Even today you can find a significant chunk of folks in denial still.

Gemma is amazing with tools for anything that is not crazy complex. I think a lot of people have a wrong perception of it because Google's new prompt format broke implementations like llama.cpp and it took quite a while to get everything sorted. But even the tiny variants running on edge devices are surprisingly capable when used right.

The frontier will probably keep moving for a while, but it will be increasingly disconnected from normal human use. In the future, if you're not trying to solve a research level math problem, you'll probably do it locally and fully privately. Which also means the day when the labs will fundamentally no longer be able to reach a billion users with frontier models will come soon. Even if they do get their IPO out, it will probably crash and burn at current valuations.


Do you guys actually work with these models?

I have to use GPT 5.4 Mini at work. It benchmarks higher than that Gemma 4 model.

In my experience it's next to useless. It cannot even move 20 existing lines of code from A to B without breaking them half of the time.

If you tell it to look something up in your dependencies, it's 50/50 on whether the answer is correct, incorrect, or it simply didn't perform the search at all.

I find it next to useless, and I'm mostly better off doing the work manually.

It's a night and day difference to even Sonnet, not to mention the SOTA.


Counter: I use 5.4 mini all the time for coding. No trouble letting it implement features. Entire new screens, APIs and various components.

It ain’t the best for sure, but if you have trouble letting it move 20 lines, I don’t know what the cause is; that’s not my experience at all. I do make pretty extensive use of guardrails and proper instructions in my AGENTS.md.

I also value super boring code bases with as uniform a shape as possible. I guess that’s also helping out.


>It benchmarks higher than that Gemma 4 model.

Depends on what you look at. Gemma 4 31B without reasoning benchmarks significantly higher than GPT-5.4 without reasoning on Artificial Analysis. Even the new Gemma 4 12B beats it. And while GPT-5.4 with xhigh reasoning beats the reasoning version of Gemma 4 31B, the question is why you would throw such a complicated task that needs so much reasoning at such a small model to begin with. So if you do coding, you'll probably not have much success with either model. But for the actual simple tasks that these models were made for, they are extremely capable. E.g. hook it up to the Atlassian MCP and have it do all the stuff that is supplemental to coding in big enterprises.


Like I said in my original comment, it’s fine for non-coding tasks, meaning I primarily use it to answer questions.

The MoE variant was perfect for speedily generating hundreds of vocabulary mnemonic flash cards for my daughter to study for the SAT. "Ant bait abates our ant problem" and "A droid adroitly fixes things around the house," for example.

We also used z-image to generate accompanying illustrations.


“Moving lines of code” is a very peculiar eval tbh. I’ve never used Gemma for agentic tasks, but did have it write code, including multi-turn, and I was very positively surprised how well it performed.

It wasn't so much an eval, I really just wanted a small change moved out to another branch.

GPT 5.4 mini couldn't do it. Not even on the second attempt, where it went from obviously wrong to a subtly wrong copy.

In the end I had to manually copy and paste the 10-20 lines over.

If it can't even do that job, I seriously doubt it's going to be adequate for implementing a plan, like people often seem to suggest it could do, in order to save output tokens of a better model.


Like I said, I never really used it for agentic work. I had previously evaluated locally runnable models with opencode (such as qwen3-coder), but found that it wasn't really feasible.

Since then I've adopted a different philosophy, and I actually prefer it this way.

I still very much enjoy doing most coding myself, but when I tried using tools like Claude Code, it felt very difficult to return to the codebase after letting Claude make some changes. Maybe that's just because of poor AI-use discipline, I don't know. But with smaller models, that's not even an issue. I can't just let it do all the coding and thinking for me. However, if I can describe a function I want in great detail in plain English, then Gemma can write it for me, and it will most likely work. It's perfect for boilerplate.

I also recently worked with a web framework I'd never worked with before, though I'm deeply familiar with other ones. So I asked it "I know how to do this in Y framework, what's the best-practice approach to doing it in Z framework?" and it was incredibly helpful, even pushing back on some of my 'bad' attempts at solving a problem.

I think GPT5.4 mini might fall into a similar category, in that it probably performs best when not overwhelmed with too many tools/skills/MCPs, instead being given clearly defined tasks by an orchestrator model. I call those my token burners, as they're super cheap to run and have high tokens/second.


Cursor 2.5 is essentially kimi and I find it eminently usable.

I use it for tasks like object recognition in my family photos and cooking videos. Seems to be fine.

I think the PR from an agent sounds legit, but the whole part once the alleged operator joins in sounds fishy. Wouldn't be surprised if someone saw the PR comments and used the username mentioned by the agent to troll around in the chat. It would also mean that the AWS creds were probably stolen and their expiration date was truly a hard limit for the whole operation.
