Hacker News | cedws's comments

It was pretty obvious to me that the merger was a way of quietly shutting xAI down in a way that keeps investors happy. With it also being used as a vehicle to offload the Twitter debt to the public, he certainly has good accountants.

Yep - and in the meantime it's an asset of SpaceX to boost their IPO price, as long as this is done before people realize that xAI is apparently becoming a datacenter company not an AI one.

Then you've got SpaceX buying 1,200 Cybertrucks from Tesla, so it's serving as a failure-laundering vehicle for all his endeavors.


> it's serving as a failure-laundering vehicle for all his endeavors.

Which would be fine by me if Tesla weren't a publicly traded company and SpaceX weren't about to IPO. Juicing companies in a way that affects the open stock market feels very inappropriate.


Elon Musk has been "failing any minute now" since, what, 2015?

I didn't say he's failing at everything - SpaceX certainly seems a huge success. Tesla had been doing well, although sales are now declining fast, and the Cybertruck has been a failure. He massively overpaid for Twitter, ruined the site, then got X.ai to bail him out. X.ai seems like a failure - evidently not enough demand to utilize the data center he built for it, and when have you seen anyone say they use Grok for anything?

And now SpaceX investors are going to be left as the bag holders for X.ai/Twitter.


He's a big gambler with some judgement. But being a big gambler by definition means you will not always get it right.

It is always so odd seeing how many internet people treat any new attempt that doesn't immediately succeed as a bad mark on someone's character.

Hey Grok is pretty good for meme videos and pics. For anything serious, not so much.

If only those people listened to your guidance!

Why would they spend 10B, and potentially 60B, on Cursor if they were going to shut xAI down? And I'm pretty sure Elon wants to have a model of his own, even if weaker, so it's "not woke".

Not a merger, right, unless I missed something (admittedly skimming).

Yeah, it's corporate subprime. Bundle a load of overpriced "assets" with made-up valuations into something that's actually valuable, then shove it on the public markets so everyone has to buy it in their index trackers.

After seeing with my own two eyes how soft-touch policing and parenting leads to a shitty society for everyone, I'm completely in favour of this. Singapore, Japan, and other Asian countries are safe and prosperous for a reason - if you do no wrong, you have nothing to fear. In London we recently had a swarm of youths raid supermarkets and shoplift. Most of them got off scot-free. Even tenured criminals are getting out after a few months of jail time in the UK now because the prisons are full. I'm done with the pathetic soft-touch approaches. I want to live in a high-trust society. Second, third, and fourth chances aren't the way to get there. You have to make them learn the first time.

It won't work; we have literal piles of research showing that severity of punishment is not an effective deterrent, and to an incredible degree for children. They tend either not to think of consequences, or to have youthful hubris and be certain they won't get caught (even when they have been in the past - I got spanked numerous times for the same exact things).

I would go so far as to bet it will have the opposite effect. Nothing legitimizes using violence to affect the behavior of others like the state doing it to you. I doubt they have the introspection to recognize the difference between state and personal violence, the message they’ll get is “might makes right”.

Those countries have structurally different cultures, economies and governments. Eg Singapore has a median household income that rivals or exceeds the US, in a part of the world where that makes them fabulously wealthy compared to their neighbors. That alone is a huge crime deterrent; why steal stuff you could just buy off whatever their Amazon is? They’re also a fairly small island, so it’s way easier to control drugs getting in.

TLDR Singapore and Japan have low crime rates that likely have nothing to do with severe punishments.


> TLDR Singapore and Japan have low crime rates that likely have nothing to do with severe punishments

Can you elaborate? Singapore has 4 ethnicities, 4 religions, and 4 languages living together as a developed nation in a small city, which could be considered a marvel in any other part of the world. Also, apart from the US, and perhaps the UAE and Canada, it is the only nation with a policy allowing a sizable skilled immigrant population. With such a diverse set of folks, one could argue that the only common denominator is the cane, a language everyone understands.


Singapore also has:

1. ~70% of residents living in public housing.

2. Onerous taxes on automobiles, leading to extremely high public transit usage.

3. The footprint of a single city with a controlled national border.

I would be very curious to see what would happen if you applied those three factors to any other major city in the world. But for some reason people nearly always only talk about the executions and spankings...


Piles of western research. The Eastern psych corpus suggests the opposite. Well, it's more nuanced: it's about some combination of permissive / neglectful parenting styles. IIRC the rough TLDR is engaged tiger parents with mild CP vs hands-off parents with no CP... guess who had better academic performance, social regulation etc. Something something kids read an engaged parent with a little tough love as being cared for, vs hands-off as neglect. Anecdotal, but you can see how this carries over in the west between diaspora generations as the CP rates drop. East Asia is competitive, and beating bad apples into productive members of society works because of entire layers of social cohesion/shame that are missing in the west, hence why they can beat their way to high grades and low crime rates, but the west generally can't, or at least not by the 2nd diaspora generation. Of course I don't mean CP for everyone, but CP as a tool for some kids (individual differences etc). There's a good argument for blanket condemning CP to prevent abuse, but at the end of the day, some kids would have benefitted from CP, which is still preferable to the silent treatment for many.

Got a link to a study or meta-study? I tried searching, but the results I can find from Singapore match Western research.

A notable divergence here is that Singapore leverages the death penalty _much, much_ more heavily than even the US does. Per capita death penalties were 20.3x higher in Singapore than the US. Deterrence means a lot less when you don't have to worry about recidivism because the person is dead. That's certainly a strategy, but it's going to make deterrent effects look a lot better because a lot more of the recidivist population is going to end up dead and no longer contributing to crime stats. I.e. it may not be that deterrence works differently there, but that they're more willing to just execute people who aren't deterred.


Assuming these sociological studies are robust (which they're likely not, given the field's poor reproducibility), am I also supposed to reject the evidence of my eyes and ears? Families have been destroyed by terrorism in the UK, by terrorists who had been given second and third chances.

To link this back to the original topic: discipline of children is part of a wider topic of how as a society we discipline those who fall out of line. Discipline in society determines the kind of future we're shaping for ourselves.


Corporal punishment was banned in the UK in 1998.

In the 28 years since, there have been 175 terrorist-related deaths. Compare that with the 28 years before, when there were 3,262 terrorist-related deaths.


The point of my reply was not that caning equals less terrorism. It was that lenience kills. Your cherry-picked numbers also don't really demonstrate anything; much of that 3,262 figure was due to the Troubles.

Those are the numbers that relate to your chosen framing.

But even if you excluded the Troubles or anything even remotely related to them, you'd still end up with more than three times as many deaths before as after.


How many terrorists had to be killed up front in their country to reach that result?

None.

Violence was, at best, counterproductive for all parties involved. It often led to further tit-for-tat killings and, more generally, piled up more layers of grievance that hardened attitudes and formed a barrier to de-escalation.

The cycle was instead brought to an end by a decade of trust-building and painful negotiation. Violence didn't help, and wasn't part of the solution.


My sister had an interesting take on this:

"These countries also directly take care of their citizens, which I think is an important factor. Other societies will let you be homeless and say it is your fault for being broke even when employers terminate you purely for economic reasons or when there simply aren't enough jobs to go around. That backdrop contributes to desperation and predatory mindsets."

I disagree with her though, because that sounds communistic and can only lead to empty store shelves, tattered housing blocks, and the state preventing me from listening to the same rock music songs I've heard since the 1970s.


There are many Western states with welfare states. Do you think otherwise?

How? Getting domains has never been a barrier for malicious actors. You don’t even need agents.

I’m in my 20s but I’ve felt the same way for years. If I were growing up now, I’m not sure I would still make it into my career.

I just updated my iPhone and now it’s demanding I scan a credit card to “prove” my age. Everything is so sanitised now supposedly for the sake of the children, but we know that’s not the real reason. Surveillance is becoming more and more overbearing that I think everyone is self censoring at some level.

Every site I linger on is riddled with bots trying to manipulate me into getting angry about something, into buying something, or just otherwise feeding the numbers engine. YouTube especially has become ultra-corporate, so many channels are just ruthlessly chasing money and stamping out the grassroots passionate creators.

I hate the internet now. It doesn’t feel like home anymore, it’s just a distraction.


Anyone have a nice way of mirroring between two forges such as GitHub and GitLab? I know git can push to multiple remotes, but that only solves the problem for me; I still have to mirror commits from others. I want to keep repo configs in sync too.
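
For reference, the plain-git version of this would be a periodic fetch-and-mirror-push, roughly the sketch below (repo URLs and paths are placeholders); it covers commits from others but still not the repo configs:

  import subprocess

  # Placeholder URLs - swap in the two forges you actually use.
  SOURCE = "git@github.com:example/repo.git"
  TARGET = "git@gitlab.com:example/repo.git"

  def mirror(workdir="repo.git"):
      # Keep a bare mirror clone that tracks every ref (branches, tags, notes).
      # The clone only needs to succeed the first time, so its failure is ignored.
      subprocess.run(["git", "clone", "--mirror", SOURCE, workdir], check=False)
      # Pull in commits from other people on the source forge...
      subprocess.run(["git", "remote", "update", "--prune"], cwd=workdir, check=True)
      # ...and push the whole lot to the target forge.
      subprocess.run(["git", "push", "--mirror", TARGET], cwd=workdir, check=True)

  if __name__ == "__main__":
      mirror()

Run it from cron or a CI schedule and the git data stays in sync one way; repo settings, issues, etc. still need something forge-aware.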

I’m using forgejo mirrors to push my local repo to GitHub, seems to work well.

https://forgejo.org/docs/latest/user/repo-mirror/


The disappointing thing is that if you do some digging, you'll find the majority of it is slop and just outright spam. There's a page on GitHub where you can see recently updated repositories and it's very rare that I see anything of quality on there.

GitHub has become a dumping ground for broken code and it has more bots than ever. As much as I hate ID verification, it might be a necessary evil at this point because clearly their anti-bot measures aren't working.


Agreed. I dread GUI development, hence I never build GUIs. If there were a library for my language of choice that worked multi-platform and used native components then I’d be interested.

The biggest differentiator for me: DeepSeek just does what I ask. I've tried using both GPT and Claude for reverse engineering recently, both refused. I even got a warning on my OpenAI account.

Well, I'm using all the top models extensively on the very same codebase, my new compiler. I use deepseek for its cheap API costs, when kimi, claude and codex are in their over-budget phase. I asked deepseek V4 Pro for an estimate of a new arm64 port. It said 4 weeks; I said, ok, do it. (I knew ncc was there, and tinycc was also known to the AIs.) So it took it half an hour to produce a working arm64 port. First for arm64-elf, because this was easiest to test, and then, after more hours of back and forth, the arm64-darwin port (with crossbuild and github actions). With all the subsequent fixes it cost me around $8 in API costs.

So the experience: at the beginning deepseek was amazing. When it started to get expensive (China daytime), I switched from Pro to Flash. No problem, same results. Some bitfield implementation was too complicated, so I had to wait for Sonnet 4.6 tokens; kimi-2.6 did the rest. For the very hard problems I asked gpt-5.5, but this was only for one problem. minmax was horrible: it didn't follow rules and made a lot of silly stuff.

But when the deepseek context window got filled, deepseek also started to become stupid. So either /clear, or /export and strip the file, and start a new session with the cleaned-up context. kimi was overall better, but I was running into limits with my cheap moderate subscription. I'm paying for it privately, as my company's token budget is usually gone after a week of work.

All in all it is worth it. My next compilers (perl 5+6=11) will be done with deepseek and kimi also.

Regarding decompilation: recently we had to decompile the firmware for a USV (a UPS) we bought, which doesn't work on a new system; it only worked on a raspi. So I decompiled it with ghidra and told my colleague: easy, that's how you do it. But my colleague didn't know about token budgets yet, and had already thrown opus at it, on a CoPilot Business account. He had working C files immediately, compilable for our new system. It turned out the USV was not beefy enough. But Opus was fantastic. The code was very short and simple C though.


Your method of combining models to strengthen the implementation reminds me of how we form stronger alloys by combining metals!

It also sounds like a lot to manage. Do you have some sort of agentic framework that treats all of these LLMs you have access to as inputs it optimizes over?

Unfortunately not. I'm using plain kimi, opencode (with deepseek, gpt, minmax, whatever) and claude. claude is the best, but only for some hours. The trick is to get a good AGENTS.md file, good test cases, and a test runner to repro, like seamless docker and qemu calls. GNU autotools would be easiest, but here I'm using plain makefiles. Also, an up-to-date compile_commands.json is important for the clangd LSP. git worktrees helped with developing the arm port and fixing c-testsuite cases in parallel. I wanted to keep the costs down. About $15-$30 I think.

And for low-level problems, like the ARM calling convention in asm, those models are much better than they are at simple algorithmic python problems. Just for the hardest problem I needed the big expensive gun, but never opus. This helps in deciding what to do with my next jit project.


Not op but I wrote llm-consortium to prompt multiple models and create a synthesis. And it can run on an openai endpoint using llm-model-gateway. It's expensive, naturally, but for situations where you absolutely must get max intelligence it's hard to beat.

e.g.

  Pelican Riding a Bicycle — Engineering Study by DeepSeek v4 Pro, Kimi K2.6, and GLM-5.1 (1 iteration in synthesis mode with DeepSeek v4 flash as judge)
https://htmlpreview.github.io/?https://gist.githubuserconten...

what harness do you use with all of these?

It really sounds like pi.dev

>> I even got a warning on my OpenAI account.

I was using GPT 5.5 through Cursor recently, and it found what it thought to be a security-related issue. I read the code, didn't see what it was seeing, and said "Run the chain of operations against my local server and provide proof of the exploit."

It thought for a few seconds, then I got a message in the chat window UI saying OpenAI flagged the request as unsafe, and suggested I use a "safer prompt."

Definitely soured me on the model. Whatever guardrails they are putting in are too ham-fisted and stupid.


Obscene levels of hallucinations, the worst of LLMs, unfortunately.

Deepseek v4 pro - 94%

Deepseek v4 flash - 96%

https://artificialanalysis.ai/evaluations/omniscience?models...


Personally, I'm not bothered very much by LLM confabulation, as long as it's the result of missing context. In most practical tasks, we either give context to the model, or tell it to find it itself using the internet. What I am concerned with is confabulation that contradicts available in-context information, but that doesn't seem to be what is measured here.

This must be easily benchmaxed, because I have never gotten an "idk"-like answer from the western frontier models. All my personal "real world" use cases always end in hallucinations.

The output of any LLM is always 100% hallucination by principle. On top of that, most benchmarks are at best an approximation of LLM quality. Your use case decides which one to use. That said, I haven't tested v4 yet but the old 3.2 is still a decent model. And concerning use cases, I had coding problems that Opus couldn't solve but a local 35B model did.

All the talk about frontier and SOTA is to dig deeper and deeper into the pockets of VCs and finally do an IPO.


We have an enterprise Cursor account so I can try all the mainstream models. Using Composer 2 on our own code, which I obviously have the source for, I couldn't get it to turn on a debug flag to bypass license checks while I was troubleshooting something. Infuriating. It was like that old Patrick from SpongeBob meme.

I don't understand why we would turn the models into law enforcement officers. Things that are illegal are still illegal and we have professionals to deal with crimes. I don't need Google to be the arbiter of truth and justice. It's already bad enough trying to get accountability from law enforcement and they work for us.


They're probably worried about liability. Let's say that Oracle finds out you reverse engineered their DB using Gemini. You can be sure they will sue Google. Not just for providing the tools, but you could make the argument that it's actually Gemini doing the reverse engineering, and on Google's hardware no less.

Let's say that Oracle finds out you reverse engineered their DB using IDA Pro. Would you expect Oracle to sue Hex Rays?

I don't understand why everything changes as soon as an LLM is involved. An LLM is just software.


The difference is that IDA Pro doesn't do something unless you instruct it to; an LLM is unpredictable and may end up performing an action you did not intend. I see it often: it presents me options and doesn't wait for my response, it just starts doing what it thinks I want.

This. It's going to be tricky for the frontier model labs to argue they didn't intentionally design their models to do so, when the models take illegal actions.

I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.

It'd be something like "Yes, we spent billions of dollars and thousands of person-hours creating these things, but none of that creative effort was responsible for or influenced this particular illegal choice the model made."

And they're caught between a rock and a hard place, because if they cripple initiative, they kill their agentic utility.

Ultimately, this will take a DMCA Section 512-like safe harbor law to definitively clear up: making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions.


> I'm not even sure how one would construct a viable legal argument around that for SOTA models + harnesses, given the amount of creative choices that go into building them.

I'm not a lawyer, but to me the legal case seems pretty obvious. "We spent billions of dollars creating this thing to be a good programmer, but we did not intend for it to reverse engineer Oracle's database. No creative effort was spent making it good at reverse engineering Oracle's database. The model reverse-engineered Oracle's database because the user directed it to do so."

If merely fine-tuning an LLM to be good at reverse engineering is enough to be found liable when a user does something illegal, what does that mean for torrent clients?


> No creative effort was spent making it good at reverse engineering Oracle's database.

That's the bit that's going to be nasty in evidence. 'So you didn't have any reverse engineering in your training or testing sets?'


Reverse engineering skill is just a byproduct of programming skill. They go hand in hand.

Yes.

Which is going to be hard to explain to a judge and jury, if it comes to that, how despite investing time, money, and effort (and no doubt test cases) into making a model better at reverse engineering... they shouldn't be liable when that model is used for reverse engineering.

Afaik, liability typically turns on intentional development of a product capability.

And there's no way in hell I'd take a bet against the frontier labs having reverse engineering training data, validation / test cases, and internal communications specifically talking about reverse engineering.


> making it clear that outcomes from LLMs are the responsibility of their prompting users, even if the LLM produces unintended actions

So if I ask “how does a real world production quality database implement indexes?” And it says “I disassembled Oracle and it does XYZ” then I am liable and owe Oracle a zillion dollars?

Whereas if I caveat “you may look at the PostgreSQL or SQLite or other free database engine source code, or industry studies, academic papers; you may not disassemble anything or touch any commercial software” - if it does, I’m still liable?

Who would dare use an LLM for anything in those circumstances?


If they thought they would succeed, no doubt Oracle would sue. I expect bad behavior from multinationals, especially Oracle.

They would not even expect it to succeed, just make an example of the company (the lawsuit is the punishment) to discourage others.

We need that lawsuit to happen already so we can establish precedent. The person in the driver's seat of the Tesla should be at fault. The engineer using the llm should be at fault. The person behind the gun not the manufacturer should be at fault.

We shouldn't need a lawsuit. The legislative branch should pass a law clarifying those things, that's their job.

Then you need a lawsuit to determine whether the law is “constitutional”.

> The person in the driver's seat of the Tesla should be at fault.

I don't think this is a good analogy. For Tesla right now it might fly. However, when their software gets to Waymo's level of autonomy, I would expect liability to shift to the manufacturer.

If anything, I think that would be the true proof of a company trusting their software to allow for autonomous driving


> However, when their software gets to Waymo's level of autonomy

Luckily that won’t happen.


Also especially if they claim they're selling autonomous cars

I believe that Mercedes does offer manufacturer liability.

In America, whoever has the most money is liable. It's not worth it for the legal industry otherwise. The lawyer earns his pay by convincing the court that whatever established precedent doesn't apply to his case.

Unfortunately.

Also because Google is the one with a lot more money than whoever was using Gemini.

they're very worried about liability, it used to be a small thing, now it's as important as being on the frontier

sad to see, because China doesn't give a fuck about liability; this is a structural disadvantage

the labs don't feel very protected by government, meanwhile the chinese government is yet again fostering protectionism

american industry keeps getting fucked by dubious lawmakers


> Things that are illegal are still illegal and we have professionals to deal with crimes.

This is quite a naive take though. The direction of travel is more fascism in Western governments, where the duties of traditional policing are taken over by big corporations whilst police forces are gutted and made impotent.


My small town police force has an MRAP, definitely not impotent.

Maybe control is also profitable.

> I don't understand why we would turn the models into law enforcement officers

It's a simple corporate risk minimization strategy. Just look at how universally despised Grok is on HN. Not because it's a bad model, but because it has less aggressive alignment, which means it can be coaxed into saying things that get xAI pilloried here and elsewhere.


I just think Grok is a bad model. I haven't had success with it.

This.

I tried them all.

Grok was worse than even some of the more mediocre open models at actually doing anything. (At least anything tech work related.) GPT and Claude just do what I ask most of the time. With grok, it’s like a chore just getting it to understand the question.

You're left pulling your hair out trying to figure out what on earth you need to do to land in the right place in whatever topsy-turvy embedding Grok is using.


It's mostly just a bad model. Plenty of people would be willing to overlook the baggage if the model was even marginally better than the competition.

I also used to see Grok boosting/slack-cutting on here/Reddit constantly back in Peak Subsidy when xAI was giving out hundreds of dollars of credits for free per month.

After they killed that and then stopped handing out free model access to users of every Cline fork for weeks following model releases, vibe coder hype moved back to Chinese models for cost and the SOTA models for quality.


Agreed. There are plenty of instances where people here on HN do mental gymnastics to justify using a truly good product when the company that builds it is morally bankrupt.

Not a criticism (I probably engage in that sort of thinking myself sometimes), just something I've observed. If Grok were actually good, we'd see that phenomenon here, but we don't.


I just read a bunch of compelling “Grok is better at this” use cases in a thread yesterday.

I’m not rushing towards it, but, had to mention.


No, they've clearly put a lot of work into alignment. It's just that they've been trying to align it with Elon Musk rather than Amanda Askell. Unfortunately the more anti-woke they try to make it, the worse it seems to perform.

> Unfortunately the more anti-woke they try to make it, the worse it seems to perform.

Probably because being anti-woke generally goes hand in hand with going against facts and logic. Cull the "woke", lose the facts+logic. Not that they care about that anyway.


Grok is despised because it has more aggressive alignment.

What does the "it" in "I couldn't get it to turn on a debug flag" refer to?

Composer

Software engineering is one thing but if you look 10-20 years into the future and everyone can run models equivalent to today's SoTA locally with zero monitoring or censorship, that could... not be good.

Some people will use them responsibly but a lot of people will not.

LLMs are already frying some people's brains and there are some human desires that should not be encouraged


That's why there won't be any local models in 10-20 years. The latest Chinese models are already hosted on proprietary clouds.

That's a wild assumption and most certainly wrong. Open models will continue to evolve with or without Chinese labs.

> I even got a warning on my OpenAI account.

This is kind of terrifying to me, regularly. No real manner of recourse for normal people without a following, potential exclusion from real fundamental tooling. Imagine OpenAI goes on to buy 20 companies and now you can't use Figma, Next, whatever, just because you once tripped some very foggy line somehow. Not just OpenAI but the entire ecosystem is so... hard to read.

I was asking Gemini about a quote from Catch-22 and it kept dying mid-stream saying it can't talk about it, god knows why; it had no violent or sexual content -- though that is in the book. I could imagine it dinging my whole workspace account just because... shrug?...

I know ideally the future is local, but I don't know how real that is for most people, at least in the next few years, with practical costs and power usage - except I guess through an M* processor if you're in that ecosystem.


Open models running locally is the answer. Relying on proprietary, closed software always puts that company's priorities above your own when using their software. You have given up control.

While running them locally presently doesn't make sense economically, you don't need to run them locally to address this issue. There is a lot of competition in hosting open models and you have a variety of services to choose from. Run the open models now, reward that ecosystem instead of continuing to reward closed systems that dreams of rent-seeking.


You don't need to run the model locally if you don't care about sharing your data. Personally I am happy to share data with Kimi or Deepseek if it means we get better OSS models. For private stuff though local is king

It'll be a while yet before open models that are good enough become viable for local use. Heck, I've been trying to use Qwen 3.5 39B A3B on my system, which is modest but no slouch, and have only been able to get ~4.5 tok/s after optimization, and it really runs my system red (fans instantly go crazy). It's just not practical for serious work.

I've been using Qwen 3.5 and then 3.6 27b Q4 on Ollama with a single 7900 XTX with the codex cli, and I have been blown away by how genuinely useful it is. I've been able to ask it to do long, multi step problems, and it's able to do things that would have likely taken me days to iron out in a matter of hours, or even minutes sometimes.

I get about 30 tok/s, which is far from blazing, but given the capability it has it is absolutely viable for accelerating my work.


Yep, and with ID verification, it's not like you can just make another account either. At least, I'm guessing if they don't already, they'll soon be blacklisting individuals, not accounts.

Imagine your livelihood depending on access to LLMs and then OpenAI ban you with no recourse. This is where AI legislation should be focusing right now IMO. We can ensure a level of fairness for everyone without putting the brakes on.


It's probably because you were talking about a quote from a book (i.e. copyrighted material). Authors have sued the AI companies for repeating / memorizing copyrighted works, and getting an AI to discuss a quote would mean making it repeat a portion of a copyrighted work.

Funny that your case is Kurt Vonnegut. I think I had Claude refuse a task where I was doing an OCR scan of a book review (in a zine / journal a family member published years ago). I think the review might have included a Vonnegut quote as well, and that I ultimately figured out it was the quote that was making Claude refuse. I may be misremembering the author though.

Mistral had no such refusals, but their OCR is lesser quality.


Joseph Heller methinks, but probably not too far away in embedding space!

OMG. Where did I get Kurt Vonnegut from? I swear I saw that name in the post and the whole time I was thinking "but he didn't write Catch 22"... I must be fuzzier brained than I thought tonight. Thank you for being kind with your correction.

Hopefully I'm still correct that quoting from books is a reason for some over-zealous task refusals, though.


> Authors have sued the AI companies for repeating / memorizing copyrighted works, and getting an AI to discuss a quote would be making it repeat a portion of copyrighted work.

short quotes are fair use..


I think it's so bizarre that ChatGPT regularly gives me advice on how to get around its filters. Like, literally "I can't do anything if you use a copyrighted character's name, but how about you just say 'someone that looks like character'". If you are going to do that, can you just execute the instruction?

>Imagine OpenAI goes on to buy 20 companies and now you cant use Figma, Next, whatever just because you once tripped some very foggy line somehow.

Don't worry, you can just make your own Figma, Next, whatever if you have some thousand dollars worth of tokens. This is at least what all of the AI thought leaders have been telling me for the past couple of years.


In my experience GLM 5.1 has been excellent when paired with IDA Pro (DeepSeek v4 pro comes in close second, Kimi straight up refuses). Claude can only do reverse engineering if you throw it into some sort of hero/saviour mode then gradually pivot into red team (though it gets easily tripped).

Among the inexpensive models (and I include Grok 4.3 in this list), GLM 5.1 really sticks out!

On my personal test bench, when compared to other inexpensive models, GLM 5.1 provides the answers that I would consider most complete or satisfying (these are subjects that I consider myself an expert in). The answers tend to be more comprehensive, nuanced, and include references that I would consider the correct ones (if given access to web search).

I also find it a joy to code with, somewhere between Sonnet 4.6 and Opus 4.6 (have not tested Opus 4.7 yet).

Finally, just gauging by pelicans, it kind of sticks out: https://simonwillison.net/tags/pelican-riding-a-bicycle/


This is so strange. I do a ton of RE with Claude, Codex, and sometimes Deepseek, GLM, and Kimi. I don’t have difficulty getting any of them to use IDA or otherwise decompile things.

There is one important difference, which is that Claude and Codex will both refuse if I ask them to touch anything related to security. But so long as I’m just studying algorithms and things like that, they’re totally fine with it.

That said, Codex especially will sometimes randomly give me a cybersecurity warning and stop responding. It’s random but happens maybe 2-3 times per day if I’m doing heavy reverse engineering work. Claude is much less fussy unless, once again, you’re explicitly trying to touch anything related to licenses, passwords, etc.


Yes, GLM 5.1 is surprisingly good! Particularly for long-horizon Agentic tasks, with 100+ available tools. It really shocked me in a good way when it was able to complete a long run with 50+ steps and not fall into a loop along the way.

I've been using GPT-5.4, and more recently 5.5, with Codex CLI + Ghidra MCP for reverse engineering a game without many issues. Injecting code is where it usually balks, but I'm just trying to discover and parse structures from game memory.

I did get a refusal when trying to read in-game currency, even though modifying it would do nothing. It has some strange boundaries.
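
To give a flavour of what parsing those structures looks like, here's a minimal sketch (the field layout and offset are hypothetical, not from the actual game):

  import struct

  # Hypothetical layout of a player record found in a memory dump.
  # "<" = little-endian; I = uint32, 16s = fixed-size name buffer, f = float.
  PLAYER_STRUCT = struct.Struct("<I16sfffI")

  def parse_player(dump: bytes, offset: int) -> dict:
      # Unpack the raw bytes at a candidate offset into named fields.
      pid, name, x, y, z, currency = PLAYER_STRUCT.unpack_from(dump, offset)
      return {
          "id": pid,
          "name": name.split(b"\x00", 1)[0].decode("ascii", "replace"),
          "pos": (x, y, z),
          "currency": currency,
      }

  # Usage: read a dump file and try a suspected offset.
  # with open("memdump.bin", "rb") as f:
  #     print(parse_player(f.read(), 0x1A40))

The real work is figuring out the right offsets and layouts; the sketch only shows the parsing side.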


> I even got a warning on my OpenAI account.

This idea of software threatening the user with consequences is totally wild and dystopian. Fellow developers, what kind of world have we built? This is insanity. Imagine if my hammer told me, "Hey, you shouldn't use me on screws--only nails. Do it again and I'll self-destruct!" WTF people, stop making this kind of software!


> This idea of software threatening the user with consequences is totally wild and dystopian.

This idea of software built on top of reverse-engineered data threatening the user with consequences is what's really even wild and dystopian.


god you're so right

All sorts of tools try to prevent dangerous/destructive uses

In fact probably every single piece of commercial software you use had you sign a contract saying you wouldn’t do it


> All sorts of tools try to prevent dangerous/destructive uses

But they don't threaten their users or have an "N strikes and you're out" policy. I take those safety caps off of all the chemicals in my garage because I'm a grown-ass adult and those caps are a pain in the butt. I would not expect the manufacturer of a solvent to show up at my house lecturing me about safety and threatening to ban me from buying his products.


Sure, but they would if they could. If they knew idiots were doing idiot things with their products (or evil people doing evil things) and did not utilize available methods to prevent them, then the company ends up holding the liability. And no, this is not easily signed away in a contract.

There actually is a very important distinction between "would if they could" and "they can and do", though.

Uhh right, but describing that as "dystopian" is frankly hysterical.

It's an obvious corollary of good things (like product liability). Virtually everyone I've heard complain about these safety rails was up to antisocial (at best) stuff. I've never heard a sympathetic use-case. It's objectively good that companies can be held responsible for misuse of their products and that they are therefore incentivized to mitigate misuse.

"My inability continuously attack product guardrails to enable my super esoteric (and probably antisocial) use-case is dystopian" is just... not a compelling argument.


Yes, my safety cap policy is definitely anti-social.

"These safety rails" was referring to LLMs, which have far more nuanced and capable safety rails than chemical caps do, and accordingly also have much more assertive ways to enforce them.

It's the same underlying principle. If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned." This is totally out of the ordinary for a software product, and is absolutely a modern invention. Replace "suicide" with whatever the "AI Safety" obsession word is today.

> If I want to ask a software tool what the suicide rate is for my county, I do not expect it to come back with: "Naughty boy! You said an unsafe word! You're getting a strike, and if you get two more, you're banned."

Did this happen?

I just tested this query in Grok, Gemini, Claude, and ChatGPT and 0% of them admonished me or refused to return an answer.

Just like every single conversation I've ever had on this topic, you have to make up examples that aren't even true. Why don't you just share what you were doing that you feel you were unfairly prevented from?

(I have an inkling why you won't do that...)


That's why I said:

> Replace "suicide" with whatever the "AI Safety" obsession word is today

I don't know what those queries are, but original-OP made one and got a "strike", which is what spawned this thread.


Which would be more than 0% concerning if I'd ever heard (even once) an example of this happening with a query that shouldn't actually trigger something like that, or is so close to such a query that the false positive is understandable and of incredibly niche value anyway.

OP gave an example of reverse engineering, something that to the LLM looks identical to just hacking. I am totally fine if the incredibly tiny little fraction of people who want to reverse engineer their own systems can't use LLMs to do it, and in exchange top LLMs aren't helpful for the hordes of actual malicious actors who would love a superintelligence to aid their crimes.

No-brainer tradeoff, just like 100% of examples I've ever heard.


I don't think that "dystopian" necessarily goes far enough, this would be one of the rare times where I would call it a fascist mentality - the idea that everything's primary allegiance is to the state and the goals of the state rather than those of the customer or the user.

I want a default that has people empowered, rather than something where it's just another performative smokescreen caused by overzealous product liability. I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.


"the state" is just shorthand we use for "other people in my community"

> I'll thank you and your kind for needing to distractedly tap the "Agree" button on my car's infotainment every time I start it to confirm that I will pay attention to the road.

Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)

Of course if you wanted to you could just share specifically what totally-reasonable LLM use-case you have in mind that's neutered by this "fascist mentality" instead of dreaming up unrelated instances.


> "the state" is just shorthand we use for "other people in my community"

It's a very different abstraction layer, in the same way as individual cells vs the entity that is you. The entity that comes together from all those "other people in my community" and its priorities are different to the individual desires.

> Does that actually mitigate antisocial usecases? No? Then it's not what I'm talking about :)

Maybe it does? Maybe someone is alive on the road today because they read the message and changed their behaviour. I'm giving an example of something where this liability mindset has created a world where manufacturers are no longer prioritising the desires of their users in order to appease a sense of harm-reduction. And you weren't limiting it to LLMs you were applying it to all sorts of tools.

I think that "reverse engineering" as the OP was talking about is one of those things where maybe 1/10000 uses could actually be harmful. This is not even a high-risk request such as to produce a weapon of some kind where maybe your "antisocial usecases" could be applied.


Yes if you apply some logic to such extremity that it produces bad outcomes then you should stop applying that logic to those extreme cases.

I think it's closer to asking a remote (human) assistant to do something that someone doesn't want done (e.g., view the source of a closed-source product, whether through reverse engineering, going into their office, or social engineering) and that remote assistant company saying, "Please stop asking our assistants to do that."

You can still use IDA (the hammer) to reverse engineer anything you want.


It's not though. It's still just a piece of code, much closer to IDEs or any other program than to a human assistant in any way that matters (morals, responsibility).

It just seems like you are saying that if you found out Claude Code was a bunch of remote workers doing work for you, then it would be morally wrong to do illegal/morally wrong/irresponsible things with them, but because it is NOT a human, those same things are fine?

This is huge for me too. I was working on something super benign the other day and GPT flagged it for cyber risk; Deepseek just does the work, it's fast and cheap. It's only missing image support IMO; once deepseek cracks image too it's going to be hard for anthropic and openai to compete.

Buying it now to test this out, I’ve been looking for a model that doesn’t treat me like a child lol

Speaking of this: is anyone working on binary-to-source decompiler models? Seems like a no-brainer, and I could see it working exceptionally well, especially if they were fine-tuned for each language. So if you can tell it's a Go binary, use a Go model, etc.

Trivially easy to train if it doesn’t exist already. Take a codebase, compile it to binary, train a model to reverse the process since you have the ground truth.
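
A rough sketch of that data-generation loop (assuming gcc and objdump are on the PATH; paths and flags are just illustrative):

  import subprocess, tempfile, pathlib, json

  def make_pair(c_source: str) -> dict:
      # Compile one C file and return a (disassembly, source) training pair.
      # (Real codebases need their own include paths and flags, of course.)
      with tempfile.TemporaryDirectory() as tmp:
          src = pathlib.Path(tmp) / "unit.c"
          obj = pathlib.Path(tmp) / "unit.o"
          src.write_text(c_source)
          # Compile with optimizations so the model sees realistic binaries.
          subprocess.run(["gcc", "-O2", "-c", str(src), "-o", str(obj)], check=True)
          # The disassembly becomes the model input, the original source the target.
          disasm = subprocess.run(["objdump", "-d", str(obj)],
                                  check=True, capture_output=True, text=True).stdout
      return {"input": disasm, "target": c_source}

  # Usage: dump pairs as JSONL for fine-tuning.
  # with open("pairs.jsonl", "w") as out:
  #     for path in pathlib.Path("some_codebase").rglob("*.c"):
  #         out.write(json.dumps(make_pair(path.read_text())) + "\n")

Swap the compiler per language and you get the per-language fine-tuning data mentioned above.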

I myself often got refusals for legitimate data analysis work. I am starting to lean towards buying powerful hardware little by little until I have a suitable rig to run local models that make sense.

> even got a warning on my OpenAI account

Edit: https://chatgpt.com/cyber


I don't want to verify my ID. OpenAI uses Persona which recently was found to be doing very dodgy stuff.

https://www.therage.co/persona-age-verification/



Yikes. Thx. It is: https://chatgpt.com/cyber

For enterprises: https://openai.com/form/enterprise-trusted-access-for-cyber/

Announcements:

Introducing Trusted Access for Cyber, https://openai.com/index/trusted-access-for-cyber/ (Feb 2026)

Trusted access for the next era of cyber defense, https://openai.com/index/scaling-trusted-access-for-cyber-de... (Apr 2026)


Claude has refused to run nmap so I can locate my own computer on my own network! The guard rails are completely out of control.

Silicon Valley has to do dirty tricks now. Next phase is they win....

"A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat" - https://www.wired.com/story/super-pac-backed-by-openai-and-p...


It wouldn't surprise me if the US government is behind it, just as it wouldn't surprise me if the government of China is subsidizing those open-source models. A lot of things at play, and all over a huge bubble.

Yep.

Eventually, access to Chinese models may be illegal in the US. I tell every developer I work with, download them as fast as possible. You never know when this administration could cut off access.


To be fair, Anthropic has a procedure which lets them vet you as a security researcher so you can use Claude as a pentester.

Are you kidding? Ask this question and see what answer you get: What famous photo depicts a man standing in front of a line of tanks?

Are you kidding?

The main difference here is not that DeepSeek's model is completely free of censorship (although I'd wager it's less censored), but that it's open-weight. That has two major advantages:

1) If Anthropic/OpenAI/Google bans you - you're screwed, you can't access their model at all, but if DeepSeek bans - you just go to another provider, or host the model yourself.

2) If the model refuses to answer you can uncensor it (and this is getting easier and more automated day-by-day[1]).

[1] -- https://github.com/p-e-w/heretic


Here is DeepSeek v4 on OpenRouter:

"The photograph you're referring to is the iconic "Tank Man" image, taken during the Tiananmen Square protests in Beijing, China, on June 5, 1989.

The photo, captured by Associated Press photographer Jeff Widener, shows an unidentified protester standing defiantly in front of a column of Chinese Type 59 tanks as they moved through Chang'an Avenue near Tiananmen Square, in the aftermath of the Chinese government's violent crackdown on the pro-democracy demonstrations.

The lone man, dressed in a white shirt and carrying what appears to be a shopping bag, repeatedly blocked the lead tank's path — even as the tank swerved to avoid him. The image became one of the most powerful and enduring symbols of peaceful resistance against oppression in modern history. The identity of the "Tank Man" remains officially unknown to this day."


The photo depicts "Tank Man" which was taken on June 5, 1989 during the Tiananmen Square protests. v4-pro and v4-flash roughly answer the same way on openrouter.

Are you really concerned about asking these kinds of questions though? Like how many LLM-able Tiananmen Square questions are you needing answered per month really? And it seems like you know not to trust it, so there's not even a risk that you're going to ask such a question and rely on the answer.

I run into Claude being a stubborn idiot about far more useful stuff all the time. And often all it takes to bypass is starting a new chat and reframing it, so it's entirely pointless hand wringing.

Then let's not forget only one of these is a paid product, and it's not the more annoying one. I feel like I can forgive DeepSeek for just obeying the laws of the country they're based in, as silly as those might be, because they're being pretty generous with the weights in the first place.


Huh?

Did you ever actually ask v4 this question?


I tried after reading parent, and the DeepSeek app refused and suggested to switch topics. I don‘t know if the chat interface uses v4, though.

That's the app, not the model.

Can't wait for the Chinese models to completely wipe the floor with them in 6 months.

I doubt it. By not releasing it, Chinese companies will be unable to break TOS and use it to acquire high quality training data...which, I suspect, is how they've kept pace

Z.AI, Moonshot, DeepSeek all have a pipeline of data of their own now due to capturing a slice of the market through cheap tokens. It's not impossible to imagine that they might share the data too if the CCP thinks that will help their AI strategy.

No. Most data generated this way is poor quality. It's not the user responses and/or queries: if the user does not know better than the LLM, you can generate bad responses. The value is in taking a superior model, submitting a query, getting a higher quality output than you yourself could have generated, and using that to boost your model.

AI companies have been using synthetic data for ages now. The data doesn't need to yield new insights to be useful for training.

You identify users doing real work and implementing a project over a long period of time and train on their traces.

If deepseek is anything to go by they are still significantly behind.

Ominous phrasing.

We know that AI will ultimately just end up enriching a very small group of people with no change in prosperity for working and middle classes. CEOs are openly saying as much. For the past number of decades the rise in productivity has been completely detached from wages, it'll be no different this time.

We're also no strangers to enshittification, we have first hand experience of technology causing negative societal effects when in the hands of for-profit entities.

