
Now, we have definitely seen this kind of thing happen with package managers, when people pull their repos:

https://www.bleepingcomputer.com/news/security/dev-corrupts-...

And it's human nature to be lazy:

https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...

But with LLMs it's much worse, because we don't actually know what they're doing under the hood, so problems can go undetected for years.

What this article is essentially counting on is "trust the author." Well, the author is an organization, so all you would have to do is infiltrate that organization and corrupt the training in some areas.

Related:

https://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_...

https://xkcd.com/2347/ (HAHA but so true)

Exactly! It's not sufficient, but it's at least necessary. Today we have no proof whatsoever of what code and data were used, and even if everything were open-sourced, there would still be reproducibility issues.

There are ways, using secure hardware, to get at least traceability, though not transparency. That would at least tell us what was used to create a model, and it could be inspected a priori or a posteriori.

Exactly. You can't do a simple LLM-diff and figure out what the differences mean.

afaik