
There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations. If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous. The question then becomes “for what purposes (if any) are the models profitable, even if they occasionally hallucinate?” Whoever solves that problem walks away with the market.



> If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

This isn't true.

You're conflating whether a model (that hasn't been fine tuned) would complete "the capital of Connecticut is ___" with "Moscow", and whether that model contains a bit labeling that fact as "false". (It's not actually stored as a bit, but you get the idea.)

Some sentences that a model learns could be classified as "trivia", and the model learns this category by sentences like "Who needs to know that octopuses have three hearts, that's just trivia". Other sentences a model learns could be classified as "false", and the model learns this category by sentences like "2 + 2 isn't 5". Whether a sentence is "false" isn't particularly important to the model, any more than whether it's "trivia", but it will learn those categories.

There's a pattern to "false" sentences. For example, even if there's no training data directly saying that "the capital of Connecticut is Moscow" is false, there are a lot of other sentences like "Moscow is in Russia" and "Moscow is really far from CT" and "people in Moscow speak Russian", that all together follow the statistical pattern of "false" sentences, so a model could categorize "Moscow is the capital of Connecticut" as "false" even if it's never directly told so.
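To make the "learnable category" point concrete, here is a toy sketch. It assumes (purely for illustration) that statements are stored as vectors and that statements people would call "false" are shifted along one hidden direction. The data is synthetic, not real LLM activations, and the names `direction`, `embeddings` and `labels` are made up. A linear probe fitted on half the statements then labels the other half without ever having seen them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16-dimensional "statement embeddings". Statements that
# people would call "false" are shifted along one hidden direction.
dim, n = 16, 400
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

labels = rng.integers(0, 2, size=n)   # 1 = "false"-pattern statement
signs = 2 * labels - 1                # map {0, 1} to {-1, +1}
embeddings = rng.normal(size=(n, dim)) + 2.0 * signs[:, None] * direction

# Fit a linear probe (least squares with a bias column) on the first half.
half = n // 2
X_train = np.hstack([embeddings[:half], np.ones((half, 1))])
w, *_ = np.linalg.lstsq(X_train, signs[:half].astype(float), rcond=None)

# Evaluate on the held-out second half.
X_test = np.hstack([embeddings[half:], np.ones((n - half, 1))])
predicted = (X_test @ w > 0).astype(int)
accuracy = float((predicted == labels[half:]).mean())
print(f"held-out probe accuracy: {accuracy:.2f}")
```

This only shows that such a label is recoverable when the pattern exists in the representation. It says nothing about whether a given model actually learns it.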


That would again be a "statistical" attempt at deciding on it being correct or false - it might or might not succeed depending on the data.


That's correct on two fronts. First, I put "false" in quotes everywhere for a reason: I'm talking about the sort of thing that people would say is false, not what's actually false. And second, yes, I'm merely claiming that it's in theory learnable (in contrast to the OP's claim), not that it will necessarily be learned.


I'm not sure the second part is always true: there might be situations where statistical approaches could be made kind of "infinitely" accurate as far as the data is concerned but still represent a complete misunderstanding of the actual situation (aka truth), e.g., layering epicycles on epicycles in a geocentric model of the solar system.

Some data might support a statistical approach and other data might not, even if it contains no misrepresentations as such.


The human feeling you have that what you're doing is not statistical, is false.


Based on what research is that universally true? (Other than base physics like statistical mechanics.)


Base physics is all we need to know it is true. Souls are unphysical and we've had reason to be pretty confident about that for at least a century.


Yes, physics determines how phenomena work and aggregate but that doesn't necessarily support the specific claim (and we also don't know "all of physics").


Doesn't support the specific claim that souls don't exist? We know how atoms/waves interact. We have literally no reason to think there is some other soul-based mechanism.

Of course, maybe induction is false and gravity will reverse in the next 3 seconds after writing this comment and God will reveal themself to us. We have no justified reason to think otherwise other than the general principle that things behave the way we observe them to and will continue to do so.


I see no need for a soul - you brought it up, not me.


What would it mean that 'what you're doing' is not statistical/arising from base interactions, if not that there is some non-physical entity resembling a soul? You're suggesting some sort of non-material component of humanity, yes?

If not then I'm not even sure what the disagreement is.


Base interactions need not strictly create statistical results in the end.


Good luck philosophically defending this dividing line between 'statistical' and not


Another version/interpretation in this "truth" space is whether a model is capturing multi-part correlations with truthy/falsy signals, much like capturing the "sadness" or "sarcasm" of a statement.

In other words, a model might have local contextual indicators, but not be able to recognize global hard logical contradictions.


But the model doesn't operate on tokens directly, right? All operations happen in the embedding space, so these tokens get mapped onto a manifold, and one of the dimensions could be representative of fact/trivia?


tangent: any reason to assume it gets mapped to a manifold rather than something that is not?


I think "manifolds" in AI are not the same as actual smooth manifolds. For starters I would not expect them to have locally the same dimension across the whole dataset.


Something to chew on for me. But what is a manifold then if not a topological space that is locally the same as R^(some dimension) ?


What I meant is that I can imagine cases where some part of the dataset may look like R2 and then collapse to have a spike that looks like R1, so it is not a standard manifold where all of it has the same dimension.

Apart from that, these "manifolds" have noise, so that is another difference with the standard manifolds.
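A rough sketch of what that could look like in practice, on made-up data: a flat, slightly noisy 2-D patch in R^3 with a 1-D spike sticking out of it. The estimator here is a simple local PCA (count how many principal directions are needed to explain most of the variance among the nearest neighbours of a point). The function name `local_dimension` and the thresholds are my own choices for illustration, not anything standard:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset in R^3: a flat 2-D patch plus a 1-D "spike" leaving it,
# with a little noise added everywhere.
patch = np.column_stack(
    [rng.uniform(-1, 1, 2000), rng.uniform(-1, 1, 2000), np.zeros(2000)]
)
spike = np.column_stack(
    [np.zeros(500), np.zeros(500), rng.uniform(0.0, 2.0, 500)]
)
points = np.vstack([patch, spike]) + rng.normal(scale=0.002, size=(2500, 3))


def local_dimension(data, query, k=40, threshold=0.95):
    """Count the principal directions needed to explain `threshold` of the
    variance among the k nearest neighbours of `query`."""
    distances = np.linalg.norm(data - query, axis=1)
    neighbours = data[np.argsort(distances)[:k]]
    centred = neighbours - neighbours.mean(axis=0)
    variances = np.linalg.svd(centred, compute_uv=False) ** 2
    explained = np.cumsum(variances) / variances.sum()
    return int(np.searchsorted(explained, threshold) + 1)


dim_on_patch = local_dimension(points, np.array([0.5, 0.5, 0.0]))
dim_on_spike = local_dimension(points, np.array([0.0, 0.0, 1.5]))
print(dim_on_patch, dim_on_spike)
```

On this synthetic data the estimate should come out as 2 on the patch and 1 on the spike, which is the "dimension changes across the dataset" situation described above. The noise is also why a variance threshold is needed at all: with noise, every neighbourhood technically spans all three directions.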


There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow. LLMs of course don't have the benefit of direct experience, which would probably help them to at least some extent hallucination wise.

I think research about hallucination is actually pretty valuable though. Consider that humans make mistakes, and yet we employ a lot of them for various tasks. LLMs can't do physical labor, but an LLM with a low enough hallucination rate could probably take over many non social desk jobs.

Although in saying that, it seems like it also might need to be able to learn from the tasks it completes, and probably a couple of other things too to be useful. I still think the highish level of hallucination we have right now is a major reason why they haven't replaced a bunch of desk jobs though.


I think we expect vastly different things from humans and LLMs, even putting raw computing speed aside. If an employee is noticed to be making a mistake, they get reprimanded and educated, and if they keep making mistakes, they get fired. Having many humans interact helps reduce blind spots because of the diversity of mindsets, although this isn't always the case. People can be hired from elsewhere with some level of skill.

I'm sure we could make similar communities of LLMs, but instead we treat a task as the role of a single LLM that either succeeds or fails. As you say, perhaps because of the high error rate, the very notion of LLM failure and success is judged differently too. Beyond that, a passable human pilot and a passable LLM pilot might have similar average performance but differ hugely in other measurements.


Overall, excellent points! I would like to add on to that though. RLHF actually does effectively have one LLM educating another. Specifically, human trainers' time is valuable, so they train an AI to express the same opinion about some response as a human trainer would, and then have that trainer AI train the LLM under consideration.

It's both interesting and sensible that we have this education in the training phase but not the usage phase. Currently we don't tend to do any training once the usage phase is reached. This may be at least partially because over-training models for any special purpose task (including RLHF) seems to decrease performance.

I wonder how far you could get by retraining from some checkpoint each time, with some way to gradually increase the quality of the limited quantity of training data being fed in. The newer data could come from tasks the model completed, along with feedback on performance from a human or other software system.

Someone's probably already done this though. I'm just sitting in my armchair here!


> There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow.

Isn't this just conveniently glossing over the fact that you weren't taught that? It's not a "model of truthfulness", you were taught facts about geography and you learned them.


I mean, sure. OP implied that "capital of Connecticut is Moscow" is the sort of thing that a human "model of truthfulness" would encode. I'm pointing out that the human model of that particular fact isn't inherently any more truthy than the LLM model.

I am saying that humans can have a "truer" way of knowing some facts through direct experience. However there are a lot of facts where we don't have that kind of truth, and aren't really on any better ground than an LLM.


How exactly can there be "truthfulness" in humans, say? After all, if a human was taught in school all his life that the capital of Connecticut is Moscow...


Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

Inferring truth about a social event in a social situation, for example, requires a nuanced set of thought processes and attention mechanisms.

If we had a swarm of LLMs collecting a variety of data from a variety of disparate sources, where the swarm communicates for consensus, it would be very hard to convince them that Moscow is in Connecticut.

Unfortunately we are still stuck in monolithic training run land.


> Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

> The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

Great! LLMs are fed from the same swarm.


I was responding to the back and forth of:

> If you pretrained an LLM with data saying Moscow is the capital of Connecticut it would think that is true.

> Well so would a human!

But humans aren't static weights, we update continuously, and we arrive at consensus via communication as we all experience different perspectives. You can fool an entire group through propaganda, but there are boundless historical examples of information making its way in through human communication to overcome said propaganda.


The main reason for keeping AI static is to allow them to be certified or rolled back (and possibly that the companies can make more money selling fine tuning) — it's not an innate truth of the design or the maths.


While those are good reasons to keep the weights static from a business perspective, they are not the only reasons, especially when serving SOTA models at the scale of some of the major shops today.

Continual/online learning is still an area of active research.


We kinda do have LLMs in a swarm configuration though. Currently LLMs' training data, which includes all of the non-RAG facts they know, comes from the swarm that is humans. As LLM outputs seep into the internet, older generations effectively start communicating with newer generations.

This last bit is not a great thing though, as LLMs don't have the direct experience needed to correct factual errors about the external world. Unfortunately we care about the external world, and want them to make accurate statements about it.

It would be possible for LLMs to see inconsistencies across or within sources, and try to resolve those. If perfect, then this would result in a self-consistent description of some world, it just wouldn't necessarily be ours.


I get where you are coming from, and it is definitely an interesting thought!

I do think it is an extremely inefficient way to have a swarm (e.g. across time through training data) and it would make more sense to solve the pretraining problem (to connect them to the external world as you pointed out) and actually have multiple LLMs in a swarm at the same time.


Even monolithic training runs take sources more disparate than any human has the capacity to consume.

Also, given the lack of imagination everyone has with naming places, I had to check:

https://en.wikipedia.org/wiki/Moscow_(disambiguation)


I was responding to the idea that an LLM would believe (regurgitate) untrue things if you pretrained them on untrue things. I wasn't making a claim about SOTA models with gigantic training corpora.


Ask the LLM what it thinks of Tiananmen and we will understand what truth really means.


I agree that humans and AI are in the same boat here.

It's valid to take either position, that both can be aware of truth or that neither can be, and there has been a lot of philosophical debate about this specific topic with humans since well before even mechanical computers were invented.

Plato's cave comes to mind.


There isn't necessarily in humans either, but why build machines that just perpetuate human flaws: Would we want calculators that miscalculate a lot or cars that cannot be faster than humans?


What exactly do you imagine is the alternative? To build generally intelligent machines without flaws? Where does that exist? In... ah, that's right. It doesn't, except in our fiction and in our imaginations.

And it's not for a lack of trying. Logic cannot even handle Narrow Intelligence that deals with parsing the real world (Speech/Image Recognition, Classification, Detection etc). But those are flawed and mis-predict so why build them? Because they are immensely useful, flaws or no.


Why should there not be, for example, reasoning machines - do we know there is no universal method for reasoning?

Having deeply flawed machines in the sense that they perform their tasks regularly poorly seems like an odd choice to pursue.


What is a reasoning machine though? And why is there an assumption that one can exist without flaws? It's not like any of the natural examples exist this way. How would you even navigate the real world without the flexibility to make mistakes? I'm not saying people shouldn't try but you need to be practical. I'll take the General Intelligence with flaws over the fictional one without any day.

> Having deeply flawed machines in the sense that they perform their tasks regularly poorly seems like an odd choice to pursue.

State of the art ANNs are generally mostly right though. Even LLMs are mostly right, that's why hallucinations are particularly annoying.


Not my usage experience with LLMs. But that aside, poorly performing general intelligence might just not be very valuable compared to highly performing narrow or even zero intelligence.


Well LLMs are very useful and valuable to me and many others today so it's not really a hypothetical future. I'm very glad they exist and there's no narrow intelligence available that is a sufficient substitute.


Not disputing that, but still think as far as reasoning or thinking machines are concerned it is a dead end.


I see. Well as far as I'm concerned, they already reason with the standards we apply to ourselves.

People do seem to have higher standards for machines but you can't eat your cake and have it. You can't call what you do reasoning and turn around and call the same thing something else because of preconceived notions of what "true" reasoning should be.


Suppose there was a system that only told the truth. Then that system would seemingly lie because, for any complicated enough system, there are true statements that cannot be justified.

That is to say, to our best knowledge humans have no purely logical way of knowing truth ourselves. Human truth seems intrinsically connected to humanity and lived experience, with logic being a minor offshoot.


You are not disproving the point.


If truthfulness doesn't exist at all, then it's meaningless to say that LLMs don't have any data regarding it.


Agree, humans can "arrive at a reasonable approximation of the truth" even without the direct knowledge of the capital of Connecticut. A human has some other interesting data points that allow them to probabilistically guess that the capital of Connecticut is not Moscow and those might be things like:

- Moscow is a Russian city, and there probably aren't a lot of cities in the US that have strong Russian influences, especially in the time when these cities might have been founded

- there's a concept of novelty in trivia, whereby the more unusual the factoid, the better the recall of that fact. If Moscow were indeed the capital of Connecticut, it seems like the kind of thing I might've heard about since it would stand out as being kind of bizarre.

Noticeably this type of inference seems to be relatively distinct from what LLMs are capable of modeling.


I was actually quite surprised at the ability of top-tier LLMs to make indirect inferences in my experiments.

One particular case was an attempt to plug GPT-4 as a decision maker for certain actions in a video game. One of those was voting for a declaration of war (all nobles of one faction vote on whether to declare war on another faction). This mostly boils down to assessing risk vs benefits, and for a specific clan in a faction, the risk is that if the war goes badly, they can have some of their fiefs burned down or taken over - but this depends on how close the town or village is to the border with the other faction. The LM was given a database schema to query using SQL, but it didn't include location information.

To my surprise, GPT-4 (correctly!) surmised in its chain-of-thought, without any prompting, that it can use the culture of towns and villages - which was in the schema - as a sensible proxy to query for fiefs that are likely to be close to the potential enemy, and thus likely to be lost if the war goes bad.
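For anyone curious what that kind of proxy query looks like, here is a minimal sketch. The schema, table name, clan names and data below are all invented for illustration (the real game database was different). The idea is just that with no location column, "fief whose culture matches the enemy's culture" stands in for "fief near the enemy border":

```python
import sqlite3

# Hypothetical schema: there is no location column, only ownership and culture.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fiefs (name TEXT, owner_clan TEXT, faction TEXT, culture TEXT);
    INSERT INTO fiefs VALUES
        ('Northwatch', 'Clan A', 'Empire', 'Empire'),
        ('Saltford',   'Clan A', 'Empire', 'Steppe'),
        ('Redmill',    'Clan A', 'Empire', 'Empire'),
        ('Dunhold',    'Clan B', 'Empire', 'Steppe');
    """
)


def fiefs_at_risk(conn, clan, enemy_culture):
    """Fiefs owned by `clan` whose culture matches the enemy's culture,
    used as a proxy for being close to the enemy border."""
    rows = conn.execute(
        "SELECT name FROM fiefs WHERE owner_clan = ? AND culture = ? ORDER BY name",
        (clan, enemy_culture),
    ).fetchall()
    return [name for (name,) in rows]


at_risk = fiefs_at_risk(conn, "Clan A", "Steppe")
print(at_risk)
```

The proxy can obviously be wrong (a fief can share a culture with the enemy and still be far from the border), which is why it only feeds a risk estimate rather than a hard rule.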


Another might be that usually state capitals are significant cities in their state -- not necessarily the biggest, but cities you have at least heard of. Given that I have never heard about Moscow in Connecticut, it seems unlikely (not impossible, but).


I don't understand, this sort of inference is not an issue for an LLM. Have you tried?


> There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations.

I think there are two issues here:

1. The "truthfulness" of the underlying data set, and 2. The faithfulness of the LLM to pass along that truthfulness. Lack of passing along the truthfulness is, I think, the definition of the hallucination.

To your point, if the data set is flawed or factually wrong, the model will always produce the wrong result. But I don't think that's a hallucination.


The most blatant whoppers that Google's AI preview makes seem to stem from mistaking satirical sites for sites that are attempting to state facts. Possibly an LLM could be trained to distinguish sites that intend to be satirical or propagandistic from news sites that intend to report accurately based on the structure of the language. After all, satirical sites are usually written in a way that most people grasp that it is satire, and good detectives can often spot "tells" that someone is lying. But the structure of the language is all that the LLM has. It has no oracle to tell it what is true and what is false. But at least this kind of approach might make LLM-enhanced search engines less embarrassing.


well said. agree 100%. papers like these - and i did skim through it, are thinking "within the box" as follows: we have a system, and it has a problem, how do we fix the problem "within" the context of the system.

As you have put it well, there is no notion of truthfulness encoded in the system as it is built. hence there is no way to fix the problem.

An analogy here is around the development of human languages as a means of communication and as a means of encoding concepts. The only languages that humans have developed that encode truthfulness in a verifiable manner are mathematical in nature. what is needed may be along the lines of encoding concepts with a theorem prover built-in - so what comes out is always valid - but then that will sound like a robot lol, and only a limited subset of human experience can be encoded in this manner.


> To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous.

A more interesting pursuit might be to determine if humans are "hallucinating" in this same way, if only occasionally. Have you ever known one of those pathological liars who lie constantly and about trivial or inconsequential details? Maybe the words they speak are coming straight out of some organic LLM-like faculty. We're all surrounded by p-zombies. All eight of us.


Remember, it's not lying if you believe it ;)

Training data is the source of ground truth, if you mess that up that's kind of a you problem, not the model's fault.


> If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

If it was, maybe. But it wasn't.

Training data isn't random - it's real human writing. It's highly correlated with truth and correctness, because humans don't write for the sake of writing, but for practical reasons.


In reality, it's the "correct" responses that are the hallucinations, not the incorrect ones. Since the vast majority of the possible outputs of an LLM are "not true", when we see one that aligns with reality we hallucinate the LLM "getting it right".


I would agree with you. In general, humans have still not resolved any certain theory of knowledge for ourselves! How can we expect a machine to do that then?

In reality humans are wrong basically most of the time. Especially when you go off a human's immediate reaction to a problem, which is what we force LLMs to do (unless you're using chain of thought or pause tokens).

That being said there still is a notion of truthfulness because LLMs can also be made to deceive in which case they 'know' to act deceptively.


I've never been to Moscow personally. Am I then not being truthful when I tell you that Moscow is in Russia?


There’s a decently well known one in Idaho


What you're saying at the start is equivalent to saying that a truth table is impossible.


When I talk to philosophers on zoom my screen background is an exact replica of my actual background just so I can trick them into having a justified true belief that is not actually knowledge.

t. @abouelleill


Are LLMs Gettier machines? I'm confident saying yes and that hallucinations are a consequence of this.

EDIT: I've had some time to think and if you read somewhere that Hartford is the capital of Connecticut, you're right in a Gettier way too. Reading some words that happen to be true is exactly like using a picture of your room as your zoom background. It is a facsimile of the knowledge encoded as words.


Living organisms were optimized on the objective of self-propagation and we ended up with a notion of truthfulness. Why is the self-propagation objective key for truthfulness?


I'm absolutely sure that LLMs have an internal representation of "truthfulness" because "truthfulness" is a token.


Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that in fact, there _is_ a taxonomy of 'hallucinations', that they might have different causes and representations, and that there are technical mitigations which have varying levels of effectiveness.



