
If you divide the number of sentences a model was trained on by the total number of sentences in its training corpus, the result for most of the top LLMs will be far closer to ~1 than to any other integer.
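
To make that concrete, here's a back-of-envelope sketch in Python. The token counts are placeholders I made up for illustration, not figures from any paper; the point is just that "training tokens / corpus tokens" is the effective epoch count, and for most large models it lands near 1:

    # Hypothetical numbers, for illustration only:
    # effective epochs = tokens seen during training / unique tokens in the corpus
    tokens_trained_on = 1.4e12      # assumed total tokens consumed during training
    corpus_unique_tokens = 1.3e12   # assumed size of the deduplicated corpus
    effective_epochs = tokens_trained_on / corpus_unique_tokens
    print(f"effective epochs: {effective_epochs:.2f}")  # ~1.08, i.e. close to 1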

> Also Meta team with llama show that simply training more, more tokens, continues to reduce loss.

Can you source the specific claim you're referring to? To me, "more tokens" generally means new tokens unless you specify otherwise.

From the paper: "We train for one epoch over the training data. In earlier experiments, we found that training longer can lead to over-fitting."



Yes. Surely "more tokens" doesn't mean "more epochs".



