LLMs are next-token predictors. Outputting tokens is what they do, and their natural steady state is an endless stream of generated tokens.
To get them to stop the way a human would, you need a special "stop token", either trained in explicitly during post-training or coaxed out with system prompt hacks.
This isn't a general solution to the problem, and there likely never will be one.
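
To make that concrete, here is a toy sketch of a generation loop. It is not how any real model or library is implemented: the "model" is just a lookup table from the previous token to the next one, and the names `EOS`, `LOOPING`, `WITH_STOP`, and `generate` are all made up for illustration. It only shows that the loop has no reason to end unless a stop token is emitted or an external token budget cuts it off.

```python
EOS = "<eos>"  # hypothetical stop-token name, chosen for this example

# Toy "models": each maps the previous token to the next token.
LOOPING = {"hello": "world", "world": "hello"}  # never emits EOS
WITH_STOP = {"hello": "world", "world": EOS}    # has learned to emit EOS


def generate(table, start, max_tokens=10):
    """Emit tokens until the stop token appears or the budget runs out."""
    out = [start]
    while len(out) < max_tokens:
        nxt = table[out[-1]]   # "predict" the next token
        if nxt == EOS:         # the only way the model itself ends the loop
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    # Without a stop token, only the external budget halts generation.
    print(generate(LOOPING, "hello", max_tokens=6))
    # With a stop token, generation ends on its own.
    print(generate(WITH_STOP, "hello"))
```

Note that `max_tokens` is a cap imposed from outside the model, which is why the first call halts at all; the second call is the only one where the "model" decides to stop.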