It's bad at long running tasks.

bluegatty · 2026-04-15T05:19:37 1776230377

Yes and no. It's bad because of its shorter context, but it does have auto-compaction, which was much better than Claude's. If you give it documentation to work from and re-reference, it can handle long-running tasks.

Honestly - 'every inch of IQ delta' seems to be worth it over anything else.

I'm a longtime Claude Code supporter - and I'm ashamed to admit how quickly I dropped it once I discovered how much better 5.4 is.

I don't trust Claude anymore for anything that requires heavy thinking - Codex always finds flaws in the logic.

But this happens every few months.

winrid · 2026-04-15T18:00:33 1776276033

I tried to use 5.4 for something pretty straightforward - creating scripts to automate navigating a game UI and capturing the network traffic. 5.4 was super frustrating, constantly stopping and waiting for feedback, etc., even after I told it to never wait and just iterate/debug. I quit and switched to Opus 4.6, and it did much more of the work by itself.

bluegatty · 2026-04-16T04:22:41 1776313361

I've never run into that problem, but those were coding solutions in Codex with a strong plan and clear steps to work towards.

It could be that if you're using massive amounts of tokens on a 'plan', then they want to limit you in some way, or that if the objective is not perfectly clear, they don't want semi-random token use.

See if the token/sub solution behaves differently. Make sure that when it 'compacts', it re-reads your instructions clearly.

winrid · 2026-04-17T03:10:31 1776395431