> Compilers "failed" because VLIW is fundamentally not a particularly useful idea.
I think the problem isn’t just that it’s not particularly useful; it’s an actively bad idea. VLIW statically encodes the ILP that today’s chip designs extract dynamically, which means that ILP can no longer react to changing conditions. Those include any core architecture change as the underlying hardware evolves, differences on the same die (say, code finding itself running on an efficiency core rather than a performance core, or vice versa), and running in an SMT environment where it is not the only instruction stream going through the pipeline.
Adapting to advancing hardware isn't impossible. It's been a while since I've looked, so the details are a little hazy, but I know the Mill architecture had an answer to this. I believe they were effectively using two-pass compilation. The first pass compiled against an abstract version of the architecture, and that output is what was distributed. It could then be specialized further on the user's machine, since at that point the limits of the hardware would be known.
I don't recall if they had an idea for SMT as well or not. It's possible they wrote it off entirely, particularly after Spectre and all that.
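To make the two-pass idea concrete, here is a rough sketch of what the second, on-machine pass might look like. To be clear, this is not the Mill toolchain: `Op`, `specialize`, and `ABSTRACT` are names I made up for illustration, and the packing is a plain greedy scheduler that assumes every op takes one cycle. The point is only that one distributed, width-agnostic op list can be re-packed into bundles once the machine's issue width is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    deps: tuple = ()  # names of ops whose results this op consumes


def specialize(ops, width):
    """Pack ops (listed in a valid dependency order) into bundles of at most `width` ops."""
    bundle_of = {}  # op name -> index of the bundle it landed in
    bundles = []
    for op in ops:
        # Earliest legal bundle is one past the latest bundle holding a dependency.
        i = max((bundle_of[d] + 1 for d in op.deps), default=0)
        # Skip bundles that are already full for this machine's width.
        while i < len(bundles) and len(bundles[i]) >= width:
            i += 1
        if i == len(bundles):
            bundles.append([])
        bundles[i].append(op.name)
        bundle_of[op.name] = i
    return bundles


# Hypothetical "distributed" form: four independent loads feeding an add tree.
ABSTRACT = [
    Op("a"), Op("b"), Op("c"), Op("d"),
    Op("e", ("a", "b")), Op("f", ("c", "d")),
    Op("g", ("e", "f")),
]

if __name__ == "__main__":
    for width in (4, 2, 1):
        print(width, specialize(ABSTRACT, width))
```

The same input yields three bundles on a 4-wide machine, four on a 2-wide one, and seven on a 1-wide one. Note that the schedule is still frozen at specialization time, so it adapts across machines but not to anything that changes while the code is running.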
You are correct that with a lot of effort you can get a compiler to emit VLIW instructions that claw back some of what is lost, using some, but not all, of the information the processor core has available at runtime, which can make things slightly smarter. And with all of that effort, which involves changing software distribution models, jettisoning energy-efficient, flexible heterogeneous architectures, and abandoning SMT, you might be able to get something close to performing as well as… what we already have today without VLIW.
Not sure why anyone thinks that’s an argument for doing it.