Stupid question for people who know more about these things. Why can’t we fight the warmup time by running an already warmed up snapshot of the program? Or say dumping some data structure when it hits steady state to give hints to the JIT the next time it runs?
Android does this with ART. They use JIT and profiling to generate AOT binaries dynamically. I believe this also lets Android update the runtime in a way that invalidates the AOT binaries. They are simply regenerated as needed.
If you look for "AOT" or "Ahead-of-time" you'll find examples in both .NET and Java, but as far as I know they're either largely experimental or limited to newer code (not backwards-compatible with all code). But I haven't looked too deeply into it. Dumping data structures to give hints for next time reminds me of something I read recently on the topic, but drat, I can't remember it right now.
I would say those are reasonably easy to solve: you just need to offer platform-specific trained binaries and add a realistic set of stress tests with which to train the VM.
It does seem like you could do JIT with a persistent cache which stores the JIT output along with a key that's a hash of all the relevant system parameters like CPU model and VM parameters like heap size. This would mean that the typical case of re-running a program in the same environment would be pre-warmed.
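To make the idea concrete, here is a rough sketch of such a cache. Everything in it is made up for illustration: `cache_key`, `CodeCache`, and the `cpu_model` / `heap_mb` parameters are hypothetical names, and the "compiled" blob is just opaque bytes rather than real JIT output.

```python
import hashlib
import json
import os
import tempfile


def cache_key(source: bytes, env: dict) -> str:
    """Hash the program text together with the environment parameters."""
    h = hashlib.sha256()
    h.update(source)
    # sort_keys makes the key independent of dict insertion order
    h.update(json.dumps(env, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


class CodeCache:
    """Hypothetical on-disk cache: one file per key, contents are opaque bytes."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.directory, key + ".bin")

    def load(self, key: str):
        """Return the cached blob, or None on a cache miss."""
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def store(self, key: str, compiled: bytes) -> None:
        # Write to a temp file and rename, so a crash cannot leave a
        # half-written entry under the final name.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as f:
            f.write(compiled)
        os.replace(tmp, self._path(key))
```

Any change to the source or to a listed parameter produces a different key, so a stale entry is never returned; it just becomes unreachable. The hard part is deciding what belongs in `env`, which this sketch does not address.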
It’s much harder than that because the JIT is speculating on what lots of objects in the heap are doing, including watchpointing them to constant fold properties. It’s not clear what the key should be in that case.
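A toy model of what that speculation looks like, assuming nothing about any real engine: `WatchedObject` and `SpecializedFunction` are invented names, and the "compiled code" is just a Python closure that bakes in a property value read from the heap at compile time.

```python
class WatchedObject:
    """Toy heap object whose property writes fire registered watchpoints."""

    def __init__(self, **props):
        self._props = dict(props)
        self._watchers = {}  # property name -> list of callbacks

    def get(self, name):
        return self._props[name]

    def set(self, name, value):
        self._props[name] = value
        # Fire each watchpoint once, then drop it.
        for callback in self._watchers.pop(name, []):
            callback()

    def watch(self, name, callback):
        self._watchers.setdefault(name, []).append(callback)


class SpecializedFunction:
    """Computes config.scale * r * r with config.scale folded to a constant."""

    def __init__(self, config):
        self.config = config
        self.fast = None      # the "compiled" version, if currently valid
        self.compiles = 0

    def _compile(self):
        scale = self.config.get("scale")      # read once, at compile time
        self.fast = lambda r: scale * r * r   # constant baked into the code
        self.compiles += 1
        # The folded constant is only valid until someone writes the property.
        self.config.watch("scale", self._invalidate)

    def _invalidate(self):
        self.fast = None

    def __call__(self, r):
        if self.fast is None:
            self._compile()
        return self.fast(r)
```

The compiled closure is only correct for one particular state of one particular object, so a cache key would somehow have to capture that heap state, not just the source and the machine.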
Still not impossible, but I want to be clear on what exactly makes this hard. CPU model, for example, is not what makes it hard.
The IBM OpenJ9 JVM does that with its AOT feature. It's good for startup time, and the AOT code can still be further optimized by the JIT. https://www.eclipse.org/openj9/docs/aot/
I would bet that a VM isn’t going to be deterministic enough to produce the same heap twice, even without ASLR. Just building a deterministic JS engine seems like a super hard problem.
We can and there are many implementations of it, but the relatively rare application of this approach suggests it's not a big enough win to justify the cost.