It would be better if VM comparisons included JSC rather than, or in addition to...

Jasper_ · on April 13, 2020

> - for some program, how long does it take to run that program. Start to finish. No ignoring warmup.

This methodology likely comes from Java, which has long-running server applications. "How long does it run" is often "until someone hits ^C". Here, startup can be slow as long as the peak performance is fine. It's accepted that the first minute or two of the server's life are slow, but that's small compared to the month or so that the server will be running for.

> This tells you how good a VM is.

I think papers like this approach it from the wrong angle. I don't care about the VM's theoretical peak performance. I care about being able to measure and track performance in a reliable way. Put simply, I'm fine with bad codegen as long as I can consistently measure it. Feel free to improve it, but a VM that only sometimes gives me good codegen, unreliably, is much more frustrating than one with consistently bad codegen. But this seems to be the way the VMs are going, with things like probabilistic profiling.

If I refactor my code and replace for(let i = 0; i < L.length; i++) with for(const i of L), what's the cost? Will performance go up or down? We don't have tools or metrics to handle that right now. How can I ensure my codegen is good and won't regress?
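To make the question concrete, here is a minimal sketch of the kind of check I mean, assuming a Node.js version that exposes `performance` as a global. The names `sumIndexed`, `sumForOf`, and `measure` are made up for illustration, and the array size and run count are arbitrary. It times both loop forms and keeps every sample, including the first cold run, so you can see the spread rather than a single peak number.

```javascript
// Hypothetical micro-benchmark; all function names here are illustrative.

// Indexed loop: the "before" version of the refactor.
function sumIndexed(L) {
  let total = 0;
  for (let i = 0; i < L.length; i++) total += L[i];
  return total;
}

// for-of loop: the "after" version of the refactor.
function sumForOf(L) {
  let total = 0;
  for (const x of L) total += x;
  return total;
}

// Call fn(input) `runs` times and record how long each call took.
// Nothing is discarded, so the first (cold) sample is part of the result.
function measure(fn, input, runs) {
  const samples = [];
  let checksum = 0; // keep results live so the calls are not trivially dead
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    checksum += fn(input);
    samples.push(performance.now() - start);
  }
  const sorted = [...samples].sort((a, b) => a - b);
  return {
    first: samples[0],
    min: sorted[0],
    median: sorted[Math.floor(sorted.length / 2)],
    max: sorted[sorted.length - 1],
    checksum,
  };
}

const L = Array.from({ length: 100000 }, (_, i) => i);
console.log('indexed', measure(sumIndexed, L, 50));
console.log('for-of ', measure(sumForOf, L, 50));
```

The numbers will differ between runs, machines, and VM releases, which is the problem: a sketch like this tells you what happened this time, not whether the codegen will stay that way.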

I work on a particularly demanding website in my free time ( https://noclip.website/#smg/AstroGalaxy , unfortunately won't run in WebKit due to missing WebGL 2 ), and performance varies drastically from Chrome release to release, and I do extensive testing with node.js to make sure that I'm getting good codegen.

pizlonator · on April 13, 2020

I know that the warmup skipping comes from Java. It was a mistake there. Saying that it’s because Java is for servers is a lame excuse and may be getting it backwards - maybe Java only succeeded on servers because all the tuning ignored warmup.

I hear ya that having tools would be great - but the best speedups do come about from probabilistic methods so it would be weird to rely on whatever a profiler told you.

offmycloud · on April 13, 2020

For those who aren't familiar, JSC is JavaScriptCore, the built-in JavaScript VM in WebKit, Apple's browser engine.

lioeters · on April 13, 2020

Thank you, I was able to find more information about JSC on WebKit's site: https://trac.webkit.org/wiki/JSC

Also a standalone build (kinda old) for various platforms: https://github.com/Lichtso/JSC-Standalone

titzer · on April 13, 2020

Not measuring VM startup time has a long tradition in papers. It was the Original Sin (TM).

evmar · on April 13, 2020

This blog post, and your reply, both touch on how difficult it is to measure performance. But then, in the same reply where you point out that there are many different valid ways you could measure performance, you also make the broad claim that "JSC tends to outperform V8". What are you basing that on?

pizlonator · on April 13, 2020

These days we use JetStream 2 (our design) and Speedometer 2 (collaborative design between WK and Chromium folks) as the main big benchmarks, but they're not the only things we measure and tune.

V8 used to have their own JS benchmark, Octane, but they retired it at about the same time as we beat them on it. So JSC is fast enough to make other people retire their benchmarks.

And by the way if you are interested in what we think of as good methodology you should read about JetStream 2: https://webkit.org/blog/8685/introducing-the-jetstream-2-ben...