What the author is missing: tail recursion in constant space is an implicit requirement for practical use of CPS transformations. In fact this is probably the single biggest reason for wanting efficient tail calls in a language.
When the continuation gets invoked by the event-handling loop, the stack is reset.
You're not compiling to C here: "the stack" may be a chain of heap-allocated activation records that are still live as far as GC is concerned. What you're describing might work but you'd want to test it to be sure.
EDIT: On reflection I agree that the setTimeout() approach should free the stack, however it's implemented.
You can always use setTimeout to call your tail, which has the neat effect that your CPS "threads" become cooperatively multitasked and can progress alongside each other. Just replace "foo()" tail calls with "setTimeout(foo, 0)".
And the downside that you're nuking your stack every time, making debugging quite a bit harder.
ECMA doesn't require efficient tail recursion. Caveat programmer.