Hacker News

What he didn't show in the main article, only in subsequent code, is that std::unordered_map is far too slow to be useful here. With a properly fast hash map (which he calls custom_map), the hash-based approach outperforms sort-unique until the data set exceeds the L3 cache size.


Right, well, it's clear that, at least in Java, he doesn't pre-allocate the size of the HashMap. Thus, when the HashMap reaches its load-factor threshold, the resize operation has to copy all of the data into a new table, which would make this no longer O(N) but something like O(N log N) or O(N^2), depending.


If you do O(N) work every time the hashmap is "full" and double the size of the hashmap, it's still O(N).

Hand-wavy proof:

If your last insert caused a rehash, then you have done O(N) work for that rehash, plus O(N/2) for the one before it, plus O(N/4)..., a geometric series whose sum is bounded by O(2N), and thus only a constant factor more.


No, it's still O(N) amortized. Imagine the hashmap doubles in capacity whenever it is full or hits its max load factor. Then you end up doing O(N) copy operations in total, and O(2N) = O(N).


While your objection stands, repeated rehashings of a hash table can have a large impact on the actual wall-clock time, and it's a flawed benchmark that doesn't include the pre-allocation optimization.



