
Do you know why they decided to host a zip file instead of just hosting the CSV and relying on HTTP compression?


HTTP compression is optional, so they either have to compress on the fly (wasting cpu) or provide multiple versions (complicating setup and deployment) or make some HTTP clients not work.

A single zip file is really the easiest solution for cases where the file absolutely must be compressed.


Many webservers allow you to serve a compressed file (stored on disk) and _decompress_ when a client specifically can't support the compressed encoding. Since most clients should support compression, this means you only use the CPU for the rarer case where the uncompressed data is required.

Eg, http://nginx.org/en/docs/http/ngx_http_gunzip_module.html
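To make that concrete, here is a minimal Python sketch of the same idea, not how nginx implements it: the gzip bytes are stored once, and the server only inflates them when the request's Accept-Encoding header does not allow gzip. The names `accepts_gzip` and `build_response` are hypothetical, and the header parsing is deliberately simplified (it honours `q=0` but ignores `*` and `identity`).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value allows gzip.

    Simplified parsing (an assumption of this sketch): an explicit q=0
    is honoured, but '*' and 'identity' are ignored.
    """
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        # gzip listed without a q value: accepted.
        return True
    return False


def build_response(stored_gz: bytes, accept_encoding: str):
    """Return (headers, body): stored gzip bytes as-is, or inflated on demand."""
    if accepts_gzip(accept_encoding):
        # Common case: no CPU spent, the stored bytes go out unchanged.
        return {"Content-Encoding": "gzip"}, stored_gz
    # Rare case: the client cannot take gzip, so decompress for it.
    return {}, gzip.decompress(stored_gz)
```

The CPU cost lands only on the second branch, which is the rare one if most clients advertise gzip.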


Or you can just zip it explicitly and remove complexity and future issues if they have to move to another web server platform.


But what if an analyst needs to access this data and run their regressions on a potato? Surely that use case is worth adding a few libraries to handle.


I would think running a regression should be more demanding on the potato than decompressing a ZIP file.


hmm good point, can you draft up an architecture plan using multiple microservices and redundancy via a kubernetes cluster and have it on my desk by Monday please.


What machine could anyone be running an analysis on where unzip is a limiting factor?


you heard them, a literal potato from the ground.

even a big old russet only has, what, like 32 bites?


I mean there are some potatoes that take more than four bytes to eat, but they are rare.

For some, if you slice them, you might even end up with only 16 bits.


Whoosh


Nginx or Apache would both cache these versions transparently. I think they just wanted to distribute as a zip


The business logic here is >15 years old. Http compression was only in early stages then, and you can guarantee that many client side scripts and libraries would not have supported it. Zip was well known. Compress and place.

If I'm not missing the point here, this was, and still is, about offering the simplest, most reliable solution over a long period. This is a near perfect example of how to do exactly that. No changing formats, no moving requirements, no big swings in frameworks, APIs, or even standards. And most importantly, no breaking your customers' business workflows.


No edge case dependencies on the WWW server's configuration, and no sudden "why did we just saturate our external connection?"

No emergency change requests from the outage team that have to be assessed for impact by other areas and fit into the infrastructure team's maintenance windows and their capacity to address them.

No rebalancing of workloads because Jane had to implement (or schedule the task and monitor it) that change, Joe had to check and verify that the external availability tests passed, and Annick had to sign off on the change complete, and now everyone isn't available for another OT window for the week.

Or something.


> "The business logic here is >15 years old. Http compression was only in early stages then"

At least one of us is confused about history here; are you really saying that circa 2008, HTTP compression would have been considered immature or unstable?


Maybe in part because it encourages you to work with a local copy, rather than just hitting the hosted .csv repeatedly?

Just guessing. I have no idea :-)


It is exactly what most API optimization does. For example, batch queries merge several API calls into one to reduce API call counts. And in this case, it is perfectly optimized. You have literally one zip file, and there is no more.


I could see 2 reasons:

1 - save on cpu usage, compress once, serve many

2 - with zip you can have some rudimentary data integrity checks (unzip -t)
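For the second point, a rough Python equivalent of `unzip -t` is sketched below using the standard `zipfile` module, whose `testzip()` reads every member and checks its CRC-32. The helper name `zip_is_intact` is hypothetical, and this only detects corruption; it says nothing about whether the data is authentic.

```python
import zipfile


def zip_is_intact(path: str) -> bool:
    """Rough equivalent of `unzip -t`: read every member and verify its CRC."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # Not a zip archive at all, or its directory is damaged.
        return False
```

A download script could call this before replacing its local copy, so a truncated transfer does not overwrite good data.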


Another option would be to save it as a gzip file and serve it raw with implied gzip compression.

Nginx does this: https://docs.nginx.com/nginx/admin-guide/web-server/compress...

Last entry: the directive is `gzip_static on;`
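With `gzip_static on;`, nginx looks for a precompressed file with the same name plus a `.gz` suffix, so the compression step moves to deploy time. A minimal sketch of that step in Python follows; the helper name `precompress` and the choice of compression level 9 are assumptions for illustration.

```python
import gzip
import shutil
from pathlib import Path


def precompress(path: str) -> Path:
    """Write `<path>.gz` next to the original file and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Compress once, at the highest level, since this runs offline.
    with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

Run once per published file, this keeps the "compress once, serve many" property while still letting clients request the plain URL.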


By distributing the data in a compressed file, you retain the benefits of the smaller file size at all stages (transfer from source, local storage, local transmission) until the data needs to be used in uncompressed format, all without any additional processing.
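As an illustration of keeping the data compressed until the moment of use, here is a small Python sketch that streams CSV rows straight out of the archive without extracting it to disk. The helper name `iter_csv_rows` is hypothetical, and it assumes the CSV is UTF-8 and, unless told otherwise, that it is the first member of the zip.

```python
import csv
import io
import zipfile
from typing import Iterator, List, Optional


def iter_csv_rows(zip_path: str, member: Optional[str] = None) -> Iterator[List[str]]:
    """Yield CSV rows from a zip member, decompressing as a stream."""
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: default to the first file in the archive.
        name = member or zf.namelist()[0]
        with zf.open(name) as raw:
            # Assumption: the CSV is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.reader(text)
```

Only the zip file ever sits on disk; rows are decompressed as the caller iterates.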


The CSV was available at https://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.csv but isn't up to date.



