Do you know why they decided to host a zip file instead of just hosting the CSV ...

theamk · on Sept 15, 2023

HTTP compression is optional, so they either have to compress on the fly (wasting cpu) or provide multiple versions (complicating setup and deployment) or make some HTTP clients not work.

A single zip file is really the easiest solution for cases when the file must absolutely be compressed.

sethhochberg · on Sept 15, 2023

Many webservers allow you to serve a compressed file (stored on disk) and _decompress_ when a client specifically can't support the compressed encoding. Since most clients should support compression, this means you only use the CPU for the rarer case where the uncompressed data is required.

Eg, http://nginx.org/en/docs/http/ngx_http_gunzip_module.html
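To make the idea concrete, here is a minimal Python sketch of the same decision, not how nginx actually implements it. The function name `select_body`, the naive `Accept-Encoding` parsing (q-values are ignored), and the sample payload are all assumptions made up for illustration.

```python
import gzip


def select_body(stored_gzip: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Return (headers, body) for a file that is stored gzip-compressed on disk."""
    # Naive parse (assumption): ignores q-values such as "gzip;q=0".
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered or "*" in offered:
        # Common case: pass the stored bytes through untouched, no CPU spent.
        return {"Content-Encoding": "gzip"}, stored_gzip
    # Rare case: the client did not offer gzip, so decompress on its behalf.
    return {}, gzip.decompress(stored_gzip)


# Made-up sample payload standing in for the real CSV.
stored = gzip.compress(b"date,rate\n2000-01-01,1.0\n")
```

Only the second branch costs CPU, which is the point of the comment above: the expensive path is reserved for clients that cannot take the compressed bytes.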

nikau · on Sept 16, 2023

Or you can just zip it explicitly and remove complexity and future issues if they have to move to another web server platform.

MengerSponge · on Sept 16, 2023

But what if an analyst needs to access this data and run their regressions on a potato? Surely that use case is worth adding a few libraries to handle.

innocenat · on Sept 16, 2023

I would think running a regression should be more demanding on the potato than decompressing a ZIP file.

nikau · on Sept 16, 2023

hmm good point, can you draft up an architecture plan using multiple microservices and redundancy via a kubernetes cluster and have it on my desk by Monday please.

CJefferson · on Sept 16, 2023

What machine could anyone be running an analysis on where unzip is a limiting factor?

adbachman · on Sept 16, 2023

you heard them, a literal potato from the ground.

even a big old russet only has, what, like 32 bites?

pantalaimon · on Sept 16, 2023

I mean there are some potatoes that take more than four bytes to eat, but they are rare.

For some, if you slice them, you might even end up with only 16 bits.

londonReed · on Sept 16, 2023

Whoosh

avandelay1 · on Sept 16, 2023

Nginx or Apache would both cache these versions transparently. I think they just wanted to distribute it as a zip.

anakaine · on Sept 16, 2023

The business logic here is >15 years old. HTTP compression was only in early stages then, and you can guarantee that many client-side scripts and libraries would not have supported it. Zip was well known. Compress and place.

If I'm not missing the point here, this was, and still is, about offering the simplest, most reliable solution over a long period. This is a near-perfect example of how to do exactly that. No changing formats, no moving requirements, no big swings in frameworks, APIs, or even standards. And most importantly, no breaking your customers' business workflows.

skylanh · on Sept 16, 2023

No edge case dependencies on the WWW server's configuration, and no sudden "why did we just saturate our external connection?"

No emergency change requests from the outage team that have to be assessed for impact by other areas and fitted into the infrastructure team's maintenance windows and their capacity to address them.

No rebalancing of workloads because Jane had to implement (or schedule the task and monitor it) that change, Joe had to check and verify that the external availability tests passed, and Annick had to sign off on the change complete, and now everyone isn't available for another OT window for the week.

Or something.

chrisweekly · on Sept 16, 2023

> "The business logic here is >15 years old. Http compression was only in early stages then"

At least one of us is confused about history here; are you really saying that circa 2008, HTTP compression would have been considered immature or unstable?

firecall · on Sept 16, 2023

Maybe in part because it encourages you to work with a local copy, rather than just hitting the hosted .csv repeatedly?

Just guessing. I have no idea :-)
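A rough sketch of what "work with a local copy" could look like in Python, assuming you download once and reuse the file afterwards. The names `local_copy` and `fake_fetch`, the file name `rates.zip`, and the placeholder bytes are hypothetical; a real `fetch` would do the actual HTTP download.

```python
import tempfile
from pathlib import Path
from typing import Callable


def local_copy(cache_path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached file, calling fetch() only if no local copy exists."""
    if not cache_path.exists():
        cache_path.write_bytes(fetch())
    return cache_path.read_bytes()


calls = []


def fake_fetch() -> bytes:
    # Stand-in for a real download (assumption); records each call.
    calls.append(1)
    return b"placeholder archive bytes"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "rates.zip"
    first = local_copy(target, fake_fetch)   # triggers the fetch
    second = local_copy(target, fake_fetch)  # served from disk
```

This sketch never refreshes the cached file; a real script would need some expiry rule, which is left out here.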

mmis1000 · on Sept 16, 2023

It is exactly what most API optimization does. For example, batch queries merge several API calls into one to reduce the API call count. And in this case, it is perfectly optimized. You have literally one zip file, and there is no more.

Nux · on Sept 15, 2023

I could see 2 reasons:

1 - save on cpu usage, compress once, serve many

2 - with zip you can have some rudimentary data integrity checks (unzip -t)
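For point 2, a script can run roughly the same check as `unzip -t` with Python's standard `zipfile` module. This is a minimal sketch: the helper name `zip_is_intact` and the in-memory sample archive are made up for illustration.

```python
import io
import zipfile


def zip_is_intact(data: bytes) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # Raised, for example, when the download was cut short.
        return False


# Build a small made-up archive in memory to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.csv", "date,rate\n2000-01-01,1.0\n")
good = buffer.getvalue()
truncated = good[: len(good) // 2]
```

Here `zip_is_intact(good)` passes and `zip_is_intact(truncated)` fails, which is the "rudimentary" integrity check a plain CSV download does not give you.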

pstuart · on Sept 15, 2023

Another option would be to save it as a gzip file and serve it raw with implied gzip compression.

Nginx does this: https://docs.nginx.com/nginx/admin-guide/web-server/compress...

Last entry: the directive is `gzip_static on;`

zaxomi · on Sept 16, 2023

By distributing the data in a compressed file, you retain the benefits of the smaller file size at all stages (transfer from source, local storage, local transmission) until the data needs to be used in uncompressed format, all without any additional processing.
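As an illustration of keeping the file compressed until the moment of use, here is a small Python sketch that streams CSV rows straight out of a zip archive without extracting it to disk. The function name `iter_csv_rows`, the member name `sample.csv`, the UTF-8 encoding, and the sample data are assumptions, not details of the real file.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_bytes: bytes, member: str):
    """Yield CSV rows from one member of a zip archive, decompressing as it reads."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so csv.reader receives text (UTF-8 assumed).
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.reader(text)


# Made-up sample archive standing in for the downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.csv", "date,rate\n2000-01-01,1.0\n")
sample_zip = buffer.getvalue()

rows = list(iter_csv_rows(sample_zip, "sample.csv"))
```

The archive stays compressed in storage and in transit, and the uncompressed CSV exists only as the rows being read.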

rsaxvc · on Sept 15, 2023

The CSV was available at https://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.csv but isn't up to date.