For me, the appeal of local compute is first and foremost confidentiality, plus being able to run my 200K documents through an LLM just to see what happens without having to consider the cost.
This post conflates scalability and performance. PostgreSQL is fast on smallish systems, but try adding more CPU cores and you'll see that the performance gains are not linear at all. Modern servers can ship with 256 or more cores, and a single instance of PostgreSQL will struggle to take advantage of them.
What kind of cases were you measuring? I would think that, e.g., 256 separate long-lived connections in a setup like that would scale less than linearly, but not dramatically so?
We had a SCSI zip-drive at our uni and it was a brilliant way to drag megabytes of content home. Even though I had amazing internet (2Mbit shared by 100+ ppl), the zip drive would still be a good way of getting stuff home.
Then I got to experience the click of death and the internet connection was bumped to 100Mbit and I didn't need to replace my zip drive.
You have to deal with a lot more stuff. You have to order/pay for a server (capex), mount it somewhere, wire up lights-out-mgmt and recovery and do a few more tasks that the provider has already done.
Then, say the motherboard gives up: you have to do quite a bit of work to get it replaced, and you might be down for hours or maybe days.
For a single server I don't think it makes sense. For 8 servers, maybe. Depends on the opportunity cost.
Have you done this yourself? If you haven't I think you'd discover server hardware is actually shockingly reliable. You could go years without needing to physically touch anything on a single machine. I find that people who are used to cloud assume stuff is breaking all the time. That's true at scale, but when you have a handful of machines you can go a very long time between failures.
Yes, having done this for decades, it happens often enough that you need to plan for it. You need to have redundancy, spare parts, and staffing or you are basically gambling. All of this has to be tested, too, or you might find that your failover mechanism has dependencies you didn’t plan for or unexpected failure modes (I’ve twice experienced data center hard outages due to the power distribution system failing oddly when switching between mains and UPS power, or UPS and generator).
Using something like AWS can make it easy to assume that servers don’t fail often, but that’s because the major players have all of that behind the scenes, heavily tested, and will migrate VMs when prefail indicators trigger but before anything actually goes down.
If you have some kind of failover redundancy for services across your systems to mitigate that, then great. With proper setup, no worries. I guess it depends on how much you want to take on vs. hand off.
MoE is excellent for unified-memory inference hardware like DGX Spark, Mac Studio, etc. The large memory size means you can have quite a few B's, and the smaller experts keep those tokens flowing fast.
I feel like this captures the point very well. Google removing this software means that for 99% of the users on the platform, the choice to play this gets taken away from the user.
I think it is interesting that these pieces of software are now being inspired by Midnight Commander and are being built by people who never worked with or experienced the original, Norton Commander.
It got smashed by customs. Literally.