Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> For anyone interested, any more than about five 9s of reliability is basically impossible anyway when it comes to human intervention. As an example, 6 nines of reliability would allow you 3 seconds of unavailability a year so 250 milliseconds of unavailability a month.

Not sure what the qualifier “when it comes to human intervention” means, but if I ignore it then five nines is quite standard in certain sectors. For example phone switch SLA (back when I was in the biz) was measured in minutes per decade (as in “cumulative unavailability under 1/2 minutes per decade”). Large baseline power plants can and must run uninterrupted for decades.

Of course it’s a systems issue not a point solution.



Ah, apologies.

I realise in hindsight that I was rambling from the point of view of offering five nines for software that is inherently flaky/unreliable. Companies where developers cycle through and knowledge is lost, technical debt accumulates and systems are used almost counter to their intended purpose (ie Redis as a database)

In that sense, it's like constantly massaging applications to stay alive or at a reasonable level of service and so hence the assumption that someone will be paged and respond in order to preempt a failure or restore service during an outage.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: