
It has been 0 days since GCP has taken down a startup (again).

You see this at least once a year. Never heard of this from AWS or Azure.

In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.



On the other hand, I can’t remember when there was a serious outage on GCP, unlike AWS/Azure, which seem to go down catastrophically a couple of times per year.


I've been in AWS for almost twenty years at this point. It's been a long time since I've seen a global outage of the data plane on anything. The control plane, especially the us-east-1 services? Yes - but if you're off of us-east-1, your outages are measured in missile strikes, not botched deployments.


Didn't the latest outage affect people not on us-east-1 because internal AWS services depend on us-east-1?


The impacts are usually partial. For example, scaling is impacted but everything already deployed contributes to work up to capacity. Or, you can't change configuration but the previous configuration works as configured. Often surprisingly not so impactful even if there can be limited work stoppage.


The problem with the us-east-1 outage is that a lot of big companies are there, so even if you try your best not to depend on us-east-1, your third party providers are most likely there. In my previous company, we were completely down during us-east-1 outage because of other dependencies that are beyond our control.


Entirely fair. I have thus far avoided that problem. Not always engineering's choice.


Considering how many AWS and non-AWS services go down at least partially when us-east-1 fails, this reads somewhat like "Don't worry that the steering wheel and pedals aren't working, your engine is still running on cruise control".


Work for a major bank who isn't solely in US East 1.

No it didn't impact us.


I can easily remember a few multi-hour AWS incidents from the last few years, since I've had to handle the resulting fires at my various employers at those times. Not sure how you missed these, or do they not count as "global outages" for some reason?

December 2021: https://www.cloudcomputing-news.net/news/aws-outage-takes-do...

June 2023: https://newsletter.pragmaticengineer.com/p/the-scoop-52

October 2025: https://www.cnbc.com/2025/10/20/amazon-web-services-outage-t...

Each of these was a massive outage impacting very large services across the web.


Perhaps you don't notice GCP outages because so few companies rely on them?


GCP has a lot of customers. But you wouldn't know which companies they are unless you worked there and wanted to leak it, or it came out publicly. E.g., it's been publicly acknowledged that Apple uses GCP for iCloud, https://www.cnbc.com/amp/2018/02/26/apple-confirms-it-uses-g... , and Home Depot is another that's used as a case study, https://cloud.google.com/customers/the-home-depot . But most customers don't want to make a big deal about being on GCP, as it's none of our business who's hosting them.


Apple also uses AWS, and I won't be surprised if they also use Azure. Big companies are multicloud, and not because it's a good idea (it rarely is), but because they inherited multiple environments on different CSPs, and maintaining those where they are is often cheaper than migrating them to a different CSP.


I wonder if big companies can get a special contract with a clause like "you can't delete my service automatically (unless it's an emergency)".


If you're big enough you don't need a contract for that, that's just their default method of operation.


#48203226


upvoted & favourited because you taught me a really interesting fact which I feel makes up for an amazing discussion (regarding icloud using GCP).

also, I can't help but imagine if instead of render, it was Apple's account which could've been auto-banned (Render is almost a billion dollar company or series-B, I am not sure)

I haven't read the articles, I admit, but can you please elaborate on why Apple uses GCP for iCloud? I would love to know the technical decisions behind it, out of genuine curiosity.

From my (let's face it) limited understanding of GCP, it isn't particularly good or price-performant, and one of the wonders is that Google also sells storage directly through Google Photos and has a competitive lineup on Android.

So in some sense if Apple is using gcp's for icloud then aren't they just reselling google storage themselves and google can always beat them in pricing while also wanting to chew away at the percentage of iphones themselves too?

I mean, I can still try to understand the google search pays apple 10 billion dollars (right?) deal but I don't quite understand why apple would pick GCP when the hosting market is one of the more competitive ones with lots of companies.

I would love to get some explanations or theories as to why exactly that is the case.

(Also given its HN, if anyone from apple is reading or knows the answer, I would love that too!)


Firstly, Apple doesn’t compete on price. Even if iCloud is priced higher than Google's offering, people would still buy Apple just for the ecosystem integration. It’s not even a competition, to be honest.

Look up “buy or build” which is the industry term for this kind of evaluation: buy product and use it/resell it or build your own.

Apple has gone for different strategies in various areas:

Build their own Apple silicon chips; do not buy off-the-shelf chips from Intel, NVIDIA, or AMD.

Buy and resell Google storage rather than build their own distributed data store for end users.

It’s about what matters more for the company and the core products. Apple’s laptops and cell phones are considered core products. iCloud is a value add.

This is also why Apple is making their own cellular modem (baseband) chips. For most companies, this is a “buy from Qualcomm” decision, but Apple needs to build their own for independence in their number 1 core product: the iPhone.


> So in some sense if Apple is using gcp's for icloud then aren't they just reselling google storage themselves and google can always beat them in pricing while also wanting to chew away at the percentage of iphones themselves too?

Apple uses Samsung displays and Sony camera sensors, iirc, both of which are flagship Android phone makers. That doesn't really seem to be a concern in their procurement thinking. iCloud and Google Photos are not that direct competitors because which one is native depends on which phone you already bought. Google Photos definitely does have some market share on iOS due to having 3x the free storage and a handy compression mode (which used to be entirely unmetered at launch but now still uses storage, just less of it). But it will never be a full competitor because it is a separate app you have to install and it can't magically fetch cloud-only photos from the camera roll and photo picker UI like iCloud can.

The pricing of Google One and Apple One/iCloud+ isn't really dictated by underlying storage costs. At the higher tiers like 2TB, many users don't come close to using it all, while the laughable 5GB iCloud free tier clearly costs almost nothing in raw storage, even on NVMe SSD, if you compare it to S3/Backblaze or even raw disk pricing on the cloud.


Let's also not ignore enterprise realities: in your example, Samsung Displays is likely giving a great price to Apple for displays based on long-term commitment of large quantities: it allows them to optimize production and possibly give a better price than maybe Samsung Mobile for smaller-runs of phones.

Each division also cross-charges, so Samsung Mobile would be paying Samsung Displays for the screens, and possibly at a small, guaranteed and non-negotiable margin.

Without a global strategy not to do so, divisions within an enterprise optimize for their own bottom line and have internal discussions on build-vs-buy even if they have an internal factory.


> also, I can't help but imagine if instead of render, it was Apple's account which could've been auto-banned (Render is almost a billion dollar company or series-B, I am not sure)

I believe you mean Railway.

Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.


Yes, sorry, I meant Railway rather than Render. It was an honest mistake!

> Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.

Speaking of which, I have a question for Render: how does Render prevent something like what happened to Railway (i.e., the account getting banned)? I would love to know what the team at Render thinks about this, and also why Render is using GCP. I'm curious about the architectural decisions behind it!

Once again, thanks for responding. I look forward to your reply, and have a nice day, anurag!


Without knowing more about why Railway was banned, it's hard to say how we would prevent it. Render uses GCP, AWS, and now, our own hardware [1]; GCP is mostly a historical artifact; we started using AWS for most things after dealing with poor customer and sales support from GCP a few years ago.

[1] https://x.com/anuraggoel/status/2057245946652901809


> Perhaps you don't notice GCP outages because so few companies rely on them?

GCP is the world's third largest cloud provider, and has around half of AWS' market share. Claiming no one uses it reads like Yogi Berra's "no one goes there anymore, it's too crowded".


Isn't that including things like google workspace and similar? Both Azure and GCP have sometimes included things that most people think of as unrelated SaaS (office 365, gsuite/workspace) to make themselves look bigger in the cloud sector.


> Isn't that including things like google workspace and similar?

AWS also includes Amazon WorkSpaces. Moreover, AWS includes all of Amazon's cloud infrastructure for things like Amazon music, Ring, Amazon Prime Video, etc.


But as a percentage of revenue I'd assume those are a lot smaller than Office365 is for microsoft and Workspace is for google.

Last I checked, I don't think AWS included things like Amazon Prime Video either. AWS is primarily their business/platform offerings, not consumer things like Twitch/Prime/Music/etc.


Spotify, Ebay, Paypal, Apple, Walmart, Uber are huge users. Lots of other big named companies are big users that I don't think are public.

Then there's Anthropic...huge user.



There is a mobile game I know of that had an outage as a result of a GCP service outage. That is the only time I've noticed GCP outages.

With that said, I would not say few companies rely on GCP. Search for "GCP" in this month's HN hiring thread. There are 23 hits, more than Azure's 21. AWS has 90 hits, which I guess shows its sheer dominance in the startup space. But these figures more or less agree with my intuition of the major clouds being AWS/GCP/Azure.
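A rough way to reproduce that kind of tally is to count how many postings mention each provider. This is only a sketch: the keyword patterns and the `count_mentions` helper are hypothetical, the sample postings are made up for illustration, and counting postings can differ from a browser's find-in-page count, which counts every occurrence.

```python
import re

# Hypothetical keyword patterns; \b keeps "GCP" from matching inside other words.
PATTERNS = {
    "AWS": re.compile(r"\bAWS\b", re.IGNORECASE),
    "GCP": re.compile(r"\bGCP\b", re.IGNORECASE),
    "Azure": re.compile(r"\bAzure\b", re.IGNORECASE),
}


def count_mentions(comments):
    """Count how many comments mention each provider at least once."""
    hits = {name: 0 for name in PATTERNS}
    for text in comments:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[name] += 1
    return hits


# Made-up sample postings, for illustration only (not real thread data).
sample = [
    "Backend engineer, Go, AWS, Postgres",
    "SRE: GCP and Kubernetes; some AWS",
    "Data platform on Azure",
]
print(count_mentions(sample))
```

Matching on the abbreviation alone will miss postings that spell out "Google Cloud" or "Amazon Web Services", so the numbers are at best a rough signal.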


We rely on them so I would have definitely noticed. Even a couple of minutes and our customers would freak out…


GCP never goes down because they banned all their customers.


A funny meme but just untrue


GCP has had outages. From a quick search it looks like they had a global outage less than a year ago:

https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...


AWS goes down catastrophically but are back up in minutes/hours most of the time (as long as they aren't down because Iran blew up their data center). That's obviously REALLY bad for certain industries, but I suspect for the vast majority of their customers it's not a big deal. We've been able to isolate the damage almost every time just by having AZ failover in place and avoiding us-east-1 where we can.


Failover is supposed to protect you every time, unless something really exceptional happens.

While it's possible to isolate the effects, judging by how many things stop working when there is an AWS failure, a lot of people fail to do that. I think the shift of responsibility to AWS removes the incentive to put effort into resilience against AWS failure.


> AWS goes down catastrophically but are back up in minutes/hours most of the time

The outage in the linked article appears to have been resolved in 4-5 hours.


IIRC the Paris datacenter flood took down a whole “region” and some data was permanently unrecoverable.


I still remember the one where they nuked all the storage of, I think, an Australian insurance company. Luckily, the IT department had done a multi-cloud setup for backups.


  Google Cloud accidentally deletes $125 billion Australian pension fund - May 2024
https://www.business-standard.com/world-news/google-cloud-ac...



That's the one, thanks!


There was a pretty bad one last summer - their IAM system got a bad update and it broke almost all GCP services for an hour or so, since every authenticated API call reaches out to IAM.

It had lasting effects for us for a little over 3 hours.


>On the other hand, I can’t remember when there was a serious outage on GCP

They had a really bad global outage a year ago. At least with AWS, outages are contained to a single region.


You can't have 100% uptime. It's unfeasible, especially for a startup. You should be telling your customers that downtime might happen, sometimes for reasons beyond your control, and that if it does then you'll do your best to recover and to compensate them for the inconvenience. You should cultivate a relationship with your early customers that makes them feel bad for you when there's an outage rather than angry about how it impacts them. Maybe even go as far as firing the customers who give you a hard time over it. That way if your cloud provider falls over it's really annoying but not a big deal.

Your cloud provider blocking your business from running is far worse.
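To put numbers on "you can't have 100% uptime", the arithmetic below converts an availability target into a yearly downtime budget, and an outage length into the availability it implies. It is a back-of-the-envelope sketch: the function names are made up, and it assumes a 365-day year and a single outage.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(availability_pct):
    """Hours of downtime per year allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def availability_after_outage(outage_hours):
    """Yearly availability (in percent) if the only downtime is one outage."""
    return 100 * (1 - outage_hours / HOURS_PER_YEAR)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% allows {downtime_budget_hours(target):.2f} h/year of downtime")

# A single 5-hour outage in a year, as an example input.
print(f"{availability_after_outage(5):.3f}% availability")
```

Under these assumptions, a single multi-hour provider outage already uses up most of a "three nines" budget for the year.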


None of the AWS “outages” have impacted us. They have either been regional, in which case we stand down the region (we run multiple hot regions), or didn’t involve things we need to maintain operation.

I can’t imagine AWS ever doing such a cascading delete. I mean, they have made deletion protection a difficult thing to ignore even for individual resources.
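The "stand down the region" approach from the first paragraph of this comment can be sketched as a small routing decision. This is an illustrative sketch only, not how the commenter's system works: the health map, the region list, and both function names are assumptions, and a real setup would drive this from health checks and DNS or load-balancer weights.

```python
def active_regions(health, min_healthy=1):
    """Return the regions that should keep receiving traffic.

    `health` maps region name -> bool (True if the region passed its checks).
    Raises if fewer than `min_healthy` regions remain, because standing
    down every region would itself be an outage.
    """
    healthy = sorted(region for region, ok in health.items() if ok)
    if len(healthy) < min_healthy:
        raise RuntimeError("not enough healthy regions to serve traffic")
    return healthy


def split_traffic(regions):
    """Spread traffic evenly across the remaining regions."""
    weight = 1.0 / len(regions)
    return {region: weight for region in regions}


# Hypothetical health snapshot: one of three hot regions is failing.
health = {"us-east-2": True, "us-west-2": True, "eu-west-1": False}
print(split_traffic(active_regions(health)))
```

The remaining regions need enough spare capacity to absorb the shifted traffic, which is the main cost of running multiple hot regions.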


Unfortunately, if everyone goes down, people are understanding. If just _you_ go down, then it's oddly less forgivable.


How is blackhole-ing a customer not considered an outage?


You can read the parent post, right?


> Never heard of this from AWS or Azure.

AWS does it more efficiently; it takes down many startups at a time when us-east-1 goes down.


That’s an entirely different type of problem, and avoidable by just using us-east-2 (I still don’t understand why people default to us-east-1 unless they require some highly specific services).


Is it that easily avoidable? A lot of AWS's control plane seems to have dependencies on us-east-1, or at least that's what it's looked like as a non-us-east-1 user during recent outages.


I don't know how much it's improved, but a bunch of URLs they use unnecessarily have region specific details in them.

I remember a Workspaces outage about 5 or 6 years ago, and the problem for us was that the redirect link in the console had US East 1 in it.

The workspaces themselves weren't in US East 1 and nothing relied on US East 1.

Emailing users who needed it an alternative link with a different region in the URL for the login redirect fixed it for us.


Sympathy. Railway is going to have numerous people blaming them for this outage. When us-east-1 fails, it is headline news, so you are not to blame.


If my cloud provider brings my startup down, it's my problem. If they bring all the startups down, that's their problem.


During the 5 years of my startup, we had only 1 outage due to AWS because we picked us-west-2 as the primary region. If anyone starting a company picks us-east-1 as the primary region, they should be fired. There's absolutely no reason to be in that region.


Why do people want to be in that region? Is it the default or something?

I know some workloads help to be colocated but all these places are connected by fiber and every cloud has a worldwide CDN it seems.


> Why do people want to be in that region? Is it the default or something?

It's one of the oldest and largest regions. It hosts the most services, both low-level platform stuff and higher level managed services (which run on the low-level platform stuff), so services tend to be more performant.

Geographic location is also good.

Also, due to scale their pricing ends up being cheaper.

Let's say that it's the region people use by default, unless they have a compelling reason to have a presence in any other particular region.


At some point it used to be significantly cheaper than any other AWS region in the world. Not sure if that's still true.


And we all celebrate it since we can't do any work


Hetzner and OVH also do this all the time.

It's AWS and Azure that are the outliers and tend not to care too much what their customers do with their infrastructure. AWS is perfectly fine with allowing me to run copies of 15 year old vulnerable AMIs copied from AMIs they've long since deprecated and removed. Even for removed features like NAT AMIs.


https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...

Azure nerfed the front door of all Azure and O365 services last year.

All of these companies are great at what they do, and occasionally fuck up.


AWS has throttled our service so badly that we couldn't operate. I was thinking of writing a blog post about how they stalled our growth for a month but it seems moot


Yep, we also don't touch them for this same reason.


Yep, agree 100%. Such a stupid move on their behalf.


What was the reason GCP took down a startup previously?



Wow... Just wow...


AWS normally contacts you first.


Do they?

The only anecdotal thing I've seen is that we hired a vendor to do a pentest a few years ago, and they set up some stuff in an AWS account, and that account got totally yeeted out of existence by AWS, if memory serves.


You should not be conducting penetration tests against third-party infrastructure providers without permission. They have processes and systems, and usually just want a heads-up about what you plan to test and the duration / timestamps.

Cuz otherwise you look like a threat actor.

That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.


>That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.

Sorry for being unclear, the vendor was attacking our organization only, and any other company was expressly forbidden in the contract. As I recall it was a fake SSO sign-in page to collect credentials that they would try and social engineer our employees with.


At a minimum you should contact AWS before you launch a phishing page as a test that targets AWS customers.


I understood it as a phishing page imitating their own system, targeting their own employees. Nothing related to AWS, except for being hosted there.


I’m fairly certain you are supposed to contact any vendor before attempting to penetrate hosts with authorization, not the other way around.


Having done this for both Azure and AWS, there's a specific ticket that needs to be filed with each provider that documents the scope of your pen test, where you're coming from, and a time frame over which you're doing it (which ISTR was "not more than 24 hours")


Responding to an unknown security tester like that is a selling point, not a cautionary tale


Yup, I thought it was great. Although one concern I always had in the back of my mind was where the line is drawn. For example, if an adversary gains access to one of my org's accounts and does something similar, do we get 100% taken out?


If a vendor doesn't know the basics about pentesting open infra and can't be bothered to look up the terms of use, it sounds like they know ssh-it about fsck.


They better do. What is google doing?


It's all AI powered



