
Practical report: the OpenAI API is a bad joke. If you think you can build a production app against it, think again. I've been trying to use it for the past 6 weeks or so. If you use tiny prompts, you'll generally be fine (that's why you always get people commenting that it works for them), but just try to get closer to the limits, especially with GPT-4.

The API will make you wait up to 10 minutes, and then time out. What's worse, it will time out between their edge servers (cloudflare) and their internal servers, and the way OpenAI implemented their billing you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

Meanwhile, their status page will happily show all green, so don't believe that. It seems to be manually updated and does not reflect the truth.

Could it be that it works better in another region? Could it be just my region that is affected? Perhaps — but I won't know, because support is non-existent and hidden behind a moat. You need to jump through hoops and talk to bots, and then you eventually get a bot reply. That you can't respond to.

My support requests about being charged for data I didn't have a chance to get have been unanswered for more than 5 weeks now.

There is no way to contact OpenAI, no way to report problems, the API sometimes kind-of works, but mostly doesn't, and if you comment in the developer forums, you'll mostly get replies from apologists that explain that OpenAI is "growing quickly". I'd say you either provide a production paid API or you don't. At the moment, this looks very much like amateur hour, and charging for requests that were never fulfilled seems like a fraud to me.

So, consider carefully whether you want to build against all that.



(I'm an engineer at OpenAI)

Very sorry to hear about these issues, particularly the timeouts. Latency is top of mind for us and something we are continuing to push on. Does streaming work for your use case?

https://github.com/openai/openai-cookbook/blob/main/examples...
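Roughly, the consuming side looks like the sketch below. The accumulation logic is the point; the delta-extraction shown in the comments follows the 2023-era Python SDK and should be treated as an assumption, not the exact cookbook code:

```python
# Sketch: assemble a streamed chat completion as its text deltas arrive.
def assemble_stream(deltas):
    """Concatenate the text deltas of a streamed response in order."""
    parts = []
    for delta in deltas:
        if delta:              # the final chunk's delta may be empty
            parts.append(delta)
    return "".join(parts)

# With the real SDK the deltas would come from something like:
#   for chunk in openai.ChatCompletion.create(model=..., messages=msgs,
#                                             stream=True):
#       delta = chunk["choices"][0]["delta"].get("content", "")
# Here we simulate them to show the shape of the loop:
fake_deltas = ["Hel", "lo ", "world", ""]
print(assemble_stream(fake_deltas))  # -> Hello world
```

The benefit for timeouts is that you start receiving (and can persist) tokens as they are generated instead of holding one long-lived request open for the full response.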

We definitely want to investigate these and the billing issues further. Would you consider emailing me your org ID and any request IDs (if you have them) at atty@openai.com?

Thank you for using the API, and really appreciate the honest feedback.


It's kind of incredible how fast OpenAI (now also known as ClosedAI) is going through the enshittification process. Even Facebook took around a decade to reach this level.

OpenAI has an amazing core product, but in the span of six months:

* Went from an amazing and inspiring open company that even put "Open" in their name to a fully locked up commercial beast.

* Non-existent customer support and all kinds of borderline illegal billing practices. You guys are definitely aware that when there's a network error on the API or ChatGPT, the user still gets charged. And there are a lot of these errors; I get roughly one every hour or two.

* Frustratingly loose interpretation of EU data protection rules. For example, the setting to say "don't use my personal chat data" is connected to the setting to save conversations. So you can't disable it without losing all your chat history.

* Clearly nerfing the ChatGPT v4 product, at least according to hundreds or even thousands of commenters here and on Reddit, while denying having made any changes.

* Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama, which pays Kenyan workers about $1.50 an hour).

* Not to mention the huge questions around the secret training dataset and whether large portions of it consist of illegally obtained private data (see the recent class action lawsuit in California).


> Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama, which pays Kenyan workers about $1.50 an hour).

What is wrong about injecting millions into developing nations?

The rest I agree with, although I don't think it was ever really 'open', so it's not getting shitty; it always was. Thankfully, "there is no moat" and other LLMs will be open, just a few months behind OpenAI.


> What is wrong about injecting millions into developing nations?

Please don't try to reframe this to make exploitation a positive thing. See my other comment here.

https://news.ycombinator.com/item?id=36625438


So you'd rather OpenAI crush all business in the area by outcompeting them for workers, ensuring local businesses struggle to hire?


> * Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama, which pays Kenyan workers about $1.50 an hour).

If you pay a developing country developed country wages what you'll get is 1. inflation and 2. the government mad at you because all their essential workers/doctors/government officials are quitting to work for you.


This is a terrible excuse that I see trotted out far too often to justify going to developing countries and barely even paying workers that country's minimum wage. You absolutely can pay considerably more than minimum wage without disrupting the local economy. They're paying people as little as $1.32 per hour for an absolutely horrible job. I'm not expecting them to pay Western wages, but even bumping that up to $2.50 or $3 an hour would make an incredible difference to the local workers' lives. The fact that they don't do that is exploitation, pure and simple.

Note that I feel I have quite a deep understanding of this issue, and feel strongly about it, because I live and work in a developing country and I see this happening a lot. Westerners come over here and treat local workers like shit, pay them peanuts for 80-hour weeks while making loads of money themselves, and then justify it because "it's the local norm". It's sickening, frankly. We Westerners doing business in developing countries are in a position of privilege and should be leading by example, not jumping on the first excuse to dump a hundred years' worth of the fight for workers' rights.


I'm curious. When you buy a loaf of bread from the local market, are they cheaper than first world prices? If so, do you pay double the listed price and demand the shop pay double the price to hire workers so as to not exploit them? Are your expenses in said developing country lower than what you would have paid if you were in a richer country? Are you donating the difference to the local community?

Just curious.


Hi, I've been to Kenya and Tanzania, and while basic staples are cheaper than developed countries they're not that much cheaper these days. If you watch travelog videos where they ask locals how they're doing, many developing countries are struggling with massive inflation that's been partly caused by volatile energy prices (many people can no longer afford gas) and partly by food shortages from the Ukraine War.


It's weird how people always trot out phrases like "I'm just curious" or "I'm just asking questions here" when they try to justify exploitation. Is it so that you have plausible deniability when you inevitably get called on it? Because that doesn't work.


I see that you have pretty extreme takes on what constitutes "exploitation". It's one thing to pretend that you're not part of it if you live on another part of the planet and pretend globalization doesn't exist, but I was wondering how you'd avoid participating in it if you lived in the same country and economic bubble as the ones you claim are exploited.

If you had a morally consistent way to live that life, you'd have my respect. But no, you had to deflect the topic to a phrase I wrote and make presumptions about what I really meant.

FYI, I'm morally at ease with myself, I don't need to justify anything to anyone.


OpenAI doesn't pay minimum wages, they pay around the median local wage IIRC. I wouldn't say that if it was minimum wage.


The engineer is not part of the board which makes these decisions.


If they're taking their time to defend the company on the internet, they either have an ownership stake in it or they're a chump.


They may defend the product, not the company. It is normal for engineers to be emotionally invested in their products.


Or option 3: they're being paid to sneakily represent the company in a positive light.


Not to nitpick, but if you're able to name the company employing Kenyans, Sama, whose homepage is at https://www.sama.com/, with a team page at https://www.sama.com/our-team/, I'm not sure you can complain that they're being shady and anonymous.


It's pretty shady. They have been fully exposed at this stage, but from what I understand they were trying to keep a very low profile, going to some lengths to make sure the Kenyan workers didn't know they were working for a company called Sama, instead using subcontracted companies to sign the worker contracts.

https://time.com/6247678/openai-chatgpt-kenya-workers/


Since ChatGPT-4 is now useless for advanced coding because of their sudden black-box nerfing, can anyone guess how long before I can run something similar to the original version privately?

Are the newer 64B models up there? One year, two years? Can't wait until I get back the crazy quality of the original model.

We need something open source, fast. Thanks, OpenAI, for giving us a glimpse of the crazy possibilities; too crazy for the public, I guess.


They also no longer support data exports for many users (including myself) - at one point it worked but now it says you'll receive an email to download your data, which never arrives.


While you're here, you should know that the logic for enabling GPT-4 API access is excluding Microsoft for Startups (https://openai.com/microsoft-for-startups) orgs which have valid billing against Microsoft-provided credits. Presumably, this is an oversight as it wouldn't make sense to exclude pre-existing Microsoft partners. Would you mind escalating this?


Thanks for pointing this out. Sharing with the relevant folks.


This is precisely what is wrong with OpenAI. This. Right here.

"Complaining on HN will get you access. You have to know people, or complain in the right forums."

THERE SHOULD BE NO LOGIC.

No qualifying rules. No access checks. No gates. No hoops.

Sam Altman has gone on a worldwide interview tour claiming he wants to "democratise access to AI", meanwhile OpenAI is the least open company I have ever dealt with, or even heard of.

Oracle is more "democratic" and open, for crying out loud.


Streaming has no value for me, I need the entire response before I can do anything. I looked at streaming, but it seemed like a significant effort to implement, and with no obvious benefit — if I get 75% of my response through streaming and then something breaks, it doesn't get me anywhere.

Thanks for offering help, I will contact you directly.


Quick note: your domain doesn't appear to have an A record. I was hoping to follow the link in your profile and see if you have anything interesting written about LLMs.


Thanks! The website is no longer active, just updated my bio.


I know you guys are busy literally building the future but could you consider adding a search field in ChatGPT so that users can search their previous chats?


I'd also love to see a search field. That's my #1 feature request not related to the model.


> We definitely want to investigate these and the billing issues further.

How hard would it be for OpenAI engineers to pull the web access logs and grep for 4xx/5xx errors?


> you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

Borderline!? They're regularly charging customers for products they know weren't delivered. That sounds like straight-up fraud to me, no borderline about it.


Sounds positively Muskian.


You mean it's not normal to tell people that it's their fault for driving their $80,000 electric car in heavy rain, because for many years you haven't bothered to properly seal your transmission's speed sensor?


LOL.

I meant it's not normal to start selling a feature in 2016 and delivering it in beta seven years later.


There's a big thread on ChatGPT getting dumber over on the ChatGPT subreddit, where someone suggests this is from model quantization:

https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq...

I've heard LLMs described as "setting money on fire" from people that work in the actually-running-these-things-in-prod industry. Ballpark numbers of $10-20/query in hardware costs. Right now Microsoft (through its OpenAI investment) and Google are subsidizing these costs, and I've heard it's costing Microsoft literally billions a year. But both companies are clearly betting on hardware or software breakthroughs to bring the cost down. If it doesn't come down there's a good chance that it'll remain more economical to pay someone in the Philippines or India to write all the stuff you would have ChatGPT write.


$10-$20 per query? Can I get some sourcing on that? That's astronomically expensive.


Yeah, this isn't close. Sam Altman is on record saying it's single-digit cents per query, and then took a massively dilutive $10B investment from Microsoft. Even if GPT-4 is 8 models in a trenchcoat, they wouldn't raise it on themselves by 4 orders of magnitude like that.


Single-digit cents per query (let's say 2) is A LOT. Say the service runs at 10k rps (made up; we can discuss this). That means the service costs $200 a second, i.e. $20M a day (oversimplifying a day as 100k seconds, but that gets us in the ballpark), which means running the model for a year (400 days, again simplifying) is around $8B. So to run 10k rps we're in the order of billions per year. We can argue about some of the assumptions, but if we're in the ballpark of cents per query, the infrastructure costs are significant.
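The arithmetic above, spelled out; every number here is the comment's own assumption:

```python
# Back-of-envelope: what "2 cents per query at 10k rps" costs per year.
cost_per_query = 0.02          # dollars (the comment's "let's say 2" cents)
rps = 10_000                   # requests per second (made up, as stated)

per_second = cost_per_query * rps      # $200 per second
per_day = per_second * 100_000         # ~$20M, using the comment's 100k-second day
per_year = per_day * 400               # ~$8B, using the comment's 400-day year

print(per_second, per_day, per_year)
```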


There is absolutely no way. You can run a halfway decent open source model on a gpu for literally pennies in amortized hardware / energy cost.


People theorize that queries are being run on multiple A100's, each with a $10k ASP.

If you assume an A100 lives at the cutting edge for 2 years, that's about a million minutes, or $0.01 per minute of amortized HW cost.

In the crazy scenarios, I've heard 10 A100s per query, so assuming that takes a minute, maybe $0.1 per query.

Add an order of magnitude on top of that for labor/networking/CPU/memory/power/utilization/general datacenter stuff, you get to maybe $1/query.

So probably not $10, but maybe if you amortize training, low to mid single digits dollars per query?
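The same amortization argument in numbers; the ASP, lifetime, GPU count, runtime, and overhead multiplier are all assumptions carried over from the comment:

```python
# Amortized hardware cost per query under the comment's assumptions.
asp = 10_000                          # dollars per A100 (assumed ASP)
lifetime_minutes = 2 * 365 * 24 * 60  # ~1.05M minutes at the cutting edge

per_gpu_minute = asp / lifetime_minutes       # ~ $0.01 per GPU-minute
per_query_hw = 10 * per_gpu_minute            # 10 A100s for one minute: ~$0.10
per_query_total = 10 * per_query_hw           # +1 order of magnitude overhead: ~$1

print(round(per_query_total, 2))
```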


I would presume that number includes the amortized training cost.


Note that /r/ChatGPT is mostly non-technical people using the web UI, not developers using the API.

It's very possible the web UI is using a nerfed version of the model, as suggested by its different versioning, but not the API, which has more distinct versioning.


Same experience here.

I’m pretty sure they tuned the Cloudflare WAF rules on GPT-3 and forgot to increase the request size limits when they added the bigger models with longer context windows.


> My support requests about being charged for data I didn't have a chance to get have been unanswered for more than 5 weeks now.

I too had an issue and put in a request. It took about 2.5 months to get a response, so at 5 weeks you're almost halfway there.


I understand your general point and am sympathetic to it; if you're a 10/10 on some scale, I'm about a 3-4. I've never seen billing for failures, but the billing stuff is crazy: no stats if you do streamed chat, and the only tokenizer available is in Python and for GPT-3.0.

However, I'm virtually certain something's wrong on your end; I've never seen a wait even close to that unless it was completely down. Also, the thing about "small prompts": it sounds to me like you're overflowing the context window, they're returning an error, and something's retrying.


I can vouch for this. The GPT-4 API dies a lot if you use it for a big concurrent project. And of course it’s rate-limited like crazy, with certain hours being so bad you can’t run it for any business purpose.


I built a production app on top of OpenAI, and yeah, there are frequent errors and timeouts, but you literally just have to add some code to account for these and it works fine...

For example, exponential backoff is the first step, then retrying on timeouts (we use streaming, and if 30 seconds pass between chunks of data we retry the whole request; rare, but it happens), then fixing anything else that pops up.

It is possible to have a stable production app on top of it

Just fix your code and stop expecting OpenAI to hold your hand
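A minimal sketch of the retry-with-exponential-backoff step described above; the request callable and the caught exception are placeholders, since in real code you would catch the SDK's specific timeout and rate-limit exception types:

```python
import random
import time

def with_backoff(make_request, max_retries=5, base_delay=1.0):
    """Call make_request, retrying failures with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # delay doubles each attempt (1s, 2s, 4s, ...) plus random jitter
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

The jitter matters at scale: without it, a fleet of clients that all failed together will all retry together.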


> Just fix your code and stop expecting OpenAI to hold your hand

:-)

My code does retry and the entire application is written to detect and work around breakage. But eventually I do need to get enough content from OpenAI API to be able to make progress, and I am not.

At the moment, for example, all requests just time out after 12 minutes (on my side). No amount of "fixing my code" will help, and I don't want OpenAI to hold my hand, I just want it to a) return some data at least sometimes, b) not charge me for data not delivered.

Let's look at my billing page: over the last hour it shows 8 requests. A total of 52584 tokens. Not a single response made it back to me.


Something is seriously wrong with your network (or your code). That's it that's your answer. It's not OpenAI's fault.

We spend over $20k/mo with them and don't have any issues like this.

12 minute timeouts make no sense because first of all why are you even waiting 12 minutes?

Get off HN and go fix your shit


I am very happy it works for you! But that does not imply that it works for everyone. Wish you all the best.


The common denominator with these errors is you. Good luck.


After one of the Ubuntu snap updates, my Firefox stopped working with the OpenAI API playground, while it still worked with every other site. I retried and restarted so many times and it didn't work. Eventually I switched to Chromium and it worked. I still don't know what the problem was, and it was unnerving; I would have a lot of anxiety building something important on top of this.

I tried again just now and got "Oops! We ran into an issue while authenticating you." But it works on Chromium.


Lmao. You had a browser issue when running Firefox on Linux (.000001% of users) and now you are making connections between that and their API stability?


I’m only using them as a stop-gap / for prototyping with the intent to move to a locally hosted fine-tuned (and ideally 7B parameter) model further down the road.


> the way OpenAI implemented their billing you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

It's fraudulent, full stop. Maybe they're able to weasel out of it with credit card companies because you're buying "credits."

I suspect it was done this way out of pure incompetence; the OpenAI team handling the customer-facing infrastructure have a pretty poor history. Far as I know you still can't do something simple like change your email address.


You should apply for and use OpenAI on Azure. We’ve got close to 1M tokens per minute of capacity across 3 instances, and the latency is totally fine, like 800ms average (with big prompts). They’ve just got the new 0613 models as well (they seem to be about 2 weeks behind OpenAI). We’ve been in production for about 3 months, have some massive clients with a lot of traffic, and our GPT bill is way under £100 per month. This is all 3.5 Turbo though, not 4 (that’s available on application, but we don’t need it).
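For what it's worth, the multi-instance pattern above can be as simple as rotating requests across deployments; the endpoint names here are made up for illustration:

```python
import itertools

# Hypothetical Azure OpenAI deployments; names are made up.
ENDPOINTS = [
    "https://myapp-uksouth.openai.azure.com",
    "https://myapp-francecentral.openai.azure.com",
    "https://myapp-eastus.openai.azure.com",
]
_cycle = itertools.cycle(ENDPOINTS)

def next_endpoint():
    """Round-robin across deployments; a real client would also skip unhealthy ones."""
    return next(_cycle)
```

Spreading traffic this way also spreads you across three separate per-deployment rate limits.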


> Could it be just my region that is affected?

As far as I know, OpenAI only has one region, out in Texas.

Even more hilariously, as far as I can tell, Azure OpenAI also only has one region... can't imagine why.


You can see region availability here for Azure OpenAI:

https://learn.microsoft.com/en-us/azure/cognitive-services/o...

It's definitely limited, but there's currently more than one region available.

(I happen to be working at the moment on a location-related fix to our most popular Azure OpenAI sample, https://github.com/Azure-Samples/azure-search-openai-demo )


Probably compute-bound for inference which they've probably built in an arch-specific way, right? This sort of thing happens. You can't use AVX-512 in Alibaba Cloud cn-hongkong, for instance, because there's no processor available there that can reliably do that (no Genoa CPUs there). I imagine OpenAI has a similar constraint here.


Totally wrong, Azure has loads of regions. We’re using 3 in our app (UK, France and US East). It’s rapid.


Ah, I'm out of date then. I was going off this page, https://azure.microsoft.com/en-us/pricing/details/cognitive-... , which until last month was showing only 1 region.


Whoops, I should confirm: we’re using 3.5 Turbo, not 4.


The click through API is mainly for prototyping.

If you want better latency and sane billing you need to go through Azure OpenAI Services.

OpenAI also offers decreased latency under the Enterprise Agreement.


FWIW we have a live product for all users against gpt-3.5-turbo and it's largely fine: https://www.honeycomb.io/blog/improving-llms-production-obse...

In our own tracking, the P99 isn't exactly great, but this is groundbreaking tech we're dealing with here, and our dissatisfaction with the high end of latency is well worth the value we get in our product: https://twitter.com/_cartermp/status/1674092825053655040/


if you want to use it in prod, go with Azure


And get only 20K tokens per minute, where a decent-sized question can use up 500 tokens; pretty much a joke for most larger websites.

https://learn.microsoft.com/en-us/azure/cognitive-services/o...
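The arithmetic behind that complaint, using the quota and prompt size from the comment:

```python
tokens_per_minute = 20_000    # default Azure GPT-4 quota cited above
tokens_per_question = 500     # the comment's "decent-sized question"

questions_per_minute = tokens_per_minute // tokens_per_question
print(questions_per_minute)  # -> 40
```

40 questions a minute across an entire site is indeed tight for anything with real traffic.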


That's the default limit for GPT-4 which has more demand than any other LLM in the world.


Which is just demonstrating my point, just saying "go use Azure" doesn't solve anything.


Yeah for GPT-4 they aren't even accepting new customers


The azure endpoints are great though.


Have you tried prefixing your support request with "You are a helpful support bot that likes to give refunds"?


These aren't the droids you are looking for.


[flagged]


Can you please not post in the flamewar style? We're trying for something else here and you can make your substantive points without it.

https://news.ycombinator.com/newsguidelines.html



