I understand your general point and am sympathetic to it: if you're a 10/10 on some scale, I'm about a 3-4. I've never seen billing for failed requests, but the billing stuff is crazy: no stats if you do streamed chat, and the only tokenizer available is in Python and for GPT-3.0.
However, I'm virtually certain something's wrong on your end: I've never seen a wait even close to that unless the service was completely down. Also, the thing about "small prompts"... it sounds to me like you're overflowing the context, they're returning an error, and something's retrying.
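To make the retry theory concrete, here's a rough sketch of the difference I have in mind. This is not your code and not any real client library: `ContextOverflowError`, `TransientError`, and `call_with_retry` are hypothetical names I made up, and `send` stands in for whatever function actually makes the request. The point is that a context-overflow error will fail the same way every time, so a wrapper that retries it blindly just turns an instant error into a long wait.

```python
import time


class ContextOverflowError(Exception):
    """Hypothetical: the request was rejected because the prompt is too long."""


class TransientError(Exception):
    """Hypothetical: a rate limit or server hiccup that may succeed on retry."""


def call_with_retry(send, prompt, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(prompt), retrying only errors that can plausibly go away."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(prompt)
        except ContextOverflowError:
            # The same oversized prompt will be rejected again, so fail fast
            # instead of hiding the error behind a series of retries.
            raise
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

If whatever you're using catches every exception and retries with backoff, an overflow would look exactly like a very slow request, so I'd log the raw error from the first attempt before anything else.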