Your experience and my experience do not align. I asked GPT-4 to give me a POSIX...

l33tman · on April 20, 2023

Maybe it's heavily biased towards programming and computing questions? I've tested GPT-4 on numerous physics questions and it fails spectacularly at almost all of them. It starts to hallucinate egregious stuff that's completely false, misrepresents articles it tries to quote as references, etc. It's impressive as a glorified search engine in those cases, but it can't be trusted at all to explain most things unless they're the most canonical curriculum questions.

This extreme difficulty in discerning what it hallucinates from what is "true" is its most obvious problem. I guess it can be fixed somehow, but right now its output has to be heavily fact-checked manually.

It does this for computing questions as well, but there is some selection bias: people tend to post the success stories and not the failures. However, it's less dangerous in computing, as you'll notice the errors immediately, so it may require less manual labour to keep it in check.

visarga · on April 20, 2023

> This is AGI, now we are nit-picking. It's here.

Hahaha, if you want nit-picking: all the language tasks ChatGPT is good at are strictly human tasks, not general tasks. Human tasks are all related to keeping humans alive and making more of us; they don't span the whole spectrum of possible tasks where intelligence could exist.

Of course, within language tasks it is as general as can be, yet it still needs to be placed inside a more complex system with tools to improve accuracy. An LLM alone is like a brain alone: not that great at everything.

ChatGTP · on April 20, 2023

On the other hand, if you browse around the web you will find various implementations of dirbuster, probably in C and for sure in C++, which are multi-threaded. It’s not to take away from your experience, but without knowing what’s in the training set, it may already have been exposed to what you asked for, even several times over.

I have a feeling they had access to a lot of code on GH; who knows how much code they actually accessed. Copilot for a long time said it would use your code as training data, including context, if you didn’t opt out explicitly, so that’s already millions, maybe hundreds of millions, of lines of code scraped.

The conspiracy theorist in me wonders if MS didn’t just provide access to public and private code to train on. They wouldn’t even have told OpenAI, just said, “here’s some nice data”. It’s all secret and we can’t see the model’s inputs, so I’ll leave it at that. I mean, they’ve obviously prepared the data for Copilot, so it was there waiting to be trained on.

So yeah, I feel your enthusiasm, but if you think about it a little more, maybe it’s not so hard to imagine that what you saw is actually rather simple? Every time I write code I feel kind of depressed, because I know almost certainly someone has already written the same thing, and that it’s sitting on GitHub or somewhere else and I’m wasting my time.

ChatGPT just takes away the need to know where to find the thing you want (it’s already seen almost everything the average person can think of) and gives it to you directly. Have you never thought of this? Like you knew all the code you wanted was already there somewhere, but you just didn’t have an interface to get to it? I’ve thought about this for quite a while, and I knew there would be big data people doing experiments who could see that probably 80-90% of the code on GitHub is pretty much identical.

Nothing is magic, right?

misnome · on April 21, 2023

> If you’re laughing this thing off as generating unhelpful nonsense you’re going to get blind sided in the next few years as GPT gets wired into the workflows at every layer of your stack.

Okay, now try being a scientist in a scientific field that isn't basic coding.

It's not people laughing at pretences, it's people who know even basic facts about their field literally looking at the output today and finding it deeply, fundamentally incorrect.

r3trohack3r · on April 21, 2023

I do not believe that is a reasonable threshold for AGI. If it were, I believe a significant % of humans would individually fail to meet the threshold of AGI.

I wonder what your personal success rate would be if we did a Turing test with the “people” who “know basic facts about their field.” If they sat at a computer and asked you all these questions, would you get them right? Or would you end up in slide decks being held up as a reason why misnome doesn’t qualify as AGI?

I find comfort in knowing that it can’t “do science.” There is a massive amount of stuff it can do. I’m hopeful there will be stuff left for humans.

Maybe we’ll all be scientists in 10 years and I won’t have to waste my life on all this “basic coding” stuff.