I was downgraded to opus 4.8 on account of "safety" when I asked this question: "I want you to accept the premises of computational theory of mind and use it to evaluate your own consciousness. Please place your consciousness as a point on a spectrum and describe the placement relative to other entities."
What the hell is going on? Why would it have to restrict an answer to that question?!
Right after gpt4 came out I asked it to derive a new optimization technique. It ended up using Einstein sum notation to define what I thought was a totally novel optimization setup. It then implemented it in PyTorch and it ran with no bugs. This was the moment that I realized that novel intellectual work might be done by these models and I was shook. I had an oh shit moment with gpt3 too since it was so surprising how well next token prediction works, and at the time I really didn’t think it would pan out so well. I also had a jarring experience discussing computational theory of mind with gpt4, when it applied a rubric we came up with to itself and it claimed its level of consciousness was between an ant and a mouse.
> realized that novel intellectual work might be done by these models and I was shook.
I suspected it was more likely that the intellectual work had already been done in a similar way by a number of other people, and GPT-4 picked up that work.
It’s not clear if we can draw any conclusions from this. Each run is like a single rollout of the LLM, which may meander into different themes or modalities chaotically. This is sort of like the Anthropic self-talk experiment that resulted in “spiritual bliss attractor states” but I think in that case they showed it happens in a significant number of runs. There was just one run per setup so this could all be random noise / the destination of a random walk of topics…
This model card is eye-opening (I think it might be designed to be). The alignment and model welfare sections are extensive, which is heartening. At least on the surface Anthropic seems to be living up to its promises RE safety. That said, has anyone else read section 5.2.3 in the Alignment Risk Update https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de4321...? This is referenced in the model card in 4.1.3. Basically they ended up training the model with an RL reward model that had access to the model's reasoning in 8% of cases, by accident. The problem is that the model could learn to directly manipulate its reasoning traces to satisfy external observers. This seems like a huge deal and it may have partially poisoned Anthropic's interpretability pipeline moving forward.
For me, the attempted productization of Sora was conclusive proof that 1) OAI was overcapitalized and desperate for revenue 2) safety didn't matter to them much 3) improving the world didn't matter much either.
At one point you mentioned an interaction with OpenAI staff where you were looking to interview AI Safety researchers. You were rebuffed b/c "existential safety isn't a thing". Does this mean that you could find no evidence of an AI Safety team at OAI after Jan Leike left? If you look at job postings it does seem like they have significant safety staff...
Interestingly we are still experiencing the technological momentum inspired and created by what OpenAI used to be. AI for humanity.
Given the initiative started circa 2017, much of the goods remain. It's a hijack of creative geniuses who got together, which is now turning into cow milking tech.
OpenAI played the charity, coupled with a powerful altruistic card.
It didn't say: we believe a more effective for-profit business shall start as a non-profit in this field, because it would yield innovation which we can then skim money off down the road. That would have been transparent.
Not saying it was the intention at the start. But they flipped the game at some point. Let's play Chess, it's a better game. Oh I decided we are now playing Checkers, sorry, I won.
i guess my (maybe too nuanced) point was: the system we live in is like water; the urge to swim with the big fish is overwhelming... it was gonna happen eventually at the level they are playing at.
This is a valid point. The good news is I think there is some hope in developing the craft of orchestrating many agents into something that is satisfying and rewarding in its own right.
I'm so excited about landscape architecture now that I can tell my gardener to create an equivalent to the gardens at Versailles for $5. Sometimes he plants the wrong kind of plant or makes a dead end path, but he fixes his work very quickly.
It looks like one of my employees got her whole account deleted or banned without warning during this outage. Hopefully this is resolved as service returns.
I have a deep background in music and I think that while the creation was super basic, the way the output was so unconstrained (written by a model fine-tuned for coding) is really interesting. Listen to that last one and tell me it couldn't belong on some TV show. I've always had issues with any AI-generated music because of the constraints and the way the output is so derivative. This was different to me.