I'm gonna drop a big [Citation Needed] next to your claim that "it will require far more". If you're a consumer, you are not going to be running a 70GB model on your computer for regular purposes. It is too large, loading from disk takes too long, and inferencing would be impossible without a massive GPU to accelerate it. Multimodality be damned, that is just too large for any end-user, let alone Joe Shmoe on his iPhone.
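For context, here's a rough back-of-envelope sketch of what a 70 GB model asks of a consumer machine. The NVMe throughput, RAM size, and memory-bandwidth figures are my own assumed ballpark numbers, not anything anyone in this thread measured:

```python
# Back-of-envelope sketch for a ~70 GB model on consumer hardware.
# Assumed figures (illustrative, not benchmarks): ~3 GB/s sustained NVMe reads,
# ~100 GB/s of CPU memory bandwidth, 32 GB of RAM on a typical desktop.

model_gb = 70.0

nvme_gbps = 3.0
print(f"cold load from disk: ~{model_gb / nvme_gbps:.0f} s")   # ~23 s on every launch

ram_gb = 32
print(f"fits in {ram_gb} GB of RAM: {model_gb <= ram_gb}")     # False -- it would have to be swapped or sharded

# Token generation is roughly memory-bandwidth bound: each generated token touches
# most of the weights once, so tokens/s is capped near bandwidth / model size.
cpu_bw_gbps = 100.0
print(f"best-case CPU decode: ~{cpu_bw_gbps / model_gb:.1f} tok/s")  # ~1.4 tok/s, and only if it fit in RAM at all
```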
If resource consumption increases in AI, that will just further entrench Nvidia's control and the cloud's grip on AI compute. I have no idea how you can twist this into an epic win for the M2 Ultra owners who are stuck inferencing with... *checks clipboard* ...Metal compute shaders.
When a dark raincloud is rolling in, we do not need a citation to say we will be very wet soon. If a person never developed the fundamental ability to anticipate future events based on present observations, I cannot help them with that.
That's an intentionally vague answer, to the point that I need you to qualify it if you want me to take you seriously. Right now I'm watching LLMs retain 90% of their problem-solving ability at just 10% of the model size.
What are you seeing that makes you assume otherwise?
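To put a rough number on where that 10% figure can come from, here's an illustrative size calculation for a smaller distilled model combined with 4-bit quantization. The parameter counts are hypothetical examples I picked for the arithmetic, not measurements:

```python
# Illustrative sketch of how "a fraction of the size" falls out of
# distillation plus low-bit quantization. Numbers are assumed, not benchmarks.

full_params = 70e9
full_fp16_gb = full_params * 2 / 1e9                   # fp16 = 2 bytes/weight -> ~140 GB

distilled_params = 13e9                                # hypothetical mid-size student model
quant_bits = 4
small_gb = distilled_params * (quant_bits / 8) / 1e9   # 4-bit = 0.5 bytes/weight -> ~6.5 GB

print(f"full fp16 model:   ~{full_fp16_gb:.0f} GB")
print(f"distilled + 4-bit: ~{small_gb:.1f} GB")
print(f"relative size:     ~{small_gb / full_fp16_gb:.0%}")  # ~5%, in the ballpark of the 10% claim
```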
This kind of justification is exactly the same one blockchain fanatics have been using. Needless to say, that doesn't make your argument look very strong.
The only trait blockchain and crypto culture has in common with transformer models is the use of massive compute. Do not lump me in with cryptobros; I want absolutely nothing to do with that hive of thickheaded villainy.