> Google will utilize the base scalar and vector of X280 in a way that allows push/pop vector instructions.
What are push/pop vector instructions? Is that kind of like the old-school OpenGL 1-era matrix stacks? Or literally just traditional stack-style access between memory and a vector register file? Or something else entirely?
More of a press release than anything else. Could one actually train something interesting on a processor like this? I assume you'd need a pretty custom toolchain. Hard to imagine where these processors would have a competitive advantage (but would love to hear if anyone has encountered something like this in the wild)
Easier said than done. Even with Google-level resources, TPU support for PyTorch is patchy (https://arxiv.org/abs/2309.07181). The device abstraction is not great and assumes CUDA in unexpected places.
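To make the "assumes CUDA" point concrete, here's a minimal sketch (my own toy example, not from the paper) of the pattern that breaks on non-NVIDIA backends, next to the device-agnostic version:

```python
import torch

# The pattern that breaks on TPU/XLA and other backends (CUDA is assumed):
#   x = torch.randn(4, 4).cuda()    # hard failure on a machine without CUDA
#   torch.cuda.synchronize()        # CUDA-only call sprinkled into training loops

# Device-agnostic equivalent that a non-CUDA backend can actually run:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 4, device=device)
print(x.device)
```

Plenty of research code in the wild looks like the commented-out version, which is part of why every non-NVIDIA vendor's PyTorch story starts with patching other people's scripts.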
The Groq AI chip startup has solved this problem. They don't use hand-written kernels at all; instead they use a compiler, and they have the top speed in the world on LLaMA2-70B: 240 tokens/s.
Other interesting Groq tidbits: their models are deterministic, and the whole system, up to thousands of chips, runs in sync on the same clock. Memory access and the network are directly controlled without any caches or intermediaries, so they also run deterministically.
That speeds up communication and allows automatic synchronisation across thousands of chips running as one single large chip. The compiler does all the orchestration and optimisation. They can predict the exact performance of an architecture at compile time.
What makes Groq different is that they started from the compiler, and only later designed the hardware.
What is the pass rate on torchbench? This gives a more realistic measure of how good a vendor's pytorch support is.
All the big chip startups have their own pytorch compiler that works on the examples they write themselves. From what I've seen of Groq it doesn't appear to be any different.
The problem is that PyTorch is incredibly permissive in what it lets users do. torch.compile is itself very new and far from optimal.
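As a small illustration of that permissiveness (a toy example of mine, not from the thread): data-dependent Python control flow is perfectly legal PyTorch, but it forces torch.compile to fall back to eager at the branch (a "graph break"), which is exactly what makes full compiler coverage hard for any backend.

```python
import torch

def f(x):
    # The branch depends on tensor *values*, not shapes, so no single
    # static graph can represent it; torch.compile has to break here.
    if x.sum() > 0:
        return x * 2
    return x - 1

compiled = torch.compile(f)
print(compiled(torch.randn(8)))  # runs, but via graph breaks, not one fused graph
```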
PyTorch XLA is such a pain to use. And once you go TPU, it takes just as much effort to switch back, so you can't quickly test how it performs on your problem.
One of the big reasons custom hardware solutions struggle.
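For anyone who hasn't tried it, a rough sketch of what a torch_xla training step looks like (API names as in the torch_xla docs; details vary by version). Every XLA-specific line is a place where ordinary CUDA code has to change, in both directions:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                      # instead of torch.device("cuda")
model = torch.nn.Linear(10, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(10):
    x = torch.randn(32, 10, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    xm.optimizer_step(opt)                    # XLA-specific gradient handling
    xm.mark_step()                            # cut and execute the lazy graph
```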
IMO, you'd have better luck as a hardware vendor implementing an LLM toolchain and bypassing a general-purpose DL framework. At the very least you should be able to post impressive results with this approach, rather than with a half-baked PyTorch port.
I feel like that would make it harder for a vendor to keep up with the industry.
Say you took all the effort in the world to build your custom LLM toolchain to train a Llama on custom hardware. Then suddenly someone comes up with LoRA. Before you've even finished porting that to your toolkit, someone comes up with GPTQ.
You can't keep up with a custom toolchain, imo.
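For contrast, here's roughly why a general framework absorbs something like LoRA so quickly: in PyTorch it's a few lines wrapped around an existing layer (a minimal sketch of the idea, not any particular library's implementation), while a custom toolchain may need new kernels, new graph support, and a fresh porting cycle for the same thing.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pretrained linear layer and train only a low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # pretrained weight stays fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + scale * B (A x): only A and B receive gradients
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```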
It's like a forked Linux kernel. Eventually you're going to have to upstream if you're serious about it, which is what AMD is actively doing with PyTorch for ROCm (masquerading it as CUDA for compatibility).
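The masquerading is literal, by the way. On a ROCm build of PyTorch the CUDA-flavoured APIs just work, with HIP underneath (assuming a ROCm wheel and a supported AMD GPU):

```python
import torch

print(torch.cuda.is_available())   # True on a supported AMD GPU
print(torch.version.hip)           # set on ROCm builds (a HIP version string)
print(torch.version.cuda)          # None on ROCm builds
x = torch.ones(3, device="cuda")   # the "cuda" device string maps to the AMD GPU
```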
I disagree. llama.cpp[0] is a good counterpoint to this, since it uses a custom ML framework created from scratch. Despite not having the developer team of a large company, it still keeps up with many of the advancements in LLMs.
llama.cpp doesn't need to create lots of demand for the chip it was originally written for (Apple M1), whereas new hardware vendors need to demonstrate they can plug in to existing tools to generate enough demand to ship in volume.
> lots of demand for the chip it was originally written for (Apple M1)
To be fair, the M1/M2 chip can't be purchased or used separately from the Mac, unlike GPUs or socketed CPUs, and demand for Macs is already fairly high.
That might be good enough to get a hardware startup acquired, but not good enough to get major sales. Users want pytorch and negligible switching cost between chips.
Bigger problem for startups trying to muscle in on LLMs is that there isn't much room for improvement on existing solutions to do something radically different.
>Bigger problem for startups trying to muscle in on LLMs is that there isn't much room for improvement on existing solutions to do something radically different.
Aye - unless you are able to notch a 10x cost/performance improvement, the migration overhead will just make it not worth switching.
Even after prioritising TensorFlow, Keras, JAX, etc., they can still afford to have a very large team working on torch_xla and still hedge their bets with a separate team on torch_mlir.
Yes, there's a lot of hype and buzzwords. Certainly companies are slapping the "AI" label onto their products and jumping on the bandwagon in order to gin up interest.
But there's a real technological change going on. Machines are capable of performing tasks that were nearly unthinkable two years ago. There's real progress beneath the hype.
AI workloads are going to be dramatically more significant in the next decade. A shift like this opens up real opportunities for smaller players to capture enormous market share. If SiFive manages to develop the best AI hardware, they could become bigger than Apple. Or they could fumble and go out of business. Either of these outcomes is genuinely possible when this kind of technological shift happens.
> But there's a real technological change going on. Machines are capable of performing tasks that were nearly unthinkable two years ago. There's real progress beneath the hype.
Like what? What unthinkable tasks are so useful thanks to AI?
> Like what? What unthinkable tasks are so useful thanks to AI?
Erasing strangers or unwanted items in the background of pictures taken by my phone, locally from my phone's photo app, was unthinkable a few years ago, since you needed Photoshop on a PC for that. I assume the TPU is doing the heavy lifting on image segmentation, since the UX is pretty much touch-to-erase.
It does a bunch of useful things already. Arguably most users of Copilot see it as a positive contribution, or they would not pay the price. Generative image models are pretty good at fixing photos and creating vector art. Chatbots, even though they hallucinate, are consistently polite, helpful and efficient. They are good at sparking new ideas, playing fantasy games, debugging and rubberducking. Maybe they are also good therapists, when prompted right. They are universally used for summarisation, text-based question answering, and extracting data into JSON.
I asked SD to generate a photo of Putin eating Turkish food, and 30s later I had a photo of Putin eating something that looks a lot like Turkish food. If I'd kept trying or thrown more processing time at it, I'd have gotten something nicer. Probably.
Two years ago this immensely useful feat would have been impossible to accomplish in 30 seconds.
On a more serious note, ChatGPT has made me a much more productive person. A recent example where it helped me a lot in getting an understanding of a new topic: https://news.ycombinator.com/item?id=37622335 - Even when it's making a fair amount of mistakes, it can be like a personal tutor who points you in the right direction when you're stumbling around in an unfamiliar field. Sort of like a much smarter search engine.
Granted, useful to me doesn't mean useful to you. But there's also people to whom search engines aren't useful, because they don't know how or when to use them.
This kind of stuff makes me wonder: are investors really this dumb? Surely they must be able to see something strange is going on when every company ever is suddenly shoving AI into everything, because this has happened many times before, just with different things. Maybe they try to keep up the hype because they benefit from it?
I would say that a lot of investors are not the brightest. They're just as susceptible to emotional manipulation and social engineering tactics as most people.
Theranos is a great example. A good number of rich people jumped into the tech, while those in the industry knew a drop of blood doesn't contain a sufficient sample for testing multiple markers.
Dr. Varun Sivaram's book "Taming the Sun" even discusses Elon Musk's solar-panel grifting tactics for seeking out and exploiting investors.
How is this any different from investors going for world-changing technology such as blockchains and NFTs?
Yup, there's definitely a window of opportunity here before the market matures, while the hype is still on. I honestly blame myself for being too lazy to capitalize on my basic AI skillz and get paid $$$ for basic advice/guidance to organizations and municipalities here. I know some people who are cashing in right now.
That's what the "RISC-V Application Profiles" are about: to standardise a set of extensions for application-class processors.
The SiFive P870 supports "RVA23"[1], which provides more or less feature parity with ARM. It also supports vector cryptography, which is optional in RVA23. (RVA23 is the successor to last year's RVA22, making the Vector extension and a couple of smaller extensions mandatory.)
RVA23 + vector crypto also lines up with what Google has announced as the likely baseline for RISC-V-based Android handsets in the future.[2]
Why? Extensions that are not implemented in silicon: who cares.
For extensions that are implemented, it's better if there's a standard.
As for the number of them: just look at what's been bolted onto x86 over the years, and at their counterparts in other ISAs like Arm.
Consolidation will happen in the market over time. Some combinations of extensions will be common, some rare. And due to the modularity, that's not a big issue for software.
Isn't that one of the problems of x86? Extensions that get used just barely enough that you have to include them, but not enough that they actually warrant the transistors.
Unless you want to make application-specific processors, that's just a reality. We're going to include things like crypto even if the computer never uses any crypto.
The thing is, for some things you can get a 100x improvement. Even if it's not used often, it's going to make a difference.
But for the most part, the things that are in the RISC-V RVA23 profile make sense on a standard PC or server.
Asked differently: what is in RISC-V that isn't really necessary? One could make an argument for a simpler vector extension, for example, and there are a few other tiny things you can argue about.
Overall, RISC-V RVA23 is a reasonable and feature-complete set for a modern processor.
x86, on the other hand, has quite a few things that would not really be needed if you redesigned it.
Only if you're hoping to have people use them as application cores, running the same binaries off the internet or a CD for various different users. But there are lots and lots of roles that aren't like that, from managing hard drives to an SoC's power consumption to Google's TPU AI engines, where you're compiling to target a specific hardware configuration. In that case, having lots of standard extensions to choose from, each with its own compiler support, is really valuable: you only pay for the hardware features you need for your own application.
Is there a requirement, when a RISC-V extension is proposed, to also provide a software implementation? Then somebody could just run the code and link the software version instead.
At least for stuff like vector extensions and tensor units, which could easily be emulated in software, this would seem to be a desirable trait. And I mean, it's hard to believe that most of this stuff isn't proof-of-concept tested using software implementations first.
In particular, it could be nice for people who want to develop on RISC-V, when really good desktop chips come out, to be able to actually compile their whole project and run it before sending it off to the exotic hardware.
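Whether or not the ratification process requires it, the mechanism would be simple enough: ship a reference software implementation next to the hardware path and pick one at runtime, the way x86 libraries do with CPUID. A hypothetical sketch (all names made up for illustration):

```python
def vector_add_hw(a, b):
    raise NotImplementedError("stand-in for a kernel using the V extension")

def vector_add_sw(a, b):
    # Plain scalar loop: the software "reference implementation".
    return [x + y for x, y in zip(a, b)]

def has_extension(name: str) -> bool:
    # On RISC-V Linux this could query /proc/cpuinfo or hwprobe; stubbed here.
    return False

vector_add = vector_add_hw if has_extension("v") else vector_add_sw
print(vector_add([1, 2, 3], [4, 5, 6]))   # software fallback: [5, 7, 9]
```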
Sort of. They also want to add a bunch of custom extensions to make up for the lack of the standard C extension. Qualcomm is probably not going to get what they want.
What appears to be happening is that they bought NUVIA, and with it a bunch of high-performance arm64 cores. ARM sued, claiming that cores designed under NUVIA's Arm Architecture License can't be transferred to Qualcomm's Arm Architecture License. It seems they are trying to slap a RISC-V decoder on with a minimum amount of work. That means no C extension, because the arm64 frontend was designed around aligned 32-bit instructions, and tons of custom instructions that match how AArch64 gets passable density, like equivalents to ldp/stp.
So what you're seeing here is a success of RISC-V's model of keeping consistency without an incredibly hierarchical body mandating it. Qualcomm is more than welcome to create that core and its numerous extensions. The community is also welcome not to support them by default, and so Qualcomm will probably actually go and complete their design and collaborate.
>They also want to add a bunch of custom extensions to make up for the lack of the standard C extension.
Was this mentioned anywhere or just a guess?
>It seems they are trying to slap a RISC-V decoder on with a minimum amount of work.
This is interesting, and if true I would just add that they would likely sell both cores – one RISC-V and one ARM – at least in the short term. Proof is that they have already taped out and announced their NUVIA-based Snapdragon X desktop/laptop SoC. This would further incentivise compatibility between ARM and RISC-V, as this wouldn't be a one-time port like Apple did with PA Semi. I would also add that this is all speculation.
Yeah, they're burying it a bit, but it's on their third slide.
> RV64G + 32-bit instructions for code size is best in class
> • More ld/st addressing modes
> • Ld/st pair
> • Conditional immediate branches
> • Move pair
Those are all custom extensions in Qualcomm's design, and suspiciously they are all ways AArch64 gets passable code density.
> This is interesting, and if true I would just add that they would likely sell both cores – one RISC-V and one ARM – at least in the short term. Proof is that they have already taped out and announced their NUVIA-based Snapdragon X desktop/laptop SoC.
Them announcing that is what made ARM sue them. That core, as an AArch64 design, is going to be stuck in the courts for longer than it'll be relevant. This work by Qualcomm seems to be a combination of trying to get anything out of the NUVIA acquisition and maybe having a stick when negotiating a settlement with ARM ("we'll make your existential threat more real").
>Them announcing that is what made ARM sue them. That core, as an AArch64 design, is going to be stuck in the courts for longer than it'll be relevant.
I'm referring to this announcement made October 10.[0]
>2024 will be an inflection point for the PC industry, and Snapdragon X compute platforms will deliver next-level performance, AI, connectivity and battery life.
So it seems that they will just push this out and deal with the consequences later.
> It seems they are trying to slap a RISC-V decoder on with a minimum amount of work.
This doesn’t make sense to me. I thought that one of the selling points of the C extension was that it was a minimal amount of work to add to the decoder.
It is a minimal amount of work to add to a decoder... if you start with it in your design from day one. The actual decoding is trivial, but if your frontend is designed around aligned instructions, there's potentially some nontrivial work on high-end cores around correctly handling goofy states, like when unaligned instructions straddle cache lines or pages. Definitely doable, but it feels like Qualcomm is trying as much as possible to change out only the 'pure function which turns instruction bit patterns into micro-op bit patterns' part of the decoder.
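The length decoding itself really is trivial: the low two bits of the first 16-bit parcel tell you whether an instruction is 16 or 32 bits. A toy sketch of the rule (ignoring the reserved encodings for longer instructions), and of why it complicates fetch:

```python
def instruction_lengths(parcels):
    """Walk a stream of 16-bit parcels, yielding instruction sizes in bytes."""
    i = 0
    while i < len(parcels):
        if parcels[i] & 0b11 != 0b11:
            yield 2        # compressed (C extension) instruction
            i += 1
        else:
            yield 4        # standard 32-bit instruction; with C enabled it can
            i += 2         # start at any 16-bit boundary and straddle a cache
                           # line or page, which is the nontrivial frontend work

print(list(instruction_lengths([0x0001, 0x8082, 0x0513, 0x0000])))  # [2, 2, 4]
```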
What do my HN colleagues think about the assertion by some in the US congress that RISC-V is a clever conspiracy by China to dodge sanctions and get clever American universities to do their research for free?
While I think "information wants to be free" and that stopping CPU collab is like trying to stop encryption in the 90s, I can't help but wonder that RISC-V neuters attempts to lock China out of military AI advancements.
Then again, don't they have x86 factories there? If China has a factory, doesn't the government pretty brazenly learn trade secrets from it?
---
edit to Nickik, since HN has throttled me for more than an hour. Gotta get around the system somehow, since the goddamned rules are so opaque. I'll move this to a comment when I am in the site's good graces again.
---
Highlight where I suggested any of that.
I even said that actually attempting to curtail open development, even if desired, is futile.
I was only wondering if there is validity in suspecting that China will explicitly finance/encourage RISC-V development in order to dodge EU/US sanctions on processor technology.
Why don't we just no longer have open source? We can also shut down Wikipedia. Let's also shut down Google. Otherwise China might use these things. Why not just nuke the Bay Area, NY and LA? Then we can make sure China can't profit from the people there. All universities should be closed; those people might produce research that China can read.
RISC-V does not have its origin in China. It's not a "clever conspiracy" at all.
Are Chinese companies jumping on board this existing effort to avoid sanctions? Almost certainly.
Other than that, I don't care much what US congressmen think, since I'm not American, but attempting to close the door on RISC-V will not only be harmful to the broad industry (not just China), it will also be ineffective.
This all comes down to the US wanting to have both freedom of speech, but also prevent US citizens from talking to Chinese citizens. Because what if the American invents a better mousetrap and tells the Chinese person? We can't let the communists win the rodentiacide race!
Well, considering the communist party doesn't value human life or dignity, keeping them crippled technologically is a noble goal. However, it can't be done at the expense of free expression and collaboration.
So to be clear, I don't support cramping RISC-V development.
https://www.semianalysis.com/p/sifive-powers-google-tpu-nasa...