The smallest transfer done from memory is a single cache line, which on most desktop machines is 64 bytes, or 512 bits. You could imagine a memory bus that was 512 bits wide and transferred a cache line per clock, and this would improve latency when compared to a serial bus with higher clock speed. HBM doesn't do that, though, instead every HBM3 module has 16 individual 64-bit channels, with 8n prefetch (that is, when you send a single request to a single channel, it will respond with 512 bits over 8 cycles).