
jychang on April 21, 2025 | on: Gemma 3 QAT Models: Bringing AI to Consumer GPUs

MLX is slower than GGUF on Macs. On my M1 Max MacBook Pro, the GGUF version bartowski/google_gemma-3-27b-it-qat-GGUF is 15.6 GB and runs at 17 tok/sec, whereas mlx-community/gemma-3-27b-it-qat-4bit is 16.8 GB and runs at 15 tok/sec. Note that both of these are the new QAT 4-bit quants.
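To put the numbers quoted above in relative terms, here is a minimal Python sketch. It only does arithmetic on the figures from the comment (one machine, one model) and does not run either model; the dictionary layout and the `relative_diff` helper are illustrative names, not part of any library.

```python
# Figures as quoted in the comment (M1 Max MacBook Pro, Gemma 3 27B QAT 4-bit).
gguf = {"size_gb": 15.6, "tok_per_sec": 17.0}
mlx = {"size_gb": 16.8, "tok_per_sec": 15.0}


def relative_diff(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


# Throughput of the GGUF build relative to the MLX build.
speedup_pct = relative_diff(gguf["tok_per_sec"], mlx["tok_per_sec"])
# File size of the MLX build relative to the GGUF build.
size_pct = relative_diff(mlx["size_gb"], gguf["size_gb"])

print(f"GGUF throughput vs MLX: {speedup_pct:+.1f}%")
print(f"MLX file size vs GGUF: {size_pct:+.1f}%")
```

On these figures the GGUF build comes out roughly 13% faster while the MLX build is about 8% larger. That is a single data point, so it may not generalize to other models or hardware.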

phaedrix on April 21, 2025

No, in general MLX versions are always faster; I've tested most of them.

85392_school on April 21, 2025

What TPS difference are you getting?
