MLX is slower than GGUFs on Macs.

On my M1 Max MacBook Pro, the GGUF version bartowski/google_gemma-3-27b-it-qat-GGUF is 15.6 GB and runs at 17 tok/sec, whereas mlx-community/gemma-3-27b-it-qat-4bit is 16.8 GB and runs at 15 tok/sec. Note that both of these are the new QAT 4-bit quants.
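To put a number on that gap, here is a quick back-of-the-envelope check of the figures above. It assumes both numbers are comparable generation speeds from similar prompts. The dictionary names are just for illustration, and nothing here is measured independently.

```python
# Figures as reported above for Gemma 3 27B QAT 4-bit on an M1 Max
# (assumed to be comparable generation speeds; not re-measured here)
gguf = {"size_gb": 15.6, "tok_per_sec": 17.0}
mlx = {"size_gb": 16.8, "tok_per_sec": 15.0}


def pct_faster(a: float, b: float) -> float:
    """Return how much faster rate a is than rate b, as a percentage."""
    return (a / b - 1.0) * 100.0


speedup = pct_faster(gguf["tok_per_sec"], mlx["tok_per_sec"])
size_diff = mlx["size_gb"] - gguf["size_gb"]
print(f"GGUF is {speedup:.1f}% faster; MLX file is {size_diff:.1f} GB larger")
# prints "GGUF is 13.3% faster; MLX file is 1.2 GB larger"
```

So on these numbers the GGUF build is roughly 13% faster while also being about 1.2 GB smaller.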

No, in general the MLX versions are always faster; I've tested most of them.


What TPS difference are you getting?
