
Just since I'm curious, what exact models and quantization are you using? In my own experience, anything smaller than ~32B is basically useless, and any quantization below Q8 absolutely trashes the models.

Sure, for a single use-case you could get away with a ~20B model if you fine-tune it for a very narrow task, but at that point there are usually better solutions than LLMs in the first place. For anything general, 32B+ at Q8 is probably the bare minimum for local models, even the "SOTA" ones available today.
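As a rough back-of-the-envelope for why 32B at Q8 is a demanding floor: weight memory is roughly parameter count times bits per weight. A sketch, assuming ~8.5 effective bits/weight for Q8 and ~4.8 for Q4_K_M (quantization formats store scales alongside weights, so the effective rate is a bit above the nominal one; these figures are illustrative, not exact):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    # Weights only; runtime overhead and the KV cache come on top of this.
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 32B model: Q8 (~8.5 bits/weight) vs Q4_K_M (~4.8 bits/weight)
print(round(model_size_gb(32, 8.5), 1))  # ~34.0 GB
print(round(model_size_gb(32, 4.8), 1))  # ~19.2 GB
```

So a 32B Q8 doesn't even fit in 32GB of shared memory before you've allocated a single token of context, which is why people on consumer hardware end up trading down on quantization instead.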



I haven’t tried any Qwen yet, but so far I’m sticking with gpt-oss-20B.

In terms of what I’m using, I’ve looked at anything that will fit on a MacBook Pro with 32GB RAM (so with shared memory) - LFM2, Llama, Mistral, Ministral, Devstral, Phi, and Nemotron.

As for quantisation, I aim for the biggest that will fit while also not being too slow - so it all depends on the model. But I’ll skip a model if I can’t at least use a Q4_K_M.

Also, I bump my context to at least 32K, because tooling sucks when the tool definitions alone come close to filling 4096 tokens!
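The context bump isn't free, though: the KV cache grows linearly with context length. A sketch of the usual estimate (2 tensors, K and V, per layer per position), using hypothetical dimensions for a 20B-class GQA model - the layer count, KV-head count, and head size here are illustrative, not taken from any specific model card:

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    # 2x for the K and V tensors; fp16 cache = 2 bytes per element.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Hypothetical model: 48 layers, 8 KV heads (GQA), head_dim 128, fp16 cache
print(round(kv_cache_gb(48, 8, 128, 32_768), 1))  # ~6.4 GB at 32K context
print(round(kv_cache_gb(48, 8, 128, 4_096), 1))   # ~0.8 GB at 4K context
```

On a shared-memory machine that extra ~5-6 GB comes straight out of the same budget as the weights, which is part of why a bigger quant and a bigger context fight each other.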

I can’t wait for RAM prices to come down!




