I haven’t tried any Qwen yet, but so far I’m sticking with gpt-oss-20B.
In terms of what I’m using, I’ve looked at anything that will fit on a MacBook Pro with 32GB RAM (so with shared memory) - LFM2, Llama, Mistral, Ministral, Devstral, Phi, and Nemotron.
As for quantisation, I aim for the biggest that will fit while also not being too slow - so it all depends on the model. But I’ll skip a model if I can’t at least use a Q4_K_M.
I also bump my context to at least 32K, because tool calling suffers when the tool definitions themselves come close to 4096 tokens!
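For anyone doing similar back-of-envelope sizing, here’s a rough sketch. The bits-per-weight values are approximations for llama.cpp-style K-quants (real GGUF files vary a bit depending on which tensors get which quant), and the example numbers are just illustrative:

```python
# Rough sizing sketch: will a quantised model fit in unified memory?
# Bits-per-weight figures below are approximations for llama.cpp-style
# quants, not exact values for any specific GGUF file.

QUANT_BITS = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}

def model_gb(params_billions: float, quant: str) -> float:
    """Approximate in-memory size of the weights in GB."""
    return params_billions * QUANT_BITS[quant] / 8

# Hypothetical example: a 20B-parameter model at Q4_K_M
# comes out around 12 GB of weights, leaving headroom on a
# 32GB machine for the KV cache at long context plus the OS.
for q in QUANT_BITS:
    print(f"20B @ {q}: ~{model_gb(20, q):.1f} GB")
```

This is weights only; the KV cache grows linearly with context length, which is why a 32K context eats noticeably more memory than a 4K one.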
I can’t wait for RAM prices to come down!