Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I have an RTX 3060 with 12GB VRAM. For simpler questions like "how do I change the modified date of a file in Linux", I use Qwen 14B Q4_K_M. It fits entirely in VRAM. If 14B doesn't answer correctly, I switch to Qwen 32B Q3_K_S, which will be slower because it needs to use the RAM. I haven't tried yet the 30B-A3B which I hear is faster and closer to 32B. BTW, I run these models with llama.cpp.

For image generation, Flux and Qwen Image work with ComfyUI. I also use Nunchaku, which improves speed considerably.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: