
FWIW I'm getting 29 TPS on Ollama on my 7900 XTX with the 27b qat. You can't really compare one inference engine to another without keeping the hardware and model fixed.
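For anyone wanting to reproduce a TPS number like this, here is a rough sketch of the arithmetic. It assumes the final JSON object from a non-streaming Ollama /api/generate call, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper name and the sample values are made up for illustration, not taken from my run.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama-style response dict.

    Assumes the dict carries `eval_count` (tokens generated) and
    `eval_duration` (time spent generating, in nanoseconds).
    """
    tokens = response["eval_count"]
    nanoseconds = response["eval_duration"]
    if nanoseconds <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return tokens / (nanoseconds / 1e9)


# Hypothetical sample: 290 tokens generated in 10 seconds.
sample = {"eval_count": 290, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

This only counts the generation phase, not prompt processing, so it is only comparable to another engine's number if that number is measured the same way on the same hardware and model.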

Unfortunately Ollama and vLLM are therefore incomparable at the moment, because vLLM does not support these models yet.

https://github.com/vllm-project/vllm/issues/16856


