1. This looks great and I love how it supports llama.cpp, what about GPTQ, exLla...

Evan-Almloff · on July 12, 2023

1. I would love to support additional model runners, including ExLlama and API-based models like ChatGPT. I'm less familiar with how ctransformers and GPTQ compare to llama.cpp. GPTQ used to run faster because it supported GPU acceleration, but llama.cpp now supports the GPU as well, so that may have changed. Feel free to open a GitHub issue to discuss this: https://github.com/floneum/floneum/issues/new/choose

2. There are a few differences: a) Floneum doesn't require any setup. There is no need to install Python, CUDA, or pip; just download the executable and run it. b) It has first-class support for quantized local models. c) It supports fully isolated WASM plugins (not arbitrary Python code).

3. Floneum is fully Open Source!

underlines · on July 12, 2023

Thanks for your clarifications. I added it to my awesome list:

https://github.com/underlines/awesome-marketing-datascience/...

Tostino · on July 13, 2023

ExLlama is significantly faster if you can fit the whole model in VRAM.