1. I would love to support additional model runners, including ExLlama and API-based models like ChatGPT. I'm less familiar with how ctransformers and GPTQ compare to llama.cpp. GPTQ used to run faster because it supported GPU acceleration, but llama.cpp now supports the GPU as well, so that may have changed. Feel free to open a GitHub issue to discuss this: https://github.com/floneum/floneum/issues/new/choose
2. There are a few differences:
a) Floneum doesn't require any setup. No need to install Python, CUDA, or pip. Just download the executable and run it
b) It has first-class support for quantized local models
c) It supports fully isolated WASM plugins (not arbitrary Python code)
2. At first glance, this looks a bit like LangFlow. I guess this is different, but how?
3. Is this freemium, or fully running stand-alone as OSS?