Awesome work! Here's a recent paper released yesterday, also focused on efficien...

abcdabcd987 · on Nov 9, 2023

Thank you! We are also very excited about combining fast fine-tuning with efficient serving. In fact, what you just said is closely related to one of our earliest motivations. In my previous blog post [1], I called this scheme "Just-in-time Fine-tuning". Our earlier measurements showed that, for a medium-sized webpage (~10K tokens), it takes around 30 seconds to 2 minutes to fine-tune a LoRA model. Another upside of this JIT fine-tuning scheme is that it can turn any model into a long-context model.

We'll keep doing more research on fine-tuning, and hopefully we'll have results to share soon.

[1] https://le.qun.ch/en/blog/2023/09/11/multi-lora-potentials/

3abiton · on Nov 9, 2023

These are all very interesting ideas, like Captain Planet becoming a super LLM.