
Awesome work! Here's a paper released just yesterday that also focuses on efficiently serving many LoRAs simultaneously: https://arxiv.org/abs/2311.03285

Really looking forward to these innovations becoming more widespread -- I expect we're very close to a world where training a LoRA on a one-off task like "review every HN post from the last 3 years and flag any of them that contain informed speculation about the architecture of GPT-4" will be easy, cheap and routine.



Thank you! We are also very excited about combining fast fine-tuning with efficient serving. In fact, what you just said is closely related to one of our earliest motivations. In my previous blog post [1], I called this scheme "Just-in-time Fine-tuning". Our earlier measurements showed that, for a medium-sized webpage (~10K tokens), it takes around 30 seconds to 2 minutes to fine-tune a LoRA model. Another upside of this JIT fine-tuning scheme is that it lets us turn any model into a long-context model.

We'll keep researching fine-tuning, and hopefully we'll have results to share soon.

[1] https://le.qun.ch/en/blog/2023/09/11/multi-lora-potentials/


These are all very interesting ideas, like Captain Planet becoming a super LLM.



