Hacker News

1.6k is the number of languages for which we were able to find more or less reliable evaluation data (mostly thanks to Bible translators and everyone who contributed to BOUQuET). Of the remaining several thousand languages, we expect the OMT models to support understanding (but not generation) for a significant proportion, due to cross-lingual generalisation between similar languages. So it’s not truly “omni” in the sense of supporting every single language on Earth, but it’s our best effort to get there, and these are probably the most “omni” models existing today.


Is there interest in benchmarking the proprietary LLMs for translation? Curious as I often use Gemini 3 Flash, but I have no idea how good it is for my language family. I prefer open models (in fact the smaller the better for offline), but it'd be useful to know how well the Big Three do.


We did some benchmarking of them internally, but we're not sure whether we'll publish the detailed results. Just in case, keep an eye on https://huggingface.co/spaces/facebook/bouquet: if we release the evaluation results, they will appear there.
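For anyone wanting to run this kind of comparison themselves: translation benchmarks like the ones discussed here are typically scored with a character n-gram metric such as chrF. Here is a simplified pure-Python sketch of the idea (not the official sacreBLEU implementation, which you should use for publishable numbers):

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams, whitespace removed (as chrF does)."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF: mean char n-gram F-beta over n=1..max_n."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

Running a model's output and a reference through this gives a rough 0–100 quality signal per language; identical strings score 100, disjoint strings 0.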


Thanks! Super interested in LLMs for translation :D glad to see you folks doing this work.


So, hyperchilio-lingual would be more accurate, and even myriad-lingual would still fall short of all documented human languages. But I guess the marketing team is not that fond of precision in philological matters.



