Hacker News | grayne's comments

Thanks for the question.

Audio-to-motion is fairly robust to noisy TTS and to different languages and accents. It doesn't use the raw audio as input; we first embed the audio with a pretrained wav2vec-style embedder trained on millions of audio samples.

That said, we haven't properly evaluated it across multiple languages, and we've heard from customers that lip-sync isn't always as good in non-English speech. For Cara 4 we're training on more diverse data, which will hopefully close this gap.
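To make the "embed first, then predict motion" idea concrete, here's a minimal sketch. The commenter's actual embedder is proprietary; the model name, frame-rate arithmetic, and helper function below are illustrative assumptions based on the standard wav2vec 2.0 setup.

```python
# Sketch: embedding audio with a pretrained wav2vec-2.0-style model before any
# downstream motion prediction. The frame-stride figure is the standard
# wav2vec 2.0 convolutional downsampling (~20 ms hop); exact frame counts vary
# slightly with the conv kernel sizes.
import numpy as np

SAMPLE_RATE = 16_000   # wav2vec 2.0 models expect 16 kHz mono audio
FRAME_STRIDE = 320     # the conv feature extractor downsamples by ~320x

def embedding_frames(num_samples: int) -> int:
    """Approximate number of embedding frames for a given audio length."""
    return num_samples // FRAME_STRIDE

# One second of (synthetic) audio yields roughly 50 embedding frames,
# i.e. the motion model sees ~50 feature vectors per second, not raw samples.
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)
print(embedding_frames(len(audio)))  # -> 50

# With `pip install transformers torch`, the real embedding step would look like:
#   from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
#   fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
#   model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
#   inputs = fe(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
#   hidden = model(**inputs).last_hidden_state  # shape ~ (1, 49, 768)
```

Because the embedder is trained on millions of diverse audio samples, the downstream motion model inherits some robustness to accents and TTS artifacts it never saw directly.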



Very clever and quite frightening. Well done.

Personally, I like using LLMs for getting information (not chat) and for solving problems. I like that the output is text: I can read it faster than I could follow a spoken conversation, and I don't need to watch for facial cues while taking in the information (am I autistic?). But I might be in the minority...

Some people might really find this useful.


Hi Ben, nice site. I'm no NLP expert but it might be worth taking a look at huggingface; they offer a simple API for all sorts of models. It looks like you'll be interested in their Zero-Shot Classification models in particular, e.g., https://huggingface.co/facebook/bart-large-mnli. Try dropping some text in the box and see how it goes. Good luck!


Thank you, I will do that :)

