Hacker News | grayne's comments

Thanks for the question.

Audio-to-motion is fairly robust to noisy TTS and to different languages and accents. It doesn't use the raw audio as input; we first embed the audio with a pretrained wav2vec-style embedder trained on millions of audio samples.

That said, we haven't properly evaluated it across multiple languages, and we've heard from customers that lip-sync isn't always as good in non-English speech. For Cara 4 we're training on more diverse data, which will hopefully close this gap.
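To make the "embed first, then predict motion" idea concrete, here's a minimal sketch. The commenter's actual embedder is proprietary; the model name, frame-rate arithmetic, and helper function below are illustrative assumptions based on the standard wav2vec 2.0 setup.

```python
# Sketch: embedding audio with a pretrained wav2vec-2.0-style model before any
# downstream motion prediction. The frame-stride figure is the standard
# wav2vec 2.0 convolutional downsampling (~20 ms hop); exact frame counts vary
# slightly with the conv kernel sizes.
import numpy as np

SAMPLE_RATE = 16_000   # wav2vec 2.0 models expect 16 kHz mono audio
FRAME_STRIDE = 320     # the conv feature extractor downsamples by ~320x

def embedding_frames(num_samples: int) -> int:
    """Approximate number of embedding frames for a given audio length."""
    return num_samples // FRAME_STRIDE

# One second of (synthetic) audio yields roughly 50 embedding frames,
# i.e. the motion model sees ~50 feature vectors per second, not raw samples.
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)
print(embedding_frames(len(audio)))  # -> 50

# With `pip install transformers torch`, the real embedding step would look like:
#   from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
#   fe = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
#   model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
#   inputs = fe(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
#   hidden = model(**inputs).last_hidden_state  # shape ~ (1, 49, 768)
```

Because the embedder is trained on millions of diverse audio samples, the downstream motion model inherits some robustness to accents and TTS artifacts it never saw directly.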



Very clever and quite frightening. Well done.

Personally, I like using LLMs for getting information (not chat) and for solving problems. I like that the output is text: I can read it faster than I could follow a spoken conversation, and I don't need to watch for facial cues while taking in the information (am I autistic?). But I might be in the minority...

Some people might really find this useful.


Hi Ben, nice site. I'm no NLP expert but it might be worth taking a look at huggingface; they offer a simple API for all sorts of models. It looks like you'll be interested in their Zero-Shot Classification models in particular, e.g., https://huggingface.co/facebook/bart-large-mnli. Try dropping some text in the box and see how it goes. Good luck!


Thank you, I will do that :)

