AI Researcher (Multimodal Audio/Video Generation)

at Tavus

San Francisco, United States (Hybrid)

PyTorch

Responsibilities

- Design models that are coupled with conversation flow — capturing and generating verbal + non-verbal signals in sync.
- Drive innovation in diffusion models, long-video generation, and audio-visual modeling.

Requirements

We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems.
AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.
With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.
The Role
We’re hiring a Senior AI Researcher to lead research in audio-visual avatar generation.
This role is for someone who thrives in ambiguity, has a track record of pushing generative models to new frontiers, and wants to define what human–AI interaction looks like in practice.
- Translate research into production by partnering with Applied ML and engineering.

Experience

You’ll Bring:
- A PhD or equivalent research experience, plus 2–3+ years of hands-on experience applying generative models at scale.
- Expertise in diffusion models and awareness of the latest efficiency techniques.
- Experience in multimodal generation spanning video, audio, and language.
- Proven innovation in long-video generation and/or audio generation.
- Excellent programming skills: fluent in PyTorch and GPU-optimized workflows.
- Track record of publications in top-tier venues (CVPR, NeurIPS, BMVC, ICASSP, etc.).
- Experience leading research activities or mentoring teams.

Nice-to-Haves:
- Skills in 3D graphics, Gaussian splatting, or large-scale training setups.
- Broad exposure to generative AI models beyond your specialty.
- Familiarity with software development best practices.

Contact

About Us
Tavus (https://www.tavus.io/) is a research lab pioneering human computing.

Additional details

Our real-time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face-to-face conversations.
A fleet of medical assistants that can give every patient the attention they need.
We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.
Be part of shaping a future where humans and machines truly understand each other.
Your Mission 🚀
- Lead research efforts on audio-visual generation for avatars (Neural Avatars, Talking-Heads), with a focus on conversational settings.
- Mentor researchers, set research directions, and publish impactful work.
Location
Preferred: San Francisco (hybrid) or London (office opening soon).
Remote within U.S. or Europe considered for exceptional candidates.
