Research Engineer, Voice
Posted May 12
United States, On-site
Responsibilities
- Research, develop, and optimize neural models for voice and audio, including text-to-speech, automatic speech recognition, audio generation, and spoken dialogue systems.
- Build and maintain production-grade training and inference pipelines for voice models, with close attention to latency, naturalness, and scalability.
- Run experiments end-to-end: data curation, model architecture design, training, evaluation, and ablation studies.
- Collaborate with ML engineers, product teams, and infrastructure to integrate voice models into Pi’s real-time conversational stack.
- Develop robust evaluation frameworks combining perceptual metrics, automated benchmarks, and user-facing quality signals.
About Inflection AI
- Inflection AI is a Public Benefit Corporation empowering people with human-centered, emotionally intelligent AI.
- We’re shaping the future of AI by combining emotional intelligence (EQ) and raw intelligence (IQ) to elevate people’s potential.
- Inflection AI created Pi, the world’s first emotionally intelligent AI, to help people work through decisions, emotions, and challenges.
- Pi is a personal AI agent powered by Inflection AI’s foundation model, proving that AI can be personal, empathetic, and contextually aware.
About the Role
- You’ll collaborate closely with ML engineers, product teams, and infrastructure to turn cutting-edge ideas in areas like neural audio codecs, diffusion-based TTS, and multimodal foundation models into the natural, expressive voice experiences that millions of Pi users interact with every day.
What You’ll Do
- Explore and apply advances in neural audio codecs, diffusion-based synthesis, streaming architectures, and multimodal foundation models to improve Pi’s voice experience.
Requirements
- Experience (including graduate work) in audio, speech, or multimodal ML.
- Strong proficiency in PyTorch and hands-on experience training and debugging large-scale neural models on GPU/accelerator clusters.
- Solid understanding of audio and speech fundamentals: spectrograms, mel features, vocoders, codec-based representations, and signal processing.
- Demonstrated ability to take a research idea from prototype to production: equally comfortable reading papers and writing efficient, CUDA-aware training loops.