Posted Apr 21
Machine Learning Researcher, Multimodal LLMs
at Bland AI
San Francisco, United States (On-site)
Requirements
- The role: We are looking for someone to contribute to the development of our next-generation multimodal LLM stack, combining speech, text, tools, and real-time reasoning into a single unified system.
- You’ll be responsible for building industry-leading conversational AI models that power Bland's agent, and taking them all the way from idea to production.
- Experience with LLMs, multimodal models, or speech-language systems
- Deep understanding of prompting, fine-tuning, and alignment techniques
- Familiarity with neural audio codecs and modern multimodal LLM techniques
- Fast Experimental Loop
  - You can go from idea → dataset → experiment → conclusion in days
  - You know how to design experiments that actually answer the question
- Product Intuition
  - Strong sense for what makes an interaction feel natural vs robotic
  - Ability to translate abstract modeling ideas
- Experience with real-time voice systems or conversational AI
- Background in tool-using agents or agent frameworks
- Experience with multimodal datasets (audio + text + actions)
- Contributions to LLM or speech-related research or open source
Benefits
- Compensation & Benefits
  - Competitive salary: $180,000 – $260,000
  - Meaningful equity
  - Full healthcare, dental, vision
  - Office in Jackson Square, SF
  - High autonomy, high impact
Additional details
- Voice is quickly becoming the primary interface between businesses and their customers, and we are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.
- At Bland, we're not just thinking about text modeling.
- You will define how our agents listen, think, and act in real time, integrating streaming audio, tool execution, and dynamic context into a single coherent system.
- You will take ideas from research through production systems serving millions of calls per day.