Machine Learning Researcher, Multimodal LLMs

San Francisco, United States · On-site

Requirements

THE ROLE

We are looking for someone to contribute to the development of our next-generation multimodal LLM stack, combining speech, text, tools, and real-time reasoning into a single unified system.
You'll be responsible for building industry-leading conversational AI models that power Bland's agent, and for taking them all the way from idea to production.
- Experience with LLMs, multimodal models, or speech-language systems
- Deep understanding of prompting, fine-tuning, and alignment techniques
- Familiarity with neural audio codecs and modern multimodal LLM techniques

Fast Experimental Loop
- You can go from idea → dataset → experiment → conclusion in days
- You know how to design experiments that actually answer the question

Product Intuition
- Strong sense for what makes an interaction feel natural vs. robotic
- Ability to translate abstract modeling ideas

- Experience with real-time voice systems or conversational AI
- Background in tool-using agents or agent frameworks
- Experience with multimodal datasets (audio + text + actions)
- Contributions to LLM or speech-related research or open source

COMPENSATION & BENEFITS
- Competitive salary: $180,000 – $260,000
- Meaningful equity
- Full healthcare, dental, vision
- Office in Jackson Square, SF
- High autonomy, high impact

Voice is quickly becoming the primary interface between businesses and their customers, and we are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.
At Bland, we're not just thinking about text modeling.
You will define how our agents listen, think, and act in real time, integrating streaming audio, tool execution, and dynamic context into a single coherent system.
You will take ideas from research through production systems serving millions of calls per day.