Posted Nov 7, 2025
Member of Technical Staff, Model Efficiency
at Cohere
New York, United States (Remote)
- We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents.
- We believe that our work is instrumental to the widespread adoption of AI.
- Join us on our mission and shape the future!
Why this role?
- Our team is a fast-growing group of researchers and engineers focused on building reliable ML systems and pushing the boundaries of LLM inference efficiency.
- As the team evolves, you’ll have opportunities to build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for MoE and large-scale architectures.
Requirements
You may be a good fit for the Model Efficiency team if you have:
- 5+ years of experience writing high-performance, production-quality code
- Strong programming skills in C++ or Python (Rust/Go also welcome)
- Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
- Ability to diagnose and resolve performance bottlenecks across the model execution stack
- A strong bias for action: you ship fast, measure impact, and iterate
It’s a big plus if you have experience with:
- GPU programming, CUDA, or low-level systems optimization
- Language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
- Scaling performance-critical distributed systems (e.g., computation, search, storage)
If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! We value and celebrate diversity and strive to create an inclusive work environment for all.
Benefits
Full-Time Employees at Cohere enjoy these Perks:
- 🤝 An open and inclusive culture and work environment
- 🧑‍💻 Work closely with a team on the cutting edge of AI research
- 🍽 Weekly lunch stipend, in-office lunches & snacks
- 🦷 Full health and dental benefits, including a separate budget to take care of your mental health
- 🐣 100% Parental Leave top-up for up to 6 months
- 🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- 🏙 Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
- ✈️ 6 weeks of vacation (30 working days!)