infrastructure
Posted Oct 8, 2025
Software Engineer, Internal Infrastructure (North America)
at Cohere
Toronto, Canada
Remote
Requirements
- We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents.
- We believe that our work is instrumental to the widespread adoption of AI.
- By joining our team, you will work in close collaboration with AI researchers to support their AI workload needs on the cutting edge, with a strong focus on stability, scalability, and observability.
- You will be responsible for building and operating Kubernetes GPU superclusters across multiple clouds.
- Your work will directly accelerate the development of industry-leading AI models that power Cohere's platform, North.
- You will:
  - Build and operate Kubernetes compute superclusters across multiple clouds
  - Partner with cloud providers to optimize infrastructure costs, performance, and reliability for AI workloads
  - Work closely with research teams to understand their infrastructure needs and identify ways to improve stability, performance, and efficiency of novel model training techniques
  - Design and build resilient, scalable systems for training AI models, focusing on creating intuitive user interfaces that empower
- Experience running Kubernetes clusters at scale and/or scaling and troubleshooting Cloud Native infrastructure, including Infrastructure as Code
- Have strong programming skills in Go or Python
- Prefer contributing to Open Source solutions rather than building solutions from the ground up
- Are self-directed and adaptable, and excel at identifying and solving key problems
- Draw motivation from building systems that help others be more productive
- See mentorship, knowledge transfer, and review as
- Qualifications:
  - You've previously worked with ML training infrastructure and GPU workloads and have familiarity with RDMA networking
  - You have the expertise to support and troubleshoot low-level Linux systems
  - You have experience collaborating with research teams or machine learning engineers
- If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! We value and celebrate diversity and strive to create an inclusive work environment for all.
Benefits
- Full-Time Employees at Cohere enjoy these Perks:
  - 🤝 An open and inclusive culture and work environment
  - 🧑‍💻 Work closely with a team on the cutting edge of AI research
  - 🍽 Weekly lunch stipend, in-office lunches & snacks
  - 🦷 Full health and dental benefits, including a separate budget to take care of your mental health
  - 🐣 100% Parental Leave top-up for up to 6 months
  - 🎨 Personal enrichment