Research Engineer, Data Infrastructure
at Mistral AI
Paris, France (Hybrid)
Posted Apr 8
Responsibilities
In this role, you will:
- Build & Scale: Help us reach our goal of operating massive distributed compute and storage systems.
- Global Orchestration: Architect and maintain multi-cluster orchestration layers to optimize workload placement across diverse hardware and regions.
- Design Future-Proof Storage: Architect our transition to modern storage formats to handle fine-tuning datasets at a scale that anticipates exabyte growth.
- Metadata & Lineage: Implement and manage systems that provide clear visibility and lineage as our data and model pipelines grow in complexity.
- Platform Engineering: Contribute to the development of our internal training platform, ensuring seamless model training and fine-tuning capabilities across Kubernetes- and SLURM-based environments.
Requirements
You might thrive in this role if you:
- Have 4+ years of experience in Data Infrastructure, MLOps, or Infrastructure Engineering.
- Are proficient in Python and enjoy solving the "brittle data lake" problem with modern columnar storage standards.
- Are well-versed in Kubernetes-native tooling and excited to debug large-scale distributed systems across multi-cluster environments.
- Are comfortable with ambiguity and the challenges of building high-scale infrastructure in a rapid-growth AI environment.
Additional details
- You will be a core contributor to our evolution, helping us design and scale massive compute fleets and storage systems built for high performance and scalability.
- You will take full lifecycle ownership: from architecting the migration away from legacy orchestrators to implementing production-grade pipelines and participating in on-call rotations for critical training jobs.
- You will help us move toward a future of decoupled control and data planes, scaling big data compute and storage platforms while ensuring secure and governed data access for MLOps and research.