Posted 3 hours ago
Machine Learning Engineer II
at May Mobility
United States (Remote)
Responsibilities
- Architect and operate data and training pipelines across cloud and cluster environments.
- Build and maintain distributed training and orchestration tooling.
- Design and maintain the data and metadata stores that back our training and evaluation workflows.
Skills and Abilities
- Architect data-parallel and model-parallel training infrastructure for large-data (>100 TB) or large-model (>100 GB) applications.
- Build and maintain CI/CD pipelines and infrastructure-as-code (e.g. Terraform).
Requirements
Based in Ann Arbor, Michigan, May develops and deploys autonomous vehicles (AVs) powered by our innovative Multi-Policy Decision Making (MPDM) technology that literally reimagines the way AVs think. With experience fielding robotic systems in the wild, May Mobility is looking to expand its team with engineers who have a background in robotics or autonomous vehicles.
We are seeking ML-oriented software engineers with experience in robotics applications. As part of our Autonomous Driving ML team, you will use your knowledge of software and ML concepts to design and operate pipelines that allow May’s Autonomous Driving stack to improve quickly and reliably at scale.
Essential Responsibilities
- Architecting and operating containerized, pipelined ML training workloads, including GPU scheduling and autoscaling, dataloader design, and experiment tracking.
- Working with relational and object stores, and high-throughput data formats for ML workloads.
Qualifications and Experience
- Bachelor’s or Master’s degree in Robotics, Computer Science or a related field with strong mathematical and engineering foundations.
- A minimum of 2 years building ML-oriented infrastructure, platforms, or distributed systems in production.
- Proficiency in C++, Python and PyTorch with experience in Linux environments.
- Familiarity with basic concepts in Machine Learning (training loops, basic operators and architectures).
Desirable
- Proficiency in Go or Rust.
- Familiarity with ML orchestration and experiment tooling such as Ray, Kubeflow, Airflow, MLflow, or Weights & Biases.
- Familiarity with distributed training frameworks (PyTorch DDP/FSDP, DeepSpeed).