Posted 2 hours ago
Staff Machine Learning Engineer, ML Efficiency
at Reddit, Inc.
Netherlands, Remote
Responsibilities
- Design and build systems that improve the efficiency of ML training and inference workloads.
- Develop tooling that helps ML engineers debug, profile, optimize, and monitor model performance.
- Improve GPU and general resource utilization through scheduling, resource management, caching, and workload optimization.
- Build benchmarking frameworks and performance dashboards for training and serving systems.
- Optimize distributed training infrastructure, data pipelines, and model serving architectures.
- Lead cross-functional initiatives that improve the productivity of Reddit ML engineers.
- Drive technical strategy for ML platform scalability, reliability, and cost efficiency.
Requirements
- With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information.
- The ML Efficiency team builds the infrastructure, tooling, and optimization systems that enable machine learning engineers and researchers to train, evaluate, deploy, and operate models efficiently at scale.
- We focus on improving developer productivity, reducing infrastructure costs, increasing hardware utilization, and accelerating experimentation across the company’s ML ecosystem.
- Partner with ML researchers and product teams to identify bottlenecks and drive performance improvements.
- BS, MS, or PhD in Computer Science or a related field.
- Strong proficiency in Python.
- Proficiency in at least one systems language (Go, C++, Rust, or Java) preferred.
- Experience with machine learning infrastructure, training systems, or model serving platforms.
- Deep understanding of performance engineering and systems optimization.
- Experience with large-scale recommendation, ranking, generative AI, or foundation model systems.
- Experience with distributed training frameworks such as PyTorch Distributed, Ray, TensorFlow, or Spark.
- Familiarity with GPU architectures and performance analysis tools.
- Experience optimizing cloud infrastructure costs across large ML workloads.
- Contributions to internal platforms used by multiple ML teams.
- Experience building real-time ML inference applications.
- ML engineers can move from idea to experiment faster.
- Platform reliability improves as ML workloads scale.
- In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI).
Experience
- 5+ years of software engineering experience.
Benefits
- Gender-Affirming Care
- Private Pension Plan with Employer Matching
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
Contact
- For more information, visit www.redditinc.com.
Additional details
- Reddit is built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.
- Location: Reddit has a flexible-first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK or the Netherlands.
- Experience building distributed systems at scale.
- Strong debugging and profiling skills.
- Training and inference costs decrease, performance increases, while model quality is maintained or improved.
- Teams spend less time managing infrastructure and more time building models.
- Average recommendation model size increases.
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.
- Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures.