infrastructure
Posted 2 hours ago
Staff SRE, Ads
at Redditinc
United Kingdom, Remote
Responsibilities
- Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
- Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
- Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
- Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
- Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
- Mentor engineers and provide technical leadership across multiple teams.
- Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Requirements
- With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information.
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems including metrics, logging, tracing, and alerting.
- Good programming skills in languages such as Go, Python, or similar.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
Nice to Have
- Experience supporting advertising technology platforms or other large-scale revenue-critical systems.
- Deep understanding of reliability challenges associated with ad-serving, real-time auctions, budget pacing, campaign delivery, measurement, attribution, or billing systems.
- Experience establishing reliability programs that deliver meaningful, measurable business outcomes.
- Experience with Kubernetes, cloud infrastructure, and large-scale distributed systems.
- Familiarity with Kafka, ClickHouse, Spark, Flink, BigQuery, or similar large-scale data platforms.
- Experience partnering with Product, Data Science, and Ads Engineering organizations.
- Experience supporting machine learning inference or recommendation systems at scale.
- In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI).
Benefits
- Gender-Affirming Care
- Group Personal Pension Scheme with Employer match
- Private Medical and Dental Scheme
- Income Replacement Programs
- Bike to Work scheme
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
Contact
- For more information, visit www.redditinc.com .
Additional details
- It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.
- Location: Reddit has a flexible-first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK, the Netherlands, or Ireland.
- The Ads organization powers Reddit's advertising platform, enabling advertisers to reach highly engaged communities while helping Reddit grow its business.
- The reliability of our Ads systems directly impacts advertiser success, revenue generation, and user experience.
- The Ads Reliability team partners closely with Ads Engineering teams to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem.
- Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
- Participate in on-call rotations, lead complex incident investigations and coordinate cross-functional response efforts during major production events.
- Experience operating high-QPS, low-latency services where latency directly impacts business outcomes.
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.