Posted 12 hours ago
Staff Software Engineer - Databases SRE | Sweden | Remote
at Grafana Labs
Sweden | Remote
Responsibilities
- Own production reliability for high-SLA and complex customer environments
- Design and implement automation to scale our reliability practices
- Define and evolve per-tenant SLOs and reliability models
- Lead customer-impacting incident response and post-incident reviews
- Influence feature design to ensure production scalability and operability
- Build automation to eliminate toil where needed
- Improve alert quality and reduce noisy escalations
- Improve observability of customers within their environments
- Develop fault-tolerant design patterns so that reliability is considered at every stage of the service lifecycle
- Teach others about Site Reliability Engineering and communicate best practices to be applied early in the development of new features and functionality
Requirements
- With Grafana Cloud's actually useful AI, organizations can see, understand, and act on all their disparate data to move at the speed of their ambitions.
- Today, more than 35 million users and 7,000+ customers – including Anthropic, Bloomberg, NVIDIA, Microsoft, and Salesforce – trust Grafana Labs to ensure reliability of their applications and systems, resolve incidents quickly, and optimize their telemetry to reduce noise and cost.
- We provide these databases as a SaaS product from AWS, GCP, and Azure across all regions.
- Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
- Strong experience with technical leadership: leading a team through projects, mentoring other engineers on the team, and serving as a force multiplier
- Experience designing and implementing SLOs
- Experience with one or more programming languages (e.g. Go, Python, Java)
- Experience with Linux operating system internals, and some knowledge of networking, cloud storage, and scaling
- Experience with calmly and actively participating in blame-free incident response, following up on actions, and writing high-quality PIRs (Post Incident Reviews, a.k.a. post-mortem documents)
- Comfortable working within an engineering team where individuals are encouraged to have a strong sense of autonomy and self-direction.
- Participating in PR reviews and collaborating with other engineers on their design docs
- Grafana Labs may utilize AI tools in its recruitment process to assist in matching information provided in CVs to job postings.
Experience
- 8+ years of engineering experience, including 4+ in SRE/CRE/production engineering. Strong preference for those with formal customer reliability engineering experience. Strong Kubernetes experience.
Benefits
- In Sweden, the base compensation range for this role is SEK 878,578 - SEK 1,054,294. Actual compensation may vary based on level, experience, and skillset as assessed in the interview process.
- Benefits include equity, bonus (if applicable), and other benefits.
- *Compensation ranges are country specific. If you are applying for this role from a different location than listed above, your recruiter will discuss your specific market’s defined pay range.
- Balance is Key - We operate a global annual leave policy of 30 days per annum. 3 days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.
Contact
- Learn more at grafana.com and follow us on LinkedIn and X.
Additional details
- Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture.
- Grafana Cloud, our fully managed observability platform, is flexible and built for scale.
- We are a 100% remote company with 1,600+ team members across 40+ countries, and we’re backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P.
- We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.
- Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
- You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.
- This is a remote opportunity, and we are looking for candidates from the UK, Sweden, Spain, or Germany.
About the role
- Partner closely with product engineering squads (embedded model)
- Proactively reduce SLO burn to prevent repeat incidents
- Serve as a primary escalation point and on-call for relevant incidents