Design, build, and operate reconciliation systems, including the Stack State Service (SSS) backend, to track desired stack state and to detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
Improve operational efficiency by reducing deployment complexity (e.g., aiming for single-PR regional SSS deployments) and contributing to the Stack Config Reconciliation project
Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
Requirements
With Grafana Cloud's actually useful AI, organizations can see, understand, and act on all their disparate data to move at the speed of their ambitions.
Today, more than 35 million users and 7,000+ customers – including Anthropic, Bloomberg, NVIDIA, Microsoft, and Salesforce – trust Grafana Labs to ensure reliability of their applications and systems, resolve incidents quickly, and optimize their telemetry to reduce noise and cost.
Our work includes maintaining the billing engine responsible for customer usage calculation, automating provisioning after a customer signs a contract, integrating with cloud marketplaces such as AWS, Azure, and GCP, and building and maintaining the user portal our customers rely on to manage their accounts.
Engineers at Grafana also have the opportunity to contribute to Open Source communities and collaborate across teams beyond their immediate scope.
A stack is the customer-facing Grafana Cloud environment that connects an organization to Grafana and the backend services it uses, including Mimir, Loki, Tempo, plugins, dashboards, data sources, and stack-level configuration.
At Grafana, we actively embrace AI-assisted and agentic development practices, integrating these technologies into both our engineering workflows and the systems we deliver.
We encourage our engineers to thoughtfully leverage AI tools to enhance every stage of the lifecycle, from design and implementation to testing, documentation, and operations.
Our team is small and operates with a high degree of independence; you will be expected to lead major projects, coordinate across service boundaries, and help define the technical direction for our domain.
You have some experience working on a SaaS platform and are familiar with common distributed systems concepts (e.g., scalability, multi-tenancy, HA).
You have professional experience with Golang and are willing to work across both backend service and application code.
You have experience contributing to the delivery of projects, from initial brainstorming to shipping a product to the customer.
Familiarity with Kubernetes in AWS, GCP, or Azure, and exposure to infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.).
Experience with TypeScript/Node.js
Experience with Kubernetes control-plane patterns, operators, reconcilers, or desired-state systems
Experience with Jsonnet/Tanka, Terraform, Flux, Argo, or similar deployment/configuration tooling
Experience working on SaaS provisioning, tenancy, regional expansion, plugin rollout, or customer lifecycle systems
Experience with incident response involving configuration drift, partial failure, or cross-service state mismatch
Grafana Labs may utilize AI tools in its recruitment process to assist in matching information provided in CVs to job postings.
Experience
You have at least 1 year of fully remote work experience.
Benefits
Experience participating in blameless incident response and contributing to post-incident reviews.
Bonus Points For:
Compensation & Rewards:
In United Kingdom, the compensation range for this role is GBP 72K - GBP 90K.
Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process.
*Compensation ranges are country specific. If you are applying for this role from a different location than listed above, your recruiter will discuss your specific market’s defined pay range &
Balance is Key - We operate a global annual leave policy of 30 days per annum. Three days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.
Equal Opportunity Employer: We will recruit, train, compensate and promote regardless of race, religion, color, national origin, gender, disability, age, veteran status, and all the other fascinating characteristics that make us different and unique.
Contact
Learn more at grafana.com and follow us on LinkedIn and X.
We utilize the grafana.com platform to engineer bespoke integrations and solutions that unify the diverse technical ecosystem of a modern software enterprise.
We build the control-plane services and workflows that keep stack state aligned across grafana.com, Stack State Service (SSS), Hosted Grafana, cloud regions, and the underlying Grafana Cloud infrastructure.
Additional details
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture.
Grafana Cloud, our fully managed observability platform, is flexible and built for scale.
We are a 100% remote company with 1,600+ team members across 40+ countries, and we’re backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P.
We’re scaling fast and staying true to what makes us different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.
Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
You may not meet every requirement, and that’s okay. If this role excites you, we’d love you to raise your hand for what could be a truly career-defining opportunity.
This role is available for candidates located in the UK, Germany, Spain, Ireland, and Sweden.
The Opportunity:
Application Core Services (AppCore) is a group within Platform, in the Foundations department.
Foundations produces the Internal Engineering Platform (IEP) and partners closely with our Cloud, Enterprise, and Grafana teams.
Our team develops the essential systems driving Grafana's business operations.