infrastructure
Posted May 12
Senior Site Reliability Engineer, Observability
at Webflow
Argentina, Remote
Responsibilities
- Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog, to provide reliable, actionable metrics, traces, and logs across our services.
- Build and maintain AI-powered agents and automation that help engineers surface insights faster, reduce alert fatigue, and accelerate incident resolution.
- Guide and empower engineers on other teams to instrument their services effectively and introduce new features into production with confidence.
- Build lasting customer trust. We build trust by taking action that puts customer trust first.
Requirements
- BS / BA college degree or relevant experience
- Business-level fluency to read, write and speak in English
- Regularly dive into the main Webflow application in TypeScript, Node, or Go to better debug (and sometimes fix) behavior in production.
- Have experience with observability platforms and tooling such as Datadog, Grafana, Prometheus, Elasticsearch, or similar, and a strong opinion on what good observability looks like.
- Have experience with OpenTelemetry or similar instrumentation frameworks for collecting metrics, traces, profiles, and logs across distributed services.
- Have experience navigating and scaling multi-tier cloud environments on either AWS or GCP.
- Have experience with container-centric architectures built with tools like Docker and Kubernetes (EKS, GKE, AKS, etc.), or ECS.
- Have experience with infrastructure-as-code tools like Terraform or Pulumi.
- Have experience contributing to full-stack applications built using software like React, Node.js, and MongoDB or PostgreSQL.
- Stay curious and open to growth — demonstrating a proactive embrace of AI, and actively building and applying fluency in emerging technologies to elevate how we work, drive faster outcomes, and expand collective impact.
- It would be a bonus if you had even one of the following:
- Experience building or operating AI agents that interact with observability data (e.g., automated root cause analysis, intelligent alerting, or natural-language querying of telemetry).
- Experience with OpenTelemetry, Kubernetes, and Pulumi specifically.
- Experience improving on-call and incident response processes for Engineering.
- To join Webflow, you'll need a valid right to work authorization depending on the country of employment.
- If you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws.
Experience
- Have either a background as a software engineer with an enthusiasm for observability, infrastructure, and reliability, or a background as an infra or production engineer with an enthusiasm for code.
- Have 5+ years of experience with observability platforms and tooling.
Benefits
- Ownership in what you help build. Every permanent Webflower receives equity (RSUs) in our growing, privately held company.
- Health coverage that actually covers you. Comprehensive medical, dental, and vision plans for full-time employees and their dependents, with Webflow covering most premiums.
- Support for every stage of family life. 12 weeks of paid parental leave for all parents and 6+ weeks of additional paid leave for birthing parents, plus inclusive care for family planning, menopause, and midlife transitions.
- Time off that’s actually off. Flexible vacation, paid holidays, and a sabbatical program to help you recharge and come back inspired.
- Invest in your future. A 401(k) with 100% employer match (up to $6,000/year) in the U.S., and support for retirement savings globally.
- Monthly stipends that flex with your life. Localized support for work and wellness expenses — from Wi-Fi to workouts.
- Bonus for building together. All full-time, permanent, non-commission employees are eligible for our annual WIN bonus program.
- Temporary employees may be eligible for paid holiday and time off, statutory leaves of absence, and company-sponsored medical
Additional details
- At Webflow, we’re building the world’s leading AI-native Digital Experience Platform, and we’re doing it as a remote-first company built on trust, transparency, and a whole lot of creativity.
- This work takes grit, because we move fast, without ever sacrificing craft or quality.
- Our mission is to bring development superpowers to everyone.
- From entrepreneurs launching their first idea to global enterprises scaling their digital presence, we empower teams to design, launch, and optimize for the web without barriers.
- We believe the future of the web, and work, is more open, more creative, and more equitable.
- Our product is used by over 2 million users worldwide across 190 countries, and you’ll help ensure our platform is secure and scalable for these users as tens of thousands of projects are launched on Webflow each month.
About the role
- Location: Remote-first (Argentina)
- Full-time/Permanent
- Application deadline: applications accepted on an ongoing basis until position is closed and filled
- Reporting to the Engineering Manager of Observability
- Join our newly formed Observability team responsible for ensuring engineers across Webflow have the tools, data, and practices they need to understand the health and performance of the Webflow application and our hosting services.