Staff Infrastructure Engineer
at Replit
Foster City, United States (Remote)
Posted 2 weeks ago
Responsibilities
You will:
- Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work.
- Design and maintain CI/CD pipelines and infrastructure automation using tools such as Terraform or Pulumi.
- Create self-healing systems that can automatically respond to common failure scenarios.
- Optimize Performance and Infrastructure: Collaborate with core infrastructure and product teams to performance-tune and optimize our cloud deployments (Kubernetes, Docker, GCP).
- Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions.
- Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers.
- Drive Cross-Company Improvements: Partner directly with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services.
- Build Shared Tooling: Create and maintain centralized tooling and automation that improves the entire engineering lifecycle, from local development to production monitoring.
- Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with third-party vendors.
Requirements
- Experience in Infrastructure Engineering or similar roles (DevOps, Systems Engineering, Site Reliability Engineering).
- Strong programming skills in languages such as Python or Go.
- You write high-quality, well-tested code.
- Deep understanding of distributed systems.
- Experience with container orchestration platforms (Kubernetes) and cloud-native technologies.
- Proven track record of implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning.
- Strong incident management skills, with experience leading incident response and demonstrated critical thinking under pressure.
- Experience with infrastructure as code (e.g., Terraform) and configuration management tools.
- Excellent written and verbal communication skills, with an ability to explain technical concepts clearly and simply and a bias toward open, transparent cultural practices.
- Strong interpersonal skills, with