infrastructure
Posted Dec 5, 2025
Site Reliability Engineer
at Latenthealth
San Francisco, United States
On-site
Requirements
- SRE Location: San Francisco, CA (5 Days In-Office)
- You are the infrastructure expert who enables our rapid product development and guarantees 99.9%+ stability and performance of our clinical AI platform for major health systems.
- Tool Proficiency: You are highly proficient with your tools: you speak command line fluently and have mastered keyboard shortcuts.
- Kubernetes Mastery: Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.
- CI/CD & Deployment Optimization: Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature releases while maintaining the highest reliability.
- Experience with Kubernetes, Helm, and Terraform.
- Scaling Systems: Proven ability to architect and maintain complex, distributed systems with high-availability requirements.
- Deployment: Experience optimizing deployment pipelines for both application code (TypeScript) and machine learning models (Python/ML).
- Also PostgreSQL, Redis, and Kafka.
- Core Team Member: Excitement about working five days per week in our San Francisco office.
Additional details
- Your focus on operational excellence is directly tied to a patient's access to life-saving treatment.
WHAT WE LOOK FOR IN A GREAT ENGINEER
- You have the intensity and technical mastery to own mission-critical infrastructure.
- You hold yourself and others to high standards and thrive in a high-energy, in-office culture where everyone is in it to win it.
- Ownership: You thrive on owning complex systems and have a proven track record of scaling mission-critical deployments.
- Automation Drive: You love automating things, always finding new ways to increase your own leverage, and defining standards for operational excellence.
- Problem Solver: You won't wait for someone else to solve a problem that you're in a position to solve; you are willing to jump into whatever needs to get done.
WHAT YOU'LL WORK ON (RESPONSIBILITIES)
As our SRE, you will own the entire production environment and improve the development experience:
- Infrastructure Ownership: Design, implement, and maintain the production environment, having previously handled 500+ machine deployments.
QUALIFICATIONS & ENVIRONMENT
- IaC & Orchestration: Deep, demonstrable