infrastructure
Posted Apr 14
Site Reliability Engineer
at StarTree
India, Remote
Responsibilities
- Manage and tune multiple critical customer-facing Apache Pinot clusters
- Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
- Build rapport and work closely with customers to mitigate and resolve incidents
- Execute disaster recovery strategies with minimal downtime
- Collaborate with other engineers to understand and troubleshoot systems, and use the experience gained to influence the roadmap of other teams
Requirements
- You will be working with a team of passionate and talented engineers on the automation, tuning, and troubleshooting of Apache Pinot and SQL DBs.
- Experience managing highly available production-facing distributed systems and in-depth knowledge of Java are a plus
- Experience with cloud platforms such as AWS, GCP, or Azure
- Experience with Kubernetes and container orchestration
- Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
- Knowledge of standard methodologies related to security, performance, and disaster recovery
- StarTree was founded by the core software engineering team and inventors of Apache Pinot, which currently powers hundreds of user-facing applications at companies across industries, including LinkedIn, Uber, Target, 7Eleven, Etsy, Walmart, WePay, Factual, Weibo, and more.
- StarTree Cloud has enabled even more companies to deploy and operate real-time analytics at scale, including Stripe, Sovrn, Roadie, Just Eat Takeaway.com, Dialpad, Guitar Center, Blinkit, and more.
Experience
- Requirements: 5+ years of
Additional details
- At StarTree, we're a group of passionate individuals who want to improve the lives of many by developing tools and technologies that support availability and speed in the world of real-time analytics.