Customer Site Reliability Engineer - OpenShift Managed Cloud Services (Kubernetes/AWS/Azure, Linux)
at Red Hat
📍 India·🏢 Remote
Responsibilities
- Maintain customer trust and confidence by ensuring stability and functionality of services.
- Drive continuous enhancement of processes, tools, and methodologies to support the evolving needs of the service.
- Lead the development of code and automation scripts to optimize the scalability, reliability, and performance of services.
- Lead and participate in high-priority customer escalations, adopting a customer-first mindset.
- Coordinate and execute complex incident response procedures, ensuring timely resolution and thorough postmortems.
- Collaborate with cross-functional teams to enhance system robustness.
- Demonstrate a proactive mindset to help preempt escalations and ensure reliable operations.
- Document resolutions, root causes, and best practices to enrich the knowledge base and promote self-service solutions.
- Mentor and coach team members, fostering a culture of continuous learning, knowledge sharing and collaboration.
- Collaborate on strategic AI and automation projects designed to increase the efficiency of fleet operations and troubleshooting, ultimately delivering a better product experience for customers.
Requirements
- Red Hat is looking for a Customer Site Reliability Engineer (CSRE) to join our OpenShift Managed Cloud Services (MCS) team.
- Experience in software and systems engineering to automate operations, reduce toil, and drive continuous improvement across the service lifecycle.
- You must demonstrate the ability to articulate complex technical solutions and lead critical incident calls with confidence, even in high-pressure environments.
What you will bring
- Advanced experience with OpenShift/Kubernetes container platform support or administration.
- Proficient with container-based technologies on Linux.
- Proficient in managing Linux-based systems in a public cloud such as AWS, Azure, or GCP.
- Advanced experience with enterprise systems monitoring; knowledge of Prometheus is preferred.
- Advanced experience with enterprise configuration management such as Ansible or Terraform.
- Software engineering experience using object-oriented languages; Golang is preferred.
- Superior communications skills and