Customer Site Reliability Engineer - OpenShift Managed Cloud Services (Kubernetes/AWS/Azure, Linux)

📍 India · 🏢 Remote


Responsibilities

Maintain customer trust and confidence by ensuring stability and functionality of services.
Drive continuous enhancement of processes, tools, and methodologies to support the evolving needs of the service.
Lead the development of code and automation scripts to optimize the scalability, reliability, and performance of services.
Lead and participate in high-priority customer escalations, adopting a customer-first mindset.
Coordinate and execute complex incident response procedures, ensuring timely resolution and thorough postmortems.
Collaborate with cross-functional teams to enhance system robustness.
Demonstrate a proactive mindset to help preempt escalations and ensure reliable operations.
Document resolutions, root causes, and best practices to enrich the knowledge base and promote self-service solutions.
Mentor and coach team members, fostering a culture of continuous learning, knowledge sharing, and collaboration.
Collaborate on strategic AI and automation projects designed to increase the efficiency of fleet operations and troubleshooting, ultimately delivering a better product experience for customers.

Red Hat is looking for a Customer Site Reliability Engineer (CSRE) to join our OpenShift Managed Cloud Services (MCS) team.
The role calls for experience in software and systems engineering to automate operations, reduce toil, and drive continuous improvement across the service lifecycle.
You must demonstrate the ability to articulate complex technical solutions and lead critical incident calls with confidence, even in high-pressure environments.

What you will bring

Advanced experience with OpenShift/Kubernetes container platform support or administration.
Proficient with container-based technologies on Linux.
Proficient in managing Linux-based systems in a public cloud such as AWS, Azure, or GCP.
Advanced experience with enterprise systems monitoring; knowledge of Prometheus is preferred.
Advanced experience with enterprise configuration management tools such as Ansible and Terraform.
Software engineering experience using object-oriented languages; Golang is preferred.
Superior communication skills and