Staff Database Reliability Engineer (MySQL + Cloud)
at Rackspace
Hyderabad, India (Hybrid)
Requirements
- Role Summary: We are seeking an experienced SRE/DBRE to ensure the reliability, performance, scalability, and operational excellence of our multi-cloud DBaaS platform across Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). This role combines deep database expertise with SRE principles to build highly available, automated, and resilient database platforms.
- The DBRE Lead will drive operational standards, automation frameworks, and reliability engineering practices across distributed cloud environments.

🔹 What We’re Looking For
- 8-10+ years in DBA / Platform Engineering.
- Willing to work in a 24x7 setup.
- Work from office: hybrid, 3 days in the office, Hyderabad location.
- Strong multi-cloud experience (at least two of Azure, AWS, GCP).
- Deep HA/DR and performance tuning expertise.
- Automation-first mindset (Terraform, scripting, CI/CD).
- Experience in SaaS/DBaaS environments preferred.

For a Site Reliability Engineer (SRE) in a DBaaS (Database-as-a-Service) support role, the following mandatory skills are typically required:
Database Administration (DBA) Skills
- Primary database: MySQL.
- Secondary databases: PostgreSQL, SQL Server.
- Database backup and recovery: tools and strategies for database backups and disaster recovery.
Cloud Infrastructure Knowledge (DBaaS)
- Cloud platforms: AWS (RDS, Aurora), Azure (Cosmos DB, SQL Database), GCP (Cloud SQL, Firestore).
- Infrastructure as Code (IaC): Terraform, CloudFormation, Kubernetes.
- Kubernetes and containers: running databases in containers and on orchestration platforms such as Kubernetes.
- Observability tools: ELK stack (Elasticsearch, Logstash, Kibana).
- Database migration: migrating databases across different platforms or cloud environments.
Scripting and Automation
- Scripting languages: Python, shell scripting (Bash), PowerShell.
Networking and Infrastructure
- Networking basics: TCP/IP, DNS, firewalls, load balancers.
- Expertise in Linux (RHEL, Ubuntu, CentOS).
- Understanding of file systems (ext4, XFS, etc.), permissions, and ownership (chmod, chown, ACLs).
- Knowledge of process monitoring, management, and troubleshooting (ps, top, htop, kill, pkill, etc.).
- Proficiency with tools like top, htop, vmstat, iostat, sar, and dstat to monitor CPU, memory, disk I/O, and network usage.
- Ability to analyze system logs (/var/log/, journalctl, dmesg) for troubleshooting.
- Understanding of resource limits (CPU, memory, disk, network) and how they impact database performance.
- Knowledge of partitioning tools (fdisk, parted) and file system management (mkfs, mount, umount).