Staff Database Reliability Engineer (PostgreSQL + Cloud) - Remote

at Rackspace

India, Remote

Python, AWS, Kubernetes, PostgreSQL, MySQL, Terraform, GCP, Azure, Ansible, CloudFormation

Requirements

Role Summary

We are seeking an experienced SRE/DBRE to ensure the reliability, performance, scalability, and operational excellence of our multi-cloud DBaaS platform across Microsoft Azure, Amazon Web Services, and Google Cloud Platform. This role combines deep database expertise with SRE principles to build highly available, automated, and resilient database platforms.
Experience in PostgreSQL
Should be OK to work in a 24x7 setup
Work from Home / Remote Work
Strong multi-cloud experience (Azure / AWS / GCP, at least two)
Deep HA/DR and performance tuning expertise
Automation-first mindset (Terraform, scripting, CI/CD)
Experience in SaaS/DBaaS environments preferred

For a Site Reliability Engineer (SRE) in a DBaaS (Database-as-a-Service) support role, the following mandatory skills are typically required:

1. Database Administration (DBA) Skills
Primary Database: PostgreSQL
Secondary Databases: MySQL, SQL Server
Database Backup & Recovery: Tools and strategies for database backups and disaster recovery.
2. Cloud Infrastructure Knowledge (DBaaS)
Cloud Platforms: AWS (RDS, Aurora), Azure (Cosmos DB, SQL Database), GCP (Cloud SQL, Firestore).
Infrastructure as Code (IaC): Terraform, CloudFormation, Kubernetes.
Kubernetes & Containers: Running databases in containers and on container orchestration platforms such as Kubernetes.
Observability Tools: ELK stack (Elasticsearch, Logstash, Kibana).
Database Migration: Migrating databases across different platforms or cloud environments.
3. Scripting and Automation
Scripting Languages: Python, shell scripting (Bash), PowerShell.
4. Networking and Infrastructure
Networking Basics: TCP/IP, DNS, firewalls, load balancers.
Expertise in Linux (RHEL, Ubuntu, CentOS).
Understanding of file systems (ext4, XFS, etc.), permissions, and ownership (chmod, chown, ACLs).
Knowledge of process monitoring, management, and troubleshooting (ps, top, htop, kill, pkill, etc.).
Proficiency with tools like top, htop, vmstat, iostat, sar, and dstat to monitor CPU, memory, disk I/O, and network usage.
Ability to analyze system logs (/var/log/, journalctl, dmesg) for troubleshooting.
Understanding of resource limits (CPU, memory, disk, network) and how they impact database performance.
Knowledge of partitioning tools (fdisk, parted) and file system management (mkfs, mount, umount).
Understanding of RAID configurations and Logical Volume Management (LVM) for storage scalability.
