Added 6 days ago
Staff Database Reliability Engineer (PostgreSQL + Cloud) - Remote
at Rackspace
India, Remote
Requirements
- Role Summary: We are seeking an experienced SRE/DBRE to ensure the reliability, performance, scalability, and operational excellence of our multi-cloud DBaaS platform across Microsoft Azure, Amazon Web Services, and Google Cloud Platform. This role combines deep database expertise with SRE principles to build highly available, automated, and resilient database platforms.
- Experience in PostgreSQL
- Comfortable working in a 24x7 setup
- Work from home / remote work
- Strong multi-cloud experience (Azure / AWS / GCP, at least two)
- Deep HA/DR and performance-tuning expertise
- Automation-first mindset (Terraform, scripting, CI/CD)
- Experience in SaaS/DBaaS environments preferred

For a Site Reliability Engineer (SRE) in a DBaaS (Database-as-a-Service) support role, the following mandatory skills are typically required:
- Database Administration (DBA) Skills: primary database PostgreSQL; secondary databases MySQL and SQL Server; database backup and recovery tools and strategies, including disaster recovery.
- Cloud Infrastructure Knowledge (DBaaS): cloud platforms AWS (RDS, Aurora), Azure (Cosmos DB, SQL Database), and GCP (Cloud SQL, Firestore).
- Infrastructure as Code (IaC): Terraform, CloudFormation, Kubernetes.
- Kubernetes & Containers: running databases in containers and on container orchestrators such as Kubernetes.
- Observability Tools: ELK stack (Elasticsearch, Logstash, Kibana).
- Database Migration: migrating databases across different platforms or cloud environments.
- Scripting and Automation: scripting languages such as Python, shell (Bash), and PowerShell.
- Networking and Infrastructure: networking basics (TCP/IP, DNS, firewalls, load balancers).
- Expertise in Linux (RHEL, Ubuntu, CentOS).
- Understanding of file systems (ext4, XFS, etc.), permissions, and ownership (chmod, chown, ACLs).
- Knowledge of process monitoring, management, and troubleshooting (ps, top, htop, kill, pkill, etc.).
- Proficiency with tools like top, htop, vmstat, iostat, sar, and dstat to monitor CPU, memory, disk I/O, and network usage.
- Ability to analyze system logs (/var/log/, journalctl, dmesg) for troubleshooting.
- Understanding of resource limits (CPU, memory, disk, network) and how they impact database performance.
- Knowledge of partitioning tools (fdisk, parted) and file system management (mkfs, mount, umount).
- Understanding of RAID configurations and Logical Volume Management (LVM) for storage scalability.