engineering
Posted 2 hours ago
Principal Engineer - DBRE
at Arcesium
Hyderabad, India (On-site)
Responsibilities
- Drive architectural direction for the database platform across SQL Server, Aurora PostgreSQL, and Snowflake — covering high availability, disaster recovery, replication, backup and recovery, capacity, performance, and security.
- Own complex, cross-cutting initiatives such as cross-region disaster recovery, platform refresh orchestration, alerting redesign, and cost optimization, taking each from problem statement through to a deployed, owned solution.
- Lead by example with exemplary code, design documents, RFCs, and runbooks, setting the standard for technical writing, code quality, and operational rigor across the DBRE team.
- Reduce operational toil by engineering automation across provisioning, refresh, patching, scaling, failover, and decommissioning — treating manual operations as bugs to be eliminated.
- Lead alert engineering to drive sustainable reductions in alert volume while improving signal quality, partnering with application teams on alert ownership, attribution, and SLA design.
- Drive incident response and root-cause analysis for the most complex production incidents, and convert RCAs into platform-level improvements that prevent recurrence.
- Define reliability KPIs (availability, MTTR, alert sustainability, SLA adherence) and build the dashboards and reporting cadence to track them.
Requirements
- This is a hands-on individual contributor role that owns the architectural direction for our most complex database reliability challenges (high availability, disaster recovery, observability, and platform automation) across thousands of SQL Server, Aurora PostgreSQL, and Snowflake environments running mission-critical workloads for the world’s most sophisticated financial institutions.
- A bachelor’s or master’s degree in Computer Science, Engineering, or a related field, with 9+ years of professional engineering experience, including significant time in a principal-level or equivalent individual contributor role.
- Deep, hands-on expertise in at least one major relational database platform (SQL Server or PostgreSQL), including replication, HA/DR architectures, performance tuning, query optimization, and internals.
- Strong working knowledge of cloud infrastructure (AWS preferred): VPC networking, EC2, EBS, FSx, IAM, RDS/Aurora, and cross-region replication.
- Strong programming skills in at least one of Python, PowerShell, Go, or T-SQL — capable of writing production-quality automation, not just scripts.
- Experience leading complex incident response, root-cause analysis, and post-incident improvement programs in 24x7 environments.
- Experience with observability platforms (Datadog, Prometheus, Grafana), modern alerting design, infrastructure-as-code (Terraform, CloudFormation), and CI/CD pipelines (GitLab CI, Jenkins).
- Exceptional verbal and written communication skills, with the ability to produce clear design documents and executive-level summaries and to influence stakeholders across engineering, infrastructure, and business teams.
- Experience across multiple database platforms (SQL Server / PostgreSQL / Snowflake / Aurora) and familiarity with financial-services data domains is a bonus.
Benefits
- Arcesium is an equal opportunity employer.
Contact
- Emails from genuine Arcesium recruiters who are employees of the company will always come from the @arcesium.com domain.
- If something seems off or you're contacted by an unexpected third party, please reach out to us at careers@arcesium.com (US/UK), careers-india@arcesium.com (India), or careers-europe@arcesium.com (Portugal/Sweden).
Additional details
- Arcesium is a global financial technology firm that solves complex data-driven challenges faced by some of the world’s most sophisticated financial institutions.
- We constantly innovate our platform and capabilities to meet tomorrow’s challenges, anticipate the risks our clients encounter, and design advanced solutions to help our clients achieve transformational business outcomes.
- Financial technology is a high-growth industry as change and innovation continue to disrupt the status quo and prompt major transformation.
- Arcesium is at a particularly interesting time in our own growth as we look to leverage our successfully established market position and expand operations in pursuit of strategic new business opportunities.
- We value intellectual curiosity, proactive ownership, and collaboration with colleagues, and we empower you to meaningfully contribute from day one and accelerate your professional development.
- Partner with application engineering, infrastructure, and SRE teams on schema design, query performance, data lifecycle, and shared reliability patterns, and engage senior leadership on strategy, multi-quarter roadmaps, and budget trade-offs.
- A proven track record designing and delivering large-scale reliability initiatives (HA/DR, observability, automation platforms) with measurable outcomes.
- In some cases, you may also be contacted by independent search firms engaged to recruit on our behalf; emails from their employees should always come from their firm's applicable domain.
- We'll never ask for your banking information or any payment as part of the recruiting process.