Posted Yesterday
Senior Product Engineer, Scalability
at Railway
Remote
Responsibilities
- Build payment flows that are correct under concurrency and partial failure: idempotent charges, retries, reconciliation, and clean handling of provider edge cases (Stripe and beyond).
- Develop fraud and abuse detection (signal collection, real-time scoring, automated mitigation) that protects platform margin without getting in legitimate users' way.
- Build TypeScript + GraphQL APIs where correctness and auditability are non-negotiable.
- Build internal tooling that gives teams across Railway a trustworthy, real-time view into the systems they depend on.
Requirements
- You'll own your work end-to-end, including when a feature reaches the UI. For this role,
- Scale the systems everything else depends on: Postgres under heavy write load, Node.js services under pressure, and long-running workflows orchestrated with Temporal where exactly-once semantics and durability actually matter.
- Contribute to our open-source repositories (CLI, TypeScript SDK, Railpack, etc.); Rust experience, or the desire to learn it, helps here.
About You
- An ability to autonomously lead, design, and implement backend systems where correctness, consistency, and auditability are first-class requirements.
- Deep expertise in Postgres and relational data modeling: you reach for the right consistency guarantees, understand the cost of getting them wrong, and know how Postgres itself behaves at scale.
- Strong working knowledge of Node.js internals: the event loop, memory behavior, and what to do when a service degrades under load.
- Experience managing complex asynchronous and long-running backend jobs, ideally with a workflow engine like Temporal, for things like billing runs or payment reconciliation.
- Familiarity with the realities of money movement: payment providers, idempotency, retries, reconciliation, and their failure modes.
- stream processing, and how you avoid losing data when streaming
- Correctness under concurrency and partial failure: idempotency, retries, reconciliation, what happens when a step fails halfway through
- How you handle cardinality, and which tools you lean on and why
- The scalability of the things you depend on: what happens when Postgres becomes the bottleneck
Interview Structure to expect (60 Minutes):
- Prework (submitted before your interview): Your design
- 0–5 minutes: Introductions
- 5–35 ask us