Posted Apr 14
Cloud Infrastructure Engineer
at Alchemy
San Francisco, Global, Remote
Responsibilities
- Build AI-powered infrastructure tooling and automation (e.g., automated K8s upgrades, IaC plan analysis, cost optimization advisors, MCP servers, n8n workflows).
- Build and maintain internal developer platform (IDP) capabilities for self-service deployments, observability, and reliability.
- Develop observability frameworks using Prometheus and Grafana for metrics, dashboards, and alerting.
- Lead incident management with blameless post-mortems; define and enforce SLIs, SLOs, and error budgets across services.
- Design and manage multi-cloud, multi-region network architecture: VPC design, IPAM, DNS (Cloudflare), cross-cloud connectivity, security groups, and edge-proxy/Istio gateway configuration.
- Collaborate with security teams to embed compliance into infrastructure, including IaC scanning and runtime protection.
About the Role
As an engineer in the Infrastructure department at Alchemy, you will design, deploy, and continuously improve the infrastructure powering our blockchain developer platform, serving 100+ chains, billions of daily requests, and over $150B in annual transactions.
What You'll Do
- Architect and operate scalable, self-healing infrastructure leveraging Kubernetes, Terraform, and cloud-native tools across multi-region deployments.
- Drive AI enablement across engineering, ensuring repos, tooling, and workflows are optimized for agentic development with tools like Claude Code, Cursor, and Codex.
Requirements
- Experience driving company-wide reliability efforts, including SLO frameworks and error budget policies.
- Strong proficiency with observability stacks: OpenTelemetry, Prometheus/Grafana.
- Deep experience with cloud infrastructure (AWS/GCP), Kubernetes, and multi-region architectures.
- Skilled with Terraform, Helm, and GitOps workflows (e.g., ArgoCD) with an automation-first mindset.
- Experience leveraging agentic development tools (Claude Code, Cursor, Codex) and workflow automation (n8n) to accelerate IaC and build internal tooling is a strong plus.
- Solid networking fundamentals: VPC design, DNS, IPAM, security groups, cross-cloud connectivity, and service mesh (e.g., Istio) experience is a plus.
- Calm and effective incident responder with a focus on systemic improvement.
- Strong cross-functional communicator across SRE, security, and product engineering.
- Blockchain infrastructure, distributed systems, or high-throughput RPC