jobloom

infrastructure

Posted 5 days ago

Lead Site Reliability Engineer

at Mattermost

United States, Hybrid

Responsibilities

  • Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals.
  • Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
  • Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale.
  • Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements.
  • Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations.
  • Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets.
  • Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production.
  • Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence.

Requirements

  • Teams operate across web, desktop, and mobile, with embedded interoperability for Microsoft Teams, Outlook, and Microsoft 365.
  • BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles.
  • Proven expertise in container orchestration platforms, ideally Kubernetes.
  • Extensive experience with infrastructure-as-code, ideally Terraform.
  • Strong background in cloud platforms, ideally AWS.
  • Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies.
  • Proficiency in at least one scripting or programming language for automation.
  • Experience leading globally distributed teams in a remote-first environment.

Preferences

  • Familiarity with observability stacks such as Grafana and Prometheus.
  • Experience designing high-availability, disaster recovery, and scaling architectures.
  • Exposure to GCP and Azure cloud environments.
  • Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure.
  • Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards.
  • Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support.
  • Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect).

Benefits

  • Salary range: $145,000 – $200,000
  • Mattermost takes a market-based approach to pay. Compensation is determined based on skills, experience, qualifications, and work location. Ranges may be updated as market conditions evolve.

Contact

  • To learn more, visit www.mattermost.com

Additional details

  • Mattermost is the leading collaborative workflow platform for defense, intelligence, security, and critical infrastructure. Trusted by the U.S. Department of War and Fortune 500s, our platform runs on-premises and in private clouds, delivering secure messaging, file sharing, workflow automation, audio/screenshare, and project management, all with full data and operational control.
  • Mattermost powers high-stakes workflows across mission planning, real-time, real-world operations, DevSecOps, incident response, and cyber defense, enabling secure collaboration from tactical edge and DDIL environments to enterprise HQ.
  • Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform.
  • In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments.
  • You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance.
  • Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
  • Exceptional troubleshooting and incident management skills for distributed systems.
  • Excellent communication skills with a track record of influencing cross-functional teams.
