[Remote] Site Reliability Engineer (Hosted Infra) - Platform

Remote, USA | Full-time | Posted 2026-06-16

Note: This is a remote position open to candidates in the USA. Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. They are seeking a Site Reliability Engineer to integrate, scale, and evolve multi-cloud infrastructure while optimizing the reliability and lifecycle of hosts across multiple cloud providers.

Responsibilities

Engineering software to automate large-scale systems — building internal tools and services, not just running scripts
Optimizing the reliability and lifecycle of hosts across multiple cloud providers
Strengthening our observability posture — crafting alerting and monitoring systems that drive incident prevention over incident response
Scaling global infrastructure and evolving the infrastructure management processes to meet growing demand
Contributing to code reviews, sharing your work, planning what we need to do next, and both mentoring and being mentored by teammates
Being part of a balanced SRE on-call rotation: responding to incidents, improving runbooks, participating in postmortems, and championing reliability improvements

Skills

Experience building software with Golang. You are also comfortable reviewing others' code and offering constructive feedback
Production experience operating large-scale cloud compute (hundreds of hosts or more) via automated workflows
Deep experience with Linux systems — you are at home in the terminal debugging at the OS level
Proficiency working with containerized workloads in production
A customer-first, systems-thinking approach to operational problems — you care about root causes, not just symptoms
Comfortable working across time zones in both real-time and asynchronous contexts
You contribute clear and maintainable documentation such as software designs, runbooks, architecture diagrams/decisions, and postmortems
You communicate project status regularly and clearly, flag blockers early, and follow through on action items
A sensible approach to AI integration — identifying where AI tools genuinely reduce operational burden and embedding them into workflows without adding complexity
Production experience with any of: Terraform, Puppet, Ansible, Argo CD, Argo Workflows, CUE, Docker, Kubernetes, Ubuntu, or Ubuntu Live Patch
Experience being on-call during incidents and using observability tools (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues, quantify impact, and confirm mitigations
Hands-on experience engineering solutions with the Elastic Stack

Benefits

Elastic's stock program
Company-matched 401k with dollar-for-dollar matching up to 6% of eligible earnings
Competitive pay based on the work you do here and not your previous salary
Health coverage for you and your family in many locations
Ability to craft your calendar with flexible locations and schedules for many roles
Generous number of vacation days each year
We match up to $2,000 (or local currency equivalent) for financial donations and service
Up to 40 hours each year to use toward volunteer projects you love
Minimum of 16 weeks of parental leave

Company Overview

Elastic builds software to make data usable in real time and at scale for search, logging, security, and analytics use cases. It was founded in 2012, and is headquartered in Mountain View, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.elastic.co.

Company H1B Sponsorship

Elastic has a track record of offering H1B sponsorships, with 1 in 2024, 2 in 2022, and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.

