[Remote] Senior Site Reliability Engineer (Auth0)
Note: This is a remote position open to candidates in the USA. Okta is a company focused on securing identities and enabling organizations to embrace AI. It is seeking a Senior Site Reliability Engineer to keep its production systems operational, resilient, and scalable, directly contributing to the platform's core reliability and robustness.
Responsibilities
- Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy
- Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services
- Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions
- Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate, or accurately escalate production issues
- Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency
- Define, document, and champion reliability best practices across the organization
Skills
- A proactive and systematic approach to problem-solving, with a high degree of ownership
- Proven experience working with a high degree of autonomy in a production environment that supports large-scale, mission-critical applications
- Proficiency in at least one programming language, with a preference for Go. You should be comfortable writing custom applications, not just scripts
- Experience with infrastructure as code (Terraform), container orchestration (Kubernetes, Docker), and GitOps (ArgoCD)
- Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP)
- A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues
- An understanding of core SRE principles, including SLIs, SLOs, and error budgets
- Experience in an on-call rotation for a 24/7 cloud-based environment
- Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven
Benefits
- Equity (where applicable)
- Bonus
- Benefits, including health, dental, and vision insurance
- RRSP with a match
- Healthcare spending
- Telemedicine
- Paid leave (including PTO and parental leave)