Join our dynamic and collaborative technology team as a Site Reliability Engineer to ensure the reliability, scalability, and performance of critical services. You'll design, build, and maintain highly available, cost-effective foundational tools, architecture, and infrastructure in the cloud and on Kubernetes.
Requirements
- Deep expertise in cloud services (AWS and/or GCP), particularly IAM.
- Significant experience managing and troubleshooting services within Kubernetes environments, and an understanding of Kubernetes as an ecosystem.
- Strong proficiency in observability platforms, particularly Prometheus and Grafana, including monitoring, alerting, and production operations.
- Hands-on experience codifying infrastructure with Terraform and Helm charts.
- Excellent incident response and troubleshooting abilities.
- Proficiency in scripting and automation with Python and shell.
- Experience working with containerized workloads.
- Knowledge of networking, including the basics of HTTP and TLS.
- Experience collaborating with software engineers to support production cloud-native applications.
Benefits
- Equity
- Private medical insurance
- Unlimited time off policy
- Hybrid work approach
- New starter budget to kit out your home office
- Annual learning budget