We are seeking a Senior Site Reliability Engineer (SRE) to design, scale, and secure our rapidly growing platform infrastructure. This role involves hands-on work across all critical systems, ensuring availability, performance, and cost efficiency. The ideal candidate thrives on complex distributed systems and automation.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
- 8+ years in SRE / DevOps / Infrastructure Engineering roles.
- Deep Kubernetes expertise (multi-cluster, Helm chart development, advanced networking).
- Strong experience with GitOps workflows using ArgoCD/Flux.
- Expertise with AWS (preferred) or Azure/GCP, plus Infrastructure-as-Code (Terraform, Pulumi, CloudFormation).
- Advanced knowledge of SQL & NoSQL databases (MySQL/Aurora, PostgreSQL, MongoDB, Redis).
- Scripting/automation skills in Python, Bash, or Go.
- Solid background in monitoring/observability (Prometheus, Grafana, Loki, ELK/OpenSearch, VictoriaMetrics).
- Experience with CI/CD at scale and managing production incidents.
- Experience with streaming/messaging (Kafka, RabbitMQ, or similar).
Benefits
- Comprehensive Training & Development programs
- Performance-based Bonus incentives
- Flexible Work From Home options