HelloKindred are specialists in staffing marketing, creative and technology roles, offering a range of talent solutions that can be delivered on-site, remotely or in a hybrid model. Our client in the Information Technology and Services industry is looking for a Site Reliability Engineer (SRE) to support and enhance a complex, multi-cloud Kubernetes platform environment.
Requirements
- Operate and enhance Kubernetes platforms across AWS, Azure, and on-premise environments.
- Lead incident response, problem management, and root cause analysis activities.
- Deliver cluster lifecycle management including upgrades, patching, node pool management, CNI and CSI configuration, ingress management, and Rancher operations.
- Own observability strategy including dashboards, alerting, monitoring, and definition of SLOs and SLIs.
- Implement GitOps practices using Fleet and reduce operational toil through automation and governance.
- Apply secure API gateway and Web Application Firewall (WAF) patterns.
- Design and support distributed systems including event brokers and asynchronous messaging architectures.
- Maintain platform security posture including CVE remediation, GRC controls, and security scanning pipelines.
- Provision and manage infrastructure using Terraform and Crossplane as orchestration layers.
- Implement and maintain CI/CD pipelines using Concourse, GitHub Actions, and Azure DevOps.
- Ensure compliance with PCI DSS and GDPR security patterns.
Qualifications
- Deep expertise in Kubernetes, Rancher, GitOps, Linux, and cloud networking.
- Strong experience operating in hybrid cloud environments across AWS, Azure, and on-premise platforms.
- Strong automation and scripting skills in Python, Go, Bash, PowerShell, or .NET.
- Proven experience with Infrastructure as Code using Terraform and Crossplane.
- Experience implementing and managing observability tooling including Grafana, Prometheus, Jaeger or Tempo, CloudWatch, Loki, and OpenTelemetry.
- Strong understanding of API gateway and Web Application Firewall patterns.
- Experience working with distributed systems and event-driven architectures.
- Experience operating within regulated environments including PCI DSS and GDPR.
- Knowledge of service mesh technologies such as Istio or Kuma is desirable.
- AWS operational experience is advantageous.
- Experience within payments or other regulated industries is beneficial.