HelloKindred are specialists in staffing marketing, creative and technology roles, offering a range of talent solutions that can be delivered on-site, remotely or in a hybrid arrangement. Our client in the Information Technology and Services industry is looking for a Site Reliability Engineer (SRE) to support and enhance a complex, multi-cloud Kubernetes platform environment.

Requirements

Operate and enhance Kubernetes platforms across AWS, Azure, and on-premise environments.
Lead incident response, problem management, and root cause analysis activities.
Deliver cluster lifecycle management including upgrades, patching, node pool management, CNI and CSI configuration, ingress management, and Rancher operations.
Own observability strategy including dashboards, alerting, monitoring, and definition of SLOs and SLIs.
Implement GitOps practices using Fleet and reduce operational toil through automation and governance.
Apply secure API gateway and Web Application Firewall (WAF) patterns.
Design and support distributed systems including event brokers and asynchronous messaging architectures.
Maintain platform security posture including CVE remediation, GRC controls, and security scanning pipelines.
Provision and manage infrastructure using Terraform and Crossplane as orchestration layers.
Implement and maintain CI/CD pipelines using Concourse, GitHub Actions, and Azure DevOps.
Ensure compliance with PCI DSS and GDPR security patterns.
Deep expertise in Kubernetes, Rancher, GitOps, Linux, and cloud networking.
Strong experience operating in hybrid cloud environments across AWS, Azure, and on-premise platforms.
Strong automation and scripting skills in Python, Go, Bash, PowerShell, or .NET.
Proven experience with Infrastructure as Code using Terraform and Crossplane.
Experience implementing and managing observability tooling including Grafana, Prometheus, Jaeger or Tempo, CloudWatch, Loki, and OpenTelemetry.
Strong understanding of API gateway and Web Application Firewall patterns.
Experience working with distributed systems and event-driven architectures.
Experience operating within regulated environments including PCI DSS and GDPR.
Knowledge of service mesh technologies such as Istio or Kuma is desirable.
AWS operational experience is advantageous.
Experience within payments or other regulated industries is beneficial.

Job Details

About HelloKindred