We're looking for a highly skilled and motivated Site Reliability Engineer to join the Platform SRE team at Trimble. This hybrid role is crucial for ensuring the reliability, scalability, and performance of our cloud-based platforms on Azure while also contributing to the design and implementation of new cloud infrastructure.
Responsibilities
- Monitor and maintain the health and performance of production services and applications using New Relic.
- Respond to and resolve incidents, troubleshoot complex system issues, and perform root cause analysis.
- Implement and improve monitoring, alerting, and logging systems to proactively identify problems.
- Automate repetitive tasks and operational processes using PowerShell and Bash scripting.
- Manage on-call rotations and incident response protocols.
- Design, build, and maintain secure and scalable cloud infrastructure on Azure.
- Develop and manage infrastructure as code (IaC) using Terraform.
- Implement and optimize CI/CD pipelines using GitHub Actions and Azure DevOps Pipelines to facilitate rapid and reliable software deployments.
- Manage our Kubernetes clusters and services to ensure optimal performance and uptime.
- Enforce cloud security best practices, including management of IAM roles, network security groups, and encryption.
- Collaborate with software development teams to optimize applications for the cloud and provide expertise on cloud architecture.
Benefits
- Comprehensive benefits, including medical, dental, vision, life, and disability coverage, time-off plans, and retirement plans
- Paid Parental Leave
- Employee Stock Purchase Plan