The Technical Infrastructure SRE team manages our entire infrastructure and the applications that run on it. Its mission is to ensure that all production systems can support our fast-growing worldwide user base while remaining stable, efficient, and cost-effective.
Requirements
- Manage deployments, system capacity, traffic scheduling, fault tolerance, disaster recovery, emergency response, automation, and the development of operations platforms.
- Ensure the stability (high availability and reliability) of the company's core infrastructure, focus on system performance and capacity, and establish O&M (Operations & Maintenance) standards and SOPs.
- Troubleshoot and locate technical issues, and collaborate with the technical team to develop and implement strategies for capacity planning, performance testing, anomaly analysis, and fault diagnosis and resolution.
- Design and implement O&M platforms to achieve efficient, automated, and intelligent system maintenance.
- Develop delivery standards for large-scale production systems, covering budgeting, resource delivery, and capacity assessment of online systems, to help the company optimize IT costs.
Benefits