We are looking for an Infrastructure Engineer to help us design, build, and scale the foundational architecture that powers our next-generation AI systems. This role is ideal for someone who thrives in a fast-paced, engineering-driven environment and finds joy in creating robust, elegant systems from scratch.
Responsibilities
- Build and maintain stable, scalable, and highly available compute infrastructure spanning both cloud (AWS) and bare-metal environments.
- Design and operate efficient storage solutions for large-scale AI training datasets and checkpoints.
- Develop high-performance online inference systems optimized for a range of GPU types (e.g., H100, B200).
- Automate infrastructure workflows to improve reliability, observability, and performance across our platform.
- Collaborate closely with AI researchers and backend engineers to support evolving model deployment and experimentation needs.
- Lead and contribute to internal tooling, CI/CD pipelines (e.g., GitHub Actions), and monitoring infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).
Benefits
- Stock options available for core team members.
- 401(k) plan for employees.
- Comprehensive health, dental, and vision insurance.
- Top-of-the-line office equipment.