We're looking for a Head of Infrastructure to lead the design, evolution, and reliability of Hyperbolic’s globally distributed GPU cloud. This role sits at the center of our mission: you will architect and scale the systems that power our peer-to-peer GPU marketplace, inference fabric, and core platform primitives.
Requirements
- 10+ years in infrastructure, systems engineering, or distributed systems, including 5+ years leading managers and senior ICs.
- Proven ability to own multi-year infrastructure roadmaps, align stakeholders, and translate ambiguous requirements into crisp technical direction.
- Experience building, scaling, and mentoring high-performing engineering orgs across infrastructure, platform, and SRE disciplines.
- Exceptional judgment in balancing velocity with reliability, cost, and security.
- Comfort working in fast-moving, high-stakes environments where infrastructure is the product.
- Deep expertise in distributed systems, operating systems internals, networking, and resource orchestration.
- Hands-on experience with container orchestration and workload scheduling systems (Kubernetes, Nomad, SLURM, custom schedulers) at global scale.
- Strong engineering background with the ability to read and write production code (Go, Rust, Python, or similar).
- Experience architecting multi-cloud, on-prem, and edge topologies, including GPU-centric workloads.
- Expert-level understanding of infrastructure-as-code, automation frameworks, and GitOps workflows.
- Expertise in designing observability systems (metrics, tracing, logging, alerting) and driving operational excellence.
- A track record of owning 99.9–99.99% uptime targets, incident response processes, and resilience engineering.
- Passion for security-first infrastructure, including workload isolation, network security, IAM, hardening, and compliance.
- Experience leading major capacity planning, load forecasting, and cost optimization initiatives.
Benefits
- Equity
- Health
- Remote policy
- Hardware budget
- Offsites