Nestmed is seeking a founding Backend Engineer to build and lead its LLM Orchestration team, deploying and managing LLMs at scale to improve patient care. The ideal candidate has 6+ years of backend engineering experience and expertise in ML infrastructure, model serving, and distributed training.
Requirements
- 6+ years of backend engineering experience building high-performance distributed systems
- Deep production experience with LLMs, including multi-provider orchestration, custom model serving, and building reliable inference infrastructure at scale
- Strong expertise in ML infrastructure, including model serving frameworks (TensorRT, vLLM, TorchServe), distributed training, and GPU optimization
- Experience with model evaluation and monitoring, including A/B testing frameworks, performance monitoring, and comprehensive observability for ML systems
- Proficiency in Python and ML frameworks, with hands-on experience in model fine-tuning, prompt engineering, and deploying custom models to production
- Track record of scaling ML systems, with experience optimizing inference costs, managing multiple model providers, and building reliable AI infrastructure
- Understanding of healthcare or other regulated industries, where model accuracy, auditability, and compliance are mission-critical requirements
- San Francisco-based and excited about working closely with AI researchers to productionize cutting-edge models for healthcare applications
Benefits
- Build the AI infrastructure that processes millions of patient interactions
- Directly impact care quality for thousands of patients daily
- See every optimization reduce healthcare costs, improve clinical accuracy, and enable new AI capabilities that transform patient outcomes