AI Operations Platform Consultant position at ELEVI, focused on deploying, managing, and operating containerized services and LLMs on Kubernetes. The role involves managing MLOps/LLMOps pipelines, setting up monitoring for AI inference services, and deploying models in production environments.
Requirements
- Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes (OpenShift) for mission-critical applications
- Experience deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference Server
- Managing MLOps/LLMOps pipelines and using TensorRT-LLM and Triton Inference Server to deploy inference services in production
- Setting up and operating monitoring of AI inference services for performance and availability
- Experience deploying and troubleshooting LLMs on a containerized platform, including monitoring, load balancing, etc.
- Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
- Managing scalable infrastructure for deploying and managing LLMs
- Deploying models in production environments, including containerization, microservices, and API design
- Knowledge of Triton Inference Server, including its architecture, configuration, and deployment
- Model optimization techniques using Triton with TensorRT-LLM
- Model optimization techniques, including pruning, quantization, and knowledge distillation
Benefits
- Healthcare
- Wellness
- Financial
- Retirement
- Family support
- Continuing education
- Time off benefits