Our team’s mission is to explore, develop, and help productionize high-performance software and hardware technologies for AI at datacenter scale.
Responsibilities
- Explore, co-design, and optimize parallelism strategies, compute efficiency, and distributed training/inference paradigms and algorithms to improve the scalability, efficiency, and reliability of large-scale training and inference systems.
- Innovate and co-design novel model architectures for sustained scaling and hardware efficiency during training and inference.
- Benchmark, analyze, model, and project the performance of AI workloads across a wide range of what-if scenarios, providing early input to the design of future hardware, models, and runtimes, and crucial feedback to the architecture, compiler, kernel, modeling, and runtime teams.
- Explore, co-design, and productionize model compression techniques such as quantization, pruning, distillation, and sparsity to improve training and inference efficiency.
- Explore, prototype, and productionize highly optimized ML kernels to unlock the full potential of current and future accelerators for Meta’s AI workloads.
- Optimize inference and training communications performance at scale and investigate improvements to algorithms, tooling, and interfaces, working across multiple accelerator types and HPC collective communication libraries such as NCCL, RCCL, UCC and MPI.
- Guide Meta’s AI hardware requirements and design, focusing on performance at the system and silicon levels.
Benefits