NVIDIA Layers
High-level layer abstractions for NVIDIA GPUs.
Layer List
| Layer | Description |
|---|---|
| | Tensor Parallel Attention layer |
| | Tensor Parallel MLP layer |
| | Tensor Parallel MoE layer |
| | Sequence Parallel Flash Decode layer |
| | Expert Parallelism All-to-All layer |
| | EP All-to-All fused layer (megakernel with token optimization) |
| | Low-Latency Expert Parallelism All-to-All layer |
| | GEMM + AllReduce layer |
| | Low-Latency AllGather layer |
| | Ulysses SP All-to-All layer |
| | Pipeline Parallel block |