NVIDIA Layers

High-level layer abstractions for NVIDIA GPUs.

Layer List

Available NVIDIA Layers

Layer                                      Description
-----------------------------------------  --------------------------------------------------------------
Tensor Parallel Attention                  Tensor Parallel Attention layer
Tensor Parallel MLP                        Tensor Parallel MLP layer
Tensor Parallel MoE                        Tensor Parallel MoE layer
Sequence Parallel Flash Decode             Sequence Parallel Flash Decode layer
Expert Parallelism All-to-All Layer        Expert Parallelism All-to-All layer
Expert Parallelism All-to-All Fused Layer  EP All-to-All fused layer (megakernel with token optimization)
Low-Latency EP All-to-All Layer            Low-Latency Expert Parallelism All-to-All layer
GEMM AllReduce Layer                       GEMM + AllReduce layer
Low-Latency AllGather Layer                Low-Latency AllGather layer
Ulysses SP All-to-All Layer                Ulysses SP All-to-All layer
Pipeline Parallel Block                    Pipeline Parallel block
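
To make the "GEMM + AllReduce" entry above more concrete, here is a minimal single-process sketch of the general pattern that name usually refers to: each rank multiplies its own slice of the reduction dimension, and an AllReduce sums the partial results. This is an illustration only and not this library's API. The function names (`matmul`, `all_reduce_sum`, `gemm_allreduce`) and the `world_size` parameter are hypothetical, ranks are simulated with a plain loop, and no GPU or communication library is involved.

```python
# Illustrative sketch only: simulates ranks in one process with plain Python lists.
# All names here are hypothetical and are not taken from the library.

def matmul(a, b):
    """Plain-Python GEMM: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def all_reduce_sum(partials):
    """Simulated AllReduce: element-wise sum of every rank's partial result."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)] for i in range(rows)]

def gemm_allreduce(x, w, world_size):
    """Split the shared K dimension across ranks, run a local GEMM per rank, then reduce."""
    k = len(w)
    assert k % world_size == 0, "K must divide evenly across ranks in this sketch"
    shard = k // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        x_shard = [row[lo:hi] for row in x]  # columns of x owned by this rank
        w_shard = w[lo:hi]                   # matching rows of w
        partials.append(matmul(x_shard, w_shard))
    return all_reduce_sum(partials)

x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0], [0, 1], [1, 1], [2, 0]]

# The sharded result matches the unsharded GEMM.
assert gemm_allreduce(x, w, world_size=2) == matmul(x, w)
```

In a real multi-GPU setting the loop over ranks would be replaced by one process per device, and the reduction by a collective call. A fused layer would typically aim to overlap that communication with the GEMM rather than run the two steps back to back as this sketch does.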