Ulysses Sequence Parallelism

Communication kernels for Ulysses-style sequence parallelism (SP).

API Reference

create_ulysses_sp_pre_attn_comm_context(...)

Creates a context for the Ulysses SP pre-attention communication.
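To show what the pre-attention communication computes, the following single-process sketch simulates the Ulysses all-to-all in pure Python. Before attention, each of P ranks holds S/P tokens for all H heads. Afterwards, each rank holds all S tokens for H/P heads. The function name `ulysses_pre_attn_all_to_all` and the nested-list layout are hypothetical illustrations. They are not the triton_dist API, whose arguments to `create_ulysses_sp_pre_attn_comm_context` are elided above. The sketch also ignores GQA, where K/V have fewer heads than Q.

```python
from typing import List

# Hypothetical layout: a shard is a list of tokens, each token a list of per-head values.
Shard = List[List[int]]


def ulysses_pre_attn_all_to_all(shards: List[Shard], num_heads: int) -> List[Shard]:
    """Simulate the sequence-sharded -> head-sharded exchange (illustration only)."""
    world = len(shards)
    assert num_heads % world == 0, "heads must split evenly across ranks"
    heads_per_rank = num_heads // world
    out: List[Shard] = []
    for dst in range(world):
        h0 = dst * heads_per_rank
        gathered: Shard = []
        # Rank `dst` receives, from every source rank in rank order,
        # that rank's local tokens restricted to the heads `dst` owns.
        for src in range(world):
            for token in shards[src]:
                gathered.append(token[h0:h0 + heads_per_rank])
        out.append(gathered)
    return out


# Usage: S = 4 tokens, H = 4 heads, P = 2 ranks; value encodes (token, head).
x = [[s * 100 + h for h in range(4)] for s in range(4)]
shards = [x[0:2], x[2:4]]  # rank r starts with tokens [2r, 2r + 2)
out = ulysses_pre_attn_all_to_all(shards, num_heads=4)
# out[r] now holds all 4 tokens for heads [2r, 2r + 2).
```

Because the received chunks are concatenated in source-rank order, each rank ends up with the full sequence in its original token order. Attention can then run locally per head.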

All-to-All Single GEMM

create_all_to_all_single_gemm_context(...)

Creates a context for the fused All-to-All + GEMM operation.

all_to_all_single_gemm(...)

Fused All-to-All + GEMM operation.
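The reference semantics of an unfused All-to-All followed by a GEMM can be sketched as follows. This assumes the exchange uses equal splits along the first dimension, as `torch.distributed.all_to_all_single` does by default, and that the exchange precedes the GEMM. The helper names `all_to_all_single_ref`, `matmul` and `all_to_all_then_gemm` are hypothetical. The real signature of `all_to_all_single_gemm` is elided above. A fused kernel would overlap the exchange with GEMM tiles instead of running the two steps back to back, but it should produce the same result.

```python
from typing import List

Matrix = List[List[float]]


def all_to_all_single_ref(inputs: List[Matrix]) -> List[Matrix]:
    """Equal-split exchange: rank `src` sends row-chunk `dst` of its input to rank `dst`;
    each output is the concatenation of received chunks in source-rank order."""
    world = len(inputs)
    outputs: List[Matrix] = []
    for dst in range(world):
        rows: Matrix = []
        for src in range(world):
            n = len(inputs[src])
            assert n % world == 0, "rows must split evenly across ranks"
            chunk = n // world
            rows.extend(inputs[src][dst * chunk:(dst + 1) * chunk])
        outputs.append(rows)
    return outputs


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def all_to_all_then_gemm(inputs: List[Matrix], weight: Matrix) -> List[Matrix]:
    """Unfused reference: exchange first, then each rank multiplies by the shared weight."""
    return [matmul(x, weight) for x in all_to_all_single_ref(inputs)]


# Usage: 2 ranks, each holding a 2x2 input; weight is shared across ranks.
inputs = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
weight = [[1, 1], [0, 1]]
result = all_to_all_then_gemm(inputs, weight)
```

A reference like this is useful mainly as a correctness oracle when checking a fused kernel's output on small inputs.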

Example Usage

# Test Ulysses SP Dispatch
bash scripts/launch.sh python/triton_dist/test/nvidia/test_ulysses_sp_dispatch.py 1 8000 32 128 --gqa 8 --check