Sequence Parallel AllGather Attention

Fused sequence-parallel (SP) AllGather (AG) + attention kernels.
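To make the computation concrete, the following is a minimal, unfused reference sketch in pure Python. It assumes that each rank holds one sequence shard of Q, K and V, that K and V are the tensors being gathered, and that each rank then attends with its local Q shard. It handles a single head, omits batching and causal masking, and simulates ranks with plain lists. The names `all_gather`, `attention` and `sp_ag_attention_reference` are illustrative and are not part of the library; the real kernels presumably overlap the gather with the attention math rather than running the two steps back to back.

```python
import math

def all_gather(shards):
    """Simulate an AllGather: concatenate every rank's sequence shard in rank order."""
    return [row for shard in shards for row in shard]

def attention(q, k, v):
    """Plain softmax attention for one head: q is [Sq][D], k and v are [Sk][D]."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v_row[d] for w, v_row in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out

def sp_ag_attention_reference(rank, q_shards, k_shards, v_shards):
    """Unfused reference (assumed semantics): gather K/V, attend with the local Q shard."""
    k_full = all_gather(k_shards)
    v_full = all_gather(v_shards)
    return attention(q_shards[rank], k_full, v_full)

# Two simulated ranks, two tokens per rank, head_dim = 2.
q_shards = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, -0.5]]]
k_shards = [[[0.2, 0.1], [0.4, 0.3]], [[0.6, 0.5], [0.8, 0.7]]]
v_shards = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

out_rank0 = sp_ag_attention_reference(0, q_shards, k_shards, v_shards)
```

Each rank's output covers only its own query tokens, but every output row depends on the keys and values of all ranks, which is why the gather is needed before (or interleaved with) the attention.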

API Reference

Intra-Node

fused_sp_ag_attn_intra_node(...)

Fused SP AllGather + attention for ranks that communicate within a single node.

create_sp_ag_attention_context_intra_node(...)

Creates the context for intra-node SP AG attention.

Inter-Node

fused_sp_ag_attn_inter_node(...)

Fused SP AllGather + attention for ranks that communicate across multiple nodes.

create_sp_ag_attention_context_inter_node(...)

Creates the context for inter-node SP AG attention.
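The choice between the intra-node and inter-node entry points depends on the job layout. A small sketch of that decision, assuming the intra-node variant applies whenever all ranks fit on one node: `pick_sp_ag_variant` is a hypothetical helper, not a library function, and it only returns the name of the documented entry point because the call signatures are not shown here.

```python
def pick_sp_ag_variant(world_size, local_world_size):
    """Return the name of the documented entry point for a given job layout.

    Hypothetical helper: `world_size` is the total number of ranks and
    `local_world_size` is the number of ranks per node.
    """
    if world_size <= 0 or local_world_size <= 0:
        raise ValueError("world sizes must be positive")
    if world_size <= local_world_size:
        # Every rank lives on the same node.
        return "fused_sp_ag_attn_intra_node"
    # Ranks span more than one node.
    return "fused_sp_ag_attn_inter_node"

single_node = pick_sp_ag_variant(world_size=8, local_world_size=8)
two_nodes = pick_sp_ag_variant(world_size=16, local_world_size=8)
```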

Example Usage

# Test SP AG Attention (intra-node)
bash scripts/launch.sh python/triton_dist/test/nvidia/test_sp_ag_attention_intra_node.py \
    --batch_size 1 --q_head 32 --kv_head 32 --max_seqlen_q 8192 --max_seqlen_k 8192 --head_dim 128
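
How the test script partitions the sequence across ranks is not shown here. As a rough sketch, assuming an even split of the 8192-token sequence across 8 ranks (the rank count is an assumption, not something the command above specifies), the token range owned by each rank could be computed as follows. `shard_bounds` is a hypothetical helper.

```python
def shard_bounds(seq_len, world_size, rank):
    """Return the [start, end) token range for `rank` under an even split (assumed layout)."""
    if seq_len % world_size != 0:
        raise ValueError("seq_len must be divisible by world_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    shard = seq_len // world_size
    return rank * shard, (rank + 1) * shard

# --max_seqlen_q 8192 from the command above; 8 ranks is an assumed world size.
bounds = [shard_bounds(8192, 8, rank) for rank in range(8)]
```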