Tensor Parallel MoE

Tensor Parallel (TP) Mixture of Experts (MoE) layer for distributed inference.

Description

The TP MoE layer distributes MoE computation across multiple processes by combining expert routing with tensor parallelism.
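
To illustrate the idea, here is a minimal pure-Python sketch of one common way to combine routing with tensor parallelism. It is not this layer's implementation. It assumes top-1 routing, a ReLU feed-forward expert, a router replicated on every rank, and expert weights sharded along the intermediate dimension. All names (`moe_tensor_parallel`, `shard_cols`, `shard_rows`, and so on) are hypothetical, and the ranks and the all-reduce are emulated with a loop and a sum inside a single process.

```python
import random

def matmul(a, b):
    # Plain nested-list matrix product: [m, k] x [k, n] -> [m, n].
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def expert_ffn(x, w_up, w_down):
    # x: [tokens, hidden], w_up: [hidden, inter], w_down: [inter, hidden].
    return matmul(relu(matmul(x, w_up)), w_down)

def shard_cols(w, rank, world):
    # Column slice of the up projection owned by `rank`.
    step = len(w[0]) // world
    return [row[rank * step:(rank + 1) * step] for row in w]

def shard_rows(w, rank, world):
    # Row slice of the down projection owned by `rank`.
    step = len(w) // world
    return w[rank * step:(rank + 1) * step]

def route_top1(x, w_gate):
    # Each token goes to the expert with the largest gate logit.
    logits = matmul(x, w_gate)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

def moe_reference(x, w_gate, experts):
    # Unsharded baseline: every expert holds its full weights.
    choice = route_top1(x, w_gate)
    return [expert_ffn([tok], *experts[e])[0] for tok, e in zip(x, choice)]

def moe_tensor_parallel(x, w_gate, experts, world):
    choice = route_top1(x, w_gate)  # assumption: router is replicated on all ranks
    partials = []
    for rank in range(world):  # emulates `world` ranks in one process
        local = [(shard_cols(up, rank, world), shard_rows(down, rank, world))
                 for up, down in experts]
        partials.append([expert_ffn([tok], *local[e])[0] for tok, e in zip(x, choice)])
    # Emulated all-reduce: sum the partial outputs element-wise across ranks.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

random.seed(0)
hidden, inter, num_experts, world, tokens = 4, 8, 3, 4, 5

def rand(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

x = rand(tokens, hidden)
w_gate = rand(hidden, num_experts)
experts = [(rand(hidden, inter), rand(inter, hidden)) for _ in range(num_experts)]

ref = moe_reference(x, w_gate, experts)
tp = moe_tensor_parallel(x, w_gate, experts, world)
max_err = max(abs(a - b) for r, t in zip(ref, tp) for a, b in zip(r, t))
print("sharded output matches reference:", max_err < 1e-9)
```

Because ReLU is element-wise, each rank can apply it to its own slice of the intermediate activations, so the summed partial outputs equal the unsharded result up to floating-point rounding.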

Example Usage

# Run the TP MoE test with 4 processes per node.
# Replace <moe_model_path> with the path to an MoE model.
bash scripts/launch.sh --nproc_per_node=4 python/triton_dist/test/nvidia/test_tp_moe.py \
    --bsz 32 --seq_len 128 --model <moe_model_path>