Qwen MoE Model

Qwen MoE (Mixture of Experts) model implementation for distributed inference.

Description

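The Qwen MoE module implements Qwen MoE models for distributed inference. It supports tensor parallelism, where individual weight matrices are sharded across GPUs, and expert parallelism, where different experts are placed on different GPUs.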
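To make expert parallelism concrete, the sketch below shows the routing step in plain Python. Each token picks its top-k experts from a softmax over router logits, and each (token, expert) pair is sent to the rank that holds that expert. This is not the Triton-distributed implementation. The contiguous expert-to-rank layout, the renormalization of the top-k weights, and the helper names (`top_k_route`, `expert_owner`, `dispatch`) are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k, normalize=True):
    """Pick the k highest-probability experts for one token (hypothetical helper).

    Renormalizing the selected weights to sum to 1 is an assumption; some
    MoE configs keep the raw softmax probabilities instead.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return top, weights


def expert_owner(expert_id, num_experts, ep_size):
    """Rank owning expert_id, assuming experts are split into contiguous equal blocks."""
    assert num_experts % ep_size == 0, "sketch assumes an even expert split"
    experts_per_rank = num_experts // ep_size
    return expert_id // experts_per_rank


def dispatch(all_logits, k, num_experts, ep_size):
    """Group (token_idx, expert_id, weight) triples by the EP rank that must run them."""
    buckets = defaultdict(list)
    for token_idx, logits in enumerate(all_logits):
        experts, weights = top_k_route(logits, k)
        for e, w in zip(experts, weights):
            buckets[expert_owner(e, num_experts, ep_size)].append((token_idx, e, w))
    return dict(buckets)


if __name__ == "__main__":
    # Two tokens, 8 experts spread over 4 EP ranks (2 experts per rank), top-2 routing.
    logits = [
        [0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0],  # token 0 -> experts 1 and 6
        [5.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0],  # token 1 -> experts 0 and 4
    ]
    buckets = dispatch(logits, k=2, num_experts=8, ep_size=4)
    print(sorted(buckets))  # prints [0, 2, 3]
```

In a real expert-parallel MoE layer, the per-rank buckets correspond to what an all-to-all dispatch sends between GPUs. After each rank runs its local experts, a combine step returns the outputs and sums them per token, scaled by the routing weights.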

Example Usage

# Run the end-to-end (E2E) MoE inference test with 4 processes per node
bash scripts/launch.sh --nproc_per_node=4 python/triton_dist/test/nvidia/test_e2e_inference.py \
    --bsz 4096 --gen_len 128 --max_length 150 --model <moe_model_path> --backend triton_dist