Dense Model =========== Dense transformer model implementation for distributed inference. Description ----------- The dense model module provides a complete implementation of dense transformer models (e.g., Qwen, LLaMA) with tensor parallelism support. Features -------- - Tensor Parallel attention and MLP - Multiple parallelism modes (ag_rs, allreduce, gemm_ar) - KV cache support for efficient decoding Example Usage ------------- .. code-block:: bash # End-to-end inference test bash scripts/launch.sh python/triton_dist/test/nvidia/test_e2e_inference.py \ --bsz 4096 --gen_len 128 --max_length 150 --model --backend triton_dist