Low-Latency AllGather Layer
High-level layer for low-latency AllGather operations.
Description
This layer provides an AllGather optimized for small message sizes, where per-message latency, rather than link bandwidth, dominates the total cost.
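The sketch below is a minimal, single-process illustration of what AllGather computes: every rank contributes an equal-sized buffer, and every rank ends up with all contributions concatenated in rank order. It also simulates a "one-shot push" strategy, in which each rank writes its chunk directly into slot `r` of every peer's output in a single step instead of forwarding data around a ring over `world_size - 1` steps. One-shot push is a common low-latency technique, but it is an assumption here, not a description of this layer's actual algorithm. The function names `all_gather_reference` and `one_shot_push_all_gather` are hypothetical and are not part of the triton_dist API.

```python
from typing import List, Optional, Sequence


def all_gather_reference(per_rank_inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Reference semantics of AllGather (hypothetical helper, not triton_dist API).

    per_rank_inputs[r] is the buffer contributed by rank r. Every rank receives
    the concatenation of all contributions, ordered by rank.
    """
    chunk = len(per_rank_inputs[0])
    if any(len(buf) != chunk for buf in per_rank_inputs):
        raise ValueError("AllGather expects equal-sized contributions from every rank")
    gathered: List[float] = []
    for rank_buf in per_rank_inputs:
        gathered.extend(rank_buf)
    # Each rank gets its own copy of the identical gathered buffer.
    return [list(gathered) for _ in per_rank_inputs]


def one_shot_push_all_gather(per_rank_inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Simulated one-shot push (assumed strategy, for illustration only).

    Each source rank writes its chunk straight into slot `src` of every
    destination rank's output, so the data movement is a single logical step.
    """
    world_size = len(per_rank_inputs)
    chunk = len(per_rank_inputs[0])
    outputs: List[List[Optional[float]]] = [
        [None] * (world_size * chunk) for _ in range(world_size)
    ]
    for src in range(world_size):
        for dst in range(world_size):
            # On real hardware these writes would be issued concurrently.
            outputs[dst][src * chunk:(src + 1) * chunk] = list(per_rank_inputs[src])
    return outputs  # type: ignore[return-value]


if __name__ == "__main__":
    inputs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # 3 ranks, 2 elements each
    result = one_shot_push_all_gather(inputs)
    assert result == all_gather_reference(inputs)
    print(result[0])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

The sketch only models data placement. A real multi-GPU implementation would also need cross-rank synchronization so that each rank knows when its peers' chunks have arrived, which this single-process simulation does not capture.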
See python/triton_dist/layers/nvidia/low_latency_allgather_layer.py for implementation details.