Low-Latency AllGather Layer

High-level layer for low-latency AllGather operations.

Description

This layer provides an AllGather optimized for small messages, where per-operation latency rather than bandwidth dominates the total cost.

See python/triton_dist/layers/nvidia/low_latency_allgather_layer.py for implementation details.
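
For readers unfamiliar with the collective, the following is a minimal pure-Python sketch of what an AllGather computes and why small messages favor a latency-oriented algorithm. It does not use or mirror the layer's actual API. The function names (`all_gather_reference`, `ring_cost`, `one_shot_cost`) and the alpha-beta cost model parameters are illustrative assumptions, not taken from the implementation file.

```python
from typing import List, Sequence


def all_gather_reference(per_rank_chunks: Sequence[Sequence[int]]) -> List[List[int]]:
    """Reference semantics only: every rank ends up with all chunks in rank order.

    Hypothetical helper for illustration; the real layer runs on GPUs.
    """
    gathered = [value for chunk in per_rank_chunks for value in chunk]
    # Each rank receives its own copy of the concatenated result.
    return [list(gathered) for _ in per_rank_chunks]


def ring_cost(num_ranks: int, chunk_bytes: int, alpha: float, beta: float) -> float:
    """Simplified alpha-beta estimate for a ring AllGather.

    The ring takes (num_ranks - 1) sequential steps, each paying the
    per-message latency alpha plus chunk_bytes * beta of transfer time.
    """
    return (num_ranks - 1) * (alpha + chunk_bytes * beta)


def one_shot_cost(num_ranks: int, chunk_bytes: int, alpha: float, beta: float) -> float:
    """Simplified estimate when each rank pushes its chunk to all peers at once.

    Assumes the sends overlap, so latency is paid once, while the sender
    still serializes (num_ranks - 1) copies of its chunk on its link.
    """
    return alpha + (num_ranks - 1) * chunk_bytes * beta


# Four ranks, two elements each.
chunks = [[0, 1], [10, 11], [20, 21], [30, 31]]
result = all_gather_reference(chunks)

# Assumed example parameters: 8 ranks, 1 KiB per rank,
# 5 microseconds latency, 1e-10 s/byte (about 10 GB/s).
ring = ring_cost(8, 1024, alpha=5e-6, beta=1e-10)
one_shot = one_shot_cost(8, 1024, alpha=5e-6, beta=1e-10)
```

Under this simplified model the ring pays the latency term once per step, so for small chunks the one-shot pattern comes out cheaper. Whether the layer uses this particular pattern is not stated here; see the implementation file for the actual algorithm.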