GEMM AllReduce (AMD)

Kernel for AMD GPUs that combines a matrix multiplication (GEMM) with an AllReduce of the GEMM output across devices.

See the gemm_allreduce.py module in python/triton_dist/kernels/amd/ for implementation details.
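
As a rough illustration of what a GEMM + AllReduce computes, the sketch below simulates several ranks inside one process using plain Python lists. It is a reference for the math only and does not use or mirror the API in gemm_allreduce.py. It assumes the common tensor-parallel layout in which each rank holds a slice of the shared K dimension of A and B; the source does not confirm this layout, and the names matmul, all_reduce_sum, and gemm_allreduce_reference are hypothetical helpers introduced here.

```python
# Reference semantics only: ranks are simulated in one process with lists.
# Assumption (not confirmed by the source): A and B are sharded along K,
# so each rank produces a partial (M x N) product that must be summed.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both lists of rows."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def all_reduce_sum(partials):
    """Element-wise sum of per-rank matrices; every rank gets its own copy."""
    m, n = len(partials[0]), len(partials[0][0])
    total = [[sum(p[i][j] for p in partials) for j in range(n)]
             for i in range(m)]
    return [[row[:] for row in total] for _ in partials]

def gemm_allreduce_reference(a_shards, b_shards):
    """Local GEMM on each simulated rank, then a sum AllReduce of the outputs."""
    partials = [matmul(a, b) for a, b in zip(a_shards, b_shards)]
    return all_reduce_sum(partials)

# Full operands: A is 2 x 4, B is 4 x 2.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [1, 1], [2, 0]]

# Split K = 4 across two simulated ranks: columns of A, rows of B.
a_shards = [[row[:2] for row in A], [row[2:] for row in A]]
b_shards = [B[:2], B[2:]]

outputs = gemm_allreduce_reference(a_shards, b_shards)
print(outputs[0])
```

After the reduction, every simulated rank holds the same matrix, equal to the product of the unsharded A and B. A real kernel would run the local GEMM on each GPU and exchange partial results over the interconnect instead of summing lists in one process.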