Tensor Parallel Attention (AMD)

A tensor-parallel attention layer for AMD GPUs: the attention computation is partitioned across the GPUs in a tensor-parallel group.

See python/triton_dist/layers/amd/tp_attn.py for implementation details.
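
As a rough illustration of the idea behind tensor-parallel attention, the sketch below simulates it on a single process in pure Python. It is not taken from tp_attn.py and does not use the triton_dist API. It assumes the common head-sharding scheme: each rank holds a column shard of the Q/K/V projection weights and the matching row shard of the output projection, and the partial outputs are summed across ranks. All names here (matmul, attention_heads, tp_attention, world_size, and so on) are hypothetical; the loop over ranks stands in for separate GPUs, and the final elementwise sum stands in for an all-reduce.

```python
import math
import random


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cols(m, start, stop):
    """Column slice m[:, start:stop]."""
    return [row[start:stop] for row in m]


def attention_heads(x, wq, wk, wv, head_dim):
    """Unmasked self-attention over the heads present in wq/wk/wv.

    Returns the per-head context vectors concatenated along the feature axis.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    n_heads = len(wq[0]) // head_dim
    out = [[] for _ in x]
    for h in range(n_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh, kh, vh = cols(q, lo, hi), cols(k, lo, hi), cols(v, lo, hi)
        scores = matmul(qh, [list(c) for c in zip(*kh)])  # qh @ kh^T
        probs = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        ctx = matmul(probs, vh)
        for out_row, ctx_row in zip(out, ctx):
            out_row.extend(ctx_row)
    return out


def full_attention(x, wq, wk, wv, wo, head_dim):
    """Reference: all heads computed in one place, then the output projection."""
    return matmul(attention_heads(x, wq, wk, wv, head_dim), wo)


def tp_attention(x, wq, wk, wv, wo, head_dim, world_size):
    """Simulated tensor-parallel attention (assumed head-sharding scheme)."""
    n_heads = len(wq[0]) // head_dim
    assert n_heads % world_size == 0, "heads must divide evenly across ranks"
    width = (n_heads // world_size) * head_dim
    partials = []
    for rank in range(world_size):  # each iteration plays the role of one GPU
        lo, hi = rank * width, (rank + 1) * width
        # Column shards of the Q/K/V weights: this rank's heads only.
        ctx = attention_heads(
            x, cols(wq, lo, hi), cols(wk, lo, hi), cols(wv, lo, hi), head_dim
        )
        # Row shard of the output projection gives a partial result.
        partials.append(matmul(ctx, wo[lo:hi]))
    # Elementwise sum of the partial outputs (stand-in for an all-reduce).
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


def rand_matrix(rows, n_cols, rng):
    """Random matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    seq, hidden, n_heads, head_dim = 3, 8, 4, 2
    x = rand_matrix(seq, hidden, rng)
    wq, wk, wv = (rand_matrix(hidden, n_heads * head_dim, rng) for _ in range(3))
    wo = rand_matrix(n_heads * head_dim, hidden, rng)
    ref = full_attention(x, wq, wk, wv, wo, head_dim)
    tp = tp_attention(x, wq, wk, wv, wo, head_dim, world_size=2)
    diff = max(abs(a - b) for r, t in zip(ref, tp) for a, b in zip(r, t))
    print("max abs difference vs. unsharded attention:", diff)
```

The sharded result agrees with the unsharded reference up to floating-point rounding because each head is computed independently, so only the output projection needs a cross-rank reduction. A real multi-GPU layer would replace the loop with one process per device and the final sum with a collective communication call.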