Tensor Parallel MLP (AMD)
=========================

Tensor Parallel MLP layer for AMD GPUs. See ``python/triton_dist/layers/amd/tp_mlp.py`` for implementation details.
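
For orientation, the sketch below shows the general idea behind a tensor-parallel MLP using plain NumPy on a single process. It is not the API or the algorithm in ``tp_mlp.py``. The function names, the two-matrix layout, the GELU activation, and the use of a Python ``sum`` in place of a cross-GPU reduction are all assumptions made for illustration.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU; the activation choice is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def dense_mlp(x, w_up, w_down):
    # Unsharded reference: up projection, activation, down projection
    return gelu(x @ w_up) @ w_down


def tp_mlp_sketch(x, w_up, w_down, world_size):
    # Hypothetical helper: simulates `world_size` ranks in one process.
    # Each rank holds a column slice of w_up and the matching row slice of w_down.
    up_shards = np.split(w_up, world_size, axis=1)
    down_shards = np.split(w_down, world_size, axis=0)

    # Each rank computes a partial output from its own shards only.
    # The activation is elementwise, so it can be applied per shard.
    partials = [gelu(x @ u) @ d for u, d in zip(up_shards, down_shards)]

    # Summing the partials stands in for the reduction across ranks
    # (all-reduce or reduce-scatter in a real multi-GPU setup).
    return sum(partials)
```

Under these assumptions the sharded result matches ``dense_mlp`` up to floating-point rounding, because splitting the hidden dimension only changes the order in which the down projection's sum is accumulated.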