Welcome to Triton-distributed’s documentation!

Triton-distributed is a distributed compiler for computation-communication overlapping, built on OpenAI Triton.

Using Triton-distributed, programmers can develop efficient kernels whose performance is comparable to highly optimized libraries (including Distributed-GEMM and FLUX). Triton-distributed currently targets mainly NVIDIA and AMD GPUs, and it can also be ported to other hardware platforms. Feel free to contact us if you want to use Triton-distributed on your own hardware.
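To see why computation-communication overlapping pays off, the following toy timing model may help. It is a minimal sketch under stated assumptions, not Triton-distributed's implementation or API: work is split into chunks, one communication engine and one compute engine each process chunks in order, and a chunk can only be computed after its own data has arrived. The helper names `pipeline_time` and `sequential_time` and the per-chunk durations are hypothetical, chosen only for illustration.

```python
def pipeline_time(comm, comp):
    """Idealized makespan when communication of chunk i+1 overlaps compute of chunk i.

    Assumption (toy model): one communication engine and one compute engine,
    each handling chunks in order; a chunk's compute waits for its own data.
    """
    assert len(comm) == len(comp)
    comm_end = 0  # time at which the latest chunk's data has arrived
    comp_end = 0  # time at which the compute engine becomes free
    for comm_t, comp_t in zip(comm, comp):
        comm_end += comm_t  # communication streams chunks back to back
        # Compute starts once both the data and the compute engine are ready.
        comp_end = max(comp_end, comm_end) + comp_t
    return comp_end


def sequential_time(comm, comp):
    """Makespan when every communication step blocks computation (no overlap)."""
    return sum(comm) + sum(comp)


# Hypothetical per-chunk durations in arbitrary time units.
comm = [2, 2, 2, 2]
comp = [3, 3, 3, 3]
print(sequential_time(comm, comp))  # 20
print(pipeline_time(comm, comp))    # 14
```

In this model only the first chunk's communication remains exposed; the rest is hidden behind compute, which is the effect overlapped kernels aim for on real hardware.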

Getting Started

Distributed Kernels

Triton-distributed provides optimized distributed kernels for both NVIDIA and AMD GPUs:

Layer Abstractions

High-level layer abstractions for easier model integration:

Model Implementations

End-to-end model implementations with distributed inference support:

Python API