Welcome to Triton-distributed’s documentation!

Triton-distributed is a distributed compiler, based on OpenAI Triton, designed for computation-communication overlapping.

Using Triton-distributed, programmers can develop efficient kernels with performance comparable to highly optimized libraries (including Distributed-GEMM and FLUX). Triton-distributed currently targets mainly NVIDIA and AMD GPUs, and it can also be ported to other hardware platforms. Feel free to contact us if you want to use Triton-distributed on your own hardware.

Getting Started

Follow the build instructions for your platform of choice.
Take a look at the tutorials to learn how to write your first Triton-distributed program.
Explore our end-to-end integration to learn how Triton-distributed accelerates inference for real-world LLMs.
Try our megakernel implementations to learn how Triton-distributed accelerates inference for real-world LLMs.
Learn how to run all tests to verify your installation.