Welcome to Triton-distributed’s documentation!
Triton-distributed is a distributed compiler, based on OpenAI Triton, designed for computation-communication overlapping.
Using Triton-distributed, programmers can develop efficient kernels whose performance is comparable to that of highly optimized libraries (including Distributed-GEMM and FLUX). Triton-distributed currently targets mainly NVIDIA and AMD GPUs, and it can also be ported to other hardware platforms. Feel free to contact us if you want to use Triton-distributed on your own hardware.
Getting Started
Follow the build instructions for your platform of choice.
Take a look at the tutorials to learn how to write your first Triton-distributed program.
Explore our end-to-end integration to learn how Triton-distributed accelerates inference for real-world LLMs.
Try our megakernel implementations to learn how Triton-distributed accelerates inference for real-world LLMs.
Learn how to run all tests to verify your installation.
Distributed Kernels
Triton-distributed provides optimized distributed kernels for both NVIDIA and AMD GPUs:
Layer Abstractions
High-level layer abstractions for easier model integration:
Model Implementations
End-to-end model implementations with distributed inference support: