Getting started

Overview

Getting started
FAQ

API

Authentication
Inference
Pointers
Files
Inputs and outputs

Pipeline building

Quickstart
Pipeline building
Streaming
Fractional GPUs & GPU sharing
Using the README.md
Cold start optimisations
Migration guide
Entity objects
GPUs and accelerators
Troubleshooting
Migrating from other frameworks

Advanced features

CI/CD Integration
Teams
Turbo Registry

Cloud integration (BYOC)

Overview
Deploying Pipelines
Scaling configuration
Warmup & Cooldown
Walkthrough video

Tutorials

Llama 2 chat with vLLM (7B, 13B & multi-GPU 70B)
Deploy Mistral 7B with vLLM
Deploy Stable Diffusion
How to deploy a Hugging Face model
Deploy MusicGen model from AudioCraft
Deploy Mixtral 8x7B with ExLlamav2
How to reduce cold starts in ML models running in production
Transcribe YouTube videos
Deploy Gemma 7B with TensorRT-LLM
Deploy Llama 3 70B with ExLlamav2
Run ComfyUI as an API
