Cold start optimisations
Many models take several minutes to load the first time they are used. The Pipeline SDK offers several features that reduce the impact of this on your inference requests.
Common reasons for long cold start times (see the timing sketch after this list):
- Downloading model weights: download speeds are often unreliable, volatile, and slow (commonly from Hugging Face when using `transformers`)
- Initialising model weights on the GPU
- Loading libraries (`torch`, `transformers`, `diffusers`, `tensorflow`, etc.)
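To see where this time actually goes, you can time the two heavyweight phases separately. This is an illustrative local benchmark, not part of the Pipeline SDK:

```python
# Illustrative local benchmark of the two dominant cold-start phases.
# Timings vary heavily with network and hardware; not part of the SDK.
import time

import torch
from diffusers import StableDiffusionPipeline

start = time.perf_counter()
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
loaded = time.perf_counter()
print(f"Download + CPU init: {loaded - start:.1f}s")  # dominated by download on a cold cache

if torch.cuda.is_available():
    pipe = pipe.to(torch.device("cuda"))
    print(f"GPU weight transfer: {time.perf_counter() - loaded:.1f}s")
```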
The typical procedure before running inference is:
1. Start a VM instance in the cloud
2. Start the Docker container
3. Initialise the environment
4. Download the model weights
5. Load the model onto the GPU
6. Ready for inference
The next sections cover the features available for decreasing or removing cold start times from your pipelines.
Preemptive caching
Both Catalyst and a pcore deployment automate the optimisation of steps 1-3 of the typical procedure, while steps 4-5 are optimised in the Pipeline SDK when you create a pipeline. Catalyst and pcore deployments automatically try to load your models by looking at the current and historical traffic going to those models. This is called preemptive caching. To make your pipelines compatible with this, the Pipeline SDK provides two arguments on the `pipe` decorator, `on_startup` and `run_once`, which we cover shortly.
Here's an example of a basic StableDiffusion pipeline:
```python
from pathlib import Path
from typing import List

import torch
from diffusers import StableDiffusionPipeline

from pipeline import Pipeline, Variable, entity, pipe
from pipeline.cloud import compute_requirements, environments, pipelines
from pipeline.objects import File
from pipeline.objects.graph import InputField, InputSchema


@entity
class StableDiffusionModel:
    @pipe(on_startup=True, run_once=True)
    def load(self):
        model_id = "runwayml/stable-diffusion-v1-5"
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        self.pipe = StableDiffusionPipeline.from_pretrained(model_id)
        self.pipe = self.pipe.to(device)

    @pipe
    def predict(self, prompt: str) -> List[File]:
        images = self.pipe(prompt).images

        output_images = []
        for i, image in enumerate(images):
            path = Path(f"/tmp/sd/image-{i}.jpg")
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(path))
            output_images.append(File(path=path, allow_out_of_context_creation=True))

        return output_images


with Pipeline() as builder:
    prompt = Variable(str)

    model = StableDiffusionModel()
    model.load()

    output = model.predict(prompt)

    builder.output(output)
```
In the `load` method we see the two `on_startup` and `run_once` keyword arguments in use, while the rest of the Pipeline is built as normal in the `with Pipeline() as builder:` block.
The keyword arguments act as their names describe:
- `on_startup` - run the `pipe` when the Pipeline is first run, or when the Pipeline's `startup` function is called
- `run_once` - only run the function on the first call, and do not run it again
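To make the behaviour concrete, here's a minimal sketch of calling the example pipeline twice locally. The `builder.get_pipeline()` and `.run()` calls are assumptions for illustration; the exact invocation may differ in your SDK version:

```python
# Continuing from the StableDiffusion example above (illustrative only;
# get_pipeline()/run() are assumed here for a local execution of the graph).
my_pipeline = builder.get_pipeline()

my_pipeline.run("an astronaut riding a horse")  # first call: load() executes
my_pipeline.run("a watercolour fox")            # load() is skipped (run_once=True)
```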
Files and directories
Libraries such as `transformers` use public mirrors to download model weights, which can be unstable and unreliable. It's common to see very volatile and unpredictable download speeds, often doubling the cold start time of a model. To overcome this, you can fully bundle the model before uploading the pipeline, and then use the File and directory objects to store everything on Catalyst/pcore directly. Here's an example:
```python
from pipeline.objects import File


@entity
class StableDiffusionModel:
    ...

    @pipe(on_startup=True, run_once=True)
    def load(self, model_file: File):
        import dill

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        with model_file.path.open("rb") as file:
            self.pipe = dill.load(file)

        self.pipe = self.pipe.to(device)

    ...


my_model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

with Pipeline() as builder:
    prompt = Variable(str)

    my_file = File.from_object(my_model)  # Create a file object

    model = StableDiffusionModel()
    model.load(my_file)  # Pass the file object to the load function

    output = model.predict(prompt)

    builder.output(output)
```
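For intuition, `File.from_object` has to produce something that the `dill.load` call above can read back. A rough sketch of the idea (an illustration, not the SDK's actual implementation):

```python
# A rough sketch of what File.from_object conceptually does (illustrative
# only; the SDK's real implementation may differ). The output must
# round-trip with the dill.load call in load() above.
from pathlib import Path

import dill

from pipeline.objects import File


def file_from_object_sketch(obj, path: str = "/tmp/bundle/model.dill") -> File:
    file_path = Path(path)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with file_path.open("wb") as f:
        dill.dump(obj, f)  # serialise the whole model object
    return File(path=file_path, allow_out_of_context_creation=True)
```

Bundling this way trades a larger pipeline upload for predictable load times, since the weights are served from Catalyst/pcore storage instead of a public mirror.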