How to deploy a Hugging Face model
This guide outlines how to deploy a pretrained Hugging Face transformers model. The example uses a text-classification model for sentiment analysis.
Initialise a new pipeline
In an empty directory, you can initialise a new project by running the following command:
pipeline container init
You should see two new files appear, my_new_pipeline.py and pipeline.yaml, with some initial content that we will update later. The .py file contains the code that is executed at runtime, while the .yaml file is the configuration file for the pipeline.
The pipeline graph
Substitute the following code into my_new_pipeline.py:
import numpy as np
import preprocessor
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
from pipeline import Pipeline, Variable, entity, pipe
def softmax(x):
    """Used to convert raw model output scores into probabilities."""
    # Subtract the maximum before exponentiating to avoid overflow.
    x_max = np.amax(x, keepdims=True)
    exp_x_shifted = np.exp(x - x_max)
    return exp_x_shifted / np.sum(exp_x_shifted, keepdims=True)
#: The HuggingFace source model
HF_MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment-latest"
@entity
class RobertaPipeline:
    def __init__(self) -> None:
        self.tokenizer = None
        self.config = None
        self.model = None
        self.device = None

    @pipe(on_startup=True, run_once=True)
    def load(self) -> None:
        """Load the model, tokenizer and config"""
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = AutoModelForSequenceClassification.from_pretrained(
            HF_MODEL_NAME
        ).to(self.device)
        self.tokenizer = AutoTokenizer.from_pretrained(HF_MODEL_NAME)
        # Used in postprocessing to map IDs to labels
        self.config = AutoConfig.from_pretrained(HF_MODEL_NAME)

    @pipe
    def preprocess(self, raw_text: str) -> str:
        """Preprocess the input text by filtering out unwanted strings.

        For further details, see https://github.com/s/preprocessor"""
        options = [
            preprocessor.OPT.URL,
            preprocessor.OPT.MENTION,
            preprocessor.OPT.ESCAPE_CHAR,
            preprocessor.OPT.RESERVED,
        ]
        preprocessor.set_options(*options)
        return preprocessor.clean(raw_text)

    @pipe
    def predict(self, input_text: str) -> list[float]:
        """Tokenize the input and feed it to the model"""
        encoded_input = self.tokenizer(input_text, return_tensors="pt").to(self.device)
        output = self.model(**encoded_input)
        # Detach the scores from the computation graph and convert them into a NumPy array.
        scores = output[0][0].detach().cpu().numpy()
        return scores

    @pipe
    def postprocess(self, scores: list[float]) -> list[dict[str, float]]:
        """The raw scores from the model are passed through the softmax
        function to convert them into probabilities.

        The final output represents the model's confidence for each class
        (positive, negative, neutral)."""
        probabilities = softmax(scores)
        # Sort class indices from most to least likely.
        ranking = np.argsort(probabilities)
        ranking = ranking[::-1]
        result = [
            dict(
                label=self.config.id2label[ranking[i]],
                score=np.round(float(probabilities[ranking[i]]), 4),
            )
            for i in range(probabilities.shape[0])
        ]
        return result
#: Define the computational graph for the pipeline
with Pipeline() as builder:
    input_text = Variable(
        str,
        title="input_text",
        description="The text that sentiment analysis will be performed on.",
        max_length=512,
    )
    roberta_pipeline = RobertaPipeline()
    roberta_pipeline.load()
    text = roberta_pipeline.preprocess(input_text)
    scores = roberta_pipeline.predict(text)
    output = roberta_pipeline.postprocess(scores)
    builder.output(output)

# Get the computational graph
my_new_pipeline = builder.get_pipeline()
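To make the postprocessing step easier to follow without loading the model, the softmax and ranking logic can be sketched in plain Python. This is a dependency-free approximation of what softmax and postprocess do above, not the pipeline's own code. The names ID2LABEL and rank_labels and the example scores are illustrative assumptions; in the real pipeline the label map comes from the model's config.id2label.

```python
import math

# Illustrative label map (assumption); the pipeline reads this from config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(scores):
    """Convert raw scores into probabilities, shifting by the max for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(scores, id2label=ID2LABEL):
    """Return label/score dicts ordered from most to least likely."""
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"label": id2label[i], "score": round(probs[i], 4)} for i in order]


# Made-up raw scores in the order negative, neutral, positive.
result = rank_labels([-1.2, 0.3, 2.5])
for entry in result:
    print(entry["label"], entry["score"])
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow when the raw scores are large.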
The pipeline config
Substitute the following configuration into pipeline.yaml:
runtime:
  container_commands:
    - apt-get update
    - apt-get install -y git
  python:
    version: "3.10"
    requirements:
      - pipeline-ai
      - tweet-preprocessor==0.6.0
      - torch==2.0.1
      - transformers==4.32.0
    cuda_version: "11.4"
accelerators: ["nvidia_t4"]
accelerator_memory: null
pipeline_graph: new_pipeline:my_new_pipeline
pipeline_name: twitter-roberta-base-sentiment
description: null
readme: null
extras: {}
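The pipeline_graph value appears to follow a module:variable convention, where the part before the colon names a Python module and the part after it names the pipeline object defined in that module. Under that assumption, which the configuration itself does not confirm, the lookup can be sketched as follows. The helper resolve_graph is hypothetical and is not part of the pipeline library.

```python
import importlib


def resolve_graph(reference):
    """Split a 'module:variable' string and return the named object."""
    module_name, _, attribute = reference.partition(":")
    if not module_name or not attribute:
        raise ValueError(f"expected 'module:variable', got {reference!r}")
    # Assumption: the module is importable from the current working directory.
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

If this reading is correct, the module part must match the name of the .py file without its extension, and the variable part must match the name assigned from builder.get_pipeline(). It is worth checking both against your project before building.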
Build pipeline docker image
To build a Docker image of the pipeline, run the following command from the directory that contains both files:
pipeline container build
Run container locally
Before uploading your pipeline, we strongly encourage you to test it locally to ensure it behaves as expected. To do so, run:
pipeline container up
It will take a little while for the model to be downloaded from Hugging Face. The container exposes an API that lets you make local inference requests. Check out the play form to quickly start testing the pipeline.
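As an alternative to the play form, a local inference request can be built from Python. The sketch below uses only the standard library. The URL, port, and payload shape are assumptions not stated in this guide, so check them against your container's startup output and API documentation. The helper build_run_request is hypothetical.

```python
import json
import urllib.request

# Assumption: the local container accepts runs at this address.
LOCAL_RUN_URL = "http://localhost:14300/v4/runs"


def build_run_request(text, url=LOCAL_RUN_URL):
    """Build (but do not send) a JSON POST request carrying one string input."""
    # Assumption: inputs are sent as a list of typed values.
    payload = {"inputs": [{"type": "string", "value": text}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("I love this library!")

# Once the container is up, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before the container is running.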
Push pipeline docker image
To upload the pipeline, that is, to push its Docker image to the Mystic Docker registry, run the following command:
pipeline container push