Keyword Schemas

Overview

Machine learning code often relies on many keyword arguments, as seen throughout the transformers library. Pipeline supports defining, maintaining, and versioning these arguments in production setups through schemas, similar to pydantic models. A schema is a subclass of InputSchema whose attributes are declared as InputField variables:

from pipeline.objects.graph import InputField, InputSchema

class MyInputSchema(InputSchema):
    in_1: int = InputField(lt=5, gt=-5, description="kwarg 1", title="my_int")
    in_2: int | None = InputField(
        default=0, lt=5, ge=-5, description="kwarg 2", title="my_optional_int"
    )

These InputSchema objects can be used as a variable type in a Pipeline:

from pipeline import Pipeline, Variable, pipe


@pipe
def my_func(in_1: int, other_schema: MyInputSchema) -> int:
    return in_1 + other_schema.in_2 + other_schema.in_1


with Pipeline() as builder:
    var_1 = Variable(int, lt=10, ge=0)
    var_2 = Variable(MyInputSchema)

    output = my_func(var_1, var_2)
    builder.output(output)

Warning: You can only use InputField objects in the schema definition.

Some important notes on using InputField:

Optional fields and default values

There are two ways to define an optional field, and in both cases a default value must always be provided: either use typing.Optional (Optional[int]) or use the | operator with None (int | None).

Note: None can be used as the default value. An ellipsis (...) is the Pythonic default for representing the literal absence of an input.

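from typing import Optional

from pipeline.objects.graph import InputField, InputSchema


class MyInputSchema(InputSchema):
    # Optional field declared with typing.Optional
    in_1: Optional[int] = InputField(default=1)
    # OR: optional field declared with the | operator
    in_2: int | None = InputField(default=2)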

Runs

When performing a run on a pipeline that uses an InputSchema, you pass the schema input as a dictionary. Conversion and validation into the full schema class object are handled for you:

my_pl = builder.get_pipeline()

rm_pipeline = upload_pipeline(
    my_pl,
    "schema-demo",
    "numpy",
    minimum_cache_number=1,
)

result = run_pipeline(
    "schema-demo:v1",
    1,
    {"in_1": 2},
)

Operating this way ensures that any client can send an API request over HTTP without needing any Python-specific objects.
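
To make the dictionary-to-schema step more concrete, here is a minimal sketch of the idea in plain Python. SchemaStandIn is a hypothetical dataclass that only mirrors the field names of MyInputSchema. It is not a Pipeline class, and the real conversion and validation inside Pipeline may work differently:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaStandIn:
    """Hypothetical stand-in mirroring MyInputSchema; not a Pipeline class."""

    in_1: int
    in_2: Optional[int] = 0


# What a non-Python client would send: plain JSON text.
payload = json.dumps({"in_1": 2})

# Conceptually, the receiving side parses the JSON and builds the typed
# object, with defaults filling in any omitted fields.
received = json.loads(payload)
schema = SchemaStandIn(**received)

print(payload)
print(schema)
```

Because the input is an ordinary dictionary, it serializes to JSON without any custom encoder, which is what lets clients in other languages call the same pipeline.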

Validation

The InputField object takes in the following kwargs for validation (all are optional):

  • default (type: any) - The default value of the variable
  • title (type: str) - The name of the variable
  • description (type: str) - Basic description of the variable
  • examples (type: list) - List of possible inputs
  • gt (type: int) - Greater than (applies to int/float fields)
  • ge (type: int) - Greater than or equal to (applies to int/float fields)
  • lt (type: int) - Less than (applies to int/float fields)
  • le (type: int) - Less than or equal to (applies to int/float fields)
  • multiple_of (type: int) - Must be a multiple of this number (applies to int/float fields)
  • allow_inf_nan (type: bool) - Whether to allow infinities or NaN values (applies to int/float fields)
  • max_digits (type: int) - Maximum number of digits allowed in the number (applies to int/float fields)
  • decimal_places (type: int) - Maximum number of decimal places allowed in the number (applies to int/float fields)
  • min_length (type: int) - Minimum length of the input string (applies to string fields)
  • max_length (type: int) - Maximum length of the input string (applies to string fields)
  • choices (type: list) - A list of the only inputs that can be entered
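
The numeric bounds can be read as simple comparisons. The sketch below illustrates what gt, ge, lt, le, and multiple_of mean; check_number is a hypothetical helper written for this illustration and is not how Pipeline implements validation:

```python
from typing import Optional, Union

Number = Union[int, float]


def check_number(
    value: Number,
    gt: Optional[Number] = None,
    ge: Optional[Number] = None,
    lt: Optional[Number] = None,
    le: Optional[Number] = None,
    multiple_of: Optional[Number] = None,
) -> list[str]:
    """Return a list of violated constraints (empty when the value is valid)."""
    errors = []
    if gt is not None and not value > gt:
        errors.append(f"must be greater than {gt}")
    if ge is not None and not value >= ge:
        errors.append(f"must be greater than or equal to {ge}")
    if lt is not None and not value < lt:
        errors.append(f"must be less than {lt}")
    if le is not None and not value <= le:
        errors.append(f"must be less than or equal to {le}")
    # Note: the modulo check is exact for ints but can be imprecise for floats.
    if multiple_of is not None and value % multiple_of != 0:
        errors.append(f"must be a multiple of {multiple_of}")
    return errors


# Same bounds as in_1 in the first example: lt=5, gt=-5.
print(check_number(2, lt=5, gt=-5))
print(check_number(5, lt=5, gt=-5))
```

Note the difference between the strict and inclusive bounds: with gt=-5 the value -5 is rejected, while with ge=-5 it is accepted.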