Files and directories
When working with ML you will often need one or more files that contain model weights or other data relevant to your workflow. It's also possible to send a large file, such as an image or video, as part of a run request. The Pipeline SDK provides two objects that make working with these easier:
File
- An object that references a local file and allows it to be used remotely
Directory
- Like a File, but for a full directory and its contents
As a bonus, if you need to use a file that's already hosted online you can use:
FileURL
- An object that references a file to be downloaded via a given URL by the end server
You can create these objects programmatically or in the CLI, either as part of the pipeline definition or directly in a Run.
These file objects are used in two primary ways:
- Static definition - For when you need a file to be accessible every time the pipeline is run or initialised, e.g. model weights.
- Run time input - For when a file is different every time a pipeline is run, like when passing in an image to do some processing on it, or transcribing a large audio file.
You can also return File or Directory objects from a run; they are stored in our hosted storage (see File responses).
Static pipeline files
Using a File object in the pipeline definition allows you to send local files on pipeline upload for cold start optimisation. The key point is that the file will be needed for every run in some way. Typically, this is model weights.
You can add a File or Directory to your pipeline as follows:
from pipeline import File, FileURL, Pipeline, Variable, entity, pipe
from pipeline.cloud.pipelines import upload_pipeline


@entity
class MainClass:
    # Runs once at startup, so the file is read before the first run
    @pipe(run_once=True, on_startup=True)
    def my_func(self, my_file: File) -> None:
        self.text = my_file.path.read_text()

    @pipe
    def get_text(self, rand: str) -> str:
        return self.text


with Pipeline() as builder:
    rand = Variable(str)

    my_file = File("./local_file.txt")
    # OR, for a file that is already hosted online:
    # my_file = FileURL("https://storage.mystic.ai/run_files/c8/5c/c85c1853-d1cd-40d5-9aaa-6e2cfaca48d5/image-0.jpg")

    en = MainClass()
    en.my_func(my_file)
    res = en.get_text(rand)

    builder.output(res)

my_pl = builder.get_pipeline()

# Uploads the pipeline together with the local file
remote_pipeline = upload_pipeline(my_pl, "file-test-pipeline", "my-environment")
In this example, a local text file is uploaded with the pipeline to either a pcore deployment or Catalyst by the upload_pipeline call at the end of the snippet. It's important to note the contents of the File object (Directory is the same): you access the file through its path attribute, which is of type pathlib.Path. Because it is of this type, many file manipulation features are available out of the box.
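Because the path attribute is a pathlib.Path, the standard library methods apply to it directly. The following is a minimal sketch that uses a temporary file as a stand-in for my_file.path; the Pipeline SDK is not imported here, and the file name and contents are illustrative only.

```python
import tempfile
from pathlib import Path

# Stand-in for `my_file.path`; inside a pipe this would come from the File object.
tmp_dir = Path(tempfile.mkdtemp())
file_path = tmp_dir / "local_file.txt"
file_path.write_text("hello from a pipeline file")

text = file_path.read_text()            # whole file as a string
suffix = file_path.suffix               # file extension, including the dot
size_bytes = file_path.stat().st_size   # size on disk in bytes
siblings = sorted(p.name for p in file_path.parent.iterdir())  # other files next to it
```

The same calls should work on a Directory path, where iterdir() or rglob() can be used to walk the contents.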
If you're debugging and don't want to upload large files every time, check out Uploading files and directories by CLI below.
Run time input
Using files in a run is different from using them when defining a pipeline. Instead of referencing a file on your local system, you pass the File or Directory class to Variable as its type.
API requests for File objects
A standard cURL request for a run looks something like this:
curl --request POST \
  --url http://mystic.ai/v3/runs \
  --header 'Authorization: Bearer API_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline_id_or_pointer": "meta/llama-2-70B:latest",
    "async_run": false,
    "input_data": [
      {
        "type": "string",
        "value": "Hello my name is Paul, I like to"
      },
      {
        "type": "dictionary",
        "value": {
          "max_new_tokens": 20
        }
      }
    ]
  }'
And with a File:
curl --request POST \
  --url http://mystic.ai/v3/runs \
  --header 'Authorization: Bearer API_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline_id_or_pointer": "paulh/read-my-file:v1",
    "async_run": false,
    "input_data": [
      {
        "type": "file",
        "value": null,
        "file_name": "my_file.txt",
        "file_path": "/pipeline_files/aa/aa/aa598uohkjgesrohuijger/my_file.txt"
      }
    ]
  }'
The assumption here is that the file is already uploaded to Catalyst or your pcore deployment (with the exception of FileURLs, see below).
To upload the file via HTTP, use the following cURL request; its response can then be used with the run request above:
curl --request POST \
  --url http://mystic.ai/v3/pipeline_files \
  --header 'Authorization: Bearer API_TOKEN' \
  --form 'pfile=@/Users/paul/Desktop/my_file.txt'
The response contains both file_name and file_path for use in the run above. Alternatively, you can do all of the above directly in Python:
from pipeline import File, Pipeline, Variable, pipe
from pipeline.cloud.pipelines import run_pipeline, upload_pipeline


@pipe
def my_func(my_file: File) -> str:
    return my_file.path.read_text()


with Pipeline() as builder:
    var_1 = Variable(File)

    output = my_func(var_1)

    builder.output(output)

my_pl = builder.get_pipeline()

remote_pipeline = upload_pipeline(my_pl, "file_test", "numpy")

# Option 1: pass an open file handle
output = run_pipeline(remote_pipeline.id, open("my_file.txt", "rb"))

# Option 2: pass a File object
output = run_pipeline(remote_pipeline.id, File("my_file.txt"))
The run_pipeline calls in Option 1 and Option 2 make all of the upload and run requests for you.
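For the raw HTTP route described earlier in this section (upload the file, then reference it in a run), the request body can be assembled from the upload response. This is only a sketch: it assumes the upload response exposes file_name and file_path as top-level keys, and the helper names file_input and run_payload are hypothetical, not part of the Pipeline SDK.

```python
import json


def file_input(file_name: str, file_path: str) -> dict:
    """Build one `input_data` entry for a file that is already uploaded."""
    return {
        "type": "file",
        "value": None,  # serialised as JSON null, as in the cURL example
        "file_name": file_name,
        "file_path": file_path,
    }


def run_payload(pipeline_pointer: str, upload_response: dict) -> str:
    """Build the JSON body for a run request from an upload response.

    Assumes (not confirmed by the docs) that the response has top-level
    `file_name` and `file_path` keys.
    """
    body = {
        "pipeline_id_or_pointer": pipeline_pointer,
        "async_run": False,
        "input_data": [
            file_input(upload_response["file_name"], upload_response["file_path"])
        ],
    }
    return json.dumps(body)


# Example upload response, reduced to the two fields the text mentions
upload_response = {
    "file_name": "my_file.txt",
    "file_path": "/pipeline_files/aa/aa/aa598uohkjgesrohuijger/my_file.txt",
}
payload = run_payload("paulh/read-my-file:v1", upload_response)
```

The resulting string can be sent as the --data body of the run request shown above.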
API requests for Directory objects
File and Directory objects are not differentiated outside of the Python SDK. If you use File instead of Directory in the @pipe definition, the zip file will not be unzipped.
Directory objects are sent by first zipping the contents of a directory. You can do this by running something like the following, which puts file_1.txt and file_2.txt into my_zip_file.zip:
zip my_zip_file.zip file_1.txt file_2.txt
Once this is done you can upload the zip file in the same way you'd upload any file:
curl --request POST \
  --url http://mystic.ai/v3/pipeline_files \
  --header 'Authorization: Bearer API_TOKEN' \
  --form 'pfile=@my_zip_file.zip'
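If you would rather build the archive from Python than with the zip command, the standard library zipfile module can do the same job. The sketch below is an assumption about a convenient layout, not the SDK's own packaging logic: the helper zip_directory is hypothetical, and it stores every file with a path relative to the directory root, which matches the zip example above for a flat directory.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_directory(source_dir: Path, zip_path: Path) -> Path:
    """Write every file under source_dir into zip_path, relative to source_dir."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source_dir.rglob("*")):
            if file.is_file():
                archive.write(file, arcname=file.relative_to(source_dir).as_posix())
    return zip_path


# Demo with a throwaway directory containing the two files from the example
work_dir = Path(tempfile.mkdtemp())
source_dir = work_dir / "my_dir"
source_dir.mkdir()
(source_dir / "file_1.txt").write_text("one")
(source_dir / "file_2.txt").write_text("two")

zip_path = zip_directory(source_dir, work_dir / "my_zip_file.zip")

with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
```

The archive is written outside source_dir so that it never ends up inside itself.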
Uploading files and directories by CLI
You can upload either a File or Directory object by CLI. For a single file:
pipeline create file my_file.txt
# Pipeline File created with ID= file_34983498734
Or you can upload an entire directory:
pipeline create file -r ./my_dir
# Pipeline File created with ID= file_34983498734
File responses
There are two ways files can be returned from a Run: as a single File output variable, or as an array of File objects. You must use standard Python type annotations to do this:
from typing import List

from pipeline import File, pipe


@pipe
def get_file() -> File:
    local_file_path = "/tmp/image.jpg"
    output_image = File(path=local_file_path, allow_out_of_context_creation=True)
    return output_image


# OR for an array
@pipe
def get_files() -> List[File]:
    local_file_path = "/tmp/image.jpg"
    local_file_path_2 = "/tmp/image2.jpg"
    output_image = File(path=local_file_path, allow_out_of_context_creation=True)
    output_image_2 = File(path=local_file_path_2, allow_out_of_context_creation=True)
    return [output_image, output_image_2]
When a File object is returned from a pipe, the run request returns a URL to that file along with other metadata. The file-specific variable object looks like the following in a response:
{
  "type": "file",
  "value": null,
  "file": {
    "name": "image-0.jpg",
    "path": "run_files/31/7e/317e304d-e816-4036-86b2-7ad82b208b70/image-0.jpg",
    "url": "https://storage.mystic.ai/run_files/31/7e/317e304d-e816-4036-86b2-7ad82b208b70/image-0.jpg",
    "size": 22241
  }
}
For a list/array of File objects, the array variable is used as the root variable:
{
  "type": "array",
  "value": [
    {
      "type": "file",
      "value": null,
      "file": {
        "name": "image-0.jpg",
        "path": "run_files/31/7e/317e304d-e816-4036-86b2-7ad82b208b70/image-0.jpg",
        "url": "https://storage.mystic.ai/run_files/31/7e/317e304d-e816-4036-86b2-7ad82b208b70/image-0.jpg",
        "size": 22241
      }
    },
    {
      "type": "file",
      "value": null,
      "file": {
        "name": "image-1.jpg",
        "path": "run_files/45/2a/452a3204-beba-4ccc-949c-1af4e2fcd88c/image-1.jpg",
        "url": "https://storage.mystic.ai/run_files/45/2a/452a3204-beba-4ccc-949c-1af4e2fcd88c/image-1.jpg",
        "size": 16608
      }
    }
  ],
  "file": null
}
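On the client side, both response shapes can be handled by one small function that walks the variable object and collects the url fields. This is a sketch based only on the two JSON shapes shown above; the function name file_urls is hypothetical and the example URLs are shortened placeholders.

```python
from typing import List


def file_urls(variable: dict) -> List[str]:
    """Collect download URLs from a `file` variable or an `array` of them."""
    if variable.get("type") == "file" and variable.get("file"):
        return [variable["file"]["url"]]
    if variable.get("type") == "array":
        urls: List[str] = []
        for item in variable.get("value") or []:
            urls.extend(file_urls(item))
        return urls
    # Any other variable type carries no files
    return []


# Trimmed-down versions of the two response shapes above (placeholder URLs)
single = {
    "type": "file",
    "value": None,
    "file": {"name": "image-0.jpg", "url": "https://example.com/image-0.jpg"},
}
array = {
    "type": "array",
    "value": [
        single,
        {
            "type": "file",
            "value": None,
            "file": {"name": "image-1.jpg", "url": "https://example.com/image-1.jpg"},
        },
    ],
    "file": None,
}

single_urls = file_urls(single)
array_urls = file_urls(array)
```

Each URL can then be fetched with any HTTP client to retrieve the returned file.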
FileURLs
FileURLs can be used as a static file reference for a pipeline as discussed above, or used as an ordinary input in a run. Let's take the below pipeline as our example:
from pipeline import File, FileURL, Pipeline, Variable, pipe
from pipeline.cloud.pipelines import run_pipeline, upload_pipeline


@pipe
def my_func(my_file: File) -> str:
    return my_file.path.read_text()


with Pipeline() as builder:
    var_1 = Variable(File)
    output = my_func(var_1)
    builder.output(output)

my_pl = builder.get_pipeline()
remote_pipeline = upload_pipeline(my_pl, "paulh/file_test", "numpy")
Now to perform a run with this pipeline using a FileURL in Python:
run_pipeline(
    remote_pipeline.id,
    FileURL(
        url="https://storage.mystic.ai/run_files/c8/5c/c85c1853-d1cd-40d5-9aaa-6e2cfaca48d5/image-0.jpg",
    ),
)
Or by curl:
curl --request POST \
  --url http://mystic.ai/v3/runs \
  --header 'Authorization: Bearer API_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "pipeline_id_or_pointer": "paulh/file_test:v1",
    "async_run": false,
    "input_data": [
      {
        "type": "file",
        "value": null,
        "file_name": "image-0.jpg",
        "file_path": "https://storage.mystic.ai/run_files/c8/5c/c85c1853-d1cd-40d5-9aaa-6e2cfaca48d5/image-0.jpg"
      }
    ]
  }'
Although the input to my_func is typed as a File, passing a FileURL tells the local client not to read or upload the file. The URL is passed as a reference instead, and the file is downloaded by the end server.
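When building the cURL body by hand, the file entry for a FileURL differs from an uploaded file only in that file_path holds the URL. The sketch below assumes that file_name is simply the last path segment of the URL, as it is in the example above; that rule and the helper name file_url_input are assumptions, not documented behaviour.

```python
import posixpath
from urllib.parse import urlparse


def file_url_input(url: str) -> dict:
    """Build an `input_data` entry that references a hosted file by URL."""
    # Assumption: the file name is the last segment of the URL path
    name = posixpath.basename(urlparse(url).path)
    return {
        "type": "file",
        "value": None,
        "file_name": name,
        "file_path": url,
    }


entry = file_url_input(
    "https://storage.mystic.ai/run_files/c8/5c/c85c1853-d1cd-40d5-9aaa-6e2cfaca48d5/image-0.jpg"
)
```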