Debugging mode & logging

See exactly what's going on in your pipeline with debugging mode and log streaming

Knowing what went wrong with your pipeline when it fails is essential to fixing it. Debugging mode gives you verbose information about your pipelines when they execute remotely.

Debugging mode takes effect:

  • When triggering a remote run with run_pipeline
  • When creating a new environment with create_environment (coming soon)

You can activate debugging mode by adding the following code at the top of your scripts:

from pipeline.configuration import current_configuration


current_configuration.set_debug_mode(True)
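
Because set_debug_mode is just a runtime toggle, you can also gate it behind a flag so verbose output only appears when you ask for it. A minimal sketch, assuming a PIPELINE_DEBUG environment variable of our own choosing (not part of the SDK):

import os

from pipeline.configuration import current_configuration

# PIPELINE_DEBUG is a hypothetical variable name chosen for this sketch;
# debug mode stays off unless it is explicitly set to "1".
current_configuration.set_debug_mode(os.environ.get("PIPELINE_DEBUG") == "1")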

📘

Make sure you're logged in

More info on this in the Getting Started guide!

Run example

What's shown from run_pipeline:

  • The current state of the run, along with any updates
  • Run outputs
  • Run ID
  • Run logs
  • Run errors

To test debugging out, let's create a basic pipeline that prints some output and shows a basic progress bar with tqdm. You can find the source code for the example below here on our GitHub.

import time

from tqdm import tqdm

from pipeline import Pipeline, Variable, pipe
from pipeline.cloud.compute_requirements import Accelerator
from pipeline.cloud.environments import create_environment
from pipeline.cloud.pipelines import run_pipeline, upload_pipeline
from pipeline.configuration import current_configuration

# Activate debugging mode
current_configuration.set_debug_mode(True)


@pipe
def test(i: int) -> str:
    # Tick the progress bar once per step; tqdm's output appears in the run logs
    for _ in tqdm(range(i)):
        time.sleep(0.5)
    print("I'm done now, goodbye!")
    return "Done"


with Pipeline() as builder:
    # An int input, constrained to 0 < value < 20
    input_var = Variable(
        int,
        gt=0,
        lt=20,
    )

    b = test(input_var)

    builder.output(b)

pl = builder.get_pipeline()

# Create a basic environment with tqdm installed
env_id = create_environment(
    name="basic",
    python_requirements=[
        "tqdm",
    ],
    allow_existing=True,
)

# Upload our pipeline
result = upload_pipeline(
    pl,
    "debugging-pipeline:latest",
    environment_id_or_name="basic",
    accelerators=[
        Accelerator.cpu,
    ],
)

# Run the pipeline remotely
output = run_pipeline(
    result.id,
    5,
)

When we run this, debugging mode streams the run ID, state updates, logs (including the tqdm progress bar and our print statement), the final output, and any errors to the terminal as the run executes remotely.
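
Debug mode is most useful when a run fails. Sticking with the pipeline above, passing an input that violates the declared bounds should surface the failure in the streamed run errors rather than leaving you guessing (a hypothetical failure case, reusing result.id from the example):

# 25 violates the lt=20 bound declared on input_var, so we expect to
# see the resulting run error in the debug stream.
output = run_pipeline(
    result.id,
    25,
)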

How it works

When performing a run with debugging mode, the async_run field is set to True and all runs are submitted in a non-blocking way. This means a run API request is sent to the /v3/runs endpoint with async_run=True, and the API immediately responds with the run_id without waiting for the run to complete. The Python SDK then polls the state of that run until it is complete, while opening a websocket to the logs endpoint to stream all logs back.
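
To make that flow concrete, here is a minimal sketch of the submit/poll/stream mechanism using requests and websockets. This is not the SDK's actual implementation: the base URL, logs endpoint path, response field names, and terminal state names are all assumptions for illustration.

import asyncio

import requests
import websockets

API = "https://catalyst.example.com"  # hypothetical base URL


async def stream_logs(run_id: str) -> None:
    # Print each log message as it arrives over the websocket
    # (the logs endpoint path here is assumed).
    async with websockets.connect(f"wss://catalyst.example.com/v3/logs/{run_id}") as ws:
        async for message in ws:
            print(message)


async def debug_run(payload: dict) -> str:
    # Submit the run with async_run=True: the API answers immediately
    # with the run ID instead of blocking until completion.
    response = await asyncio.to_thread(
        requests.post, f"{API}/v3/runs", json={**payload, "async_run": True}
    )
    run_id = response.json()["id"]  # response field name assumed

    # Stream logs concurrently while polling for a terminal state.
    log_task = asyncio.create_task(stream_logs(run_id))
    while True:
        run = await asyncio.to_thread(requests.get, f"{API}/v3/runs/{run_id}")
        state = run.json()["state"]  # field and state names assumed
        if state in {"completed", "failed"}:
            break
        await asyncio.sleep(1)

    log_task.cancel()
    return state

The real SDK layers error handling and output retrieval on top, but the shape is the same: one non-blocking POST, a polling loop, and a websocket for logs.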

Historical run logs

You can view the historical logs of runs via the CLI. This will also be coming to the Catalyst dashboard soon, but for now it is only available in the CLI.

To view a run's logs, simply enter:

pipeline logs run <run_id>

🚧

You can only view logs from a run in the past hour

This will soon be increased to the past 7 days!