Repositories & Workspaces

Dagster repositories and workspaces make pipelines easier to manage at scale, e.g., across multiple teams within an organisation that all share the same Dagster cluster.

Repositories can be defined in code as follows:

demos/dagster/repository.py
"""
This module defines the Dagster pipeline repository.
"""
from dagster import repository

from pipelines.example_pipeline import cereal_data_pipeline


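@repository
def team_one():
    # Dict form of a repository: each pipeline name maps to a zero-argument
    # callable that returns the pipeline definition, which Dagster calls on demand.
    return {
        "pipelines": {
            "cereal_data_pipeline": lambda: cereal_data_pipeline
        }
    }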

Workspaces are configured in a YAML file:

demos/dagster/workspace.yaml
load_from:
  - python_file:
      relative_path: repository.py
      executable_path: ".venv/bin/python"

At a basic level, the above example shows how to associate an execution environment (i.e., a Python virtual environment) with a given team's pipeline repository. This enables teams to specify their own Python requirements, e.g., an ML engineering team may want to use a newer version of NumPy than the one used by an adjacent data engineering team.
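
To make the multi-team case concrete, a workspace along the following lines could load one repository per team, each with its own interpreter. This is only a sketch under assumptions: the `team_one/` and `team_two/` directories, the second team, and the virtual environment paths are hypothetical and not part of the demo project, and `location_name` is assumed to be supported by the Dagster version in use.

```yaml
# Hypothetical layout: one repository module and one virtual environment per team.
load_from:
  - python_file:
      relative_path: team_one/repository.py
      location_name: team_one  # label for this team's code location
      executable_path: "team_one/.venv/bin/python"
  - python_file:
      relative_path: team_two/repository.py
      location_name: team_two
      executable_path: "team_two/.venv/bin/python"
```

Because each entry names its own `executable_path`, the two teams could pin different versions of the same dependency (such as NumPy) without affecting one another.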