Repositories & Workspaces
Dagster repositories and workspaces make pipelines easier to manage when operating at scale, e.g., across multiple teams within an organisation that all share the same Dagster cluster.
Repositories can be defined in code as follows:
demos/dagster/repository.py
"""
This module defines the Dagster pipeline repository.
"""
from dagster import repository

from pipelines.example_pipeline import cereal_data_pipeline


@repository
def team_one():
    return {
        "pipelines": {
            "cereal_data_pipeline": lambda: cereal_data_pipeline
        }
    }
And workspaces are configured via:
demos/dagster/workspace.yaml
load_from:
  - python_file:
      relative_path: repository.py
      executable_path: ".venv/bin/python"
At a basic level, the above example shows how to associate an execution environment (i.e., a Python virtual environment) with a given team's pipeline repository. This enables teams to specify their own Python requirements, e.g., an ML engineering team may want to use a newer version of NumPy than the one used by an adjacent data engineering team.
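To make the multi-team case concrete, the following is a minimal sketch that renders a workspace.yaml with one python_file entry per team, each pointing at its own interpreter. It uses only the standard library and the same keys as the workspace file above. The second team (team_two) and the per-team directory layout are assumptions made for illustration; they are not part of the demo project.

```python
"""Sketch: render a workspace.yaml that gives each team its own interpreter."""

# Hypothetical layout: each team keeps its repository module and virtual
# environment in its own directory. These names are illustrative only.
TEAMS = {
    "team_one": {
        "relative_path": "team_one/repository.py",
        "executable_path": "team_one/.venv/bin/python",
    },
    "team_two": {
        "relative_path": "team_two/repository.py",
        "executable_path": "team_two/.venv/bin/python",
    },
}


def render_workspace(teams):
    """Return workspace.yaml text with one python_file entry per team."""
    lines = ["load_from:"]
    for name in sorted(teams):  # sorted so the output order is stable
        relative_path = teams[name]["relative_path"]
        executable_path = teams[name]["executable_path"]
        lines.append("  - python_file:")
        lines.append(f"      relative_path: {relative_path}")
        lines.append(f'      executable_path: "{executable_path}"')
    return "\n".join(lines) + "\n"


WORKSPACE_YAML = render_workspace(TEAMS)

if __name__ == "__main__":
    print(WORKSPACE_YAML, end="")
```

Running the script prints the YAML, which could then be saved as workspace.yaml. Each team would still need to create its own virtual environment and install its own requirements at the path given by executable_path.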