Pipeline Orchestration with Dagster

Data engineering and ML workloads often require pipeline orchestration, for example for ETL jobs or model training. Dagster is an alternative to orchestration tools such as Airflow.

Demo Objectives

  • Define a pipeline with multiple stages.
  • Test pipelines and stages.
  • Add pipelines to a Dagster repository.
  • Configure pipelines within a Dagster workspace.

Running the Demo

Start the Dagit UI by running

$ dagit

The pipelines configured in workspace.yaml will then be available to run in the UI at http://localhost:3000. Individual pipelines can also be run directly from the command line, for example:

$ python demos/dagster/pipelines/

Alternatively, they can be executed via the Dagster CLI:

$ dagster pipeline execute -f demos/dagster/pipelines/

Refer to the Dagster docs for more information, such as how to define schedules or triggers.

Running Tests

Example tests (using pytest) can be found in the demos/dagster/tests folder and can be executed by running:

$ pytest