
Managing ML Artefacts with DVC

Data Version Control (DVC) is a command-line tool that enables version control for ML artefacts (e.g., models and training datasets). It uses a Git repository to track versions and a separate filesystem (e.g., cloud object storage) to store the artefacts themselves.

This demo applies version control to a dataset, but it would work in exactly the same way for any ML model serialised to a file.

Demo Objectives

  • How to initialise version control for a dataset stored on AWS S3.
  • How to update a dataset.
  • How to fetch any version of a dataset.
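As a rough orientation, the three objectives above map onto a handful of DVC and Git commands. The sketch below only assembles those commands as lists and prints them in dry-run mode; it is not taken from the demo notebook. The remote name, bucket URL, dataset path and revision are hypothetical placeholders, and the helper functions are illustrative names rather than part of DVC.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List

# Hypothetical values for illustration only; substitute your own.
REMOTE_NAME = "s3-remote"
REMOTE_URL = "s3://example-bucket/dvc-store"
DATASET = "data/dataset.csv"


def init_commands(remote_name: str, remote_url: str) -> List[List[str]]:
    """Set up DVC inside an existing Git repository, with S3 as the default remote."""
    return [
        ["dvc", "init"],
        ["dvc", "remote", "add", "-d", remote_name, remote_url],
        ["git", "add", ".dvc", ".dvcignore"],
        ["git", "commit", "-m", "Initialise DVC"],
    ]


def version_commands(dataset: str, message: str) -> List[List[str]]:
    """Record the current state of the dataset (first version or an update)."""
    pointer = dataset + ".dvc"  # small pointer file that Git tracks
    gitignore = str(PurePosixPath(dataset).parent / ".gitignore")
    return [
        ["dvc", "add", dataset],
        ["git", "add", pointer, gitignore],
        ["git", "commit", "-m", message],
        ["dvc", "push"],  # upload the data itself to the remote
    ]


def fetch_commands(dataset: str, rev: str) -> List[List[str]]:
    """Restore the dataset as it was at a given Git revision."""
    pointer = dataset + ".dvc"
    return [
        ["git", "checkout", rev, "--", pointer],
        ["dvc", "pull", pointer],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(init_commands(REMOTE_NAME, REMOTE_URL))
    run(version_commands(DATASET, "Add dataset v1"))
    run(fetch_commands(DATASET, "HEAD~1"))
```

Updating a dataset reuses `version_commands`: after the file changes, running `dvc add` again rewrites the pointer file, and the new Git commit becomes the handle for that version.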

Running the Demo

This demo is contained within a single Jupyter notebook: demos/dvc/data_and_model_versioning.ipynb. Make sure the Python packages it requires are installed in the Jupyter kernel you use to run it.
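One way to check the kernel before opening the notebook is to test whether the needed modules can be imported. The snippet below is a minimal sketch. The module names in `ASSUMED_MODULES` are an assumption (DVC plus its S3 plugin), since the demo's actual requirements are not listed here, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level module names; replace with the demo's real requirements.
ASSUMED_MODULES = ["dvc", "dvc_s3"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing from this kernel: " + ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

Run it in a notebook cell that uses the same kernel as the demo, so that the check reflects the environment the notebook will actually see.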