Deploying Python ML Models with Flask, Docker and Kubernetes

Posted on Thu 10 January 2019 in machine-learning-engineering

A common pattern for deploying Machine Learning (ML) models into production environments - e.g. an ML model trained using the SciKit-Learn package in Python and ready to provide predictions on new data - is to expose them as RESTful API microservices hosted from within Docker containers, which are in turn deployed to a cloud environment that handles everything required for maintaining continuous availability - e.g. fail-over, auto-scaling, load balancing and rolling service updates.

The configuration details for a continuously available cloud deployment are specific to the targeted cloud provider(s) - e.g. the deployment process and topology for Amazon Web Services is not the same as that for Microsoft Azure, which in turn is not the same as that for Google Cloud Platform. This constitutes knowledge that needs to be acquired for every targeted cloud provider. Furthermore, it is difficult (some would say near impossible) to test entire deployment strategies locally, which makes issues such as networking hard to debug. Do not underestimate the headaches and drain on resources that these issues can cause, especially if you are not an expert in these areas, which many ML and data science practitioners are not.

Kubernetes is a container orchestration platform that seeks to address these issues. Briefly, it provides a single mechanism for defining entire microservice-based application deployment topologies and their service-level requirements for maintaining continuous availability. It is agnostic to the targeted cloud provider, can be run on-premises and even locally on your laptop - all that’s required is a cluster of virtual machines running Kubernetes - i.e. a Kubernetes cluster.

This blog post is designed to be read in conjunction with the code in this GitHub repository, which contains the Python modules, Docker configuration files and Kubernetes instructions for demonstrating how a simple Python ML model can be turned into a production-grade RESTful model-scoring (or prediction) API service, using Docker and Kubernetes - both locally and with Google Cloud Platform (GCP). It is not a comprehensive guide to Kubernetes, Docker or ML - think of it more as an ‘ML on Kubernetes 101’ for demonstrating capability and allowing newcomers to Kubernetes (e.g. data scientists who are more focused on building models than on deploying them) to get up-and-running quickly and become familiar enough with the basic concepts to be able to use the official documentation for these technologies.

We will demonstrate ML model deployment using two different strategies: first principles approaches using Docker and Kubernetes; and then deployment using the Seldon-Core framework for managing ML model pipelines on Kubernetes. The former will help to appreciate the latter, which constitutes a powerful framework for deploying and performance-monitoring many complex ML model pipelines.

Containerising a Simple ML Model-Scoring Service using Docker

We start by demonstrating how to achieve this basic competence using the simple Python ML model-scoring REST API contained in the api.py module, together with the Dockerfile and Python dependencies frozen in Pipfile.lock, all contained within the py-flask-ml-score-api directory, whose core contents are as follows,

py-flask-ml-score-api/
 | Dockerfile
 | Pipfile
 | Pipfile.lock
 | api.py 

If you’re already feeling lost then these files are discussed below, otherwise feel free to skip to ‘Building a Docker Image’.

Defining a Simple REST API Service

The api.py module uses the Flask framework for defining a web service (app) with a function (score) that executes in response to an HTTP request to a specific URL (or ‘route’), thanks to being wrapped by the app.route decorator. For reference, the relevant code is reproduced below,

from flask import Flask, jsonify, make_response, request

app = Flask(__name__)


@app.route('/score', methods=['POST'])
def score():
    features = request.json['X']
    return make_response(jsonify({'score': features}))


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

If running locally - e.g. by starting the web service using python api.py - we would be able to reach our function (or ‘endpoint’) at http://localhost:5000/score. This function takes data sent to it as JSON (automatically de-serialised into a Python dict that is available as request.json within the function), and returns a response (automatically serialised as JSON). In our example function, we expect an array of features, X, that we pass to an ML model, which in our example returns those same features back to the caller - i.e. our ML model is simply the identity function, which we have chosen for demonstrative purposes. We could have loaded a pickled SciKit-Learn model and passed the data to its predict method, returning its score for the feature-data as JSON, without much additional effort - see here for an example of this in action.
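To make the last point concrete, the following is a minimal sketch of what the body of score could look like once a real model is involved. It deliberately leaves Flask out so that it runs on its own. The names IdentityModel, load_model and score_payload are hypothetical and are not taken from the repository; the only assumption made about the model is that it has a predict method.

```python
import json
import pickle


class IdentityModel:
    """Stand-in for a trained model: predict returns its input unchanged."""

    def predict(self, X):
        return X


def load_model(path):
    """Load a pickled model object from disk (hypothetical helper)."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def score_payload(model, payload):
    """Mirror the body of score(): read 'X', call predict, build the response."""
    features = payload['X']
    prediction = model.predict(features)
    # A scikit-learn model returns a NumPy array, which is not JSON
    # serialisable as-is, so convert it to a plain list when possible.
    if hasattr(prediction, 'tolist'):
        prediction = prediction.tolist()
    return {'score': prediction}


response = score_payload(IdentityModel(), json.loads('{"X": [1, 2]}'))
print(json.dumps(response))
```

Inside the Flask route, the same function could be called with request.json and its result passed to jsonify, with the model loaded once at start-up (e.g. via load_model) rather than on every request.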

Dockerfile

The Dockerfile is a plain-text file of build instructions (it is not YAML) that allows us to define the contents and configure the operation of our intended Docker container, when it is running. This static data, when not executed as a container, is referred to as the ‘image’. For reference, the Dockerfile is reproduced below,

FROM python:3.6-slim
WORKDIR /usr/src/app
COPY . .
RUN pip install pipenv
RUN pipenv install
EXPOSE 5000
ENTRYPOINT ["pipenv", "run", "python", "api.py"]

In our example Dockerfile, we: start from a pre-configured Docker image (python:3.6-slim) that has a lightweight version of Linux with Python already installed; copy the contents of the py-flask-ml-score-api local directory to a directory on the image called /usr/src/app; use pip to install the Pipenv package for Python dependency management; use Pipenv to install the dependencies described in Pipfile.lock into a virtual environment on the image; declare that the running container listens on port 5000 (the port only becomes reachable from the host once it is published, which we do with the -p flag when running the container); and finally, set the command that starts our Flask RESTful web service - api.py. Building this custom image and asking the Docker daemon to run it (remember that a running image is a ‘container’), will make our RESTful ML model-scoring service available on port 5000 as if it were running on a dedicated virtual machine. Refer to the official Docker documentation for a more comprehensive discussion of the core Docker concepts used above.

Building a Docker Image

We assume that there is a Docker client and Docker daemon running locally, that the client is logged into an account on DockerHub and that there is a terminal open in this project’s root directory (kubernetes-ml-ops). To build the image described in the Dockerfile run,

docker build --tag alexioannides/test-ml-score-api py-flask-ml-score-api

Where ‘alexioannides’ refers to the name of the DockerHub account that we will push the image to, once we have tested it. To test that the image can be used to create a Docker container that functions as we expect, use,

docker run --name test-api -p 5000:5000 -d alexioannides/test-ml-score-api

Where we have mapped port 5000 from the Docker container - i.e. the port our ML model-scoring service is listening to - to port 5000 on our host machine (localhost). Then check that the container is listed as running using,

docker ps

And then test the exposed API endpoint using,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Where you should expect a response along the lines of,

{"score":[1,2]}
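The same test request can be issued from Python instead of curl. The sketch below uses only the standard library; the helper names build_score_request and send_score_request are hypothetical. Building the request is kept separate from sending it, so the example can be inspected without the container running.

```python
import json
import urllib.request


def build_score_request(url, features):
    """Build (but do not send) a POST request carrying the features as JSON."""
    body = json.dumps({'X': features}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send_score_request(req, timeout=5):
    """Send the request and decode the JSON reply (needs the service running)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


req = build_score_request('http://localhost:5000/score', [1, 2])
print(req.get_method(), req.full_url, req.data.decode('utf-8'))
```

With the container from the docker run command above still running, send_score_request(req) should return the same content as the curl call.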

All our test model does is return the input data - i.e. it is the identity function. Only a few lines of additional code are required to modify this service to load a SciKit-Learn model from disk and pass new data to its ‘predict’ method for generating predictions - see here for an example. Now that the container has been confirmed as operational, we can stop and remove it,

docker stop test-api
docker rm test-api

Pushing a Docker Image to DockerHub

In order for a remote Docker host or Kubernetes cluster to have access to the image we’ve created, we need to publish it to an image registry. All cloud computing providers that offer managed Docker-based services will provide private image registries, but we will use the public image registry at DockerHub, for convenience. To push our new image to DockerHub (where my account ID is ‘alexioannides’) use,

docker push alexioannides/test-ml-score-api

Where we can now see that our chosen naming convention for the image is intrinsically linked to our target image registry (you will need to insert your own account ID where necessary). Once the upload is finished, log onto DockerHub to confirm that the upload has been successful via the DockerHub UI.
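The link between the image name and the registry account can be illustrated with a small sketch that splits a name of the form account/repository[:tag] into its parts. This is a deliberate simplification: it does not handle the full image reference syntax (e.g. registry host names or digests), and split_image_name is a hypothetical helper. The one Docker behaviour it relies on is that a missing tag defaults to latest.

```python
def split_image_name(image):
    """Split 'account/repository[:tag]' into its parts (simplified form only)."""
    name, _, tag = image.partition(':')
    account, _, repository = name.partition('/')
    # Docker falls back to the 'latest' tag when none is given
    return {'account': account, 'repository': repository, 'tag': tag or 'latest'}


print(split_image_name('alexioannides/test-ml-score-api'))
```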

Installing Minikube for Local Development and Testing

Minikube allows a single node Kubernetes cluster to run within a Virtual Machine (VM) within a local machine (i.e. on your laptop), for development purposes. On Mac OS X, the steps required to get up-and-running are as follows:

  • make sure the Homebrew package manager for OS X is installed; then,
  • install VirtualBox using, brew cask install virtualbox (you may need to approve installation via OS X System Preferences); and then,
  • install Minikube using, brew cask install minikube.

To start the test cluster run,

minikube start --memory 4096

Where we have specified the minimum amount of memory required to deploy a single Seldon ML component. Be patient - Minikube may take a while to start. To test that the cluster is operational run,

kubectl cluster-info

Where kubectl is the standard Command Line Interface (CLI) client for interacting with the Kubernetes API (which was installed as part of Minikube, but is also available separately).

Launching the Containerised ML Model-Scoring Service on Minikube

To launch our test model-scoring service on Kubernetes, start by running the container within a Kubernetes pod that is managed by a replication controller, which is the device that ensures that at least one pod running our service is operational at any given time. This is achieved with,

kubectl run test-ml-score-api \
    --image=alexioannides/test-ml-score-api:latest \
    --port=5000 \
    --generator=run/v1

Where the --generator=run/v1 flag triggers the construction of the replication controller to manage the pod. To check that it’s running use,

kubectl get pods
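The job of the replication controller can be pictured as a repeated comparison between the desired number of pods and the pods that are actually running. The toy function below illustrates that idea only; it is not how Kubernetes is implemented, and pods_to_start is a hypothetical name.

```python
def pods_to_start(desired_replicas, running_pods):
    """Return how many new pods are needed to reach the desired count."""
    return max(desired_replicas - len(running_pods), 0)


# One replica requested and one pod running: nothing to do
print(pods_to_start(1, ['test-ml-score-api-szd4j']))

# The pod has died, so one replacement needs to be started
print(pods_to_start(1, []))
```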

It is possible to use port forwarding to test an individual container without exposing it to the public internet. To use this, open a separate terminal and run (for example),

kubectl port-forward test-ml-score-api-szd4j 5000:5000

Where test-ml-score-api-szd4j is the precise name of the pod currently active on the cluster, as determined from the kubectl get pods command. Then from your original terminal, to repeat our test request against the same container running on Kubernetes run,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'
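Because the pod name carries a random suffix, it can be convenient to look it up programmatically. The sketch below parses JSON in the shape returned by kubectl get pods -o json and picks out the running pods whose names start with a given prefix. The sample document is hand-written and heavily abridged, and running_pod_names is a hypothetical helper.

```python
import json


def running_pod_names(pods_json, prefix):
    """Return names of Running pods whose name starts with the given prefix."""
    pods = json.loads(pods_json)['items']
    return [
        pod['metadata']['name']
        for pod in pods
        if pod['metadata']['name'].startswith(prefix)
        and pod['status']['phase'] == 'Running'
    ]


# Hand-written, abridged sample in the shape of `kubectl get pods -o json`
sample = json.dumps({
    'items': [
        {'metadata': {'name': 'test-ml-score-api-szd4j'},
         'status': {'phase': 'Running'}},
        {'metadata': {'name': 'other-app-x1y2z'},
         'status': {'phase': 'Pending'}},
    ]
})
print(running_pod_names(sample, 'test-ml-score-api'))
```

In practice the JSON could be captured with subprocess.run(['kubectl', 'get', 'pods', '-o', 'json'], capture_output=True, text=True) and its stdout passed to the function.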

To expose the container as a (load balanced) service to the outside world, we have to create a Kubernetes service that references it. This is achieved with the following command,

kubectl expose replicationcontroller test-ml-score-api \
    --type=LoadBalancer \
    --name test-ml-score-api-http

To check that this has worked and to find the service’s external IP address run,

minikube service list

And we can then test our new service - for example,

curl http://192.168.99.100:30888/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Note that we need to use Minikube-specific commands as Minikube does not set up a real-life load balancer (which is what would happen if we made this request on a cloud platform). To tear down the load balancer, replication controller, pod and Minikube cluster run the following commands in sequence,

kubectl delete rc test-ml-score-api
kubectl delete service test-ml-score-api-http
minikube delete

Configuring a Multi-Node Cluster on Google Cloud Platform

In order to perform testing on a real-world Kubernetes cluster with far greater resources than those available on a laptop, the easiest way is to use a managed Kubernetes platform from a cloud provider. We will use Kubernetes Engine on Google Cloud Platform (GCP).

Getting Up-and-Running with Google Cloud Platform

Before we can use Google Cloud Platform, sign-up for an account and create a project specifically for this work. Next, make sure that the GCP SDK is installed on your local machine - e.g.,

brew cask install google-cloud-sdk

Or by downloading an installation image directly from GCP. Note that if you haven’t installed Minikube and all of the tools that come packaged with it, then you will need to install kubectl, which can be done using the GCP SDK,

gcloud components install kubectl

We then need to initialise the SDK,

gcloud init

Which will open a browser and guide you through the necessary authentication steps. Make sure you pick the project you created, together with a default zone and region (if this has not been set via Compute Engine -> Settings).

Initialising a Kubernetes Cluster

Firstly, within the GCP UI visit the Kubernetes Engine page to trigger the Kubernetes API to start-up. From the command line we then start a cluster using,

gcloud container clusters create k8s-test-cluster --num-nodes 3 --machine-type g1-small

And then go make a cup of coffee while you wait for the cluster to be created.

Launching the Containerised ML Model-Scoring Service on the GCP

This is largely the same as we did for running the test service locally using Minikube - run the following commands in sequence,

kubectl run test-ml-score-api \
    --image=alexioannides/test-ml-score-api:latest \
    --port=5000 \
    --generator=run/v1

kubectl expose replicationcontroller test-ml-score-api \
    --type=LoadBalancer \
    --name test-ml-score-api-http

But, to find the external IP address for the GCP cluster we will need to use,

kubectl get services

And then we can test our service on GCP - for example,

curl http://35.234.149.50:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Or, we could again use port forwarding to attach to a single pod - for example,

kubectl port-forward test-ml-score-api-nl4sc 5000:5000

And then in a separate terminal,

curl http://localhost:5000/score \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"X": [1, 2]}'

Finally, we tear-down the replication controller and load balancer,

kubectl delete replicationcontroller test-ml-score-api
kubectl delete service test-ml-score-api-http

Switching Between Kubectl Contexts

If you are running both with Minikube locally and with a cluster on GCP, then you can switch Kubectl context from one cluster to the other using, for example,

kubectl config use-context minikube

Where the list of available contexts can be found using,

kubectl config get-contexts

Using YAML Files to Define and Deploy our ML Model-Scoring Service

Up to this point we have been using Kubectl commands to define and deploy a basic version of our ML model-scoring service. This is fine for demonstrative purposes, but quickly becomes limiting as well as unmanageable. In practice, the standard way of defining entire applications is with YAML files that are posted to the Kubernetes API. The py-flask-ml-score.yaml file in the py-flask-ml-score-api directory is an example of how our ML model-scoring service can be defined in a single YAML file, which we reproduce below for reference,

apiVersion: v1
kind: Namespace
metadata:
  name: test-ml-app
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: test-ml-score-rc
  labels:
    app: test-ml-score
    env: prod    
  namespace: test-ml-app
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: test-ml-score
        env: prod
      namespace: test-ml-app
    spec:
      containers:
      - image: alexioannides/test-ml-score-api
        name: test-ml-score-api
        ports:
        - containerPort: 5000
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: test-ml-score-lb
  labels:
    app: test-ml-score
  namespace: test-ml-app
spec:
  type: LoadBalancer
  ports:
  - port: 5000
    targetPort: 5000
  selector:
    app: test-ml-score

This can now be deployed using a single command,

kubectl apply -f py-flask-ml-score-api/py-flask-ml-score.yaml

Note that we have defined three separate Kubernetes components in this single file: a namespace, a replication controller and a load-balancer service, with the latter two (and their sub-components) placed in that namespace - using --- to delimit the definition of each separate component. To see all components deployed into this namespace use,

kubectl get all --namespace test-ml-app

And likewise set the --namespace flag when using any kubectl get command to inspect the different components of our test app. Alternatively, we can set our new namespace as the default context,

kubectl config set-context $(kubectl config current-context) --namespace=test-ml-app

And then run,

kubectl get all

Where we can switch back to the default namespace using,

kubectl config set-context $(kubectl config current-context) --namespace=default

To tear-down this application we can then use,

kubectl delete -f py-flask-ml-score-api/py-flask-ml-score.yaml

Which saves us from having to use multiple commands to delete each component individually. Refer to the official documentation for the Kubernetes API to understand the contents of this YAML file in greater depth.
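One detail of the YAML file that is worth spelling out is how the service finds the pods it should route traffic to: its selector is compared against each pod's labels, and a pod matches when it carries every key/value pair in the selector. A minimal sketch of that equality-based matching, with selector_matches as a hypothetical helper name:

```python
def selector_matches(selector, labels):
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Values taken from the YAML file above
service_selector = {'app': 'test-ml-score'}
pod_labels = {'app': 'test-ml-score', 'env': 'prod'}

print(selector_matches(service_selector, pod_labels))
print(selector_matches({'app': 'another-app'}, pod_labels))
```

The extra env label on the pods does not prevent a match, which is why the load balancer in our example picks up both replicas.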

Using Helm Charts to Define and Deploy our ML Model-Scoring Service

Writing YAML files for Kubernetes can get repetitive and hard to manage, especially if there is a lot of ‘copy paste’ involved when only a handful of parameters need to be changed from one deployment to the next and there is a ‘wall of YAML’ that needs to be modified. Enter Helm - a framework for creating, executing and managing Kubernetes deployment templates. What follows is a very high-level demonstration of how Helm can be used to deploy our ML model-scoring service - for a comprehensive discussion of Helm’s full capabilities (and there are a lot of them), please refer to the official documentation. Seldon-Core can also be deployed using Helm and we will cover this in more detail later on.

Installing Helm

As before, the easiest way to install Helm onto Mac OS X is to use the Homebrew package manager,

brew install kubernetes-helm

Helm relies on a dedicated deployment server, referred to as the ‘Tiller’, to be running within the same Kubernetes cluster we wish to deploy our applications to. Before we deploy Tiller we need to create a cluster-wide super-user role to assign to it (via a dedicated service account),

kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller \
    --clusterrole cluster-admin \
    --serviceaccount=kube-system:tiller

We can now deploy the Helm Tiller to your Kubernetes cluster using,

helm init --service-account tiller

Deploying the ML Model-Scoring Service

To initiate a new deployment - referred to as a ‘chart’ in Helm terminology - run,

helm create NAME-OF-YOUR-HELM-CHART

This creates a new directory - e.g. helm-ml-score-app as included with this repository - with the following high-level directory structure,

helm-ml-score-app/
 | -- charts/
 | -- templates/
 | Chart.yaml
 | values.yaml

Briefly, the charts directory contains other charts that our new chart will depend on (we will not make use of this), the templates directory contains our Helm templates, Chart.yaml contains core information for our chart (e.g. name and version information) and values.yaml contains default values to render our templates with, in the case that no values are passed from the command line - for example,

app:
  name: test-ml-score
  env: prod
  namespace: test-ml-app
  image: alexioannides/test-ml-score-api

replicas: 2
containerPort: 5000
targetPort: 5000

The next step is to delete all of the files in the templates directory (apart from NOTES.txt), and to replace them with our own. We start with namespace.yaml for declaring a namespace for our app,

apiVersion: v1
kind: Namespace
metadata:
  name: {{ .Values.app.namespace }}

Anyone familiar with HTML template frameworks (e.g. Jinja), will be familiar with the use of {{}} for defining values that will be injected into the rendered template. In this specific instance .Values.app.namespace injects the app.namespace variable, whose default value is defined in values.yaml. Next we define the contents of our pod in pod.yaml,

apiVersion: v1
kind: ReplicationController
metadata:
  name: {{ .Values.app.name }}-rc
  labels:
    app: {{ .Values.app.name }}
    env: {{ .Values.app.env }}
  namespace: {{ .Values.app.namespace }}
spec:
  replicas: {{ .Values.replicas }}
  template:
    metadata:
      labels:
        app: {{ .Values.app.name }}
        env: {{ .Values.app.env }}
      namespace: {{ .Values.app.namespace }}
    spec:
      containers:
      - image: {{ .Values.app.image }}
        name: {{ .Values.app.name }}-api
        ports:
        - containerPort: {{ .Values.containerPort }}
          protocol: TCP

And the details of the load balancer service in service.yaml,

apiVersion: v1
kind: Service
metadata:
  name: {{ .Values.app.name }}-lb
  labels:
    app: {{ .Values.app.name }}
  namespace: {{ .Values.app.namespace }}
spec:
  type: LoadBalancer
  ports:
  - port: {{ .Values.containerPort }}
    targetPort: {{ .Values.targetPort }}
  selector:
    app: {{ .Values.app.name }}
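The substitution performed on these templates can be mimicked in a few lines of Python. The sketch below is a toy stand-in that handles only plain {{ .Values.some.path }} placeholders; Helm itself uses Go templates, which support far more than this. The names PLACEHOLDER, lookup and render are hypothetical.

```python
import re

# Matches placeholders such as {{ .Values.app.name }}
PLACEHOLDER = re.compile(r'\{\{\s*\.Values\.([\w.]+)\s*\}\}')


def lookup(values, dotted_path):
    """Walk a nested dict using a dotted path such as 'app.name'."""
    node = values
    for key in dotted_path.split('.'):
        node = node[key]
    return node


def render(template, values):
    """Replace every placeholder with the corresponding entry from values."""
    return PLACEHOLDER.sub(lambda m: str(lookup(values, m.group(1))), template)


values = {'app': {'name': 'test-ml-score', 'namespace': 'test-ml-app'}, 'replicas': 2}
template = (
    'name: {{ .Values.app.name }}-rc\n'
    'namespace: {{ .Values.app.namespace }}\n'
    'replicas: {{ .Values.replicas }}'
)
print(render(template, values))
```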

What we have done, in essence, is to split-out each component of the deployment details from py-flask-ml-score.yaml into its own file and then define template variables for each parameter of the configuration that is most likely to change from one deployment to the next. To test and examine the rendered template, without having to attempt a deployment, run,

helm install helm-ml-score-app --debug --dry-run

If you are happy with the results of the ‘dry run’, then execute the deployment and generate a release from the chart using,

helm install helm-ml-score-app

This will automatically print the status of the release, together with the name that Helm has ascribed to it (e.g. ‘willing-yak’) and the contents of NOTES.txt rendered to the terminal. To list all available Helm releases and their names use,

helm list

And to view the status of all their constituent components (e.g. pods, replication controllers, service, etc.) use for example,

helm status willing-yak

The ML scoring service can now be tested in exactly the same way as we have done previously (above). Once you have convinced yourself that it’s working as expected, the release can be deleted using,

helm delete willing-yak

Using Seldon to Deploy a ML Model-Scoring Service on Kubernetes

Seldon’s core mission is to simplify the deployment of complex ML prediction pipelines on top of Kubernetes. In this demonstration we are going to focus on the simplest possible example - i.e. the simple ML model-scoring API we have already been using.

Installing Source-to-Image

Seldon-core depends heavily on Source-to-Image - a tool for automating the process of building code artifacts from source and injecting them into docker images. For Seldon, the artifacts are the different pieces of an ML pipeline. We use Homebrew to install Source-to-Image on Mac OS X,

brew install source-to-image

To confirm that it has been installed correctly run,

s2i version

Install the Seldon-Core Python Package

We’re using Pipenv to manage the Python dependencies for this project. To install seldon-core into a virtual environment managed by Pipenv for use only by this project use,

pipenv install --python 3.6 seldon-core

Note that we are specifying Python 3.6 explicitly, as at the time of writing Seldon-Core does not work with Python 3.7. If you don’t wish to use pipenv you can install seldon-core using pip into whatever environment is most convenient and then drop the use of pipenv run when testing with Seldon-Core (below).

Building an ML Component for Seldon

To deploy a ML component using Seldon, we need to create Seldon-compatible Docker images. We start by following these guidelines for defining a Python class that wraps an ML model targeted for deployment with Seldon. This is contained within the seldon-ml-score-component directory. In essence, this replaces the need to define RESTful APIs, which we did in the above examples using Flask. Firstly, ensure that the docker daemon is running locally and then run,

s2i build seldon-ml-score-component \
    seldonio/seldon-core-s2i-python3:0.4 \
    alexioannides/seldon-ml-score-component
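For readers without the repository to hand, a wrapper class of the kind described above might look like the following sketch. It is not reproduced from the seldon-ml-score-component directory: the class name MLScore is hypothetical, and the predict(X, features_names) signature reflects our understanding of the Seldon Python wrapper convention at the time of writing, so check it against the guidelines for the version in use.

```python
class MLScore:
    """Seldon-style wrapper around the identity 'model' used in this post."""

    def __init__(self):
        # A real component would load a trained model here, e.g. from a
        # pickle file bundled into the image.
        self.model = lambda X: X

    def predict(self, X, features_names):
        """Return a prediction for X (assumed to be called by Seldon per request)."""
        return self.model(X)


component = MLScore()
print(component.predict([[0, 0], [1, 1]], ['a', 'b']))
```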

Launch the container using Docker locally,

docker run --name seldon-s2i-test -p 5000:5000 -d alexioannides/seldon-ml-score-component

And then test the resulting Seldon component using the dedicated testing application from the seldon-core Python package,

pipenv run seldon-core-tester seldon-ml-score-component/contract.json localhost 5000 -p

If it works as expected (i.e. without throwing any errors), push it to an image registry - for example,

docker push alexioannides/seldon-ml-score-component

Configuring Kubernetes for Seldon-Core

Before we can proceed any further, we will need to grant a cluster-wide super-user role to our user, using Role-Based Access Control (RBAC). On GCP this is achieved with,

kubectl create clusterrolebinding kube-system-cluster-admin \
    --clusterrole cluster-admin \
    --serviceaccount kube-system:default \
    --user $(gcloud info --format="value(config.account)")

And for Minikube with,

kubectl create clusterrolebinding kube-system-cluster-admin \
    --clusterrole cluster-admin \
    --serviceaccount kube-system:default

Next, we create a Kubernetes namespace for all Seldon components that we will deploy,

kubectl create namespace seldon

And we then set it as a default for the current kubectl context,

kubectl config set-context $(kubectl config current-context) --namespace=seldon

So that whenever we run a kubectl command without an explicit --namespace flag, it will now default to the seldon namespace.

Deploying a ML Component with Seldon-Core via Helm Charts

We now move on to deploying our Seldon-compatible ML component and creating a service from it. To achieve this, we first deploy Seldon-Core itself using Helm charts, starting with the Seldon Custom Resource Definitions (CRD), installed directly from the Seldon chart repository hosted at https://storage.googleapis.com/seldon-charts,

helm install seldon-core-crd \
    --name seldon-core-crd \
    --repo https://storage.googleapis.com/seldon-charts \
    --set usage_metrics.enabled=true

We then do the same for Seldon-Core,

helm install seldon-core \
    --name seldon-core \
    --repo https://storage.googleapis.com/seldon-charts \
    --set apife.enabled=false \
    --set rbac.enabled=true \
    --set ambassador.enabled=true \
    --set single_namespace=true \
    --set namespace=seldon

If we now run helm list --namespace seldon we should see that Seldon-Core has been deployed and is waiting for Seldon ML components to be deployed alongside it. To deploy our Seldon-compatible ML model score service we configure and deploy another Seldon chart as follows,

helm install seldon-single-model \
    --name test-seldon-ml-score-api \
    --repo https://storage.googleapis.com/seldon-charts \
    --set model.image.name=alexioannides/seldon-ml-score-component

Testing the API

We will test our Seldon-Core based deployment with the same approaches that we have been using above.

Via Port Forwarding

We follow the same general approach as we did for our first-principles Kubernetes deployments above, but using embedded bash commands to find the Ambassador API gateway component we need to target for port-forwarding. Regardless of whether we are working with GCP or Minikube use,

kubectl port-forward \
    $(kubectl get pods -n seldon -l service=ambassador -o jsonpath='{.items[0].metadata.name}') \
    -n seldon 8003:8080

We can then test the model-scoring API deployed via Seldon-Core, using the API defined by Seldon-Core,

curl http://localhost:8003/seldon/test-seldon-ml-score-api/api/v0.1/predictions \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'
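The request body used above packs a 2x2 array into a flat list of values plus a shape. The sketch below builds the same structure from a list of rows, assuming that the values are laid out row by row; to_seldon_tensor is a hypothetical helper and not part of the seldon-core package.

```python
import json


def to_seldon_tensor(rows, names):
    """Flatten a list of equal-length rows into the names/tensor payload."""
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError('all rows must have the same length')
    # Assumption: values are stored row by row (row-major order)
    values = [value for row in rows for value in row]
    return {
        'data': {
            'names': names,
            'tensor': {'shape': [len(rows), n_cols], 'values': values},
        }
    }


payload = to_seldon_tensor([[0, 0], [1, 1]], ['a', 'b'])
print(json.dumps(payload, separators=(',', ':')))
```

The printed string matches the --data argument passed to curl in the request above.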

Via the Public Internet

Firstly, we need to expose the service to the public internet. If working on GCP we can expose the service via the ambassador API gateway component deployed as part of Seldon-Core,

kubectl expose deployment seldon-core-ambassador \
    --type=LoadBalancer \
    --name=seldon-core-ambassador-external

And then to retrieve the external IP for GCP use,

kubectl get services

And for Minikube use,

minikube service list

And then to test the public endpoint use, for example,

curl http://192.168.99.111:32074/seldon/test-seldon-ml-score-api/api/v0.1/predictions \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}'

Tear Down

To delete a Helm deployment from the Kubernetes cluster, first retrieve a list of all the releases in the Seldon namespace,

helm list --namespace seldon

And then remove them using,

helm delete seldon-core --purge && \
helm delete seldon-core-crd --purge && \
helm delete test-seldon-ml-score-api --purge

If there is a GCP cluster that needs to be killed run,

gcloud container clusters delete k8s-test-cluster

And likewise if working with Minikube,

minikube stop
minikube delete