Installing Kubeflow on your Laptop

Kubeflow is a machine learning toolkit that runs on top of Kubernetes. It allows data scientists and machine learning engineers to train and deploy their models. You can install it on your laptop and start experimenting with some of its features. Here is how.

This was done on OS X, but it should work similarly on Linux and probably on Windows too. Only the first part (installing the VM) should differ.

1. Installing multipass (VM)

For OS X, go to https://multipass.run/ and download the pkg file.

If you’re on Ubuntu:

> snap install multipass --classic
multipass 1.0.2 from Canonical✓ installed
> multipass launch --name kubeflow-vm --mem 4G --disk 30G

You can create a more powerful VM by changing some parameters:

> multipass launch --name kubeflow-vm -c 3 -m 6G -d 50G

This downloads an Ubuntu 18.04 image and boots the VM from it.

Next, you have to install microk8s inside your VM. However, Kubeflow is only compatible with Kubernetes up to version 1.15, so you have to force this specific version in order to install Kubeflow afterwards.

2. Installing Microk8s

> multipass exec kubeflow-vm -- sudo snap install microk8s --classic --channel=1.15/stable
microk8s (1.15/stable) v1.15.7 from Canonical✓ installed
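The version constraint above can be checked with a small helper. This is only a sketch: the function name k8s_version_supported is made up for this example, and it assumes version strings of the form v1.15.7 as printed by snap.

```shell
# Hypothetical helper: succeeds when the given Kubernetes version
# (for example "v1.15.7") is a 1.x release no newer than 1.15.
k8s_version_supported() {
  version="${1#v}"        # strip a leading "v" if present
  major="${version%%.*}"  # text before the first dot
  rest="${version#*.}"    # text after the first dot
  minor="${rest%%.*}"     # text before the next dot
  [ "$major" -eq 1 ] && [ "$minor" -le 15 ]
}
```

For example, k8s_version_supported v1.15.7 succeeds, while k8s_version_supported v1.16.0 fails.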

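Now we allow the VM to forward all network traffic (the default policy of the iptables FORWARD chain becomes ACCEPT), so that the services running in it are reachable: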

> multipass exec kubeflow-vm -- sudo iptables -P FORWARD ACCEPT

For the next step, we need to be logged into the VM itself:

> multipass shell kubeflow-vm

And if you want to avoid prefixing every command with sudo, you can add the current user in the VM (ubuntu) to the microk8s group:

> sudo usermod -a -G microk8s ubuntu

Then log out of the VM and log back in for the change to take effect.

3. Installing Kubeflow

3.1. Preparing

Before starting, it is necessary to enable a few microk8s add-ons.

> microk8s.enable dns storage dashboard
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
DNS is enabled
Enabling default storage class
deployment.extensions/hostpath-provisioner created
storageclass.storage.k8s.io/microk8s-hostpath created
serviceaccount/microk8s-hostpath created
clusterrole.rbac.authorization.k8s.io/microk8s-hostpath created
clusterrolebinding.rbac.authorization.k8s.io/microk8s-hostpath created
Storage will be available soon
Applying manifest
secret/kubernetes-dashboard-certs created
serviceaccount/kubernetes-dashboard created
role.rbac.authorization.k8s.io/kubernetes-dashboard-minimal created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-minimal created
deployment.apps/kubernetes-dashboard created
service/kubernetes-dashboard created
service/monitoring-grafana created
service/monitoring-influxdb created
service/heapster created
deployment.extensions/monitoring-influxdb-grafana-v4 created
serviceaccount/heapster created
clusterrolebinding.rbac.authorization.k8s.io/heapster created
configmap/heapster-config created
configmap/eventer-config created
deployment.extensions/heapster-v1.5.2 created

If RBAC is not enabled access the dashboard using the default token retrieved with:

token=$(microk8s.kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
microk8s.kubectl -n kube-system describe secret $token

In an RBAC enabled setup (microk8s.enable RBAC) you need to create a user with restricted
permissions as shown in https://github.com/kubernetes/dashboard/wiki/Creating-sample-user

If it fails with "Failed to enable storage", just run the command a second time. This probably happens because the dns add-on is not active yet.

If you have an NVIDIA GPU, you can also enable the GPU add-on on microk8s:

> microk8s.enable gpu
Enabling NVIDIA GPU
NVIDIA kernel module detected
Enabling DNS
Applying manifest
serviceaccount/coredns unchanged
configmap/coredns unchanged
deployment.apps/coredns unchanged
service/kube-dns unchanged
clusterrole.rbac.authorization.k8s.io/coredns unchanged
clusterrolebinding.rbac.authorization.k8s.io/coredns unchanged
Restarting kubelet
DNS is enabled
Applying manifest
daemonset.extensions/nvidia-device-plugin-daemonset created
NVIDIA is enabled

3.2. Connecting to the dashboard

You need to get the access token first:

> token=$(microk8s.kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
> microk8s.kubectl -n kube-system describe secret $token

Figure 1
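The describe command prints several fields, while the dashboard login only needs the token itself. A minimal sketch to keep just that value, assuming the token sits on a line whose first field is "token:" (the function name dashboard_token is made up for this example):

```shell
# Hypothetical helper: reads the output of "kubectl describe secret" on
# standard input and prints only the token value.
# Assumption: the token is on a line whose first field is "token:".
dashboard_token() {
  awk '$1 == "token:" { print $2 }'
}
```

Inside the VM, it could be used by piping the second command above into it: microk8s.kubectl -n kube-system describe secret $token | dashboard_token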

Then, in the VM, you have to forward the dashboard port. You can do so with this command (preferably in a new terminal):

> microk8s.kubectl port-forward -n kube-system service/kubernetes-dashboard 10443:443 --address 0.0.0.0

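--address 0.0.0.0 makes the forwarded port listen on all network interfaces of the VM, so it is reachable from outside the VM. Then, in your browser, connect to (replacing ip with the IP of your VM):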

https://ip:10443

You can check the IP by running:

> multipass list

Figure 2
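If you prefer not to read the IP off the table by eye, it can be extracted with awk. This is a sketch that assumes the column layout Name, State, IPv4, Image in the output of multipass list; the function name vm_ip is made up for this example.

```shell
# Hypothetical helper: reads the output of "multipass list" on standard
# input and prints the IPv4 address of the VM named in the first argument.
# Assumption: the columns are Name, State, IPv4, Image, in that order.
vm_ip() {
  awk -v vm="$1" '$1 == vm { print $3 }'
}
```

On the laptop, it could be used as: multipass list | vm_ip kubeflow-vm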

3.3. Creating an alias to use kubectl

kubectl (kube-cuddle) is a tool that allows you to control your Kubernetes cluster. With microk8s, you have to type microk8s.kubectl. It’s a bit long, so you can create an alias to make it shorter and identical to the “real” kubectl.

> sudo snap alias microk8s.kubectl kubectl
Added:
  - microk8s.kubectl as kubectl

And save the configuration of your Kubernetes cluster (creating the ~/.kube directory first if it does not exist):

> mkdir -p $HOME/.kube
> kubectl config view --raw > $HOME/.kube/config

3.4. Installing kfctl

We set our current OS (in the VM it is Ubuntu):

> export OPSYS=linux

Then download and extract kfctl:

> wget https://github.com/kubeflow/kfctl/releases/download/v1.0/kfctl_v1.0-0-g94c35cf_linux.tar.gz
> tar -zvxf kfctl_v1.0-0-g94c35cf_linux.tar.gz
./kfctl

# curl -s https://api.github.com/repos/kubeflow/kubeflow/releases/latest | grep browser_download | grep $OPSYS | cut -d '"' -f 4 | xargs curl -O -L &&  tar -zvxf kfctl_*_${OPSYS}.tar.gz

> export PATH=$PATH:$PWD

3.5. Deploying Kubeflow

# Set KF_NAME to the name of your Kubeflow deployment. You also use this
# value as directory name when creating your configuration directory.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
export KF_NAME=<your choice of name for the Kubeflow deployment>

# Set the path to the base directory where you want to store one or more
# Kubeflow deployments. For example, /opt/.
# Then set the Kubeflow application directory for this deployment.
export BASE_DIR=<path to a base directory>
export KF_DIR=${BASE_DIR}/${KF_NAME}

# Set the URI of the configuration file to use when deploying Kubeflow.
# For example:
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.0.yaml"

# Create your Kubeflow configurations:
mkdir -p ${KF_DIR}
cd ${KF_DIR}
kfctl build -V -f ${CONFIG_URI}

INFO[0000] Downloading https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.0.yaml to /tmp/073201262/tmp.yaml  filename="utils/k8utils.go:172"
INFO[0000] Downloading https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.0.yaml to /tmp/251815157/tmp_app.yaml  filename="loaders/loaders.go:71"
INFO[0000] App directory /home/yann/kubeflow-local already exists  filename="coordinator/coordinator.go:270"
INFO[0000] Writing KfDef to kfctl_k8s_istio.v1.0.0.yaml   filename="coordinator/coordinator.go:273"
INFO[0000] No name specified in KfDef.Metadata.Name; defaulting to kubeflow-local based on location of config file: /home/yann/kubeflow-local/kfctl_k8s_istio.v1.0.0.yaml.  filename="coordinator/coordinator.go:202"
INFO[0000]
****************************************************************
Notice anonymous usage reporting enabled using spartakus
To disable it
If you have already deployed it run the following commands:
  cd $(pwd)
  kubectl -n ${K8S_NAMESPACE} delete deploy -l app=spartakus

For more info: https://www.kubeflow.org/docs/other-guides/usage-reporting/
****************************************************************
  filename="coordinator/coordinator.go:120"
INFO[0000] Creating directory /home/yann/kubeflow-local/.cache  filename="kfconfig/types.go:445"
INFO[0000] Fetching https://github.com/kubeflow/manifests/archive/v1.0-branch.tar.gz to /home/yann/kubeflow-local/.cache/manifests  filename="kfconfig/types.go:493"
INFO[0003] updating localPath to /home/yann/kubeflow-local/.cache/manifests/manifests-1.0-branch  filename="kfconfig/types.go:540"
...
INFO[0003] Processing application: profiles              filename="kustomize/kustomize.go:397"
INFO[0003] Processing application: seldon-core-operator  filename="kustomize/kustomize.go:397"

The last command (kfctl build) downloads the manifests and generates the configuration files in ${KF_DIR}.
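For reference, here is one way the placeholders in the variables above could be filled in. The values are an assumption inferred from the paths in the log output (a deployment named kubeflow-local stored directly in the home directory); pick whatever suits your setup.

```shell
# Example values only (assumption): a deployment named "kubeflow-local"
# stored directly under the home directory of the current user.
export KF_NAME=kubeflow-local
export BASE_DIR="$HOME"
export KF_DIR="${BASE_DIR}/${KF_NAME}"

# Same configuration URI as in the block above.
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.0.yaml"

echo "Kubeflow will be configured in ${KF_DIR}"
```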

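Now you can run the following, making sure CONFIG_FILE matches the name of the YAML file that kfctl build wrote into ${KF_DIR}: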

> export CONFIG_FILE=${KF_DIR}/kfctl_k8s_istio.1.0.0.yaml
> kfctl apply -V -f ${CONFIG_FILE}
INFO[0000] No name specified in KfDef.Metadata.Name; defaulting to kubeflow-local based on location of config file: /home/yann/kubeflow-local/kfctl_k8s_istio.1.0.0.yaml.  filename="coordinator/coordinator.go:202"
INFO[0000]
****************************************************************
Notice anonymous usage reporting enabled using spartakus
To disable it
If you have already deployed it run the following commands:
  cd $(pwd)
  kubectl -n ${K8S_NAMESPACE} delete deploy -l app=spartakus

For more info: https://www.kubeflow.org/docs/other-guides/usage-reporting/
****************************************************************
  filename="coordinator/coordinator.go:120"
...
service/profiles-kfam created
deployment.apps/profiles-deployment created
application.app.k8s.io/profiles created
virtualservice.networking.istio.io/kfam created
customresourcedefinition.apiextensions.k8s.io/seldondeployments.machinelearning.seldon.io created
serviceaccount/seldon-manager created
clusterrole.rbac.authorization.k8s.io/seldon-operator-manager-role created
clusterrolebinding.rbac.authorization.k8s.io/seldon-operator-manager-rolebinding created
configmap/seldon-config created
secret/seldon-operator-webhook-server-secret created
service/seldon-operator-controller-manager-service created
service/webhook-server-service created
statefulset.apps/seldon-operator-controller-manager created
application.app.k8s.io/seldon-core-operator created
INFO[0006] Applied the configuration Successfully!       filename="cmd/apply.go:72"

It will take some time…

3.6. Monitoring the deployment

In order to check if everything is deployed properly, you can use the command:

> kubectl -n kubeflow get po

NAME                                                           READY   STATUS              RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0                     1/1     Running             0          7m50s
admission-webhook-deployment-b7d89f4c7-k4sz2                   0/1     ContainerCreating   0          3m50s
application-controller-stateful-set-0                          1/1     Running             0          7m53s
argo-ui-6754c76f9b-ltxpq                                       0/1     ContainerCreating   0          7m51s
centraldashboard-5578cc9569-4zk4j                              1/1     Running             0          7m51s
jupyter-web-app-deployment-6b7d9c5fd6-qgp8c                    0/1     ContainerCreating   0          7m50s
katib-controller-789d76d446-dqg4h                              0/1     ContainerCreating   0          7m46s
katib-db-75975d8dbd-5wpq8                                      0/1     ContainerCreating   0          7m45s
katib-manager-59bb84948f-b4pls                                 0/1     ContainerCreating   0          7m45s
katib-ui-dd75bd446-wjtsp                                       0/1     ContainerCreating   0          7m44s
kfserving-controller-manager-0                                 0/2     ContainerCreating   0          7m48s
metadata-db-7584d44b65-lqk2h                                   0/1     ContainerCreating   0          7m50s
metadata-deployment-cd8f7d58f-5sc2q                            0/1     ContainerCreating   0          7m50s
metadata-envoy-deployment-bff4f8b9-4qt9d                       0/1     ContainerCreating   0          7m50s
metadata-grpc-deployment-7cc5d84854-867v2                      0/1     ContainerCreating   0          7m49s
metadata-ui-7c978889b5-vfzrc                                   0/1     ContainerCreating   0          7m49s
minio-764648495-shcmx                                          0/1     ContainerCreating   0          7m43s
ml-pipeline-588b64fff-97ztp                                    0/1     ContainerCreating   0          7m44s
ml-pipeline-ml-pipeline-visualizationserver-6c7c97869d-dgxnt   0/1     ContainerCreating   0          7m41s
ml-pipeline-persistenceagent-79ff896578-68747                  0/1     ContainerCreating   0          7m42s
ml-pipeline-scheduledworkflow-7d89bb6db5-l5mdk                 0/1     ContainerCreating   0          7m41s
ml-pipeline-ui-6656886579-6r9qw                                0/1     ContainerCreating   0          7m42s
ml-pipeline-viewer-controller-deployment-546bd5f545-t2sdd      0/1     ContainerCreating   0          7m42s
mysql-6c9cb88c4d-9vnhx                                         0/1     ContainerCreating   0          7m43s
notebook-controller-deployment-6d594ddd6b-q2nqp                0/1     ContainerCreating   0          7m49s
profiles-deployment-67799585bd-lr5wk                           0/2     ContainerCreating   0          7m41s
pytorch-operator-fdfd7985-82x26                                0/1     ContainerCreating   0          7m49s
seldon-operator-controller-manager-0                           0/1     ContainerCreating   0          7m47s
spartakus-volunteer-5888bc655-b2pbt                            0/1     ContainerCreating   0          7m46s
tensorboard-5f685f9d79-7wnr7                                   0/1     ContainerCreating   0          7m46s
tf-job-operator-5dff84b966-8cb2b                               0/1     ContainerCreating   0          7m46s
workflow-controller-85c665bcb9-j87dc                           1/1     Running             0          7m51s

It takes a while before everything is Running.
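Instead of re-running the command by hand, the pods that are not yet Running can be counted from its output. This is a sketch that assumes the column layout shown above (NAME, READY, STATUS, ...); the function name pods_not_running is made up for this example.

```shell
# Hypothetical helper: reads the output of "kubectl get po" on standard
# input and prints how many pods have a STATUS other than "Running".
# Assumption: the first line is the header and STATUS is the third column.
# Note that pods in the Completed state are counted as not running too.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { n++ } END { print n + 0 }'
}

# Possible usage inside the VM, polling every 10 seconds:
#   while [ "$(kubectl -n kubeflow get po | pods_not_running)" -gt 0 ]; do
#     sleep 10
#   done
```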

4. Connecting to Kubeflow

In order to connect to Kubeflow, you need the IP of your VM. First leave the shell in the VM by typing exit. Then:

> multipass list

Then, in your browser:

http://ip:31380

Figure 2

5. Setting Up kubectl (“kube-cuttle”) Context

5.1. Install kubectl

# On OS X
> curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl"
# On Linux
> curl -LO https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl

> chmod +x ./kubectl
> sudo mv ./kubectl /usr/local/bin/kubectl
> kubectl version --client

5.2. Create the config file

Connect to your VM:

> multipass shell kubeflow-vm

Then copy the content of the generated config file:

> cat ~/.kube/config

Figure 3

Then paste the content of this file into a config file on your laptop itself (outside the VM):

> mkdir -p ~/.kube
> vim ~/.kube/config

But replace the server address with the IP of the VM:

https://127.0.0.1:16443
# Becomes
https://192.168.64.3:16443
# In my case.
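The same replacement can be done with sed instead of editing by hand. This is a sketch, assuming the config contains the address https://127.0.0.1:16443 exactly as shown above; the function name kubeconfig_for_vm is made up for this example.

```shell
# Hypothetical helper: reads a kubeconfig on standard input and prints it
# with the local server address replaced by the IP given as first argument.
# Assumption: the original address is exactly https://127.0.0.1:16443.
kubeconfig_for_vm() {
  sed "s#https://127\.0\.0\.1:16443#https://$1:16443#"
}
```

On the laptop, with the content copied from the VM saved in a file of your choice (vm-config here is just an example name), this could be used as: kubeconfig_for_vm 192.168.64.3 < vm-config > ~/.kube/config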

As a reminder, you can see the IP of your VM by typing in the terminal:

> multipass list

And check if it works correctly:

> kubectl cluster-info

Figure 4

You can now control your single-node Kubernetes cluster from your regular environment (without connecting to the VM).
