Install TensorFlow on Docker Running on Creodias WAW3-1 vGPU Virtual Machine

TensorFlow is one of the most popular libraries for machine learning. Running TensorFlow on a vGPU-enabled virtual machine can significantly speed up machine learning workflows. This article shows how to install TensorFlow on a vGPU-enabled virtual machine in the WAW3-1 cloud.

This article describes how to install TensorFlow using Docker. If you prefer not to use Docker, follow the instructions in the article Install TensorFlow on WAW3-1 vGPU enabled VM on Creodias.

These instructions are based on the following sources:

Installation guide from NVIDIA

Installation guide from TensorFlow

Prerequisites

A virtual machine with an NVIDIA GPU created on the Creodias cloud. This machine must have a floating IP address, and you must be able to connect to it using an SSH key stored on your local computer (this article assumes that your local computer runs Ubuntu 20.04). The following article describes how to create such a machine: How To Create a New Linux VM With NVIDIA Virtual GPU in the OpenStack Dashboard Horizon on Creodias. If you did not add a floating IP during that process, see the article How to Add or Remove Floating IP’s to your VM on Creodias.

These instructions were tested on an Ubuntu 20.04 virtual machine with the default configuration for Creodias hosting. In particular, it means using the eouser account for the CLI commands.

Step 1: Initial operations

Connect to your virtual machine using SSH by invoking the following command (replace 64.225.129.70 with the floating IP address of your virtual machine).

ssh eouser@64.225.129.70

Update all the software on your virtual machine:

sudo apt update && sudo apt upgrade

Verify that the NVIDIA graphics card works:

nvidia-smi

The result of your command should look like this:

../_images/tensorflow-install-01_creodias1.png
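The same check can be scripted, for example when you provision several machines. The following is a minimal sketch rather than part of the tested procedure: the function name check_gpu is hypothetical, and it assumes that the --query-gpu and --format options of nvidia-smi are available in the installed driver version.

```shell
# check_gpu: hypothetical helper that prints one CSV line per visible GPU.
# The optional first argument names the nvidia-smi binary (default: nvidia-smi).
check_gpu() {
    smi="${1:-nvidia-smi}"
    if ! command -v "$smi" >/dev/null 2>&1; then
        echo "$smi not found: is the NVIDIA driver installed?" >&2
        return 1
    fi
    # Query only the GPU name, driver version and total memory.
    "$smi" --query-gpu=name,driver_version,memory.total --format=csv,noheader
}
```

Calling check_gpu on the virtual machine should print one line per GPU. A non-zero exit status means that nvidia-smi was not found or reported an error.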

Step 2: Install Docker

Install Docker using the official script and enable its service:

curl https://get.docker.com | sh && sudo systemctl --now enable docker
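To confirm that the installation succeeded before continuing, you could check that the Docker daemon responds. This is a sketch, assuming that Docker is invoked through sudo as elsewhere in this article; docker_ready is a hypothetical helper name.

```shell
# docker_ready: hypothetical helper. Its arguments are the command used to
# invoke Docker (for example: docker_ready sudo docker). Defaults to "docker".
docker_ready() {
    [ "$#" -gt 0 ] || set -- docker
    # "docker info" exits with a non-zero status when the daemon is unreachable.
    if ! "$@" info >/dev/null 2>&1; then
        echo "Docker daemon not reachable via: $*" >&2
        return 1
    fi
    # Print the installed client version as confirmation.
    "$@" --version
}
```

On the virtual machine, docker_ready sudo docker should print the Docker version. An error message instead suggests that the service did not start.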

Step 3: Install and verify the NVIDIA Container Toolkit (nvidia-docker2)

The NVIDIA Container Toolkit is used for building and running GPU-accelerated Docker containers. More information is available at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/k8s/containers/container-toolkit

We need it since we will be running a GPU-accelerated workflow in a container.

Add the appropriate repository and GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
          sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
          sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
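The sed expression in the command above inserts a signed-by option into every deb line, so that apt verifies this repository against the keyring created at the start of the same command. The sketch below applies the same expression to a single sample line. The sample line is illustrative and may differ from the actual contents of the downloaded list file; add_signed_by is a hypothetical name.

```shell
# add_signed_by: hypothetical wrapper around the sed expression used above.
add_signed_by() {
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
}

# Illustrative input line (an assumption, not copied from the real list file).
sample='deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /'
printf '%s\n' "$sample" | add_signed_by
```

The printed line should be the sample line with the bracketed signed-by option placed between deb and the URL.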

Now install the package nvidia-docker2:

sudo apt update && sudo apt install -y nvidia-docker2

Restart Docker:

sudo systemctl restart docker
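After the restart, it can help to confirm that Docker has picked up the NVIDIA runtime. The sketch below assumes that the nvidia-docker2 package registers a runtime named nvidia in the Docker configuration; has_nvidia_runtime is a hypothetical helper name.

```shell
# has_nvidia_runtime: hypothetical helper. Its arguments are the command used
# to invoke Docker (for example: has_nvidia_runtime sudo docker).
# Returns 0 when a runtime named "nvidia" is listed by "docker info".
has_nvidia_runtime() {
    [ "$#" -gt 0 ] || set -- docker
    # Print the configured runtimes as JSON and look for the "nvidia" key.
    "$@" info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}
```

For example, has_nvidia_runtime sudo docker && echo "nvidia runtime registered" prints the message only when the runtime is present.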

Verify that the NVIDIA Container Toolkit works:

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

You should see the output of the nvidia-smi command (this time, however, it is running from the inside of the container):

../_images/tensorflow-install-02_creodias1.png

Step 4: Install TensorFlow with vGPU support

Pull the TensorFlow image:

sudo docker pull tensorflow/tensorflow:latest-gpu-jupyter

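Run a test in a TensorFlow container. Note that the command below uses the latest-gpu tag rather than the latest-gpu-jupyter image pulled above; Docker downloads it automatically on first use.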

sudo docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

The last command prints the result of a sample TensorFlow operation: the sum of a 1000 x 1000 tensor of normally distributed random values. Any warnings in the output can be ignored.

Your output should include information similar to this:

../_images/tensorflow-install-03_creodias1.png
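To check explicitly that TensorFlow sees the vGPU, you could count the GPU devices it reports. This is a minimal sketch, not part of the tested procedure: tf_gpu_count and the TF_IMAGE variable are hypothetical names, and the image tag is the one used in the test above. The -it options are omitted because the output is piped.

```shell
# tf_gpu_count: hypothetical helper that prints the number of GPUs visible to
# TensorFlow inside the container. Its arguments are the command used to
# invoke Docker (default: sudo docker). TF_IMAGE is a hypothetical override.
tf_gpu_count() {
    image="${TF_IMAGE:-tensorflow/tensorflow:latest-gpu}"
    [ "$#" -gt 0 ] || set -- sudo docker
    # TensorFlow log messages go to stderr and are discarded; the last line of
    # standard output is the device count.
    "$@" run --gpus all --rm "$image" \
        python -c "import tensorflow as tf; print(len(tf.config.list_physical_devices('GPU')))" \
        2>/dev/null | tail -n 1
}
```

On a correctly configured virtual machine, tf_gpu_count should print a number greater than zero.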

What To Do Next

Now that you have successfully installed TensorFlow on a Creodias WAW3-1 virtual machine with an enabled vGPU, you can use it for practical purposes. One way to test it is described in the article Sample Deep Learning Workflow Using TensorFlow Running on Docker on Creodias WAW3-1 vGPU Virtual Machine. There you will see how fast a deep learning operation can be when a vGPU is present.