Docker on Ubuntu with NVIDIA GPU CUDA

April 12, 2021
post, howto, technical, docker, cuda, gpu, reproducibility

A quick guide to setting up Docker to leverage NVIDIA CUDA GPUs on Ubuntu Linux. (Tested with Ubuntu 20.04 and CUDA 11.2).

Intro #

Containerization is in vogue for a variety of reasons, but I was surprised by how helpful it can be even for development, rather than code deployment/sharing. In particular, it fixes one of the last pain points of working with and sharing neural nets: inconsistent versions of NVIDIA’s CUDA GPU1 libraries.

It’s nice to be able to use slightly different versions of CUDA without having to reinstall it, and you’re not forced to use older known-compatible versions of NVIDIA’s drivers. I’ve been able to use Steam without booting into Windows or screwing up my research code. The fact that you can also use different versions of Torch/TensorFlow, and that the resulting container should be runnable by others with minimal fiddling is almost an afterthought2, but that certainly helps as well.
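Once everything below is set up, switching CUDA versions amounts to switching image tags: one host driver can serve containers built against different CUDA releases. A quick sketch of the idea (the tags here are illustrative; check the nvidia/cuda listings on Docker Hub for the ones currently published):

```shell
# Run nvidia-smi from images built against different CUDA versions;
# both rely on the single NVIDIA driver installed on the host.
# NOTE: the tags are illustrative -- see hub.docker.com/r/nvidia/cuda
docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```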

Getting it to work was surprisingly easy: you can just follow the instructions. I’m not exactly dispensing deep knowledge here, but perhaps having it all in one place might be useful to others; plus, I now have a post to refer people to if they’re trying to reproduce something I’ve done.

Install Docker #

I installed Docker following the prescribed steps for Ubuntu 20.04.

# Install Docker
# ============================================================================
sudo apt update
sudo apt install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

# Add Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# This is a pretty neat trick for forming URLs
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

# Update and install
sudo apt update && sudo apt install docker-ce docker-ce-cli containerd.io

Assuming everything went smoothly, you should be able to run the hello-world container:

sudo docker run hello-world

and see something like:

me@home: ~ sudo docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Post-installation #

In principle you should be able to skip ahead to the nvidia-docker part, but the post-install notes describe some things you can do to make your life easier.

Start Docker on boot #

You can get Docker to start on boot via:

sudo systemctl enable docker.service
sudo systemctl enable containerd.service
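You can sanity-check that the units are actually enabled (a verification step of my own, not part of the official instructions):

```shell
# Both should print "enabled" once the services are set to start on boot
systemctl is-enabled docker.service
systemctl is-enabled containerd.service
```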

Avoiding the need for sudo #

I found entering my password, scanning my eyeballs, and providing a blood sample3 every single time I wanted to do something Docker-related to be tedious, so I manage Docker as a non-root user.

Note that there are some security implications, but I don’t see how this necessarily makes things substantially more dangerous than running regular (rooted) Docker.

# Create `docker` group
sudo groupadd docker

# Add user to group
sudo usermod -aG docker $USER

# Apply the new group membership in the current shell
# (it takes effect automatically after logging out and back in)
newgrp docker
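To confirm the group change took effect without rebooting, you can list your current group memberships (again, a quick sanity check of mine rather than anything official):

```shell
# List the current user's groups; "docker" should appear after newgrp
# (or after logging out and back in)
id -nG
```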

Check that it works as expected; you should get the same cheery message as before:

me@home: ~ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.
...

Installing nvidia-docker #

NVIDIA provides installation instructions here, which I reproduce below:

# Add GPG key
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -

# Add remote repositories
# Note -- at present these seem to refer to Ubuntu 18.04,
# but they worked fine on my 20.04 machine
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L \
    https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Update and install
sudo apt update && sudo apt install -y nvidia-docker2

# Restart Docker daemon
sudo systemctl restart docker
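At this point the NVIDIA runtime should be registered with the Docker daemon. One way to check (my own sanity check, not part of NVIDIA’s instructions):

```shell
# The "Runtimes" line of `docker info` should now include "nvidia"
docker info 2>/dev/null | grep -i runtimes
```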

Now we run one of the base CUDA images and execute the nvidia-smi command to see if it picks up our GPUs.

# Test that a CUDA container works
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

You should see something like the following:

me@home: ~ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:01:00.0  On |                  N/A |
| 22%   38C    P0   102W / 250W |   4026MiB / 12209MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Note the --gpus all command-line switch; without it, you’re liable to get something like:

me@home: ~ docker run --rm nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown
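The --gpus flag also accepts a count or specific device indices, which is handy if you want to leave a GPU free for the desktop. A sketch, following the quoting style Docker’s documentation uses for device lists:

```shell
# Expose only the first GPU (indices follow nvidia-smi's ordering);
# note the nested quotes around device=0
docker run --rm --gpus '"device=0"' nvidia/cuda:11.0-base nvidia-smi

# Or expose any one GPU, chosen by Docker
docker run --rm --gpus 1 nvidia/cuda:11.0-base nvidia-smi
```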

References #


  1. NVIDIA truly recognizes the importance of allcaps for effective communication; I just know they’re as excited as I am by some of the research being done in this area. ↩︎

  2. We’ve come a long way since the days of Theano and its combination of fragility and combativeness. Some of those error messages…it was like debugging a Hungarian translation of Perl. I’ll miss it. ↩︎

  3. I got a really great deal on eBay from the Theranos liquidation. ↩︎
