Docker Swarm and GPUs

Sourabh Burnwal
4 min read · Mar 19, 2022

#MLiP-1
Hey Everyone!

So, you have a unique use case. You want to deploy an application, either on the cloud or on-premises, and you can’t use anything pre-built simply because it wouldn’t give you enough control over how you operate it. Let’s assume, for instance, that your application needs a GPU to run and that you also need to set the shared memory (shm) size for some reason. This article covers two things:

1. How to enable GPU access in a Docker Swarm service (since you can’t use --gpus the way you can with docker run)?

2. How to set the shm size while creating a Docker Swarm service?

What is Docker Swarm?

When you want to scale an application, either on the cloud or on-premises, one way to do it is to build a Docker image of your application and orchestrate it with an orchestration tool. You must have heard about Kubernetes, right? Well, Docker Swarm is another such tool: a container orchestrator with a manager-worker architecture.

Fig 1: A service on a swarm with 4 nodes (source)

As you can see, this swarm (i.e., cluster) has 4 nodes. The main node is called the swarm manager and the others are workers. You interact with the cluster through the manager and run your commands there.

If you are new to this, I recommend going through the documentation for more details. Let’s quickly get on with the main goal of this article. We will first pull a Triton Inference Server Docker image from the Nvidia NGC catalog and a machine learning model from their GitHub repo. Later, we will deploy the server on a Docker Swarm. I currently have only one machine, so I won’t be able to show a multi-node setup, but adding a new node is just a one-line command.

Configuration to set up a GPU for Docker Swarm

[The steps below are for a Linux-based OS (Ubuntu 20.04 in my case) and Nvidia GPUs (GTX 1660 Ti, driver version 510.47).]

First, find your GPU’s UUID:

$ nvidia-smi -a | grep UUID

Fig 2: Output of the GPU UUID
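
The output is a line of the form below (the UUID here is only illustrative; copy the one printed for your GPU):

GPU UUID : GPU-4b0429cc-xxxx-xxxx-xxxx-xxxxxxxxxxxx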

Update the /etc/docker/daemon.json file as follows, including your own GPU UUID:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "node-generic-resources": [
        "NVIDIA-GPU=GPU-4b0429cc"
    ]
}

Next, uncomment the second line (swarm-resource) in /etc/nvidia-container-runtime/config.toml, so that your file looks like this:

disable-require = false
swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"

Restart the Docker daemon:

$ sudo systemctl restart docker.service
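
Optionally, you can check that the nvidia runtime is now registered and set as the default; in the docker info output, the relevant lines are “Runtimes” and “Default Runtime” (the grep is just a convenience):

$ sudo docker info | grep -i runtime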

Pull the Triton Inference Server image from the Nvidia NGC catalog and create a model repository for the server, as sketched below. (The Triton Inference Server is just an example of a GPU-based application that can be orchestrated on Docker Swarm/K8s.)
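
A rough sketch of those steps, following the Triton quickstart (the image tag 22.03-py3 is an assumption, pick the tag that matches your setup):

$ sudo docker pull nvcr.io/nvidia/tritonserver:22.03-py3
$ git clone https://github.com/triton-inference-server/server.git
$ cd server/docs/examples && ./fetch_models.sh

The fetch_models.sh script downloads a few example models into a model_repository directory, which we will mount into the server later.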

Create a Docker Swarm:

$ sudo docker swarm init --advertise-addr <ip-of-any-network-interface>

This command initializes a Docker Swarm cluster with this machine as the swarm manager. The output includes a command you can use to add workers to the swarm.

Note: you can use the ifconfig command to find out the IP for different network interfaces.

Fig 3: Output of swarm init
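
For reference, the worker-join command in that output has the following general form (the token and manager IP are placeholders; use the values from your own swarm init output):

$ sudo docker swarm join --token <worker-token> <manager-ip>:2377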

Start a service with GPU access and custom shm size

For GPU access, the argument is --generic-resource "NVIDIA-GPU=0", and for shared memory, the argument is --mount type=tmpfs,target=/dev/shm. If you want to cap the shm size instead of leaving it unrestricted, you can append ,tmpfs-size=<size-in-bytes> to that mount.

So, in the case of running the Triton Inference Server as a swarm service, it would look like this:
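
Here is a sketch of the full command, combining the flags above with the Triton quickstart defaults (the service name, image tag, published ports, and model-repository path are assumptions; adjust them to your setup):

$ sudo docker service create \
    --name triton \
    --generic-resource "NVIDIA-GPU=0" \
    --mount type=tmpfs,target=/dev/shm \
    --mount type=bind,source=/full/path/to/model_repository,target=/models \
    --publish 8000:8000 --publish 8001:8001 --publish 8002:8002 \
    nvcr.io/nvidia/tritonserver:22.03-py3 \
    tritonserver --model-repository=/models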

Fig 4: Output of docker service create

As you can see in the output, we have successfully created a service that has GPU access and can use an unrestricted amount of shared memory. Now you can scale the service to however many replicas you need and the cluster will scale up accordingly, for example:
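
A quick sketch of scaling (the service name triton and the replica count are assumptions from the example above):

$ sudo docker service scale triton=3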

That’s it for this article. Feel free to drop a comment if I missed anything or you face any issues in any step.

