GPU-Capable OLIVE Docker Installation and Setup

OLIVE Folder Structure Overview

The initial release of GPU-enabled OLIVE follows the structure described below. Overall it is similar to previous deliveries, but it differs significantly from past native Linux-based and Docker-based packages in how the server is started and managed. The important differences can be seen in the oliveDocker/ directory. If the OLIVE package you were provided does not match this layout, please refer to the setup guide appropriate to your delivery.

The OLIVE delivery typically comes in a single archive:

olive5.5.0-DDMonthYYYY.tar.gz
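
The archive can be unpacked with a standard tar invocation (the DDMonthYYYY portion will match your delivery's date):

$ tar -xzf olive5.5.0-DDMonthYYYY.tar.gz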

That unpacks into a structure resembling:

- olive5.5.0
    - api/ - Java and Python example client API implementation code and CLI client utilities
        - java
        - python
    - docs/ - Directory containing the OLIVE documentation 
         - index.html - Open this in a web browser to view
    - oliveDocker/
        - olive+runtime-5.5.0-Ubuntu-20.04-x86_64.tar.gz - OLIVE Core Software and Runtime Bundle
        - Dockerfile
        - docker-compose.yml
    - OliveGUI/ - The OLIVE Nightingale GUI (not included in all deliveries)
        - bin/
            - Nightingale
    - oliveAppData/
        - plugins/
            - sad-dnn-v7.0.0 (example) - Speech Activity Detection plugin
            - Actual plugins included will depend on the customer, mission, and delivery
    - oliveAppDataGPU/
        - plugins/
            - asr-end2end-v1.0.0 (example) - Speech Recognition (end-to-end) plugin configured to run on GPU
            - Actual plugins included will depend on the customer, mission, and delivery

The actual plugins included will vary from customer to customer, and may even vary between use case configurations within a customer integration.

Install and Start Docker

Before you can get started installing and running OLIVE, you'll need to make sure you have fully installed and configured Docker. The proper installation steps vary depending on your host OS, so please refer to the appropriate official Docker installation instructions for your platform.

If installing Docker for Ubuntu, it is especially important to follow the official steps closely, as there are additional important post-installation steps required to make sure Docker runs smoothly on your system.

Note that if you are installing into an Ubuntu instance running on WSL2, systemctl is not used on such systems. This means that some of the commands in the Docker for Ubuntu instructions above may not succeed as written, notably those for starting and stopping the Docker service. Use service for these commands instead:

$ sudo service docker start

In addition, if using Docker for Ubuntu, the Nvidia drivers must be installed separately; this does not appear to be necessary when using Docker Desktop. Instructions for this installation are available from Nvidia.

Before moving on, be sure that the docker service has been started.
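
One quick way to verify this is to query the daemon directly; docker info reports an error if the Docker daemon is not running:

$ docker info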

Build and Launch OLIVE Docker

The core of the OLIVE software is contained within the oliveDocker/olive+runtime-5.5.0-Ubuntu-20.04-x86_64.tar.gz archive. Each delivery includes a docker-compose.yml and a Dockerfile that tell Docker how to build this archive into an image and launch it as a running OLIVE server.

The process for building the OLIVE image is:

$ cd olive5.5.0/oliveDocker/
$ docker compose build

This only needs to be performed once per machine.
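
If you want to confirm that the build succeeded, docker image ls should list the newly built image (the exact repository and tag names are assigned by Docker Compose based on the project directory and service names):

$ docker image ls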

Once this is complete, launching the server can be done with the following command, from the same location:

$ docker compose up

This command will create containers from the OLIVE image and launch multiple OLIVE servers according to the configuration contained in docker-compose.yml; the default configuration is described below.
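
If you prefer to run the servers in the background, the standard Docker Compose options apply; for example:

$ docker compose up -d     # start the servers detached from the terminal
$ docker compose logs -f   # follow the server logs
$ docker compose down      # stop and remove the server containers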

Default OLIVE Server Configuration

As delivered, performing the steps above will launch two OLIVE servers. The first will only perform CPU processing, and has access to the plugins contained in:

`olive5.5.0/oliveAppData/plugins/`

This server is analogous to previous Docker-based OLIVE deliveries. It listens on the same ports as before, and is configured the same way, such that the number of workers (concurrent server jobs) is automatically limited based on the number of threads available on the CPU hardware.

The second server has access to the GPU and can use it when the plugin domains have been properly configured. It assumes there is only one GPU available: device 0, as reported by nvidia-smi. It can run with the plugins contained in:

`olive5.5.0/oliveAppDataGPU/plugins/`

It listens on different ports (see next section) and is configured with a single worker to stabilize GPU memory usage. The worker count is a global setting for an OLIVE server, so separating the CPU and GPU plugins into separate servers lets the CPU server run unthrottled, with the number of parallel jobs scaled to the cores or threads available on the host hardware, rather than being limited by the GPU server's single (very fast) worker thread. By default, this initial delivery includes only a single plugin with domains configured and placed such that it will run on the GPU: asr-end2end-v1.0.0.

See below for instructions on configuring others to run on the GPU, and a list of released GPU-capable plugins.

Interacting with OLIVE GPU Server

Once an OLIVE 5.5.0 GPU-capable server is running, tasking it from a client is largely identical to previous releases, whether from the Java or Python client APIs or one of our GUIs. The only salient difference is that the multi-server approach means each server listens on a different set of ports. By default, those ports are:

| Server | Server Request Port | Server Status Port |
|--------|---------------------|--------------------|
| CPU    | 5588                | 5589               |
| GPU    | 6588                | 6589               |

These defaults can be changed in the docker-compose.yml file as shown below. Be sure to route requests to the appropriate server: most of SRI's client utilities and UIs use the default ports (5588/5589) and will therefore contact the CPU-only server. Sending requests to the GPU server instead requires a flag (e.g. --port 6588 for the Java and Python client utilities) or other configuration changes.
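
A quick way to confirm that both servers are up and that the expected port mappings are in place is to list the running services from the oliveDocker/ directory:

$ docker compose ps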

Docker Image Configuration

Configuration of the number and setup of OLIVE servers is controlled by settings within the included docker-compose.yml file. The entire file is displayed below, followed by excerpts showing which sections configure the CPU server and the GPU server, and then a subsection for each of the most important and most likely to be altered variables. Note that none of this needs to be touched if running the default configuration on the assumed default hardware (a single-GPU machine).

docker-compose.yml

version: "3.9"

x-common-variables: &common-variables
  FONTCONFIG_PATH: /etc/fonts/
  FONTCONFIG_FILE: /etc/fonts/fonts.conf
  XDG_CACHE_HOME: /tmp
  MPLCONFIGDIR: /tmp

services:
  runtime-cpu:
    build:
      context: .
      args:
        gpu_or_cpu: gpu
    user: 1000:1000
    command:
      - "--verbose"
      - "--debug"
    ports:
      - "5588:5588"
      - "5589:5589"
      - "5590:5590"
      - "5591:5591"
    volumes:
      - ../oliveAppData:/home/olive/olive
    environment: *common-variables
    shm_size: '4gb'
  runtime-gpu:
    build:
      context: .
      args:
        gpu_or_cpu: gpu
    user: 1000:1000
    command:
      - "--verbose"
      - "--debug"
      - "--workers=1"
    ports:
      - "6588:5588"
      - "6589:5589"
      - "6590:5590"
      - "6591:5591"
    volumes:
      - ../oliveAppDataGPU:/home/olive/olive
    environment:
      <<: *common-variables
      CUDA_VISIBLE_DEVICES: '0'
    shm_size: '4gb'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # count: 1
              # Can use 'count' or explicitly list device_ids
              # How does device_ids and CUDA_VISIBLE_DEVICES match up?
              # Do the indexes inside the container always start at 0?
              device_ids: ["0"]
              capabilities: [gpu]
CPU server excerpt:

services:
  runtime-cpu:
    build:
      context: .
      args:
        gpu_or_cpu: gpu
    user: 1000:1000
    command:
      - "--verbose"
      - "--debug"
    ports:
      - "5588:5588"
      - "5589:5589"
      - "5590:5590"
      - "5591:5591"
    volumes:
      - ../oliveAppData:/home/olive/olive
    environment: *common-variables
    shm_size: '4gb'

GPU server excerpt:

services:
  ...
  runtime-gpu:
    build:
      context: .
      args:
        gpu_or_cpu: gpu
    user: 1000:1000
    command:
      - "--verbose"
      - "--debug"
      - "--workers=1"
    ports:
      - "6588:5588"
      - "6589:5589"
      - "6590:5590"
      - "6591:5591"
    volumes:
      - ../oliveAppDataGPU:/home/olive/olive
    environment:
      <<: *common-variables
      CUDA_VISIBLE_DEVICES: '0'
    shm_size: '4gb'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # count: 1
              # Can use 'count' or explicitly list device_ids
              # How does device_ids and CUDA_VISIBLE_DEVICES match up?
              # Do the indexes inside the container always start at 0?
              device_ids: ["0"]
              capabilities: [gpu]

Exposing GPU(s) to an OLIVE Server (CUDA_VISIBLE_DEVICES)

If your hardware contains multiple GPUs, or if your GPU does not identify as device 0 for any reason, you may need to modify the CUDA_VISIBLE_DEVICES environment variable used by the OLIVE GPU server and/or the device_ids argument in the resource reservation section of docker-compose.yml.

The former is controlled by: services: runtime-gpu: environment: CUDA_VISIBLE_DEVICES

And the latter by: services: runtime-gpu: deploy: resources: reservations: devices: device_ids

Both can be seen in the excerpt below:

...
services:
  ...
  runtime-gpu:
    ...
    environment:
      <<: *common-variables
      CUDA_VISIBLE_DEVICES: '0'
    ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # count: 1
              # Can use 'count' or explicitly list device_ids
              # How does device_ids and CUDA_VISIBLE_DEVICES match up?
              # Do the indexes inside the container always start at 0?
              device_ids: ["0"]
              capabilities: [gpu]

For more information on how and why to set CUDA_VISIBLE_DEVICES, see Nvidia's documentation. This variable controls which GPU(s) the server is able to see and use during processing. Exposing a GPU device here makes the GPU visible to OLIVE, but does not force anything to run on it; to configure a plugin/domain to use an available GPU, see the relevant section below.

For more information about device_ids, how it should be set, and how it may interact with other settings, see the Docker documentation on GPU support. That page also covers a possible, untested alternative of using count instead of device_ids.

Both of these arguments control the container's GPU access; as such, they should likely match, passing the same list of GPUs in the same order. But because each application and each customer's hardware setup and needs may vary, some testing and tweaking may be necessary on the client end to ensure the behavior is as desired.

Note that these parameters identify a GPU by its device ID as reported by nvidia-smi. The IDs can be verified by checking the top-left entry for each GPU in the nvidia-smi output; an example is shown below.

nvidia-smi Example Output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 20%   28C    P0    71W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 20%   28C    P0    73W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
| 19%   28C    P0    73W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 19%   28C    P0    73W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:84:00.0 Off |                  N/A |
| 18%   26C    P0    71W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:85:00.0 Off |                  N/A |
| 19%   27C    P0    71W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  Off  | 00000000:88:00.0 Off |                  N/A |
| 18%   28C    P0    71W / 250W |      0MiB / 12212MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  Off  | 00000000:89:00.0 Off |                  N/A |
| 17%   27C    P0    73W / 250W |      0MiB / 12212MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
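
If the full table is more detail than needed, nvidia-smi can also print a compact listing of device IDs and names; note that exact flag support may vary slightly across driver versions:

$ nvidia-smi --query-gpu=index,name,memory.total --format=csv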

Plugin Location (OLIVE_APP_DATA)

As described above, the default configuration separates CPU-configured plugins into olive5.5.0/oliveAppData/plugins/ and GPU-enabled/configured plugins into olive5.5.0/oliveAppDataGPU/plugins/. If you are maintaining a two-server setup similar to the delivered configuration, these locations do not need to change. To convert a GPU-capable plugin from CPU use to GPU use, it must first be moved or placed into the oliveAppDataGPU plugins directory.

Note that if you move a plugin that has enrollment capability, any enrolled models will not be transferred automatically. Re-enrollment must be performed once the plugin is moved and reconfigured; alternatively, advanced users can move the enrollments separately.

If the name or location of the oliveAppData directories needs to change for any reason, docker-compose.yml must be updated to reflect this; specifically, the volumes mounted for each server, which are mapped to /home/olive/olive/ within the containers. This excerpt shows where this is set for the GPU server:

...
services:
  ...
  runtime-gpu:
    ...
    volumes:
      - ../oliveAppDataGPU:/home/olive/olive
  ...
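
For example, moving a plugin from the CPU server's plugin directory to the GPU server's, starting from the olive5.5.0/ directory (substitute your actual plugin directory name for <plugin-name>):

$ mv oliveAppData/plugins/<plugin-name> oliveAppDataGPU/plugins/
$ cd oliveDocker/ && docker compose restart   # restart the servers so they pick up the change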

GPU Server Concurrent Processing Jobs (--workers) [Experimental]

If maximum batch throughput is important for your application, and you know that your use case's memory requirements won't overwhelm the available GPU memory, the number of server workers for the GPU server can be increased. This is experimental and may affect overall server performance (speed) and stability. The setting controls the number of jobs the server can process simultaneously; for GPU-enabled domains, it also multiplies the number of times the associated models are loaded into memory. As the percentage of used GPU memory increases, speed may decrease as less optimal pathways are used. Raising the worker count also increases the likelihood of ungracefully exhausting the available GPU memory, which may cause a system crash. The majority of OLIVE 5.5.0 testing was performed with --workers set to 1.

To increase the number of server workers, increase the value passed to the --workers flag in docker-compose.yml, under services: runtime-gpu: command:

...
services:
  ...
  runtime-gpu:
    ... 
    command:
      ...
      - "--workers=1"

Plugin Configuration for GPU Use

Enable GPU usage for a plugin's domain

To allow a plugin to run on an available GPU, it is crucial that the plugin:

  1. Is located in the plugins directory of a GPU enabled server (oliveAppDataGPU/plugins by default)
  2. Has each desired domain configured to choose a GPU device in its meta.conf file
  3. Only has domains configured to select GPU device(s) that are properly exposed to the server via CUDA_VISIBLE_DEVICES and Docker's device_ids

More information on items 1 and 3 can be found in the appropriate sections above. Be sure to move the plugin to oliveAppDataGPU/plugins/ when reconfiguring, and restart any running servers. Also double-check that the device(s) you configure each plugin domain to use are actually exposed to the server that will be using that plugin/domain.

Configuring a plugin to use a GPU is done at the domain level, by changing the device variable assignment within the domain's meta.conf file. By default, most plugins have this variable set to cpu, so the domain will run on CPU only, even within a GPU-enabled OLIVE server. To give a domain access to a GPU, change this device assignment from cpu to gpuN, where N is the device ID of the GPU to run on, as reported by nvidia-smi (see above). As already discussed, it is critical that the device assigned to the domain is exposed to the OLIVE server that will be running it.

Most hardware setups will have only a single GPU available, so enabling GPU use is usually as simple as replacing cpu with gpu0.

As an example, the meta.conf for the english-v1 domain of asr-end2end-v1.0.0 in its off-the-shelf format is shown below:

label: english-v1
description: Large vocabulary English wav2vec2 model for both 8K and 16K data
resample_rate: 8000
language: English
device: cpu

To instead configure this domain to run on the GPU with device ID 0, as it is configured when delivered with OLIVE 5.5.0, the meta.conf becomes:

label: english-v1
description: Large vocabulary English wav2vec2 model for both 8K and 16K data
resample_rate: 8000
language: English
device: gpu0
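
The change can be made with any text editor. As a sketch, assuming the domain's meta.conf sits at a path like <plugin>/domains/<domain>/meta.conf (the exact layout may differ in your delivery), a one-line sed accomplishes the same edit:

$ # the domain path below is illustrative; adjust to your plugin's actual layout
$ sed -i 's/^device: cpu$/device: gpu0/' oliveAppDataGPU/plugins/asr-end2end-v1.0.0/domains/english-v1/meta.conf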

Each domain must be configured separately. If one domain within a plugin is configured for GPU, the other domains do not automatically start using a GPU; by default they will still run on CPU, which for most plugins is significantly slower. By extension, it is not necessary for all domains of a plugin to run on the same GPU.

Having this device assignment at the domain level allows domains to be distributed across multiple GPUs. For example, if multiple GPUs are available on a system, lower-memory plugins like SAD, SID, and LID may share a single GPU, while heavier plugins like ASR or MT can have their language domains assigned across multiple GPUs. This spreads the memory load to minimize the chance of exhausting GPU memory, while avoiding the time lost to frequently loading and unloading models.

Building on the example above, if we wanted to run the russian-v1 domain of the same ASR plugin on GPU device 2, so that english-v1 and russian-v1 don't compete for GPU memory, the russian-v1 domain's meta.conf would look like this:

label: russian-v1
description: Large vocabulary Russian wav2vec2 model for both 8K and 16K data
resample_rate: 8000
language: Russian
device: gpu2

Note that in these examples, GPU device 0 and GPU device 2 must both be listed in CUDA_VISIBLE_DEVICES and device_ids as outlined above.

Example docker-compose.yml modification

...
services:
  ...
  runtime-gpu:
    ...
    environment:
      <<: *common-variables
      CUDA_VISIBLE_DEVICES: '0,2'
    ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # count: 1
              # Can use 'count' or explicitly list device_ids
              # How does device_ids and CUDA_VISIBLE_DEVICES match up?
              # Do the indexes inside the container always start at 0?
              device_ids: ['0', '2']
              capabilities: [gpu]

The exact ideal configuration may vary greatly depending on specific customer use case and mission needs.

Currently supported GPU plugins

These plugins currently support GPU operation, when configured as outlined above:

All of these plugins are also capable of running on CPU, though in some cases (notably asr-end2end-v1) at drastically reduced speed.

OLIVE GPU Restrictions and Configuration Notes

GPU Compute Mode: "Default" vs "Exclusive"

The OLIVE software currently assumes that any available GPUs are in "Default" mode. In testing, some configurations of the number of OLIVE server workers have been found to cause unexpected issues when GPUs run in "Exclusive" mode. If possible, configure the GPUs that OLIVE will use in "Default" mode; if this is not possible, ensure that the number of workers for the GPU-enabled server is set to 1. This is configured in the provided docker-compose.yml file.

To check the mode of your GPU, view the Compute Mode field in the nvidia-smi output. Refer to Nvidia's instructions for the Nvidia Control Panel or the usage instructions for nvidia-smi for more information on setting these modes.
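
For example, the current compute mode can be queried, and reset to "Default" if needed (mode 0; changing the mode requires root privileges):

$ nvidia-smi -q -d COMPUTE    # report each GPU's current compute mode
$ sudo nvidia-smi -i 0 -c 0   # set GPU 0 back to "Default" compute mode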

Workflow Restrictions

Plugins currently can't pass jobs between separate OLIVE servers. This means that if a workflow requires several plugins, each of those plugins must be visible to the OLIVE server that receives the workflow job request. If your current configuration separates any of these plugins, your plugins directories must be reconfigured so that all required plugins are colocated within the same server.

There are two possible workarounds. The first is to disable GPU capabilities in some plugins and move them to the CPU oliveAppData/ directory; this may make them run much more slowly, but allows full job parallelization. The alternative is to move CPU-only plugins into the GPU-enabled server, leaving them configured to run on CPU. This allows the GPU-enabled plugins to run at full speed, but limits parallel processing, since that server has a single worker thread.

Future OLIVE releases aim to address these first-release GPU limitations, reducing and eventually eliminating workarounds and restrictions like these.