OLIVE Java and Python Clients

Introduction

Each OLIVE delivery includes two OLIVE client utilities - one written in Java, one written in Python. Out of the box, these tools allow a user to jump right in and run OLIVE when the GUI is not desired. They can also serve as code examples for integrating with OLIVE. This page primarily covers using these clients for processing audio, rather than integrating with the OLIVE API. For more information on integration, the nitty-gritty details of the OLIVE Enterprise API, and code examples, refer to the integration-focused API documentation pages instead.

In usage and capabilities, these tools were meant to mirror the Legacy CLI Tools as closely as possible, and they share many input/output formats and assumptions with those tools. As this document is still under construction, referring to that older guide may help fill in useful information that may currently be missing from this page.

Note that unlike the Legacy CLI tools, which call plugin code directly, these client tools require a running OLIVE server. They are client utilities that queue and submit job requests to the OLIVE server, which then manages the plugins themselves and the actual audio processing. If you haven't already, please refer to the appropriate guide for setting up and starting an OLIVE server for your installation type.

Client Setup, Installation, Requirements

As a quick review, the contents of an OLIVE package typically look like this:

  • olive5.5.0/
    • api/
      • java/
      • python/
    • docs/
    • martini/ -or- docker/ -or- runtime/
    • OliveGUI/ - (Optional) The OLIVE Nightingale GUI (not included in all deliveries)
    • oliveAppData/

The clients this page describes are contained in the api/ directory above.

Java (OliveAnalyze)

The Java tools are the most full-featured with respect to tasking individual plugins. They are asynchronous, and better able to deal with large numbers of file submissions by parallelizing the submission of large lists of files. If the primary task is enrolling and scoring audio files with individual plugins, the recommended choice is the Java tools, what we call the OliveAnalyze suite.

The tools themselves do not need to be 'installed'. For convenience, their directory can be added to your $PATH environment variable, so that they can be called from anywhere:

 $ export PATH=$PATH:<path>/olive5.5.0/api/java/bin/
 $ OliveAnalyze -h

But they can also be left alone and called directly, as long as their full or relative path is present:

 # From inside olive5.5.0/api/java/bin:
 $ ./OliveAnalyze -h

 # From inside olive5.5.0/:
 $ ./api/java/bin/OliveAnalyze -h

 # From elsewhere:
 $ <path>/olive5.5.0/api/java/bin/OliveAnalyze -h

These tools depend on OpenJDK 11 or newer being installed. Refer to OpenJDK for more information on downloading and installing this for your operating system.
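
You can verify that a suitable Java version is on your PATH before running the tools:

 # Should report version 11 or newer
 $ java -version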

The full list of utilities in this suite is as follows:

  • OliveAnalyze
  • OliveAnalyzeText
  • OliveEnroll
  • OliveLearn (rarely used)

The most commonly used are OliveAnalyze for scoring requests and OliveEnroll for enrollment requests. Examples are provided for each of these below, and for more advanced users that need the other tools, each utility has its own help statement that can be accessed with the -h flag:

 $ OliveAnalyzeText -h

The arguments and formatting for each tool are very similar, so familiarity with the OliveAnalyze and OliveEnroll examples below should allow use of most of these tools.

Python (olivepyanalyze)

The Python client, what we call the olivepyanalyze suite, is not as fully-featured with respect to batch-processing of audio files. It performs synchronous requests to the OLIVE server, so it scores each provided audio file sequentially rather than submitting jobs in parallel. For this reason, the Java OliveAnalyze tools are recommended for batch processing of individual plugin tasks.

Only the Python client currently has workflow support, however, and the Python workflow client utility, olivepyworkflow, is not limited by the synchronous restriction of olivepyanalyze - so when operating with workflows it is the clear choice.

The Python client tools require Python 3.8 or newer - please refer to Python for downloading and installing it.
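
You can verify the interpreter version before installing:

 $ python3 --version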

Installing these tools has been simplified by providing them in the form of a Python wheel, which can be easily installed with pip:

 $ cd olive5.5.0/api/python
 $ ls
   olivepy-5.5.0-py3-none-any.whl
   olivepy-5.5.0.tar.gz
   requirements.txt
 $ python3 -m pip install -r requirements.txt olivepy-5.5.0-py3-none-any.whl

This will fetch and install (if necessary) the olivepy dependencies listed in requirements.txt, and install the olivepy tools.

The olivepy utilities closely mirror the Java utilities, with the addition of the workflow tools, and are as follows:

  • olivepyanalyze
  • olivepyenroll
  • olivepylearn (rarely used)
  • olivepyworkflow
  • olivepyworkflowenroll

The olivepyworkflow tools are the most important, and examples are provided below for both scoring with olivepyworkflow and enrollment with olivepyworkflowenroll. We also provide examples for olivepyanalyze and olivepyenroll that mirror the Java examples.

Scoring/Analysis Requests

Plugin Scoring Types

In general, the output format will depend on what type of 'scorer' the plugin being used is.

For a deeper dive into OLIVE scoring types, please refer to the appropriate section in the OLIVE Plugin Traits Guide, but a brief overview follows. The most common types of plugins in OLIVE are:

Global Scorer

Any plugin that reports a single score for a given model over the entire test audio file is a global scoring plugin. Every input test audio file will be assigned a single score for each enrolled target model, as measured by looking at the entire file at once.

Speaker and Language Identification are examples of global scorers.

OLIVE typically calls a global scoring plugin an "Identification" plugin, whereas a region scoring plugin that pinpoints the same class types is instead called a "Detection" plugin. For example, Speaker Identification versus Speaker Detection; the former assumes the entire audio contains a single speaker, while the latter makes no such assumption and attempts to localize any detected speakers of interest.

Global Scorer Output

In the case of global scorers like LID and SID, the output file, which by default is called output.txt, contains one or more lines, each listing the audio path, speaker/language ID (class id), and the score:

<audio_path> <class_id> <score>

For example, a Speaker Identification analysis run, with three enrolled speakers (Alex, Taylor, Blake) might return:

/data/sid/audio/file1.wav Alex -0.5348
/data/sid/audio/file1.wav Taylor 3.2122
/data/sid/audio/file1.wav Blake -5.5340
/data/sid/audio/file2.wav Alex 0.5333
/data/sid/audio/file2.wav Taylor -4.9444
/data/sid/audio/file2.wav Blake -2.6564
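
As a minimal sketch of consuming this output in Python (assuming the default output.txt name, the whitespace-delimited fields shown above, and class IDs without embedded spaces), the following picks the top-scoring class for each file:

best = {}  # audio_path -> (class_id, score)
with open("output.txt") as f:
    for line in f:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        audio_path, class_id, score = fields[0], fields[1], float(fields[2])
        if audio_path not in best or score > best[audio_path][1]:
            best[audio_path] = (class_id, score)

for audio_path, (class_id, score) in best.items():
    print(f"{audio_path}: best match {class_id} ({score:.4f})")

As noted below, whether that top score constitutes a confident detection depends on the individual plugin's score ranges.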

Note the actual meanings of the scores and available classes will vary from plugin-to-plugin. Please refer to individual plugin documentation for more guidance on what the scores mean and what ranges are acceptable.

Also note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze - but when performing a global-scoring task as part of an analysis with a workflow, these same informational pieces (audio path or object, class_id, score) are still provided, but packed into a JSON structure.

Region Scorer

Region scoring plugins consider each audio file in small pieces at a time. Scores are reported for enrolled target models along with the locations within the audio file where they are thought to occur. This allows OLIVE to pinpoint individual keywords or phrases, or to pick out one specific speaker in a recording where several people may be talking.

Automatic Speech Recognition (ASR), Language Detection (LDD), and Speaker Detection (SDD) are all region scorers.

Region Scorer Output

Region scoring plugins will generate a single output file, also called output.txt by default, just like global scorers. The file looks very similar to a global scorer's output, but each line includes a temporal component representing the start and end of the scored region. In practice, this looks like:

<audio_path> <region_start_timestamp> <region_end_timestamp> <class_id> <score>

For example, a language detection plugin might output something like this:

/data/mixed-language/testFile1.wav 2.170 9.570 Arabic 0.912
/data/mixed-language/testFile1.wav 10.390 15.930 French 0.693
/data/mixed-language/testFile1.wav 17.639 22.549 English 0.832
/data/mixed-language/testFile2.wav 0.142 35.223 Pashto 0.977
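
A similar sketch for region scorer output, under the same assumptions (default output.txt name, whitespace-delimited fields, class IDs without embedded spaces), groups the scored regions by file and prints them in time order:

from collections import defaultdict

regions = defaultdict(list)  # audio_path -> [(start, end, class_id, score)]
with open("output.txt") as f:
    for line in f:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        path, start, end, class_id, score = fields
        regions[path].append((float(start), float(end), class_id, float(score)))

for path, segs in sorted(regions.items()):
    for start, end, class_id, score in sorted(segs):
        print(f"{path} [{start:8.3f} - {end:8.3f}] {class_id}: {score}")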

Each test file can have multiple regions where scores are reported, depending on the individual plugin. The region boundary timestamps are in seconds. More specific examples can be found in the respective plugin-specific documentation pages. As with global scoring, note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze - but when performing a region-scoring task as part of an analysis with a workflow, these same informational pieces (audio path or object, region start and end timestamps, class_id, score) are still provided, but packed into a JSON structure.

Plugin Direct (Analysis)

Performing an analysis request with both tools is very similar, as the tools were designed to closely mirror each other so that familiarity with one would easily transfer to the other. The usage statements for each can be examined by invoking each with their -h or --help flag:

$ ./OliveAnalyze -h
usage: OliveAnalyze
    --align              Perform audio alignment analysis.  Must specify
                         the two files to compare using an input list file
                         via the --list argument
    --apply_update       Request a plugin update (if supported)
    --box                Perform bounding box analysis.  Must specify an
                         image or video input
    --channel <arg>      Process stereo files using channel NUMBER
    --class_ids <arg>    Use Class(s) from FILE for scoring.  Each line in
                         the file contains a single class, including any
                         white space
    --compare            Perform audio compare analysis.  Must specify the
                         two files to compare using an input list file via
                         the --list argument
    --decoded            Send audio file as decoded PCM16 samples instead
                         of sending as serialized buffer.  Input file must
                         be a wav file
    --domain <arg>       Use Domain NAME
    --enhance            Perform audio conversion (enhancement)
    --frame              Perform frame scoring analysis
    --global             Perform global scoring analysis
 -h                      Print this help message
 -i,--input <arg>        NAME of the input file (audio/video/image as
                         required by the plugin)
    --input_list <arg>   Use an input list FILE having multiple
                         filenames/regions or PEM formatted
 -l,--load               load a plugin now, must use --plugin and --domain
                         to specify the plugin/domain to preload
    --options <arg>      options from FILE
    --output <arg>       Write any output to DIR, default is ./
 -p,--port <arg>         Scenicserver port number. Default is 5588
    --path               Send audio file path instead of a buffer.  Server
                         and client must share a filesystem to use this
                         option
    --plugin <arg>       Use Plugin NAME
    --print <arg>        Print all available plugins and domains.
                         Optionally add 'verbose' as a print option to
                         print full plugin details including traits and
                         classes
 -r,--unload             unload a loaded plugin now, must use --plugin and
                         --domain to specify the plugin/domain to unload
    --region             Perform region scoring analysis
 -s,--server <arg>       Scenicserver hostname. Default is localhost
    --shutdown           Request a clean shutdown of the server
    --status             Print the current status of the server
 -t,--timeout <arg>      timeout (in seconds) when waiting for server
                         response.  Default is 10 seconds
    --threshold <arg>    Apply threshold NUMBER when scoring
    --update_status      Get the plugin's update status
 -v,--vec <arg>          PATH to a serialized AudioVector, for plugins
                         that support audio vectors in addition to wav
                         files
    --vector             Perform audio vectorization
    --workflow           Request a workflow
$ olivepyanalyze -h
usage: olivepyanalyze [-h] [-C CLIENT_ID] [-p PLUGIN] [-d DOMAIN] [-G] [-e] [-f] [-g] [-r] [-b] [-P PORT] [-s SERVER] [-t TIMEOUT] [-i INPUT]
                      [--input_list INPUT_LIST] [--text] [--options OPTIONS] [--class_ids CLASS_IDS] [--debug] [--path] [--print]

optional arguments:
  -h, --help            show this help message and exit
  -C CLIENT_ID, --client-id CLIENT_ID
                        Experimental: the client_id to use
  -p PLUGIN, --plugin PLUGIN
                        The plugin to use.
  -d DOMAIN, --domain DOMAIN
                        The domain to use
  -G, --guess           Experimental: guess the type of analysis to use based on the plugin/domain.
  -e, --enhance         Enhance the audio of a wave file, which must be passed in with the --input option.
  -f, --frame           Do frame based analysis of a wave file, which must be passed in with the --input option.
  -g, --global          Do global analysis of a wave file, which must be passed in with the --input option.
  -r, --region          Do region based analysis of a wave file, which must be passed in with the --input option.
  -b, --box             Do bounding box based analysis of an input file, which must be passed in with the --input option.
  -P PORT, --port PORT  The port to use.
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout to use
  -i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
                        the --text flag
  --input_list INPUT_LIST
                        A list of files to analyze. One file per line.
  --text                Indicates that input (or input list) is a literal text string to send in the analysis request.
  --options OPTIONS     Optional file containing plugin properties as name/value pairs.
  --class_ids CLASS_IDS
                        Optional file containing the class IDs to use for scoring, one per line.
  --debug               Debug mode
  --path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
  --print               Print all available plugins and domains

To perform a scoring request with these tools, you will need these essential pieces of information:

  • Plugin name (--plugin)
  • Domain name (--domain)
  • Scoring type to perform (--region for region-scoring, --global for global-scoring, others for less-common plugins)
  • Input audio file or list of input audio files (--input for a single file, --input_list for a list of files)

The flag for providing each piece of information is the same for both tools, as shown in the list above.

For more information on the difference between a plugin and a domain, refer to the Plugins Overview. For more information on the domains available for each plugin, refer to the documentation page for that specific plugin.

To see which plugins and domains you have installed and running in your specific OLIVE environment, refer to the server startup status message that appears when you start the server:

  • martini.sh start, then martini.sh log once the server is running for martini-based OLIVE packages (most common)
  • ./run.sh for non-martini docker OLIVE packages
  • oliveserver for native linux OLIVE packages

Or exercise the --print option for each tool to query the server and print the available plugins and domains:

$ ./OliveAnalyze --print
$ olivepyanalyze --print

Example output:

2022-06-14 12:12:25.786 INFO  com.sri.speech.olive.api.Server - Connected to localhost - request port: 5588 status_port: 5589
Found 8 plugin(s):
Plugin: sad-dnn-v7.0.2 (SAD,Speech) v7.0.2 has 2 domain(s):
    Domain: fast-multi-v1, Description: Trained with Telephony, PTT and Music data
    Domain: multi-v1, Description: Trained with Telephony, PTT and Music data
Plugin: asr-dynapy-v3.0.0 (ASR,Content) v3.0.0 has 9 domain(s):
    Domain: english-tdnnChain-tel-v1, Description: Large vocabulary English DNN model for 8K data
    Domain: farsi-tdnnChain-tel-v1, Description: Large vocabulary Farsi DNN model for 8K data
    Domain: french-tdnnChain-tel-v2, Description: Large vocabulary African French DNN Chain model for 8K data
    Domain: iraqiArabic-tdnnChain-tel-v1, Description: Large vocabulary Iraqi Arabic DNN Chain model for 8K data
    Domain: levantineArabic-tdnnChain-tel-v1, Description: Large vocabulary Levantine Arabic DNN Chain model for 8K data
    Domain: mandarin-tdnnChain-tel-v1, Description: Large vocabulary Mandarin DNN model for clean CTS 8K data
    Domain: pashto-tdnnChain-tel-v1, Description: Large vocabulary Pashto DNN Chain model for 8K data
    Domain: russian-tdnnChain-tel-v2, Description: Large vocabulary Russian DNN model for 8K data
    Domain: spanish-tdnnChain-tel-v1, Description: Large vocabulary Spanish DNN model for clean CTS 8K data
Plugin: sdd-diarizeEmbedSmolive-v1.0.0 (SDD,Speaker) v1.0.0 has 1 domain(s):
    Domain: telClosetalk-int8-v1, Description: Speaker Embeddings Framework
Plugin: tmt-neural-v1.0.0 (TMT,Content) v1.0.0 has 3 domain(s):
    Domain: cmn-eng-nmt-v1, Description: Mandarin Chinese to English NMT
    Domain: rus-eng-nmt-v1, Description: Russian to English NMT
    Domain: spa-eng-nmt-v3, Description: Spanish to English NMT
Plugin: ldd-embedplda-v1.0.1 (LDD,Language) v1.0.1 has 1 domain(s):
    Domain: multi-v1, Description: PNCC bottleneck domain suitable for mixed conditions (tel/mic/compression)
Plugin: sdd-diarizeEmbedSmolive-v1.0.2 (SDD,Speaker) v1.0.2 has 1 domain(s):
    Domain: telClosetalk-smart-v1, Description: Speaker Embeddings Framework
Plugin: sid-dplda-v2.0.2 (SID,Speaker) v2.0.2 has 1 domain(s):
    Domain: multi-v1, Description: Speaker Embeddings DPLDA
Plugin: lid-embedplda-v3.0.1 (LID,Language) v3.0.1 has 1 domain(s):
    Domain: multi-v1, Description: PNCC Bottleneck embeddings suitable for mixed conditions (tel/mic/compression)

Examples

To perform a global score analysis on a single file with the speaker identification plugin sid-dplda-v2.0.2, using the multi-v1 domain, the calls for each would look like this:

$ ./OliveAnalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav
$ olivepyanalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav

Region scoring instead, using the transcription plugin asr-dynapy-v3.0.0 via the English domain english-tdnnChain-tel-v1, would be performed on a list of audio files with:

$ ./OliveAnalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt
$ olivepyanalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt

The input list is simply a text file with a path to an audio file on each line. For example:

/data/mixed-language/testFile1.wav 
/data/mixed-language/testFile2.wav 
/data/mixed-language/testFile3.wav
/moreData/test-files/unknown1.wav
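
Such a list can be written by hand or generated with a quick shell command; for example, to collect every .wav file under a directory (paths here are hypothetical):

 $ find /data/mixed-language -name '*.wav' > ~/path/to/list-of-audio-files.txt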

Workflow (Analysis)

OLIVE Workflows provide a simple way of creating a sort of 'recipe' that specifies how to deal with the input data and one or more OLIVE plugins. Complex operations can be requested and performed with a single, simple call to the system, because the complexities and specific knowledge are encapsulated within the workflow itself rather than known and implemented by the user at run time. By off-loading this burden, operating with workflows is much simpler than calling the plugin(s) directly - typically all that is needed from the user to request an analysis from olivepyworkflow is the workflow itself and one or more input files. There are more options available to the olivepyworkflow client, as shown in the usage statement:

$ olivepyworkflow -h
usage: olivepyworkflow [-h] [--tasks] [--class_ids] [--print_actualized] [--print_workflow] [-s SERVER] [-P PORT] [-t TIMEOUT] [-i INPUT]
                       [--input_list INPUT_LIST] [--text] [--options OPTIONS] [--path] [--debug]
                       workflow

Perform OLIVE analysis using a Workflow Definition file

positional arguments:
  workflow              The workflow definition to use.

optional arguments:
  -h, --help            show this help message and exit
  --tasks               Print the workflow analysis tasks.
  --class_ids           Print the class IDs available for analysis in the specified workflow.
  --print_actualized    Print the actualized workflow info.
  --print_workflow      Print the workflow definition file info (before it is actualized, if requested)
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -P PORT, --port PORT  The port to use.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout (in seconds) to wait for a response from the server
  -i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
                        the --text flag
  --input_list INPUT_LIST
                        A list of files to analyze. One file per line.
  --text                Indicates that input (or input list) is a literal text string to send in the analysis request.
  --options OPTIONS     A JSON formatted string of workflow options such as [{"task":"SAD", "options":{"filter_length":99, "interpolate":1.0}}] or
                        {"filter_length":99, "interpolate":1.0, "name":"midge"}, where the former options are only applied to the SAD task, and the latter are
                        applied to all tasks
  --path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
  --debug               Debug mode

But their use is rarely necessary, and is reserved for advanced users or specific system testing.

Generically, calling olivepyworkflow will look like this:

$ olivepyworkflow --input ~/path/to/test-file1.wav <workflow>
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt <workflow> 

As an example of the power of workflows, the request below calls the SmartTranscription workflow - which performs Speech Activity Detection (region scoring), Speaker Diarization and Detection (region scoring), Language Detection (region scoring), and then Automatic Speech Recognition (region scoring) on any sections of each input file detected to contain a language that ASR currently supports - and returns all of the appropriate results in a JSON structure. Achieving the same functionality by calling plugins directly would require a minimum of 4 separate calls to OLIVE; significantly more if more than one language is detected in the file.

$ olivepyworkflow --input ~/path/to/test-file1.wav ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json

Output Format (Workflow)

Workflows are generally customer/user-specific and can be quite specialized - the output format and structure will depend heavily on the individual workflow itself and the tasks being performed. All of the information pieces that define each scoring type are still reported for each result, but the results are organized into a single JSON structure for the workflow call. This means that the output of a region scoring plugin within the workflow is still one or more sets of:

<region_start_timestamp> <region_end_timestamp> <class_id> <score>

But the data is arranged into the JSON structure and will be nested according to the structure of the workflow itself and how the audio is routed by the workflow. For more detailed information on the structure of this JSON message and the inner workings of workflows, please refer to the OLIVE Workflow API documentation. A brief, simplified summary to jump-start working with workflow output follows.

The main skeleton structure of the results output is shown below, followed by an actual example. One result is provided per input file, listing the job name(s), some metadata about the input audio and how it was processed, and then the returned results (if any) for each task, where a task generally corresponds to a plugin.

Workflow analysis results:
[
 {
  "job_name": <workflow job name>,
  "data": [
   {
    "data_id": <data ID, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <data label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ],
   <task 2 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ],
   <task 3 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ]
  }
 }
]
An actual example of this output, from a workflow performing SAD, SDD, and English ASR:

Workflow analysis results:
[
 {
  "job_name": "SAD, SDD, and ASR English Workflow",
  "data": [
   {
    "data_id": "z_eng_englishdemo.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 5.932625,
    "number_channels": 1,
    "label": "z_eng_englishdemo.wav",
    "id": "0b04c7497521d53a5d6939533a55c461795f9d685b1bd19fd9031fc6f3997a8f"
   }
  ],
  "tasks": {
   "SAD_REGIONS": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SAD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sad-dnn-v7.0.2",
     "domain": "multi-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 5.93,
        "class_id": "speech",
        "score": 0.0
       }
      ]
     }
    }
   ],
   "SDD": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SDD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sdd-diarizeEmbedSmolive-v1.0.2",
     "domain": "telClosetalk-smart-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.1,
        "end_t": 5.0,
        "class_id": "unknownspk00",
        "score": 1.4
       }
      ]
     }
    }
   ],
   "ASR": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "ASR",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "asr-dynapy-v3.0.0",
     "domain": "english-tdnnChain-tel-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.15,
        "end_t": 0.51,
        "class_id": "hello",
        "score": 100.0
       },
       {
        "start_t": 0.54,
        "end_t": 0.69,
        "class_id": "my",
        "score": 100.0
       },
       {
        "start_t": 0.69,
        "end_t": 0.87,
        "class_id": "name",
        "score": 99.0
       },
       {
        "start_t": 0.87,
        "end_t": 1.05,
        "class_id": "is",
        "score": 99.0
       },
       {
        "start_t": 1.05,
        "end_t": 1.35,
        "class_id": "evan",
        "score": 88.0
       },
       {
        "start_t": 1.35,
        "end_t": 1.47,
        "class_id": "this",
        "score": 99.0
       },
       {
        "start_t": 1.5,
        "end_t": 1.98,
        "class_id": "audio",
        "score": 95.0
       },
       {
        "start_t": 1.98,
        "end_t": 2.16,
        "class_id": "is",
        "score": 74.0
       },
       {
        "start_t": 2.16,
        "end_t": 2.31,
        "class_id": "for",
        "score": 99.0
       },
       {
        "start_t": 2.31,
        "end_t": 2.4,
        "class_id": "the",
        "score": 99.0
       },
       {
        "start_t": 2.4,
        "end_t": 2.91,
        "class_id": "purposes",
        "score": 100.0
       },
       {
        "start_t": 2.91,
        "end_t": 3.06,
        "class_id": "of",
        "score": 99.0
       },
       {
        "start_t": 3.12,
        "end_t": 3.81,
        "class_id": "demonstrating",
        "score": 100.0
       },
       {
        "start_t": 3.81,
        "end_t": 3.96,
        "class_id": "our",
        "score": 78.0
       },
       {
        "start_t": 4.05,
        "end_t": 4.44,
        "class_id": "language",
        "score": 100.0
       },
       {
        "start_t": 4.44,
        "end_t": 4.53,
        "class_id": "and",
        "score": 93.0
       },
       {
        "start_t": 4.53,
        "end_t": 4.89,
        "class_id": "speaker",
        "score": 100.0
       },
       {
        "start_t": 4.89,
        "end_t": 5.01,
        "class_id": "i.",
        "score": 99.0
       },
       {
        "start_t": 5.01,
        "end_t": 5.22,
        "class_id": "d.",
        "score": 99.0
       },
       {
        "start_t": 5.22,
        "end_t": 5.85,
        "class_id": "capabilities",
        "score": 99.0
       }
      ]
     }
    }
   ]
  }
 }
]
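
Because the results are plain JSON, they are straightforward to post-process with standard tools. As a minimal sketch, assuming the JSON array above has been saved to a hypothetical results.json file (with the leading 'Workflow analysis results:' line removed), the following reassembles each ASR transcript in time order:

import json

with open("results.json") as f:
    results = json.load(f)

for result in results:
    print("Job:", result["job_name"])
    for task_name, task_results in result["tasks"].items():
        for task in task_results:
            # Only region scoring tasks carry an "analysis" -> "region" list
            if task.get("message_type") != "REGION_SCORER_RESULT":
                continue
            regions = sorted(task["analysis"]["region"], key=lambda r: r["start_t"])
            if task["task_type"] == "ASR":
                # For ASR, each region's class_id is a recognized word
                print(f"  {task_name}: " + " ".join(r["class_id"] for r in regions))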

Each task output will typically be for a single plugin, and will contain the information provided by a Region Scorer, a Global Scorer, or a Text Transformer (in the case of Machine Translation), depending on how the workflow is using the plugin. The format of each result sub-part is:

<task name>: [
  {
    "task_trait": "GLOBAL_SCORER",
    "task_type": <task type, generally LID, SID, etc.>,
    "message_type": "GLOBAL_SCORER_RESULT",
    "plugin": <plugin>,
    "domain": <domain>,
    "analysis": {
      "score": [
      {
        "class_id": <class 1>,
        "score": <class 1 score>
      },
      {
        "class_id": <class 2>,
        "score": <class 2 score>
      },
      ...
      {
        "class_id": <class N>,
        "score": <class N score>
      }
     ]
    }
  }
]
<task name>: [
  {
    "task_trait": "REGION_SCORER",
    "task_type": <task type, typically ASR, SDD, SAD, etc.>,
    "message_type": "REGION_SCORER_RESULT",
    "plugin": <plugin>,
    "domain": <domain>,
    "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 detected class>,
        "score": <region 1 score>
      },
      {
        "start_t": <region 2 start time (s)>,
        "end_t": <region 2 end time (s)>,
        "class_id": <region 2 detected class>,
        "score": <region 2 score>
      },
      ...
      {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N detected class>,
        "score": <region N score>
       }
     ]
    }
  }
]
<task name>: [
  {
    "task_trait": "TEXT_TRANSFORMER",
    "task_type": <task type, typically MT>,
    "message_type": "TEXT_TRANSFORM_RESULT",
    "plugin": <plugin name>,
    "domain": <domain name>,
    "analysis": {
     "transformation": [
       {
        "class_id": "test_label",
        "transformed_text": <the translated/transformed text returned from the plugin>
       }
     ]
    }
  }
]
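
Since every task result reports its task_trait, downstream code can dispatch on that field instead of hard-coding task names. A minimal sketch of such a handler (a hypothetical summarize_task helper, reusing the results.json file from the earlier sketch):

import json

def summarize_task(task):
    """Return a one-line summary based on the task's reported trait."""
    trait = task["task_trait"]
    analysis = task["analysis"]
    if trait == "GLOBAL_SCORER":
        top = max(analysis["score"], key=lambda s: s["score"])
        return f"top class {top['class_id']} ({top['score']})"
    if trait == "REGION_SCORER":
        return f"{len(analysis['region'])} scored region(s)"
    if trait == "TEXT_TRANSFORMER":
        return analysis["transformation"][0]["transformed_text"]
    return f"unhandled trait {trait}"

with open("results.json") as f:
    results = json.load(f)

for result in results:
    for task_name, task_results in result["tasks"].items():
        for task in task_results:
            print(task_name, "->", summarize_task(task))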

Many workflows consist of a single job and bundle all plugin tasks into that job, as seen above. More complex workflows, or what OLIVE calls "Conditional Workflows", can pack multiple jobs into a single workflow. This happens when certain tasks in the workflow depend on other tasks - for example, when OLIVE needs to choose the appropriate Speech Recognition (ASR) language based on what language Language Identification (LID) or Language Detection (LDD) detects being spoken. In that case, the LID/LDD is separated into one job, and the ASR into another, which is triggered to run once the LID/LDD decision is known; the results from each job are then grouped accordingly in the results output. Below is a simplified output skeleton from a workflow that includes multiple jobs, followed by a real-life example output from the SmartTranscription conditional workflow, which has three jobs:

  1. The first performs Speech Activity Detection and Language Identification (LID):
    • Smart Translation SAD and LID Pre-processing
  2. The second uses the language decision from LID to choose the appropriate (if any) language and domain for Automatic Speech Recognition (ASR), and runs that:
    • Dynamic ASR
  3. The third takes the output transcript from ASR and the language decision from LID, chooses the appropriate (if any) language and domain for Text Machine Translation, and runs that:
    • Dynamic MT

As you can see below, these jobs are listed separately in the JSON for each result:

Workflow analysis results:
[
 {
  "job name": <job 1 name>,
  "data": [
   {
    "data_id": <data identifier, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <audio label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name (job 1)>: [
    {
      <task 1 results>
    }
   ],
   ...
   <task N name (job 1)>: [
    {
      <task N results>
    }
   ]
  }
 },
 {
  "job name": <job 2 name>,
  "data": [
   {
    "data_id": <data identifier, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <audio label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name (job 2)>: [
    {
      <task 1 results>
    }
   ],
   ...
   <task N name (job 2)>: [
    {
      <task N results>
    }
   ]
  }
 },
 ... <repeat if more jobs>
]
An actual example, from the SmartTranscription conditional workflow:

Workflow analysis results:
[
 {
  "job_name": "Smart Translation SAD and LID Pre-processing",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 8.0,
    "number_channels": 1,
    "label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
   }
  ],
  "tasks": {
   "SAD_REGIONS": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SAD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sad-dnn-v7.0.2",
     "domain": "multi-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 8.0,
        "class_id": "speech",
        "score": 0.0
       }
      ]
     }
    }
   ],
   "LID": [
    {
     "task_trait": "GLOBAL_SCORER",
     "task_type": "LID",
     "message_type": "GLOBAL_SCORER_RESULT",
     "plugin": "lid-embedplda-v3.0.1",
     "domain": "multi-v1",
     "analysis": {
      "score": [
       {
        "class_id": "Mandarin",
        "score": 3.5306692
       },
       {
        "class_id": "Korean",
        "score": -1.9072952
       },
       {
        "class_id": "Japanese",
        "score": -3.7805116
       },
       {
        "class_id": "Tagalog",
        "score": -7.4819508
       },
       {
        "class_id": "Vietnamese",
        "score": -8.094855
       },
       {
        "class_id": "Iraqi Arabic",
        "score": -10.63325
       },
       {
        "class_id": "Levantine Arabic",
        "score": -10.694491
       },
       {
        "class_id": "French",
        "score": -11.542379
       },
       {
        "class_id": "Pashto",
        "score": -12.11981
       },
       {
        "class_id": "English",
        "score": -12.323014
       },
       {
        "class_id": "Modern Standard Arabic",
        "score": -12.626052
       },
       {
        "class_id": "Spanish",
        "score": -13.469315
       },
       {
        "class_id": "Iranian Persian",
        "score": -13.763366
       },
       {
        "class_id": "Amharic",
        "score": -17.129797
       },
       {
        "class_id": "Portuguese",
        "score": -17.31257
       },
       {
        "class_id": "Russian",
        "score": -18.770994
       }
      ]
     }
    }
   ]
  }
 },
 {
  "job_name": "Dynamic ASR",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 8.0,
    "number_channels": 1,
    "label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
   }
  ],
  "tasks": {
   "ASR": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "ASR",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "asr-dynapy-v3.0.0",
     "domain": "mandarin-tdnnChain-tel-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 0.18,
        "class_id": "跟",
        "score": 31.0
       },
       {
        "start_t": 0.18,
        "end_t": 0.36,
        "class_id": "一个",
        "score": 83.0
       },
       {
        "start_t": 0.36,
        "end_t": 0.66,
        "class_id": "肯定",
        "score": 100.0
       },
       {
        "start_t": 0.66,
        "end_t": 0.81,
        "class_id": "是",
        "score": 83.0
       },
       {
        "start_t": 0.81,
        "end_t": 1.23,
        "class_id": "北京",
        "score": 95.0
       },
       {
        "start_t": 1.23,
        "end_t": 1.47,
        "class_id": "啊",
        "score": 96.0
       },
       {
        "start_t": 2.07,
        "end_t": 2.49,
        "class_id": "他俩",
        "score": 96.0
       },
       {
        "start_t": 2.7,
        "end_t": 3.09,
        "class_id": "上海",
        "score": 99.0
       },
       {
        "start_t": 3.09,
        "end_t": 3.21,
        "class_id": "的",
        "score": 99.0
       },
       {
        "start_t": 3.21,
        "end_t": 3.57,
        "class_id": "人口",
        "score": 99.0
       },
       {
        "start_t": 3.57,
        "end_t": 3.87,
        "class_id": "好像",
        "score": 73.0
       },
       {
        "start_t": 3.87,
        "end_t": 3.99,
        "class_id": "没",
        "score": 54.0
       },
       {
        "start_t": 3.99,
        "end_t": 4.32,
        "class_id": "北京",
        "score": 74.0
       },
       {
        "start_t": 4.32,
        "end_t": 4.68,
        "class_id": "多",
        "score": 99.0
       },
       {
        "start_t": 4.86,
        "end_t": 5.19,
        "class_id": "但是",
        "score": 100.0
       },
       {
        "start_t": 5.4,
        "end_t": 5.91,
        "class_id": "不知道",
        "score": 100.0
       },
       {
        "start_t": 6.06,
        "end_t": 6.48,
        "class_id": "@reject@",
        "score": 62.0
       },
       {
        "start_t": 6.69,
        "end_t": 7.05,
        "class_id": "其他",
        "score": 93.0
       },
       {
        "start_t": 7.05,
        "end_t": 7.2,
        "class_id": "的",
        "score": 97.0
       },
       {
        "start_t": 7.65,
        "end_t": 7.89,
        "class_id": "上",
        "score": 32.0
       },
       {
        "start_t": 7.89,
        "end_t": 7.95,
        "class_id": "啊",
        "score": 67.0
       }
      ]
     }
    }
   ]
  }
 },
 {
  "job_name": "Dynamic MT",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "WORKFlOW_TEXT_RESULT",
    "text": "跟 一个 肯定 是 北京 啊 他俩 上海 的 人口 好像 没 北京 多 但是 不知道 @reject@ 其他 的 上 啊"
   }
  ],
  "tasks": {
   "MT": [
    {
     "task_trait": "TEXT_TRANSFORMER",
     "task_type": "MT",
     "message_type": "TEXT_TRANSFORM_RESULT",
     "plugin": "tmt-neural-v1.0.0",
     "domain": "cmn-eng-nmt-v1",
     "analysis": {
      "transformation": [
       {
        "class_id": "test_label",
        "transformed_text": "with someone in beijing they don't seem to have a population in shanghai but we don't know what else to do"
       }
      ]
     }
    }
   ]
  }
 }
]

Enrollment Requests

Enrollments are a subset of classes that the user can create and/or modify. These are used for classes that cannot be known ahead of time and therefore can't be pre-loaded into the system, such as specific speakers or keywords of interest. To determine whether a plugin supports or requires enrollments, or to check what its default enrolled classes are (if any), refer to that plugin's details page, linked from the navigation or the Release Plugins page.

Enrollment list format

As with analysis, the Java and Python tools were designed to share as much of a common interface as possible, and as such they share an input list format for providing exemplars for enrollment. The audio enrollment list input file consists of one or more newline-separated lines, each containing a path to an audio file and a class or model ID - a speaker name, topic name, or query name for SID, TPD, and QBE respectively. A general example is given below, and more details and plugin-specific enrollment information are provided in the appropriate section of each plugin's documentation. Format:

<audio_path> <model_id>

Example enrollment list file (SID):

/data/speaker1/audiofile1.wav speaker1
/data/speaker1/audiofile2.wav speaker1
/data/speaker7/audiofile1.wav speaker7
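
If enrollment audio is organized on disk by speaker, a list in this format can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical layout of one subdirectory per speaker:

from pathlib import Path

root = Path("/data/enrollments")  # hypothetical: /data/enrollments/<speaker>/<file>.wav
with open("enrollment_input.txt", "w") as out:
    for wav in sorted(root.glob("*/*.wav")):
        # The parent directory name becomes the model/class ID
        out.write(f"{wav} {wav.parent.name}\n")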

Plugin Direct (Enrollment)

Performing an enrollment request is similar to performing an analysis request, and again the two tools closely mirror each other. The usage statements can be examined by invoking each tool with its -h or --help flag:

$ ./OliveEnroll -h
usage: OliveEnroll
    --channel <arg>      Process stereo files using channel NUMBER
    --classes            Print class names if also printing plugin/domain
                         names.  Must use with --print option.  Default is
                         to not print class IDs
    --decoded            Send audio file as a decoded PCM16 sample buffer
                         instead of a serialized buffer. The file must be
                         a WAV file
    --domain <arg>       Use Domain NAME
    --enroll <arg>       Enroll speaker NAME. If no name is specified,
                         then the pem or list option must specify an input file
    --export <arg>       Export speaker NAME to an EnrollmentModel
                         (enrollment.tar.gz)
 -h                      Print this help message
 -i,--input <arg>        NAME of the input file (input varies by plugin:
                         audio, image, or video)
    --import <arg>       Import speaker from EnrollmentModel FILE
    --input_list <arg>   Batch enroll using this input list FILE having
                         multiple filenames/class IDs or PEM formatted
                         file
    --nobatch            Disable batch enrollment when using pem or list
                         input files, so that files are processed serially
    --options <arg>      Enrollment options from FILE
    --output <arg>       Write any output to DIR, default is ./
 -p,--port <arg>         Scenicserver port number. Default is 5588
    --path               Send the path to the audio file instead of a
                         (serialized) buffer.  The server must have access
                         to this path.
    --plugin <arg>       Use Plugin NAME
    --print              Print all plugins and domains that support
                         enrollment and/or class import and export
    --remove <arg>       Remove audio enrollment for NAME
$ olivepyenroll -h
usage: olivepyenroll [-h] [-C CLIENT_ID] [-D] [-p PLUGIN] [-d DOMAIN] [-e ENROLL] [-u UNENROLL] [-s SERVER] [-P PORT] [-t TIMEOUT] [-i INPUT] [--input_list INPUT_LIST] [--path]

optional arguments:
  -h, --help            show this help message and exit
  -C CLIENT_ID, --client-id CLIENT_ID
                        Experimental: the client_id to use
  -D, --debug           Debug mode
  -p PLUGIN, --plugin PLUGIN
                        The plugin to use.
  -d DOMAIN, --domain DOMAIN
                        The domain to use
  -e ENROLL, --enroll ENROLL
                        Enroll with this name.
  -u UNENROLL, --unenroll UNENROLL
                        Unenroll with this name.
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -P PORT, --port PORT  The port to use.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout to use
  -i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
                        the --text flag
  --input_list INPUT_LIST
                        A list of files to analyze. One file per line.
  --path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option

To perform an enrollment request with these tools, you will need these essential pieces of information:

  • Plugin name (--plugin)
  • Domain name (--domain)
  • One of:
    • A properly formatted enrollment list (--input_list), if providing multiple files at once (see below for formatting)
    • An input audio file (--input) AND the name of the class you wish to enroll (--enroll for OliveEnroll, -e or --enroll for olivepyenroll)

Generically, a single-file enrollment looks like this:

$ ./OliveEnroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>
$ olivepyenroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>

A more specific example:

$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"

Or if providing the enrollment list format shown above, the call is even simpler. Generically:

$ ./OliveEnroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>
$ olivepyenroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>

Specific:

$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt

Where the enrollment_input.txt file might look like:

/some/data/somewhere/inputFile1.wav Logan
/some/other/data/somewhere/else/LoganPodcast.wav Logan
/yet/another/data/directory/charlie-speaks.wav Charlie

Workflow (Enrollment)

In the most basic case, enrollment using a workflow is just as simple as scoring with a workflow. This is because most workflows will only have a single enrollment-capable job; a job is a subset of the tasks a workflow is performing, typically linked to a single plugin. In the rare case that you're using a workflow with multiple supported enrollment jobs, you will need to specify which job to enroll to. See the Advanced Workflow Enrollment section below.

Workflow enrollment is performed by using the olivepyworkflowenroll utility, whose help/usage statement is:

$ olivepyworkflowenroll -h
usage: olivepyworkflowenroll [-h] [--print_jobs] [--job JOB] [--enroll ENROLL] [--unenroll UNENROLL] [-i INPUT] [--input_list INPUT_LIST] [--path]
                             [-s SERVER] [-P PORT] [-t TIMEOUT]
                             workflow

Perform OLIVE enrollment using a Workflow Definition file

positional arguments:
  workflow              The workflow definition to use.

optional arguments:
  -h, --help            show this help message and exit
  --print_jobs          Print the supported workflow enrollment jobs.
  --job JOB             Enroll/Unenroll a class ID for a job(s) in the specified workflow. If not specified, enroll or unenroll for ALL enrollment/unenrollment jobs
  --enroll ENROLL       Enroll using this (class) name. Should be used with the --job argument to specify a target job to enroll with (if there is more than one enrollment job)
  --unenroll UNENROLL   Unenroll using this (class) name. Should be used with the --job argument to specify a job to unenroll from (if there is more than one unenrollment job)
  -i INPUT, --input INPUT
                        The data input to enroll. Either a pathname to an audio/image/video file or a string for text input
  --input_list INPUT_LIST
                        A list of files to enroll. One file per line plus the class id to enroll.
  --path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -P PORT, --port PORT  The port to use.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout (in seconds) to wait for a response from the server

If there is only one supported enrollment job in the workflow, using this utility for enrollment is very similar to using the enrollment utilities above, except that a workflow is provided instead of a plugin and domain combination. As with the other enrollment utilities, olivepyworkflowenroll supports both single-file enrollment and batch enrollment using an enrollment-formatted text file.

Generically, this looks like:

$ olivepyworkflowenroll --input_list <path to enrollment file> <workflow>
$ olivepyworkflowenroll --input <path to audio file> --enroll <class name> <workflow>

And a specific example of each:

$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json

Important: Note that in OLIVE, when you enroll a class, you are enrolling to a plugin and domain, and enrollments are shared server-wide. Even when you enroll using a workflow and the olivepyworkflowenroll utility, enrollments are associated with the specific plugin/domain that the workflow is using under the hood. Any enrollments made via a workflow will be available to anyone else using that server instance, and will also be available to anyone interacting with the individual plugin - whether directly or via a workflow.

As a more concrete example, the "SmartTranscription" workflow that is sometimes provided with OLIVE - which performs Speech Activity Detection, Speaker Detection, Language Detection, and Speech Recognition on supported languages - has a single plugin that supports enrollments: Speaker Detection. As a result, the workflow is set up with a single enrollment job, allowing workflow users to enroll new speakers to be detected by this plugin. When enrollment is performed with this workflow, the newly created speaker model is created by and for the Speaker Detection plugin itself, and goes into the global OLIVE enrollment space. If a file is then analyzed by calling this Speaker Detection plugin directly, the new enrollment will be part of the pool of target speakers the plugin searches for. More information on this concept of "Workflow Enrollment Jobs" is provided in the next section.

Advanced Workflow Enrollment - Jobs

It's rare, but possible, for a workflow to bundle multiple enrollment-capable plugins into one. One example could be a workflow combining Speaker Detection with Query-by-Example Keyword Spotting, both of which rely on user enrollments to define their target classes. When this happens, if the user is to maintain the ability to enroll separate classes into each enrollable plugin, the workflow needs to expose these different enrollment tasks as separate jobs in the workflow enrollment capabilities.

If this is necessary, the workflow will come from SRI configured appropriately - the user need only be concerned with how to specify which job to enroll with.

To query which enrollment jobs are available in a workflow, use the olivepyworkflowenroll tool with the --print_jobs flag:

$ olivepyworkflowenroll --print_jobs <workflow>

Investigating the "SmartTranscription" workflow we briefly mentioned above:

$ olivepyworkflowenroll --print_jobs SmartTranscriptionFull.workflow.json
enrolling 0 files
Enrollment jobs '['SDD Enrollment']'
Un-Enrollment jobs '['SDD Unenrollment']'

We see that there is only a single enrollment job available: SDD Enrollment. If there were others, they would be listed in this output. Now that the desired job name is known, enrolling with a specific job is done by supplying that job name to the --job flag; in this case:

$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" --job "SDD Enrollment" ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt --job "SDD Enrollment" ~/olive5.5.0/oliveAppData/workflows/SmartTranscription.workflow.json