OLIVE Java and Python Clients

Introduction

Each OLIVE delivery includes two OLIVE client utilities - one written in Java, one written in Python. Out of the box, these tools allow a user to jump right in with running OLIVE if the GUI is not desired. These can also serve as code examples for integrating with OLIVE. This page primarily covers using these clients for processing audio, rather than integrating with the OLIVE API. For more information on integration, the nitty-gritty details of the OLIVE Enterprise API, and code examples, refer to these integration-focused pages instead:

In terms of usage and capabilities, these tools were meant to mirror the Legacy CLI Tools as closely as possible, and they share many input/output formats and assumptions with those tools. As this document is still under construction, referring to that older guide may help fill in useful information that is currently missing from this page.

Note that unlike the Legacy CLI tools, which call plugin code directly, these client tools require a running OLIVE server. They are client utilities that queue and submit job requests to the OLIVE server, which then manages the plugins themselves and performs the actual audio processing. If you haven't already, please refer to the appropriate guide for setting up and starting an OLIVE server for your installation type:

Client Setup, Installation, Requirements

As a quick review, the contents of an OLIVE package typically look like this:

  • olive5.7.0/
    • api/
      • java/
      • python/
    • docs/
    • martini/ -or- docker/ -or- runtime/
    • OliveGUI/ - (Optional) The OLIVE Nightingale GUI (not included in all deliveries)
    • oliveAppData/

The clients this page describes are contained in the api/ directory above.

Java (OliveAnalyze)

The Java tools are the most full-featured with respect to tasking individual plugins. They are asynchronous and better able to deal with large numbers of file submissions, since they parallelize the submission of large lists of files. If the primary task is enrolling and scoring audio files with individual plugins, we recommend the Java tools, which we call the OliveAnalyze suite.

The tools themselves do not need to be 'installed'. For convenience, their directory can be added to your $PATH environment variable, so that they can be called from anywhere:

 $ export PATH=$PATH:<path>/olive5.7.0/api/java/bin/
 $ OliveAnalyze -h

But they can also be left alone and called directly, as long as their full or relative path is present:

 # From inside olive5.7.0/api/java/bin:
 $ ./OliveAnalyze -h

 # From inside olive5.7.0/:
 $ ./api/java/bin/OliveAnalyze -h

 # From elsewhere:
 $ <path>/olive5.7.0/api/java/bin/OliveAnalyze -h

These tools depend on OpenJDK 11 or newer being installed. Refer to OpenJDK for more information on downloading and installing this for your operating system.

The full list of utilities in this suite is as follows:

  • OliveAnalyze
  • OliveAnalyzeText
  • OliveEnroll
  • OliveLearn (rarely used)
  • OliveWorkflow

The most commonly used are OliveAnalyze for scoring requests and OliveEnroll for enrollment requests. Examples are provided for each of these below; for more advanced users who need the other tools, each utility has its own help statement that can be accessed with the -h flag:

 $ OliveAnalyzeText -h

The arguments and formatting for each tool are very similar, so familiarity with the OliveAnalyze and OliveEnroll examples below should allow use of most of these tools.

Python (olivepyanalyze)

The Python client, what we call the olivepyanalyze suite, is not as fully-featured with respect to batch-processing of audio files. It performs synchronous requests to the OLIVE server, and so it will sequentially score each provided audio file, rather than submitting jobs in parallel. For this reason, the Java OliveAnalyze tools are recommended for batch processing of individual plugin tasks.

The Python client tools require Python 3.8 or newer; please refer to Python for downloading and installing it.

Installing these tools has been simplified by providing them in the form of a Python wheel, which can be easily installed with pip.

Installing directly into your Python environment:

 $ cd olive5.7.0/api/python
 $ ls
   olivepy-5.7.0-py3-none-any.whl
   olivepy-5.7.0.tar.gz
 $ python3 -m pip install --upgrade pip setuptools wheel
 $ python3 -m pip install olivepy-5.7.0-py3-none-any.whl

Or, installing into a dedicated Python virtual environment:

 $ cd olive5.7.0/api/python
 $ ls
   olivepy-5.7.0-py3-none-any.whl
   olivepy-5.7.0.tar.gz
 $ python3 -m venv olivepy-virtualenvironment
 $ source olivepy-virtualenvironment/bin/activate
 (olivepy-virtualenvironment) $ python3 -m pip install --upgrade pip setuptools wheel
 (olivepy-virtualenvironment) $ python3 -m pip install olivepy-5.7.0-py3-none-any.whl

This will fetch and install (if necessary) the olivepy dependencies, and install the olivepy tools.
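
After installation, the olivepy command-line utilities should be available on your PATH (from within the virtual environment, if you used one). A quick way to verify the installation is to print one of their usage statements:

$ olivepyanalyze -h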

The olivepy utilities closely mirror the Java utilities, with the addition of a workflow enrollment tool, and are as follows:

  • olivepyanalyze
  • olivepyenroll
  • olivepylearn (rarely used)
  • olivepyworkflow
  • olivepyworkflowenroll

The olivepyworkflow tools are the most important, and examples are provided below for both scoring with olivepyworkflow and enrollment with olivepyworkflowenroll. We also provide examples for olivepyanalyze and olivepyenroll that mirror the Java examples.

Scoring/Analysis Requests

Background: Plugin Scoring Types

In general, the output format will depend on what type of 'scorer' the plugin being used is.

For a deeper dive into OLIVE scoring types, please refer to the appropriate section in the OLIVE Plugin Traits Guide, but a brief overview follows. The most common types of plugins in OLIVE are:

Global Scorer

Any plugin that reports a single score for a given model over the entire test audio file is a global scoring plugin. Every input test audio file will be assigned a single score for each enrolled target model, as measured by looking at the entire file at once.

Speaker and Language Identification are examples of global scorers.

OLIVE typically calls a global scoring plugin an "Identification" plugin, whereas a region scoring plugin used to pinpoint the same class types would instead be called a "Detection" plugin. For example, Speaker Identification versus Speaker Detection: the former assumes the entire audio contains a single speaker, while the latter makes no such assumption and attempts to localize any detected speakers of interest.

Global Scorer Output

In the case of global scorers like LID and SID, the output file, which by default is called output.txt, contains one or more lines containing the audio path, speaker/language ID (class id), and the score:

<audio_path> <class_id> <score>

For example, a Speaker Identification analysis run, with three enrolled speakers (Alex, Taylor, Blake) might return:

/data/sid/audio/file1.wav Alex -0.5348
/data/sid/audio/file1.wav Taylor 3.2122
/data/sid/audio/file1.wav Blake -5.5340
/data/sid/audio/file2.wav Alex 0.5333
/data/sid/audio/file2.wav Taylor -4.9444
/data/sid/audio/file2.wav Blake -2.6564

Note that the actual meanings of the scores and the available classes will vary from plugin to plugin. Please refer to the individual plugin documentation for more guidance on what the scores mean and what ranges are acceptable.

Also note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze. When performing a global-scoring task as part of analysis with a workflow, these same pieces of information (audio path or object, class_id, score) are still provided, but packed into a JSON structure.
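
For scripting against this plain-text output, the columns can simply be split on whitespace. Below is a minimal Python sketch (not part of the OLIVE clients) that reads a global scorer output.txt and reports the top-scoring class per file; it assumes the default output.txt name, the three whitespace-separated columns shown above, and class IDs without embedded spaces.

from collections import defaultdict

# Minimal sketch: parse a global scorer output.txt into (audio_path, class_id, score)
# tuples. Assumes class IDs contain no embedded whitespace; file paths containing
# spaces are still handled because each line is split from the right.
results = []
with open("output.txt") as f:
    for line in f:
        if not line.strip():
            continue
        audio_path, class_id, score = line.rsplit(maxsplit=2)
        results.append((audio_path, class_id, float(score)))

# Report the best-scoring class for each file.
best = defaultdict(lambda: (None, float("-inf")))
for path, class_id, score in results:
    if score > best[path][1]:
        best[path] = (class_id, score)
for path, (class_id, score) in best.items():
    print(f"{path}: {class_id} ({score:.4f})")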

Region scorer

Region scoring plugins are capable of considering each audio file in small pieces at a time. Scores are reported for enrolled target models along with the location within that audio file that they are thought to occur. This allows OLIVE to pinpoint individual keywords or phrases or pick out one specific speaker in a recording where several people may be talking.

Automatic Speech Recognition (ASR), Language Detection (LDD), and Speaker Detection (SDD) are all region scorers.

Region Scorer Output

Region scoring plugins generate a single output file, which is also called output.txt by default, just like global scorers. The file looks very similar to a global scorer's output, but adds a temporal component to each line that represents the start and end of each scored region. In practice, this looks like:

<audio_path> <region_start_timestamp> <region_end_timestamp> <class_id> <score>

For example, a language detection plugin might output something like this:

/data/mixed-language/testFile1.wav 2.170 9.570 Arabic 0.912
/data/mixed-language/testFile1.wav 10.390 15.930 French 0.693
/data/mixed-language/testFile1.wav 17.639 22.549 English 0.832
/data/mixed-language/testFile2.wav 0.142 35.223 Pashto 0.977

Each test file can have multiple regions where scores are reported, depending on the individual plugin. The region boundary timestamps are in seconds. More specific examples can be found in the respective plugin-specific documentation pages. As with global scoring, note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze. When performing a region-scoring task as part of analysis with a workflow, these same pieces of information (audio path or object, region start and end timestamps, class_id, score) are still provided, but packed into a JSON structure.
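
The same approach works for region scorer output, with two extra columns for the region boundaries. Below is a minimal Python sketch (not part of the OLIVE clients) that groups a region scorer output.txt by file; it assumes the five whitespace-separated columns shown above and class IDs without embedded spaces.

from collections import defaultdict

# Minimal sketch: parse a region scorer output.txt and group the scored regions by file.
# Columns are assumed to be: path, start (s), end (s), class_id, score.
regions = defaultdict(list)
with open("output.txt") as f:
    for line in f:
        if not line.strip():
            continue
        audio_path, start, end, class_id, score = line.rsplit(maxsplit=4)
        regions[audio_path].append((float(start), float(end), class_id, float(score)))

for path, segments in regions.items():
    scored = sum(end - start for start, end, _, _ in segments)
    print(f"{path}: {len(segments)} region(s), {scored:.2f} seconds scored")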

Plugin Direct (Analysis)

Performing an analysis request with both tools is very similar, as the tools were designed to closely mirror each other so that familiarity with one would easily transfer to the other. The usage statements for each can be examined by invoking each with their -h or --help flag:

$ ./OliveAnalyze -h
usage: OliveAnalyze
    --align                Perform audio alignment analysis.  Must specify
                        the two files to compare using an input list
                        file via the--list argument
    --apply_update         Request the plugin is updated (if supported)
    --box                  Perform bounding box  analysis.  Must specify
                        an image or video input
    --cabundlepass <arg>   Specifies the certificate authority passphrase
                        of the certificate authority.
    --cabundlepath <arg>   Specifies the path of the certificate authority
    --certpass <arg>       Specifies the certificate passphrase to unlock
                        the encrypted certificate key
    --certpath <arg>       Specifies the path of the certificate
    --channel <arg>        Process stereo files using channel NUMBER
    --class_ids <arg>      Use Class(s) from FILE for scoring.  Each line
                        in the file contains a single class, including
                        any white space
    --compare              Perform audio compare analysis.  Must specify
                        the two files to compare using an input list
                        file via the--list argument
    --decoded              Send audio file as decoded PCM16 samples
                        instead of sending as serialized buffer.  Input
                        file must be a wav file
    --domain <arg>         Use Domain NAME
    --enhance              Perform audio conversion (enhancement)
    --frame                Perform frame scoring analysis
    --global               Perform global scoring analysis
-h                        Print this help message
-i,--input <arg>          NAME of the input file (audio/video/image as
                        required by the plugin
    --input_list <arg>     Use an input list FILE having multiple
                        filenames/regions or PEM formatted
-l,--load                 load a plugin now, must use --plugin and
                        --domain to specify the plugin/domain to
                        preload
    --options <arg>        options from FILE
    --output <arg>         Write any output to DIR, default is ./
-p,--port <arg>           Scenicserver port number. Defauls is 5588
    --path                 Send audio file path instead of a buffer.
                        Server and client must share a filesystem to
                        use this option
    --plugin <arg>         Use Plugin NAME
    --print <arg>          Print all available plugins and domains.
                        Optionally add 'verbose' as a print option to
                        print full plugin details including traits and
                        classes
-r,--unload               unload a loaded plugin now, must use --plugin
                        and --domain to specify the plugin/domain to
                        unload
    --region               Perform region scoring analysis
-s,--server <arg>         Scenicserver hostname. Default is localhost
    --secure               Indicates a secure connection should be made.
                        Requires --certpath, --cabundlepath,
                        --certpass, and --cabundlepass to be set.
    --shutdown             Request a clean shutdown of the server
    --status               Print the current status of the server
-t,--timeout <arg>        timeout (in seconds) when waiting for server
                        response.  Default is 10 seconds
    --threshold <arg>      Apply threshold NUMBER when scoring
    --update_status        Get the plugin's update status
    --upload_files         Must be specified with --path argument. This
                        uploads the files to the server so the client
                        and server do not need to share a filesystem.
                        This can also be used to bypass the 2 GB
                        request size limitation.
-v,--vec <arg>            PATH to a serialized AudioVector, for plugins
                        that support audio vectors in addition to wav
                        files
    --vector               Perform audio vectorization
$ olivepyanalyze -h
usage: olivepyanalyze [-h] [-C CLIENT_ID] [-p PLUGIN] [-d DOMAIN] [-G] [-e] [-f] [-g] [-r] [-b] [-P PORT] [--upload_port UPLOAD_PORT] [-s SERVER] [-t TIMEOUT] [-i INPUT]
                    [--input_list INPUT_LIST] [--text] [--options OPTIONS] [--class_ids CLASS_IDS] [--debug] [--concurrent_requests CONCURRENT_REQUESTS] [--path] [--upload_files]
                    [--secure] [--certpath CERTPATH] [--keypath KEYPATH] [--keypass KEYPASS] [--cabundlepath CABUNDLEPATH] [--print]

options:
-h, --help            show this help message and exit
-C CLIENT_ID, --client-id CLIENT_ID
                        Experimental: the client_id to use
-p PLUGIN, --plugin PLUGIN
                        The plugin to use.
-d DOMAIN, --domain DOMAIN
                        The domain to use
-G, --guess           Experimental: guess the type of analysis to use based on the plugin/domain.
-e, --enhance         Enhance the audio of a wave file, which must be passed in with the --wav option.
-f, --frame           Do frame based analysis of a wave file, which must be passed in with the --wav option.
-g, --global          Do global analysis of a wave file, which must be passed in with the --wav option.
-r, --region          Do region based analysis of a wave file, which must be passed in with the --wav option.
-b, --box             Do bounding box based analysis of an input file, which must be passed in with the --wav option.
-P PORT, --port PORT  The port to use.
--upload_port UPLOAD_PORT
                        The upload port to use when specifying --upload_files.
-s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
-t TIMEOUT, --timeout TIMEOUT
                        The timeout to use
-i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify the --text flag
--input_list INPUT_LIST
                        A list of files to analyze. One file per line.
--text                Indicates that input (or input list) is a literal text string to send in the analysis request.
--options OPTIONS     Optional file containing plugin properties ans name/value pairs.
--class_ids CLASS_IDS
                        Optional file containing plugin properties ans name/value pairs.
--debug               Debug mode
--concurrent_requests CONCURRENT_REQUESTS
                        # of concurrent requests to send the server
--path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option unless --upload_files is also used.
--upload_files        Must be specified with --path argument. This uploads the files to the server so the client and server do not need to share a filesystem. This can also be used to
                        bypass the 2 GB request size limitation.
--secure              Indicates a secure connection should be made. Requires --certpath, --keypass, --keypath, and --cabundlepath to be set.
--certpath CERTPATH   Specifies the path of the certificate
--keypath KEYPATH     Specifies the path of the certificate key
--keypass KEYPASS     Specifies the certificate passphrase to unlock the encrypted certificate key
--cabundlepath CABUNDLEPATH
                        Specifies the path of the certificate authority
--print               Print all available plugins and domains

To perform a scoring request with these tools, you will need these essential pieces of information:

  • Plugin name (--plugin)
  • Domain name (--domain)
  • Scoring type to perform (--region for region-scoring, --global for global-scoring, others for less-common plugins)
  • Input audio file or list of input audio files (--input for a single file, --input_list for a list of files)

The flag for providing each piece of information is the same for both tools, as shown in the list above.

For more information on the difference between a plugin and a domain, refer to the Plugins Overview. For more information on the domains available for each plugin, refer to the documentation page for that specific plugin.

To see which plugins and domains are installed and running in your specific OLIVE environment, refer to the server startup status message that appears when you start the server:

  • martini.sh start, then martini.sh log once the server is running for martini-based OLIVE packages (most common)
  • ./run.sh for non-martini docker OLIVE packages
  • oliveserver for native linux OLIVE packages

Or exercise the --print option for each tool to query the server and print the available plugins and domains:

$ ./OliveAnalyze --print
$ olivepyanalyze --print

Example output:

2022-06-14 12:12:25.786 INFO  com.sri.speech.olive.api.Server - Connected to localhost - request port: 5588 status_port: 5589
Found 8 plugin(s):
Plugin: sad-dnn-v7.0.2 (SAD,Speech) v7.0.2 has 2 domain(s):
    Domain: fast-multi-v1, Description: Trained with Telephony, PTT and Music data
    Domain: multi-v1, Description: Trained with Telephony, PTT and Music data
Plugin: asr-dynapy-v3.0.0 (ASR,Content) v3.0.0 has 9 domain(s):
    Domain: english-tdnnChain-tel-v1, Description: Large vocabulary English DNN model for 8K data
    Domain: farsi-tdnnChain-tel-v1, Description: Large vocabulary Farsi DNN model for 8K data
    Domain: french-tdnnChain-tel-v2, Description: Large vocabulary African French DNN Chain model for 8K data
    Domain: iraqiArabic-tdnnChain-tel-v1, Description: Large vocabulary Iraqi Arabic DNN Chain model for 8K data
    Domain: levantineArabic-tdnnChain-tel-v1, Description: Large vocabulary Levantine Arabic DNN Chain model for 8K data
    Domain: mandarin-tdnnChain-tel-v1, Description: Large vocabulary Mandarin DNN model for clean CTS 8K data
    Domain: pashto-tdnnChain-tel-v1, Description: Large vocabulary Pashto DNN Chain model for 8K data
    Domain: russian-tdnnChain-tel-v2, Description: Large vocabulary Russian DNN model for 8K data
    Domain: spanish-tdnnChain-tel-v1, Description: Large vocabulary Spanish DNN model for clean CTS 8K data
Plugin: sdd-diarizeEmbedSmolive-v1.0.0 (SDD,Speaker) v1.0.0 has 1 domain(s):
    Domain: telClosetalk-int8-v1, Description: Speaker Embeddings Framework
Plugin: tmt-neural-v1.0.0 (TMT,Content) v1.0.0 has 3 domain(s):
    Domain: cmn-eng-nmt-v1, Description: Mandarin Chinese to English NMT
    Domain: rus-eng-nmt-v1, Description: Russian to English NMT
    Domain: spa-eng-nmt-v3, Description: Spanish to English NMT
Plugin: ldd-embedplda-v1.0.1 (LDD,Language) v1.0.1 has 1 domain(s):
    Domain: multi-v1, Description: PNCC bottleneck domain suitable for mixed conditions (tel/mic/compression)
Plugin: sdd-diarizeEmbedSmolive-v1.0.2 (SDD,Speaker) v1.0.2 has 1 domain(s):
    Domain: telClosetalk-smart-v1, Description: Speaker Embeddings Framework
Plugin: sid-dplda-v2.0.2 (SID,Speaker) v2.0.2 has 1 domain(s):
    Domain: multi-v1, Description: Speaker Embeddings DPLDA
Plugin: lid-embedplda-v3.0.1 (LID,Language) v3.0.1 has 1 domain(s):
    Domain: multi-v1, Description: PNCC Bottleneck embeddings suitable for mixed conditions (tel/mic/compression)

Examples

To perform a global score analysis on a single file with the speaker identification plugin sid-dplda-v2.0.2, using the multi-v1 domain, the calls for each would look like this:

$ ./OliveAnalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav
$ olivepyanalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav

Region scoring instead, using the transcription plugin asr-dynapy-v3.0.0 via the English domain english-tdnnChain-tel-v1 on a list of audio files, would be performed with:

$ ./OliveAnalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt
$ olivepyanalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt

Where the input list is simply a text file with a path to an audio file on each line. For example:

/data/mixed-language/testFile1.wav 
/data/mixed-language/testFile2.wav 
/data/mixed-language/testFile3.wav
/moreData/test-files/unknown1.wav
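
If needed, an input list like this can be generated from a directory of audio files with a quick shell command, for example:

$ ls /data/mixed-language/*.wav > ~/path/to/list-of-audio-files.txt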

Workflow (Analysis)

OLIVE Workflows provide a simple way of creating a sort of 'recipe' that specifies how to deal with the input data and one or more OLIVE plugins. A workflow allows complex operations to be requested and performed with a single, simple call to the system, by encapsulating complexities and specific knowledge within the workflow itself rather than requiring the user to know and implement them at run time. Because this burden is off-loaded, operating with workflows is much simpler than calling the plugin(s) directly - typically all that is needed to request an analysis from a workflow client is the workflow itself and one or more input files. There are more options available to the workflow clients, as shown in the usage statements:

$ ./OliveWorkflow -h
usage: OliveWorkflow
    --cabundlepass <arg>   Specifies the certificate authority passphrase
                           of the certificate authority.
    --cabundlepath <arg>   Specifies the path of the certificate authority
    --certpass <arg>       Specifies the certificate passphrase to unlock
                           the encrypted certificate key
    --certpath <arg>       Specifies the path of the certificate
-h                         Print this help message
-i,--input <arg>           The data input to analyze. Either a pathname to
                           an audio/image/video file or a string for text
                           input. For text input, also specify the --text
                           flag
    --input_list <arg>     A list of files to analyze. One file per line.
    --options <arg>        options from FILE
-p,--port <arg>            Scenicserver port number. Defauls is 5588
    --print_class_ids      Print the class IDs available for analysis in
                           the specified workflow.
    --print_tasks          Print the workflow analysis tasks.
-s,--server <arg>          Scenicserver hostname. Default is localhost
    --secure               Indicates a secure connection should be made.
                           Requires --certpath, --cabundlepath,
                           --certpass, and --cabundlepass to be set.
-t,--timeout <arg>         Timeout (in seconds) when waiting for server
                           response.  Default is 10 seconds
    --workflow <arg>       The workflow definition to use.
$ olivepyworkflow -h
usage: olivepyworkflow [-h] [--tasks] [--print_class_ids] [--class_ids CLASS_IDS] [--print_actualized] [--print_workflow] [-s SERVER] [-P PORT] [--upload_port UPLOAD_PORT] [-t TIMEOUT]
                    [-i INPUT] [--input_list INPUT_LIST] [--text] [--options OPTIONS] [--path] [--upload_files] [--secure] [--certpath CERTPATH] [--keypath KEYPATH]
                    [--keypass KEYPASS] [--cabundlepath CABUNDLEPATH] [--debug]
                    workflow

Perform OLIVE analysis using a Workflow Definition file

positional arguments:
workflow              The workflow definition to use.

options:
-h, --help            show this help message and exit
--tasks               Print the workflow analysis tasks.
--print_class_ids     Print the class IDs available for analysis in the specified workflow.
--class_ids CLASS_IDS
                        Send class IDs with the analysis request.
--print_actualized    Print the actualized workflow info.
--print_workflow      Print the workflow definition file info (before it is actualized, if requested)
-s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
-P PORT, --port PORT  The port to use.
--upload_port UPLOAD_PORT
                        The upload port to use when specifying --upload_files.
-t TIMEOUT, --timeout TIMEOUT
                        The timeout (in seconds) to wait for a response from the server
-i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify the --text flag
--input_list INPUT_LIST
                        A list of files to analyze. One file per line.
--text                Indicates that input (or input list) is a literal text string to send in the analysis request.
--options OPTIONS     A JSON formatted string of workflow options such as [{"task":"SAD", "options":{"filter_length":99, "interpolate":1.0}] or {"filter_length":99, "interpolate":1.0,
                        "name":"midge"}, where the former options are only applied to the SAD task, and the later are applied to all tasks
--path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
--upload_files        Must be specified with --path argument. This uploads the files to the server so the client and server do not need to share a filesystem. This can also be used to
                        bypass the 2 GB request size limitation.
--secure              Indicates a secure connection should be made. Requires --certpath, --keypass, --keypath, and --cabundlepath to be set.
--certpath CERTPATH   Specifies the path of the certificate
--keypath KEYPATH     Specifies the path of the certificate key
--keypass KEYPASS     Specifies the certificate passphrase to unlock the encrypted certificate key
--cabundlepath CABUNDLEPATH
                        Specifies the path of the certificate authority
--debug               Debug mode

But these additional options are rarely necessary, and are reserved for advanced users or specific system testing.

Generically, calling a workflow client will look like this:

Python

$ olivepyworkflow --input ~/path/to/test-file1.wav <workflow>
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt <workflow> 

Java

$ ./OliveWorkflow --input ~/path/to/test-file1.wav --workflow <workflow>
$ ./OliveWorkflow --input_list ~/path/to/list-of-audio-files.txt --workflow <workflow> 

As an example of the power of workflows, the request below calls the SmartTranscription workflow, which performs Speech Activity Detection (region scoring), Speaker Diarization and Detection (region scoring), Language Detection (region scoring), and then Automatic Speech Recognition (region scoring) on any sections of each input file that are detected to be a language that ASR currently supports, and returns all of the appropriate results in a JSON structure. Performing this same task by calling plugins directly would require a minimum of four separate calls to OLIVE, and significantly more if more than one language is detected being spoken in the file.

Python

$ olivepyworkflow --input ~/path/to/test-file1.wav ~/olive5.7.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt ~/olive5.7.0/oliveAppData/workflows/SmartTranscription.workflow.json

Java

$ ./OliveWorkflow --input ~/path/to/test-file1.wav --workflow ~/olive5.7.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ ./OliveWorkflow --input_list ~/path/to/list-of-audio-files.txt --workflow ~/olive5.7.0/oliveAppData/workflows/SmartTranscription.workflow.json

Output Format (Workflow)

Workflows are generally customer/user-specific and can be quite specialized - the output format and structure will depend heavily on the individual workflow itself and the tasks being performed. All of the information pieces that define each scoring type are still reported for each result, but the results are organized into a single JSON structure for the workflow call. This means that the output of a region scoring plugin within the workflow is still one or more sets of:

<region_start_timestamp> <region_end_timestamp> <class_id> <score>

But the data is arranged into the JSON structure and will be nested depending on the structure of the workflow itself and how the audio is routed by the workflow. For more detailed information on the structure of this JSON message and the inner-workings of workflows, please refer to the OLIVE Workflow API documentation. A brief, simplified summary to jump start working with workflow output follows.

The main skeleton structure of the results output is shown below, along with an actual example. Results are provided per input file, listing the job name(s), some metadata about the input audio and how it was processed, and then the returned results (if any) for each task, which generally corresponds to a plugin.

Workflow analysis results:
[
 {
  "job_name": <workflow job name>,
  "data": [
   {
    "data_id": <data ID, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <data label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ],
   <task 2 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ],
   <task 3 name>: [
    {
     "task_trait": "REGION_SCORER",
     "task_type": <task type>,
     "message_type": "REGION_SCORER_RESULT",
     "plugin": <plugin>,
     "domain": <domain>,
     "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 class name>,
        "score": <region 1 score>
       },
       ...
       {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N class name>,
        "score": <region N score>
       }
      ]
     }
    }
   ]
  }
 }
]
Workflow analysis results:
[
 {
  "job_name": "SAD, SDD, and ASR English Workflow",
  "data": [
   {
    "data_id": "z_eng_englishdemo.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 5.932625,
    "number_channels": 1,
    "label": "z_eng_englishdemo.wav",
    "id": "0b04c7497521d53a5d6939533a55c461795f9d685b1bd19fd9031fc6f3997a8f"
   }
  ],
  "tasks": {
   "SAD_REGIONS": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SAD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sad-dnn-v7.0.2",
     "domain": "multi-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 5.93,
        "class_id": "speech",
        "score": 0.0
       }
      ]
     }
    }
   ],
   "SDD": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SDD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sdd-diarizeEmbedSmolive-v1.0.2",
     "domain": "telClosetalk-smart-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.1,
        "end_t": 5.0,
        "class_id": "unknownspk00",
        "score": 1.4
       }
      ]
     }
    }
   ],
   "ASR": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "ASR",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "asr-dynapy-v3.0.0",
     "domain": "english-tdnnChain-tel-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.15,
        "end_t": 0.51,
        "class_id": "hello",
        "score": 100.0
       },
       {
        "start_t": 0.54,
        "end_t": 0.69,
        "class_id": "my",
        "score": 100.0
       },
       {
        "start_t": 0.69,
        "end_t": 0.87,
        "class_id": "name",
        "score": 99.0
       },
       {
        "start_t": 0.87,
        "end_t": 1.05,
        "class_id": "is",
        "score": 99.0
       },
       {
        "start_t": 1.05,
        "end_t": 1.35,
        "class_id": "evan",
        "score": 88.0
       },
       {
        "start_t": 1.35,
        "end_t": 1.47,
        "class_id": "this",
        "score": 99.0
       },
       {
        "start_t": 1.5,
        "end_t": 1.98,
        "class_id": "audio",
        "score": 95.0
       },
       {
        "start_t": 1.98,
        "end_t": 2.16,
        "class_id": "is",
        "score": 74.0
       },
       {
        "start_t": 2.16,
        "end_t": 2.31,
        "class_id": "for",
        "score": 99.0
       },
       {
        "start_t": 2.31,
        "end_t": 2.4,
        "class_id": "the",
        "score": 99.0
       },
       {
        "start_t": 2.4,
        "end_t": 2.91,
        "class_id": "purposes",
        "score": 100.0
       },
       {
        "start_t": 2.91,
        "end_t": 3.06,
        "class_id": "of",
        "score": 99.0
       },
       {
        "start_t": 3.12,
        "end_t": 3.81,
        "class_id": "demonstrating",
        "score": 100.0
       },
       {
        "start_t": 3.81,
        "end_t": 3.96,
        "class_id": "our",
        "score": 78.0
       },
       {
        "start_t": 4.05,
        "end_t": 4.44,
        "class_id": "language",
        "score": 100.0
       },
       {
        "start_t": 4.44,
        "end_t": 4.53,
        "class_id": "and",
        "score": 93.0
       },
       {
        "start_t": 4.53,
        "end_t": 4.89,
        "class_id": "speaker",
        "score": 100.0
       },
       {
        "start_t": 4.89,
        "end_t": 5.01,
        "class_id": "i.",
        "score": 99.0
       },
       {
        "start_t": 5.01,
        "end_t": 5.22,
        "class_id": "d.",
        "score": 99.0
       },
       {
        "start_t": 5.22,
        "end_t": 5.85,
        "class_id": "capabilities",
        "score": 99.0
       }
      ]
     }
    }
   ]
  }
 }
]

Each task output will typically be for a single plugin, and will contain the information provided by a Region Scorer, a Global Scorer, or (in the case of Machine Translation) a Text Transformer, depending on how the workflow is using the plugin. The format of each result sub-part is:

<task name>: [
  {
    "task_trait": "GLOBAL_SCORER",
    "task_type": <task type, generally LID, SID, etc.>,
    "message_type": "GLOBAL_SCORER_RESULT",
    "plugin": <plugin>,
    "domain": <domain>,
    "analysis": {
      "score": [
      {
        "class_id": <class 1>,
        "score": <class 1 score>
      },
      {
        "class_id": <class 2>,
        "score": <class 2 score>
      },
      ...
      {
        "class_id": <class N>,
        "score": <class N score>
      }
     ]
    }
  }
]
<task name>: [
  {
    "task_trait": "REGION_SCORER",
    "task_type": <task type, typically ASR, SDD, SAD, etc.>,
    "message_type": "REGION_SCORER_RESULT",
    "plugin": <plugin>,
    "domain": <domain>,
    "analysis": {
      "region": [
       {
        "start_t": <region 1 start time (s)>,
        "end_t": <region 1 end time (s)>,
        "class_id": <region 1 detected class>,
        "score": <region 1 score>
      },
      {
        "start_t": <region 2 start time (s)>,
        "end_t": <region 2 end time (s)>,
        "class_id": <region 2 detected class>,
        "score": <region 2 score>
      },
      ...
      {
        "start_t": <region N start time (s)>,
        "end_t": <region N end time (s)>,
        "class_id": <region N detected class>,
        "score": <region N score>
      },
     ]
    }
  }
]
<task name>: [
  {
    "task_trait": "TEXT_TRANSFORMER",
    "task_type": <task type, typically MT>,
    "message_type": "TEXT_TRANSFORM_RESULT",
    "plugin": <plugin name>,
    "domain": <domain name>,
    "analysis": {
     "transformation": [
       {
        "class_id": "test_label",
        "transformed_text": <the translated/transformed text returned from the plugin>
       }
     ]
    }
  }
]
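
These structures can be consumed programmatically with a few nested loops. Below is a minimal Python sketch (not part of the OLIVE clients) that walks a saved result and prints each task's scores; it assumes the JSON array has been saved to a file named results.json (the clients print a "Workflow analysis results:" header line above the JSON, which would need to be stripped first) and that only the three task traits shown above appear.

import json

# Minimal sketch: walk a saved workflow result (the JSON array shown above) and print
# the scores for each task, keyed by job and task name. "results.json" is assumed to
# contain just the JSON array.
with open("results.json") as f:
    jobs = json.load(f)

for job in jobs:
    print(f"== {job.get('job_name')} ==")
    for task_name, task_results in job.get("tasks", {}).items():
        for result in task_results:
            trait = result.get("task_trait")
            analysis = result.get("analysis", {})
            print(f"[{task_name}] {result.get('plugin')}/{result.get('domain')}")
            if trait == "GLOBAL_SCORER":
                for s in analysis.get("score", []):
                    print(f"  {s['class_id']}: {s['score']}")
            elif trait == "REGION_SCORER":
                for r in analysis.get("region", []):
                    print(f"  {r['start_t']:.2f}-{r['end_t']:.2f} {r['class_id']}: {r['score']}")
            elif trait == "TEXT_TRANSFORMER":
                for t in analysis.get("transformation", []):
                    print(f"  {t['class_id']}: {t['transformed_text']}")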

Many workflows consist of a single job and bundle all plugin tasks into that single job, as seen above. More complex workflows, which OLIVE calls "Conditional Workflows", can pack multiple jobs into a single workflow. This happens when certain tasks in the workflow depend on other tasks - for example, when OLIVE needs to choose the appropriate Speech Recognition (ASR) language to use, depending on what language is detected by Language Identification (LID) or Language Detection (LDD). In this case, the LID/LDD is separated into one job, and the ASR into another, which is triggered to run once the LID/LDD decision is known. The results from each job are then grouped accordingly in the results output. Below is a simplified output skeleton from a workflow that includes three jobs ("job 1", "job 2", "job 3"), followed by a real-life example output from the SmartTranscription conditional workflow, which also has three jobs:

  1. the first performs Speech Activity Detection and Language Identification (LID);
    • Smart Translation SAD and LID Pre-processing
  2. the second uses the language decision from Language Identification to choose the appropriate (if any) language and domain for Automatic Speech Recognition (ASR), and runs that;
    • Dynamic ASR
  3. the third takes the output transcript from ASR and the language decision from LID, chooses the appropriate (if any) language and domain for Text Machine Translation, and runs that.
    • Dynamic MT

As you can see below, these jobs are listed separately in the JSON for each result:

Workflow analysis results:
[
 {
  "job name": <job 1 name>,
  "data": [
   {
    "data_id": <data identifier, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <audio label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name (job 1)>: [
    {
      <task 1 results>
    }
   ],
   ...
   <task N name (job 1)>: [
    {
      <task N results>
    }
   ]
  }
 },
 {
  "job name": <job 2 name>,
  "data": [
   {
    "data_id": <data identifier, typically audio file name>,
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": <sample rate>,
    "duration_seconds": <audio duration>,
    "number_channels": 1,
    "label": <audio label>,
    "id": <input audio UUID>
   }
  ],
  "tasks": {
   <task 1 name (job 2)>: [
    {
      <task 1 results>
    }
   ],
   ...
   <task N name (job 2)>: [
    {
      <task N results>
    }
   ]
  }
 },
 ... <repeat if more jobs>
]
Workflow analysis results:
[
 {
  "job_name": "Smart Translation SAD and LID Pre-processing",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 8.0,
    "number_channels": 1,
    "label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
   }
  ],
  "tasks": {
   "SAD_REGIONS": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "SAD",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "sad-dnn-v7.0.2",
     "domain": "multi-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 8.0,
        "class_id": "speech",
        "score": 0.0
       }
      ]
     }
    }
   ],
   "LID": [
    {
     "task_trait": "GLOBAL_SCORER",
     "task_type": "LID",
     "message_type": "GLOBAL_SCORER_RESULT",
     "plugin": "lid-embedplda-v3.0.1",
     "domain": "multi-v1",
     "analysis": {
      "score": [
       {
        "class_id": "Mandarin",
        "score": 3.5306692
       },
       {
        "class_id": "Korean",
        "score": -1.9072952
       },
       {
        "class_id": "Japanese",
        "score": -3.7805116
       },
       {
        "class_id": "Tagalog",
        "score": -7.4819508
       },
       {
        "class_id": "Vietnamese",
        "score": -8.094855
       },
       {
        "class_id": "Iraqi Arabic",
        "score": -10.63325
       },
       {
        "class_id": "Levantine Arabic",
        "score": -10.694491
       },
       {
        "class_id": "French",
        "score": -11.542379
       },
       {
        "class_id": "Pashto",
        "score": -12.11981
       },
       {
        "class_id": "English",
        "score": -12.323014
       },
       {
        "class_id": "Modern Standard Arabic",
        "score": -12.626052
       },
       {
        "class_id": "Spanish",
        "score": -13.469315
       },
       {
        "class_id": "Iranian Persian",
        "score": -13.763366
       },
       {
        "class_id": "Amharic",
        "score": -17.129797
       },
       {
        "class_id": "Portuguese",
        "score": -17.31257
       },
       {
        "class_id": "Russian",
        "score": -18.770994
       }
      ]
     }
    }
   ]
  }
 },
 {
  "job_name": "Dynamic ASR",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "PREPROCESSED_AUDIO_RESULT",
    "mode": "MONO",
    "merged": false,
    "sample_rate": 8000,
    "duration_seconds": 8.0,
    "number_channels": 1,
    "label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
   }
  ],
  "tasks": {
   "ASR": [
    {
     "task_trait": "REGION_SCORER",
     "task_type": "ASR",
     "message_type": "REGION_SCORER_RESULT",
     "plugin": "asr-dynapy-v3.0.0",
     "domain": "mandarin-tdnnChain-tel-v1",
     "analysis": {
      "region": [
       {
        "start_t": 0.0,
        "end_t": 0.18,
        "class_id": "跟",
        "score": 31.0
       },
       {
        "start_t": 0.18,
        "end_t": 0.36,
        "class_id": "一个",
        "score": 83.0
       },
       {
        "start_t": 0.36,
        "end_t": 0.66,
        "class_id": "肯定",
        "score": 100.0
       },
       {
        "start_t": 0.66,
        "end_t": 0.81,
        "class_id": "是",
        "score": 83.0
       },
       {
        "start_t": 0.81,
        "end_t": 1.23,
        "class_id": "北京",
        "score": 95.0
       },
       {
        "start_t": 1.23,
        "end_t": 1.47,
        "class_id": "啊",
        "score": 96.0
       },
       {
        "start_t": 2.07,
        "end_t": 2.49,
        "class_id": "他俩",
        "score": 96.0
       },
       {
        "start_t": 2.7,
        "end_t": 3.09,
        "class_id": "上海",
        "score": 99.0
       },
       {
        "start_t": 3.09,
        "end_t": 3.21,
        "class_id": "的",
        "score": 99.0
       },
       {
        "start_t": 3.21,
        "end_t": 3.57,
        "class_id": "人口",
        "score": 99.0
       },
       {
        "start_t": 3.57,
        "end_t": 3.87,
        "class_id": "好像",
        "score": 73.0
       },
       {
        "start_t": 3.87,
        "end_t": 3.99,
        "class_id": "没",
        "score": 54.0
       },
       {
        "start_t": 3.99,
        "end_t": 4.32,
        "class_id": "北京",
        "score": 74.0
       },
       {
        "start_t": 4.32,
        "end_t": 4.68,
        "class_id": "多",
        "score": 99.0
       },
       {
        "start_t": 4.86,
        "end_t": 5.19,
        "class_id": "但是",
        "score": 100.0
       },
       {
        "start_t": 5.4,
        "end_t": 5.91,
        "class_id": "不知道",
        "score": 100.0
       },
       {
        "start_t": 6.06,
        "end_t": 6.48,
        "class_id": "@reject@",
        "score": 62.0
       },
       {
        "start_t": 6.69,
        "end_t": 7.05,
        "class_id": "其他",
        "score": 93.0
       },
       {
        "start_t": 7.05,
        "end_t": 7.2,
        "class_id": "的",
        "score": 97.0
       },
       {
        "start_t": 7.65,
        "end_t": 7.89,
        "class_id": "上",
        "score": 32.0
       },
       {
        "start_t": 7.89,
        "end_t": 7.95,
        "class_id": "啊",
        "score": 67.0
       }
      ]
     }
    }
   ]
  }
 },
 {
  "job_name": "Dynamic MT",
  "data": [
   {
    "data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
    "msg_type": "WORKFlOW_TEXT_RESULT",
    "text": "跟 一个 肯定 是 北京 啊 他俩 上海 的 人口 好像 没 北京 多 但是 不知道 @reject@ 其他 的 上 啊"
   }
  ],
  "tasks": {
   "MT": [
    {
     "task_trait": "TEXT_TRANSFORMER",
     "task_type": "MT",
     "message_type": "TEXT_TRANSFORM_RESULT",
     "plugin": "tmt-neural-v1.0.0",
     "domain": "cmn-eng-nmt-v1",
     "analysis": {
      "transformation": [
       {
        "class_id": "test_label",
        "transformed_text": "with someone in beijing they don't seem to have a population in shanghai but we don't know what else to do"
       }
      ]
     }
    }
   ]
  }
 }
]

Enrollment Requests

Enrollments are a subset of classes that the user can create and/or modify. These are used for classes that cannot be known ahead of time and therefore can't be pre-loaded into the system, such as specific speakers or keywords of interest. To determine whether a plugin supports or requires enrollments, or to check what its default enrolled classes are (if any), refer to that plugin's details page, linked from the navigation or the Release Plugins page.

Enrollment list format

As with analysis, both the Java and Python tools were designed to share as much of a common interface as possible, and as such share an input list format when providing exemplars for enrollment. The audio enrollment list input file is formatted as one or more newline-separated lines containing a path to an audio file and a class or model ID, which can be a speaker name, topic name, or query name for SID, TPD, and QBE respectively. A general example is given below, and more details and plugin-specific enrollment information are provided in the appropriate section in each plugin's documentation. Format:

<audio_path> <model_id>

Example enrollment list file (SID):

/data/speaker1/audiofile1.wav speaker1
/data/speaker1/audiofile2.wav speaker1
/data/speaker7/audiofile1.wav speaker7

Plugin Direct (Enrollment)

Performing an enrollment request is similar to an analysis request and is again very similar between the two tools. The usage statements for each can be examined by invoking each with their -h or --help flag:

$ ./OliveEnroll -h
usage: OliveEnroll
    --cabundlepass <arg>   Specifies the certificate authority passphrase
                        of the certificate authority.
    --cabundlepath <arg>   Specifies the path of the certificate authority
    --certpass <arg>       Specifies the certificate passphrase to unlock
                        the encrypted certificate key
    --certpath <arg>       Specifies the path of the certificate
    --channel <arg>        Process stereo files using channel NUMBER
    --classes              Print class names if also printing
                        plugin/domain names.  Must use with --print
                        option.  Default is to not print class IDs
    --decoded              Sennd audio file as a decoded PCM16 sample
                        buffer instead of a serialized buffer. The file
                        must be a WAV file
    --domain <arg>         Use Domain NAME
    --enroll <arg>         Enroll speaker NAME. If no name specified then,
                        the pem or list option must specify an input
                        file
    --export <arg>         Export speaker NAME to an EnrollmentModel
                        (enrollment.tar.gz)
-h                        Print this help message
-i,--input <arg>          NAME of the input file (input varies by plugin:
                        audio, image, or video)
    --import <arg>         Import speaker from EnrollmentModel FILE
    --input_list <arg>     Batch enroll using this input list FILE having
                        multiple filenames/class IDs or PEM formmated
                        file
    --nobatch              Disable batch enrollment when using pem or list
                        input files, so that files are processed
                        serially
    --options <arg>        Enrollment options from FILE
    --output <arg>         Write any output to DIR, default is ./
-p,--port <arg>           Scenicserver port number. Defauls is 5588
    --path                 Send the path to the audio file instead of a
                        (serialized) buffer.  The server must have
                        access to this path.
    --plugin <arg>         Use Plugin NAME
    --print                Print all plugins and domains that suport
                        enrollment and/or class import and export
    --remove <arg>         Remove audio enrollment for NAME
-s,--server <arg>         Scenicserver hostname. Default is localhost
    --secure               Indicates a secure connection should be made.
                        Requires --certpath, --cabundlepath,
                        --certpass, and --cabundlepass to be set.
-t,--timeout <arg>        timeout (in seconds) when waiting for server
                        response.  Default is 10 seconds
    --unenroll <arg>       Un-enroll all enrollments for speaker NAME
-v,--vec <arg>            PATH to a serialized AudioVector, for plugins
                        that support audio vectors in addition to wav
                        files
$ olivepyenroll -h
usage: olivepyenroll [-h] [-C CLIENT_ID] [-D] [-p PLUGIN] [-d DOMAIN] [-e ENROLL] [-u UNENROLL] [-s SERVER] [-P PORT] [--upload_port UPLOAD_PORT] [-t TIMEOUT] [-i INPUT]
                    [--input_list INPUT_LIST] [--nobatch NOBATCH] [--path] [--upload_files] [--secure] [--certpath CERTPATH] [--keypath KEYPATH] [--keypass KEYPASS]
                    [--cabundlepath CABUNDLEPATH]

options:
-h, --help            show this help message and exit
-C CLIENT_ID, --client-id CLIENT_ID
                        Experimental: the client_id to use
-D, --debug           The domain to use
-p PLUGIN, --plugin PLUGIN
                        The plugin to use.
-d DOMAIN, --domain DOMAIN
                        The domain to use
-e ENROLL, --enroll ENROLL
                        Enroll with this name.
-u UNENROLL, --unenroll UNENROLL
                        Uneroll with this name.
-s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
-P PORT, --port PORT  The port to use.
--upload_port UPLOAD_PORT
                        The upload port to use when specifying --upload_files.
-t TIMEOUT, --timeout TIMEOUT
                        The timeout to use
-i INPUT, --input INPUT
                        The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify the --text flag
--input_list INPUT_LIST
                        A list of files to analyze. One file per line.
--nobatch NOBATCH     Disable batch enrollment when using pem or list input files, so that files are processed individually.
--path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
--upload_files        Must be specified with --path argument. This uploads the files to the server so the client and server do not need to share a filesystem. This can also be used to
                        bypass the 2 GB request size limitation.
--secure              Indicates a secure connection should be made. Requires --certpath, --keypass, --keypath, and --cabundlepath to be set.
--certpath CERTPATH   Specifies the path of the certificate
--keypath KEYPATH     Specifies the path of the certificate key
--keypass KEYPASS     Specifies the certificate passphrase to unlock the encrypted certificate key
--cabundlepath CABUNDLEPATH
                        Specifies the path of the certificate authority

To perform an enrollment request with these tools, you will need these essential pieces of information:

  • Plugin name (--plugin)
  • Domain name (--domain)
  • One of:
    • A properly formatted enrollment list (--input_list), if providing multiple files at once (see below for formatting)
    • An input audio file (--input) AND the name of the class you wish to enroll (--enroll for OliveEnroll, -e or --enroll for olivepyenroll)

Generically, a single-file enrollment looks like this:

$ ./OliveEnroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>
$ olivepyenroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>

A more specific example:

$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"

Or if providing the enrollment list format shown above, the call is even simpler. Generically:

$ ./OliveEnroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>
$ olivepyenroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>

Specific:

$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt

Where the enrollment_input.txt file might look like:

/some/data/somewhere/inputFile1.wav Logan
/some/other/data/somewhere/else/LoganPodcast.wav Logan
/yet/another/data/directory/charlie-speaks.wav Charlie
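
If the client and server share a filesystem, the audio can also be referenced by path rather than sent as a buffer by adding the --path flag; --upload_files (specified together with --path) instead uploads the files to the server, which also works around the 2 GB request size limit. A minimal sketch of each, reusing the same placeholders (the file paths are hypothetical):

$ olivepyenroll --plugin <plugin> --domain <domain> --path --input /shared/data/enroll-file1.wav --enroll <class name>
$ olivepyenroll --plugin <plugin> --domain <domain> --path --upload_files --input ~/local/data/enroll-file1.wav --enroll <class name>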

Workflow (Enrollment)

In the most basic case, enrollment using a workflow is just as simple as scoring with a workflow. This is because most workflows will only have a single enrollment-capable job; a job is a subset of the tasks a workflow is performing, typically linked to a single plugin. In the rare case that you're using a workflow with multiple supported enrollment jobs, you will need to specify which job to enroll to. See the Advanced Workflow Enrollment section below.

Workflow enrollment is performed by using the olivepyworkflowenroll utility, whose help/usage statement is:

olivepyworkflowenroll usage
usage: olivepyworkflowenroll [-h] [--print_jobs] [--job JOB] [--enroll ENROLL] [--unenroll UNENROLL] [-i INPUT] [--input_list INPUT_LIST] [--path] [-s SERVER] [-P PORT] [-t TIMEOUT]
                             [--secure] [--certpath CERTPATH] [--keypath KEYPATH] [--keypass KEYPASS] [--cabundlepath CABUNDLEPATH]
                             workflow

Perform OLIVE enrollment using a Workflow Definition file

positional arguments:
  workflow              The workflow definition to use.

options:
  -h, --help            show this help message and exit
  --print_jobs          Print the supported workflow enrollment jobs.
  --job JOB             Enroll/Unenroll a Class ID for a job(s) in the specified workflow. If not specified, enroll or unenroll for ALL enrollment/unenrollment jobs
  --enroll ENROLL       Enroll using this (class) name. Should be used with the job argument to specify a target job to enroll with (if there is more than one enrollment job)
  --unenroll UNENROLL   Unenroll using this (class) name. Should be used with the job argument to specify a job to unenroll from (if there is more than one unenrollment job)
  -i INPUT, --input INPUT
                        The data input to enroll. Either a pathname to an audio/image/video file or a string for text input
  --input_list INPUT_LIST
                        A list of files to enroll. One file per line plus the class id to enroll.
  --path                Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -P PORT, --port PORT  The port to use.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout (in seconds) to wait for a response from the server
  --secure              Indicates a secure connection should be made. Requires --certpath, --keypass, --keypath, and --cabundlepath to be set.
  --certpath CERTPATH   Specifies the path of the certificate
  --keypath KEYPATH     Specifies the path of the certificate key
  --keypass KEYPASS     Specifies the certificate passphrase to unlock the encrypted certificate key
  --cabundlepath CABUNDLEPATH
                        Specifies the path of the certificate authority

If there is only one supported enrollment job in the workflow, using this utility for enrollment is very similar to using the enrollment utilities above, except that a workflow is provided instead of a plugin and domain combination. As with the other enrollment utilities, olivepyworkflowenroll supports both single-file enrollment and batch enrollment using an enrollment-formatted text file.

Generically, this looks like:

$ olivepyworkflowenroll --input <path to audio file> --enroll <class name> <workflow>
$ olivepyworkflowenroll --input_list <path to enrollment file> <workflow>

And a specific example of each:

$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav  --enroll "Logan" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json

Important: Note that in OLIVE, when you enroll a class, you are enrolling to a plugin and domain, and enrollments are shared server-wide. Even when you enroll using a workflow and the olivepyworkflowenroll utility, enrollments are associated with the specific plugin/domain that the workflow is using under the hood. Any enrollments made to a workflow will be available to anyone else who may be using that server instance, and will also be available to anyone interacting with the individual plugin - whether directly or via a workflow.

As a more concrete example of this, the "SmartTranscription" workflow that is sometimes provided with OLIVE, which performs Speech Activity Detection, Speaker Detection, Language Detection, and Speech Recognition on supported languages, has a single plugin that supports enrollments: Speaker Detection. As a result, the workflow is set up to have a single enrollment job, allowing workflow users to enroll new speakers to be detected by this plugin. When enrollment is performed with this workflow, the newly created speaker model is created by and for the Speaker Detection plugin itself, and goes into the global OLIVE enrollment space. If a file is analyzed by directly calling this Speaker Detection plugin, the new enrollment will be part of the pool of target speakers the plugin will search for. More information on this concept of "Workflow Enrollment Jobs" is provided in the next section.
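
As a rough sketch of this behavior - assuming, purely for illustration, that the workflow's Speaker Detection job is backed by the sid-dplda-v2.0.2 plugin with the multi-v1 domain - an enrollment made through the workflow is immediately visible to direct plugin requests:

# Enroll a new speaker through the workflow:
$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json

# A later request made directly against the underlying plugin will now include "Logan"
# in its pool of target speakers:
$ olivepyanalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/test-file1.wav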

Advanced Workflow Enrollment - Jobs

It's rare, but possible, for a workflow to bundle multiple enrollment-capable plugins into one. One example could be combining Speaker Detection with Query-by-Example Keyword Spotting in a single workflow, both of which rely on user enrollments to define their target classes. When this happens, if a user wishes to maintain the ability to enroll separate classes into each enrollable plugin, the workflow needs to expose these enrollment tasks as separate jobs in the workflow's enrollment capabilities.

If this is necessary, the workflow will come from SRI configured appropriately - the user need only be concerned with how to specify which job to enroll with.

To query which enrollment jobs are available to a workflow, use the olivepyworkflowenroll tool with the --print_jobs flag:

$ olivepyworkflowenroll --print_jobs <workflow>

Investigating the "SmartTranscription" workflow we briefly mentioned above:

$ olivepyworkflowenroll --print_jobs SmartTranscriptionFull.workflow.json
enrolling 0 files
Enrollment jobs '['SDD Enrollment']'
Un-Enrollment jobs '['SDD Unenrollment']'

We see that there is only a single Enrollment job available: SDD Enrollment. If there were others, they would be listed in this output. Now that the desired job name is known, enrolling with a specific job is done by supplying that job name to the --job flag; in this case:

$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav  --enroll "Logan" --job "SDD Enrollment" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt --job "SDD Enrollment" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json

TLS/Secure Mode

Note that when using OLIVE with TLS enabled, additional arguments are required to pass the appropriate certificates and other info necessary for the TLS configuration.

In addition to what was shown above, these parameters must also be supplied for Java:

  • secure - Enables TLS/secure connection mode.
  • certpath - Specifies the path of the certificate.
  • certpass - Specifies the certificate passphrase to unlock the encrypted certificate key.
  • cabundlepath - Specifies the path of the certificate authority.
  • cabundlepass - Specifies the certificate authority passphrase of the certificate authority.

And these must also be supplied for Python:

  • secure - Indicates a secure connection should be made. Requires --certpath, --keypass, --keypath, and --cabundlepath to be set.
  • certpath - Specifies the path of the certificate.
  • keypath - Specifies the path of the certificate key.
  • keypass - Specifies the certificate passphrase to unlock the encrypted certificate key.
  • cabundlepath - Specifies the path of the certificate authority.

Plugin Direct TLS Examples

Borrowing the global scoring examples from above, if TLS is enabled, they would need to be modified as follows:

$ ./OliveAnalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav \
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.p12 \
  --certpass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.p12 \
  --cabundlepass "another password"
$ olivepyanalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav \ 
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.crt \
  --keypath /home/username/cert-directory/test-certs/client.key \
  --keypass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.crt

Note that the same flags and info will need to be provided for enrollment requests and any other communication or job submissions to a TLS-enabled OLIVE server.
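
For example, the single-file OliveEnroll request from earlier would pick up the same Java TLS arguments (reusing the illustrative certificate paths above):

$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan" \
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.p12 \
  --certpass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.p12 \
  --cabundlepass "another password"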

Workflow TLS Examples

Modifying the workflow analysis examples above to enable TLS/secure communication:

$ ./OliveWorkflow --input_list ~/path/to/list-of-audio-files.txt --workflow ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json \
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.p12 \
  --certpass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.p12 \
  --cabundlepass "another password"
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json \ 
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.crt \
  --keypath /home/username/cert-directory/test-certs/client.key \
  --keypass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.crt

Note that the same flags and info will need to be provided for workflow enrollment requests and any other communication or job submissions to a TLS-enabled OLIVE server.
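
For instance, a single-file workflow enrollment against a TLS-enabled server with the Python client might look like this (again reusing the illustrative certificate paths):

$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" \
  --secure \
  --certpath /home/username/cert-directory/test-certs/client.crt \
  --keypath /home/username/cert-directory/test-certs/client.key \
  --keypass "my password" \
  --cabundlepath /home/username/cert-directory/test-certs/root-ca.crt \
  ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json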

Job Cancellation Request

This utility submits a request to the OLIVE server to cancel all jobs currently in progress or pending - this allows the user to recover from submitting a large job or batch of jobs that is no longer relevant, or was perhaps submitted to the wrong plugin(s) or workflow(s), without restarting the entire server.

This functionality is currently only offered in the Python olivepy suite, and will be coming to the Java client in the future. This request is accomplished with the olivepycancel utility. Its usage is:

usage: olivepycancel [-h] [-P PORT] [-s SERVER] [-t TIMEOUT]

options:
  -h, --help            show this help message and exit
  -P PORT, --port PORT  The port to use.
  -s SERVER, --server SERVER
                        The machine the server is running on. Defaults to localhost.
  -t TIMEOUT, --timeout TIMEOUT
                        The timeout to use

This utility is very simple - an example call with the default OLIVE server settings:

olivepycancel
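
If the server is running on a different machine or a non-default port, supply the same --server and --port flags used by the other Python utilities; the values below are placeholders:

$ olivepycancel --server <server hostname> --port <port>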

Adaptation

The adaptation process is complicated, time-consuming, and plugin/domain specific. Use the SRI-provided Python client (olivepylearn) or Java client (OliveLearn) to perform adaptation.

To adapt using the olivepylearn utility:

olivepylearn --plugin sad-dnn --domain multi-v1 -a TEST_NEW_DOMAIN -i /olive/sadRegression/lists/adapt_s.lst

Where that adapt_s.lst looks like the example below - each line gives an audio file path, a class label (S for speech, NS for non-speech, since this adapts a Speech Activity Detection plugin), and the start and end times in seconds of the annotated region:

/olive/sadRegression/audio/adapt/20131209T225239UTC_10777_A.wav S 20.469 21.719
/olive/sadRegression/audio/adapt/20131209T225239UTC_10777_A.wav NS 10.8000 10.8229
/olive/sadRegression/audio/adapt/20131209T234551UTC_10782_A.wav S 72.898 73.748
/olive/sadRegression/audio/adapt/20131209T234551UTC_10782_A.wav NS 42.754 43.010
/olive/sadRegression/audio/adapt/20131210T184243UTC_10791_A.wav S 79.437 80.427
/olive/sadRegression/audio/adapt/20131210T184243UTC_10791_A.wav NS 61.459 62.003
/olive/sadRegression/audio/adapt/20131212T030311UTC_10817_A.wav S 11.0438 111.638
/olive/sadRegression/audio/adapt/20131212T030311UTC_10817_A.wav NS 69.058 73.090
/olive/sadRegression/audio/adapt/20131212T052052UTC_10823_A.wav S 112.936 113.656
/olive/sadRegression/audio/adapt/20131212T052052UTC_10823_A.wav NS 83.046 83.114
/olive/sadRegression/audio/adapt/20131212T064501UTC_10831_A.wav S 16.940 20.050
/olive/sadRegression/audio/adapt/20131212T064501UTC_10831_A.wav NS 59.794 59.858
/olive/sadRegression/audio/adapt/20131212T084501UTC_10856_A.wav S 87.280 88.651
/olive/sadRegression/audio/adapt/20131212T084501UTC_10856_A.wav NS 82.229 82.461
/olive/sadRegression/audio/adapt/20131212T101501UTC_10870_A.wav S 111.346 111.936
/olive/sadRegression/audio/adapt/20131212T101501UTC_10870_A.wav NS 83.736 84.446
/olive/sadRegression/audio/adapt/20131212T104501UTC_10876_A.wav S 77.291 78.421
/olive/sadRegression/audio/adapt/20131212T104501UTC_10876_A.wav NS 0 4.951
/olive/sadRegression/audio/adapt/20131212T111501UTC_10878_A.wav S 30.349 32.429
/olive/sadRegression/audio/adapt/20131212T111501UTC_10878_A.wav NS 100.299 101.647
/olive/sadRegression/audio/adapt/20131212T114501UTC_10880_A.wav S 46.527 49.147
/olive/sadRegression/audio/adapt/20131212T114501UTC_10880_A.wav NS 44.747 46.148
/olive/sadRegression/audio/adapt/20131212T134501UTC_10884_A.wav S 24.551 25.471
/olive/sadRegression/audio/adapt/20131212T134501UTC_10884_A.wav NS 52.033 52.211
/olive/sadRegression/audio/adapt/20131212T141502UTC_10887_A.wav S 88.358 93.418
/olive/sadRegression/audio/adapt/20131212T141502UTC_10887_A.wav NS 46.564 46.788
/olive/sadRegression/audio/adapt/20131212T151501UTC_10895_A.wav S 10.507 11.077
/olive/sadRegression/audio/adapt/20131212T151501UTC_10895_A.wav NS 41.099 41.227
/olive/sadRegression/audio/adapt/20131212T154502UTC_10906_A.wav S 61.072 63.002
/olive/sadRegression/audio/adapt/20131212T154502UTC_10906_A.wav NS 19.108 19.460
/olive/sadRegression/audio/adapt/20131213T023248UTC_10910_A.wav S 97.182 97.789
/olive/sadRegression/audio/adapt/20131213T023248UTC_10910_A.wav NS 71.711 71.732
/olive/sadRegression/audio/adapt/20131213T041143UTC_10913_A.wav S 114.312 117.115
/olive/sadRegression/audio/adapt/20131213T041143UTC_10913_A.wav NS 31.065 31.154
/olive/sadRegression/audio/adapt/20131213T044200UTC_10917_A.wav S 90.346 91.608
/olive/sadRegression/audio/adapt/20131213T044200UTC_10917_A.wav NS 50.028 51.377
/olive/sadRegression/audio/adapt/20131213T050721UTC_10921_A.wav S 75.986 76.596
/olive/sadRegression/audio/adapt/20131213T050721UTC_10921_A.wav NS 12.485 12.709
/olive/sadRegression/audio/adapt/20131213T071501UTC_11020_A.wav S 72.719 73.046
/olive/sadRegression/audio/adapt/20131213T071501UTC_11020_A.wav NS 51.923 53.379
/olive/sadRegression/audio/adapt/20131213T104502UTC_18520_A.wav NS 11.1192 112.761
/olive/sadRegression/audio/adapt/20131213T121501UTC_18530_A.wav NS 81.277 82.766
/olive/sadRegression/audio/adapt/20131213T124501UTC_18533_A.wav NS 83.702 84.501
/olive/sadRegression/audio/adapt/20131213T134502UTC_18567_A.wav NS 69.379 72.258
/olive/sadRegression/audio/adapt/20131217T015001UTC_18707_A.wav NS 5.099 10.507