OLIVE Java and Python Clients
Introduction
Each OLIVE delivery includes two OLIVE client utilities - one written in Java, one written in Python. Out of the box, these tools allow a user to jump right in and run OLIVE if the GUI is not desired. They can also serve as code examples for integrating with OLIVE. This page primarily covers using these clients for processing audio, rather than integrating with the OLIVE API. For more information on integration, the nitty-gritty details of the OLIVE Enterprise API, and code examples, refer to these integration-focused pages instead:
- OLIVE Enterprise API Primer
- OLIVE Python Client API Documentation
- Integrating a Client API with OLIVE
- Building an OLIVE API Reference Implementation
In terms of usage and capabilities, these tools were meant to mirror the Legacy CLI Tools as closely as possible, and they share many input/output formats and assumptions with those tools. As this document is still under construction, referring to that older guide may help fill in useful information that may currently be missing from this page.
Note that unlike the Legacy CLI tools, which call plugin code directly, these client tools require a running OLIVE server. They are client utilities that queue and submit job requests to the OLIVE server, which then manages the plugins themselves and the actual audio processing. If you haven't already, please refer to the appropriate guide for setting up and starting an OLIVE server, depending on your installation type:
- OLIVE Martini Docker-based Installation
- OLIVE Standalone Docker-based Installation
- Redhat/CentOS 7 Native Linux Installation
- OLIVE Server Guide
Client Setup, Installation, Requirements
As a quick review, the contents of an OLIVE package typically look like this:
- olive5.5.2/
  - api/
    - java/
    - python/
  - docs/
  - martini/ -or- docker/ -or- runtime/
  - OliveGUI/ - (Optional) The OLIVE Nightingale GUI (not included in all deliveries)
  - oliveAppData/
The clients this page describes are contained in the api/ directory above.
Java (OliveAnalyze)
The Java tools are the most full-featured with respect to tasking individual plugins. They are asynchronous, and better able to deal with large amounts of file submissions by parallelizing the submission of large lists of files. If the primary task is enrolling and scoring audio files with individual plugins, the Java tools, which we call the OliveAnalyze suite, are the recommended choice.
The tools themselves do not need to be 'installed'. For convenience, their directory can be added to your $PATH environment variable, so that they can be called from anywhere:
$ export PATH=$PATH:<path>/olive5.5.2/api/java/bin/
$ OliveAnalyze -h
But they can also be left alone and called directly, as long as their full or relative path is present:
# From inside olive5.5.2/api/java/bin:
$ ./OliveAnalyze -h
# From inside olive5.5.2/:
$ ./api/java/bin/OliveAnalyze -h
# From elsewhere:
$ <path>/olive5.5.2/api/java/bin/OliveAnalyze -h
These tools depend on OpenJDK 11 or newer being installed. Refer to OpenJDK for more information on downloading and installing this for your operating system.
The full list of utilities in this suite is as follows:
- OliveAnalyze
- OliveAnalyzeText
- OliveEnroll
- OliveLearn (rarely used)
But the most commonly used are OliveAnalyze for scoring requests and OliveEnroll for enrollment requests. Examples are provided for each of these below, and for more advanced users that need different tools, each utility has its own help statement that can be accessed with the -h flag:
$ OliveAnalyzeText -h
The arguments and formatting for each tool are very similar, so familiarity with the OliveAnalyze and OliveEnroll examples below should allow use of most of these tools.
Python (olivepyanalyze)
The Python client, which we call the olivepyanalyze suite, is not as fully-featured with respect to batch processing of audio files. It performs synchronous requests to the OLIVE server, and so will sequentially score each provided audio file rather than submitting jobs in parallel. For this reason, the Java OliveAnalyze tools are recommended for batch processing of individual plugin tasks.
Only the Python client currently has workflow support, however, and the Python workflow client utility, olivepyworkflow, is not limited by the synchronous restriction of olivepyanalyze - so when operating with workflows it is the clear choice.
The Python client tools require Python 3.8 or newer; please refer to Python for downloading and installing it.
Installing these tools has been simplified by providing them in the form of a Python wheel, which can be easily installed with pip.
$ cd olive5.5.2/api/python
$ ls
olivepy-5.5.2-py3-none-any.whl
olivepy-5.5.2.tar.gz
$ python3 -m pip install olivepy-5.5.2-py3-none-any.whl
This will fetch and install (if necessary) the olivepy dependencies, and install the olivepy tools.
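To quickly confirm the install succeeded, the package can be imported directly; a minimal check (the printed path will vary by system):
$ python3 -c "import olivepy; print(olivepy.__file__)"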
The olivepy utilities closely mirror the Java utilities, with the addition of the workflow tools, and are as follows:
- olivepyanalyze
- olivepyenroll
- olivepylearn (rarely used)
- olivepyworkflow
- olivepyworkflowenroll
The olivepyworkflow tools are the most important, and examples are provided below for both scoring with olivepyworkflow and enrollment with olivepyworkflowenroll. We also provide examples for olivepyanalyze and olivepyenroll that mirror the Java examples.
Scoring/Analysis Requests
Plugin Scoring Types
In general, the output format will depend on what type of 'scorer' the plugin being used is.
For a deeper dive into OLIVE scoring types, please refer to the appropriate section in the OLIVE Plugin Traits Guide, but a brief overview follows. The most common types of plugins in OLIVE are:
Global Scorer
Any plugin that reports a single score for a given model over the entire test audio file is a global scoring plugin. Every input test audio file will be assigned a single score for each enrolled target model, as measured by looking at the entire file at once.
Speaker and Language Identification are examples of global scorers.
OLIVE typically calls a global scoring plugin an "Identification" plugin, whereas a region scoring plugin to pinpoint the same class types would instead be called a "Detection" plugin. For example, Speaker Identification versus Speaker Detection; the former assumes the entire audio contains a single speaker, where the latter makes no such assumption, and attempts to localize any detected speakers of interest.
Global Scorer Output
In the case of global scorers like LID and SID, the output file, which by default is called output.txt, contains one or more lines containing the audio path, speaker/language ID (class id), and the score:
<audio_path> <class_id> <score>
For example, a Speaker Identification analysis run with three enrolled speakers (Alex, Taylor, Blake) might return:
/data/sid/audio/file1.wav Alex -0.5348
/data/sid/audio/file1.wav Taylor 3.2122
/data/sid/audio/file1.wav Blake -5.5340
/data/sid/audio/file2.wav Alex 0.5333
/data/sid/audio/file2.wav Taylor -4.9444
/data/sid/audio/file2.wav Blake -2.6564
Note the actual meanings of the scores and available classes will vary from plugin-to-plugin. Please refer to individual plugin documentation for more guidance on what the scores mean and what ranges are acceptable.
Also note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze - but when performing a global-scoring task as part of a workflow analysis, these same pieces of information (audio path or object, class_id, score) are still provided, but packed into a JSON structure.
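Because this output is plain whitespace-delimited text, it is easy to post-process. Below is a minimal Python sketch, assuming the default output.txt name and paths without embedded spaces, that groups scores by audio file; interpreting which class is "best" depends on the plugin's score semantics, so check the plugin docs before relying on it:

# parse_global_output.py - parse "<audio_path> <class_id> <score>" lines
from collections import defaultdict

scores = defaultdict(dict)
with open("output.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        audio_path, class_id, score = parts
        scores[audio_path][class_id] = float(score)

# Report the highest-scoring class per file
for audio_path, class_scores in scores.items():
    best = max(class_scores, key=class_scores.get)
    print(f"{audio_path}: {best} ({class_scores[best]:.4f})")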
Region Scorer
Region scoring plugins are capable of considering each audio file in small pieces at a time. Scores are reported for enrolled target models along with the location within that audio file that they are thought to occur. This allows OLIVE to pinpoint individual keywords or phrases or pick out one specific speaker in a recording where several people may be talking.
Automatic Speech Recognition (ASR), Language Detection (LDD), and Speaker Detection (SDD) are all region scorers.
As noted above, OLIVE typically calls region scoring plugins "Detection" plugins, in contrast to global scoring "Identification" plugins; Speaker Detection, unlike Speaker Identification, makes no single-speaker assumption and attempts to localize any detected speakers of interest.
Region Scorer Output
Region scoring plugins generate a single output file, also called output.txt by default, just like global scorers. The file looks very similar to a global scorer's output, but each line includes a temporal component that represents the start and end of the scored region. In practice, this looks like:
<audio_path> <region_start_timestamp> <region_end_timestamp> <class_id> <score>
For example, a language detection plugin might output something like this:
/data/mixed-language/testFile1.wav 2.170 9.570 Arabic 0.912
/data/mixed-language/testFile1.wav 10.390 15.930 French 0.693
/data/mixed-language/testFile1.wav 17.639 22.549 English 0.832
/data/mixed-language/testFile2.wav 0.142 35.223 Pashto 0.977
Each test file can have multiple regions where scores are reported, depending on the individual plugin. The region boundary timestamps are in seconds. More specific examples can be found in the respective plugin-specific documentation pages. As with global scoring, note that the output format described here is literally what will be returned when calling a plugin directly with OliveAnalyze or olivepyanalyze - but when performing a region-scoring task as part of a workflow analysis, these same pieces of information (audio path or object, region start and end timestamps, class_id, score) are still provided, but packed into a JSON structure.
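This format is just as easy to consume programmatically. A small Python sketch, again assuming the default output.txt name, that totals the scored time per class in each file:

# parse_region_output.py - parse "<audio> <start> <end> <class_id> <score>" lines
from collections import defaultdict

durations = defaultdict(lambda: defaultdict(float))
with open("output.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) != 5:
            continue  # assumes paths without embedded whitespace
        audio, start, end, class_id, _score = parts
        durations[audio][class_id] += float(end) - float(start)

for audio, by_class in durations.items():
    for class_id, total in sorted(by_class.items()):
        print(f"{audio}: {class_id} covers {total:.2f}s of scored regions")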
Plugin Direct (Analysis)
Performing an analysis request with both tools is very similar, as the tools were designed to closely mirror each other so that familiarity with one easily transfers to the other. The usage statement for each can be examined by invoking it with the -h or --help flag:
$ ./OliveAnalyze -h
usage: OliveAnalyze
--align Perform audio alignment analysis. Must specify
the two files to compare using an input list file
via the--list argument
--apply_update Request the plugin is update (if supported)
--box Perform bounding box analysis. Must specify an
image or video input
--channel <arg> Process stereo files using channel NUMBER
--class_ids <arg> Use Class(s) from FILE for scoring. Each line in
the file contains a single class, including any
white space
--compare Perform audio compare analysis. Must specify the
two files to compare using an input list file via
the--list argument
--decoded Send audio file as decoded PCM16 samples instead
of sending as serialized buffer. Input file must
be a wav file
--domain <arg> Use Domain NAME
--enhance Perform audio conversion (enhancement)
--frame Perform frame scoring analysis
--global Perform global scoring analysis
-h Print this help message
-i,--input <arg> NAME of the input file (audio/video/image as
required by the plugin
--input_list <arg> Use an input list FILE having multiple
filenames/regions or PEM formatted
-l,--load load a plugin now, must use --plugin and --domain
to specify the plugin/domain to preload
--options <arg> options from FILE
--output <arg> Write any output to DIR, default is ./
-p,--port <arg> Scenicserver port number. Defauls is 5588
--path Send audio file path instead of a buffer. Server
and client must share a filesystem to use this
option
--plugin <arg> Use Plugin NAME
--print <arg> Print all available plugins and domains.
Optionally add 'verbose' as a print option to
print full plugin details including traits and
classes
-r,--unload unload a loaded plugin now, must use --plugin and
--domain to specify the plugin/domain to unload
--region Perform region scoring analysis
-s,--server <arg> Scenicserver hostname. Default is localhost
--shutdown Request a clean shutdown of the server
--status Print the current status of the server
-t,--timeout <arg> timeout (in seconds) when waiting for server
response. Default is 10 seconds
--threshold <arg> Apply threshold NUMBER when scoring
--update_status Get the plugin's update status
-v,--vec <arg> PATH to a serialized AudioVector, for plugins
that support audio vectors in addition to wav
files
--vector Perform audio vectorization
--workflow Request a workflow
$ olivepyanalyze -h
usage: olivepyanalyze [-h] [-C CLIENT_ID] [-p PLUGIN] [-d DOMAIN] [-G] [-e] [-f] [-g] [-r] [-b] [-P PORT] [-s SERVER] [-t TIMEOUT] [-i INPUT]
[--input_list INPUT_LIST] [--text] [--options OPTIONS] [--class_ids CLASS_IDS] [--debug] [--path] [--print]
optional arguments:
-h, --help show this help message and exit
-C CLIENT_ID, --client-id CLIENT_ID
Experimental: the client_id to use
-p PLUGIN, --plugin PLUGIN
The plugin to use.
-d DOMAIN, --domain DOMAIN
The domain to use
-G, --guess Experimental: guess the type of analysis to use based on the plugin/domain.
-e, --enhance Enhance the audio of a wave file, which must be passed in with the --wav option.
-f, --frame Do frame based analysis of a wave file, which must be passed in with the --wav option.
-g, --global Do global analysis of a wave file, which must be passed in with the --wav option.
-r, --region Do region based analysis of a wave file, which must be passed in with the --wav option.
-b, --box Do bounding box based analysis of an input file, which must be passed in with the --wav option.
-P PORT, --port PORT The port to use.
-s SERVER, --server SERVER
The machine the server is running on. Defaults to localhost.
-t TIMEOUT, --timeout TIMEOUT
The timeout to use
-i INPUT, --input INPUT
The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
the --text flag
--input_list INPUT_LIST
A list of files to analyze. One file per line.
--text Indicates that input (or input list) is a literal text string to send in the analysis request.
--options OPTIONS Optional file containing plugin properties ans name/value pairs.
--class_ids CLASS_IDS
Optional file containing plugin properties ans name/value pairs.
--debug Debug mode
--path Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
--print Print all available plugins and domains
To perform a scoring request with these tools, you will need these essential pieces of information:
- Plugin name (--plugin)
- Domain name (--domain)
- Scoring type to perform (--region for region scoring, --global for global scoring, others for less-common plugins)
- Input audio file or list of input audio files (--input for a single file, --input_list for a list of files)
The flag for providing each piece of information is the same for both tools, as shown in the list above.
For more information on the difference between a plugin and a domain, refer to the Plugins Overview. For more information on the domains available for each plugin, refer to the documentation page for that specific plugin.
To see which plugins and domains you have installed and running in your specific OLIVE environment, refer to the server startup status message that appears when you start the server:
- martini.sh start, then martini.sh log once the server is running, for martini-based OLIVE packages (most common)
- ./run.sh for non-martini docker OLIVE packages
- oliveserver for native Linux OLIVE packages
Or exercise the --print option for each tool to query the server and print the available plugins and domains:
$ ./OliveAnalyze --print
$ olivepyanalyze --print
Example output:
2022-06-14 12:12:25.786 INFO com.sri.speech.olive.api.Server - Connected to localhost - request port: 5588 status_port: 5589
Found 8 plugin(s):
Plugin: sad-dnn-v7.0.2 (SAD,Speech) v7.0.2 has 2 domain(s):
Domain: fast-multi-v1, Description: Trained with Telephony, PTT and Music data
Domain: multi-v1, Description: Trained with Telephony, PTT and Music data
Plugin: asr-dynapy-v3.0.0 (ASR,Content) v3.0.0 has 9 domain(s):
Domain: english-tdnnChain-tel-v1, Description: Large vocabulary English DNN model for 8K data
Domain: farsi-tdnnChain-tel-v1, Description: Large vocabulary Farsi DNN model for 8K data
Domain: french-tdnnChain-tel-v2, Description: Large vocabulary African French DNN Chain model for 8K data
Domain: iraqiArabic-tdnnChain-tel-v1, Description: Large vocabulary Iraqi Arabic DNN Chain model for 8K data
Domain: levantineArabic-tdnnChain-tel-v1, Description: Large vocabulary Levantine Arabic DNN Chain model for 8K data
Domain: mandarin-tdnnChain-tel-v1, Description: Large vocabulary Mandarin DNN model for clean CTS 8K data
Domain: pashto-tdnnChain-tel-v1, Description: Large vocabulary Pashto DNN Chain model for 8K data
Domain: russian-tdnnChain-tel-v2, Description: Large vocabulary Russian DNN model for 8K data
Domain: spanish-tdnnChain-tel-v1, Description: Large vocabulary Spanish DNN model for clean CTS 8K data
Plugin: sdd-diarizeEmbedSmolive-v1.0.0 (SDD,Speaker) v1.0.0 has 1 domain(s):
Domain: telClosetalk-int8-v1, Description: Speaker Embeddings Framework
Plugin: tmt-neural-v1.0.0 (TMT,Content) v1.0.0 has 3 domain(s):
Domain: cmn-eng-nmt-v1, Description: Mandarin Chinese to English NMT
Domain: rus-eng-nmt-v1, Description: Russian to English NMT
Domain: spa-eng-nmt-v3, Description: Spanish to English NMT
Plugin: ldd-embedplda-v1.0.1 (LDD,Language) v1.0.1 has 1 domain(s):
Domain: multi-v1, Description: PNCC bottleneck domain suitable for mixed conditions (tel/mic/compression)
Plugin: sdd-diarizeEmbedSmolive-v1.0.2 (SDD,Speaker) v1.0.2 has 1 domain(s):
Domain: telClosetalk-smart-v1, Description: Speaker Embeddings Framework
Plugin: sid-dplda-v2.0.2 (SID,Speaker) v2.0.2 has 1 domain(s):
Domain: multi-v1, Description: Speaker Embeddings DPLDA
Plugin: lid-embedplda-v3.0.1 (LID,Language) v3.0.1 has 1 domain(s):
Domain: multi-v1, Description: PNCC Bottleneck embeddings suitable for mixed conditions (tel/mic/compression)
Examples
To perform a global score analysis on a single file with the speaker identification plugin sid-dplda-v2.0.2, using the multi-v1 domain, the calls for each would look like this:
$ ./OliveAnalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav
$ olivepyanalyze --plugin sid-dplda-v2.0.2 --domain multi-v1 --global --input ~/path/to/test-file1.wav
Performing region scoring instead, using a transcription plugin, asr-dynapy-v3.0.0, via the English domain english-tdnnChain-tel-v1, on a list of audio files would be performed with:
$ ./OliveAnalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt
$ olivepyanalyze --plugin asr-dynapy-v3.0.0 --domain english-tdnnChain-tel-v1 --region --input_list ~/path/to/list-of-audio-files.txt
The format of the input list is simply a text file with a path to an audio file on each line. For example:
/data/mixed-language/testFile1.wav
/data/mixed-language/testFile2.wav
/data/mixed-language/testFile3.wav
/moreData/test-files/unknown1.wav
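These lists can be written by hand or scripted. A minimal Python sketch (the directory path here is hypothetical) that writes one entry for every .wav file found under a directory:

# make_input_list.py - build an input list for --input_list
from pathlib import Path

audio_dir = Path("/data/mixed-language")  # hypothetical audio directory
with open("list-of-audio-files.txt", "w") as f:
    for wav in sorted(audio_dir.rglob("*.wav")):
        f.write(f"{wav.resolve()}\n")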
Workflow (Analysis)
OLIVE Workflows provide a simple way of creating a sort of 'recipe' that specifies how to deal with the input data and one or more OLIVE plugins. Complex operations can be requested and performed with a single, simple call to the system, because the complexities and specific knowledge are encapsulated within the workflow itself rather than known and implemented by the user at run time. By off-loading this burden, operating with workflows is much simpler than calling the plugin(s) directly - typically all that is needed from the user to request an analysis with olivepyworkflow is the workflow itself and one or more input files. There are more options available to the olivepyworkflow client, as shown in the usage statement:
$ olivepyworkflow -h
usage: olivepyworkflow [-h] [--tasks] [--class_ids] [--print_actualized] [--print_workflow] [-s SERVER] [-P PORT] [-t TIMEOUT] [-i INPUT]
[--input_list INPUT_LIST] [--text] [--options OPTIONS] [--path] [--debug]
workflow
Perform OLIVE analysis using a Workflow Definition file
positional arguments:
workflow The workflow definition to use.
optional arguments:
-h, --help show this help message and exit
--tasks Print the workflow analysis tasks.
--class_ids Print the class IDs available for analysis in the specified workflow.
--print_actualized Print the actualized workflow info.
--print_workflow Print the workflow definition file info (before it is actualized, if requested)
-s SERVER, --server SERVER
The machine the server is running on. Defaults to localhost.
-P PORT, --port PORT The port to use.
-t TIMEOUT, --timeout TIMEOUT
The timeout (in seconds) to wait for a response from the server
-i INPUT, --input INPUT
The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
the --text flag
--input_list INPUT_LIST
A list of files to analyze. One file per line.
--text Indicates that input (or input list) is a literal text string to send in the analysis request.
--options OPTIONS A JSON formatted string of workflow options such as [{"task":"SAD", "options":{"filter_length":99, "interpolate":1.0}] or
{"filter_length":99, "interpolate":1.0, "name":"midge"}, where the former options are only applied to the SAD task, and the later are
applied to all tasks
--path Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
--debug Debug mode
But their use is rarely necessary, and is reserved for advanced users or specific system testing.
Generically, calling olivepyworkflow will look like this:
$ olivepyworkflow --input ~/path/to/test-file1.wav <workflow>
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt <workflow>
As an example of the power of workflows, this request calls the SmartTranscription workflow, which performs Speech Activity Detection (region scoring), Speaker Diarization and Detection (region scoring), and Language Detection (region scoring), then runs Automatic Speech Recognition (region scoring) on any sections of each input file detected to be in a language that ASR currently supports, and returns all of the appropriate results in a JSON structure. Performing this same task by calling plugins directly would require a minimum of 4 separate calls to OLIVE; significantly more if more than one language is detected being spoken in the file.
$ olivepyworkflow --input ~/path/to/test-file1.wav ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflow --input_list ~/path/to/list-of-audio-files.txt ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
Output Format (Workflow)
Workflows are generally customer/user-specific and can be quite specialized - the output format and structure will depend heavily on the individual workflow itself and the tasks being performed. All of the pieces of information that define each scoring type are still reported for each result, but the results are organized into a single JSON structure for the workflow call. This means that the output of a region scoring plugin within the workflow is still one or more sets of:
<region_start_timestamp> <region_end_timestamp> <class_id> <score>
But the data is arranged into the JSON structure, nested according to the structure of the workflow itself and how the audio is routed by the workflow. For more detailed information on the structure of this JSON message and the inner workings of workflows, please refer to the OLIVE Workflow API documentation. A brief, simplified summary to jump-start working with workflow output follows.
The main skeleton structure of the results output is shown below, along with an actual example. A result is provided for each input file, listing the job name(s), some metadata about the input audio and how it was processed, and then the returned results (if any) for each task, which generally corresponds to a plugin.
Workflow analysis results:
[
{
"job_name": <workflow job name>,
"data": [
{
"data_id": <data ID, typically audio file name>,
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": <sample rate>,
"duration_seconds": <audio duration>,
"number_channels": 1,
"label": <data label>,
"id": <input audio UUID>
}
],
"tasks": {
<task 1 name>: [
{
"task_trait": "REGION_SCORER",
"task_type": <task type>,
"message_type": "REGION_SCORER_RESULT",
"plugin": <plugin>,
"domain": <domain>,
"analysis": {
"region": [
{
"start_t": <region 1 start time (s)>,
"end_t": <region 1 end time (s)>,
"class_id": <region 1 class name>,
"score": <region 1 score>
},
...
{
"start_t": <region N start time (s)>,
"end_t": <region N end time (s)>,
"class_id": <region N class name>,
"score": <region N score>
}
]
}
}
],
<task 2 name>: [
{
"task_trait": "REGION_SCORER",
"task_type": <task type>,
"message_type": "REGION_SCORER_RESULT",
"plugin": <plugin>,
"domain": <domain>,
"analysis": {
"region": [
{
"start_t": <region 1 start time (s)>,
"end_t": <region 1 end time (s)>,
"class_id": <region 1 class name>,
"score": <region 1 score>
},
...
{
"start_t": <region N start time (s)>,
"end_t": <region N end time (s)>,
"class_id": <region N class name>,
"score": <region N score>
}
]
}
}
],
<task 3 name>: [
{
"task_trait": "REGION_SCORER",
"task_type": <task type>,
"message_type": "REGION_SCORER_RESULT",
"plugin": <plugin>,
"domain": <domain>,
"analysis": {
"region": [
{
"start_t": <region 1 start time (s)>,
"end_t": <region 1 end time (s)>,
"class_id": <region 1 class name>,
"score": <region 1 score>
},
...
{
"start_t": <region N start time (s)>,
"end_t": <region N end time (s)>,
"class_id": <region N class name>,
"score": <region N score>
}
]
}
}
]
}
}
]
Workflow analysis results:
[
{
"job_name": "SAD, SDD, and ASR English Workflow",
"data": [
{
"data_id": "z_eng_englishdemo.wav",
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": 8000,
"duration_seconds": 5.932625,
"number_channels": 1,
"label": "z_eng_englishdemo.wav",
"id": "0b04c7497521d53a5d6939533a55c461795f9d685b1bd19fd9031fc6f3997a8f"
}
],
"tasks": {
"SAD_REGIONS": [
{
"task_trait": "REGION_SCORER",
"task_type": "SAD",
"message_type": "REGION_SCORER_RESULT",
"plugin": "sad-dnn-v7.0.2",
"domain": "multi-v1",
"analysis": {
"region": [
{
"start_t": 0.0,
"end_t": 5.93,
"class_id": "speech",
"score": 0.0
}
]
}
}
],
"SDD": [
{
"task_trait": "REGION_SCORER",
"task_type": "SDD",
"message_type": "REGION_SCORER_RESULT",
"plugin": "sdd-diarizeEmbedSmolive-v1.0.2",
"domain": "telClosetalk-smart-v1",
"analysis": {
"region": [
{
"start_t": 0.1,
"end_t": 5.0,
"class_id": "unknownspk00",
"score": 1.4
}
]
}
}
],
"ASR": [
{
"task_trait": "REGION_SCORER",
"task_type": "ASR",
"message_type": "REGION_SCORER_RESULT",
"plugin": "asr-dynapy-v3.0.0",
"domain": "english-tdnnChain-tel-v1",
"analysis": {
"region": [
{
"start_t": 0.15,
"end_t": 0.51,
"class_id": "hello",
"score": 100.0
},
{
"start_t": 0.54,
"end_t": 0.69,
"class_id": "my",
"score": 100.0
},
{
"start_t": 0.69,
"end_t": 0.87,
"class_id": "name",
"score": 99.0
},
{
"start_t": 0.87,
"end_t": 1.05,
"class_id": "is",
"score": 99.0
},
{
"start_t": 1.05,
"end_t": 1.35,
"class_id": "evan",
"score": 88.0
},
{
"start_t": 1.35,
"end_t": 1.47,
"class_id": "this",
"score": 99.0
},
{
"start_t": 1.5,
"end_t": 1.98,
"class_id": "audio",
"score": 95.0
},
{
"start_t": 1.98,
"end_t": 2.16,
"class_id": "is",
"score": 74.0
},
{
"start_t": 2.16,
"end_t": 2.31,
"class_id": "for",
"score": 99.0
},
{
"start_t": 2.31,
"end_t": 2.4,
"class_id": "the",
"score": 99.0
},
{
"start_t": 2.4,
"end_t": 2.91,
"class_id": "purposes",
"score": 100.0
},
{
"start_t": 2.91,
"end_t": 3.06,
"class_id": "of",
"score": 99.0
},
{
"start_t": 3.12,
"end_t": 3.81,
"class_id": "demonstrating",
"score": 100.0
},
{
"start_t": 3.81,
"end_t": 3.96,
"class_id": "our",
"score": 78.0
},
{
"start_t": 4.05,
"end_t": 4.44,
"class_id": "language",
"score": 100.0
},
{
"start_t": 4.44,
"end_t": 4.53,
"class_id": "and",
"score": 93.0
},
{
"start_t": 4.53,
"end_t": 4.89,
"class_id": "speaker",
"score": 100.0
},
{
"start_t": 4.89,
"end_t": 5.01,
"class_id": "i.",
"score": 99.0
},
{
"start_t": 5.01,
"end_t": 5.22,
"class_id": "d.",
"score": 99.0
},
{
"start_t": 5.22,
"end_t": 5.85,
"class_id": "capabilities",
"score": 99.0
}
]
}
}
]
}
}
]
Each task output will typically be for a single plugin, and will contain the information provided by a Region Scorer, a Global Scorer, or a Text Transformer (in the case of Machine Translation), depending on how the workflow is using the plugin. The format of each result sub-part is:
<task name>: [
{
"task_trait": "GLOBAL_SCORER",
"task_type": <task type, generally LID, SID, etc.>,
"message_type": "GLOBAL_SCORER_RESULT",
"plugin": <plugin>,
"domain": <domain>,
"analysis": {
"score": [
{
"class_id": <class 1>,
"score": <class 1 score>
},
{
"class_id": <class 2>,
"score": <class 2 score>
},
...
{
"class_id": <class N>,
"score": <class N score>
}
]
}
}
]
<task name>: [
{
"task_trait": "REGION_SCORER",
"task_type": <task type, typically ASR, SDD, SAD, etc.>,
"message_type": "REGION_SCORER_RESULT",
"plugin": <plugin>,
"domain": <domain>,
"analysis": {
"region": [
{
"start_t": <region 1 start time (s)>,
"end_t": <region 1 end time (s)>,
"class_id": <region 1 detected class>,
"score": <region 1 score>
},
{
"start_t": <region 2 start time (s)>,
"end_t": <region 2 end time (s)>,
"class_id": <region 2 detected class>,
"score": <region 2 score>
},
...
{
"start_t": <region N start time (s)>,
"end_t": <region N end time (s)>,
"class_id": <region N detected class>,
"score": <region N score>
},
]
}
}
]
<task name>: [
{
"task_trait": "TEXT_TRANSFORMER",
"task_type": <task type, typically MT>,
"message_type": "TEXT_TRANSFORM_RESULT",
"plugin": <plugin name>,
"domain": <domain name>,
"analysis": {
"transformation": [
{
"class_id": "test_label",
"transformed_text": <the translated/transformed text returned from the plugin>
}
]
}
}
]
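Since every task result follows one of these three shapes, a single traversal can flatten a workflow result into simple records. A minimal Python sketch, assuming the printed results JSON has been saved to a file (results.json is a hypothetical name; the clients print this structure to the console):

# walk_workflow_results.py - flatten OLIVE workflow results
import json

with open("results.json") as f:  # hypothetical file holding the JSON above
    results = json.load(f)

for job in results:
    for task_name, task_results in job.get("tasks", {}).items():
        for result in task_results:
            analysis = result.get("analysis", {})
            trait = result.get("task_trait")
            if trait == "GLOBAL_SCORER":
                for s in analysis.get("score", []):
                    print(f"{task_name}: {s['class_id']} = {s['score']}")
            elif trait == "REGION_SCORER":
                for r in analysis.get("region", []):
                    print(f"{task_name}: [{r['start_t']}-{r['end_t']}] "
                          f"{r['class_id']} = {r['score']}")
            elif trait == "TEXT_TRANSFORMER":
                for t in analysis.get("transformation", []):
                    print(f"{task_name}: {t['transformed_text']}")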
Many workflows consist of a single job, and bundle all plugin tasks into this single job, as seen above. More complex workflows, or what OLIVE calls "Conditional Workflows", can pack multiple jobs into a single workflow. This happens when certain tasks in the workflow depend on other tasks in the workflow - for example, when OLIVE needs to choose the appropriate Speech Recognition (ASR) language to use depending on what language is detected by Language Identification (LID) or Language Detection (LDD). In this case, the LID/LDD is separated into one job, and the ASR into another that is triggered to run once the LID/LDD decision is known, and the results from each job are grouped accordingly in the results output. Below is a simplified output skeleton from a workflow that includes three jobs ("job 1", "job 2", "job 3"), followed by a real-life example output from the SmartTranscription conditional workflow, which also has three jobs:
- the first, "Smart Translation SAD and LID Pre-processing", performs Speech Activity Detection and Language Identification (LID)
- the second, "Dynamic ASR", uses the language decision from LID to choose the appropriate (if any) language and domain for Automatic Speech Recognition (ASR) and runs it
- the third, "Dynamic MT", takes the output transcript from ASR and the language decision from LID, chooses the appropriate (if any) language and domain for Text Machine Translation, and runs it
As you can see below, these jobs are listed separately in the JSON for each result:
Workflow analysis results:
[
{
"job name": <job 1 name>,
"data": [
{
"data_id": <data identifier, typically audio file name>,
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": <sample rate>,
"duration_seconds": <audio duration>,
"number_channels": 1,
"label": <audio label>,
"id": <input audio UUID>
}
],
"tasks": {
<task 1 name (job 1)>: [
{
<task 1 results>
}
],
...
<task N name (job 1)>: [
{
<task N results>
}
]
}
},
{
"job name": <job 2 name>,
"data": [
{
"data_id": <data identifier, typically audio file name>,
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": <sample rate>,
"duration_seconds": <audio duration>,
"number_channels": 1,
"label": <audio label>,
"id": <input audio UUID>
}
],
"tasks": {
<task 1 name (job 2)>: [
{
<task 1 results>
}
],
...
<task N name (job 2)>: [
{
<task N results>
}
]
}
},
... <repeat if more jobs>
]
Workflow analysis results:
[
{
"job_name": "Smart Translation SAD and LID Pre-processing",
"data": [
{
"data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": 8000,
"duration_seconds": 8.0,
"number_channels": 1,
"label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
"id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
}
],
"tasks": {
"SAD_REGIONS": [
{
"task_trait": "REGION_SCORER",
"task_type": "SAD",
"message_type": "REGION_SCORER_RESULT",
"plugin": "sad-dnn-v7.0.2",
"domain": "multi-v1",
"analysis": {
"region": [
{
"start_t": 0.0,
"end_t": 8.0,
"class_id": "speech",
"score": 0.0
}
]
}
}
],
"LID": [
{
"task_trait": "GLOBAL_SCORER",
"task_type": "LID",
"message_type": "GLOBAL_SCORER_RESULT",
"plugin": "lid-embedplda-v3.0.1",
"domain": "multi-v1",
"analysis": {
"score": [
{
"class_id": "Mandarin",
"score": 3.5306692
},
{
"class_id": "Korean",
"score": -1.9072952
},
{
"class_id": "Japanese",
"score": -3.7805116
},
{
"class_id": "Tagalog",
"score": -7.4819508
},
{
"class_id": "Vietnamese",
"score": -8.094855
},
{
"class_id": "Iraqi Arabic",
"score": -10.63325
},
{
"class_id": "Levantine Arabic",
"score": -10.694491
},
{
"class_id": "French",
"score": -11.542379
},
{
"class_id": "Pashto",
"score": -12.11981
},
{
"class_id": "English",
"score": -12.323014
},
{
"class_id": "Modern Standard Arabic",
"score": -12.626052
},
{
"class_id": "Spanish",
"score": -13.469315
},
{
"class_id": "Iranian Persian",
"score": -13.763366
},
{
"class_id": "Amharic",
"score": -17.129797
},
{
"class_id": "Portuguese",
"score": -17.31257
},
{
"class_id": "Russian",
"score": -18.770994
}
]
}
}
]
}
},
{
"job_name": "Dynamic ASR",
"data": [
{
"data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
"msg_type": "PREPROCESSED_AUDIO_RESULT",
"mode": "MONO",
"merged": false,
"sample_rate": 8000,
"duration_seconds": 8.0,
"number_channels": 1,
"label": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
"id": "68984a7356fa1ea05f8e985868eb93e066ce80a0f4bf848edf55d547cfcbab41"
}
],
"tasks": {
"ASR": [
{
"task_trait": "REGION_SCORER",
"task_type": "ASR",
"message_type": "REGION_SCORER_RESULT",
"plugin": "asr-dynapy-v3.0.0",
"domain": "mandarin-tdnnChain-tel-v1",
"analysis": {
"region": [
{
"start_t": 0.0,
"end_t": 0.18,
"class_id": "跟",
"score": 31.0
},
{
"start_t": 0.18,
"end_t": 0.36,
"class_id": "一个",
"score": 83.0
},
{
"start_t": 0.36,
"end_t": 0.66,
"class_id": "肯定",
"score": 100.0
},
{
"start_t": 0.66,
"end_t": 0.81,
"class_id": "是",
"score": 83.0
},
{
"start_t": 0.81,
"end_t": 1.23,
"class_id": "北京",
"score": 95.0
},
{
"start_t": 1.23,
"end_t": 1.47,
"class_id": "啊",
"score": 96.0
},
{
"start_t": 2.07,
"end_t": 2.49,
"class_id": "他俩",
"score": 96.0
},
{
"start_t": 2.7,
"end_t": 3.09,
"class_id": "上海",
"score": 99.0
},
{
"start_t": 3.09,
"end_t": 3.21,
"class_id": "的",
"score": 99.0
},
{
"start_t": 3.21,
"end_t": 3.57,
"class_id": "人口",
"score": 99.0
},
{
"start_t": 3.57,
"end_t": 3.87,
"class_id": "好像",
"score": 73.0
},
{
"start_t": 3.87,
"end_t": 3.99,
"class_id": "没",
"score": 54.0
},
{
"start_t": 3.99,
"end_t": 4.32,
"class_id": "北京",
"score": 74.0
},
{
"start_t": 4.32,
"end_t": 4.68,
"class_id": "多",
"score": 99.0
},
{
"start_t": 4.86,
"end_t": 5.19,
"class_id": "但是",
"score": 100.0
},
{
"start_t": 5.4,
"end_t": 5.91,
"class_id": "不知道",
"score": 100.0
},
{
"start_t": 6.06,
"end_t": 6.48,
"class_id": "@reject@",
"score": 62.0
},
{
"start_t": 6.69,
"end_t": 7.05,
"class_id": "其他",
"score": 93.0
},
{
"start_t": 7.05,
"end_t": 7.2,
"class_id": "的",
"score": 97.0
},
{
"start_t": 7.65,
"end_t": 7.89,
"class_id": "上",
"score": 32.0
},
{
"start_t": 7.89,
"end_t": 7.95,
"class_id": "啊",
"score": 67.0
}
]
}
}
]
}
},
{
"job_name": "Dynamic MT",
"data": [
{
"data_id": "fvccmn-2009-12-21_019_2_020_2_cnv_R-030s_2.wav",
"msg_type": "WORKFlOW_TEXT_RESULT",
"text": "跟 一个 肯定 是 北京 啊 他俩 上海 的 人口 好像 没 北京 多 但是 不知道 @reject@ 其他 的 上 啊"
}
],
"tasks": {
"MT": [
{
"task_trait": "TEXT_TRANSFORMER",
"task_type": "MT",
"message_type": "TEXT_TRANSFORM_RESULT",
"plugin": "tmt-neural-v1.0.0",
"domain": "cmn-eng-nmt-v1",
"analysis": {
"transformation": [
{
"class_id": "test_label",
"transformed_text": "with someone in beijing they don't seem to have a population in shanghai but we don't know what else to do"
}
]
}
}
]
}
}
]
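To act on a conditional workflow's output, a consumer typically needs one specific task's result across jobs. For example, a short Python sketch (using the same hypothetical results.json as above) that pulls the top-scoring language from the LID task:

# top_lid_language.py - find the highest-scoring LID class across all jobs
import json

with open("results.json") as f:  # hypothetical saved workflow output
    results = json.load(f)

lid_scores = [
    s
    for job in results
    for result in job.get("tasks", {}).get("LID", [])
    for s in result.get("analysis", {}).get("score", [])
]
if lid_scores:
    top = max(lid_scores, key=lambda s: s["score"])
    print(f"Detected language: {top['class_id']} ({top['score']:.2f})")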
Enrollment Requests
Enrollments are a subset of classes that the user can create and/or modify. These are used for classes that cannot be known ahead of time and therefore can't be pre-loaded into the system, such as specific speakers or keywords of interest. To determine if a plugin supports or requires enrollments, or to check what its default enrolled classes are (if any), refer to that plugin's details page, linked from the navigation or the Release Plugins page.
Enrollment list format
As with analysis, both the Java and Python tools were designed to share as much of a common interface as possible, and as such they share an input list format when providing exemplars for enrollment. The audio enrollment list input file is formatted as one or more newline-separated lines, each containing a path to an audio file and a class or model ID, which can be a speaker name, topic name, or query name for SID, TPD, and QBE respectively. A general example is given below; more details and plugin-specific enrollment information are provided in the appropriate section of each plugin's documentation. Format:
<audio_path> <model_id>
Example enrollment list file (SID):
/data/speaker1/audiofile1.wav speaker1
/data/speaker1/audiofile2.wav speaker1
/data/speaker7/audiofile1.wav speaker7
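Enrollment lists can also be scripted. A minimal Python sketch, assuming a layout like the example above where each speaker's audio lives in a directory named after them (the /data root is hypothetical):

# make_enrollment_list.py - build "<audio_path> <model_id>" lines, using each
# file's parent directory name as the class/model ID
from pathlib import Path

data_root = Path("/data")  # hypothetical root holding per-speaker folders
with open("enrollment_input.txt", "w") as f:
    for wav in sorted(data_root.glob("*/*.wav")):
        f.write(f"{wav.resolve()} {wav.parent.name}\n")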
Plugin Direct (Enrollment)
Performing an enrollment request is similar to an analysis request, and is again very similar between the two tools. The usage statement for each can be examined by invoking it with the -h or --help flag:
$ ./OliveEnroll -h
usage: OliveEnroll
--channel <arg> Process stereo files using channel NUMBER
--classes Print class names if also printing plugin/domain
names. Must use with --print option. Default is
to not print class IDs
--decoded Sennd audio file as a decoded PCM16 sample buffer
instead of a serialized buffer. The file must be
a WAV file
--domain <arg> Use Domain NAME
--enroll <arg> Enroll speaker NAME. If no name specified then,
the pem or list option must specify an input file
--export <arg> Export speaker NAME to an EnrollmentModel
(enrollment.tar.gz)
-h Print this help message
-i,--input <arg> NAME of the input file (input varies by plugin:
audio, image, or video)
--import <arg> Import speaker from EnrollmentModel FILE
--input_list <arg> Batch enroll using this input list FILE having
multiple filenames/class IDs or PEM formmated
file
--nobatch Disable batch enrollment when using pem or list
input files, so that files are processed serially
--options <arg> Enrollment options from FILE
--output <arg> Write any output to DIR, default is ./
-p,--port <arg> Scenicserver port number. Defauls is 5588
--path Send the path to the audio file instead of a
(serialized) buffer. The server must have access
to this path.
--plugin <arg> Use Plugin NAME
--print Print all plugins and domains that suport
enrollment and/or class import and export
--remove <arg> Remove audio enrollment for NAME
$ olivepyenroll -h
usage: olivepyenroll [-h] [-C CLIENT_ID] [-D] [-p PLUGIN] [-d DOMAIN] [-e ENROLL] [-u UNENROLL] [-s SERVER] [-P PORT] [-t TIMEOUT] [-i INPUT] [--input_list INPUT_LIST] [--path]
optional arguments:
-h, --help show this help message and exit
-C CLIENT_ID, --client-id CLIENT_ID
Experimental: the client_id to use
-D, --debug The domain to use
-p PLUGIN, --plugin PLUGIN
The plugin to use.
-d DOMAIN, --domain DOMAIN
The domain to use
-e ENROLL, --enroll ENROLL
Enroll with this name.
-u UNENROLL, --unenroll UNENROLL
Uneroll with this name.
-s SERVER, --server SERVER
The machine the server is running on. Defaults to localhost.
-P PORT, --port PORT The port to use.
-t TIMEOUT, --timeout TIMEOUT
The timeout to use
-i INPUT, --input INPUT
The data input to analyze. Either a pathname to an audio/image/video file or a string for text input. For text input, also specify
the --text flag
--input_list INPUT_LIST
A list of files to analyze. One file per line.
--path Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
To perform an enrollment request with these tools, you will need these essential pieces of information:
- Plugin name (--plugin)
- Domain name (--domain)
- One of:
  - A properly formatted enrollment list (--input_list), if providing multiple files at once (see below for formatting)
  - An input audio file (--input) AND the name of the class you wish to enroll (--enroll for OliveEnroll, -e or --enroll for olivepyenroll)
Generically, a single-file input looks like this:
$ ./OliveEnroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>
$ olivepyenroll --plugin <plugin> --domain <domain> --input <path to audio file> --enroll <class name>
A more specific example:
$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input ~/path/to/enroll-file1.wav --enroll "Logan"
Or if providing the enrollment list format shown above, the call is even simpler. Generically:
$ ./OliveEnroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>
$ olivepyenroll --plugin <plugin> --domain <domain> --input_list <path to enrollment text file>
Specific:
$ ./OliveEnroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt
$ olivepyenroll --plugin sid-dplda-v2.0.2 --domain multi-v1 --input_list ~/path/to/enrollment_input.txt
Where the enrollment_input.txt file might look like:
/some/data/somewhere/inputFile1.wav Logan
/some/other/data/somewhere/else/LoganPodcast.wav Logan
/yet/another/data/directory/charlie-speaks.wav Charlie
Workflow (Enrollment)
In the most basic case, enrollment using a workflow is just as simple as scoring with a workflow. This is because most workflows will only have a single enrollment-capable job; a job is a subset of the tasks a workflow is performing, typically linked to a single plugin. In the rare case that you're using a workflow with multiple supported enrollment jobs, you will need to specify which job to enroll to. See the Advanced Workflow Enrollment section below.
Workflow enrollment is performed using the olivepyworkflowenroll utility, whose help/usage statement is:
$ olivepyworkflowenroll -h
usage: olivepyworkflowenroll [-h] [--print_jobs] [--job JOB] [--enroll ENROLL] [--unenroll UNENROLL] [-i INPUT] [--input_list INPUT_LIST] [--path]
[-s SERVER] [-P PORT] [-t TIMEOUT]
workflow
Perform OLIVE enrollment using a Workflow Definition file
positional arguments:
workflow The workflow definition to use.
optional arguments:
-h, --help show this help message and exit
--print_jobs Print the supported workflow enrollment jobs.
--job JOB Enroll/Unenroll an Class ID for a job(s) in the specified workflow. If not specified enroll or unenroll for ALL enrollment/unenrollment jobs
--enroll ENROLL Enroll using this (class) name. Should be used with the job argument to specify a target job to enroll with (if there are more than one enrollment jobs)
--unenroll UNENROLL Enroll using this (class) name. Should be used with the job argument to specify a job to unenroll (if there are more than one unenrollment jobs)
-i INPUT, --input INPUT
The data input to enroll. Either a pathname to an audio/image/video file or a string for text input
--input_list INPUT_LIST
A list of files to enroll. One file per line plus the class id to enroll.
--path Send the path of the audio instead of a buffer. Server and client must share a filesystem to use this option
-s SERVER, --server SERVER
The machine the server is running on. Defaults to localhost.
-P PORT, --port PORT The port to use.
-t TIMEOUT, --timeout TIMEOUT
The timeout (in seconds) to wait for a response from the server
If there is only one supported enrollment job in the workflow, using this utility is very similar to the enrollment utilities above, but a workflow is provided instead of a plugin and domain combination. As with the other enrollment utilities, olivepyworkflowenroll supports both single-file enrollment and batch enrollment using an enrollment-formatted text file.
Generically, this looks like:
$ olivepyworkflowenroll --input_list <path to enrollment file> <workflow>
$ olivepyworkflowenroll --input <path to audio file> --enroll <class name> <workflow>
And a specific example of each:
$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
Important: Note that in OLIVE, when you enroll a class, you are enrolling to a plugin and domain, and enrollments are shared server-wide. Even when you enroll using a workflow and the olivepyworkflowenroll utility, enrollments are associated with the specific plugin/domain that the workflow is using under the hood. Any enrollments made to a workflow will be available to anyone else who may be using that server instance, and will also be available to anyone interacting with the individual plugin - whether directly or via a workflow.
As a more concrete example, the "SmartTranscription" workflow that is sometimes provided with OLIVE, which performs Speech Activity Detection, Speaker Detection, Language Detection, and Speech Recognition on supported languages, has a single plugin that supports enrollments: Speaker Detection. As a result, the workflow is set up with a single enrollment job, to allow workflow users to enroll new speakers to be detected by this plugin. When enrollment is performed with this workflow, the newly created speaker model is created by and for the Speaker Detection plugin itself, and goes into the global OLIVE enrollment space. If a file is analyzed by directly calling this Speaker Detection plugin, the new enrollment will be part of the pool of target speakers the plugin searches for. More information on this concept of "Workflow Enrollment Jobs" is provided in the next section.
Advanced Workflow Enrollment - Jobs
It's rare, but possible, for a workflow to bundle multiple enrollment-capable plugin capabilities into one. One example could be combining Speaker Detection with Query-by-Example Keyword Spotting in a single workflow, both of which rely on user enrollments to define their target classes. When this happens, if a user wishes to maintain the ability to enroll separate classes into each enrollable plugin, the workflow needs to expose these different enrollment tasks as separate jobs in the workflow enrollment capabilities.
If this is necessary, the workflow will come from SRI configured appropriately - the user need only be concerned with how to specify which job to enroll with.
To query which enrollment jobs are available in a workflow, use the olivepyworkflowenroll tool with the --print_jobs flag:
$ olivepyworkflowenroll --print_jobs <workflow>
Investigating the "SmartTranscription" workflow we briefly mentioned above:
$ olivepyworkflowenroll --print_jobs SmartTranscriptionFull.workflow.json
enrolling 0 files
Enrollment jobs '['SDD Enrollment']'
Un-Enrollment jobs '['SDD Unenrollment']'
We see that there is only a single enrollment job available: SDD Enrollment. If there were others, they would be listed in this output. Now that the desired job name is known, enrolling with the specified job is done by supplying that job name to the --job flag; in this case:
$ olivepyworkflowenroll --input ~/path/to/enroll-file1.wav --enroll "Logan" --job "SDD Enrollment" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
$ olivepyworkflowenroll --input_list ~/path/to/enrollment_input.txt --job "SDD Enrollment" ~/olive5.4.0/oliveAppData/workflows/SmartTranscription.workflow.json
Adaptation
The adaptation process is complicated, time-consuming, and plugin/domain specific. Use the SRI-provided Python client (olivepylearn) or Java client (OliveLearn) to perform adaptation.
To adapt using the olivepylearn utility:
$ olivepylearn --plugin sad-dnn --domain multi-v1 -a TEST_NEW_DOMAIN -i /olive/sadRegression/lists/adapt_s.lst
Where adapt_s.lst contains lines of the form <audio_path> <class> <region_start> <region_end> (for SAD, the classes are S for speech and NS for non-speech), for example:
/olive/sadRegression/audio/adapt/20131209T225239UTC_10777_A.wav S 20.469 21.719
/olive/sadRegression/audio/adapt/20131209T225239UTC_10777_A.wav NS 10.8000 10.8229
/olive//sadRegression/audio/adapt/20131209T234551UTC_10782_A.wav S 72.898 73.748
/olive//sadRegression/audio/adapt/20131209T234551UTC_10782_A.wav NS 42.754 43.010
/olive//sadRegression/audio/adapt/20131210T184243UTC_10791_A.wav S 79.437 80.427
/olive//sadRegression/audio/adapt/20131210T184243UTC_10791_A.wav NS 61.459 62.003
/olive//sadRegression/audio/adapt/20131212T030311UTC_10817_A.wav S 11.0438 111.638
/olive//sadRegression/audio/adapt/20131212T030311UTC_10817_A.wav NS 69.058 73.090
/olive//sadRegression/audio/adapt/20131212T052052UTC_10823_A.wav S 112.936 113.656
/olive//sadRegression/audio/adapt/20131212T052052UTC_10823_A.wav NS 83.046 83.114
/olive//sadRegression/audio/adapt/20131212T064501UTC_10831_A.wav S 16.940 20.050
/olive//sadRegression/audio/adapt/20131212T064501UTC_10831_A.wav NS 59.794 59.858
/olive//sadRegression/audio/adapt/20131212T084501UTC_10856_A.wav S 87.280 88.651
/olive//sadRegression/audio/adapt/20131212T084501UTC_10856_A.wav NS 82.229 82.461
/olive//sadRegression/audio/adapt/20131212T101501UTC_10870_A.wav S 111.346 111.936
/olive//sadRegression/audio/adapt/20131212T101501UTC_10870_A.wav NS 83.736 84.446
/olive//sadRegression/audio/adapt/20131212T104501UTC_10876_A.wav S 77.291 78.421
/olive//sadRegression/audio/adapt/20131212T104501UTC_10876_A.wav NS 0 4.951
/olive//sadRegression/audio/adapt/20131212T111501UTC_10878_A.wav S 30.349 32.429
/olive//sadRegression/audio/adapt/20131212T111501UTC_10878_A.wav NS 100.299 101.647
/olive//sadRegression/audio/adapt/20131212T114501UTC_10880_A.wav S 46.527 49.147
/olive//sadRegression/audio/adapt/20131212T114501UTC_10880_A.wav NS 44.747 46.148
/olive//sadRegression/audio/adapt/20131212T134501UTC_10884_A.wav S 24.551 25.471
/olive//sadRegression/audio/adapt/20131212T134501UTC_10884_A.wav NS 52.033 52.211
/olive//sadRegression/audio/adapt/20131212T141502UTC_10887_A.wav S 88.358 93.418
/olive//sadRegression/audio/adapt/20131212T141502UTC_10887_A.wav NS 46.564 46.788
/olive//sadRegression/audio/adapt/20131212T151501UTC_10895_A.wav S 10.507 11.077
/olive//sadRegression/audio/adapt/20131212T151501UTC_10895_A.wav NS 41.099 41.227
/olive//sadRegression/audio/adapt/20131212T154502UTC_10906_A.wav S 61.072 63.002
/olive//sadRegression/audio/adapt/20131212T154502UTC_10906_A.wav NS 19.108 19.460
/olive//sadRegression/audio/adapt/20131213T023248UTC_10910_A.wav S 97.182 97.789
/olive//sadRegression/audio/adapt/20131213T023248UTC_10910_A.wav NS 71.711 71.732
/olive//sadRegression/audio/adapt/20131213T041143UTC_10913_A.wav S 114.312 117.115
/olive//sadRegression/audio/adapt/20131213T041143UTC_10913_A.wav NS 31.065 31.154
/olive//sadRegression/audio/adapt/20131213T044200UTC_10917_A.wav S 90.346 91.608
/olive//sadRegression/audio/adapt/20131213T044200UTC_10917_A.wav NS 50.028 51.377
/olive//sadRegression/audio/adapt/20131213T050721UTC_10921_A.wav S 75.986 76.596
/olive//sadRegression/audio/adapt/20131213T050721UTC_10921_A.wav NS 12.485 12.709
/olive//sadRegression/audio/adapt/20131213T071501UTC_11020_A.wav S 72.719 73.046
/olive//sadRegression/audio/adapt/20131213T071501UTC_11020_A.wav NS 51.923 53.379
/olive//sadRegression/audio/adapt/20131213T104502UTC_18520_A.wav NS 11.1192 112.761
/olive//sadRegression/audio/adapt/20131213T121501UTC_18530_A.wav NS 81.277 82.766
/olive//sadRegression/audio/adapt/20131213T124501UTC_18533_A.wav NS 83.702 84.501
/olive//sadRegression/audio/adapt/20131213T134502UTC_18567_A.wav NS 69.379 72.258
/olive//sadRegression/audio/adapt/20131217T015001UTC_18707_A.wav NS 5.099 10.507
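Given how long these adaptation lists can get, a quick sanity check before submitting can save a failed run. A minimal Python sketch that validates the four-field format shown above (path exists, start precedes end; class labels are plugin-specific, so they are not checked here):

# check_adapt_list.py - sanity-check an adaptation list before submission
import sys
from pathlib import Path

ok = True
with open("adapt_s.lst") as f:  # the list file from the example above
    for lineno, line in enumerate(f, 1):
        parts = line.split()
        if not parts:
            continue  # allow blank lines
        if len(parts) != 4:
            print(f"line {lineno}: expected 4 fields, got {len(parts)}")
            ok = False
            continue
        path, class_id, start, end = parts
        if not Path(path).is_file():
            print(f"line {lineno}: missing audio file {path}")
            ok = False
        if float(start) >= float(end):
            print(f"line {lineno}: start {start} is not before end {end}")
            ok = False
sys.exit(0 if ok else 1)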