OLIVE Plugin Traits

Traits Overview

The functionality of each OLIVE plugin is defined by the API Traits that it implements. Each Trait defines a message or set of messages that the plugin must implement to perform an associated task. The available Traits are listed below, along with their associated implemented API messages, in the format:

  • Trait
    • Implemented Message 1
    • ...
    • Implemented Message N

Traits and their Corresponding Messages

The Trait and Message List follows. Each Trait name links to the relevant section on this page. Each API message entry links to the relevant portion of the Protocol Buffers Message Definition Reference page.

In addition to the messages above, the following messages exist for interacting with the server itself and are independent of individual plugins or plugin Traits:

Full message definition details for all of the messages mentioned here can be found in the OLIVE API Message Protocol Documentation.

Request/Result Message Pairs

In general, each Request message is paired with a Result message that defines the structure and contents of the server's reply to that Request. The table below lists the Result message paired with each Request message mentioned above; full message definition details can be found in the OLIVE API Message Protocol Documentation, to which each entry below is linked.

Task | Request Message | Result (Response) Message
Global Score | GlobalScorerRequest | GlobalScorerResult
Region Score | RegionScorerRequest | RegionScorerResult
Frame Score | FrameScorerRequest | FrameScorerResult
Create or modify a class enrollment | ClassModificationRequest | ClassModificationResult
Remove an enrolled class | ClassRemovalRequest | ClassRemovalResult
Export a class for later use or for model sharing | ClassExportRequest | ClassExportResult
Import a previously exported class | ClassImportRequest | ClassImportResult
Submit audio for modification (enhancement) | AudioModificationRequest | AudioModificationResult
Request a vectorized representation of audio | PluginAudioVectorRequest | PluginAudioVectorResult
Prepare audio for training a new domain from scratch | PreprocessAudioTrainRequest | PreprocessAudioTrainResult
Prepare audio for adapting an existing domain | PreprocessAudioAdaptRequest | PreprocessAudioAdaptResult
Query a plugin to see if it is ready to apply an "Update" | GetUpdateStatusRequest | GetUpdateStatusResult
Instruct a plugin to perform an "Update" | ApplyUpdateRequest | ApplyUpdateResult
Submit two audio files for forensic comparison | GlobalComparerRequest | GlobalComparerResult
Submit text (string) for translation | TextTransformationRequest | TextTransformationResult
Submit two or more audio files for alignment shift scores | AudioAlignmentScoreRequest | AudioAlignmentScoreResult

Traits Deep Dive

This section provides more detail on what the real-world usage and definition of each of the above Traits actually entail. If anything is unclear or could use more expansion, please contact us and we will provide updates and/or clarification.

GlobalScorer

A GlobalScorer plugin reports a single score for each relevant plugin class, representing the likelihood that the entire audio file or clip contains that class. For example, when audio is sent to a Speaker Identification (SID) plugin with the GlobalScorer Trait and there are 3 enrolled speakers at that time, the plugin will return one score per enrolled speaker, each representing the likelihood that the audio contains speech from that speaker.

The output of such a request will contain information that looks something like this:

/data/sid/audio/file1.wav speaker1 -0.5348
/data/sid/audio/file1.wav speaker2 3.2122
/data/sid/audio/file1.wav speaker3 -5.5340

Keep in mind that speakers and speaker identification are just examples in this case, and that this generalizes to any global scoring plugin and whatever classes it is meant to distinguish between. This same example could be represented more generically as:

/data/sid/audio/file1.wav class1 -0.5348
/data/sid/audio/file1.wav class2 3.2122
/data/sid/audio/file1.wav class3 -5.5340
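Client-side handling of this output is straightforward. The sketch below is plain Python; the helper name parse_global_scores is an illustrative choice, not part of the OLIVE API.

```python
# Parse global-scorer output lines of the form "<audio> <class> <score>".
# parse_global_scores is an illustrative helper, not an OLIVE API call.
def parse_global_scores(lines):
    scores = {}
    for line in lines:
        path, cls, score = line.split()
        scores.setdefault(path, {})[cls] = float(score)
    return scores

output = [
    "/data/sid/audio/file1.wav class1 -0.5348",
    "/data/sid/audio/file1.wav class2 3.2122",
    "/data/sid/audio/file1.wav class3 -5.5340",
]
scores = parse_global_scores(output)["/data/sid/audio/file1.wav"]
best = max(scores.items(), key=lambda kv: kv[1])  # highest-scoring class
print(best)  # ('class2', 3.2122)
```

Picking the maximum score per file, as here, is a common triage step when a single best-guess class is all that is needed.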

Due to the nature of global scoring plugins, they are best used when the attribute being estimated is known, or very likely, to be static: for example, using global scoring SID on one side of a 4-wire telephone conversation, where the speaker is likely to remain the same throughout, or using global scoring LID on a TV broadcast that is expected to contain only a single language.

These plugins can miss the nuances of or be confused by things like code-switching within recordings, or unexpected speaker changes.

The benefits of global scoring plugins are that they are often very fast, since they treat the entirety of the submitted audio as a single unit and do not try to subdivide or chunk it in any way. For a first-pass or quick-triage approach, or when the data is known or suspected to be homogeneous in the ways discussed above, these plugins are very effective. When finer-grained results are necessary, though, a RegionScorer or FrameScorer may be more appropriate.

RegionScorer

For each audio file or recording submitted to a RegionScorer plugin, results are returned consisting of zero or more regions, each with an associated score and plugin class. Typically, regions are returned when an instance of the respective class is 'detected'. A 'region' is a pair of timestamps marking the start and end times of the detected class presence, together with the name of the respective class and an associated score. An example of this might be a keyword spotting plugin returning the keyword class name, as well as the location and likelihood of that keyword's presence. This might look something like:

/data/test/testFile1.wav 0.630 1.170 Airplane 4.3725
/data/test/testFile1.wav 1.520 2.010 Watermelon -1.1978

Another example of output you may see from a region scoring plugin follows, showing what a region scoring speaker detection plugin might output. In this example, an enrolled speaker, speaker2, was detected in testFile1.wav from 0.630 s to 1.170 s with a likelihood score of 4.3725. Likewise, the enrolled speaker speaker1 was detected from 1.520 s to 2.010 s in the same file, this time with a likelihood score of -1.1978.

/data/test/testFile1.wav 0.630 1.170 speaker2 4.3725
/data/test/testFile1.wav 1.520 2.010 speaker1 -1.1978

An even simpler output of this type may just label the regions within a file that the plugin determines contain speech. Again, these are just arbitrary examples using a specific plugin type to more easily describe the scoring type; a more generic output example could be:

/data/test/testFile1.wav 0.630 1.170 class1 4.3725
/data/test/testFile1.wav 1.520 2.010 class2 -1.1978
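A region-scoring result of this shape can be consumed as (file, start, end, class, score) tuples. The sketch below is an illustrative Python parser, not part of the OLIVE API.

```python
# Parse region-scorer lines of the form "<audio> <start> <end> <class> <score>".
def parse_region_scores(lines):
    regions = []
    for line in lines:
        path, start, end, cls, score = line.split()
        regions.append((path, float(start), float(end), cls, float(score)))
    return regions

rows = [
    "/data/test/testFile1.wav 0.630 1.170 class1 4.3725",
    "/data/test/testFile1.wav 1.520 2.010 class2 -1.1978",
]
for path, start, end, cls, score in parse_region_scores(rows):
    print(f"{cls}: {end - start:.2f} s (score {score})")
# class1: 0.54 s (score 4.3725)
# class2: 0.49 s (score -1.1978)
```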

RegionScorer plugins allow a finer resolution with respect to results granularity, and allow plugins to be more flexible and deal with transitions between languages or speakers or other classes of interest within a given audio file or recording. Sometimes this comes at a cost of increased processing complexity and/or slower runtime.

FrameScorer

A plugin with the FrameScorer Trait that is queried with a FrameScorerRequest will provide score results for each frame of the submitted audio. Unless otherwise noted, an audio frame is 10 milliseconds. The most common OLIVE plugins with the FrameScorer Trait are speech activity detection (SAD) plugins, where the score for each frame represents the likelihood of speech being present in that 10 ms audio frame.

The output in this case is simply a sequential list of numbers, corresponding to the output score for each frame, in order:

1.9047
1.8088
1.2382
-0.8862
-2.5509

In this example, these frame scores can then be processed to turn them into region scores, labeling the locations where speech has been detected as present within the file. Returning raw frame scores as a result allows more downstream flexibility, allowing thresholds to be adjusted and regions re-labeled if desired, for example when tuning for more difficult or unexpected audio conditions.
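As a sketch of that post-processing, frame scores can be collapsed into regions by thresholding. This assumes the 10 ms frame size noted above and an illustrative threshold of 0.0; both are tunable assumptions, not OLIVE defaults.

```python
FRAME_SECONDS = 0.010  # assumes the 10 ms frame size noted above

def frames_to_regions(frame_scores, threshold=0.0):
    """Collapse per-frame scores into (start_s, end_s) regions above threshold."""
    regions, start = [], None
    for i, score in enumerate(frame_scores):
        if score > threshold and start is None:
            start = i                      # a region opens at this frame
        elif score <= threshold and start is not None:
            regions.append((start * FRAME_SECONDS, i * FRAME_SECONDS))
            start = None                   # the region closes before this frame
    if start is not None:                  # a region running to the end of audio
        regions.append((start * FRAME_SECONDS, len(frame_scores) * FRAME_SECONDS))
    return regions

# The five example frame scores above yield one region, roughly 0.0 s to 0.03 s.
print(frames_to_regions([1.9047, 1.8088, 1.2382, -0.8862, -2.5509]))
```

Raising or lowering the threshold is exactly the downstream tuning knob described above.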

ClassModifier

Any plugin capable of adding or removing classes to or from its set of enrollments or target classes carries the ClassModifier Trait. This Trait means the set of classes the plugin is interested in is mutable. Typically a class is added by providing the server with labeled data belonging to the new class of interest; the server then enrolls a new model representing what it has learned about distinguishing this class from others. Existing class models can also be augmented by providing the system with additional data carrying that class label.

In addition to adding new classes and improving/augmenting existing ones, it is also possible to remove enrolled classes from the domains of plugins carrying this Trait, using the ClassRemovalRequest message.
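Conceptually, the mutable class set behaves like the sketch below. This is plain Python with illustrative names; in OLIVE these operations correspond to the ClassModificationRequest and ClassRemovalRequest messages, and enrollment actually builds a model rather than just storing paths.

```python
# Illustrative model of a mutable enrollment set; names are hypothetical.
enrollments = {}

def enroll(class_id, audio_paths):
    # Adding data under an existing class id augments that class's model.
    enrollments.setdefault(class_id, []).extend(audio_paths)

def remove(class_id):
    enrollments.pop(class_id, None)

enroll("speaker1", ["a.wav"])
enroll("speaker1", ["b.wav"])   # augments the existing speaker1 model
enroll("speaker2", ["c.wav"])
remove("speaker2")              # analogous to a ClassRemovalRequest
print(sorted(enrollments))      # ['speaker1']
```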

ClassExporter

A plugin that implements the ClassExporter Trait is capable of exporting enrolled class models in a form that can be either imported back into the same system, as a way to preserve the model, or imported into a different system, so long as that system has the same plugin and domain that was used to initially create the class model. This allows enrollments to be shared between systems.

In general, exported models are specific to the plugin and domain that created them, so care must be taken to ensure models are not mixed into other plugins. It is up to the client program or end user to keep tabs on where the exported models came from and what class they represent, and to manage these models once they've been exported.
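One way for a client to do that bookkeeping is to store the originating plugin and domain alongside each exported model and check them before import. The sketch below uses hypothetical field and plugin/domain names, not the actual export format.

```python
# Hypothetical bookkeeping for exported class models: record where each
# model came from, and refuse to import it into a mismatched plugin/domain.
def can_import(model_meta, target_plugin, target_domain):
    # Exported models are only valid in the plugin/domain that created them.
    return (model_meta["plugin"] == target_plugin
            and model_meta["domain"] == target_domain)

meta = {"class": "speaker1", "plugin": "sid-embed-v5", "domain": "multi-v1"}
print(can_import(meta, "sid-embed-v5", "multi-v1"))       # True: same origin
print(can_import(meta, "lid-embedplda-v1", "multi-v1"))   # False: wrong plugin
```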

AudioConverter

An AudioConverter plugin has audio as both its input and its output. This Trait allows the system to take an audio file or buffer, potentially perform some modification(s) on it, and return audio to the requestor. This can be the same audio untouched, a modified version of that audio, or completely different audio, depending on the purpose of the plugin. The only current plugin to implement the AudioConverter Trait is a speech enhancement plugin for enhancing the speech intelligibility of the submitted audio.

AudioVectorizer

This is a prototype feature that allows the system to preprocess an audio file or model to perform the compute-heavy steps, such as feature extraction, so that at a later time a user can either enroll a model or score a file very quickly, since the most time-consuming steps have already been performed. Like the ClassExporter Trait and exported classes/models, the vectorized audio representations are plugin/domain specific and cannot be used with plugins other than the one that created them. This is helpful if enrollments will be frequently rolled in or out of the system, or if the same audio files will be frequently re-tested, to avoid wasted repeat compute cycles.
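The compute-saving pattern is essentially a cache keyed by plugin, domain, and audio file. The sketch below is hypothetical: extract_vector, the plugin/domain names, and the vector contents are all illustrative stand-ins, not OLIVE API calls.

```python
# Hypothetical stand-in for the compute-heavy vectorization step.
compute_calls = 0
def extract_vector(path):
    global compute_calls
    compute_calls += 1
    return [0.1, 0.2]  # placeholder for a plugin/domain-specific audio vector

cache = {}
def vector_for(plugin, domain, path):
    # Vectors are plugin/domain specific, so the cache key includes both.
    key = (plugin, domain, path)
    if key not in cache:
        cache[key] = extract_vector(path)
    return cache[key]

vector_for("sid-embed-v5", "multi-v1", "file1.wav")
vector_for("sid-embed-v5", "multi-v1", "file1.wav")  # served from cache
print(compute_calls)  # 1
```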

LearningTrait

There are three total LearningTraits:

  1. SupervisedAdapter
  2. SupervisedTrainer
  3. UnsupervisedAdapter

Only SupervisedAdapter is currently supported by plugins in the OLIVE ecosystem; the others are deprecated or prototype features and should be ignored.

SupervisedAdapter

Plugins with the SupervisedAdapter Trait enable Supervised Adaptation of a domain: human-assisted improvement of a plugin's domain, generally with feedback to the system in the form of annotations of target phenomena or, in some cases, error corrections. For more information on Adaptation, refer to this section.

UpdateTrait

Certain systems, such as the recent speaker and language recognition plugins sid-embed-v5 and lid-embedplda-v1, have the capability to adapt themselves based purely on raw data provided by the user in the normal use of the system (enrollment and test data). These systems collect statistics over the data fed through the system that can be used to update system parameters in unsupervised (autonomous) adaptation, thereby improving the performance of the plugin in the conditions in which it has been deployed. Since the use of this data in adaptation changes the behavior of the plugin, the system does not automatically update itself, but rather requires the end user to "trigger" the update and use the data the system has collected to adapt. Implementing and invoking the associated Update Request Messages to start the Update process will use the accrued data from the test and enroll conditions to update system parameters and apply the update. One can always revert to the plugin's original state by clearing the collected data and statistics from the learn directory in the server's storage.
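The trigger flow is a two-step exchange: check readiness, then apply. The sketch below models it with a fake client whose class, method, and field names are invented for illustration; only the underlying message pair (GetUpdateStatusRequest/Result and ApplyUpdateRequest/Result) comes from the API.

```python
# Hypothetical client; method and field names are illustrative stand-ins.
class FakeUpdateClient:
    def __init__(self):
        self.updated = False
    def get_update_status(self, plugin, domain):
        # Models a GetUpdateStatusRequest -> GetUpdateStatusResult exchange.
        return {"update_ready": True}
    def apply_update(self, plugin, domain):
        # Models an ApplyUpdateRequest -> ApplyUpdateResult exchange.
        self.updated = True
        return {"successful": True}

client = FakeUpdateClient()
status = client.get_update_status("sid-embed-v5", "telephony")
if status["update_ready"]:
    # Only the end user's explicit trigger causes the plugin to adapt.
    result = client.apply_update("sid-embed-v5", "telephony")
print(result["successful"])  # True
```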

GlobalComparer

A plugin that supports the GlobalComparer Trait has the capability of accepting two waveforms as input, performing some sort of analysis or comparison of the two, and returning a PDF report summarizing the analysis.

TextTransformer

A plugin with the TextTransformer Trait is used to translate a text input when queried with a TextTransformationRequest, providing translation results for the submitted string (it does not take an audio input, unlike other scoring traits).

The output in this case is simply a string, which is the translation result.

AudioAlignmentScorer

A plugin with the AudioAlignmentScorer Trait can be used to provide alignment shift scores for two or more audio inputs using an AudioAlignmentScoreRequest.

The output is a set of shift scores, one for each pair of audio inputs in the AudioAlignmentScoreRequest.
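The pairings can be enumerated with Python's standard itertools. In this sketch the file names are invented placeholders; only the "every combination of two inputs" structure comes from the Trait description.

```python
from itertools import combinations

# Hypothetical input file names for illustration.
files = ["fileA.wav", "fileB.wav", "fileC.wav"]

# Each combination of two inputs receives a shift score in the result.
pairs = list(combinations(files, 2))
for a, b in pairs:
    print(f"{a} <-> {b}")
# fileA.wav <-> fileB.wav
# fileA.wav <-> fileC.wav
# fileB.wav <-> fileC.wav
```

For n inputs this yields n*(n-1)/2 pairs, so three files produce three shift scores.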