OLIVE Enterprise API Primer

Introduction

The OLIVE Enterprise API enables third-party tools and existing workflows to interface with the OLIVE backend system. This page provides an introduction to the concepts and information needed to begin implementing OLIVE’s speech processing tools and capabilities within a client application by integrating through the OLIVE Enterprise API (previously known as the SCENIC Enterprise API).

In general, the three main components of an OLIVE-based audio processing system are:

  1. The OLIVE Server
  2. The OLIVE Plugins
  3. An OLIVE-enabled Client

Links are provided above and throughout this page for more information about the OLIVE Server and the Plugins. This page focuses on how to implement or create an OLIVE-enabled Client.

SRI offers a Java-based reference implementation of the OLIVE API so that integrators working in Java can quickly start folding OLIVE functionality into a new or existing project. The same code also serves as example code for integrators using other programming languages, who can use it as a general reference for accomplishing specific tasks. This guide covers both how to use the existing Java-based reference implementation and the information you will need to build your own implementation, if desired. If you're interested in having a reference implementation built and provided in a language other than Java, please refer to this section below.

The next section starts by introducing general operating concepts of the OLIVE system, with links to resources that expand on this information if you'd like to dig a bit deeper. Following that is a section that describes the steps a client uses to connect to the OLIVE server, submit requests for processing, and unpack results returned from the server.

If you are interested in full Java examples of clients requesting various tasks to be performed using this API, continue on to the Java API Client Integration Guide.

Although the current reference OLIVE API is Java-based, this message-based API can be implemented in other languages. If you need a non-Java OLIVE API, this page contains useful information for building an entirely new OLIVE API implementation in a language other than Java.


Useful Concepts to Know

Before you start using the OLIVE API to put together client programs, it is useful to have some background information about how the OLIVE system operates. The OLIVE system is client/server based, using Google Protocol Buffers (Protobuf) messages transported over ZMQ sockets to communicate between the client and server. Therefore, any API implementation that gives a client access to the backend OLIVE system must follow the same client/server model of exchanging Protobuf-based messages over ZMQ sockets.

The OLIVE server accepts client connections at pre-selected but configurable ports. Once a connection is established between a client and the server, task requests and results (implemented as Protobuf messages) can be sent and received over this connection.

Via the server, client requests are directed to and performed by OLIVE speech processing agents (plugins). Also via the server, results of completed requests are returned to the client. Task requests and results are sent in the form of OLIVE messages, which contain information pertinent to the type of request and the type of plugin performing the request. The contents of all possible messages, both requests and responses, are described in the API Message Reference page. More details about which of these will be important to you, and how to actually use them, follow below.

If you are new to the OLIVE Enterprise API, you will quickly discover that it does not offer explicit API calls for making “SAD”, “SID”, or “LID” requests. The reason is that the API is based on the OLIVE Plugin Framework used by the OLIVE server, which abstracts speech systems such as SAD, SID, and LID into “traits”. It is these “traits” that are expressed in this API, so score and class modification requests are based on a plugin's traits and not on the type of speech system. The OLIVE server identifies plugins by the task types (SAD, SID, LID, etc.) they claim they can perform, and the details of how these plugins go about those tasks are defined by the Traits that they implement. For example, a plugin with a Speaker Identification task often implements the GlobalScorer Trait. From the table below or from the Traits Info Page, you can see the messages that accompany this Trait; they define the functionality associated with being a GlobalScorer and show the types of output we can expect to retrieve from such a plugin.

The following subsections describe each of these basic concepts in more detail.

 

OLIVE Plugin Tasks and Traits

The OLIVE server identifies plugins by the type of tasks they can perform. For example, a plugin designed to recognize and identify the voice of a pre-enrolled speaker carries the Task of Speaker Identification, or SID. Likewise, a plugin that is designed to label speech regions within an audio clip has the Task of Speech Activity Detection, or SAD.

In order to perform these Tasks, plugins require some abilities - for example, a SID task plugin requires the ability to perform enrollment of speakers and scoring of audio against those enrollments; a SAD type plugin must be able to perform a scoring task to calculate the likelihood of speech throughout an audio clip, and sometimes adaptation tasks; etc. These abilities are defined by what OLIVE calls Traits. More detailed information on the available OLIVE Traits can be found on the respective info page, but a quick primer is also included below.

Continuing with the previous example, a SID type plugin needs to be able to perform the scoring task. Historically, most SID plugins assume that the audio passed to them is homogeneous and contains a single speaker. It is therefore possible to score all of the audio and return a single score for each enrolled speaker model, representing how likely it is that the candidate speech comes from that speaker. This type of score is called a global score, and the Trait that the plugin implements to gain this functionality is called GlobalScorer. "Implementing" this Trait means that the plugin contains routine definitions that allow it to receive and appropriately respond to the OLIVE API Messages associated with that Trait. In order to have speaker models to score against, a SID plugin also needs the ability to enroll speaker models as classes. The ability to modify enrolled classes comes from the ClassModifier Trait and its associated API messages.

For a complete list of the available Traits and their associated request messages, including the appropriate reply message to each one, refer to the Plugin Traits info page and/or the API Message Reference page. For a high level overview of what this means in terms of available plugins, continue below.

The following table shows the likely traits for the scoring functionality of selected OLIVE plugin task types:

| Plugin Type | Scoring Trait |
| --- | --- |
| Speech Activity Detection (SAD) | FrameScorer and/or RegionScorer |
| Language Identification (LID) | GlobalScorer |
| Speaker Identification (SID) | GlobalScorer |
| Speaker Diarization and Detection (SDD) | RegionScorer |
| Keyword Spotting (KWS) | RegionScorer |
| Query By Example KWS (QBE) | RegionScorer |
| Topic Detection (TPD) | RegionScorer |
| Speech Enhancement (ENH) | AudioConverter |
| Voice Type Discrimination (VTD) | FrameScorer and/or RegionScorer |

For plugins that allow or require enrollment functionality, the associated Trait is ClassModifier. The following plugin types may currently have this Trait for the enrollment task:

| Plugin Type | Enrollment Trait |
| --- | --- |
| Language Identification (LID)* | ClassModifier |
| Speaker Identification (SID) | ClassModifier |
| Speaker Diarization and Detection (SDD) | ClassModifier |
| Query By Example KWS (QBE) | ClassModifier |
| Topic Detection (TPD) | ClassModifier |

*Note that not all LID plugins allow or support language/class enrollment. When in doubt, refer to individual plugin documentation, or check the plugin's implemented Traits. Please remember, these tables may not be true for all plugins and some Plugins may support additional Traits. This mapping is only intended to help introduce the OLIVE Enterprise API and its underlying Plugin Framework to new developers.

Some SAD plugins also allow the end user to perform domain adaptation to improve plugin performance in certain audio conditions. The Trait listed below is associated with this task.

| Plugin Type | Adaptation Trait |
| --- | --- |
| Speech Activity Detection (SAD) | SupervisedAdapter |
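
The task-to-Trait tables above can be collected into a small client-side lookup. The Python sketch below is illustrative only: the dictionary names and the `likely_traits` helper are assumptions made for this example and are not part of the OLIVE API. As noted above, individual plugins may implement different or additional Traits, so treat the result as a starting guess and confirm it against the Traits a plugin actually implements.

```python
# Likely scoring Traits per plugin task type, transcribed from the table above.
SCORING_TRAITS = {
    "SAD": ["FrameScorer", "RegionScorer"],
    "LID": ["GlobalScorer"],
    "SID": ["GlobalScorer"],
    "SDD": ["RegionScorer"],
    "KWS": ["RegionScorer"],
    "QBE": ["RegionScorer"],
    "TPD": ["RegionScorer"],
    "ENH": ["AudioConverter"],
    "VTD": ["FrameScorer", "RegionScorer"],
}

# Task types that may carry ClassModifier (not every LID plugin does).
ENROLLMENT_TASKS = {"LID", "SID", "SDD", "QBE", "TPD"}

# Task types that may carry SupervisedAdapter (only some SAD plugins do).
ADAPTATION_TASKS = {"SAD"}


def likely_traits(task):
    """Return the Traits a plugin of the given task type is likely to implement."""
    task = task.upper()
    if task not in SCORING_TRAITS:
        raise KeyError(f"unknown task type: {task}")
    traits = list(SCORING_TRAITS[task])
    if task in ENROLLMENT_TASKS:
        traits.append("ClassModifier")
    if task in ADAPTATION_TASKS:
        traits.append("SupervisedAdapter")
    return traits


print(likely_traits("sid"))  # ['GlobalScorer', 'ClassModifier']
```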

 

OLIVE Message Requests / Results By Plugin Traits

Now that you know a bit about the available Plugin Tasks and the Traits they're likely to implement, we will discuss the Messages that actually allow for requests to be made to the plugins, and for information to be passed back from the plugins to the client.

A client connected to the OLIVE server can submit messages to the server to request information from plugins. The table below shows which requests are generally available for selected plugin types. Note that a plugin may stray from this list and implement a different Trait than what is shown below.

Scoring Traits

| Plugin Trait | Task | Request Messages | Result Messages |
| --- | --- | --- | --- |
| GlobalScorer | LID, SID | GlobalScorerRequest, GlobalScorerStereoRequest | GlobalScorerResult, GlobalScorerStereoResult |
| RegionScorer | SDD, SAD*, KWS, QBE, TPD | RegionScorerRequest, RegionScorerStereoRequest | RegionScorerResult, RegionScorerStereoResult |
| FrameScorer | SAD* | FrameScorerRequest, FrameScorerStereoRequest | FrameScorerResult, FrameScorerStereoResult |

*Note that not all SAD plugins support FrameScorer and/or RegionScorer. Please refer to specific plugin documentation or consult with SRI if unsure.

As you can see from this table, the same few API messages are reused for most scoring requests, meaning the actual code implementation for these tasks can be kept simple.
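
One way to take advantage of this reuse is to select messages by Trait instead of by task type. The Python sketch below only maps Trait names to the message names listed in the table above; the `SCORING_MESSAGES` dictionary and the `messages_for` helper are hypothetical names for this example. The actual message classes come from the OLIVE Protobuf definitions described in the API Message Reference and are not constructed here.

```python
# Request/result message names per scoring Trait, transcribed from the
# Scoring Traits table. Keys "mono" and "stereo" are this example's own labels.
SCORING_MESSAGES = {
    "GlobalScorer": {
        "mono": ("GlobalScorerRequest", "GlobalScorerResult"),
        "stereo": ("GlobalScorerStereoRequest", "GlobalScorerStereoResult"),
    },
    "RegionScorer": {
        "mono": ("RegionScorerRequest", "RegionScorerResult"),
        "stereo": ("RegionScorerStereoRequest", "RegionScorerStereoResult"),
    },
    "FrameScorer": {
        "mono": ("FrameScorerRequest", "FrameScorerResult"),
        "stereo": ("FrameScorerStereoRequest", "FrameScorerStereoResult"),
    },
}


def messages_for(trait, stereo=False):
    """Return the (request, result) message names for a scoring Trait."""
    variant = "stereo" if stereo else "mono"
    return SCORING_MESSAGES[trait][variant]


# A SID plugin and a LID plugin that both implement GlobalScorer
# resolve to the same pair of message names.
request_name, result_name = messages_for("GlobalScorer")
print(request_name, result_name)  # GlobalScorerRequest GlobalScorerResult
```

Because the lookup is keyed on the Trait, a client written this way needs one code path per scoring Trait and not one per plugin type.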

Other Traits

| Plugin Trait | Task | Functionality | Request Messages | Result Messages |
| --- | --- | --- | --- | --- |
| AudioConverter | ENH | Audio Modification, Speech Enhancement | AudioModificationRequest | AudioModificationResult |
| GlobalComparer | FOR | Forensic Audio Comparison | GlobalComparerRequest | GlobalComparerResult |
| LearningTrait / SupervisedAdapter | SAD | Audio Condition Domain Adaptation | SupervisedAdaptationRequest, PreprocessAudioAdaptRequest | SupervisedAdaptationResult, PreprocessAudioAdaptResult |

Other Useful OLIVE Message Types

Besides the messages related to plugin tasking and interaction mentioned in the two sections above, there are several additional messages which are useful to know for server management and other non-plugin-specific tasks.

A comprehensive list of OLIVE API Messages is available in the OLIVE API Message Reference.

Information Persistence

As of OLIVE 4.0, the backend OLIVE server and API no longer support persistence of results. It is the responsibility of the client to store, manage, and reference results from the OLIVE server. The OLIVE server does, however, persist enrolled class models and some collected adaptation information.
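
Since the client is responsible for keeping results, a client application typically needs some local store. The following Python sketch shows one possible approach using the standard-library `sqlite3` module. Everything in it is an assumption made for illustration: the table layout, the function names, the `audio_id` and `plugin` keys, and the example scores are hypothetical and are not defined by the OLIVE API.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local result store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " audio_id TEXT NOT NULL,"
        " plugin TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (audio_id, plugin))"
    )
    return conn


def save_result(conn, audio_id, plugin, scores):
    """Store scores (any JSON-serializable value) for one audio/plugin pair."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
        (audio_id, plugin, json.dumps(scores)),
    )
    conn.commit()


def load_result(conn, audio_id, plugin):
    """Return previously stored scores, or None if nothing was saved."""
    row = conn.execute(
        "SELECT payload FROM results WHERE audio_id = ? AND plugin = ?",
        (audio_id, plugin),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
# Made-up identifiers and scores, standing in for values unpacked from a result message.
save_result(conn, "clip-001", "example-sid-plugin", {"speaker_a": 1.5})
print(load_result(conn, "clip-001", "example-sid-plugin"))
```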

Dependencies

The OLIVE Enterprise API utilizes the following dependencies:

  • Google Protocol Buffers 3.4: Used to define the messages that comprise the OLIVE API. Most messages are in the form of request/reply.
  • ZeroMQ 3.2.3: Provides inter-process communication over several possible mechanisms including TCP.
  • Protobuf-net: Optional - needed if you wish to integrate from a .NET/Mono application.

You will need versions of these software dependencies appropriate for your system architecture/operating system in order to communicate with the OLIVE server.
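
For a Python client, a quick sanity check is to confirm that Python bindings for these dependencies are installed. The sketch below assumes the usual PyPI distribution names `pyzmq` and `protobuf`; the helper names are hypothetical. It reports the versions of the installed Python packages only. The `pyzmq` version is not the version of the underlying ZeroMQ library, and the check does not verify compatibility with your OLIVE server.

```python
from importlib import metadata

# PyPI distribution names for the Python ZeroMQ and Protocol Buffers bindings.
REQUIRED_DISTRIBUTIONS = ("pyzmq", "protobuf")


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def dependency_report():
    """Map each required distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in REQUIRED_DISTRIBUTIONS}


for name, version in dependency_report().items():
    print(f"{name}: {version or 'not installed'}")
```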

Supported Languages

Given the dependencies described in the previous section, it is possible to use the OLIVE API from the following programming languages/runtimes:

  • Java (or other JVM languages that provide Java interoperability)
  • C# (or another .NET language, via the protobuf-net library, which is an extra dependency)
  • Python
  • C++

Note that because the Java-based OLIVE UI utilizes the API, SRI has already developed a Java client library to facilitate use of the API from Java. For more information see the “Java Client Library” section of this document.