
lid-embedplda-v2 (Language Identification)

Version Changelog

Plugin Version Change
v2.0.0 Initial plugin release, functionally identical to v1.0.0, but updated to be compatible with OLIVE 5.0.0
v2.0.1 (latest) Updated to be compatible with OLIVE 5.1.0

Description

LID plugins score an audio segment against one or more language or dialect classes, producing a single global score for each class over the whole segment. A plugin domain may contain 50 or more languages and dialects, or as few as one for use cases where the customer is focused on a single target class. Some plugin domains focus solely on dialect or sub-language recognition, such as the languages of China. Several LID plugins allow users to add new classes, or to augment existing classes with more data to improve accuracy.

This is a language recognition plugin for clean telephone or microphone data. It is based on a language-embeddings DNN fed with acoustic DNN bottleneck features, with language classification performed by a PLDA backend followed by duration-aware calibration. The plugin has been reconfigured to allow enrollment and the addition of new classes. Two kinds of updates are implemented through the update function: unsupervised adaptation through target mean normalization, and supervised PLDA and calibration updates from enrollments. These updates are not applied automatically; the user must invoke them through the API.
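Target mean normalization is mentioned above without further detail. The following is a minimal sketch, assuming the common formulation in which the mean of embeddings computed from unlabeled target-domain (adaptation) audio is subtracted from every embedding before PLDA scoring. The function names and the use of plain Python lists are illustrative assumptions, not the plugin's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty set of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one adaptation embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def target_mean_normalize(
    embeddings: Sequence[Vector], adaptation: Sequence[Vector]
) -> List[Vector]:
    """Re-center embeddings on the mean of unlabeled target-domain data.

    Hypothetical helper: assumes plain mean subtraction is the adaptation
    step; the plugin's actual update function may do more than this.
    """
    mu = mean_vector(adaptation)
    return [[x - m for x, m in zip(e, mu)] for e in embeddings]


# Unlabeled audio from the deployment condition supplies the new mean.
adapt = [[1.0, 2.0], [3.0, 4.0]]
print(target_mean_normalize([[2.0, 3.0], [5.0, 5.0]], adapt))
```

Because no labels are needed, this kind of adaptation can use whatever audio the deployment collects; the supervised PLDA and calibration updates, by contrast, depend on enrollments.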

Domains

  • multi-v1
    • Generic domain for most close-talking conditions with a signal-to-noise ratio above 10 dB. Currently set up with 10 languages configured (optionally configurable to up to 63 languages). See below for the currently configured and available languages, and see the Configuring Languages section for instructions on reconfiguring the available languages if necessary.

Inputs

Audio file or buffer and an optional identifier.

Outputs

Generally, a list of scores for all classes in the domain, covering the entire segment. As with SAD and SID, scores are generally log-likelihood ratios, and a score greater than 0 is considered a detection. Plugins can be altered to return only detections rather than a full list of classes and scores, but for the sake of flexibility this filtering is generally done on the client side.

An example output excerpt:

    input-audio.wav amh -19.9012123573
    input-audio.wav ara -15.8882738579
    input-audio.wav cmn -15.5530382622
    input-audio.wav eng -14.1870705116
    input-audio.wav fas -17.3224474419
    input-audio.wav fre 10.1847232353
    input-audio.wav hau -15.1134468544
    input-audio.wav jpn -21.0655495155
    input-audio.wav kor -18.3601671684
    input-audio.wav pus -16.2738787163
    input-audio.wav rus -10.4046117294
    input-audio.wav spa -18.1588427055
    input-audio.wav tur -14.0825478065
    input-audio.wav urd -20.4127785194
    input-audio.wav vie -18.552107476
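Because detection filtering is generally left to the client, a client-side sketch might parse lines in the format of the excerpt above and keep only classes scoring above 0. The whitespace-separated `<audio> <class> <score>` layout is inferred from the excerpt; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A few lines copied from the example output above.
EXCERPT = """\
input-audio.wav eng -14.1870705116
input-audio.wav fre 10.1847232353
input-audio.wav rus -10.4046117294
"""


def parse_scores(text: str) -> Dict[str, Dict[str, float]]:
    """Parse '<audio> <class> <score>' lines into {audio: {class: score}}."""
    results: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        audio, label, score = parts
        results.setdefault(audio, {})[label] = float(score)
    return results


def detections(
    scores: Dict[str, float], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Classes whose log-likelihood ratio exceeds the threshold, best first."""
    hits = [(label, s) for label, s in scores.items() if s > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


for audio, scores in parse_scores(EXCERPT).items():
    print(audio, detections(scores))
```

Keeping `threshold` at 0 matches the log-likelihood-ratio convention described above; raising it trades missed detections for fewer false alarms.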

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. A list of these Traits is below, along with the corresponding API messages for each.

Compatibility

OLIVE 5.1+

Limitations

Known or potential limitations of the plugin are outlined below.

All current LID plugins assume that an audio segment contains only a single language and can be scored as a unit. If a segment contains multiple languages, the entire segment is still scored as a unit. In many cases, at least 2 seconds of speech is required in order to output scores. This minimum can optionally be overridden, but scores produced for such short segments will be volatile.

Minimum Speech Duration

The system will only attempt to perform language identification if the submitted audio segment contains more than 2 seconds of detected speech.
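A client can avoid submitting audio that will not be scored by checking the amount of detected speech first. This is a minimal sketch, assuming speech regions are available as (start, end) pairs in seconds from a SAD pass; the helper names are hypothetical, and the plugin applies its own check regardless.

```python
from typing import List, Sequence, Tuple

MIN_SPEECH_SECONDS = 2.0  # documented default minimum


def total_speech_seconds(segments: Sequence[Tuple[float, float]]) -> float:
    """Sum SAD speech regions (start, end) in seconds, merging overlaps."""
    merged: List[List[float]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend overlapping region
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged)


def enough_speech(
    segments: Sequence[Tuple[float, float]],
    min_speech: float = MIN_SPEECH_SECONDS,
) -> bool:
    """True only when detected speech is strictly more than the minimum."""
    return total_speech_seconds(segments) > min_speech


print(enough_speech([(0.0, 1.0), (1.5, 2.4)]))  # False: about 1.9 s of speech
print(enough_speech([(0.0, 1.5), (1.2, 3.0)]))  # True: overlap merged to 3.0 s
```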

Languages of Low Confidence

Many of the language models that are included in the domain's data model but hidden (disabled by default) do not contain enough data for reliable detection of their language. They are included solely to help with score calibration and with differentiating other languages. If in doubt about whether an enrolled language should be used for detection, please reach out to SRI for clarification.

Comments

Language/Dialect Detection Granularity

LID plugins that are capable of dialect detection typically include functionality to fall back to the base language class when confidence is limited. This is typically done by outputting scores for all dialects (e.g. ara-arz, ara-apc, and ara-arb) as well as for the base language (e.g. ara). Note that the base class of a language with dialect information is not itself enrolled; instead, its score is derived from the maximum of the dialect detectors for that base language available within the plugin (whether exposed or not). If a dialect score is sufficiently high, the base language score is set 0.001 lower than the highest-scoring dialect; otherwise, it is set 0.001 higher than the highest-scoring dialect. In this way, labelling the audio sample with its maximum-scoring class yields a specific dialect when the plugin is confident, and the base language otherwise. This default mode is called BASEAPPEND. Two alternate modes can optionally be set:

  • BASEAPPEND - Default behavior, described above.
  • BASEONLY - Output only base language scores, formed by the maximum of the dialect-specific scores for a given base language.
  • STANDARD - Output scores based on enrolled classes, without producing a base language summarization for dialect-compatible detectors.
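The three modes can be sketched as a post-processing step over per-class scores. This is an illustration under stated assumptions, not the plugin's code: dialect classes are assumed to be named `<base>-<dialect>` as in the ara-arz example, and because the text does not define "sufficiently high", the `dialect_threshold` parameter (defaulting to the detection threshold of 0) is hypothetical.

```python
from collections import defaultdict
from typing import Dict

OFFSET = 0.001  # documented gap between base and top dialect score
MODES = ("BASEAPPEND", "BASEONLY", "STANDARD")


def summarize(
    scores: Dict[str, float],
    mode: str = "BASEAPPEND",
    dialect_threshold: float = 0.0,  # hypothetical stand-in for "sufficiently high"
) -> Dict[str, float]:
    """Apply one of the reporting modes to per-class scores.

    Assumes dialect classes are named '<base>-<dialect>' (e.g. 'ara-arz').
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    if mode == "STANDARD":
        return dict(scores)  # enrolled classes only, no base summary

    dialects: Dict[str, Dict[str, float]] = defaultdict(dict)
    out: Dict[str, float] = {}
    for label, score in scores.items():
        base, sep, _ = label.partition("-")
        if sep:
            dialects[base][label] = score
        else:
            out[label] = score  # languages without dialects pass through

    for base, group in dialects.items():
        top = max(group.values())
        if mode == "BASEONLY":
            out[base] = top  # base score is the best dialect score
        else:  # BASEAPPEND: keep dialects, put base just below or above the top one
            out.update(group)
            out[base] = top - OFFSET if top >= dialect_threshold else top + OFFSET
    return out


confident = summarize({"ara-arz": 3.0, "ara-apc": -1.0, "eng": -5.0})
unsure = summarize({"ara-arz": -2.0, "ara-apc": -4.0, "eng": -5.0})
print(max(confident, key=confident.get))  # ara-arz
print(max(unsure, key=unsure.get))        # ara
```

Taking the maximum-scoring label from a BASEAPPEND result then yields the dialect when it clears the threshold and the base language otherwise, which is the fallback behavior described above.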

Enrollments

Some recent LID plugins allow class modifications. A class modification is essentially an enrollment capability similar to that of SID. A new enrollment is created with the first class modification request (sending the system audio with a language label, generally 30 seconds or more per cut), and becomes usable once sufficient cuts have been provided (approximately 10). In general, about 30 minutes of audio from around 30 samples is the minimum amount of data required to produce a reasonable language model. As with SID or SDD, an enrollment can be augmented by subsequent class modification requests that add more audio from the same language to the existing class. In addition to user-enrolled languages, most LID plugins are supplied with several pre-enrolled languages. Users can augment these existing languages with their own data by enrolling audio under the same label as an existing language.
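Before sending class modification requests, a client might check a planned set of cuts against the rough guidelines above. This is a minimal sketch: the thresholds are copied from the text, while the helper name, the status strings, and the choice to ignore cuts shorter than 30 seconds are assumptions.

```python
from typing import Sequence

MIN_CUT_SECONDS = 30.0           # "generally 30 seconds or more per cut"
MIN_USABLE_CUTS = 10             # enrollment becomes usable at about 10 cuts
RECOMMENDED_CUTS = 30            # "around 30 samples"
RECOMMENDED_SECONDS = 30 * 60.0  # "30 minutes"


def enrollment_status(cut_seconds: Sequence[float]) -> str:
    """Classify a planned set of enrollment cuts against the guidelines.

    Hypothetical helper: not counting cuts under 30 s is an assumption;
    the plugin itself may accept shorter cuts.
    """
    cuts = [d for d in cut_seconds if d >= MIN_CUT_SECONDS]
    total = sum(cuts)
    if len(cuts) < MIN_USABLE_CUTS:
        return "insufficient"
    if len(cuts) >= RECOMMENDED_CUTS and total >= RECOMMENDED_SECONDS:
        return "reasonable"
    return "usable"


print(enrollment_status([45.0] * 12))  # usable
print(enrollment_status([60.0] * 30))  # reasonable
```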

Configuring Languages

Most LID plugins can reconfigure the languages available in a domain. To configure the languages in a domain, use the command line to enter the directory of the domain of interest within the plugin folder, and call

    $ ./configure_languages.py

to enable all languages, or

    $ ./configure_languages.py lang1,lang2,…,langN

for a subset of the available languages. Please note that running ./configure_languages.py without any arguments should be done with extreme care: it enables all languages and dialects in the domain, including those that were included solely for their utility in score calibration and that may not have enough training data to act as reliable detectors. Enabling all languages may adversely affect the plugin's performance. This plugin also supports adjusting the language detection granularity discussed above, although this is intended for advanced users only. An example of changing this setting with the configure_languages.py script is

    $ ./configure_languages.py lang1,lang2,...,langN BASEONLY

The available options for this setting, where supported, are the modes described above (BASEAPPEND, BASEONLY, and STANDARD).
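When scripting reconfiguration, a small wrapper can guard against accidentally calling the script with no arguments. This is a minimal sketch: only the script name, the comma-separated language argument, and the mode keyword come from the text above; the helper name and its validation behavior are hypothetical.

```python
from typing import Iterable, List, Optional

MODES = ("BASEAPPEND", "BASEONLY", "STANDARD")


def configure_command(
    codes: Iterable[str],
    mode: Optional[str] = None,
    supported: Optional[Iterable[str]] = None,
) -> List[str]:
    """Build the argv for ./configure_languages.py for a subset of languages."""
    chosen = list(dict.fromkeys(codes))  # drop duplicates, keep order
    if not chosen:
        # An empty argument list would enable every language; refuse instead.
        raise ValueError("no language codes given; refusing to enable all languages")
    if supported is not None:
        allowed = set(supported)
        unknown = [c for c in chosen if c not in allowed]
        if unknown:
            raise ValueError(f"unsupported language codes: {unknown}")
    cmd = ["./configure_languages.py", ",".join(chosen)]
    if mode is not None:
        if mode not in MODES:
            raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
        cmd.append(mode)
    return cmd


print(configure_command(["eng", "fre", "eng"], mode="BASEONLY"))
# ['./configure_languages.py', 'eng,fre', 'BASEONLY']
```

The resulting list could then be passed to `subprocess.run(cmd, cwd=domain_dir, check=True)`, where `domain_dir` is a placeholder for the path to the domain directory inside the plugin folder on your installation.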

Default Enabled Languages

The following languages are identified as high-confidence languages, supported by a sufficient amount of training data to make them reliable language detectors. As such, they are enabled by default in the plugin as delivered, and serve as a general-purpose base language set.

Language Code Language Name
amh Amharic
arz Egyptian Arabic
apc North Levantine Arabic
arb Modern Standard Arabic
cmn Mandarin Chinese
yue Yue Chinese
eng English
fas Farsi
fre French
jpn Japanese
kor Korean
pus Pashto
rus Russian
spa Spanish
tgl Tagalog
tha Thai
tur Turkish
urd Urdu
vie Vietnamese
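If additional languages have been enabled with ./configure_languages.py, a client may want to treat scores for languages outside this high-confidence set with caution, as described under Languages of Low Confidence. This is a minimal sketch; the constant and function names are hypothetical, the code list mirrors the table above, and the scores in the example call are made-up inputs.

```python
from typing import Dict, Tuple

# Codes from the Default Enabled Languages table above.
DEFAULT_ENABLED = frozenset(
    "amh arz apc arb cmn yue eng fas fre jpn kor pus rus spa tgl tha tur urd vie".split()
)


def split_by_confidence(
    scores: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Separate scores for high-confidence default languages from the rest."""
    high = {c: s for c, s in scores.items() if c in DEFAULT_ENABLED}
    other = {c: s for c, s in scores.items() if c not in DEFAULT_ENABLED}
    return high, other


# Illustrative scores only.
high, other = split_by_confidence({"fre": 10.2, "eng": -14.2, "alb": 1.3})
print(sorted(high), sorted(other))  # ['eng', 'fre'] ['alb']
```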

Supported Languages

The full list of languages that exist as enrolled classes within this plugin as delivered is provided in the table below. As mentioned previously, not all of these languages were enrolled with enough data to serve as reliable detectors; they remain in the domain because they help differentiate other languages and improve score calibration. If in doubt about whether an enrolled language should be used for detection, please reach out to SRI for clarification.

Language Code Language Name
alb Albanian
amh Amharic
arz Egyptian Arabic
apc North Levantine Arabic
arb Modern Standard Arabic
aze Azerbaijani
bel Belorussian
ben Bengali
bos Bosnian
bul Bulgarian
cmn Mandarin Chinese
yue Yue Chinese
eng English
fas Farsi
fre French
geo Georgian
ger German
gre Greek
hau Hausa
hrv Croatian
ind Indonesian
ita Italian
jpn Japanese
khm Khmer
kor Korean
mac Macedonian
mya Burmese
nde Ndebele
orm Oromo
pan Punjabi
pol Polish
por Portuguese
prs Dari
pus Pashto
ron Romanian
rus Russian
sna Shona
som Somali
spa Spanish
srp Serbian
swa Swahili
tam Tamil
tgl Tagalog
tha Thai
tib Tibetan
tir Tigrinya
tur Turkish
ukr Ukrainian
urd Urdu
uzb Uzbek
vie Vietnamese

Global Options

This plugin does not feature user-configurable option parameters. It does, however, offer configurable language models and language-reporting granularity. For details, refer to the Configuring Languages and Language/Dialect Detection Granularity sections above.