lid-hdplda-v2 (Language Identification)
Version Changelog
Plugin Version | Change |
---|---|
v2.0.0 | Initial release of plugin with GPU support - otherwise functionally identical to v1.0.2. Released with OLIVE 5.6.0. |
Description
LID plugins analyze an audio segment to produce a detection score for each of the enabled language or dialect classes for the domain in use. A plugin domain could consist of 50 or more languages and dialects in a single plugin, or as few as one for use cases where the customer is only focused on a single target class. Some plugin domains are solely focused on dialect or sub-language recognition, such as languages of China. Several LID plugins allow users to add new classes or augment existing classes with more data for the class to improve accuracy.
Language recognition plugin for clean telephone or microphone data, based on a language embeddings DNN fed with acoustic DNN bottleneck features, with language classification using a PLDA backend and multi-class calibration. In contrast to its predecessor, which trained the PLDA parameters generatively, this plugin trains the PLDA parameters discriminatively. In addition, two PLDA models are trained: one to generate scores for clusters of highly related languages, and a second to generate scores conditional on each cluster. For example, there is a “Spanish cluster”, and inside that cluster are languages such as Castilian Spanish, Catalan, and Galician. We call this approach Hierarchical Discriminative PLDA, or HDPLDA.
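The hierarchical idea can be illustrated with a small sketch. It assumes the two stages are combined by adding a cluster log-posterior to a within-cluster log-posterior; the plugin's actual combination and calibration are not described here and may differ. All names and score values below are hypothetical.

```python
import math

def log_softmax(scores):
    """Convert raw scores to log-posteriors in a numerically stable way."""
    peak = max(scores.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    return {name: s - log_norm for name, s in scores.items()}

def hierarchical_log_posteriors(cluster_scores, within_cluster_scores):
    """Assumed combination: log P(lang) = log P(cluster) + log P(lang | cluster)."""
    cluster_logp = log_softmax(cluster_scores)
    combined = {}
    for cluster, lang_scores in within_cluster_scores.items():
        for lang, logp in log_softmax(lang_scores).items():
            combined[lang] = cluster_logp[cluster] + logp
    return combined

# Hypothetical raw scores, invented purely for illustration.
cluster_scores = {"spanish_cluster": 2.0, "slavic_cluster": -1.0}
within_cluster_scores = {
    "spanish_cluster": {"spa": 1.5, "cat": 0.2, "glg": -0.4},
    "slavic_cluster": {"rus": 0.9, "ukr": 0.1},
}

posteriors = hierarchical_log_posteriors(cluster_scores, within_cluster_scores)
best = max(posteriors, key=posteriors.get)
print(best, round(posteriors[best], 3))
```

Under this assumption, a language only scores well when both its cluster and its position inside that cluster score well, which is the intuition behind separating closely related languages in a second stage.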
Domains
- multi-v1
- Generic domain for most close-talking conditions with a signal-to-noise ratio above 10 dB. Currently set up with 16 languages configured (optionally configurable to over 100 languages). See below for the currently configured and available languages. See the Configuring Languages section for instructions on reconfiguring the available languages if necessary.
Inputs
Audio file or buffer and an optional identifier.
Outputs
Generally, a list of scores for all classes in the domain, for the entire segment. As with SAD and SID, scores are generally log-likelihood ratios, where a score greater than 0 is considered a detection. Plugins may be altered to return only detections rather than a list of classes and scores, but this filtering is generally done on the client side for the sake of flexibility.
An example output excerpt:
input-audio.wav Amharic -6.73527861
input-audio.wav Arabic -3.31796265
input-audio.wav English 8.22701168
input-audio.wav French -2.98071671
input-audio.wav Iranian Persian -5.55558729
input-audio.wav Japanese -6.01283073
input-audio.wav Korean -5.64162636
input-audio.wav Mandarin -4.81163836
input-audio.wav Portuguese -1.93523705
input-audio.wav Russian -5.60199690
input-audio.wav Spanish -3.70800495
input-audio.wav Tagalog -4.86510944
input-audio.wav Vietnamese -5.10995102
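Client-side filtering of such output can be sketched as follows. This assumes each line has the form shown in the excerpt (an audio identifier without whitespace, a class name that may contain spaces, and a score); the function names are hypothetical and not part of any OLIVE API.

```python
def parse_score_line(line):
    """Split '<audio id> <class name> <score>'; the class name may contain spaces."""
    audio_id, rest = line.strip().split(maxsplit=1)
    class_name, score_text = rest.rsplit(maxsplit=1)
    return audio_id, class_name, float(score_text)

def detections(lines, threshold=0.0):
    """Keep only the classes whose score exceeds the detection threshold."""
    parsed = [parse_score_line(line) for line in lines if line.strip()]
    return [row for row in parsed if row[2] > threshold]

# Lines copied from the example output excerpt.
sample_lines = [
    "input-audio.wav English 8.22701168",
    "input-audio.wav Iranian Persian -5.55558729",
    "input-audio.wav Portuguese -1.93523705",
]

for audio_id, class_name, score in detections(sample_lines):
    print(f"{audio_id}: detected {class_name} ({score:.2f})")
```

Splitting the score from the right keeps multi-word class names such as Iranian Persian intact.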
Functionality (Traits)
The functions of this plugin are defined by its Traits and implemented API messages. The Traits supported by this plugin are listed below.
- GLOBAL_SCORER – Score all submitted audio, returning a single score for the entire audio segment for each of the enrolled and enabled languages of interest.
Compatibility
OLIVE 5.4+
Limitations
Known or potential limitations of the plugin are outlined below.
All current LID plugins assume that an audio segment contains only a single language and may be scored as a unit. If a segment contains multiple languages, the entire segment will still be scored as a unit. In many cases, a minimum of 2 seconds of speech is required in order to output scores. This value can optionally be overridden, but scores produced for such short segments will be volatile.
Minimum Speech Duration
The system will only attempt to perform language identification if the submitted audio segment contains more than 2 seconds of detected speech.
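The gating behavior can be pictured with the sketch below. It assumes speech is available as a list of (start, end) times in seconds from a speech activity detector; that representation and the function names are hypothetical, and the plugin's internal check may be implemented differently.

```python
def total_speech_seconds(speech_segments):
    """Sum the durations of (start, end) speech segments, in seconds."""
    return sum(end - start for start, end in speech_segments)

def has_enough_speech(speech_segments, min_speech=2.0):
    """Return True only when detected speech exceeds min_speech seconds."""
    return total_speech_seconds(speech_segments) > min_speech

# Hypothetical speech segments for two files.
short_file = [(0.5, 1.2), (3.0, 3.9)]    # about 1.6 s of speech
long_file = [(0.0, 1.5), (2.0, 3.25)]    # 2.75 s of speech

print(has_enough_speech(short_file))
print(has_enough_speech(long_file))
```

Note that the check applies to detected speech, not to the total length of the audio, so a long file that is mostly silence can still be rejected.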
Languages of Low Confidence
Many of the language models included in the domain's data model are hidden and disabled by default. These models do not contain enough data for reliable detection of their languages; they are included solely to help with score calibration and with differentiating other languages. If in doubt regarding whether an enrolled language should be used for detection or not, please reach out to SRI for clarification.
Comments
GPU Support
Please refer to the OLIVE GPU Installation and Support documentation page for instructions on how to enable and configure GPU capability in supported plugins. By default this plugin will run on CPU only.
Language/Dialect Detection Granularity
LID plugins attempt to distinguish either dialects (e.g., Tunisian Arabic and Levantine Arabic) or a base language class (such as Arabic). Dialect labels can be mapped back to the base language if desired. Enabling this mapping requires one change:
- A mapping file 'dialect_language.map' must exist within the domain of the plugin for which mapping is to be performed (e.g., domains/multi-v1/dialect_language.map). This file is a tab-delimited, two-column file that lists each dialect-to-language mapping as "dialect\tlanguage", where \t denotes a tab character, one mapping per line. Example lines include:
  Levantine Arabic\tArabic
  Tunisian Arabic\tArabic
In the example above, the output labels of both dialects will be mapped to the same base language, 'Arabic'. The one exception is user-enrolled languages: no mapping is performed for them, since it is assumed the user has chosen the dialect or language label based on their own requirements.
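As a rough illustration of the file format, the sketch below parses tab-delimited mapping text and relabels class names. It only demonstrates the two-column layout; how the plugin itself combines the scores of dialects that share a base language is not covered here, and the function names are hypothetical.

```python
def parse_dialect_map(text):
    """Parse tab-delimited 'dialect<TAB>language' lines into a dictionary."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        dialect, language = line.split("\t")
        mapping[dialect.strip()] = language.strip()
    return mapping

def map_label(label, mapping):
    """Return the base language for a dialect, or the label unchanged."""
    return mapping.get(label, label)

# Contents mirroring the example lines above, with real tab characters.
map_text = "Levantine Arabic\tArabic\nTunisian Arabic\tArabic\n"
mapping = parse_dialect_map(map_text)

for label in ["Tunisian Arabic", "Levantine Arabic", "English"]:
    print(label, "->", map_label(label, mapping))
```

Labels without an entry in the mapping pass through unchanged, which matches the behavior described for classes that are not dialects of a mapped language.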
Note that we recommend users request these mapping files from SRI, or request the mapping to be performed before delivery of the plugin so that SRI can test and validate the final mapping before delivery.
Note also that the system will not allow you to create an enrollment with the same class name that you have languages mapped to. This avoids confusing situations where the system cannot tell whether it should use the original pre-mapped models or the newly enrolled user model. You must provide a unique name for any new language enrollment that does not conflict with dialect_language.map. If you have already enrolled a conflicting model and then add a mapping to this same name, the plugin will provide a warning message and intentionally fail to load.
Enrollments
Some recent LID plugins allow class modifications. Due to the more complex structure of the model training process for the HDPLDA architecture, this plugin does not support user-enrollable or user-augmentable classes. The language model set for this plugin is fixed, though the provided languages can still be enabled or disabled (see below) as desired.
Configuring Languages
Most LID plugins have the ability to re-configure the languages available in a domain. Configuring languages in the domain can be done by entering the domain directory of interest within the plugin folder and editing domain_config.txt. This file lists the pre-enrolled languages available in the plugin. Disabled languages are indicated by a # at the start of the line. To enable a language, remove the #. To disable a language, add a # at the start of the line.
Note that you cannot add languages to this list that are not supported by the underlying models. If a nonexistent language is added to this file, the plugin will intentionally fail.
Note that internally, this plugin uses ISO 639-3 language codes to refer to each language. These codes are translated to English language names before OLIVE reports them for human consumption, but you need to know the language code when enabling or disabling a language. Look up codes in the ISO 639-3 code tables, or see below for a list of the included languages and a mapping of the internal codes to the reported language names.
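The enable/disable edit can also be scripted. The sketch below assumes domain_config.txt holds one language code at the start of each line, optionally preceded by a #; the exact layout of the real file may differ, so treat this as an illustration and back up the file before editing it. The function name is hypothetical.

```python
def set_language_enabled(config_text, code, enabled):
    """Uncomment (enable) or comment out (disable) the line for a language code."""
    updated = []
    found = False
    for line in config_text.splitlines():
        bare = line.strip().lstrip("#").strip()
        if bare.split()[:1] == [code]:
            found = True
            line = bare if enabled else "#" + bare
        updated.append(line)
    if not found:
        # Mirrors the rule that unsupported languages cannot be added.
        raise ValueError(f"language code {code!r} is not listed in the config")
    return "\n".join(updated) + "\n"

# Hypothetical file contents, assuming one code per line.
sample_config = "eng\n#deu\nfra\n"

print(set_language_enabled(sample_config, "deu", enabled=True))
print(set_language_enabled(sample_config, "fra", enabled=False))
```

The function refuses to touch codes that are not already listed, so it cannot introduce a language that the underlying models do not support.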
Default Enabled Languages
The following languages are identified as high-confidence languages, supported by a sufficient amount of training data to make them reliable language detectors. As such, they are enabled by default in the plugin as-delivered, and serve as a general purpose base language set.
Language Code | Reported Language Name |
---|---|
arb, aeb, acm, afb, alv, arz | Arabic (dialects merged and reported as Arabic) |
cmn | MandarinChinese |
eng | English |
fas | Persian |
fra | French |
jpn | Japanese |
kor | Korean |
por | Portuguese |
pus | Pushto |
spa | Spanish |
tgl | Tagalog |
amh | Amharic |
rus | Russian |
vie | Vietnamese |
yue | CantoneseChinese |
Supported Languages
The full list of languages that exist as enrolled classes within this plugin as delivered is provided in the chart below. As mentioned previously, not all of these languages were enrolled with enough data to serve as reliable detectors; they remain in the domain because they help differentiate other languages and support score calibration. If in doubt regarding whether an enrolled language should be used for detection or not, please reach out to SRI for clarification.
Language Code | Reported Language Name |
---|---|
abk | Abkhazian |
aeb | TunisianArabic |
acm | MesopotamianArabic |
afb | GulfArabic |
arb | ModernStandardArabic |
alv | LevantineArabic |
arz | EgyptianArabic |
asm | Assamese |
ben | Bengali |
bod | Tibetan |
bul | Bulgarian |
cmn | MandarinChinese |
dan | Danish |
deu | German |
ell | Greek |
eng | English |
eus | Basque |
fas | Persian |
est | Estonian |
fin | Finnish |
gaz | WestCentralOromo |
fra | French |
hat | HaitianCreole |
hau | Hausa |
heb | Hebrew |
hun | Hungarian |
hye | Armenian |
fao | Faroese |
isl | Icelandic |
ita | Italian |
indsun | Indonesian/Sundanese |
jav | Javanese |
jpn | Japanese |
kat | Georgian |
khm | Khmer |
kor | Korean |
lav | Latvian |
lin | Lingala |
lit | Lithuanian |
ltz | Luxembourgish |
guj | Gujarati |
mar | Marathi |
hbs | Serbo/Croatian |
mkd | Macedonian |
mlg | Malagasy |
mlt | Maltese |
mon | Mongolian |
mri | Maori |
mya | Burmese |
nan | MinNanChinese |
afr | Afrikaans |
nld | Dutch |
npi | Nepali |
bre | Breton |
oci | Occitan |
pan | Punjabi |
por | Portuguese |
pus | Pushto |
ron | Romanian |
sin | Sinhala |
ces | Czech |
pol | Polish |
slk | Slovak |
slv | Slovenian |
nde | Ndebele |
sna | Shona |
snd | Sindhi |
som | Somali |
cat | Catalan |
glg | Galician |
spa | Spanish |
sqi | Albanian |
swa | Swahili |
nno | NorwegianNynorsk |
swe | Swedish |
bak | Bashkir |
kaz | Kazakh |
tat | Tatar |
kan | Kannada |
mal | Malayalam |
tam | Tamil |
tel | Telugu |
tgk | Tajik |
ceb | Cebuano |
tgl | Tagalog |
thalao | Thai/Lao |
amh | Amharic |
tir | Tigrinya |
tuk | Turkmen |
aze | Azerbaijani |
tur | Turkish |
bel | Belarusian |
rus | Russian |
ukr | Ukrainian |
urdhin | Urdu/Hindi |
uzb | Uzbek |
vie | Vietnamese |
wuu | WuChinese |
yid | Yiddish |
ymm | MaayMaay |
yor | Yoruba |
yue | CantoneseChinese |
Global Options
This plugin offers several basic user-configurable parameters which can be edited directly in plugin_config.py or passed via the API. Note that if changed in the plugin_config.py file, a server running the plugin will need to be restarted in order to use the new parameters, while parameters passed via the API are dynamically updated and do not require a restart of the server.
The options available and their default values are described below:
Option Name | Description | Default | Expected Range |
---|---|---|---|
min_speech | The minimum amount of detected speech, in seconds, required to process a file. A higher value prevents shorter files from being processed, with the benefit of more reliable outputs. | 2.0 | 0.5 to 4.0 |
sad_threshold | The threshold used to determine speech for processing. A higher value results in less speech being detected, while removing more noise. | 1.0 | -3.0 to 6.0 |
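Before editing plugin_config.py or sending overrides, it can help to check values against the table above. The sketch below is a standalone helper that assumes the expected ranges are treated as hard limits, which the plugin itself may not enforce; it does not call the OLIVE API, and the names OPTION_SPECS and resolve_options are hypothetical.

```python
# Defaults and expected ranges copied from the options table.
OPTION_SPECS = {
    "min_speech": {"default": 2.0, "low": 0.5, "high": 4.0},
    "sad_threshold": {"default": 1.0, "low": -3.0, "high": 6.0},
}

def resolve_options(overrides=None):
    """Merge overrides into the defaults, rejecting unknown or out-of-range values."""
    options = {name: spec["default"] for name, spec in OPTION_SPECS.items()}
    for name, value in (overrides or {}).items():
        if name not in OPTION_SPECS:
            raise KeyError(f"unknown option: {name}")
        spec = OPTION_SPECS[name]
        if not spec["low"] <= value <= spec["high"]:
            raise ValueError(
                f"{name}={value} is outside {spec['low']} to {spec['high']}"
            )
        options[name] = float(value)
    return options

print(resolve_options())
print(resolve_options({"min_speech": 3.0}))
```

Lowering min_speech toward 0.5 admits shorter files at the cost of more volatile scores, so any override is best validated on representative audio.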