gdd-embedplda-v1 (Gender Detection)

Version Changelog

Plugin Version	Change
v1.0.0	Initial plugin release with OLIVE 5.2.0

Description

Gender Detection (GDD) plugins detect and label the gender of the speaker for regions of speech in a submitted audio segment. This is in contrast to Gender Identification (GID) plugins, which label the entire segment with a single gender. Unlike GID, GDD can therefore handle audio in which multiple speakers of different genders are speaking, and it provides timestamped region labels that mark the male and female regions.

Domains

multi-v1
- Generic domain for most close-talking conditions with a signal-to-noise ratio above 10 dB.

Inputs

Audio file or buffer and an optional identifier.

Outputs

GDD plugins return a list of regions with a score for the detected gender within that region. The starting and stopping boundaries are denoted in seconds. As with LID, scores are log-likelihood ratios, where a score greater than the default threshold of "0" is considered to be a detection.

An example output excerpt:

    input-audio.wav 0.000 41.500 Female 5.40590000
    input-audio.wav 43.500 77.500 Female 5.29558277
    input-audio.wav 77.500 78.500 Male 2.63369179
    input-audio.wav 78.500 80.500 Male 2.25519705
    input-audio.wav 85.500 86.500 Female 2.06612849
    input-audio.wav 97.500 98.500 Female 3.74665093
    input-audio.wav 98.500 99.500 Male 2.22936487
    input-audio.wav 105.500 106.500 Male 2.72254372
    input-audio.wav 107.500 108.500 Female 2.60355234
    input-audio.wav 108.500 110.500 Female 2.76414633
    input-audio.wav 109.500 113.140 Male 2.85003138
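The excerpt above has five whitespace-separated fields per line: audio identifier, start time, end time, gender label, and score. As a minimal sketch, assuming the output is available as plain text in exactly this layout, it could be parsed as follows. The names `GenderRegion` and `parse_gdd_output` are hypothetical helpers for illustration and are not part of the OLIVE API.

```python
from typing import List, NamedTuple


class GenderRegion(NamedTuple):
    """One scored region from the GDD text output (hypothetical helper type)."""
    audio_id: str
    start_s: float
    end_s: float
    label: str
    score: float


def parse_gdd_output(text: str) -> List[GenderRegion]:
    """Parse lines of the form '<id> <start> <end> <label> <score>'."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so an identifier containing spaces stays intact.
        audio_id, start, end, label, score = line.rsplit(None, 4)
        regions.append(
            GenderRegion(audio_id, float(start), float(end), label, float(score))
        )
    return regions


example = """
    input-audio.wav 0.000 41.500 Female 5.40590000
    input-audio.wav 77.500 78.500 Male 2.63369179
"""
regions = parse_gdd_output(example)
```

Each parsed region exposes its boundaries in seconds, so a region's duration is simply `end_s - start_s`.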

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. These Traits are listed below, along with the corresponding API messages for each. Click a message name to go to additional implementation details.

REGION_SCORER – Score all submitted audio, returning labeled regions within the submitted audio, where each region includes a detected gender and the corresponding score for that gender
- RegionScorerRequest

Compatibility

OLIVE 5.2+

Limitations

Known or potential limitations of the plugin are outlined below.

Minimum Speech Duration

The system will only attempt to perform gender detection if the submitted audio segment contains more than 2 seconds of detected speech.
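The plugin's internal check is not documented here, but the rule can be illustrated with a minimal sketch. It assumes the detected speech is available as non-overlapping (start, end) pairs in seconds; `total_speech_seconds` and `has_enough_speech` are hypothetical names used only for illustration.

```python
from typing import Iterable, Tuple

MIN_SPEECH_S = 2.0  # minimum detected speech, in seconds, as stated above


def total_speech_seconds(speech_regions: Iterable[Tuple[float, float]]) -> float:
    """Sum the durations of non-overlapping (start, end) speech regions."""
    return sum(end - start for start, end in speech_regions)


def has_enough_speech(
    speech_regions: Iterable[Tuple[float, float]],
    min_speech_s: float = MIN_SPEECH_S,
) -> bool:
    """Return True only if the total speech is strictly more than the minimum."""
    return total_speech_seconds(speech_regions) > min_speech_s


too_short = has_enough_speech([(0.0, 1.0), (3.0, 3.5)])    # 1.5 s of speech
long_enough = has_enough_speech([(0.0, 1.5), (2.0, 3.0)])  # 2.5 s of speech
```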

Comments

Global Options

The following options are available to this plugin and can be adjusted in the plugin's configuration file, plugin_config.py.

Option Name	Description	Default	Expected Range
threshold	Detection threshold: a higher value results in fewer detections being output, but of higher reliability.	1.5	-10.0 to 10.0
min_speech	The minimum amount of speech, in seconds, that a segment must contain in order to be scored/analyzed for gender.	2.0	1.0 to 4.0
sad_threshold	SAD threshold for determining the audio to be used in metadata extraction.	1.0	-5.0 to 6.0
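To illustrate the effect of the detection threshold, the sketch below filters already-returned regions on the client side. This is an assumption for illustration only: it does not show how the plugin applies `threshold` internally, and `filter_regions` is a hypothetical helper. The sample rows are taken from the output excerpt in the Outputs section.

```python
from typing import List, Tuple

# (audio_id, start_s, end_s, label, score)
Region = Tuple[str, float, float, str, float]


def filter_regions(regions: List[Region], threshold: float = 1.5) -> List[Region]:
    """Keep only regions whose score is strictly greater than the threshold."""
    return [r for r in regions if r[4] > threshold]


sample: List[Region] = [
    ("input-audio.wav", 0.000, 41.500, "Female", 5.40590000),
    ("input-audio.wav", 78.500, 80.500, "Male", 2.25519705),
    ("input-audio.wav", 85.500, 86.500, "Female", 2.06612849),
    ("input-audio.wav", 105.500, 106.500, "Male", 2.72254372),
]

default_kept = filter_regions(sample)                # threshold of 1.5
strict_kept = filter_regions(sample, threshold=2.5)  # fewer, higher-scoring regions
```

Raising the threshold from 1.5 to 2.5 drops the two lowest-scoring regions in this sample, which matches the trade-off described in the table: fewer detections, each with a higher score.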