red-transform-v1 (Redaction - Voice Transformation)

Version Changelog

Plugin Version | Change
v1.0.0 | Initial plugin release with OLIVE 5.2.0

Description

The red-transform-v1 plugin operates much like the red-tone-v1 plugin: it alters the selected regions of the audio passed to it. Where red-tone-v1 replaces those regions with a 'bleep' tone, this plugin instead passes them through voice transformation with the goal of obscuring the speaker's identity. Listeners with very close knowledge of the original speaker may still be able to identify the transformed speaker through pronunciation, accent, and other distinctive traits that the transformation algorithm(s) may not sufficiently disguise.

Domains

Several pre-set domains are available. Ideally, the audio containing the speaker to be obscured is clean and recorded with a close microphone, in which case users should select the clean_close-v1 domain. As input audio quality decreases and/or the distance between the speaker and the microphone increases, users will likely need to move down the domain list, to degraded_distant-v1, to find a balance between obscurement and intelligibility.

These domains were chosen as likely-useful compromises along the gamut of speed, intelligibility, and identity obscurement. If neither domain serves your use case well, and you need something between the two, or something even less aggressive than the domain included for degraded or distant speech, please get in touch with us. We can configure a new domain to better match your expected data, or provide additional options to have on the shelf if desired.

Moving further down the domain list (e.g. selecting degraded_distant-v1 over clean_close-v1) lessens robustness to reverse engineering, but the less aggressive domains process significantly faster than the more aggressive ones.

The included domains:

  • clean_close-v1
    • This domain offers the highest amount of speaker obscurement, but suffers the most with respect to intelligibility in degraded audio conditions (noise, distant mic). It is the most aggressive regarding the transformations performed, is the slowest to process, and is designed for use in very clean audio conditions.
  • degraded_distant-v1
    • This domain backs off some of the transformation being applied in an attempt to maximize intelligibility in noisy, degraded, and/or distant speech recordings or conditions, while still providing acceptable levels of speaker identity masking. It is significantly faster than the domain above.
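The trade-off between the two domains can be sketched as a simple selection rule. This is a minimal illustration only: the `choose_domain` helper and its two boolean inputs are hypothetical and not part of OLIVE or this plugin. Only the domain names come from the list above.

```python
# Hypothetical helper for illustration; not part of OLIVE or the plugin.
# Only the two domain names are taken from the documentation.
CLEAN_CLOSE = "clean_close-v1"
DEGRADED_DISTANT = "degraded_distant-v1"


def choose_domain(clean_audio: bool, close_mic: bool) -> str:
    """Pick the most aggressive domain only for clean, close-microphone audio.

    Any degradation (noise, distant microphone) falls back to the less
    aggressive, faster domain to preserve intelligibility.
    """
    if clean_audio and close_mic:
        return CLEAN_CLOSE
    return DEGRADED_DISTANT
```

In practice the judgment of whether audio is "clean" or "close" is left to the user; the sketch only encodes the guidance that the aggressive domain is intended for very clean conditions.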

Inputs

For voice-transformation redaction, an audio file and time-annotated regions marking the speech to be obscured are required.
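As a rough sketch of preparing such regions before submission, the snippet below clips regions to the audio length, drops empty ones, and merges overlaps. The `(start_seconds, end_seconds)` tuple layout and the `normalize_regions` helper are assumptions made for illustration; the plugin's actual region format is defined by the OLIVE API and is not shown here.

```python
from typing import List, Tuple

# Assumed layout for illustration: (start_seconds, end_seconds).
Region = Tuple[float, float]


def normalize_regions(regions: List[Region], audio_duration: float) -> List[Region]:
    """Clip regions to [0, audio_duration], drop empty ones, merge overlaps."""
    cleaned = []
    for start, end in regions:
        start = max(0.0, start)
        end = min(audio_duration, end)
        if end > start:  # skip zero-length or inverted regions
            cleaned.append((start, end))

    cleaned.sort()
    merged: List[Region] = []
    for start, end in cleaned:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous region: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

For example, with a 10-second file, the regions `(5.0, 8.0)`, `(1.0, 3.0)`, `(2.5, 4.0)`, and `(9.0, 20.0)` would reduce to three non-overlapping regions, with the last one clipped to the end of the audio.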

Outputs

The output of red-transform-v1 is an audio file that has been transformed according to the domain specifications, with the goal of producing audio in which the speech remains intelligible but the identifying features of the voice that could lead to recognizing the speaker have been obscured.

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. A list of these Traits is below, along with the corresponding API messages for each. Click a message name to go to additional implementation details.

  • AUDIO_CONVERTER – Take an audio file or buffer, potentially perform some modification(s) on it, and return audio to the requestor. Depending on the purpose of the plugin, this can be the same audio untouched, a modified version of it, or completely different audio.

Compatibility

OLIVE 5.2+

Limitations

Intelligibility & Audio conditions

This plugin is very sensitive to audio conditions and can struggle to produce intelligible speech in degraded or distant microphone conditions. This is why multiple domains are provided, giving users the option of performing a less intense transformation in difficult audio conditions, where identity will already be partially obscured.

Speed

This plugin can be quite resource-intensive, which affects processing speed. There can be significant speed differences between domains (e.g. domain degraded_distant-v1 is much faster than domain clean_close-v1, because it performs less processing). The clean_close-v1 domain may run at near real-time speed, depending on the hardware used for the transformation, and the other domains should be faster from there.
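One common way to quantify "near real-time" is the real-time factor: processing time divided by audio duration, where 1.0 means the transformation takes as long as the audio itself. The helper below is a generic sketch of that calculation, not a plugin or OLIVE function, and the timings in the usage note are made-up inputs rather than measured results.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Values near 1.0 indicate near real-time processing; values below 1.0
    indicate faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

For instance, a hypothetical run that takes 30 seconds to transform 60 seconds of audio has a real-time factor of 0.5. Measuring this on your own hardware for each domain is a simple way to compare them.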

Usage

This plugin was designed to be used in concert with SRI's Nightingale UI, and the specially designed "Speaker Redaction" tools within. For instructions on using the Speaker Redaction module, refer to the Speaker Redaction documentation.