rsa-hash-commercial (Reverse Audio Search)

Version Changelog

Plugin Version	Change
v1.0.0	Initial plugin release, with OLIVE 6.1.0

Description

The goal of reverse audio search is to determine whether a given audio snippet is identical to, nearly identical to, or a sub-audio of any audio in a database, or whether a sub-audio of the test audio exists in the database. The audios to be 'indexed' are enrolled in the plugin, and a test audio is then compared against all indexed audios. The plugin uses the speaker embedding time delay neural network (TDNN) to produce a compact hash of each 10 seconds of audio in a file, and each hash is associated with a corresponding time stamp. Comparison scores range from 0 to 100, with higher scores indicating greater similarity. Only comparison scores at or above the detection threshold (90 by default) are output.

Audio comparison uses a sliding window of 10 seconds with an overlap of 5 seconds to generate an array of embeddings per audio file. The array of embeddings from one audio file is compared using cosine distance to that of another audio file. A sequence of high-scoring embeddings is used to localize detections of overlap. The timing location is output along with the comparison score, indicating the estimated timestamp of the analyzed file within or relative to the indexed audio.
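The windowing, pairwise scoring, and localization steps described above can be sketched as follows. This is an illustrative sketch only, not the plugin's implementation. The function names are hypothetical, the embeddings are assumed to be plain lists of floats produced elsewhere, trailing audio shorter than a full window is assumed to be dropped, and the mapping of cosine similarity onto the 0 to 100 score range is an assumption (the plugin's actual score calibration is not documented here).

```python
import math

WINDOW_S = 10.0  # window length stated in the description
SHIFT_S = 5.0    # a 10 s window with 5 s overlap implies a 5 s shift


def window_starts(duration_s, window_s=WINDOW_S, shift_s=SHIFT_S):
    """Start times of every full window that fits in the audio (assumption:
    a trailing partial window is dropped)."""
    starts = []
    t = 0.0
    while t + window_s <= duration_s:
        starts.append(t)
        t += shift_s
    return starts


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_matrix(test_embs, index_embs):
    """Pairwise scores, scaled to 0..100 (hypothetical scaling)."""
    return [
        [max(0.0, cosine_similarity(t, i)) * 100.0 for i in index_embs]
        for t in test_embs
    ]


def diagonal_runs(scores, threshold=90.0, min_len=2):
    """Find runs of consecutive windows scoring >= threshold along each diagonal.

    A diagonal corresponds to a fixed time offset between the two files.
    Returns (test_start_idx, index_start_idx, run_length) tuples.
    """
    n_test = len(scores)
    n_index = len(scores[0]) if scores else 0
    runs = []
    for offset in range(-(n_test - 1), n_index):
        i = max(0, -offset)
        j = i + offset
        run_start = None
        while i < n_test and j < n_index:
            if scores[i][j] >= threshold:
                if run_start is None:
                    run_start = (i, j)
            elif run_start is not None:
                if i - run_start[0] >= min_len:
                    runs.append((run_start[0], run_start[1], i - run_start[0]))
                run_start = None
            i += 1
            j += 1
        # Close a run that reaches the end of the diagonal.
        if run_start is not None and i - run_start[0] >= min_len:
            runs.append((run_start[0], run_start[1], i - run_start[0]))
    return runs


# Usage: three test windows that match index windows 1..3.
scores = [[10.0] * 4 for _ in range(3)]
for k in range(3):
    scores[k][k + 1] = 95.0
for test_idx, index_idx, length in diagonal_runs(scores):
    # Convert window indices back to seconds using the 5 s shift.
    print(test_idx * SHIFT_S, index_idx * SHIFT_S, length)
```

Searching along diagonals is one simple way to require that matching windows appear in the same order and spacing in both files; other localization strategies are equally plausible.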

Domains

audio-v1
- A general, all-purpose domain used for standard audio comparisons. The internal embedding representation has been exposed to background noise, reverb, compression, and music, with the aim of providing some robustness to these artifacts during comparisons.

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. These Traits are listed below, along with the corresponding API messages for each.

REGION_SCORER – Score all submitted audio, returning regions within the submitted audio that overlap with indexed audio files, as well as the comparison score.
- RegionScorerRequest
CLASS_MODIFIER – Enroll, or 'index', new audios into the database so that they are available for reverse audio search against analysis audios.
- ClassModificationRequest
- ClassRemovalRequest

Compatibility

OLIVE 6.0+

Limitations

Known or potential limitations of the plugin are outlined below.

Detection Accuracy

The plugin utilizes a robust TDNN from our speaker recognition plugin, which inherently focuses on speech as opposed to all sounds. Speech-dense files can therefore be expected to have better detection accuracy than non-speech files; however, this has not yet been validated.

Sub-audio Detections

With a 10-second sliding window and a 5-second shift, the resolution of the comparison may not be sufficient for some tasks; however, this has not been investigated.

Computation

The length of both the indexed audios and the audio being analyzed directly impacts the speed of the analysis. In particular, comparing two long audios requires significantly more computation than comparing two short audios, or a short audio against a long one.
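A rough way to see why two long audios are costly is to count window pairs. The sketch below assumes every test window is compared against every indexed window, which matches the description of comparing embedding arrays but is not confirmed as the plugin's exact procedure. The function names are hypothetical, and trailing partial windows are assumed to be dropped.

```python
def num_windows(duration_s, window_s=10.0, shift_s=5.0):
    """Number of full 10 s windows at a 5 s shift that fit in the audio."""
    if duration_s < window_s:
        return 0
    return int((duration_s - window_s) // shift_s) + 1


def num_comparisons(test_s, index_s):
    """Embedding pairs to score, assuming an all-pairs comparison."""
    return num_windows(test_s) * num_windows(index_s)


print(num_comparisons(30, 30))      # short vs short
print(num_comparisons(30, 3600))    # short vs long
print(num_comparisons(3600, 3600))  # long vs long
```

Under these assumptions the pair count grows with the product of the two durations, so doubling the length of both files roughly quadruples the work.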

Comments

No speech detection is applied in this plugin; all audio is processed. This is not usually the case for speaker embeddings, so comparisons may exhibit some artifacts. The extent of this possibility has not been explored.

Global Options

The following options are available to this plugin and are adjustable in the plugin's configuration file, plugin_config.py.

Option Name	Description	Default	Expected Range
threshold	Detection threshold. A higher value results in fewer detections being output, but of higher reliability.	90.0	0.0 to 100.0
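The effect of the threshold option can be illustrated with a small filter. This is a sketch of the documented behavior (scores at or above the threshold are output), not the plugin's code. The function name and the dictionary layout of a detection are hypothetical, and the sketch does not show the syntax of plugin_config.py.

```python
def filter_detections(detections, threshold=90.0):
    """Keep detections whose score is at or above the threshold.

    `detections` is a list of dicts with a "score" key (hypothetical layout).
    """
    if not 0.0 <= threshold <= 100.0:
        raise ValueError("threshold must be within 0.0 to 100.0")
    return [d for d in detections if d["score"] >= threshold]


candidates = [
    {"indexed_audio": "a.wav", "score": 97.2},
    {"indexed_audio": "b.wav", "score": 90.0},
    {"indexed_audio": "c.wav", "score": 74.5},
]
print(len(filter_detections(candidates)))        # default threshold of 90.0
print(len(filter_detections(candidates, 95.0)))  # stricter threshold
```

Raising the threshold trades recall for reliability: fewer regions are reported, but those that remain scored higher.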