fdv-pyEmbed-v1 (Face Recognition Video)

Version Changelog

Plugin Version	Change
v1.0.0	Initial plugin, released with OLIVE 5.3.0

Description

Face Recognition Video plugins process an input video and attempt to localize one or more faces within it, both in space (within the video frame) and in time. For each detected face, the plugin outputs a 'bounding box' highlighting the face, along with a confidence score indicating how likely the box is to be a face, and the start and end times of the region in which it appears.

Unlike Face Detection plugins, a Face Recognition plugin only looks for the faces of enrolled persons of interest. Detected faces that the system is not confident belong to one of the enrolled targets are not reported.
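The difference between detection and recognition can be illustrated with a small filter. This is a minimal sketch, not the plugin's implementation: it assumes each detected face has already been compared against every enrolled person, producing a score per enrolled label, and it uses a hypothetical fixed `threshold` in place of the plugin's internal, undocumented decision rule. The function name `recognize` is also illustrative.

```python
from typing import Dict, List, Tuple


def recognize(faces: List[Dict[str, float]],
              threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Report only faces confidently matched to an enrolled person.

    `faces` holds one dict per detected face, mapping each enrolled label
    to a match score (a hypothetical structure). `threshold` is an
    illustrative value, not the plugin's actual decision rule.
    """
    reported = []
    for scores in faces:
        if not scores:
            continue  # nothing enrolled to compare against
        label, best = max(scores.items(), key=lambda item: item[1])
        if best >= threshold:
            reported.append((label, best))
        # Otherwise the face was found but is not reported: a detection
        # plugin would still output it, a recognition plugin does not.
    return reported


faces = [
    {"Marcus": 0.55, "Alice": 0.10},  # confident match to Marcus
    {"Marcus": 0.12, "Alice": 0.30},  # a face, but no confident match
]
print(recognize(faces))  # [('Marcus', 0.55)]
```

Both entries are faces, but only the first is reported, which is the behavior described above for faces not confidently matched to an enrolled target.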

Domains

multi-v1
- A general purpose video processing domain.

Inputs

A video file to process.

Enrollments

Face Recognition plugins support class modifications. A class modification enrolls a class using one or more samples of that class's representation; in this case, an image of a person's face. The first class modification for a label creates a new enrollment from an image of the person of interest. Subsequent class modification requests that add more images under the same class label augment that enrollment.

Note that currently only images can be used for face enrollment requests; it is not yet possible to enroll faces via video.
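The enrollment rules above (the first sample creates a class, later samples with the same label augment it, and only images are accepted) can be modeled with a small in-memory store. This is a hypothetical sketch of the semantics only: `EnrollmentStore`, `add_sample`, and the accepted extension list are illustrative assumptions, not the OLIVE API, which handles enrollment through its own request messages.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Illustrative set of accepted image extensions (an assumption, not the
# plugin's actual list).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


class EnrollmentStore:
    """Hypothetical in-memory model of class-modification semantics."""

    def __init__(self) -> None:
        self._classes: Dict[str, List[str]] = defaultdict(list)

    def add_sample(self, label: str, image_path: str) -> None:
        # Video enrollment is not supported, so reject non-image files.
        if Path(image_path).suffix.lower() not in IMAGE_EXTENSIONS:
            raise ValueError(f"{image_path}: only images can be enrolled")
        # The first sample for a label creates the enrollment; later
        # samples with the same label augment it.
        self._classes[label].append(image_path)

    def samples(self, label: str) -> List[str]:
        # .get() avoids creating an empty entry for unknown labels.
        return list(self._classes.get(label, []))


store = EnrollmentStore()
store.add_sample("Marcus", "marcus_front.jpg")    # creates the enrollment
store.add_sample("Marcus", "marcus_profile.png")  # augments it
print(store.samples("Marcus"))  # ['marcus_front.jpg', 'marcus_profile.png']
```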

Outputs

Face Recognition Video plugins are 'bounding box' scorers. The output of a video-processing bounding box scorer is a class, a corresponding score, start and end timestamps denoting when the bounding box is valid, and four coordinates (two corner points) associated with this class, time, and score that localize the detected class (in this case, a face) within the video frame.

That output looks like this:

    <file> <class> <score> (<x1>, <y1>, <x2>, <y2>) (<start_seconds>, <end_seconds>)

Where the bounding box itself is defined by the four coordinates grouped in parentheses:

    (Upper Left: x1, y1    |    Lower Right: x2, y2)

The final parenthesized pair gives the start and end timestamps, in seconds.

An example output could look like this:

    test-videos/input_video.mp4 Marcus 0.5474257349967957 (154, 78, 657, 745) (978.31, 978.84)
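For downstream tooling, lines in this format can be parsed with a regular expression. The following is a minimal Python sketch that assumes file paths and class labels contain no whitespace and that box coordinates are integers, as in the example above; `parse_line` and `FaceRegion` are illustrative names, not part of any OLIVE client library.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# <file> <class> <score> (<x1>, <y1>, <x2>, <y2>) (<start_seconds>, <end_seconds>)
# Assumes no whitespace in file paths or labels, and integer coordinates.
LINE_RE = re.compile(
    r"^(?P<file>\S+)\s+(?P<label>\S+)\s+(?P<score>[-+.\deE]+)\s+"
    r"\((?P<x1>-?\d+),\s*(?P<y1>-?\d+),\s*(?P<x2>-?\d+),\s*(?P<y2>-?\d+)\)\s+"
    r"\((?P<start>[\d.]+),\s*(?P<end>[\d.]+)\)$"
)


@dataclass
class FaceRegion:
    file: str
    label: str
    score: float
    box: Tuple[int, int, int, int]  # upper-left (x1, y1), lower-right (x2, y2)
    start: float  # seconds
    end: float    # seconds

    @property
    def width(self) -> int:
        return self.box[2] - self.box[0]

    @property
    def height(self) -> int:
        return self.box[3] - self.box[1]


def parse_line(line: str) -> FaceRegion:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized output line: {line!r}")
    x1, y1, x2, y2 = (int(m.group(k)) for k in ("x1", "y1", "x2", "y2"))
    return FaceRegion(m.group("file"), m.group("label"), float(m.group("score")),
                      (x1, y1, x2, y2), float(m.group("start")), float(m.group("end")))


region = parse_line("test-videos/input_video.mp4 Marcus 0.5474257349967957 "
                    "(154, 78, 657, 745) (978.31, 978.84)")
print(region.label, region.width, region.height)  # Marcus 503 667
```

Raising `ValueError` on unrecognized lines, rather than skipping them, makes format drift visible early; swap in a skip-and-log policy if partial results are preferable.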

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. These Traits are listed below, along with the corresponding API messages for each.

BOUNDING_BOX_SCORER (NOTE: Coming Soon) – Score all submitted images or videos, returning labeled bounding box regions within the image frames, or labeled bounding box regions with an associated start and end time region if scoring video files.
- BoundingBoxScorerRequest (NOTE: Coming Soon)

Compatibility

OLIVE 5.3+

Limitations

Because video processing is resource-intensive, this plugin has a few limitations and behaviors that should be considered.

Large Video Files

When a video file is opened and decoded into individual frames in memory, its size can expand considerably. Because of this expansion, take care to minimize other overheads when processing video files; for example, submit video files for scoring via a file path rather than as a serialized buffer whenever possible. Keep expectations realistic when processing large video files with limited available memory. Plan on making at least 16 GB of memory available for video processing, and more for larger files.
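A back-of-the-envelope calculation shows why decoded video grows so quickly. This sketch assumes uncompressed 8-bit RGB frames (3 bytes per pixel) at the 640 x 480 processing resolution and 4 analysed frames per second described in this document, and it assumes, purely for illustration, that every analysed frame is held in memory at once. It counts frame storage only; the plugin's real footprint also includes models and intermediate buffers and may differ.

```python
GIB = 1024 ** 3


def frame_bytes(width: int = 640, height: int = 480, channels: int = 3) -> int:
    """Bytes for one decoded frame, assuming 8-bit samples (1 byte each)."""
    return width * height * channels


def decoded_bytes(duration_s: float, fps: float = 4.0, **frame) -> int:
    """Bytes if every analysed frame were held in memory at once.

    Frame storage only; ignores models, intermediate buffers and any
    streaming the plugin may actually do.
    """
    n_frames = int(duration_s * fps)
    return n_frames * frame_bytes(**frame)


print(frame_bytes())                        # 921600
print(round(decoded_bytes(3600) / GIB, 2))  # 12.36 (one hour, 640x480, 4 fps)
# The same hour at full HD and 30 fps, for comparison:
print(round(decoded_bytes(3600, fps=30, width=1920, height=1080) / GIB))  # 626
```

The comparison line also illustrates why the resolution scaling and reduced frame rate described in the following sections matter.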

Resolution Scaling

The current OLIVE video processing plugins do not process video at full resolution: as each video file is opened, it is rescaled to 640 x 480 pixels and processed at that size. Our internal testing has shown that this does not significantly degrade the performance of these plugins, while it drastically reduces the required memory resources and improves processing capability as a result. Note that the original aspect ratio is currently not preserved, so files with a very wide, very square, or portrait-orientation aspect ratio may not be processed exactly as expected, since every frame is scaled to exactly 640 x 480.
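Because the resize ignores aspect ratio, the horizontal and vertical scale factors differ whenever the source is not 4:3, which distorts faces. The following minimal sketch quantifies that distortion; it assumes a plain independent per-axis resize to 640 x 480, and the function names are illustrative. It says nothing about which coordinate space the plugin reports bounding boxes in.

```python
TARGET_W, TARGET_H = 640, 480  # fixed processing resolution described above


def resize_factors(orig_w: int, orig_h: int) -> tuple:
    """Horizontal and vertical scale factors for a resize to 640 x 480."""
    return TARGET_W / orig_w, TARGET_H / orig_h


def vertical_stretch(orig_w: int, orig_h: int) -> float:
    """How much taller content looks after resizing (1.0 = undistorted).

    Values above 1.0 (e.g. widescreen input) stretch faces vertically;
    values below 1.0 (e.g. portrait input) squash them.
    """
    sx, sy = resize_factors(orig_w, orig_h)
    return sy / sx


def resized_box_size(face_w: float, face_h: float, orig_w: int, orig_h: int) -> tuple:
    """Size that a face_w x face_h region (original pixels) has after resizing."""
    sx, sy = resize_factors(orig_w, orig_h)
    return face_w * sx, face_h * sy


for w, h in [(640, 480), (1920, 1080), (1080, 1920)]:
    print(f"{w}x{h}: stretch {vertical_stretch(w, h):.3f}")
# 640x480: stretch 1.000
# 1920x1080: stretch 1.333
# 1080x1920: stretch 0.422
```

For example, `resized_box_size(300, 300, 1920, 1080)` turns a square 300 x 300 face in a 1080p frame into roughly 100 x 133 pixels in the processed frame.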

Frame Rate (vs Temporal Resolution)

Processing every individual frame at a video's native frame rate is enormously expensive. To improve processing speed and reduce resource requirements, these plugins currently process 4 frames per second. This limits the precision of the start and end timestamps for face regions, and makes it possible, though unlikely, for faces that appear and disappear very quickly to be missed.
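At 4 frames per second, analysed frames are 0.25 s apart, so a face visible for less than that interval can fall entirely between two analysed frames. This minimal sketch assumes frames are analysed at t = 0, 0.25, 0.5, and so on; the plugin's actual sampling offset is not documented, and `sampled` is an illustrative name.

```python
import math

SAMPLE_FPS = 4.0  # analysed frames per second, as stated above


def sampled(appear_s: float, disappear_s: float, fps: float = SAMPLE_FPS) -> bool:
    """True if an analysed frame falls within [appear_s, disappear_s].

    Assumes frames are analysed at t = 0, 1/fps, 2/fps, ...; the plugin's
    actual sampling offset is not documented.
    """
    # Earliest analysed frame at or after the moment the face appears.
    first_sample = math.ceil(appear_s * fps) / fps
    return first_sample <= disappear_s


print(1 / SAMPLE_FPS)        # 0.25 (seconds between analysed frames)
print(sampled(1.05, 1.20))   # False: visible 0.15 s, between frames at 1.00 and 1.25
print(sampled(1.05, 1.30))   # True: the frame at 1.25 s sees it
```

Under this assumption, any face visible for at least 0.25 s is always seen by at least one analysed frame, and reported start and end times can only fall on the 0.25 s grid.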

Comments

Global Options

This plugin does not currently have user-configurable options, though some performance tweaks and configuration changes are possible. If this plugin does not perform adequately for your data conditions, or you have a specific use case, please contact SRI to discuss how the plugin can be tuned for optimal performance on your data.