odv-fixedClass-commercial (Object Detection Video)

Version Changelog

Plugin Version | Change
v1.0.0 | Initial plugin, released with OLIVE 6.1.0

Description

Object Detection Video plugins process an input video and attempt to localize one or more instances of known object classes within frames. For each detected object, the plugin outputs a 'bounding box' around the object, a confidence score indicating how likely the box is to contain an instance of that object class, and an associated start and end time region.

Domains

  • 80objects-v1
    • A general purpose video object detection domain capable of detecting 80 object classes: 'aeroplane', 'apple', 'backpack', 'banana', 'baseball bat', 'baseball glove', 'bear', 'bed', 'bench', 'bicycle', 'bird', 'boat', 'book', 'bottle', 'bowl', 'broccoli', 'bus', 'cake', 'car', 'carrot', 'cat', 'cell phone', 'chair', 'clock', 'cow', 'cup', 'diningtable', 'dog', 'donut', 'elephant', 'fire hydrant', 'fork', 'frisbee', 'giraffe', 'hair drier', 'handbag', 'horse', 'hot dog', 'keyboard', 'kite', 'knife', 'laptop', 'microwave', 'motorbike', 'mouse', 'orange', 'oven', 'parking meter', 'person', 'pizza', 'pottedplant', 'refrigerator', 'remote', 'sandwich', 'scissors', 'sheep', 'sink', 'skateboard', 'skis', 'snowboard', 'sofa', 'spoon', 'sports ball', 'stop sign', 'suitcase', 'surfboard', 'teddy bear', 'tennis racket', 'tie', 'toaster', 'toilet', 'toothbrush', 'traffic light', 'train', 'truck', 'tvmonitor', 'umbrella', 'vase', 'wine glass', 'zebra'.

Inputs

A video file to process.

Outputs

Object Detection Video plugins are 'bounding box' scorers. The output of a bounding box scorer is an object class, a corresponding score, and four coordinates (two corner points) that attempt to localize the detected object within the video frame. For video input, each detection also carries a start and end time in seconds.

That output looks like this:

    <file> <obj class> <score> (<x1>, <y1>, <x2>, <y2>)  (<start_seconds>, <end_seconds>)

Where the bounding box itself is defined by the four coordinates in parentheses:

    (Upper Left: x1, y1    |    Lower Right: x2, y2)

An example output could look like this:

    input_video.mp4 cat 0.9974257349967957 (154, 78, 657, 745) (978.31, 978.84)
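As an illustration of consuming this output, the following is a minimal parsing sketch. It assumes the textual layout shown above, a file name without whitespace, and numeric box coordinates; the `Detection` type and `parse_detection` helper are hypothetical names introduced here and are not part of the OLIVE API.

```python
import re
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one parsed output line."""
    file: str
    obj_class: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    start_seconds: float
    end_seconds: float


# Class names may contain spaces (e.g. "cell phone"), so the class is matched
# lazily and the numeric score that follows it anchors the split.
_LINE = re.compile(
    r"^(?P<file>\S+)\s+(?P<cls>.+?)\s+"
    r"(?P<score>\d*\.?\d+(?:[eE][-+]?\d+)?)\s+"
    r"\((?P<box>[^)]*)\)\s+\((?P<span>[^)]*)\)\s*$"
)


def parse_detection(line: str) -> Detection:
    """Parse one line of the assumed text format into a Detection."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized detection line: {line!r}")
    x1, y1, x2, y2 = (float(v) for v in m.group("box").split(","))
    start, end = (float(v) for v in m.group("span").split(","))
    return Detection(m.group("file"), m.group("cls"),
                     float(m.group("score")), (x1, y1, x2, y2), start, end)


det = parse_detection(
    "input_video.mp4 cat 0.9974257349967957 (154, 78, 657, 745) (978.31, 978.84)"
)
```

Here `det.obj_class` is the class label, `det.box` holds the upper-left and lower-right corners, and `det.start_seconds` / `det.end_seconds` give the time region.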

Functionality (Traits)

The functions of this plugin are defined by its Traits and implemented API messages. A list of these Traits is below, along with the corresponding API messages for each.

  • BOUNDING_BOX_SCORER – Score all submitted images or videos, returning labeled bounding box regions within the image frames, or labeled bounding box regions with an associated start and end time region if scoring video files.

Compatibility

OLIVE 6.1+

Limitations

The plugin currently does not support enrollment of new object classes.

Because video processing is resource-intensive, this plugin has a few limitations and behaviors that need to be considered.

Large Video Files

When a video file is opened and decoded into individual frames in memory, it can expand considerably in size. Because of this expansion, take care to minimize other overhead when processing video files, for example by submitting video files for scoring via a file path instead of as a serialized buffer whenever possible. Keep expectations realistic when processing large video files with limited available memory. Please plan on making a minimum of 16GB of memory available for video processing, and ideally more for larger files.
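To give a rough sense of the expansion, the sketch below estimates the raw size of decoded frames. It assumes uncompressed 8-bit, 3-channel frames that are all held in memory at once; how the plugin actually buffers frames is not stated here, so treat this as a back-of-the-envelope illustration only. The `decoded_bytes` helper is a hypothetical name.

```python
def decoded_bytes(width, height, n_frames, channels=3, bytes_per_channel=1):
    """Raw size in bytes of n_frames uncompressed frames (assumed layout)."""
    return width * height * channels * bytes_per_channel * n_frames


# A 10-minute 1920x1080 clip at 30 fps, fully decoded at native size:
full = decoded_bytes(1920, 1080, n_frames=10 * 60 * 30)

# The same clip if only one 640x480 frame per second were kept:
reduced = decoded_bytes(640, 480, n_frames=10 * 60)

print(full, reduced)
```

Under these assumptions the fully decoded clip is about 112 GB of raw pixel data, while the reduced version is about 0.55 GB, which illustrates why compressed file size is a poor guide to memory needs.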

Resolution Scaling

The current crop of OLIVE video processing plugins do not process video at full resolution - as the video files are opened, they are rescaled to 640 x 480 pixel resolution, and processed at this size. Our internal testing has shown this does not significantly degrade performance with these plugins, but drastically reduces required memory resources and improves our processing capabilities as a result. Note that there is currently no retention of the original aspect ratio, so some files, such as those with a very wide, very square, or portrait-orientation aspect ratio may not be processed exactly as expected due to scaling to 640 x 480 exactly.
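The effect of rescaling without preserving aspect ratio can be sketched with a little arithmetic. The helpers below are hypothetical and only illustrate the scaling described above; they make no claim about which coordinate space the plugin's reported bounding boxes use.

```python
TARGET_W, TARGET_H = 640, 480  # processing resolution described above


def scale_factors(src_w, src_h):
    """Independent horizontal and vertical scale factors."""
    return TARGET_W / src_w, TARGET_H / src_h


def aspect_distortion(src_w, src_h):
    """Ratio of horizontal to vertical scaling; 1.0 means no distortion."""
    sx, sy = scale_factors(src_w, src_h)
    return sx / sy


for w, h in [(640, 480), (1920, 1080), (1080, 1920)]:
    print((w, h), round(aspect_distortion(w, h), 3))
```

A 4:3 source gives a ratio of 1.0 (no distortion), a 1920 x 1080 source gives 0.75 (objects are squeezed horizontally relative to their height), and a portrait 1080 x 1920 source gives a ratio above 2 (objects are stretched horizontally).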

Frame Rate (vs Temporal Resolution)

Processing every individual frame at the video's native frame rate is enormously expensive. To improve processing speed and reduce resource requirements, plugins currently process 1 frame per second. This limits the precision of the start and end timestamps for object regions, and makes it possible, though unlikely, for objects that appear and disappear very quickly to be missed.
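The following sketch illustrates how a short-lived object can fall between sampled frames. It assumes frames are sampled at exact multiples of the interval starting from time zero, which is an assumption made for illustration and not a documented detail of the plugin. The `is_sampled` helper is a hypothetical name.

```python
import math


def is_sampled(appear, disappear, frame_interval=1.0):
    """True if at least one sampled timestamp lies in [appear, disappear].

    Assumes sample times are 0, frame_interval, 2 * frame_interval, ...
    """
    first_sample_at_or_after = math.ceil(appear / frame_interval) * frame_interval
    return first_sample_at_or_after <= disappear


print(is_sampled(10.2, 10.7))        # visible for 0.5 s between two samples
print(is_sampled(10.2, 11.3))        # spans the sample at t = 11 s
print(is_sampled(10.2, 10.7, 0.5))   # a finer interval catches it at t = 10.5 s
```

Under this assumption, any object visible for at least one full interval is always covered by a sample, while shorter appearances are caught only if they happen to overlap a sample time.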

Comments

GPU Support

Please refer to the OLIVE GPU Installation and Support documentation page for instructions on how to enable and configure GPU capability in supported plugins. By default this plugin will run on CPU only.

Bounding Box Options

The following region scoring options are available to this plugin and are adjustable in the plugin's configuration file, plugin_config.py.

Option Name | Description | Default | Expected Range
threshold | Threshold for video object detection. The higher the threshold is, the fewer objects will be returned | 0.0 | 0.0 - 1.0
frame_interval | Time in seconds between consecutive frames to be processed. | 1.0 | 0.1 - 10.0
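Before editing these values, it can help to check candidate settings against the documented defaults and ranges. The sketch below does that in plain Python; the dictionaries and the `validate_options` helper are hypothetical and do not reflect the actual layout of plugin_config.py.

```python
# Defaults and expected ranges copied from the options table above.
DEFAULTS = {"threshold": 0.0, "frame_interval": 1.0}
OPTION_RANGES = {
    "threshold": (0.0, 1.0),
    "frame_interval": (0.1, 10.0),
}


def validate_options(overrides):
    """Merge overrides into the defaults, rejecting unknown or out-of-range values."""
    options = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in OPTION_RANGES:
            raise KeyError(f"unknown option: {name}")
        low, high = OPTION_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} - {high}")
        options[name] = value
    return options


options = validate_options({"threshold": 0.5})
print(options)
```

Raising `threshold` returns fewer, higher-confidence detections, while lowering `frame_interval` samples frames more densely at the cost of more processing.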

If this plugin does not perform adequately for your data conditions, or you have a specific use case, please get in touch with SRI to discuss how the plugin can be tuned for optimal performance on your data.