sum-img-llm-commercial (Summarization of image inputs)
Version Changelog
| Plugin Version | Change |
|---|---|
| v1.0.0 | Initial plugin release with OLIVE 6.1.0. |
Description
This Image Summarization plugin creates a short summary of the provided image input(s). It transforms raw image inputs into a short paragraph, with the aim of preserving the key information contained within the input image.
This plugin performs this task using a large language model (LLM) external to the plugin, either run in OLIVE or hosted separately (advanced users).
Domains
- multi-v1 – Uses an LLM to create a summary of the input image. Accuracy and language support vary between LLMs.
Inputs
An image file to process.
Outputs
The output format for Summarization plugins is simply text.
Functionality (Traits)
The functions of this plugin are defined by its Traits and implemented API messages. A list of these Traits is below, along with the corresponding API messages for each. Click a message name to be brought to additional implementation details.
- CompositeScorer – Plugin accepts and analyzes one or multiple inputs, and outputs a combination of known Scorer result types as output. In the case of the image summarization plugin, images are provided as inputs, with the expectation that the output will be text strings with concise summaries/descriptions of the input images.
Compatibility
OLIVE 6.1+
Limitations
This plugin is based on an LLM. Its performance therefore depends critically on the LLM's performance on the task, particularly for low-resource languages. There is often a correlation between the number of parameters of an LLM and its performance on complex tasks, so larger LLMs tend to perform better. However, they also tend to use more resources and process more slowly on the same hardware.
We tested this plugin using Google's Gemma-3-4B-it-qat-q4_0-gguf and Gemma-3-12B-it-qat-q4_0-gguf. Performance with other LLMs may vary.
Comments
Large Language Model (LLM) Required
This plugin relies on a Large Language Model (LLM) for the heavy lifting of its task. This plugin can only be used with an appropriately configured OLIVE server that has been started with the LLM server active. See the LLM Configuration Documentation for more information, and refer to the Martini documentation to make sure the appropriate startup procedure is followed.
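As a quick sanity check before running the plugin, a minimal sketch such as the following can verify that an OpenAI-API-compatible LLM endpoint is reachable. The URL shown is this plugin's default llm_base_url; adjust it to match your OLIVE LLM server configuration.

```python
import requests

# Default LLM endpoint used by this plugin (see plugin_config.py);
# change this if your OLIVE LLM server is configured differently.
LLM_BASE_URL = "http://127.0.0.1:5007/v1"

# OpenAI-API-compatible servers (llama-server, vLLM, hosted OpenAI) expose
# GET /models, which lists the models the server is currently serving.
resp = requests.get(f"{LLM_BASE_URL}/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json().get("data", [])])
```

If this request fails, the OLIVE server was likely not started with the LLM server active; revisit the LLM Configuration and Martini startup documentation.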
GPU Support
Please refer to the OLIVE GPU Installation and Support documentation page for instructions on how to enable and configure GPU capability in supported plugins. This plugin itself runs on CPU only; however, the LLM used in OLIVE may use a GPU by default.
Text Transformation Options
The following options are available to this plugin, adjustable in the plugin's configuration file, plugin_config.py.
| Option Name | Description | Default | Expected Range |
|---|---|---|---|
| llm_base_url | LLM base URL of an OpenAI API compatible LLM server (such as llama-server, vLLM, or hosted LLMs such as OpenAI). | http://127.0.0.1:5007/v1 | |
| model | Name of the LLM model to use. | gemma3-4b | |
| api_key | API key/token to use for the LLM server, if required. | token | |
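For reference, the sketch below illustrates how these options map onto an OpenAI-API-compatible request for an image summary. It is not the plugin's internal implementation: the prompt text, the example image filename, and the use of the openai Python client are illustrative assumptions, while the three option values are the plugin defaults from the table above.

```python
import base64
from openai import OpenAI  # any OpenAI-API-compatible client works

# Values mirroring the plugin_config.py defaults listed above.
llm_base_url = "http://127.0.0.1:5007/v1"
model = "gemma3-4b"
api_key = "token"

client = OpenAI(base_url=llm_base_url, api_key=api_key)

# Encode the input image as a data URL, as expected by the chat completions API.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

# Illustrative prompt only; the plugin's actual prompt is internal to the plugin.
response = client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this image in one short paragraph."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```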
If you find that this plugin does not perform adequately for your data conditions, or you have a specific use case, please get in touch with SRI to discuss how the plugin can be tuned for optimal performance on your data.