Multi-image uploads are ignored
Describe the bug
When a user uploads multiple images along with a text prompt, the pipeline only processes the very first image, ignoring all subsequent ones. The model's response is based solely on the content of the first image.
This issue occurs when the selected model is identified by the pipeline as being capable of image generation (e.g., gemini-2.5-flash-image-preview, or custom models whose names trigger the image generation logic).
Steps to reproduce
Steps to reproduce the behavior:
- Go to Open WebUI and select a model that the pipeline treats as an image generation model.
- Start a new chat.
- Upload two or more different images (e.g., a cat and a dog).
- In the text prompt, ask a question that requires knowledge of all images, such as: "Describe the animals in all of these pictures and compare them."
- Submit the prompt.
- Observe that the model's response only references the first image uploaded (the cat).
Expected behavior
The pipeline should process all uploaded images and include them in the payload sent to the Gemini API. The model's response should reflect an understanding of all the images provided in the prompt.
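For illustration, a request that preserves every uploaded image would carry one part per image alongside the text prompt. The sketch below is a rough approximation only; the function name, the hard-coded MIME type, and the exact field casing are assumptions, not the pipeline's actual code:

```python
import base64


def build_multi_image_payload(prompt: str, image_paths: list[str]) -> dict:
    """Hypothetical sketch: a Gemini generateContent body with one part per image."""
    parts = [{"text": prompt}]
    for path in image_paths:
        with open(path, "rb") as f:
            parts.append({
                "inline_data": {  # every image becomes its own part; none are dropped
                    "mime_type": "image/jpeg",  # assumption: real code would detect this per file
                    "data": base64.b64encode(f.read()).decode("utf-8"),
                }
            })
    return {"contents": [{"role": "user", "parts": parts}]}
```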
Thank you for your excellent work on this powerful tool.
Environment
- Open WebUI Version: v0.6.28
- Google Gemini Pipeline Version: 1.5.2
- Model(s) used: Any model for which _check_image_generation_support returns true, including gemini-2.5-flash-image-preview and potentially custom-named models.
Additional context
Root Cause Analysis
The issue stems from two distinct processing paths within the pipe method:
- General Multimodal Logic: This path, handled by _prepare_content, correctly processes the full chat history and can handle multiple images in a single turn. It iterates through all content parts and builds a comprehensive contents array.
- Image Generation/Editing Logic: This path is triggered when _check_image_generation_support(model_id) returns true. It is highly specialized and assumes a text-to-image or single image-to-image task: it intentionally ignores the conversation history and calls the _find_image helper function to locate a base image for editing.
The core problem is the _find_image function. It is designed to search the messages and return as soon as it finds the first image. As a result, even if multiple images are present in the user's prompt, only the first one is ever returned and added to the API request payload. All other images are discarded.
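For reference, the first-match pattern described above looks roughly like the sketch below, alongside a collect-all variant that would keep every image. This is an approximation based on the behavior described in this report, not the pipeline's actual code, and the OpenAI-style message shape is an assumption:

```python
def find_first_image(messages: list[dict]) -> str | None:
    """Approximates _find_image's described behavior: stop at the first image found."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") == "image_url":
                # Returns here, so any later images never reach the API payload.
                return part["image_url"]["url"]
    return None


def find_all_images(messages: list[dict]) -> list[str]:
    """Collects every image so the image-generation path could include them all."""
    images = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") == "image_url":
                images.append(part["image_url"]["url"])
    return images
```

A fix along these lines would have the image generation/editing path collect all of the images and append each one to the request parts, rather than stopping at the first match.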
This behavior is incorrect for multimodal queries that involve multiple images but are directed at a model that happens to also have image generation capabilities.