
Commit 8f45c79

badmonster0 authored and georgeh0 committed
Move function to functions.py and use multi-vector search
Move example to image_search/colpali_main.py
1 parent fc964a8 commit 8f45c79

21 files changed: +303 −1765 lines

examples/image_search/README.md

Lines changed: 46 additions & 8 deletions
@@ -1,19 +1,33 @@
 # Image Search with CocoIndex
 [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
 
-We will build live image search and query it with natural language, using multimodal embedding model. We are going use CocoIndex to build real-time indexing flow. During running, you can add new files to the folder and it only process changed files and will be indexed within a minute.
+We will build live image search and query it with natural language, using multimodal embedding models. We use CocoIndex to build real-time indexing flow. During running, you can add new files to the folder and it only processes changed files, indexing them within a minute.
 
 We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
 
 <img width="1105" alt="cover" src="https://github.com/user-attachments/assets/544fb80d-c085-4150-84b6-b6e62c4a12b9" />
 
+## Two Implementation Options
+
+This example provides two different image search implementations:
+
+### 1. CLIP-based Search (`main.py`)
+- **Model**: CLIP ViT-L/14 (OpenAI)
+- **Embedding**: Single-vector embeddings (768 dimensions)
+- **Search**: Standard cosine similarity
+
+### 2. ColPali-based Search (`colpali_main.py`)
+- **Model**: ColPali (Contextual Late-interaction over Patches)
+- **Embedding**: Multi-vector embeddings with late interaction
+- **Search**: MaxSim scoring for optimal patch-level matching
+- **Performance**: Better for document/text-in-image search
 
 ## Technologies
 - CocoIndex for ETL and live update
-- CLIP ViT-L/14 - Embeddings Model for images and query
-- Qdrant for Vector Storage
-- FastApi for backend
-- Ollama (Optional) for generating image captions using `gemma3`.
+- **CLIP ViT-L/14** OR **ColPali** - Multimodal embedding models
+- Qdrant for Vector Storage (with multi-vector support for ColPali)
+- FastAPI for backend
+- Ollama (Optional) for generating image captions
 
 ## Setup
 - Make sure Postgres and Qdrant are running
@@ -32,21 +46,45 @@ export OLLAMA_MODEL="gemma3" # Optional, for caption generation
 ```
 
 ## Run the App
+
+### Option 1: CLIP-based Search
 - Install dependencies:
 ```
 pip install -e .
 ```
 
-- Run Backend
+- Run CLIP Backend:
 ```
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 ```
 
-- Run Frontend
+### Option 2: ColPali-based Search
+- Install dependencies:
+```
+pip install -e .
+pip install 'cocoindex[embeddings]' # Adds ColPali and sentence-transformers support
+```
+
+- Configure model (optional):
+```sh
+export COLPALI_MODEL="vidore/colpali-v1.2" # Default model
+```
+
+- Run ColPali Backend:
+```
+uvicorn colpali_main:app --reload --host 0.0.0.0 --port 8000
+```
+
+### Frontend (same for both)
+- Run Frontend:
 ```
 cd frontend
 npm install
 npm run dev
 ```
 
-Go to `http://localhost:5174` to search.
+Go to `http://localhost:5173` to search. The frontend works with both backends identically.
+
+## Performance Notes
+- **CLIP**: Faster, good for general image-text matching
+- **ColPali**: More accurate for document images and text-heavy content, supports multi-vector late interaction for better precision
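
The single-vector versus MaxSim distinction in the README can be made concrete with a short sketch. This is illustrative only and not part of the example code; the function names, toy shapes, and random data below are ours. CLIP-style search compares one query vector to one image vector with cosine similarity, while ColPali-style late interaction compares every query-token vector to every image-patch vector, takes each token's best match, and sums the maxima.

```python
import numpy as np

def cosine_score(query_vec: np.ndarray, image_vec: np.ndarray) -> float:
    # CLIP-style: one vector per query, one per image.
    return float(
        np.dot(query_vec, image_vec)
        / (np.linalg.norm(query_vec) * np.linalg.norm(image_vec))
    )

def maxsim_score(query_tokens: np.ndarray, image_patches: np.ndarray) -> float:
    # ColPali-style late interaction: query_tokens is [n_tokens, dim],
    # image_patches is [n_patches, dim]. For each query token, take its best
    # match over all patches, then sum across tokens.
    sim = query_tokens @ image_patches.T  # [n_tokens, n_patches]
    return float(sim.max(axis=1).sum())

# Toy shapes: a 768-dim single vector pair for CLIP, and 128-dim per-token /
# per-patch vectors for ColPali (shapes are illustrative assumptions).
rng = np.random.default_rng(0)
print(cosine_score(rng.normal(size=768), rng.normal(size=768)))
print(maxsim_score(rng.normal(size=(15, 128)), rng.normal(size=(1030, 128))))
```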

examples/image_search_colpali/main.py renamed to examples/image_search/colpali_main.py

Lines changed: 26 additions & 75 deletions
@@ -1,9 +1,7 @@
 import datetime
-import functools
-import io
 import os
 from contextlib import asynccontextmanager
-from typing import Any, Literal
+from typing import Any
 
 import cocoindex
 import numpy as np
@@ -13,7 +11,6 @@
 from fastapi.staticfiles import StaticFiles
 from PIL import Image
 from qdrant_client import QdrantClient
-from colpali_engine.models import ColPali, ColPaliProcessor
 
 
 # --- Config ---
@@ -29,76 +26,24 @@
 OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434/")
 QDRANT_COLLECTION = "ImageSearchColpali"
 COLPALI_MODEL_NAME = os.getenv("COLPALI_MODEL", "vidore/colpali-v1.2")
-COLPALI_MODEL_DIMENSION = 1031  # Set to match ColPali's output
+print(f"📐 Using ColPali model {COLPALI_MODEL_NAME}")
 
-# --- ColPali model cache and embedding functions ---
-_colpali_model_cache = {}
 
+# Create ColPali embedding function using the class-based pattern
+colpali_embed = cocoindex.functions.ColPaliEmbedImage(model=COLPALI_MODEL_NAME)
 
-def get_colpali_model(model: str = COLPALI_MODEL_NAME):
-    global _colpali_model_cache
-    if model not in _colpali_model_cache:
-        print(f"Loading ColPali model: {model}")
-        _colpali_model_cache[model] = {
-            "model": ColPali.from_pretrained(model),
-            "processor": ColPaliProcessor.from_pretrained(model),
-        }
-    return _colpali_model_cache[model]["model"], _colpali_model_cache[model][
-        "processor"
-    ]
-
-
-def colpali_embed_image(
-    img_bytes: bytes, model: str = COLPALI_MODEL_NAME
-) -> list[float]:
-    from PIL import Image
-    import torch
-    import io
-
-    colpali_model, processor = get_colpali_model(model)
-    pil_image = Image.open(io.BytesIO(img_bytes)).convert("RGB")
-    inputs = processor.process_images([pil_image])
-    with torch.no_grad():
-        embeddings = colpali_model(**inputs)
-    pooled_embedding = embeddings.mean(dim=-1)
-    result = pooled_embedding[0].cpu().numpy()  # [1031]
-    return result.tolist()
-
-
-def colpali_embed_query(query: str, model: str = COLPALI_MODEL_NAME) -> list[float]:
-    import torch
-    import numpy as np
-
-    colpali_model, processor = get_colpali_model(model)
-    inputs = processor.process_queries([query])
-    with torch.no_grad():
-        embeddings = colpali_model(**inputs)
-    pooled_embedding = embeddings.mean(dim=-1)
-    query_tokens = pooled_embedding[0].cpu().numpy()  # [15]
-    target_length = COLPALI_MODEL_DIMENSION
-    result = np.zeros(target_length, dtype=np.float32)
-    result[: min(len(query_tokens), target_length)] = query_tokens[:target_length]
-    return result.tolist()
-
-
-# --- End ColPali embedding functions ---
 
-
-def embed_query(text: str) -> list[float]:
-    """
-    Embed the caption using ColPali model.
-    """
-    return colpali_embed_query(text, model=COLPALI_MODEL_NAME)
-
-
-@cocoindex.op.function(cache=True, behavior_version=1, gpu=True)
-def embed_image(
-    img_bytes: bytes,
-) -> cocoindex.Vector[cocoindex.Float32, Literal[COLPALI_MODEL_DIMENSION]]:
+@cocoindex.transform_flow()
+def text_to_colpali_embedding(
+    text: cocoindex.DataSlice[str],
+) -> cocoindex.DataSlice[list[list[float]]]:
     """
-    Convert image to embedding using ColPali model.
+    Embed text using a ColPali model, returning multi-vector format.
+    This is shared logic between indexing and querying, ensuring consistent embeddings.
     """
-    return colpali_embed_image(img_bytes, model=COLPALI_MODEL_NAME)
+    return text.transform(
+        cocoindex.functions.ColPaliEmbedQuery(model=COLPALI_MODEL_NAME)
+    )
 
 
 @cocoindex.flow_def(name="ImageObjectEmbeddingColpali")
@@ -131,7 +76,7 @@ def image_object_embedding_flow(
             ),
             image=img["content"],
         )
-        img["embedding"] = img["content"].transform(embed_image)
+        img["embedding"] = img["content"].transform(colpali_embed)
 
         collect_fields = {
             "id": cocoindex.GeneratedField.UUID,
@@ -189,24 +134,30 @@ def search(
     q: str = Query(..., description="Search query"),
    limit: int = Query(5, description="Number of results"),
 ) -> Any:
-    # Get the embedding for the query
-    query_embedding = embed_query(q)
+    # Get the multi-vector embedding for the query
+    query_embedding = text_to_colpali_embedding.eval(q)
+    print(
+        f"🔍 Query multi-vector shape: {len(query_embedding)} tokens x {len(query_embedding[0]) if query_embedding else 0} dims"
+    )
 
-    # Search in Qdrant
-    search_results = app.state.qdrant_client.search(
+    # Search in Qdrant with multi-vector MaxSim scoring using query_points API
+    search_results = app.state.qdrant_client.query_points(
         collection_name=QDRANT_COLLECTION,
-        query_vector=("embedding", query_embedding),
+        query=query_embedding,  # Multi-vector format: list[list[float]]
+        using="embedding",  # Specify the vector field name
         limit=limit,
        with_payload=True,
     )
 
+    print(f"📈 Found {len(search_results.points)} results with MaxSim scoring")
+
     return {
         "results": [
             {
                 "filename": result.payload["filename"],
                 "score": result.score,
                 "caption": result.payload.get("caption"),
             }
-            for result in search_results
+            for result in search_results.points
         ]
     }
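
For `query_points` to apply MaxSim over `list[list[float]]` embeddings, the Qdrant collection's `embedding` field has to be declared as a multi-vector field. Collection creation is not part of this diff, so the following is only a sketch of how it could be set up with qdrant-client; the collection name mirrors `QDRANT_COLLECTION` above, and the 128-dim per-token size is an assumption for the default `vidore/colpali-v1.2` model.

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hypothetical collection setup (not shown in this commit): each point stores a
# variable number of per-token vectors under the named "embedding" field, and
# Qdrant compares them against the query's token vectors with MaxSim.
client.create_collection(
    collection_name="ImageSearchColpali",
    vectors_config={
        "embedding": models.VectorParams(
            size=128,  # assumed per-token dimension for vidore/colpali-v1.2
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        )
    },
)
```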

examples/image_search/frontend/vite.config.js

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ import react from '@vitejs/plugin-react';
 export default defineConfig({
   plugins: [react()],
   server: {
+    host: true, // Allow LAN access
     port: 5173,
     open: true,
   },
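
The frontend talks to the backend's search endpoint over HTTP; for quick testing without the frontend, that endpoint can be called directly. A minimal sketch, assuming the search handler shown earlier is mounted as a GET `/search` route on port 8000 (the route decorator and app wiring are outside this diff):

```python
import requests

# Hypothetical request: the /search path is an assumption based on the handler
# above; q and limit map to the Query parameters it declares.
resp = requests.get(
    "http://localhost:8000/search",
    params={"q": "a red bicycle leaning against a wall", "limit": 5},
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(hit["score"], hit["filename"], hit.get("caption"))
```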

examples/image_search/pyproject.toml

Lines changed: 4 additions & 2 deletions
@@ -1,16 +1,18 @@
 [project]
 name = "image-search"
 version = "0.1.0"
-description = "Simple example for cocoindex: build embedding index based on images."
+description = "Image search examples for cocoindex: CLIP and ColPali-based embedding."
 requires-python = ">=3.11"
 dependencies = [
     "cocoindex>=0.1.75",
     "python-dotenv>=1.0.1",
     "fastapi>=0.100.0",
     "torch>=2.0.0",
-    "transformers>=4.29.0",
+    "transformers>=4.29.0", # For CLIP model in main.py
     "qdrant-client>=1.14.2",
     "uvicorn>=0.34.3",
+    "Pillow>=10.0.0", # For ColPali image processing
+    "numpy>=1.24.0", # For ColPali embeddings
 ]
 
 [tool.setuptools]

examples/image_search_colpali/.env

Lines changed: 0 additions & 1 deletion
This file was deleted.

examples/image_search_colpali/README.md

Lines changed: 0 additions & 71 deletions
This file was deleted.
