Commit d77ae24
Support using ColPali library to compute embedding (#796)
* Copy image_search folder to image_search_colpali as diff base
* Update ColPali image search example:
  - Add PREFER_GRPC config at the top of main.py for easy switching between gRPC (default, port 6334) and HTTP (port 6333) Qdrant connections via environment variable.
  - Frontend (App.jsx): use window.location.hostname for API and image URLs, so devices on the same LAN can access the backend and images when the frontend is served on 0.0.0.0. This enables seamless LAN access to search and image results.
* Move function to functions.py and use multi-vector search; move example to image_search/colpali_main.py
* Optimize ColPali with functools.cache and add colpali feature:
  - Use @functools.cache for model caching instead of a manual dict (see the sketch below)
  - Add 'colpali' optional dependency separate from 'embeddings'
  - Fix dimension detection and LAN access for frontend
* Clean up examples
* Clean up ColPali functions
* Add troubleshooting notice to `README.md`
* Run the model on GPU
* Use stronger types and clean up
1 parent 86083a2 commit d77ae24
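The `functools.cache` change mentioned above is not visible in this diff (it lives in cocoindex's functions module); a minimal sketch of the pattern, with names that are illustrative rather than the library's actual internals:

```python
import functools

# colpali_engine is what the new 'colpali' extra pulls in (see pyproject.toml below)
from colpali_engine.models import ColPali, ColPaliProcessor


@functools.cache  # memoizes per model name, replacing a hand-rolled {name: model} dict
def load_colpali_model(model_name: str) -> tuple[ColPali, ColPaliProcessor]:
    """Load a ColPali checkpoint and its processor once, then reuse on every call."""
    model = ColPali.from_pretrained(model_name)
    processor = ColPaliProcessor.from_pretrained(model_name)
    return model, processor
```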

File tree

8 files changed (+430, −17 lines)

examples/image_search/.env

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-COCOINDEX_DATABASE_URL="postgresql://cocoindex:cocoindex@127.0.0.1:5432/cocoindex"
+export COCOINDEX_DATABASE_URL="postgres://cocoindex:cocoindex@localhost/cocoindex"

examples/image_search/README.md

Lines changed: 63 additions & 10 deletions
@@ -1,25 +1,40 @@
 # Image Search with CocoIndex
 [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex)
 
-We will build live image search and query it with natural language, using multimodal embedding model. We are going use CocoIndex to build real-time indexing flow. During running, you can add new files to the folder and it only process changed files and will be indexed within a minute.
+We will build live image search and query it with natural language, using multimodal embedding models. We use CocoIndex to build a real-time indexing flow. While it is running, you can add new files to the folder; only changed files are processed and indexed within a minute.
 
 We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
 
 <img width="1105" alt="cover" src="https://github.com/user-attachments/assets/544fb80d-c085-4150-84b6-b6e62c4a12b9" />
 
+## Two Implementation Options
+
+This example provides two different image search implementations:
+
+### 1. CLIP-based Search (`main.py`)
+- **Model**: CLIP ViT-L/14 (OpenAI)
+- **Embedding**: Single-vector embeddings (768 dimensions)
+- **Search**: Standard cosine similarity
+
+### 2. ColPali-based Search (`colpali_main.py`)
+- **Model**: ColPali (Contextual Late-interaction over Patches)
+- **Embedding**: Multi-vector embeddings with late interaction
+- **Search**: MaxSim scoring for optimal patch-level matching
+- **Performance**: Better for document/text-in-image search
 
 ## Technologies
 - CocoIndex for ETL and live update
-- CLIP ViT-L/14 - Embeddings Model for images and query
-- Qdrant for Vector Storage
-- FastApi for backend
-- Ollama (Optional) for generating image captions using `gemma3`.
+- **CLIP ViT-L/14** OR **ColPali** - Multimodal embedding models
+- Qdrant for Vector Storage (with multi-vector support for ColPali)
+- FastAPI for backend
+- Ollama (Optional) for generating image captions
 
 ## Setup
-- Make sure Postgres and Qdrant are running
+- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
+
+- Make sure Qdrant is running
 ```
 docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
-export COCOINDEX_DATABASE_URL="postgres://cocoindex:cocoindex@localhost/cocoindex"
 ```
 
 ## (Optional) Run Ollama
@@ -32,21 +47,59 @@ export OLLAMA_MODEL="gemma3" # Optional, for caption generation
 ```
 
 ## Run the App
+
+### Option 1: CLIP-based Search
 - Install dependencies:
 ```
 pip install -e .
 ```
 
-- Run Backend
+- Run CLIP Backend:
 ```
 uvicorn main:app --reload --host 0.0.0.0 --port 8000
 ```
 
-- Run Frontend
+### Option 2: ColPali-based Search
+- Install dependencies:
+```
+pip install -e .
+pip install 'cocoindex[colpali]'  # Adds ColPali support
+```
+
+- Configure model (optional):
+```sh
+export COLPALI_MODEL="vidore/colpali-v1.2"  # Default model
+```
+
+- Run ColPali Backend:
+```
+uvicorn colpali_main:app --reload --host 0.0.0.0 --port 8000
+```
+
+Note that recent Nvidia GPUs (e.g., the RTX 5090) will not work with stable PyTorch releases up to 2.7.1.
+
+If you get this error:
+
+```
+The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90 compute_37.
+```
+
+you can install the nightly PyTorch build (see https://pytorch.org/get-started/locally/):
+
+```sh
+pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
+```
+
+### Frontend (same for both)
+- Run Frontend:
 ```
 cd frontend
 npm install
 npm run dev
 ```
 
-Go to `http://localhost:5174` to search.
+Go to `http://localhost:5173` to search. The frontend works with both backends identically.
+
+## Performance Notes
+- **CLIP**: Faster, good for general image-text matching
+- **ColPali**: More accurate for document images and text-heavy content, supports multi-vector late interaction for better precision
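The MaxSim scoring the README references works like this: for each query token vector, take its best (maximum) similarity against all image patch vectors, then sum those maxima over the query tokens. A small NumPy sketch of the scoring rule, for intuition only (Qdrant computes this server-side):

```python
import numpy as np


def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction MaxSim.

    query_vecs: (num_query_tokens, dim) -- one vector per query token.
    doc_vecs:   (num_patches, dim)      -- one vector per image patch.
    """
    sims = query_vecs @ doc_vecs.T        # (tokens, patches) similarity matrix
    return float(sims.max(axis=1).sum())  # best patch per token, summed over tokens


# Toy usage with L2-normalized random vectors (128 dims, as ColPali typically uses):
rng = np.random.default_rng(0)
q = rng.standard_normal((12, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.standard_normal((1024, 128))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```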

examples/image_search/colpali_main.py

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
+import datetime
+import os
+from contextlib import asynccontextmanager
+from typing import Any
+
+import cocoindex
+from dotenv import load_dotenv
+from fastapi import FastAPI, Query
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from qdrant_client import QdrantClient
+
+
+# --- Config ---
+
+# Use GRPC
+QDRANT_URL = os.getenv("QDRANT_URL", "localhost:6334")
+PREFER_GRPC = os.getenv("QDRANT_PREFER_GRPC", "true").lower() == "true"
+
+# Use HTTP
+# QDRANT_URL = os.getenv("QDRANT_URL", "localhost:6333")
+# PREFER_GRPC = os.getenv("QDRANT_PREFER_GRPC", "false").lower() == "true"
+
+OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434/")
+QDRANT_COLLECTION = "ImageSearchColpali"
+COLPALI_MODEL_NAME = os.getenv("COLPALI_MODEL", "vidore/colpali-v1.2")
+print(f"📐 Using ColPali model {COLPALI_MODEL_NAME}")
+
+
+# Create ColPali embedding function using the class-based pattern
+colpali_embed = cocoindex.functions.ColPaliEmbedImage(model=COLPALI_MODEL_NAME)
+
+
+@cocoindex.transform_flow()
+def text_to_colpali_embedding(
+    text: cocoindex.DataSlice[str],
+) -> cocoindex.DataSlice[list[list[float]]]:
+    """
+    Embed text using a ColPali model, returning multi-vector format.
+    This is shared logic between indexing and querying, ensuring consistent embeddings.
+    """
+    return text.transform(
+        cocoindex.functions.ColPaliEmbedQuery(model=COLPALI_MODEL_NAME)
+    )
+
+
+@cocoindex.flow_def(name="ImageObjectEmbeddingColpali")
+def image_object_embedding_flow(
+    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
+) -> None:
+    data_scope["images"] = flow_builder.add_source(
+        cocoindex.sources.LocalFile(
+            path="img", included_patterns=["*.jpg", "*.jpeg", "*.png"], binary=True
+        ),
+        refresh_interval=datetime.timedelta(minutes=1),
+    )
+    img_embeddings = data_scope.add_collector()
+    with data_scope["images"].row() as img:
+        ollama_model_name = os.getenv("OLLAMA_MODEL")
+        if ollama_model_name is not None:
+            # If an Ollama model is specified, generate an image caption
+            img["caption"] = flow_builder.transform(
+                cocoindex.functions.ExtractByLlm(
+                    llm_spec=cocoindex.llm.LlmSpec(
+                        api_type=cocoindex.LlmApiType.OLLAMA, model=ollama_model_name
+                    ),
+                    instruction=(
+                        "Describe the image in one detailed sentence. "
+                        "Name all visible animal species, objects, and the main scene. "
+                        "Be specific about type, color, and notable features. "
+                        "Mention what each animal is doing."
+                    ),
+                    output_type=str,
+                ),
+                image=img["content"],
+            )
+        img["embedding"] = img["content"].transform(colpali_embed)
+
+        collect_fields = {
+            "id": cocoindex.GeneratedField.UUID,
+            "filename": img["filename"],
+            "embedding": img["embedding"],
+        }
+
+        if ollama_model_name is not None:
+            print(f"Using Ollama model '{ollama_model_name}' for captioning.")
+            collect_fields["caption"] = img["caption"]
+        else:
+            print(f"No Ollama model '{ollama_model_name}' found — skipping captioning.")
+
+        img_embeddings.collect(**collect_fields)
+
+    img_embeddings.export(
+        "img_embeddings",
+        cocoindex.targets.Qdrant(collection_name=QDRANT_COLLECTION),
+        primary_key_fields=["id"],
+    )
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI) -> None:
+    load_dotenv()
+    cocoindex.init()
+    image_object_embedding_flow.setup(report_to_stdout=True)
+
+    app.state.qdrant_client = QdrantClient(url=QDRANT_URL, prefer_grpc=PREFER_GRPC)
+
+    # Start updater
+    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
+    app.state.live_updater.start()
+
+    yield
+
+
+# --- FastAPI app for web API ---
+app = FastAPI(lifespan=lifespan)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Serve images from the 'img' directory at /img
+app.mount("/img", StaticFiles(directory="img"), name="img")
+
+
+# --- Search API ---
+@app.get("/search")
+def search(
+    q: str = Query(..., description="Search query"),
+    limit: int = Query(5, description="Number of results"),
+) -> Any:
+    # Get the multi-vector embedding for the query
+    query_embedding = text_to_colpali_embedding.eval(q)
+    print(
+        f"🔍 Query multi-vector shape: {len(query_embedding)} tokens x {len(query_embedding[0]) if query_embedding else 0} dims"
+    )
+
+    # Search in Qdrant with multi-vector MaxSim scoring using query_points API
+    search_results = app.state.qdrant_client.query_points(
+        collection_name=QDRANT_COLLECTION,
+        query=query_embedding,  # Multi-vector format: list[list[float]]
+        using="embedding",  # Specify the vector field name
+        limit=limit,
+        with_payload=True,
+    )
+
+    print(f"📈 Found {len(search_results.points)} results with MaxSim scoring")
+
+    return {
+        "results": [
+            {
+                "filename": result.payload["filename"],
+                "score": result.score,
+                "caption": result.payload.get("caption"),
+            }
+            for result in search_results.points
+        ]
+    }
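`cocoindex.targets.Qdrant` is expected to provision the collection for this flow; for readers wiring Qdrant up by hand, a MaxSim-compatible multi-vector collection can be created roughly like this with qdrant-client (a sketch; the 128-dim size is an assumption based on ColPali's usual per-token dimension):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="localhost:6334", prefer_grpc=True)
client.create_collection(
    collection_name="ImageSearchColpali",
    vectors_config={
        "embedding": models.VectorParams(
            size=128,  # assumption: ColPali's per-token embedding dimension
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM  # late-interaction MaxSim
            ),
        )
    },
)
```

With this config, the `query_points(..., using="embedding")` call in the search endpoint scores each stored multi-vector against the query's token vectors via MaxSim.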

examples/image_search/frontend/src/App.jsx

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 import React, { useState } from 'react';
 
-const API_URL = 'http://localhost:8000/search'; // Adjust this to your backend search endpoint
+const API_URL = `http://${window.location.hostname}:8000/search`;
 
 export default function App() {
   const [query, setQuery] = useState('');
@@ -42,7 +42,7 @@ export default function App() {
       {results.length === 0 && !loading && <div>No results</div>}
       {results.map((result, idx) => (
         <div key={idx} className="result-card">
-          <img src={`http://localhost:8000/img/${result.filename}`} alt={result.filename} className="result-img" />
+          <img src={`http://${window.location.hostname}:8000/img/${result.filename}`} alt={result.filename} className="result-img" />
          <div className="score">Score: {result.score?.toFixed(3)}</div>
        </div>
      ))}

examples/image_search/frontend/vite.config.js

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ import react from '@vitejs/plugin-react';
 export default defineConfig({
   plugins: [react()],
   server: {
+    host: true, // Allow LAN access
     port: 5173,
     open: true,
   },

examples/image_search/pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -1,14 +1,14 @@
 [project]
 name = "image-search"
 version = "0.1.0"
-description = "Simple example for cocoindex: build embedding index based on images."
+description = "Image search examples for cocoindex: CLIP and ColPali-based embedding."
 requires-python = ">=3.11"
 dependencies = [
-    "cocoindex>=0.1.75",
+    "cocoindex[colpali]>=0.1.75",
     "python-dotenv>=1.0.1",
     "fastapi>=0.100.0",
     "torch>=2.0.0",
-    "transformers>=4.29.0",
+    "transformers>=4.29.0",  # For CLIP model in main.py
     "qdrant-client>=1.14.2",
     "uvicorn>=0.34.3",
 ]

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -32,10 +32,11 @@ features = ["pyo3/extension-module"]
 dev = ["pytest", "pytest-asyncio", "ruff", "mypy", "pre-commit"]
 
 embeddings = ["sentence-transformers>=3.3.1"]
+colpali = ["colpali-engine"]
 
 # We need to repeat the dependency above to make it available for the `all` feature.
 # Indirect dependencies such as "cocoindex[embeddings]" will not work for local development.
-all = ["sentence-transformers>=3.3.1"]
+all = ["sentence-transformers>=3.3.1", "colpali-engine"]
 
 [tool.mypy]
 python_version = "3.11"
