I am struggling with indexing newly inserted data without having to reindex the whole collection. Is there a way to add a vector index to newly inserted data only? I know I can set a selection when defining my listener, but it gets added under a new index version instead of continuing with the existing one. Additionally, performing a vector search on newly indexed items seems to exclude the older items.

I'm using the following code to add my vector index:

db = superduper(MONGO_URI_WITH_DB)


def get_model() -> SentenceTransformer:
    return SentenceTransformer(
        identifier="all-MiniLM-L6-v2",
        model="all-MiniLM-L6-v2",
        device="cpu",
        encoder=vector(shape=(384,)),
        postprocess=lambda x: x.tolist(),
        predict_kwargs={"show_progress_bar": True},
    )


def get_collection() -> Collection:
    return Collection(collection_name)


def add_sitemap_url_vector_index() -> None:
    model = get_model()
    collection = get_collection()
    selection = collection.find()
    db.add(
        VectorIndex(
            identifier=index_name,
            indexing_listener=Listener(
                model=model,
                key=field,
                select=selection,
                predict_kwargs={"max_chunk_size": 1000},
            ),
        )
    )


if __name__ == "__main__":
    add_sitemap_url_vector_index()

Any help is appreciated!
Replies: 1 comment
@martincpt you should be able to just add the new data, and the VectorIndex will index only that data. So simply add more data:

db['docs'].insert_many(new_data)

If the Listener was configured on the "docs" collection, then only this new data will be indexed; the existing items are not recomputed.
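
To make the idea concrete, here is a rough, library-free sketch of what "index only the new data" means. It is not the superduperdb API: `ToyIndexedCollection`, `letter_counts`, and `embed_calls` are hypothetical names invented for illustration, and the letter-count "embedding" is just a stand-in for a real model. The point is only that an insert hook embeds the incoming documents, while search still runs over old and new vectors together.

```python
from dataclasses import dataclass, field
from typing import Callable


def letter_counts(text: str) -> list[float]:
    """Toy embedding (an assumption, not a real model): counts of a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


@dataclass
class ToyIndexedCollection:
    """Hypothetical in-memory collection with a listener-like insert hook."""

    embed: Callable[[str], list[float]]
    docs: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)
    embed_calls: int = 0  # how many documents have been embedded so far

    def insert_many(self, new_docs: list[str]) -> None:
        # Only the incoming documents are embedded; stored vectors are untouched.
        for doc in new_docs:
            self.docs.append(doc)
            self.vectors.append(self.embed(doc))
            self.embed_calls += 1

    def search(self, query: str, n: int = 1) -> list[str]:
        # Search ranks every stored vector, old and new, by dot product.
        q = self.embed(query)
        order = sorted(
            range(len(self.docs)),
            key=lambda i: dot(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in order[:n]]


coll = ToyIndexedCollection(embed=letter_counts)
coll.insert_many(["apple", "banana"])  # initial data
calls_after_first_insert = coll.embed_calls
coll.insert_many(["cherry"])  # new data: one extra embedding, no reindex
calls_after_second_insert = coll.embed_calls
```

After the second insert, both `coll.search("banana")` and `coll.search("cherry")` find their documents, and the embedding function has run once per document rather than once per document per insert. If your real setup behaves differently (older items missing from search), that would suggest a second index version was created instead of the existing listener picking up the insert.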