TensorCruncher/animal-image-search
Animal Image Search 🐈 🏞️ 🔎

Search through 5,400 animal images (90 classes × 60 images each) using text or image queries.

Results are returned with captions.

See the demo on Hugging Face Spaces.

Tech

  • Image embeddings created using OpenCLIP.
  • Search provided by FAISS.
  • Captions generated using BLIP.
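The retrieval step above can be sketched roughly as follows. The embeddings, shapes, and names here are stand-ins, not the repo's actual code: a random matrix plays the role of the OpenCLIP ViT-B-32 image embeddings (512-d), and a plain inner product over L2-normalized vectors mirrors what FAISS's IndexFlatIP computes.

```python
import numpy as np

def normalize(v):
    # L2-normalize rows so that inner product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical embeddings for 5400 images; in the real pipeline these would
# come from open_clip's model.encode_image over the dataset.
rng = np.random.default_rng(0)
image_embeddings = normalize(rng.normal(size=(5400, 512)).astype("float32"))

def search(query_embedding, k=5):
    # Rank images by inner product with the query. On normalized vectors this
    # is the same computation faiss.IndexFlatIP.search performs.
    scores = image_embeddings @ normalize(query_embedding)
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# A text query would really be embedded via model.encode_text(tokenizer([...]));
# a random vector stands in here.
indices, scores = search(rng.normal(size=512).astype("float32"), k=5)
```

Captioning the returned images with BLIP is an independent step applied to the top-k results.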

Scope for improvement

  • The dataset could be expanded with more images per class to provide richer results.

  • The dataset contains some duplicates that need to be removed.

  • OpenCLIP with the ViT-B-32 model and laion2b_s34b_b79k weights works well enough for basic queries, but fails on more abstract ones. For example, "Tiger food" returns images of tigers rather than deer. A larger model might help here by embedding images and text in a richer shared space.

  • The BLIP model sometimes repeats the last word of a caption; this needs investigation. BLIP-2 might produce better captions at the cost of being slower on CPU.
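One way to surface the duplicates mentioned above is to compare the image embeddings pairwise: near-identical images have cosine similarity close to 1. A minimal sketch, assuming L2-normalized embeddings; the threshold value is an assumption, not something tuned on this dataset:

```python
import numpy as np

def find_near_duplicates(embeddings, threshold=0.97):
    # embeddings: (n, d) array of L2-normalized image embeddings.
    # Returns index pairs whose cosine similarity meets the threshold.
    sims = embeddings @ embeddings.T
    n = len(embeddings)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                pairs.append((i, j))
    return pairs

# Toy check: two identical vectors plus one orthogonal vector.
e = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], dtype="float32")
dupes = find_near_duplicates(e)  # → [(0, 1)]
```

The O(n²) scan is fine at 5,400 images; for a much larger dataset the same comparison could be done through a FAISS range search instead.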

Project extensions

  • Audio input could be added using a Whisper model, which would convert speech to text that feeds into the existing pipeline.

  • An LLM could be used as a judge to rank or grade returned results by how closely they match the search query.
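The audio extension is mostly glue code: transcribe, then hand the text to the existing query path. The sketch below keeps the transcriber injectable so the structure is clear without downloading a model; the function name and stub transcriber are illustrative, not part of the repo.

```python
def audio_to_query(audio_path, transcribe):
    # `transcribe` is any callable mapping an audio file path to text.
    # With openai-whisper this would be, roughly:
    #   model = whisper.load_model("base")
    #   transcribe = lambda p: model.transcribe(p)["text"]
    text = transcribe(audio_path)
    # The cleaned text is then embedded exactly like a typed query.
    return text.strip()

# Stub transcriber standing in for Whisper.
query = audio_to_query("clip.wav", lambda p: " a tiger in the snow ")
```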

About

Multimodal animal image search using FAISS & OpenCLIP embeddings, with image captioning via BLIP.
