Feature request
Hey Folks,
I am using TEI for embedding creation, and I keep getting out-of-memory (OOM) errors.
I need to profile at the CUDA level and understand what is happening under the hood.
- vLLM quickly saturates GPU memory but preempts gracefully.
- TEI's memory usage grows linearly with batch size, and it crashes on OOM without recovering.
This one!
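As a client-side stopgap, one option is to cap how many inputs go into each embedding request, so that a single call cannot push the batch size arbitrarily high. The following is a minimal Python sketch under stated assumptions: `embed_batch` is a hypothetical callable standing in for whatever sends one request to TEI and returns one vector per input, and the default cap of 32 is an arbitrary value to tune, not a TEI setting.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical signature: embeds one batch of texts (e.g. one request to a
# TEI server) and returns one vector per input text.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts: Sequence[str], embed_batch: EmbedFn,
              max_batch: int = 32) -> List[List[float]]:
    """Embed `texts` in capped batches so no single call sees more than `max_batch` inputs.

    `max_batch=32` is an assumed default, not a value taken from TEI.
    """
    vectors: List[List[float]] = []
    for batch in chunked(texts, max_batch):
        result = embed_batch(batch)
        # Guard against a backend that silently drops or adds vectors.
        if len(result) != len(batch):
            raise RuntimeError("embedder returned a different number of vectors than inputs")
        vectors.extend(result)
    return vectors
```

Capping the client batch does not add the missing recovery on the server side; it only limits how large any single request can make the batch.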

Motivation
I am using TEI for embedding creation, and I keep getting out-of-memory (OOM) errors.
Your contribution
I cannot contribute, as I don't know Rust!