CUDA-level metrics collection #688

### Feature request

Hey folks,

I am using TEI for embedding creation, and I keep hitting out-of-memory (OOM) errors.
I would like a way to collect CUDA-level metrics so I can profile GPU memory usage and understand what is happening under the hood.

- **vLLM** quickly saturates GPU memory but preempts gracefully.
- **TEI**: GPU memory usage grows linearly with batch size until it crashes with an OOM error and does not recover.

This is the behavior I am seeing:

<img width="1897" height="895" alt="Image" src="https://github.com/user-attachments/assets/b23cfac8-d51f-4d5f-9346-8c9bf559e047" />
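
Until something like this exists, here is a minimal sketch of the kind of external sampling that could approximate it. It assumes `nvidia-smi` is on the `PATH` and uses its `--query-gpu` CSV output. The function names (`parse_memory_csv`, `sample_gpu_memory`, `watch`) are hypothetical and not part of TEI. It only reports device-wide memory, not per-kernel or per-allocation CUDA metrics, so it is a coarse stand-in for real profiling.

```python
import subprocess
import time

# nvidia-smi query returning one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse rows like '0, 1024, 16384' into a list of dicts.

    Assumes every field is numeric; GPUs that report '[N/A]' are not handled.
    """
    samples = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        samples.append(
            {"gpu": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return samples


def sample_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


def watch(interval_s=1.0, iterations=10):
    """Print memory usage for each GPU every `interval_s` seconds."""
    for _ in range(iterations):
        for s in sample_gpu_memory():
            pct = 100.0 * s["used_mib"] / s["total_mib"]
            print(f"gpu{s['gpu']}: {s['used_mib']}/{s['total_mib']} MiB ({pct:.1f}%)")
        time.sleep(interval_s)
```

Calling `watch()` in a separate terminal while sending requests with increasing batch sizes should make the growth visible up to the point of the OOM.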

### Motivation

I am using TEI for embedding creation, and I keep hitting out-of-memory (OOM) errors.

### Your contribution

I cannot contribute, as I don't know Rust.
