GPU fallen off the bus - candle CUDA health check for GPU health. #722

### System Info

https://forums.developer.nvidia.com/t/gpu-has-fallen-off-the-bus/217357

Currently the health check always returns `Ok()`, regardless of whether the GPU has failed:
https://github.com/huggingface/text-embeddings-inference/blob/ebb63dfa7121705f1999a06d8e222581a5221c00/backends/candle/src/lib.rs#L584 

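I am willing to contribute a simple candle equivalent of the following check:
```python
import torch  # the real check would use candle instead

def gpu_health() -> str:
    try:
        # Run a tiny op on the GPU; .item() forces a sync so CUDA errors surface here.
        (torch.tensor([2.0]).cuda() ** 2).item()
        return "healthy"
    except Exception:
        return "error"
```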

Another option:
return an error from the health check if the last 3 consecutive candle requests failed.
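
The second option could look roughly like the sketch below. It uses only the Rust standard library; `FailureTracker`, `record`, and `health` are hypothetical names for illustration, not existing text-embeddings-inference or candle APIs, and the threshold of 3 comes from the suggestion above. The assumption is that the backend calls `record` after every request and that its health check delegates to `health`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: counts consecutive failed requests.
pub struct FailureTracker {
    consecutive_failures: AtomicUsize,
    threshold: usize,
}

impl FailureTracker {
    pub fn new(threshold: usize) -> Self {
        Self {
            consecutive_failures: AtomicUsize::new(0),
            threshold,
        }
    }

    /// Record the outcome of one request; any success resets the streak.
    pub fn record<T, E>(&self, result: &Result<T, E>) {
        match result {
            Ok(_) => self.consecutive_failures.store(0, Ordering::Relaxed),
            Err(_) => {
                self.consecutive_failures.fetch_add(1, Ordering::Relaxed);
            }
        }
    }

    /// Health check: returns an error once the failure streak reaches the threshold.
    pub fn health(&self) -> Result<(), String> {
        let failures = self.consecutive_failures.load(Ordering::Relaxed);
        if failures >= self.threshold {
            Err(format!("{} consecutive requests failed", failures))
        } else {
            Ok(())
        }
    }
}
```

With `FailureTracker::new(3)`, two failures followed by a success leave `health()` returning `Ok(())`; only three failures in a row flip it to `Err`. Unlike the active probe, this approach adds no extra GPU work, but it can only detect a dead GPU after real requests have already failed.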

### Information

- [x] Docker
- [ ] The CLI directly

### Tasks

- [ ] An officially supported command
- [ ] My own modifications

### Reproduction

For example, overheat or disconnect your GPU.

### Expected behavior

-
