Multi-LoRA batching
#14249
- Does vLLM support batching prompts that use different LoRA adapters in a single call? Is there more detailed example code? The example in `examples/offline_inference/multilora_inference.py` does not seem to demonstrate this feature.

- Reply: vLLM supports this feature.