A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
quantization compression-algorithm mlsystem mlsys large-language-models llms efficientml efficient-computing
Updated Jun 18, 2025 - Python