A curated list for Efficient Large Language Models
A model compression toolkit built for usability, comprehensiveness, and efficiency.
[ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for LLMs (a baseline pruning-metric sketch follows this list)
D^2-MoE: Delta Decompression for MoE-based LLMs Compression (see the delta-compression sketch after this list)
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple.
LLM Inference on AWS Lambda
Papers on LLM compression
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation (see the bias-compensation sketch after this list)
NYCU Edge AI Final Project Using SGLang
Token Price Estimation for LLMs (see the cost-estimation sketch after this list)
Research code for LLM compression using functional algorithms, exploring stratified manifold learning, clustering, and compression techniques. Experiments span synthetic datasets (Swiss Roll, Manifold Singularities) and real-world text embeddings (DBpedia-14). The goal is to preserve semantic structure while reducing model complexity.
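For orientation on the Pruner-Zero entry above: that paper evolves symbolic pruning metrics automatically, so the sketch below shows only a fixed, well-known baseline metric (a Wanda-style |weight| × input-activation-norm score), not a metric Pruner-Zero would discover. All names and shapes here are illustrative.

```python
import numpy as np

def prune_wanda_style(W, X, sparsity=0.5):
    """Zero out the weights with the lowest |W_ij| * ||X_j||_2 scores.

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Score each weight by its magnitude times the norm of its input feature.
    score = np.abs(W) * np.linalg.norm(X, axis=0)[None, :]
    k = int(W.size * sparsity)                      # number of weights to drop
    threshold = np.partition(score.ravel(), k)[k]   # k-th smallest score
    return W * (score >= threshold)                 # keep only high-score weights

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(128, 16))
W_sparse = prune_wanda_style(W, X, sparsity=0.5)
```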
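For the D^2-MoE entry: delta decompression broadly means storing one shared base weight plus a small per-expert delta. The sketch below uses the expert mean as the base and truncated-SVD low-rank deltas as one plausible instantiation; it is an illustration under those assumptions, not the paper's algorithm.

```python
import numpy as np

def compress_experts(experts, rank=4):
    """Store a shared base matrix plus a rank-`rank` delta per expert."""
    base = np.mean(experts, axis=0)                 # shared component
    deltas = []
    for W in experts:
        # Low-rank approximation of each expert's deviation from the base.
        U, s, Vt = np.linalg.svd(W - base, full_matrices=False)
        deltas.append((U[:, :rank] * s[:rank], Vt[:rank]))
    return base, deltas

def reconstruct_expert(base, delta):
    A, B = delta
    return base + A @ B                             # base + low-rank delta

experts = [np.random.default_rng(i).normal(size=(32, 32)) for i in range(4)]
base, deltas = compress_experts(experts, rank=4)
approx = reconstruct_expert(base, deltas[0])
```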
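For the bias-compensation entry: the general idea is to absorb the expected quantization output error into the layer bias. A minimal sketch, assuming a linear layer y = xW and calibration activations with a known mean; the paper's exact formulation may differ.

```python
import numpy as np

def quantize_sym(W, bits=4):
    # Symmetric uniform quantization (returns simulated de-quantized values).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax, qmax) * scale

def bias_correction(W, W_q, x_mean):
    # Choose delta_b so that E[x @ W] = E[x @ W_q] + delta_b,
    # i.e. delta_b = E[x] @ (W - W_q).
    return x_mean @ (W - W_q)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
X = rng.normal(loc=0.5, size=(1024, 64))   # nonzero-mean activations
W_q = quantize_sym(W)
delta_b = bias_correction(W, W_q, X.mean(axis=0))

err_plain = np.abs(X @ W - X @ W_q).mean()
err_comp = np.abs(X @ W - (X @ W_q + delta_b)).mean()  # smaller on average
```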
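For the token-price-estimation entry: cost estimation reduces to arithmetic over per-token prices. The model name and prices below are hypothetical placeholders; real prices vary by provider and model.

```python
# Hypothetical per-million-token prices in USD (placeholders, not real rates).
PRICE_PER_M = {
    "example-small": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    p = PRICE_PER_M[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1e6

cost = estimate_cost("example-small", prompt_tokens=2_000, completion_tokens=500)
```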