A curated list for Efficient Large Language Models
A model compression toolkit built for usability, comprehensiveness, and efficiency.
[ICML 2024] Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for LLMs (a baseline pruning-metric sketch follows this list)
D^2-MoE: Delta Decompression for MoE-based LLMs Compression (see the delta-compression sketch after this list)
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple.
LLM Inference on AWS Lambda
Papers on LLM compression
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation (see the bias-compensation sketch after this list)
NYCU Edge AI Final Project Using SGLang
Token Price Estimation for LLMs (see the cost-estimation sketch after this list)
Research code for LLM compression using functional algorithms, exploring stratified manifold learning, clustering, and compression techniques. Experiments span synthetic datasets (Swiss Roll, Manifold Singularities) and real-world text embeddings (DBpedia-14). The goal is to preserve semantic structure while reducing model complexity.
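For orientation on the Pruner-Zero entry above: that paper evolves symbolic pruning metrics automatically, so the sketch below shows only a fixed, well-known baseline metric (a Wanda-style |weight| × input-activation-norm score), not a metric Pruner-Zero would discover. All names and shapes here are illustrative.

```python
import numpy as np

def prune_wanda_style(W, X, sparsity=0.5):
    """Zero out the weights with the lowest |W_ij| * ||X_j||_2 scores.

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Score each weight by its magnitude times the norm of its input feature.
    score = np.abs(W) * np.linalg.norm(X, axis=0)[None, :]
    k = int(W.size * sparsity)                      # number of weights to drop
    threshold = np.partition(score.ravel(), k)[k]   # k-th smallest score
    return W * (score >= threshold)                 # keep only high-score weights

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(128, 16))
W_sparse = prune_wanda_style(W, X, sparsity=0.5)
```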
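For the D^2-MoE entry: delta decompression broadly means storing one shared base weight plus a small per-expert delta. The sketch below uses the expert mean as the base and truncated-SVD low-rank deltas as one plausible instantiation; it is an illustration under those assumptions, not the paper's algorithm.

```python
import numpy as np

def compress_experts(experts, rank=4):
    """Store a shared base matrix plus a rank-`rank` delta per expert."""
    base = np.mean(experts, axis=0)                 # shared component
    deltas = []
    for W in experts:
        # Low-rank approximation of each expert's deviation from the base.
        U, s, Vt = np.linalg.svd(W - base, full_matrices=False)
        deltas.append((U[:, :rank] * s[:rank], Vt[:rank]))
    return base, deltas

def reconstruct_expert(base, delta):
    A, B = delta
    return base + A @ B                             # base + low-rank delta

experts = [np.random.default_rng(i).normal(size=(32, 32)) for i in range(4)]
base, deltas = compress_experts(experts, rank=4)
approx = reconstruct_expert(base, deltas[0])
```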
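For the bias-compensation entry: the general idea is to absorb the expected quantization output error into the layer bias. A minimal sketch, assuming a linear layer y = xW and calibration activations with a known mean; the paper's exact formulation may differ.

```python
import numpy as np

def quantize_sym(W, bits=4):
    # Symmetric uniform quantization (returns simulated de-quantized values).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.clip(np.round(W / scale), -qmax, qmax) * scale

def bias_correction(W, W_q, x_mean):
    # Choose delta_b so that E[x @ W] = E[x @ W_q] + delta_b,
    # i.e. delta_b = E[x] @ (W - W_q).
    return x_mean @ (W - W_q)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
X = rng.normal(loc=0.5, size=(1024, 64))   # nonzero-mean activations
W_q = quantize_sym(W)
delta_b = bias_correction(W, W_q, X.mean(axis=0))

err_plain = np.abs(X @ W - X @ W_q).mean()
err_comp = np.abs(X @ W - (X @ W_q + delta_b)).mean()  # smaller on average
```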
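For the token-price-estimation entry: cost estimation reduces to arithmetic over per-token prices. The model name and prices below are hypothetical placeholders; real prices vary by provider and model.

```python
# Hypothetical per-million-token prices in USD (placeholders, not real rates).
PRICE_PER_M = {
    "example-small": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    p = PRICE_PER_M[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1e6

cost = estimate_cost("example-small", prompt_tokens=2_000, completion_tokens=500)
```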