
Empowering LLMs to Understand and Generate Complex Vector Graphics

CVPR 2025 | arXiv:2412.11102 | Project Website | Datasets: SVGX-Core-250k, SVGX-SFT-1M


Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics". This project enables Large Language Models to process, understand, and generate complex Scalable Vector Graphics (SVG).

Table of Contents

  • 🎉 News
  • ✨ Highlights
  • 📊 SVGX-SFT Dataset
  • 📦 Installation & Data Preparation
  • 🚀 Training Examples
  • 🔧 Inference using vLLM
  • 🔑 Tips for Best Results
  • 💘 Acknowledgements
  • 📎 Citation
  • 📄 License
  • 📬 Contact

🎉 News

✨ Highlights

  • 🧠 Multi-model Support: Fine-tune a wide range of popular foundation models, including Llama 3.2, Qwen2.5-VL, Gemma 3, DeepSeek, Falcon, Phi-2, GPT2-XL, and more.
  • 📦 Specialized SVGX Dataset: Includes curated pretraining data (SVGX-Core-250k) and extensive supervised fine-tuning data (SVGX-SFT-1M).
  • ⚡ Accelerated Training & Inference: Leverages efficient training frameworks like LLaMA-Factory, unsloth, transformers, and trl. Integrated with vLLM for high-throughput, low-latency inference.
  • 🔍 Multimodal Capabilities: Fully supports text and vision inputs for comprehensive SVG understanding and generation tasks.
  • ⚙️ Flexible Training Options: Supports various training techniques including LoRA and full fine-tuning, along with distributed training setups (Multi-GPU, Multi-Node).

📊 SVGX-SFT Dataset

Our SVGX-SFT Dataset is a comprehensive collection designed specifically for training LLMs to work effectively with vector graphics.

# Login using `huggingface-cli login` if the dataset requires authentication
from datasets import load_dataset

# Load SVGX-Core-250k
svgx_core_250k_dataset = load_dataset("xingxm/SVGX-Core-250k")

# Load SVGX-SFT-1M
svgx_sft_1m_dataset = load_dataset("xingxm/SVGX-SFT-1M")
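
To sanity-check the data before training, you can print the available splits, the schema, and a single record. This is a minimal sketch continuing from the loading code above; it assumes the dataset exposes a default split (e.g., train) and makes no assumption about specific column names.

# Inspect the splits, schema, and one record of SVGX-SFT-1M.
# Column names vary by release, so print them rather than hard-coding fields.
split = next(iter(svgx_sft_1m_dataset))            # e.g., "train"
print(svgx_sft_1m_dataset[split].column_names)     # actual schema
print(svgx_sft_1m_dataset[split][0])               # one text/SVG training example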

📦 Installation & Data Preparation

# Step 1: Set up the environment (torch & unsloth & trl)
conda env create -f environment.yml && conda activate llm4svg

# Step 2: Download the datasets and place them in `dataset/SVGX-dataset`
bash script/download_dataset.sh

# Step 3: Set up the datasets
bash script/setup_dataset.sh

# Step 4: Install LLaMA-Factory
cd LLaMA-Factory && pip install -e ".[torch,metrics]"
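
After installation, a quick sanity check can confirm that the core libraries import cleanly and that a GPU is visible. This is a minimal sketch; it only covers packages named above and does not verify LLaMA-Factory's optional extras.

# Verify the training stack set up by environment.yml and pip.
import torch
import transformers
import trl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)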

🚀 Training Examples

We provide example configurations for fine-tuning various models using different frameworks.


Based on LLaMA-Factory:

  • Model: Qwen/Qwen2.5-VL-7B
# Fine-tune with LoRA (using special SVG Tokens for encoding)
llamafactory-cli train examples/train_lora/svgx_qwen2vl_lora_sft_enc.yaml

# Example: Multi-Node Distributed Training (4 nodes)
# Set MASTER_ADDR, MASTER_PORT, NODE_RANK accordingly
FORCE_TORCHRUN=1 NNODES=4 NODE_RANK=<0,1,2,3> MASTER_ADDR=<your_master_node_ip> MASTER_PORT=29500 \
llamafactory-cli train examples/train_lora/svgx_qwen2vl_lora_sft_enc.yaml
  • Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
llamafactory-cli train examples/train_lora/svgx_deepseekr1_qwen_lora_sft_enc.yaml
  • Model: Google/Gemma-3
llamafactory-cli train examples/train_lora/svgx_gemma3_lora_sft_enc.yaml
  • Model: Falcon-7B
llamafactory-cli train examples/train_lora/svgx_falcon_lora_sft_enc.yaml

Based on unsloth:

  • Model: unsloth/llama-3.2-Instruct-4bit (Example using 4-bit quantized Llama-3.2)
python main.py x=llama3-sft-unsloth project_dir="workspace/llm4svg-via-unsloth/llama3-4bit"

Based on transformers & accelerate:

  • Model: openai/GPT2-XL
# Default training using accelerate
accelerate launch main.py x=gpt2-sft data.load_from_disk='/path/to/dataset'
# Example using multiple GPUs with Data Parallelism (DDP)
accelerate launch --config_file configs/accelerate/ddp_config.yaml main.py x=gpt2-sft x.seq_len=2048 x.train_batch_size=2 project_dir="workspace/llm4svg-gpt2xl-maxL2048"
# or
accelerate launch --multi_gpu main.py x=gpt2-sft data.text_prompt='[name]' x.seq_len=1024 x.train_batch_size=2 project_dir="workspace/llm4svg-gpt2xl-maxL1024"

Based on trl & accelerate:

  • Model: microsoft/Phi-2
accelerate launch --config_file configs/accelerate/fsdp_config.yaml main.py x=phi2-sft-trl project_dir="workspace/llm4svg-phi2-fsdp-maxL2048"

🔧 Inference using vLLM

Get started with fast inference using our vLLM-powered API server:

API_PORT=8000 llamafactory-cli api \
--model_name_or_path=Qwen/Qwen2-VL-7B-SVGX-SFT-Encode-Model \
--max_length=4096 --max_new_tokens=4096 \
--template=qwen2_vl --trust_remote_code=True --infer_backend=vllm

Refer to the vLLM Quickstart for more details on interacting with the API endpoint.
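
Once the server is running, it can be queried with any OpenAI-compatible client. A minimal sketch, assuming the server exposes the usual /v1 chat completions route on the port set above and that no real API key is enforced; the prompt is only illustrative.

# Query the local API server started above (API_PORT=8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")  # placeholder key

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-SVGX-SFT-Encode-Model",  # same checkpoint as --model_name_or_path
    messages=[{"role": "user", "content": "Generate an SVG of a red heart icon."}],
    max_tokens=4096,
)
print(response.choices[0].message.content)  # generated SVG markup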

🔑 Tips for Best Results

  • Distributed Training: For datasets >50k examples, consider using multi-node setups (like DeepSpeed or FSDP via accelerate or llamafactory-cli) to significantly reduce training time.
  • Context Length: Set an appropriate max_seq_length (e.g., x.seq_len or via YAML config) for complex SVG generation. We recommend 2048 or higher.
  • Batch Optimization: Adjust per_device_train_batch_size and gradient_accumulation_steps based on your available GPU memory to maximize throughput (see the sketch after this list).
  • Inference Acceleration: Utilize vLLM as your inference backend (--infer_backend=vllm) for optimized performance (up to 2x faster generation compared to standard Hugging Face pipelines).
  • Model Choice: Experiment with different base models. Models with strong visual grounding (like Qwen-VL) or coding capabilities might show better performance on SVG tasks.
  • SVG Tokenization: Using dedicated SVG tokens (enabled via _enc configs in examples) can potentially improve model understanding and generation of SVG syntax.
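
For the batch optimization tip, the quantity to keep roughly constant is the effective (global) batch size: per-device batch size × gradient accumulation steps × number of data-parallel GPUs. A minimal sketch with illustrative numbers, not recommended settings:

# Illustrative arithmetic for balancing GPU memory against throughput.
per_device_train_batch_size = 2    # limited by GPU memory
gradient_accumulation_steps = 16   # raises the effective batch size without extra memory
num_gpus = 4                       # data-parallel workers

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(f"Effective batch size: {effective_batch_size}")  # 2 * 16 * 4 = 128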

💘 Acknowledgements

This project is built upon the incredible work of LLaMA-Factory and unsloth.

Special thanks to the authors and contributors of both projects!

📎 Citation

If you find this project useful in your research, please consider citing our paper:

@article{xing2024llm4svg,
  title={Empowering LLMs to Understand and Generate Complex Vector Graphics},
  author={Xing, Ximing and Hu, Juncheng and Liang, Guotao and Zhang, Jing and Xu, Dong and Yu, Qian},
  journal={arXiv preprint arXiv:2412.11102},
  year={2024}
}

📄 License

This project is licensed under the MIT License.

📬 Contact

For questions, bug reports, or collaboration inquiries, please feel free to open an issue or submit a pull request on GitHub.
