DART (Diffusion-Autoregressive Recursive Transformer) is a hybrid architecture that combines diffusion-based and autoregressive approaches for text generation: the diffusion path promotes global coherence across the sequence, while the autoregressive path maintains local consistency between neighboring tokens.
- Hybrid Architecture: Integrates diffusion and autoregressive components in a unified framework
- Adaptive Noise Scheduling: Implements multiple noise scheduling strategies (linear, cosine, quadratic)
- Flexible Generation: Supports both conditional and unconditional text generation
- Production Ready: Full type annotations, comprehensive logging, and configurable parameters
- Efficient Implementation: Optimized attention mechanisms and memory usage
- Modular Design: Easy to extend and modify for specific use cases
DART consists of several key components (a minimal sketch of how they fit together follows this list):
- Diffusion Transformer (DiT) blocks for global dependency modeling
- Autoregressive blocks for local coherence
- Adaptive noise scheduling mechanism
- Dual-path information exchange during training
- Classifier-free guidance support
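To make the dual-path idea concrete, the snippet below is a minimal, illustrative sketch of how a bidirectional diffusion-style attention path and a causal autoregressive path could be blended inside a single layer. The class and attribute names (`HybridBlock`, `time_proj`, the `ar_weight` blend) are assumptions for illustration only, not the repository's actual implementation.

```python
# Illustrative sketch only -- names and structure are assumptions, not DART's actual code.
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    """Toy layer combining a bidirectional (diffusion-style) path and a causal (AR) path."""

    def __init__(self, hidden_size: int, num_heads: int, ar_weight: float = 0.5):
        super().__init__()
        self.ar_weight = ar_weight
        self.diff_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.ar_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.time_proj = nn.Linear(hidden_size, hidden_size)  # timestep conditioning
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # Diffusion path: bidirectional attention over timestep-conditioned states.
        h = self.norm(x + self.time_proj(t_emb).unsqueeze(1))
        diff_out, _ = self.diff_attn(h, h, h)

        # Autoregressive path: a causal mask enforces left-to-right dependencies.
        seq_len = x.size(1)
        causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        ar_out, _ = self.ar_attn(x, x, x, attn_mask=causal_mask)

        # Dual-path exchange: blend the two streams with the configured weight.
        return x + self.ar_weight * ar_out + (1.0 - self.ar_weight) * diff_out
```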
```bash
git clone https://github.com/The-Swarm-Corporation/DART.git
cd DART
pip install -r requirements.txt
```
```python
import torch

from main import DART, DARTConfig

# Initialize configuration
config = DARTConfig(
    vocab_size=50257,  # GPT-2 vocabulary size
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    diffusion_steps=1000,
    ar_weight=0.5,
)

# Initialize model
model = DART(config)

# Training example
input_ids = torch.randint(0, config.vocab_size, (4, 128))
loss_dict = model.compute_loss(input_ids)
loss = loss_dict["loss"]
loss.backward()

# Generation example
generated = model.generate(
    input_ids=torch.tensor([[0, 1, 2, 3]]),
    max_length=128,
    temperature=0.8,
    do_sample=True,
)
```
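For a complete training step, `compute_loss` can be paired with a standard PyTorch optimizer. The loop below is a minimal sketch that continues from the quick-start snippet above (the batch source and hyperparameters are placeholders, not values prescribed by the repository).

```python
# Minimal training loop (sketch), continuing from the quick-start snippet above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(1000):  # number of steps is arbitrary here
    batch = torch.randint(0, config.vocab_size, (4, 128))  # replace with real token ids
    loss = model.compute_loss(batch)["loss"]

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```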
| Parameter | Default | Description |
|---|---|---|
| `vocab_size` | 50257 | Vocabulary size (default matches GPT-2) |
| `hidden_size` | 768 | Dimension of hidden layers |
| `num_hidden_layers` | 12 | Number of transformer layers |
| `num_attention_heads` | 12 | Number of attention heads |
| `diffusion_steps` | 1000 | Number of diffusion steps |
| `ar_weight` | 0.5 | Weight balancing the autoregressive and diffusion objectives |
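As a rough illustration of how `ar_weight` presumably trades off the two objectives, the helper below blends an autoregressive loss and a diffusion loss into a single scalar. This is an assumption about the loss composition made for illustration, not code taken from the repository.

```python
import torch


def combine_losses(ar_loss: torch.Tensor, diffusion_loss: torch.Tensor, ar_weight: float = 0.5) -> torch.Tensor:
    """Hypothetical blend of the autoregressive and diffusion objectives."""
    return ar_weight * ar_loss + (1.0 - ar_weight) * diffusion_loss


# Example: ar_weight=0.5 weights both objectives equally.
loss = combine_losses(torch.tensor(2.1), torch.tensor(1.7), ar_weight=0.5)
```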
```python
model = DART(DARTConfig(
    diffusion_schedule="cosine",  # Options: "linear", "cosine", "quadratic"
    diffusion_steps=1000,
))
```
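The named schedules control how the noise variance grows across diffusion steps. The sketch below shows the beta schedules these names commonly refer to in the diffusion literature (linear, quadratic, and the cosine schedule of Nichol & Dhariwal, 2021); it is a general illustration, not necessarily the exact formulas used in this repository.

```python
import math

import torch


def make_beta_schedule(kind: str, steps: int = 1000) -> torch.Tensor:
    """Common beta schedules used in diffusion models (illustrative)."""
    if kind == "linear":
        return torch.linspace(1e-4, 0.02, steps)
    if kind == "quadratic":
        return torch.linspace(1e-4 ** 0.5, 0.02 ** 0.5, steps) ** 2
    if kind == "cosine":
        # Betas derived from a cosine alpha-bar curve, clamped for numerical stability.
        s = 0.008
        t = torch.linspace(0, steps, steps + 1) / steps
        alpha_bar = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
        betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
        return betas.clamp(max=0.999)
    raise ValueError(f"unknown schedule: {kind}")
```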
```python
generated = model.generate(
    input_ids=input_ids,
    guidance_scale=7.5,  # Higher values = stronger guidance
)
```
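Classifier-free guidance typically blends conditional and unconditional predictions at each generation step, pushing the output away from the unconditional distribution as `guidance_scale` increases. The function below sketches that standard combination; how `model.generate` applies it internally may differ.

```python
import torch


def apply_cfg(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, guidance_scale: float = 7.5) -> torch.Tensor:
    """Standard classifier-free guidance: extrapolate from unconditional toward conditional predictions."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```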
If you use DART in your research, please cite:
```bibtex
@article{dart2024,
  title={DART: Diffusion-Autoregressive Recursive Transformer for Text Generation},
  author={Kye Gomez},
  journal={[Journal/Conference]},
  year={2024}
}
```
We welcome contributions! Please see our contributing guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the authors of DiT and related works in diffusion models
- Built with PyTorch and Transformers library
- Special thanks to the research community for valuable feedback