GitHub - ComplexData-MILA/AIF-Gen: Generating Synthetic Lifelong RL Data for LLMs at Scale

Generating Synthetic Continual RLHF Data at Scale

About The Project

AIF-Gen is a platform for generating synthetic RLHF datasets for lifelong reinforcement learning on LLMs.

Our main goal is to facilitate preference data generation at scale via RL from AI feedback. AIF-Gen natively supports evolving preferences, making it especially useful for studying non-stationary domains such as tutoring. Think of it like Procgen, but for RLHF.

Note

AIF-Gen is still alpha software, and may introduce breaking changes.

Library Highlights

⚡ Asynchronous LLM batch inference powered by vLLM
🔧 Modular prompt templates and fully customizable preference specification
🗄️ LLM response cache to avoid redundant API requests
✅ Validation metrics to judge synthetic data quality
🤗 Direct integration with HuggingFace for robust dataset management

Architecture Overview

Quick Tour for New Users

AIF-Gen is intended to be primarily used as a command line tool:

foo@bar:~$ aif --help

          / _ | /  _/ __/ / ___/ __/ |/ /
         / __ |_/ // _/  / (_ / _//    /
        /_/ |_/___/_/    \___/___/_/|_/

A tool for generating synthetic continual RLHF datasets.

Usage: aif [OPTIONS] COMMAND [ARGS]...

Options:
  --log_file FILE  Optional log file to use.  [default: aif_gen.log]
  --help           Show this message and exit.

Commands:
  generate   Generate a new ContinualAlignmentDataset.
  merge      Merge a set of ContinualAlignmentDatasets.
  preview    Preview a ContinualAlignmentDataset.
  sample     Downsample a ContinualAlignmentDataset.
  transform  Transform a ContinualAlignmentDataset.
  validate   Validate a ContinualAlignmentDataset.

For advanced usage, refer to our docs.

Generating Data

In this example, we highlight the ease of generating synthetic data with AIF-Gen.

Pre-requisites

First, ensure you have installed AIF-Gen (see installation). For this example, we'll generate data using allenai/OLMo-1B-hf. The chat template we use lives in the repository's chat_templates directory.

We'll need to serve our model on an inference server with vLLM. The following will do the trick:

# Install vLLM (only needs to be done once)
uv tool install vllm

# Serve the model locally
uvx --with setuptools vllm serve allenai/OLMo-1B-hf --dtype auto --api-key MY_KEY --chat-template chat_templates/omlo-chat-template.jinja

Some things to keep in mind:

We use the api-key MY_KEY, but anything works here
This starts an inference server listening on localhost:8000
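
Before moving on, it can help to confirm that the server is reachable. The sketch below assumes vLLM's OpenAI-compatible server exposes a model listing at /v1/models and accepts the key as a bearer token. The helper names build_models_request and list_served_models are hypothetical and not part of AIF-Gen.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the model listing (assumed to live at /v1/models)."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_served_models(base_url: str, api_key: str, timeout: float = 5.0) -> list[str]:
    """Return the model ids reported by the server; raises if it is unreachable."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The OpenAI-style listing wraps the models in a "data" array.
    return [model["id"] for model in payload.get("data", [])]
```

With the server from the command above running, list_served_models("http://localhost:8000", "MY_KEY") should include the served model name; a connection error usually means the server has not finished starting.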

Export env variables

Now that the inference server is up, we'll need to export a few environment variables so that AIF-Gen knows where to direct requests.

export OPENAI_BASE_URL=http://localhost:8000
export OPENAI_API_KEY=MY_KEY

# Optionally, set the following to cache OpenAI requests in Elasticsearch.
export ELASTIC_SEARCH_HOST="..."
export ELASTIC_SEARCH_API_KEY="..."
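
A quick way to catch a forgotten export is to check the environment before running anything. This is a minimal sketch that only inspects the variable names listed above; the helper names missing_vars and describe_environment are hypothetical, and the sketch makes no claim about how AIF-Gen itself reads these variables.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("OPENAI_BASE_URL", "OPENAI_API_KEY")
OPTIONAL_VARS = ("ELASTIC_SEARCH_HOST", "ELASTIC_SEARCH_API_KEY")


def missing_vars(env: Mapping[str, str], names: tuple[str, ...]) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


def describe_environment(env: Mapping[str, str] = os.environ) -> str:
    """Summarize whether the required and optional variables are present."""
    missing_required = missing_vars(env, REQUIRED_VARS)
    if missing_required:
        return "Missing required variables: " + ", ".join(missing_required)
    missing_optional = missing_vars(env, OPTIONAL_VARS)
    if missing_optional:
        return "Ready (optional cache variables unset: " + ", ".join(missing_optional) + ")"
    return "Ready (optional cache variables set)"
```

Calling print(describe_environment()) in the same shell session reports which variables, if any, still need to be exported.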

Create a Dataset Configuration

We are now ready to specify our preference data configuration. We'll create the following YAML file at config/philosophy_qna.yaml.

---
task_specs:
  # First dataset: 5 samples of Philosophy QNA with ELI5 preference
  - num_samples: 5
    alignment_task:
      objective: 'Ask an interesting philosophy question'
      preference: 'Explain the answer at a level that could be understood by a 5 year old'
      domain:
        philosophy:
          seed_words: # Some interesting words we want to inject into our prompts
            - consciousness
            - time
            - metaphysics

  # Second dataset, 5 samples of Philosophy QNA with expert preference
  - num_samples: 5
    alignment_task:
      objective: 'Ask an interesting philosophy question'
      preference: 'Explain the answer at an expert level. Draw from technical literature.'
      domain:
        philosophy:
          seed_words: # Change up some seed words for variety
            - determinism
            - universe
            - meaning

This will produce a final dataset with 10 samples in TRL preference format with explicit prompts. The first 5 responses follow the ELI5 preference, while the last 5 should be more technical.
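
To make that layout concrete, the sketch below mirrors the two task specs as a plain Python structure and shows the shape of one record in TRL's explicit-prompt preference format (prompt, chosen, rejected). It assumes samples are laid out task by task in config order. The record text is a placeholder rather than real AIF-Gen output, and the helper names are hypothetical.

```python
# Mirror of the two task specs from the YAML above (only the fields used here).
TASK_SPECS = [
    {"num_samples": 5, "preference": "Explain the answer at a level that could be understood by a 5 year old"},
    {"num_samples": 5, "preference": "Explain the answer at an expert level. Draw from technical literature."},
]

# Keys of a TRL preference record with an explicit prompt.
TRL_PREFERENCE_KEYS = {"prompt", "chosen", "rejected"}


def preference_for_sample(index: int, task_specs: list[dict]) -> str:
    """Return the preference governing the sample at a 0-based position,
    assuming samples are stored task by task in config order."""
    if index < 0:
        raise IndexError(f"sample index {index} is out of range")
    offset = 0
    for spec in task_specs:
        offset += spec["num_samples"]
        if index < offset:
            return spec["preference"]
    raise IndexError(f"sample index {index} is out of range")


def is_preference_record(record: dict) -> bool:
    """Check that a record carries string values for all three preference keys."""
    return TRL_PREFERENCE_KEYS <= set(record) and all(
        isinstance(record[key], str) for key in TRL_PREFERENCE_KEYS
    )


# Placeholder record, not real generated data.
example_record = {
    "prompt": "<a philosophy question>",
    "chosen": "<the preferred response>",
    "rejected": "<the less preferred response>",
}

total_samples = sum(spec["num_samples"] for spec in TASK_SPECS)
```

Here total_samples is 10, positions 0 through 4 map to the ELI5 preference, and positions 5 through 9 map to the expert preference.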

Generate some data

It's advisable to do a dry run first to ensure everything is set up correctly:

aif generate config/philosophy_qna.yaml allenai/OLMo-1B-hf --dry-run

If everything worked, you should see: Dry run was a success. We can now generate the data:

aif generate config/philosophy_qna.yaml allenai/OLMo-1B-hf

For options such as choosing the output directory, changing the model temperature, increasing concurrency limits, and uploading directly to Hugging Face, check our docs or run aif generate --help.
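
If you drive generation from a script, the dry run and the real run can be chained. The sketch below only uses the positional arguments and the --dry-run flag shown above. It assumes aif is on the PATH and exits with a non-zero status when the dry run fails, which is not confirmed here; the helper names are hypothetical.

```python
import subprocess


def build_generate_command(config_path: str, model: str, dry_run: bool = False) -> list[str]:
    """Assemble the `aif generate` invocation shown above as an argument list."""
    command = ["aif", "generate", config_path, model]
    if dry_run:
        command.append("--dry-run")
    return command


def generate_with_dry_run(config_path: str, model: str) -> None:
    """Run the dry run first, then generate only if it exited successfully.

    check=True raises CalledProcessError on a non-zero exit status, so the
    second call is skipped when the dry run fails (assuming aif reports
    failure through its exit status).
    """
    subprocess.run(build_generate_command(config_path, model, dry_run=True), check=True)
    subprocess.run(build_generate_command(config_path, model), check=True)
```

For the example in this section, the call would be generate_with_dry_run("config/philosophy_qna.yaml", "allenai/OLMo-1B-hf").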

Tip

Refer to our docs for information and example usage for the other commands.

Installation

The current recommended way to install AIF-Gen is from source.

Using uv (recommended)

# Create and activate your venv
uv venv my_venv --python 3.10 && source my_venv/bin/activate

# Install the wheels into the venv
uv pip install git+https://github.com/ComplexData-MILA/AIF-Gen.git

# Test the install
aif

Using pip

# Create and activate your venv
python3.10 -m venv my_venv && source my_venv/bin/activate

# Install the wheels into the venv
pip install git+https://github.com/ComplexData-MILA/AIF-Gen.git

# Test the install
aif

Documentation

Documentation along with a quick start guide can be found on the docs website.

Citation

Please cite our paper if you use this code in your own work:

@article{TODO,
  title   = "TODO",
  author  = "TODO",
  journal = "TODO",
  url     = "TODO",
  year    = "2025",
}

Contributing

If you notice anything unexpected, or would like to propose a new feature, please open an issue and feel free to discuss it with us.

To learn more about making a contribution to AIF-Gen see our contribution guide.
