AIF-Gen is a platform for generating synthetic RLHF datasets for lifelong reinforcement learning on LLMs.
Our main goal is to facilitate preference data generation at scale via RL from AI feedback. AIF-Gen natively supports evolving preferences, making it especially useful for studying non-stationary domains such as tutoring. Think of it like Procgen, but for RLHF.
Note
AIF-Gen is still alpha software and may introduce breaking changes.
- ⚡ Asynchronous LLM batch inference powered by vLLM
- 🔧 Modular prompt templates and fully customizable preference specification
- 🗄️ LLM response cache to avoid redundant API requests
- ✅ Validation metrics to judge synthetic data quality
- 🤗 Direct integration with HuggingFace for robust dataset management
AIF-Gen is primarily intended to be used as a command-line tool:
foo@bar:~$ aif --help
   ___   ________  _____________  __
  / _ | /  _/ __/ / ___/ __/ |/ /
 / __ |_/ // _/  / (_ / _//    /
/_/ |_/___/_/    \___/___/_/|_/
A tool for generating synthetic continual RLHF datasets.
Usage: aif [OPTIONS] COMMAND [ARGS]...
Options:
--log_file FILE Optional log file to use. [default: aif_gen.log]
--help Show this message and exit.
Commands:
generate Generate a new ContinualAlignmentDataset.
merge Merge a set of ContinualAlignmentDatasets.
preview Preview a ContinualAlignmentDataset.
sample Downsample a ContinualAlignmentDataset.
transform Transform a ContinualAlignmentDataset.
validate Validate a ContinualAlignmentDataset.
For advanced usage, refer to our docs.
In this example, we highlight the ease of generating synthetic data with AIF-Gen.
First, ensure you have installed AIF-Gen (see installation). For this example, we'll generate data using allenai/OLMo-1B-hf. The chat template we are using can be found here.
We'll need to serve our model on an inference server with vLLM. The following will do the trick:
# Install vLLM (only needs to be done once)
uv tool install vllm
# Serve the model locally
uvx --with setuptools vllm serve allenai/OLMo-1B-hf --dtype auto --api-key MY_KEY --chat-template chat_templates/olmo-chat-template.jinja
Some things to keep in mind:
- We use the API key MY_KEY, but any value works here.
- This starts an inference server listening on localhost:8000.
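Before moving on, it's worth sanity-checking that the server is reachable. One quick way (assuming the default port and the MY_KEY value from above) is to list the served models through the OpenAI-compatible API:
# Should return a JSON payload that lists allenai/OLMo-1B-hf
curl http://localhost:8000/v1/models -H "Authorization: Bearer MY_KEY"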
Now that the inference server is up, we'll need to export a few environment variables so that AIF-Gen knows where to direct requests.
export OPENAI_BASE_URL=http://localhost:8000
export OPENAI_API_KEY=MY_KEY
# Optionally, set the following to cache OpenAI requests in Elasticsearch.
export ELASTIC_SEARCH_HOST="..."
export ELASTIC_SEARCH_API_KEY="..."
We are now ready to specify our preference data configuration. We'll create the following YAML file at config/philosophy_qna.yaml:
---
task_specs:
  # First dataset: 5 samples of Philosophy QNA with ELI5 preference
  - num_samples: 5
    alignment_task:
      objective: 'Ask an interesting philosophy question'
      preference: 'Explain the answer at a level that could be understood by a 5 year old'
      domain:
        philosophy:
          seed_words: # Some interesting words we want to inject into our prompts
            - consciousness
            - time
            - metaphysics
  # Second dataset: 5 samples of Philosophy QNA with expert preference
  - num_samples: 5
    alignment_task:
      objective: 'Ask an interesting philosophy question'
      preference: 'Explain the answer at an expert level. Draw from technical literature.'
      domain:
        philosophy:
          seed_words: # Change up some seed words for variety
            - determinism
            - universe
            - meaning
This will produce a final dataset with 10 samples in TRL preference format with explicit prompts. The first 5 responses follow the ELI5 preference, while the last 5 should be more technical.
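For reference, each sample in TRL's explicit preference format pairs a prompt with a chosen and a rejected response. A generated record looks roughly like the following (the content here is illustrative, not actual model output):
{
  "prompt": "What is consciousness?",
  "chosen": "Consciousness is like the little voice in your head that knows you are you...",
  "rejected": "Consciousness is a thing brains do."
}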
It's advisable to do a dry run first to ensure everything is set up correctly:
aif generate config/philosophy_qna.yaml allenai/OLMo-1B-hf --dry-run
If everything worked, you should see: Dry run was a success. We can now generate the data:
aif generate config/philosophy_qna.yaml allenai/OLMo-1B-hf
For options such as choosing the output directory, changing the model temperature, increasing concurrency limits, and uploading directly to Hugging Face, check our docs or run aif generate --help.
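If you do push the dataset to the Hugging Face Hub, it can be retrieved like any other dataset there. A minimal sketch using huggingface-cli, assuming a hypothetical repo id my-username/philosophy-qna:
# Download the generated dataset from the Hub (repo id is hypothetical)
huggingface-cli download my-username/philosophy-qna --repo-type dataset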
Tip
Refer to our docs for information and example usage for the other commands.
The current recommended way to install AIF-Gen is from source.
Using uv (recommended)
# Create and activate your venv
uv venv my_venv --python 3.10 && source my_venv/bin/activate
# Install the wheels into the venv
uv pip install git+https://github.com/ComplexData-MILA/AIF-Gen.git
# Test the install
aif
Using pip
# Create and activate your venv
python3.10 -m venv my_venv && source my_venv/bin/activate
# Install the wheels into the venv
pip install git+https://github.com/ComplexData-MILA/AIF-Gen.git
# Test the install
aif
Documentation along with a quick start guide can be found on the docs website.
Please cite our paper if you use this code in your own work:
@article{TODO,
  title   = "TODO",
  author  = "TODO",
  journal = "TODO",
  url     = "TODO",
  year    = "2025",
}
If you notice anything unexpected or would like to propose a new feature, please open an issue and feel free to discuss it with us.
To learn more about making a contribution to AIF-Gen see our contribution guide.