Reinforcement learning auto-bidding library for research and production.
Overview • Who Should Use This • Installation • Quickstart • Benchmarks • API • Citation
rlbidder is a comprehensive toolkit for training and deploying reinforcement learning agents in online advertising auctions. Built for both industrial scale and research agility, it provides:
- Complete offline RL pipeline: Rust-powered data processing (Polars) → SOTA algorithms (IQL, CQL, DT, GAVE) → parallel evaluation
- Modern ML infrastructure: PyTorch Lightning multi-GPU training, experiment tracking, automated reproducibility
- Production insights: Interactive dashboards for campaign monitoring, market analytics, and agent behavior analysis
- Research rigor: Statistically robust benchmarking with RLiable metrics, tuned control baselines, and round-robin evaluation
Whether you're deploying bidding systems at scale or researching novel RL methods, rlbidder bridges the gap between academic innovation and production readiness.
Researchers looking to experiment with SOTA offline RL algorithms (IQL, CQL, DT, GAVE, GAS) on realistic auction data with rigorous benchmarking.
AdTech Practitioners comparing RL agents against classic baselines (PID, BudgetPacer) before production deployment.
rlbidder pushes beyond conventional RL libraries by integrating cutting-edge techniques from both RL research and modern LLM/transformer architectures. Here's what sets it apart:
- Standardized workflow: Scan Parquet → RL Dataset → Feature Engineering → DT Dataset with reproducible artifacts at every stage
- Polars Lazy API: Streaming data processing on a fast Rust engine that handles massive datasets without loading them fully into memory
- Scalable workflows: Process 100GB+ auction logs efficiently with lazy evaluation and zero-copy operations
- Feature engineering: Drop-in scikit-learn-style transformers (Symlog, Winsorizer, ReturnScaledReward) for states, actions, and rewards
- Comprehensive baselines: Classic control (Heuristic, BudgetPacer, PID) and learning-based methods (BC, CQL, IQL, DT, GAVE, GAS)
- HL-Gauss Distributional RL: Smooth Gaussian-based distributional Q-learning for improved uncertainty quantification, advancing beyond standard categorical approaches
- Efficient ensemble critics: Leverage `torch.vmap` for vectorized ensemble operations, much faster than traditional loop-based implementations (see the sketch after this feature list)
- Numerically stable stochastic policies: DreamerV3-style `SigmoidRangeStd` and TorchRL-style `BiasedSoftplus` to avoid numerical instabilities from exp/log operations
- FlashAttention (SDPA): Uses latest PyTorch scaled dot-product attention API for accelerated training
- RoPE positional encoding: Rotary positional embeddings for improved sequence length generalization, adopted from modern LLMs
- QK-Norm: Query-key normalization for enhanced training stability at scale
- SwiGLU: Advanced feed-forward networks for superior expressiveness
- Efficient inference: `DTInferenceBuffer` with deque-based temporal buffering for online Decision Transformer deployment
- Parallel evaluation: Multi-process evaluators with pre-loaded data per worker, much faster than sequential benchmarking
- Robust testing: Round-robin agent rotation with multi-seed evaluation for statistically reliable comparisons
- Tuned competitors: Classic control methods (BudgetPacer, PID) with optimized hyperparameters as baselines
- Interactive dashboards: Production-ready Plotly visualizations with market structure metrics (HHI, Gini, volatility) and RLiable metrics
- Industrial analytics: Campaign health monitoring, budget pacing diagnostics, auction dynamics, and score distribution analysis
- Modular design: Enables both production readiness and rapid prototyping
- PyTorch Lightning: Reduced boilerplate, automatic mixed precision, and gradient accumulation
- Draccus configuration: Type-safe dataclass-to-CLI with hierarchical configs, dot-notation overrides, and zero boilerplate
- Local experiment tracking: AIM for experiment management without external cloud dependencies
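To make the ensemble-critic point above concrete, here is a minimal sketch of the `torch.vmap` ensembling pattern using only the public `torch.func` API. The network architecture and sizes are illustrative assumptions, not rlbidder's `EnsembledQNetwork`:

```python
import torch
from torch import nn
from torch.func import stack_module_state, functional_call

# Illustrative Q-network; the real critic architecture is an assumption here.
def make_q_net(state_dim: int, action_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, 256),
        nn.ReLU(),
        nn.Linear(256, 1),
    )

num_q, state_dim, action_dim, batch_size = 5, 16, 1, 32
q_nets = [make_q_net(state_dim, action_dim) for _ in range(num_q)]

# Stack all ensemble members' parameters/buffers into batched tensors.
params, buffers = stack_module_state(q_nets)
base = make_q_net(state_dim, action_dim).to("meta")  # stateless template module

def q_forward(p, b, state_action):
    return functional_call(base, (p, b), (state_action,))

state_action = torch.randn(batch_size, state_dim + action_dim)
# One batched forward pass over the ensemble dimension instead of a Python loop.
q_values = torch.vmap(q_forward, in_dims=(0, 0, None))(params, buffers, state_action)
print(q_values.shape)  # torch.Size([5, 32, 1])
```

A single vectorized call like this replaces the per-critic Python loop over ensemble members.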
| Feature | AuctionNet | rlbidder |
|---|---|---|
| Data Engine | Pandas | Polars Lazy (Rust) ✨ |
| Configuration | argparse | Draccus (dataclass-to-CLI) ✨ |
| Distributional RL | ❌ | HL-Gauss ✨ |
| Ensemble Method | ❌ | torch.vmap ✨ |
| Transformer Attention | Standard | SDPA/FlashAttn ✨ |
| Positional Encoding | Learned | RoPE ✨ |
| Policy Stability | exp(log_std) | SigmoidRangeStd/BiasedSoftplus ✨ |
| Parallel Evaluation | ❌ | ProcessPool + Round-robin ✨ |
| Visualization | ❌ | Production Dashboards ✨ |
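For the HL-Gauss row, the underlying idea is to convert scalar regression targets (for example Q-values) into smooth categorical distributions by integrating a Gaussian centered on each target over fixed histogram bins, then training with cross-entropy. A minimal sketch of that construction follows; it illustrates the technique only and is not the library's `HLGaussLoss` API, and the bin range and sigma are arbitrary choices:

```python
import torch
import torch.nn.functional as F

def hl_gauss_targets(y: torch.Tensor, bin_edges: torch.Tensor, sigma: float) -> torch.Tensor:
    # Probability mass each bin receives from a Gaussian N(y, sigma^2):
    # difference of the Gaussian CDF evaluated at consecutive bin edges.
    cdf = torch.distributions.Normal(y.unsqueeze(-1), sigma).cdf(bin_edges)
    probs = cdf[..., 1:] - cdf[..., :-1]
    return probs / probs.sum(dim=-1, keepdim=True)

num_bins, vmin, vmax = 51, -10.0, 10.0
edges = torch.linspace(vmin, vmax, num_bins + 1)

targets = hl_gauss_targets(torch.tensor([0.3, 2.5]), edges, sigma=0.75)  # (2, 51)
logits = torch.randn(2, num_bins)              # per-bin logits from a critic head
loss = F.cross_entropy(logits, targets)        # soft-label cross-entropy
```

At inference time, the scalar value is recovered as the probability-weighted sum of the bin centers.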
We evaluate all agents using rigorous statistical methods across multiple delivery periods with round-robin testing and multi-seed evaluation. The evaluation protocol follows RLiable best practices for statistically reliable algorithm comparison.
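The aggregate metrics that protocol produces can be sketched directly with the rliable package; the score arrays below are random placeholders shaped as (num_seeds, num_delivery_periods):

```python
import numpy as np
from rliable import library as rly, metrics

# Placeholder normalized scores: rows = evaluation seeds, columns = delivery periods.
score_dict = {
    "IQL": np.random.rand(5, 2),
    "DT": np.random.rand(5, 2),
}

# Interquartile mean (IQM) with stratified bootstrap confidence intervals,
# the RLiable-recommended aggregate for small numbers of runs.
aggregate_iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, aggregate_iqm, reps=2000
)
print(point_estimates, interval_estimates)
```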
Beyond raw performance metrics, rlbidder helps you understand why agents behave the way they do. Production-grade interactive dashboards summarize policy behavior, campaign health, and auction dynamics for both research insights and production monitoring.
- Python 3.11 or newer
- PyTorch 2.6 or newer (follow PyTorch install guide)
- GPU with 8GB+ VRAM recommended for training
git clone https://github.com/zuoxingdong/rlbidder.git
cd rlbidder
pip install -e .

Follow the steps below to reproduce the full offline RL workflow on processed campaign data.
# Download sample competition data (periods 7-8 and trajectory 1)
bash scripts/download_raw_data.sh -p 7-8,traj1 -d data/raw
# Convert raw CSV to Parquet (faster I/O with Polars)
python scripts/convert_csv_to_parquet.py --raw_data_dir=data/raw
# Build evaluation-period parquet files
python scripts/build_eval_dataset.py --data_dir=data
# Create training transitions (trajectory format for offline RL)
python scripts/build_transition_dataset.py --data_dir=data --mode=trajectory
# Fit scalers for state, action, and reward normalization
python scripts/scale_transitions.py --data_dir=data --output_dir=scaled_transitions
# Generate Decision Transformer trajectories with return-to-go
python scripts/build_dt_dataset.py \
--build.data_dir=data \
--build.reward_type=reward_dense \
--build.use_scaled_reward=true

What you'll have: Preprocessed datasets in data/processed/ and fitted scalers in data/scaled_transitions/, ready for training.
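A quick way to sanity-check those artifacts with the Polars lazy API; the file glob and column name below are assumptions about the processed schema, so adjust them to what the build scripts actually emit:

```python
import polars as pl

# Lazily scan the processed Parquet files; nothing is materialized until .collect(),
# so this also works for datasets much larger than memory.
lf = pl.scan_parquet("data/processed/*.parquet")

summary = (
    lf.group_by("deliveryPeriodIndex")   # assumed column name, for illustration only
      .agg(pl.len().alias("num_rows"))
      .sort("deliveryPeriodIndex")
      .collect()
)
print(summary)
```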
# Train IQL (Implicit Q-Learning) - value-based offline RL
python examples/train_iql.py \
--model_cfg.lr_actor 3e-4 \
--model_cfg.lr_critic 3e-4 \
--model_cfg.num_q_models 5 \
--model_cfg.bc_alpha 0.01 \
--train_cfg.enable_aim_logger=False
# Train DT (Decision Transformer) - sequence modeling for RL
python examples/train_dt.py \
--model_cfg.embedding_dim 512 \
--model_cfg.num_layers 6 \
--model_cfg.lr 1e-4 \
--model_cfg.rtg_scale 98 \
--model_cfg.target_rtg 2.0 \
--train_cfg.enable_aim_logger=False

What you'll have: Trained model checkpoints in examples/checkpoints/ with scalers and hyperparameters.
💡 Configuration powered by draccus: All training scripts use type-safe dataclass configs with automatic CLI generation. Override any nested config with dot-notation (e.g., --model_cfg.lr 1e-4) or pass config files directly.
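The pattern behind those flags looks roughly like the stand-alone sketch below; the dataclasses are invented for illustration and are not rlbidder's actual config classes:

```python
from dataclasses import dataclass, field

import draccus

@dataclass
class ModelConfig:
    lr: float = 1e-4          # hypothetical field names
    embedding_dim: int = 256

@dataclass
class TrainConfig:
    model_cfg: ModelConfig = field(default_factory=ModelConfig)
    max_epochs: int = 50

if __name__ == "__main__":
    # Parses CLI args such as `--model_cfg.lr 3e-4 --max_epochs 100`
    # into a fully typed TrainConfig instance.
    cfg = draccus.parse(config_class=TrainConfig)
    print(cfg.model_cfg.lr, cfg.max_epochs)
```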
💡 Track experiments with Aim: All training scripts automatically log metrics, hyperparameters, and model artifacts to Aim (a local experiment tracker). To use Aim, first initialize your project with:
aim init

Then launch the Aim UI to visualize training progress:
aim up --port 43800

Then open http://localhost:43800 in your browser to explore training curves, compare runs, and analyze hyperparameter configurations.
# Evaluate IQL agent with parallel multi-seed evaluation
python examples/evaluate_agents.py \
--evaluation.data_dir=data \
--evaluation.evaluator_type=OnlineCampaignEvaluator \
--evaluation.delivery_period_indices=[7,8] \
--evaluation.num_seeds=5 \
--evaluation.num_workers=8 \
--evaluation.output_dir=examples/eval \
--agent.agent_class=IQLBiddingAgent \
--agent.model_dir=examples/checkpoints/iql \
--agent.checkpoint_file=best.ckpt

What you'll have: Evaluation reports, campaign summaries, and auction histories in examples/eval/, ready for visualization.
Next steps: Generate dashboards with examples/performance_visualization.ipynb or explore the evaluation results with Polars DataFrames.
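If you go the Polars route, a hedged sketch of loading the outputs; the file glob and column names are assumptions rather than a documented schema:

```python
import polars as pl

# Assumed output location and columns, for illustration only.
reports = pl.read_parquet("examples/eval/*.parquet")
print(
    reports.group_by("agent_name")
           .agg(pl.col("score").mean().alias("mean_score"))
           .sort("mean_score", descending=True)
)
```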
Each module handles a specific aspect of the RL bidding pipeline:
| Module | Description | Key Classes/Functions |
|---|---|---|
| `rlbidder.agents` | Offline RL agents and control baselines | `IQLModel`, `CQLModel`, `DTModel`, `GAVEModel`, `BudgetPacerBiddingAgent` |
| `rlbidder.data` | Data processing, scalers, and datasets | `OfflineDataModule`, `TrajDataset`, `SymlogTransformer`, `WinsorizerTransformer` |
| `rlbidder.envs` | Auction simulation and value sampling | `OnlineAuctionEnv`, `ValueSampler`, `sample_conversions` |
| `rlbidder.evaluation` | Multi-agent evaluation and metrics | `ParallelOnlineCampaignEvaluator`, `OnlineCampaignEvaluator` |
| `rlbidder.models` | Neural network building blocks | `StochasticActor`, `EnsembledQNetwork`, `NormalHead`, `HLGaussLoss` |
| `rlbidder.viz` | Interactive dashboards and analytics | `create_campaign_dashboard`, `create_market_dashboard`, `plot_rliable_metrics` |
| `rlbidder.utils` | Utilities and helpers | `set_seed`, `log_distribution`, `regression_report` |
The library follows a modular design with clear separation of concerns. Data flows from raw logs through preprocessing, training, and evaluation to final visualization:
flowchart TD
subgraph Data["Data Pipeline"]
direction TB
raw["Raw Campaign Data<br/><i>CSV/Parquet logs</i>"]
scripts["Build Scripts<br/>convert β’ build_eval<br/>build_transition β’ scale"]
artifacts["π Preprocessed Artifacts<br/>processed/ β’ scaled_transitions/<br/><i>Parquet + Scalers</i>"]
raw -->|transform| scripts
scripts -->|generate| artifacts
end
subgraph Core["Core Library Modules"]
direction TB
data_mod["<b>rlbidder.data</b><br/>OfflineDataModule<br/>TrajDataset β’ ReplayBuffer<br/>π§ <i>Handles batching & scaling</i>"]
models["<b>rlbidder.models</b><br/>StochasticActor β’ EnsembledQNetwork<br/>ValueNetwork β’ Losses β’ Optimizers<br/>π§ <i>Agent building blocks</i>"]
agents["<b>rlbidder.agents</b><br/>IQLModel β’ CQLModel β’ DTModel<br/>π <i>LightningModule implementations</i>"]
agents -->|composes| models
end
subgraph Training["Training Pipeline"]
direction TB
train["<b>examples/train_iql.py</b><br/>ποΈ Config + CLI<br/><i>Orchestration script</i>"]
trainer["β‘ Lightning Trainer<br/>fit() β’ validate()<br/><i>Multi-GPU support</i>"]
ckpt["πΎ Model Checkpoints<br/>best.ckpt β’ last.ckpt<br/><i>+ scalers + hparams</i>"]
train -->|instantiates| data_mod
train -->|instantiates| agents
train -->|launches| trainer
trainer -->|saves| ckpt
end
subgraph Eval["Online Evaluation"]
direction TB
evaluator["<b>rlbidder.evaluation</b><br/>OnlineCampaignEvaluator<br/>ParallelEvaluator<br/>π <i>Multi-seed, round-robin</i>"]
env["<b>rlbidder.envs</b><br/>Auction Simulator<br/>πͺ <i>Multi-agent market</i>"]
results["π Evaluation Results<br/>Campaign Reports β’ Agent Summaries<br/>Auction Histories<br/><i>Polars DataFrames</i>"]
evaluator -->|simulates| env
env -->|produces| results
end
subgraph Viz["Visualization & Analysis"]
direction TB
viz["<b>rlbidder.viz</b><br/>Plotly Dashboards<br/>Market Metrics<br/>π¨ <i>Interactive HTML</i>"]
plots["π Production Dashboards<br/>Campaign Health β’ Market Structure<br/>Budget Pacing β’ Scatter Analysis"]
viz -->|renders| plots
end
artifacts ==>|loads| data_mod
artifacts -.->|eval data| evaluator
ckpt ==>|load_from_checkpoint| evaluator
results ==>|consumes| viz
classDef dataStyle fill:#1565c0,stroke:#0d47a1,stroke-width:3px,color:#fff,font-weight:bold
classDef coreStyle fill:#ef6c00,stroke:#e65100,stroke-width:3px,color:#fff,font-weight:bold
classDef trainStyle fill:#6a1b9a,stroke:#4a148c,stroke-width:3px,color:#fff,font-weight:bold
classDef evalStyle fill:#2e7d32,stroke:#1b5e20,stroke-width:3px,color:#fff,font-weight:bold
classDef vizStyle fill:#c2185b,stroke:#880e4f,stroke-width:3px,color:#fff,font-weight:bold
class Data,raw,scripts,artifacts dataStyle
class Core,data_mod,models,agents coreStyle
class Training,train,trainer,ckpt trainStyle
class Eval,evaluator,env,results evalStyle
class Viz,viz,plots vizStyle
Design Principles:
- Modular - Each component is independently usable and testable
- Scalable - Polars + Lightning handle massive datasets and efficient training
- Reproducible - Deterministic seeding, configuration management, and evaluation
- Production-ready - Type hints, error handling, logging, and monitoring built-in
- Star the repo if you find it useful
- Fork and submit PRs for bug fixes or new features
- Improve documentation and add examples
- Add tests for new functionality
rlbidder builds upon ideas from:
- AuctionNet, the original pioneer, for the auction environment and benchmark design
- PyTorch Lightning for training infrastructure
- Draccus for elegant dataclass-to-CLI configuration management
- TRL & Transformers for modern transformer implementations
- Polars for high-performance data processing
- TorchRL for RL algorithm implementations
If you use rlbidder in your work, please cite it using the BibTeX entry below.
@misc{zuo2025rlbidder,
author = {Zuo, Xingdong},
title = {RLBidder: Reinforcement learning auto-bidding library for research and production},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/zuoxingdong/rlbidder}}
}

MIT License. See LICENSE.