A Python framework for validating the quality of AI agent training datasets. It checks completeness, format integrity, consistency, and quality metrics tailored to AI/ML applications.
- Completeness Validation: Missing values, required columns, empty records
- Format Validation: Data types, string patterns, date formats
- Consistency Validation: Value ranges, categorical values, foreign-key relationships
- Quality Validation: Class balance, feature correlation, outlier detection
- Comprehensive Reporting: Multiple output formats (console, JSON, HTML)
- Extensible Architecture: Easy to add custom validators
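
To make the feature list above concrete, here is a minimal standard-library sketch of two of the checks (completeness and class balance) together with a small registry for adding custom validators. It is an illustration only and does not show this project's actual API: the names `register`, `VALIDATORS`, `check_completeness`, `check_class_balance`, and `run_all`, the `text` and `label` column names, and the 0.8 balance threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable

# Hypothetical types and registry; not the framework's real API.
Record = dict
Validator = Callable[[list[Record]], list[str]]
VALIDATORS: dict[str, Validator] = {}


def register(name: str):
    """Decorator that adds a validator function to the registry."""
    def decorator(func: Validator) -> Validator:
        VALIDATORS[name] = func
        return func
    return decorator


@register("completeness")
def check_completeness(records, required=("text", "label")):
    """Report rows where a required column is absent, None, or empty."""
    issues = []
    for i, rec in enumerate(records):
        for col in required:
            if rec.get(col) in (None, ""):
                issues.append(f"row {i}: missing '{col}'")
    return issues


@register("class_balance")
def check_class_balance(records, max_share=0.8):
    """Flag the dataset if one label exceeds max_share of labelled rows."""
    labels = [r["label"] for r in records if r.get("label") not in (None, "")]
    if not labels:
        return ["no labels found"]
    label, count = Counter(labels).most_common(1)[0]
    share = count / len(labels)
    if share > max_share:
        return [f"label '{label}' makes up {share:.0%} of labelled rows"]
    return []


def run_all(records):
    """Run every registered validator and collect issues by validator name."""
    return {name: fn(records) for name, fn in VALIDATORS.items()}


# Assumed sample data: three records with two deliberate gaps.
sample = [
    {"text": "book a flight", "label": "travel"},
    {"text": "", "label": "travel"},
    {"text": "reset my password", "label": None},
]
report = run_all(sample)
for name, issues in report.items():
    print(name, issues)
```

A custom check would be added in the same way: write a function that takes the list of records and returns a list of issue strings, then decorate it with `@register("some_name")`.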
git clone https://github.com/qshytpolite/ai-data-validator.git
cd ai-data-validator
pip install -r requirements.txt