🚀 A fully modular, production-grade ML pipeline for student performance prediction. Built with FastAPI, DVC, Optuna, MLflow, PostgreSQL, and Docker, following industry MLOps principles.
- ✅ End-to-end ML pipeline: Ingestion ➜ Validation ➜ Transformation ➜ Training ➜ Evaluation ➜ Prediction
- ✅ YAML-driven configuration system with full ConfigBox support
- ✅ Dynamic preprocessing (scaling, encoding, imputation, custom ops)
- ✅ Optuna hyperparameter tuning with MLflow tracking
- ✅ S3 + local storage support for all artifacts and logs
- ✅ PostgreSQL-backed ingestion with dynamic table creation from schema
- ✅ DVC-integrated dataset management
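As an illustration of the dynamic preprocessing listed above, the transformation stage can be pictured as a scikit-learn `ColumnTransformer` assembled from the configured methods. The column names and strategies below are illustrative examples, not the project's actual `params.yaml` values:

```python
# Illustrative sketch only -- real column names and methods are driven by params.yaml.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["reading_score", "writing_score"]                      # assumed example columns
categorical_cols = ["gender", "lunch", "test_preparation_course"]      # assumed example columns

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numeric_cols),
    ("cat", categorical_pipeline, categorical_cols),
])
```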
student_performance/
├── app.py # FastAPI app for /predict and /train
├── Dockerfile # Container build setup
├── docker-compose.yaml # FastAPI + Redis + Celery stack
├── config/ # All YAML configs: params, schema, templates
├── data/ # DVC-tracked raw/validated/transformed data
├── artifacts/ # Timestamped pipeline outputs
├── logs/ # UTC-based log directory
├── templates/ # HTML Jinja templates (for UI if enabled)
├── requirements.txt # Dependency list
└── src/student_performance/
├── components/ # Pipeline stages
├── config/ # Configuration manager
├── constants/ # Path constants and global flags
├── data_processors/ # Reusable preprocessing modules
├── dbhandler/ # PostgreSQL and S3 handler classes
├── entity/ # Dataclass config and artifact entities
├── exception/ # Central error handling
├── inference/ # Inference model wrapper
├── logging/ # Centralized log setup (local/S3)
├── pipeline/ # Training and prediction pipeline runners
├── utils/ # File I/O and transformation helpers
└── worker/ # Celery task runner
PostgreSQL → Ingestion → Validation → Transformation → Training → Evaluation → Inference Model
The system uses declarative YAML for all configuration and parameter tuning.
Config Files:
- `config.yaml`: Paths, filenames, directories
- `params.yaml`: ML params, splits, methods, tuning spaces
- `schema.yaml`: Data schema + validation constraints
- `templates.yaml`: Structure of report templates
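A minimal sketch of how one of these files might be loaded as a ConfigBox (assuming the `python-box` and `PyYAML` packages); the reader function is illustrative, not necessarily the project's own utility:

```python
# Illustrative sketch -- the project's reader lives under src/student_performance/utils/.
from pathlib import Path

import yaml
from box import ConfigBox  # from the python-box package


def read_yaml(path: Path) -> ConfigBox:
    """Load a YAML file and wrap it so keys are accessible as attributes."""
    with open(path, "r") as f:
        return ConfigBox(yaml.safe_load(f))


config = read_yaml(Path("config/config.yaml"))
# Attribute-style access instead of dict lookups, e.g. config.data_ingestion.root_dir
```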
Secrets (.env):
# PostgreSQL
PG_USER=
PG_PASSWORD=
PG_HOST=
PG_PORT=5432
PG_DB=student_performance_db
# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
# MLflow
MLFLOW_TRACKING_URI=
MLFLOW_TRACKING_USERNAME=
MLFLOW_TRACKING_PASSWORD=
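As a sketch, the PostgreSQL variables above could be loaded with `python-dotenv` and turned into a SQLAlchemy connection for the ingestion stage; the table name below is an assumption, not the project's actual configuration:

```python
# Sketch only -- the project's actual handler lives in src/student_performance/dbhandler/.
import os

import pandas as pd
from dotenv import load_dotenv
from sqlalchemy import create_engine

load_dotenv()  # reads the .env file into the environment

url = (
    f"postgresql+psycopg2://{os.getenv('PG_USER')}:{os.getenv('PG_PASSWORD')}"
    f"@{os.getenv('PG_HOST')}:{os.getenv('PG_PORT', '5432')}/{os.getenv('PG_DB')}"
)
engine = create_engine(url)

# Hypothetical table name; the real one is defined by the ingestion config.
df = pd.read_sql_table("student_performance", con=engine)
```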
uvicorn app:app --reload
docker compose up --build
- Experiment: `StudentPerformanceExperiment`
- Registry: `StudentPerformanceModel`
- Metrics: `neg_root_mean_squared_error`, `r2`, `mae`, `adjusted_r2`, etc.
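A hedged sketch of how Optuna tuning with MLflow tracking can fit together; the model, search space, and data below are stand-ins, since the project's real tuning spaces come from `params.yaml`:

```python
# Illustrative sketch -- the actual search spaces and estimator are configured in params.yaml.
import mlflow
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=8, random_state=42)  # stand-in data


def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
    }
    score = cross_val_score(
        RandomForestRegressor(**params, random_state=42),
        X, y, cv=3, scoring="neg_root_mean_squared_error",
    ).mean()
    with mlflow.start_run(nested=True):
        mlflow.log_params(params)
        mlflow.log_metric("neg_root_mean_squared_error", score)
    return score


mlflow.set_experiment("StudentPerformanceExperiment")
with mlflow.start_run(run_name="optuna_tuning"):
    study = optuna.create_study(direction="maximize")  # maximize neg-RMSE (closer to 0 is better)
    study.optimize(objective, n_trials=20)
```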
mlflow ui
- `POST /predict` → Accepts input array or CSV for inference
- `POST /train` → Triggers background model training via Celery
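A minimal client-side example of calling the prediction endpoint with `requests`; the payload shape and feature values are assumptions, since the real input schema is defined by the app (see the FastAPI docs at `/docs`):

```python
# Hypothetical request body -- check /docs for the actual input schema.
import requests

payload = {
    "data": [[72, 69, 70]],  # example feature array; real features depend on schema.yaml
}
response = requests.post("http://localhost:8000/predict", json=payload)
print(response.json())
```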
This project is licensed under GPLv3.
Gokul Krishna N V
Machine Learning Engineer — UK 🇬🇧
GitHub • LinkedIn