🎓 StudentPerformance: Modular ML Pipeline for Academic Score Prediction

🚀 A modular ML pipeline for predicting student performance, built with FastAPI, DVC, Optuna, MLflow, PostgreSQL, and Docker, and organized around common MLOps practices.

✅ Features

✅ End-to-end ML pipeline: Ingestion ➜ Validation ➜ Transformation ➜ Training ➜ Evaluation ➜ Prediction
✅ YAML-driven configuration, loaded into ConfigBox objects for attribute-style access
✅ Dynamic preprocessing (scaling, encoding, imputation, custom ops)
✅ Optuna hyperparameter tuning with MLflow tracking
✅ S3 + local storage support for all artifacts and logs
✅ PostgreSQL-backed ingestion with dynamic table creation from schema
✅ DVC-integrated dataset management
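
As an illustration of the "dynamic table creation from schema" feature, the sketch below turns a column-to-dtype mapping into a CREATE TABLE statement. It is a minimal example under stated assumptions, not the project's implementation: the dtype mapping, the function names, and the table and column names are hypothetical, and the real layout of schema.yaml is not shown in this README.

```python
# Hypothetical mapping from pandas-style dtype names to PostgreSQL types.
PG_TYPES = {
    "int64": "BIGINT",
    "float64": "DOUBLE PRECISION",
    "object": "TEXT",
    "bool": "BOOLEAN",
}


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    escaped = name.replace('"', '""')
    return f'"{escaped}"'


def build_create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Return a CREATE TABLE IF NOT EXISTS statement for the given columns."""
    col_defs = []
    for name, dtype in columns.items():
        pg_type = PG_TYPES.get(dtype)
        if pg_type is None:
            raise ValueError(f"Unsupported dtype {dtype!r} for column {name!r}")
        col_defs.append(f"{quote_ident(name)} {pg_type}")
    joined = ", ".join(col_defs)
    return f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({joined});"


# Table and column names are made up for the example.
sql = build_create_table_sql(
    "student_scores",
    {"hours_studied": "float64", "gender": "object"},
)
print(sql)
```

Generating the statement as a string keeps the example free of database dependencies; in practice it would be executed through whichever PostgreSQL driver the dbhandler module uses.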

📂 Project Structure

student_performance/
├── app.py                   # FastAPI app for /predict and /train
├── Dockerfile               # Container build setup
├── docker-compose.yaml      # FastAPI + Redis + Celery stack
├── config/                  # All YAML configs: params, schema, templates
├── data/                    # DVC-tracked raw/validated/transformed data
├── artifacts/               # Timestamped pipeline outputs
├── logs/                    # UTC-based log directory
├── templates/               # HTML Jinja templates (for UI if enabled)
├── requirements.txt         # Dependency list
└── src/student_performance/
    ├── components/          # Pipeline stages
    ├── config/              # Configuration manager
    ├── constants/           # Path constants and global flags
    ├── data_processors/     # Reusable preprocessing modules
    ├── dbhandler/           # PostgreSQL and S3 handler classes
    ├── entity/              # Dataclass config and artifact entities
    ├── exception/           # Central error handling
    ├── inference/           # Inference model wrapper
    ├── logging/             # Centralized log setup (local/S3)
    ├── pipeline/            # Training and prediction pipeline runners
    ├── utils/               # File I/O and transformation helpers
    └── worker/              # Celery task runner

🔁 Pipeline Flow

PostgreSQL → Ingestion → Validation → Transformation → Training → Evaluation → Inference Model
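
The flow above can be pictured as a chain in which each stage consumes the previous stage's artifact. The following is only a sketch of that idea: run_pipeline and the placeholder stages are hypothetical and stand in for the real classes under components/ and pipeline/, whose interfaces are not documented here.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[tuple[str, Stage]], initial: Any = None) -> Any:
    """Run stages in order, feeding each stage the previous stage's output."""
    artifact = initial
    for name, stage in stages:
        print(f"Running stage: {name}")
        artifact = stage(artifact)
    return artifact


# Placeholder stages with toy data; the real stages read from PostgreSQL,
# validate against schema.yaml, and so on.
stages = [
    ("ingestion", lambda _: [{"score": 71.0}, {"score": None}, {"score": 64.5}]),
    ("validation", lambda rows: [r for r in rows if r["score"] is not None]),
    ("transformation", lambda rows: [r["score"] / 100 for r in rows]),
]

result = run_pipeline(stages)
print(result)
```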

⚙️ Configuration

All configuration, including model parameters and hyperparameter search spaces, is declared in YAML files.

Config Files:

config.yaml: Paths, filenames, directories
params.yaml: ML params, splits, methods, tuning spaces
schema.yaml: Data schema + validation constraints
templates.yaml: Structure of report templates
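
To show what attribute-style access to these files looks like, here is a dependency-free sketch that converts a nested dictionary (standing in for a parsed YAML file) into nested namespaces. It approximates the behaviour ConfigBox provides and is not the project's loader. The helper to_namespace and the keys data_split, test_size, random_state, tuning, and n_trials are assumptions made for the example.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert dicts to SimpleNamespace for dotted access."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Stand-in for the parsed contents of params.yaml (keys are hypothetical).
raw_params = {
    "data_split": {"test_size": 0.2, "random_state": 42},
    "tuning": {"n_trials": 50},
}

params = to_namespace(raw_params)
print(params.data_split.test_size)
print(params.tuning.n_trials)
```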

Secrets (.env):

# PostgreSQL
PG_USER=
PG_PASSWORD=
PG_HOST=
PG_PORT=5432
PG_DB=student_performance_db

# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=

# MLflow
MLFLOW_TRACKING_URI=
MLFLOW_TRACKING_USERNAME=
MLFLOW_TRACKING_PASSWORD=
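
One way the PostgreSQL variables above might be combined is into a connection URL. This is a sketch under assumptions: build_pg_url is a hypothetical helper, the project may pass the values to its driver separately instead, and the fallback defaults simply mirror the sample values shown above.

```python
import os
from urllib.parse import quote


def build_pg_url(env=os.environ) -> str:
    """Build a postgresql:// URL from PG_* variables in the given mapping."""
    # Percent-encode credentials so characters like "@" or spaces stay valid.
    user = quote(env["PG_USER"], safe="")
    password = quote(env["PG_PASSWORD"], safe="")
    host = env["PG_HOST"]
    port = env.get("PG_PORT", "5432")
    db = env.get("PG_DB", "student_performance_db")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# Example values only; real credentials belong in .env, never in source code.
example_env = {"PG_USER": "app", "PG_PASSWORD": "p@ss word", "PG_HOST": "localhost"}
url = build_pg_url(example_env)
print(url)
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.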

🧪 Run Instructions

⚙️ Run FastAPI app locally

uvicorn app:app --reload

🐳 Run with Docker Compose

docker compose up --build

🔬 MLflow Tracking

Experiment: StudentPerformanceExperiment
Registry: StudentPerformanceModel
Metrics: neg_root_mean_squared_error, r2, mae, adjusted_r2, etc.

mlflow ui
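
For reference, the listed metrics can be computed as in the sketch below. It uses the standard definitions in plain Python rather than the project's evaluation code; regression_metrics and the sample values are hypothetical. Note that neg_root_mean_squared_error follows the scikit-learn scorer convention of negating RMSE so that higher is better, and that adjusted R² depends on the number of features.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return negated RMSE, MAE, R^2, and adjusted R^2 for a regression fit."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("Need more samples than features + 1 for adjusted R^2")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    r2 = 1 - ss_res / ss_tot
    return {
        "neg_root_mean_squared_error": -math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": r2,
        # Penalizes R^2 for the number of predictors used.
        "adjusted_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }


# Toy values for illustration only.
metrics = regression_metrics(
    [60.0, 70.0, 80.0, 90.0],
    [62.0, 68.0, 83.0, 89.0],
    n_features=1,
)
print(metrics)
```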

📊 FastAPI Endpoints

POST /predict → Accepts an input array or a CSV file for inference
POST /train → Triggers background model training via Celery
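
A client call to /predict might look like the sketch below, which only builds the request so it can be inspected without a running server. The request schema is not documented in this README, so the JSON field name "data", the feature values, and the helper build_predict_request are all assumptions. The address is uvicorn's default of 127.0.0.1:8000.

```python
import json
import urllib.request


def build_predict_request(base_url, rows):
    """Build a POST request for /predict with a JSON body."""
    # The "data" field name is hypothetical; check app.py for the real schema.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request("http://127.0.0.1:8000", [[7.5, 1, 0]])
print(req.get_method(), req.full_url)

# To actually send it once the app is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```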

📜 License

This project is licensed under GPLv3.

👨‍💻 Author

Gokul Krishna N V
Machine Learning Engineer — UK 🇬🇧
GitHub • LinkedIn
