Skip to content

Pratik3c/sql-data-warehouse-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📊 Data Warehouse and Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀
This project showcases a complete end-to-end data engineering and analytics solution — from ingesting raw data to deriving powerful business insights.

Designed as a portfolio project, it reflects industry best practices in modern data warehousing, ETL pipelines, dimensional modeling, and business intelligence.


🏗️ Data Architecture: Medallion Framework

The architecture follows the Medallion Architecture with three data layers:

Data Architecture

Layer Description
🥉 Bronze Stores raw data as-ingested from ERP and CRM systems (CSV files into SQL Server).
🥈 Silver Data is cleaned, standardized, and transformed for analytical readiness.
🥇 Gold Business-ready, analytical data modeled in a star schema. Used for BI and reporting.

📖 Project Highlights

This project includes:

Modern Data Architecture — Using Medallion model (Bronze/Silver/Gold)
ETL Pipelines — Custom SQL scripts to extract, clean, transform, and load data
Data Modeling — Fact and Dimension tables for reporting
SQL-based Analytics — Actionable business insights into sales, products, and customers
Dashboard & Reports — Tailored for stakeholders and business teams

💼 Ideal For Roles In:

  • Data Engineer
  • SQL Developer
  • Business Intelligence Analyst
  • ETL Developer
  • Data Architect

🛠️ Tools & Resources

Everything is 100% Free & Open Source:

Tool Purpose
📂 Datasets Raw ERP & CRM data in CSV format
🛢️ SQL Server Express Host your DW locally
🧰 SSMS GUI for managing SQL Server
🧩 Draw.io Diagrams for architecture, data flow, models

🚀 Project Objectives

🛠️ 1. Data Engineering Phase

Build a modern SQL Server-based Data Warehouse.

🔹 Import ERP & CRM CSV files
🔹 Clean and transform data
🔹 Build ETL pipelines using SQL
🔹 Create star schema models (fact + dimension tables)

Focus on latest data only; no historization required.


📊 2. Business Intelligence & Reporting

Uncover key insights through advanced SQL queries:

  • 🔍 Customer Behavior Analysis
  • 📦 Product Performance Trends
  • 💰 Sales & Revenue Insights

These insights drive data-informed decision-making.

📄 For requirements, check: docs/requirements.md


📂 Repository Structure

data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── etl.drawio                      # Draw.io file shows all different techniquies and methods of ETL
│   ├── data_architecture.drawio        # Draw.io file shows the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio                # Draw.io file for the data flow diagram
│   ├── data_models.drawio              # Draw.io file for data models (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── LICENSE                             # License information for the repository
├── .gitignore                          # Files and directories to be ignored by Git
└── requirements.txt                    # Dependencies and requirements for the project

🔐 License

This project is licensed under the MIT License.
You're free to use, modify, and share — just credit appropriately. 🙌


🌟 About Me

👋 Hey there! I'm Pratik Mandalkar, a tech enthusiast passionate about solving real-world problems using Data Engineering, Analytics, and System Design.

💼 Connect with me:

Let’s connect, collaborate, and learn together! 🚀


🧠 “Data is the new oil, but insight is the spark that sets it on fire.”

Releases

No releases published

Packages

No packages published

Languages