Music genre classification using the GTZAN dataset, which contains 1000 labeled audio files spanning 10 genres.
The model takes as input the Mel spectrogram of the song and analyzes the image with a Convolutional Neural Network (CNN). The output is a vector containing the score associated with each genre.
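As an illustration of the input side, here is a minimal sketch of turning a 30-second clip into a Mel spectrogram with librosa; the file path and the parameters (`n_mels`, `hop_length`) are illustrative assumptions, not necessarily the values used by this project:

```python
import librosa
import numpy as np

# Load a 30-second GTZAN clip (librosa resamples to 22050 Hz by default).
y, sr = librosa.load("data/raw/genres_original/blues/blues.00000.wav", duration=30)

# Compute the Mel spectrogram; n_mels and hop_length are illustrative values.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, hop_length=512)

# Convert power to decibels -- this 2-D array is the "image" the CNN analyzes.
mel_db = librosa.power_to_db(mel, ref=np.max)
print(mel_db.shape)  # (128, ~1292) for a 30 s clip at 22050 Hz
```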
The dataset contains 1000 labeled audio files (excerpts of well-known songs), each 30 seconds long, belonging to 10 different genres (a label-decoding sketch follows this list):
- Blues
- Classical
- Country
- Disco
- HipHop
- Jazz
- Metal
- Pop
- Reggae
- Rock
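Since the model emits one score per genre, decoding a prediction amounts to an argmax over this list. A small sketch; whether the project uses exactly this alphabetical label ordering is an assumption:

```python
import numpy as np

# Alphabetical order, matching the sub-folder names in genres_original;
# the project's actual label ordering is an assumption here.
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

scores = np.array([0.05, 0.02, 0.01, 0.03, 0.70, 0.04, 0.05, 0.04, 0.03, 0.03])
print(GENRES[int(np.argmax(scores))])  # -> hiphop
```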
The dataset can be downloaded here. Place the content in the `data/raw/` folder. For this project only the `genres_original` folder is used; it contains the audio files in `.wav` format.
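A quick sanity check after extracting the download, assuming the standard GTZAN layout of one sub-folder per genre with 100 clips each:

```python
from pathlib import Path

# Expected location after placing the download in data/raw/ (see above).
root = Path("data/raw/genres_original")

# Each genre sub-folder should report 100 .wav files (1000 in total).
for genre_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    print(f"{genre_dir.name}: {len(list(genre_dir.glob('*.wav')))} files")
```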
Work from the project root. Either clone the conda environment or install the packages directly from the requirements file; then install the project package.
The conda environment was exported using the command

```
conda env export > environment.yml
```

To clone the environment, run

```
conda env create -f environment.yml
```
The requirements file was generated using

```
pip list --format=freeze > requirements.txt
```

Install the package requirements using pip

```
pip install -r requirements.txt
```

or conda

```
conda create --name <env_name> --file requirements.txt
```
The project package can be installed using

```
pip install -e .
```

The project folder `genre_classification` is installed as a Python package and can now be imported from anywhere.
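A quick import check to confirm that the editable install worked, printing the location the package resolves to:

```python
# If `pip install -e .` succeeded, the package imports from any directory.
import genre_classification
print(genre_classification.__file__)
```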
Once an experiment has been executed and a trained model is available in the experiments folder (follow the guide in `genre_classification`), the API service can be used to run inference on a new, unlabeled `.wav` file located in the `data/external` folder.
Start the server by running

```
uvicorn app:app --reload
```
Open your browser to test the API service at

http://127.0.0.1:8000

or

http://127.0.0.1:8000/docs
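The interactive docs page lists the actual routes exposed by `app.py`. For programmatic access, a hedged sketch using `requests`; the endpoint name, form field, and file name below are assumptions to be verified against `/docs`:

```python
import requests

# Hypothetical route and form field -- verify against http://127.0.0.1:8000/docs.
with open("data/external/my_song.wav", "rb") as f:  # my_song.wav is a placeholder
    resp = requests.post(
        "http://127.0.0.1:8000/predict",
        files={"file": ("my_song.wav", f, "audio/wav")},
    )
print(resp.json())  # expected: a score per genre
```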
```
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   ├── raw            <- The original, immutable data dump.
│   └── external       <- External data to use for inference: .wav files without a label.
│
├── models             <- Trained and serialized models, model predictions, or model summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering) and a short '-' delimited description.
│   ├── 1.0-exploration.ipynb
│   ├── 2.0-google-colab.ipynb
│   └── 3.0-visualize_embeddings.ipynb
│
├── requirements.txt
│
├── setup.py           <- Makes the project pip-installable so src can be imported.
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module.
│   │
│   ├── data           <- Scripts to download or generate data.
│   │   └── make_dataset.py <- Generates the final dataset used for modelling.
│   │
│   ├── models         <- Scripts to train models and then use trained models to make predictions.
│   │   ├── config.py
│   │   ├── cnn.py
│   │   ├── dataset.py
│   │   ├── train.py
│   │   ├── evaluate.py
│   │   ├── embeddings.py
│   │   └── inference.py
│   │
│   └── visualization  <- Scripts to create exploratory and results-oriented visualizations.
│       └── visualize.py
│
└── app.py             <- FastAPI service.
```
Project based on the [cookiecutter data science project template](https://drivendata.github.io/cookiecutter-data-science/). #cookiecutterdatascience