Algorithmic Complexity & Perceived Musical Interest

Do listeners prefer the comfort of familiar patterns or the novelty of the unexpected?

This project investigates the relationship between the structural complexity of music and its popularity. Using the LZMA-compressed size of MIDI files as a computable proxy for Kolmogorov complexity, we analyzed thousands of tracks across various genres to test whether structurally "simple" music is more commercially successful.
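As a rough illustration of the idea, the sketch below computes an LZMA compression ratio for a byte string. It is not the project's own code: the function name `compression_ratio`, the choice of `preset=9`, and the definition of the ratio as compressed size divided by original size are all assumptions made for this example.

```python
import lzma
import os


def compression_ratio(data: bytes) -> float:
    """Return compressed size / original size for a byte string.

    Values near 0 mean highly repetitive (compressible) content;
    values near or above 1 mean the data is close to incompressible.
    The direction of the ratio is an assumption of this sketch.
    """
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = lzma.compress(data, preset=9)
    return len(compressed) / len(data)


if __name__ == "__main__":
    # Toy stand-ins for MIDI bytes: a repeated motif vs. random noise.
    repetitive = bytes([60, 62, 64, 65]) * 2000
    noisy = os.urandom(8000)
    print(f"repetitive: {compression_ratio(repetitive):.3f}")
    print(f"noisy:      {compression_ratio(noisy):.3f}")
```

For a real file, the same function could be applied to `Path("some_file.mid").read_bytes()`; note that very small files are dominated by the fixed container overhead of the compressed format.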

Project Goals

Quantify Complexity: Use compression ratios and "unexpectedness" scores to measure the structural density of musical files (MIDI).
Correlate with Popularity: Link these metrics with Spotify popularity scores.
Genre Analysis: Map the landscape of musical genres based on their algorithmic entropy.
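To make the "Correlate with Popularity" goal concrete, here is a minimal standard-library sketch of a Spearman rank correlation between a complexity score and a popularity score. The README does not say which correlation measure the study uses, so Spearman is an assumption, and the sample numbers are made up for illustration only.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up values for illustration only, not results from the study.
    complexity = [0.12, 0.20, 0.35, 0.50, 0.80]
    popularity = [70, 65, 66, 40, 15]
    print(round(spearman(complexity, popularity), 3))
```

A rank-based measure is used here because it only assumes a monotonic relationship, not a linear one.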

Key Findings

Simplicity Bias: We observed a negative correlation between complexity and popularity. Mass audiences generally favor higher predictability and repetition.
The "Goldilocks Zone": Popularity peaks at low-to-medium complexity and drops off as music becomes too entropic (random or chaotic).
Genre Clustering:
- High Popularity / Low Complexity: Hip-hop, Metal, Punk (characterized by repetitive loops or rhythmic patterns).
- Low Popularity / High Complexity: Jazz, Classical (characterized by variation and improvisation).
MIDI Limitation: The study highlights that symbolic data (MIDI) misses key information sources like vocals and timbre, which explains why lyrically complex genres like Hip-hop appear algorithmically "simple" in this analysis.
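For comparing pieces or genres with a compressor, a common tool is the normalized compression distance (NCD), which one of the repository folders is named after. The sketch below uses the standard NCD definition with LZMA as the compressor; how the project actually applies NCD is not described in this README, so the function names and the use of raw bytes are assumptions.

```python
import lzma


def csize(data: bytes) -> int:
    """Length in bytes of the LZMA-compressed input."""
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller values mean the two
    inputs share more structure that the compressor can exploit.
    """
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    import os

    motif = bytes([60, 62, 64, 65]) * 500  # toy stand-in for a looped pattern
    noise = os.urandom(2000)               # toy stand-in for unrelated data
    print(f"motif vs itself: {ncd(motif, motif):.3f}")
    print(f"motif vs noise:  {ncd(motif, noise):.3f}")
```

Because real compressors carry fixed overhead, NCD values are approximate and can fall slightly outside the ideal range of 0 to 1.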

Setup & Installation

To reproduce the dataset and run the analysis, follow these steps:

1. Kaggle API Setup

You need the Kaggle API to download the Lakh MIDI dataset.

Install the Kaggle client:
```
pip install kaggle
```
Create an API token by visiting: https://www.kaggle.com/settings/
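Place the downloaded kaggle.json file in your configuration directory and restrict its permissions so other users cannot read it:
```
mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
```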
Verify the installation:
```
kaggle datasets list
```
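If the verification command fails, a quick sanity check of the token file can help narrow down the cause. The sketch below assumes the classic kaggle.json layout with `username` and `key` fields; the helper name `check_kaggle_token` is hypothetical and not part of the Kaggle client.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(path: Path) -> list[str]:
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        token = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    problems = []
    # Assumed layout: a JSON object with non-empty "username" and "key".
    for field in ("username", "key"):
        if not isinstance(token, dict) or not token.get(field):
            problems.append(f"missing field: {field}")
    # On POSIX systems, flag any permission bits granted to group or others.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file is accessible to other users; consider chmod 600")
    return problems


if __name__ == "__main__":
    issues = check_kaggle_token(Path.home() / ".kaggle" / "kaggle.json")
    print("OK" if not issues else "\n".join(issues))
```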

2. Dataset Generation

Run the build script to download and extract the MIDI files into a data/ folder:

source build_dataset.sh

3. (Optional) Rebuild Metadata

If you wish to rebuild the dataset metadata (popularity scores, genres) from scratch:

Obtain a Client ID and Client Secret from Spotify: https://developer.spotify.com/documentation/web-api
Use these credentials in the preprocessing/build_csv.py script.
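For orientation, the sketch below shows how a Client ID and Client Secret are typically turned into a token request under Spotify's Client Credentials flow. It only builds the request and does not send it. How preprocessing/build_csv.py actually consumes the credentials is not shown in this README, so the environment variable names and the helper `build_token_request` are hypothetical.

```python
import base64
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a Client Credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical variable names; the project's script may read credentials differently.
    cid = os.environ.get("SPOTIFY_CLIENT_ID")
    secret = os.environ.get("SPOTIFY_CLIENT_SECRET")
    if cid and secret:
        request = build_token_request(cid, secret)
        print("ready to send:", request.full_url)
    else:
        print("set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET first")
```

Reading the secret from the environment rather than hard-coding it in a script keeps it out of version control.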

Usage

You can explore the analysis through the provided Jupyter Notebooks.

Performance Note: The dataframes used for the visualizations are pre-saved, so running the notebooks to generate plots is fast and does not require re-running the heavy compression step.
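The pre-saving idea can be sketched as a simple "load if cached, otherwise compute and save" helper. This is a generic illustration, not the notebooks' code: it stores a plain dictionary as JSON rather than a dataframe, and the name `load_or_compute` is hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_compute(cache_path: Path, compute: Callable[[], dict]) -> dict:
    """Return cached results if present; otherwise compute and save them."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result
```

Deleting the cache file forces the expensive computation to run again on the next call.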

Repository contents:
- Part 1 : Compression Ratio
- Part 2 : NCD
- Part 3 : Complexity Metric Validation
- Part 4 : Unexpectedness
- preprocessing
- .gitignore
- README.md
- Report.pdf
- build_dataset.sh
- final_dataset.csv
- requirements.txt

Repository: TristanDonze/music-complexity-analysis
