Reproducing node2vec results

This small project tried to replicate the results of the original node2vec paper through its reference implementation for two datasets, focusing only on multi-label classification. The repository contains a modified Python node2vec implementation and a multi-label classification experiment similar to the one in the paper.

node2vec is an algorithm that learns continuous feature representations, or embeddings, for nodes in a graph. Briefly, node2vec learns low-dimensional vector representations of nodes by simulating biased random walks on a graph and then applying the Skip-Gram model of the word2vec algorithm to these walks, treating node sequences as sentences.
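
As a rough illustration of the two steps described above, the following sketch simulates biased walks on a toy unweighted graph and turns them into "sentences". It is not the code in node2vec.py: the function name `biased_walk`, the toy graph, and the use of `random.choices` (instead of the alias sampling used by the reference implementation) are assumptions made for brevity. Each candidate next node is weighted by 1/p, 1, or 1/q depending on its distance from the previously visited node.

```python
import random


def biased_walk(adj, start, length, p, q, rng):
    """Simulate one second-order biased walk on an unweighted graph.

    adj maps each node to the set of its neighbours (hypothetical layout).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: there is no previous node, so pick uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # going back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # staying at distance 1 from prev
            else:
                weights.append(1.0 / q)  # moving to distance 2 from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy undirected graph, used only for this example.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(0)
walks = [biased_walk(adj, node, length=8, p=4, q=1, rng=rng) for node in sorted(adj)]

# Skip-Gram implementations expect sequences of tokens, so nodes become strings.
sentences = [[str(node) for node in walk] for walk in walks]
```

Each inner list in `sentences` could then be passed to a Skip-Gram implementation such as gensim's Word2Vec, which is left out here to keep the sketch dependency-free.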

For the multi-label classification experiments, I used the protein-protein interactions (PPI) and Wikipedia datasets. I downloaded the preprocessed datasets directly from the SNAP website. I could not obtain the BlogCatalog dataset because the link is dead.

The file node2vec.py is based on the reference implementation by Aditya Grover and Jure Leskovec, which can be found on GitHub. I updated it to Python 3 and fixed some minor issues. More information about node2vec can be found in the original paper and on the official website.

These experiments were developed in July 2025 as part of a computer science doctoral course exam called "Graph Theory and Algorithms" at the University of Milano-Bicocca for the 2024–2025 academic year. A brief report on the experiments and their results is available for download. The project will not be updated after submission, and the code is provided as is.

The git repository is available online on GitLab and GitHub. However, GitHub is only a mirror of GitLab.

Results

My results are worse than those reported in the paper, and I have not been able to determine why. The paper omits some details about the multi-label classification setup, so my experiment may differ from the original. This is what I achieved:

| Dataset | Macro-F1 (Original) | Micro-F1 (Original) | Macro-F1 (Mine) | Micro-F1 (Mine) |
|---|---|---|---|---|
| PPI | 0.1791 | 0.2100 | 0.1154 | 0.1222 |
| Wikipedia | 0.1552 | 0.5600 | 0.1341 | 0.4143 |

I used the same node2vec parameters (d = 128, r = 10, l = 80, k = 10, and a single epoch) as in the paper, as well as the same hyperparameters shown in Table 2 (p = 4 and q = 1 for the PPI network and p = 4 and q = 0.5 for Wikipedia). Note that the Micro-F1 score is missing from Table 3 of the original paper, but it can be obtained from Figure 4.
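
To make the two metrics in the table concrete, here is a minimal sketch of Macro-F1 and Micro-F1 for multi-label predictions, assuming the usual definitions (Macro-F1 averages the per-label F1 scores, Micro-F1 pools the counts over all labels first). The helper names, the toy labels, and the convention of scoring an empty label as 0 are assumptions for illustration; classifier.py may compute the scores differently.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels):
    """y_true and y_pred are lists of label sets, one set per node."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for true_set, pred_set in zip(y_true, y_pred):
        for label in labels:
            if label in pred_set and label in true_set:
                counts[label][0] += 1
            elif label in pred_set:
                counts[label][1] += 1
            elif label in true_set:
                counts[label][2] += 1
    # Macro: average the per-label scores, so rare labels weigh as much as frequent ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts first, so frequent labels dominate.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    return macro, micro


# Toy data: label "A" is frequent and always predicted, label "B" is rare and missed.
y_true = [{"A"}, {"A"}, {"A"}, {"A", "B"}]
y_pred = [{"A"}, {"A"}, {"A"}, {"A"}]
macro, micro = macro_micro_f1(y_true, y_pred, labels=["A", "B"])
print(f"Macro-F1 = {macro:.4f}, Micro-F1 = {micro:.4f}")
```

In this toy case the missed rare label halves Macro-F1 while Micro-F1 stays high, which is one way a dataset can show a large gap between the two scores.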

How to run

Prepare the virtual environment (Python 3 is required):

$ git clone https://gitlab.com/ema-pe/node2vec-results-replication.git
$ cd node2vec-results-replication
$ python3 -m venv .env
$ source .env/bin/activate
$ pip install -r requirements.txt

Download the datasets from node2vec's website. A Bash script is provided to automatically download them:

$ chmod u+x dataset/download_datasets.sh
$ ./dataset/download_datasets.sh
# There will be "Homo_sapiens.mat" and "POS.mat" in the current directory.

Convert each dataset (a .mat file) to edge list format using the mat2edgelist.py script. WARNING: the PPI graph is undirected and unweighted, while the Wikipedia graph is undirected and weighted, so pass --weighted when converting Wikipedia:
```
$ python mat2edgelist.py --input DATASET.mat --output DATASET.edgelist
```
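
The conversion step can be pictured with the sketch below, which turns a symmetric adjacency matrix into edge list lines. It is only an illustration: the helper `adjacency_to_edgelist` is hypothetical, a nested list stands in for the matrix stored in the .mat file, and node IDs are assumed to be zero-based row indices, none of which is confirmed to match mat2edgelist.py.

```python
def adjacency_to_edgelist(matrix, weighted=False):
    """Return edge list lines for an undirected graph given as a symmetric matrix."""
    lines = []
    n = len(matrix)
    for u in range(n):
        # Scan only the upper triangle so each undirected edge is emitted once.
        for v in range(u, n):
            weight = matrix[u][v]
            if weight == 0:
                continue  # no edge between u and v
            lines.append(f"{u} {v} {weight}" if weighted else f"{u} {v}")
    return lines


# Toy symmetric matrix with two weighted edges: 0-1 (weight 2) and 1-2 (weight 5).
toy = [
    [0, 2, 0],
    [2, 0, 5],
    [0, 5, 0],
]
unweighted_lines = adjacency_to_edgelist(toy)
weighted_lines = adjacency_to_edgelist(toy, weighted=True)
print("\n".join(weighted_lines))
```

The unweighted form drops the third column, which is why the weighted Wikipedia graph needs the extra flag to keep its edge weights.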
Learn the graph embeddings using the node2vec.py script. The script supports various options; the defaults are the same as those used in the original paper. The input must be an edge list representation of the graph (weighted or unweighted); see the NetworkX documentation for more information. The output is written in the "word2vec C format" using gensim; see its documentation for more information.
```
$ python node2vec.py --input DATASET.edgelist --output DATASET.emb --p 4 --q 0.5 --weighted
```
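
For readers unfamiliar with the output format, the sketch below parses a small embedding file, assuming the text (non-binary) variant of the word2vec C format: a header line with the number of vectors and their dimension, then one line per node with its ID followed by the vector components. The function `parse_word2vec_text` and the sample values are made up for illustration; in practice gensim can load such files directly.

```python
def parse_word2vec_text(text):
    """Parse the text variant of the word2vec C format into {token: vector}."""
    lines = text.strip().splitlines()
    count, dim = (int(x) for x in lines[0].split())  # header: "<count> <dim>"
    embeddings = {}
    for line in lines[1:]:
        token, *values = line.split()
        vector = [float(v) for v in values]
        if len(vector) != dim:
            raise ValueError(f"expected {dim} values for {token!r}, got {len(vector)}")
        embeddings[token] = vector
    if len(embeddings) != count:
        raise ValueError(f"header announces {count} vectors, found {len(embeddings)}")
    return embeddings


# Made-up sample: 3 nodes embedded in 2 dimensions.
sample = """3 2
0 0.1 -0.2
1 0.0 0.5
2 -1.0 0.25
"""
emb = parse_word2vec_text(sample)
print(emb["1"])
```

Node IDs are kept as strings here because the format stores them as tokens, just like words in a text corpus.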
Run multi-label classification using the classifier.py script. By default, it splits the dataset into 50% for training and 50% for evaluation:
```
$ python classifier.py --emb DATASET.emb --mat DATASET.mat --train-size 0.5
```
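
What a --train-size of 0.5 means can be sketched as a random split of the labeled nodes into two halves. This is an assumption about the general idea, not the code in classifier.py: the helper `split_nodes`, its `seed` parameter, and the rounding rule are hypothetical.

```python
import random


def split_nodes(nodes, train_size=0.5, seed=0):
    """Shuffle the nodes and split them into a training and an evaluation part."""
    shuffled = list(nodes)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_size * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Ten toy node IDs split 50/50.
train, test = split_nodes(range(10), train_size=0.5, seed=42)
print(len(train), len(test))
```

A classifier would then be fitted on the embeddings of the `train` nodes and scored on the `test` nodes.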

All scripts support several options; run them with --help or -h to see them all.

Examples

With the PPI dataset:

$ ./dataset/download_datasets.sh
$ python mat2edgelist.py --input Homo_sapiens.mat --output Homo_sapiens.edgelist
$ python node2vec.py --input Homo_sapiens.edgelist --output Homo_sapiens.emb --p 4 --q 1
$ python classifier.py --emb Homo_sapiens.emb --mat Homo_sapiens.mat

With the Wikipedia dataset:

$ ./dataset/download_datasets.sh
$ python mat2edgelist.py --input POS.mat --output POS.edgelist --weighted
$ python node2vec.py --input POS.edgelist --output POS.emb --p 4 --q 0.5 --weighted
$ python classifier.py --emb POS.emb --mat POS.mat
