This small project attempts to replicate the results of the original node2vec paper using its reference implementation on two datasets, focusing only on multi-label classification. The repository contains a modified Python node2vec implementation and a multi-label classification experiment similar to the one in the paper.
node2vec is an algorithm that learns continuous feature representations, or embeddings, for nodes in a graph. Briefly, node2vec learns low-dimensional vector representations of nodes by simulating biased random walks on a graph and then applying the Skip-Gram model of the word2vec algorithm to these walks, treating node sequences as sentences.
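To make the biased random walk concrete, here is a minimal sketch of a single second-order walk. It is an illustration only, not the code in node2vec.py: the function name `biased_walk`, the adjacency-dict layout, and the toy graph are assumptions, and it samples with `random.choices` instead of any precomputed sampling tables, which is simpler but slower on large graphs.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Simulate one node2vec-style second-order random walk.

    adj maps each node to a dict {neighbor: edge_weight} (hypothetical layout).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: there is no previous node, so use edge weights only.
            weights = [adj[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbors:
                w = adj[cur][x]
                if x == prev:
                    weights.append(w / p)  # return to the previous node
                elif x in adj[prev]:
                    weights.append(w)      # stay at distance 1 from prev
                else:
                    weights.append(w / q)  # move outward, to distance 2
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected, unweighted graph (made up for this example).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, {})[v] = 1.0
    adj.setdefault(v, {})[u] = 1.0

# One walk of 5 nodes from every node, with p = 4 and q = 1.
walks = [biased_walk(adj, node, length=5, p=4, q=1, rng=random.Random(0))
         for node in adj]
```

Each walk is then treated as a "sentence" of node identifiers and fed to Skip-Gram. A large p makes immediate backtracking less likely, while a small q pushes the walk away from the previous node's neighborhood.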
For the multi-label classification experiments, I used the protein-protein interactions (PPI) and Wikipedia datasets. I downloaded the preprocessed datasets directly from the SNAP website. I could not obtain the BlogCatalog dataset because the link is dead.
The file node2vec.py is based on the reference implementation by Aditya Grover and Jure Leskovec, which can be found on GitHub. I updated it to Python 3 and fixed some minor issues. More information about node2vec can be found in the original paper and on the official website.
These experiments were developed in July 2025 as part of a computer science doctoral course exam called "Graph Theory and Algorithms" at the University of Milano-Bicocca for the 2024–2025 academic year. A brief report on the experiments and their results is available for download. The project will not be updated after submission, and the code is provided as is.
The git repository is available online on GitLab and GitHub. However, GitHub is only a mirror of GitLab.
I could not determine why, but my results are worse than those reported in the paper. The paper omits some details about the multi-label classification, so my setup may differ from the original. These are the results I obtained:
Dataset | Macro-F1 (Original) | Micro-F1 (Original) | Macro-F1 (Mine) | Micro-F1 (Mine) |
---|---|---|---|---|
PPI | 0.1791 | 0.2100 | 0.1154 | 0.1222 |
Wikipedia | 0.1552 | 0.5600 | 0.1341 | 0.4143 |
I used the same node2vec parameters (d = 128, r = 10, l = 80, k = 10, and a single epoch) as in the paper, as well as the same hyperparameters shown in Table 2 (p = 4 and q = 1 for the PPI network and p = 4 and q = 0.5 for Wikipedia). Note that the Micro-F1 score is missing from Table 3 of the original paper, but it can be obtained from Figure 4.
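For readers unfamiliar with the two scores, the sketch below computes Macro-F1 and Micro-F1 from per-label counts using their standard definitions. It is not taken from classifier.py; the helper names and the toy labels are hypothetical. It also shows one way a large gap between the two scores can arise: a frequent label predicted well dominates Micro-F1, while a rare label predicted badly pulls Macro-F1 down.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels):
    """y_true and y_pred are lists of label sets, one set per node."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for true, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in true:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in true:
                counts[label][2] += 1
    # Macro-F1: average the per-label F1 scores, every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro-F1: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    return macro, micro


# Toy data: label "A" is frequent and always predicted, "B" is rare and missed.
y_true = [{"A"}, {"A"}, {"A"}, {"A", "B"}]
y_pred = [{"A"}, {"A"}, {"A"}, {"A"}]
macro, micro = macro_micro_f1(y_true, y_pred, labels=["A", "B"])
print(f"Macro-F1 = {macro:.4f}, Micro-F1 = {micro:.4f}")
# prints: Macro-F1 = 0.5000, Micro-F1 = 0.8889
```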
- Prepare the virtual environment (Python 3 is required):
  $ git clone https://gitlab.com/ema-pe/node2vec-results-replication.git
  $ cd node2vec-results-replication
  $ python3 -m venv .env
  $ source .env/bin/activate
  $ pip install -r requirements.txt
- Download the datasets from node2vec's website. A Bash script is provided to download them automatically:
  $ chmod u+x dataset/download_datasets.sh
  $ ./dataset/download_datasets.sh
  # There will be "Homo_sapiens.mat" and "POS.mat" in the current directory.
- Convert the dataset (a .mat file) to edgelist format using the mat2edgelist.py script. WARNING: the graph of the PPI dataset is undirected and unweighted, while the graph of the Wikipedia dataset is undirected and weighted (add --weighted when converting the Wikipedia dataset):
  $ python mat2edgelist.py --input DATASET.mat --output DATASET.edgelist
- Learn the graph embedding using the node2vec.py script. The script supports various options; the defaults are the same as those used in the original paper. The input must be an edgelist representation of the graph, weighted or unweighted; see the NetworkX documentation for more information. The output is written in "word2vec C format" using gensim; see its documentation for more information:
  $ python node2vec.py --input DATASET.edgelist --output DATASET.emb --p 4 --q 0.5 --weighted
- Run the multi-label classification using the classifier.py script. By default, it splits the dataset into 50% for training and 50% for evaluation:
  $ python classifier.py --emb DATASET.emb --mat DATASET.mat --train-size 0.5
All scripts support several options; run them with --help or -h to see them all.
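If you want to inspect an embedding file without gensim, the sketch below parses the word2vec format, assuming the script writes its text (non-binary) variant: a header line with the number of vectors and their dimension, followed by one line per node containing the node identifier and its vector components. The function name `read_word2vec_text` and the sample data are hypothetical.

```python
import io


def read_word2vec_text(stream):
    """Parse embeddings stored in the text variant of the word2vec format."""
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        # The first token is the node identifier, the rest are the components.
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header declares {count} vectors, found {len(vectors)}")
    return vectors


# Made-up sample with 2 nodes and 3 dimensions (real runs use d = 128).
sample = io.StringIO("2 3\n1 0.1 0.2 0.3\n2 -0.4 0.5 0.6\n")
embeddings = read_word2vec_text(sample)
print(embeddings["2"])
# prints: [-0.4, 0.5, 0.6]
```

For a real file, pass an open file object instead of the in-memory sample, for example `open("DATASET.emb", encoding="utf-8")`.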
With the PPI dataset:
$ ./dataset/download_datasets.sh
$ python mat2edgelist.py --input Homo_sapiens.mat --output Homo_sapiens.edgelist
$ python node2vec.py --input Homo_sapiens.edgelist --output Homo_sapiens.emb --p 4 --q 1
$ python classifier.py --emb Homo_sapiens.emb --mat Homo_sapiens.mat
With the Wikipedia dataset:
$ ./dataset/download_datasets.sh
$ python mat2edgelist.py --input POS.mat --output POS.edgelist --weighted
$ python node2vec.py --input POS.edgelist --output POS.emb --p 4 --q 0.5 --weighted
$ python classifier.py --emb POS.emb --mat POS.mat
Copyright (c) 2016 Aditya Grover, under the MIT License, for the node2vec.py file.
Copyright (c) 2025 Emanuele Petriglia. This project is licensed under the MIT License. See the LICENSE file for details.