
Commit ec81d32 ("Fixed typo")
2 parents 5815fed + dd5c268

16 files changed: +484 −165 lines

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -196,3 +196,6 @@ skbuild/*
 
 # helper files
 *TMP.py
+
+# files from profiling
+*.lprof

README.md

Lines changed: 42 additions & 8 deletions
@@ -1,19 +1,53 @@
-Names: Lorenzo Tomada, Gaspare Li Causi
+Names: Gaspare Li Causi, Lorenzo Tomada
 
-email: ltomada@sissa.it, glicausi@sissa.it
+email: glicausi@sissa.it, ltomada@sissa.it
 
+# TODO
+Add a running example in `run.sh`.
+
+# Introduction
 This repository contains the final project for the course in Development Tools in Scientific Computing.
 
+The goal of this project is to implement an efficient eigenvalue solver.
+This is done following an efficient strategy specialized for symmetric matrices, which is described in detail in the notebook `Documentation.ipynb` in the `docs` folder.
+
+# General details
+The solver is implemented using `mpi4py`. Moreover, the package relies on a `C++` backend that is automatically compiled when running `python -m pip install .`.
+A more detailed discussion of the dependencies and of how to install the package is provided at the end of this `README.md` file.
+
+## Repo structure
+We implemented various GitHub workflows, which cover unit testing, documentation generation and code formatting.
+
+1. Unit tests are performed using `pytest` and are run automatically after each push. There are three test files in the `test` folder: `test_eigensolvers.py` (used to test the implementation of the Lanczos method and the QR algorithm), `test_zero_finder.py` (used to ensure correctness of helper functions for the divide et impera algorithm), and `test_utils.py` (used to check that some helper functions work as expected).
+2. All the code is documented in detail, with docstrings and comments on the most salient lines. The documentation is generated automatically using `sphinx` at each push and deployed to GitHub Pages.
+3. After each push, the code is automatically formatted using the `black` formatter.
+
+## Where to find important files
+All the important files are in the `src/pyclassify` folder. In the root directory, the only interesting files are `CMakeLists.txt` and `setup.py`. Notice that a `setup.py` was added alongside the `pyproject.toml` file, as it made it easier to automatically compile the library during installation and to deal with external dependencies, e.g. `Eigen`.
+
+In the `src/pyclassify` folder, the file `utils.py` contains some helper functions, e.g. the ones needed to check that a matrix has the correct shape.
+The `cxx_utils.cpp` file contains the `C++` implementation of some functions needed in the divide and conquer algorithm (e.g. deflation, the QR method and the secular solvers).
+In addition, `parallel_tridiag_eigen.py` contains the actual implementation of the divide and conquer method, while `eigenvalues.py` contains the implementation of the Lanczos algorithm.
+`zero_finder.py` consists of a first implementation of some of the functions in `cxx_utils.cpp`; it has not been removed, since it is used in tests to ensure that the `C++` implementation is correct.
+
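To give a flavor of the kind of routine `eigenvalues.py` provides, here is a minimal Lanczos tridiagonalization sketch in plain NumPy. The function name and structure are illustrative only, not the project's actual implementation:

```python
import numpy as np

def lanczos(A, k=None):
    """Lanczos tridiagonalization of a symmetric matrix A.

    Returns the diagonal d and off-diagonal e of the tridiagonal matrix T.
    Full reorthogonalization is used for robustness on small problems.
    """
    n = A.shape[0]
    k = n if k is None else k
    Q = np.zeros((n, k))
    d = np.zeros(k)
    e = np.zeros(max(k - 1, 0))
    q = np.random.default_rng(0).standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        d[j] = q @ w
        # full reorthogonalization against all previous Lanczos vectors
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            e[j] = np.linalg.norm(w)
            q = w / e[j]
    return d, e
```

With k = n and full reorthogonalization, the resulting tridiagonal matrix has the same spectrum as A, which is what the QR step or the divide et impera method then exploits.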
+## What did we implement?
+In order to solve an eigenvalue problem, we considered multiple strategies.
+1. The most basic one was to implement the power method, in order to compute (at least) the largest eigenvalue. We then used `numba` to try and optimize it, but in this case just-in-time compilation was not particularly beneficial. The implementation of the power method is contained in `eigenvalues.py`.
+2. Lanczos + QR: this is an approach (tailored to the case of symmetric matrices) to compute *all* the eigenvalues and eigenvectors. Notice that, also in the case of the QR method, `numba` did not provide much speed-up, resulting in a rather slow implementation. For this reason, we implemented the QR method in `C++` and used `pybind11` to expose it to `Python`. All the code written in `C++` can be found in `cxx_utils.cpp`.
+3. `CuPy` implementation of all of the above: we implemented all the above methodologies using `CuPy` to see whether using a GPU could speed up computations. Since this was not the case, we commented out all the lines of code involving `CuPy`, so that installing it is no longer required and our code can also run on machines without a GPU.
+4. The core of the project is the implementation (as well as a generalization of the simplified case, with a fixed value of $\rho$, considered in our reference) of the _divide et impera_ method for the computation of the eigenvalues of a symmetric matrix. Some helpers were originally written in `Python` and then translated to `C++` for efficiency reasons: their original implementation is in `zero_finder.py` and is still present in the project for testing purposes. The translated version can be found in `cxx_utils.cpp`. The implementation of the actual method to compute the eigenvalues, starting from a tridiagonal matrix, is contained in `parallel_tridiag_eigen.py` and makes use of `mpi4py`.
+
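For intuition on the secular solvers mentioned above: in the divide et impera method one repeatedly needs the eigenvalues of a rank-one update diag(d) + ρ·zzᵀ, which are the roots of the secular equation f(λ) = 1 + ρ·Σᵢ zᵢ²/(dᵢ − λ). The sketch below uses plain bisection and hypothetical names; the project's `C++` solvers are presumably more refined (e.g. handling deflation and using faster root finders):

```python
import numpy as np

def secular(lam, d, z, rho):
    """Secular function f(lam) = 1 + rho * sum_i z_i^2 / (d_i - lam)."""
    return 1.0 + rho * np.sum(z ** 2 / (d - lam))

def secular_eigenvalues(d, z, rho, iters=100):
    """Eigenvalues of diag(d) + rho * z z^T via bisection on the secular equation.

    Assumes d strictly increasing, all z_i != 0 (no deflation) and rho > 0,
    so the i-th root lies in (d_i, d_{i+1}) and the last one above d_n.
    """
    n = len(d)
    upper = d[-1] + rho * np.dot(z, z)   # upper bound for the largest root
    bounds = np.append(d[1:], upper)
    roots = []
    for i in range(n):
        lo, hi = d[i], bounds[i]
        eps = 1e-12 * max(1.0, hi - lo)  # step slightly off the poles
        a, b = lo + eps, hi - eps
        for _ in range(iters):           # bisection: f(a) < 0 <= f(b)
            mid = 0.5 * (a + b)
            if secular(mid, d, z, rho) < 0.0:
                a = mid
            else:
                b = mid
        roots.append(0.5 * (a + b))
    return np.array(roots)
```

By the interlacing property, the roots come out sorted, so they can be compared directly against a dense eigensolver.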
+# Results
+The results of the profiling (runtime vs matrix size, memory consumption, scalability, and so on) are discussed in detail in `Documentation.ipynb`.
+All the scripts in the `scripts` folder are either used for profiling or to provide running examples.
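A runtime-vs-size measurement of the kind performed by the profiling scripts can be sketched as follows. This is illustrative only: `time_solver` and the use of `np.linalg.eigvalsh` as a stand-in solver are assumptions, not the project's code:

```python
import time
import numpy as np

def time_solver(solver, sizes, rng=None):
    """Return a list of (n, seconds) pairs for random symmetric inputs."""
    rng = rng or np.random.default_rng(0)
    timings = []
    for n in sizes:
        M = rng.standard_normal((n, n))
        A = (M + M.T) / 2                       # symmetric test matrix
        t0 = time.perf_counter()
        solver(A)
        timings.append((n, time.perf_counter() - t0))
    return timings

# e.g. time_solver(np.linalg.eigvalsh, [50, 100, 200])
```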
 
-TO DO:
-1) Profile runtime and memory usage, saving the results and plotting
-2) Runtime vs matrix size comparison (follow in detail the instructions on the course repo)
-3) Accuracy vs efficiency
-4) Add missing tests
+# How to run
+We provide an example of running code in the `scripts` folder.
+In the `shell` folder, we provide a `submit.sbatch` file to run using `SLURM`, as well as a `submit.sh` file to run the same experiment locally.
+In particular, these two files perform memory profiling.
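As a rough, stdlib-only sketch of such a memory measurement (the actual scripts may use dedicated profilers such as `memory_profiler`; the helper below is hypothetical):

```python
import tracemalloc

def peak_memory_bytes(fn, *args, **kwargs):
    """Run fn and report the peak Python-allocated memory during the call."""
    tracemalloc.start()
    try:
        fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```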
 
 # To install using Ulysses:
 ```bash
-source shell/submit.sh
+source shell/load_modules.sh
 ```
 The previous line will load CMake and gcc. Both are needed to compile the project.
 In addition, it will enable the installation of `mpi4py`.

experiments/config.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+dim: 200
+density: 0.2
+n_processes: 2
+plot: true
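This configuration is a flat key/value file; below is a minimal sketch of reading it with only the standard library. The project itself presumably uses a YAML parser, so treat `parse_flat_config` as illustrative only:

```python
# Hypothetical flat-config reader for a file like experiments/config.yaml.
CONFIG_TEXT = """\
dim: 200
density: 0.2
n_processes: 2
plot: true
"""

def parse_flat_config(text):
    """Parse 'key: value' lines, coercing ints, floats and booleans."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition(":")
        key, raw = key.strip(), raw.strip()
        if raw in ("true", "false"):
            cfg[key] = raw == "true"
        else:
            try:
                cfg[key] = int(raw)
            except ValueError:
                try:
                    cfg[key] = float(raw)
                except ValueError:
                    cfg[key] = raw
    return cfg
```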

experiments/config_accuracy.yaml

Lines changed: 0 additions & 2 deletions
This file was deleted.

experiments/config_profiling.yaml

Lines changed: 0 additions & 2 deletions
This file was deleted.

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -18,9 +18,9 @@ authors = [
 dynamic = ["dependencies"]
 
 [tool.scikit-build]
-# Optional: specify the build directory
 build-dir = "build"
 
+
 [tool.setuptools.packages.find]
 where = ["src"]
 exclude = ["scripts", "tests", "shell", "experiments"]
@@ -30,3 +30,4 @@ dependencies = { file = ["requirements.txt"] }
 
 [project.optional-dependencies]
 test = ["pytest"]
+
scripts/mpi_running.py

Lines changed: 2 additions & 3 deletions
@@ -1,5 +1,4 @@
 # from pyclassify.parallel_tridiag_eigen import parallel_eigen
-from pyclassify import parallel_tridiag_eigen
 from time import time
 import numpy as np
 from mpi4py import MPI
@@ -10,7 +9,7 @@ def parallel_eig(d, off_d, nprocs):
 
     print("inside parallel_eig")
     comm = MPI.COMM_SELF.Spawn(
-        sys.executable, args=["parallel_tridiag_eigen.py"], maxprocs=nprocs
+        sys.executable, args=["./parallel_tridiag_eigen.py"], maxprocs=nprocs
     )
     print("sending")
     comm.send(d, dest=0, tag=11)
@@ -27,7 +26,7 @@ def parallel_eig(d, off_d, nprocs):
     return eigvals, eigvecs, delta_t
 
 
-n = 1000
+n = 100
 nprocs = 4
 # np.random.seed(42)
 d = np.random.rand(n) * 2

scripts/plot_scalability.py

Lines changed: 0 additions & 44 deletions
This file was deleted.
