Commit ac4ac7f

Merge branch 'release/v0.8.1'
2 parents 70df031 + 63753a2 commit ac4ac7f

62 files changed, with 4575 additions and 1227 deletions

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 0.8.0
2+
current_version = 0.8.1
33
commit = False
44
tag = False
55
allow_dirty = False

.test_durations

Lines changed: 696 additions & 319 deletions
Large diffs are not rendered by default.

CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -1,5 +1,31 @@
11
# Changelog
22

3+
## 0.8.1 - 🆕 🏗 New method and notebook, games with exact Shapley values, bug fixes and cleanup
4+
5+
### Added
6+
7+
- Implement new method: `EkfacInfluence`
8+
[PR #451](https://github.com/aai-institute/pyDVL/issues/451)
9+
- New notebook to showcase ekfac for LLMs
10+
[PR #483](https://github.com/aai-institute/pyDVL/pull/483)
11+
- Implemented exact games in Castro et al. 2009 and 2017
12+
[PR #341](https://github.com/appliedAI-Initiative/pyDVL/pull/341)
13+
14+
### Fixed
15+
16+
- Bug in using `DaskInfluenceCalculator` with `TorchNumpyConverter`
17+
for single dimensional arrays [PR #485](https://github.com/aai-institute/pyDVL/pull/485)
18+
- Fix the `to` methods of `TorchInfluenceFunctionModel` implementations
19+
[PR #487](https://github.com/aai-institute/pyDVL/pull/487)
20+
- Fixed bug with checking for converged values in semivalues
21+
[PR #341](https://github.com/appliedAI-Initiative/pyDVL/pull/341)
22+
23+
### Docs
24+
25+
- Add applications of data valuation section, display examples more prominently,
26+
make all sections visible in table of contents, use mkdocs material cards
27+
in the home page [PR #492](https://github.com/aai-institute/pyDVL/pull/492)
28+
329
## 0.8.0 - 🆕 New interfaces, scaling computation, bug fixes and improvements 🎁
430

531
### Added

CONTRIBUTING.md

Lines changed: 23 additions & 16 deletions
@@ -23,7 +23,7 @@ to make your life easier.
2323

2424
Run the following to set up the pre-commit git hook to run before pushes:
2525

26-
```shell script
26+
```shell
2727
pre-commit install --hook-type pre-push
2828
```
2929

@@ -32,15 +32,15 @@ pre-commit install --hook-type pre-push
3232
We strongly suggest using some form of virtual environment for working with the
3333
library. E.g. with venv:
3434

35-
```shell script
35+
```shell
3636
python -m venv ./venv
3737
. venv/bin/activate # `venv\Scripts\activate` on Windows
3838
pip install -r requirements-dev.txt -r requirements-docs.txt
3939
```
4040

4141
With conda:
4242

43-
```shell script
43+
```shell
4444
conda create -n pydvl python=3.8
4545
conda activate pydvl
4646
pip install -r requirements-dev.txt -r requirements-docs.txt
@@ -49,7 +49,7 @@ pip install -r requirements-dev.txt -r requirements-docs.txt
4949
A very convenient way of working with your library during development is to
5050
install it in editable mode into your environment by running
5151

52-
```shell script
52+
```shell
5353
pip install -e .
5454
```
5555

@@ -58,7 +58,7 @@ suite) [pandoc](https://pandoc.org/) is required. Except for OSX, it should be i
5858
automatically as a dependency with `requirements-docs.txt`. Under OSX you can
5959
install pandoc (you'll need at least version 2.11) with:
6060

61-
```shell script
61+
```shell
6262
brew install pandoc
6363
```
6464

@@ -152,11 +152,11 @@ Two important markers are:
152152
To test the notebooks separately, run (see [below](#notebooks) for details):
153153

154154
```shell
155-
tox -e tests -- notebooks/
155+
tox -e notebook-tests
156156
```
157157

158158
To create a package locally, run:
159-
```shell script
159+
```shell
160160
python setup.py sdist bdist_wheel
161161
```
162162

@@ -343,8 +343,12 @@ runs](#skipping-ci-runs)).
343343
3. We split the tests based on their duration into groups and run them in parallel.
344344

345345
For that we use [pytest-split](https://jerry-git.github.io/pytest-split)
346-
to first store the duration of all tests with `pytest --store-durations pytest --slow-tests`
346+
to first store the duration of all tests with
347+
`tox -e tests -- --store-durations --slow-tests`
347348
in a `.test_durations` file.
349+
350+
Alternatively, we can use pytest directly:
351+
`pytest --store-durations --slow-tests`.
348352

349353
> **Note** This does not have to be done each time a new test or test case
350354
> is added. For new tests and test cases pytest-split assumes
@@ -359,11 +363,14 @@ runs](#skipping-ci-runs)).
359363
Then we can have as many splits as we want:
360364

361365
```shell
362-
pytest --splits 3 --group 1
363-
pytest --splits 3 --group 2
364-
pytest --splits 3 --group 3
366+
tox -e tests -- --splits 3 --group 1
367+
tox -e tests -- --splits 3 --group 2
368+
tox -e tests -- --splits 3 --group 3
365369
```
366370

371+
Alternatively, we can use pytest directly:
372+
`pytest --splits 3 --group 1`.
373+
367374
Each one of these commands should be run in a separate shell/job
368375
to run the test groups in parallel and decrease the total runtime.
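
To illustrate why stored durations help balance the groups, here is a minimal Python sketch of one possible duration-based assignment: a greedy "longest test first, lightest group next" scheme. It is an illustration only and is not claimed to be the algorithm pytest-split uses; the function name `split_by_duration` and the example durations are hypothetical, and the mapping from test name to seconds merely stands in for the contents of `.test_durations`.

```python
import heapq
from typing import Dict, List


def split_by_duration(durations: Dict[str, float], splits: int) -> List[List[str]]:
    """Greedily assign tests to `splits` groups, longest test first."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    # Heap of (total duration so far, group index); the lightest group pops first.
    heap = [(0.0, index) for index in range(splits)]
    groups: List[List[str]] = [[] for _ in range(splits)]
    # Sort by descending duration; ties are broken by name for determinism.
    for name, duration in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + duration, index))
    return groups


# Hypothetical durations in seconds (illustrative values, not real measurements).
example = {"test_a": 9.0, "test_b": 5.0, "test_c": 4.0, "test_d": 1.0, "test_e": 1.0}
groups = split_by_duration(example, splits=3)
```

With these example values the three groups total 9, 6 and 5 seconds, so the slowest job is bounded by the single longest test rather than by an unlucky grouping.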
369376

@@ -510,13 +517,13 @@ Then, a new release can be created using the script
510517
`bumpversion` automatically derive the next release version by bumping the patch
511518
part):
512519

513-
```shell script
520+
```shell
514521
build_scripts/release-version.sh 0.1.6
515522
```
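
As a rough illustration of what "bumping the patch part" means, the following Python sketch increments the last component of a plain `X.Y.Z` version string. It is a simplified assumption and not bumpversion's implementation: the helper `bump_patch` is hypothetical, and it ignores any additional parts (such as a release suffix) that the project's bumpversion configuration may define.

```python
def bump_patch(version: str) -> str:
    """Return `version` with its patch component incremented by one."""
    # Assumes exactly three dot-separated integer parts, e.g. "0.8.0".
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


next_version = bump_patch("0.8.0")
```

Applied to the version change in this commit, `bump_patch("0.8.0")` yields `"0.8.1"`.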
516523

517524
To find out how to use the script, pass the `-h` or `--help` flags:
518525

519-
```shell script
526+
```shell
520527
build_scripts/release-version.sh --help
521528
```
522529

@@ -542,7 +549,7 @@ create a new release manually by following these steps:
542549
2. When ready to release: From the develop branch create the release branch and
543550
perform release activities (update changelog, news, ...). For your own
544551
convenience, define an env variable for the release version
545-
```shell script
552+
```shell
546553
export RELEASE_VERSION="vX.Y.Z"
547554
git checkout develop
548555
git branch release/${RELEASE_VERSION} && git checkout release/${RELEASE_VERSION}
@@ -553,7 +560,7 @@ create a new release manually by following these steps:
553560
(the `release` part is ignored but required by bumpversion :rolling_eyes:).
554561
4. Merge the release branch into `master`, tag the merge commit, and push back to the repo.
555562
The CI pipeline publishes the package based on the tagged commit.
556-
```shell script
563+
```shell
557564
git checkout master
558565
git merge --no-ff release/${RELEASE_VERSION}
559566
git tag -a ${RELEASE_VERSION} -m"Release ${RELEASE_VERSION}"
@@ -564,7 +571,7 @@ create a new release manually by following these steps:
564571
always strictly more recent than the last published release version from
565572
`master`.
566573
6. Merge the release branch into `develop`:
567-
```shell script
574+
```shell
568575
git checkout develop
569576
git merge --no-ff release/${RELEASE_VERSION}
570577
git push origin develop

README.md

Lines changed: 9 additions & 22 deletions
@@ -7,27 +7,13 @@
77
</p>
88

99
<p align="center" style="text-align:center;">
10-
<a href="https://pypi.org/project/pydvl/">
11-
<img src="https://img.shields.io/pypi/v/pydvl.svg" alt="PyPI">
12-
</a>
13-
<a href="https://pypi.org/project/pydvl/">
14-
<img src="https://img.shields.io/pypi/pyversions/pydvl.svg" alt="Version">
15-
</a>
16-
<a href="https://pydvl.org">
17-
<img src="https://img.shields.io/badge/docs-All%20versions-009485" alt="documentation">
18-
</a>
19-
<a href="https://raw.githubusercontent.com/aai-institute/pyDVL/master/LICENSE">
20-
<img alt="License" src="https://img.shields.io/pypi/l/pydvl">
21-
</a>
22-
<a href="https://github.com/aai-institute/pyDVL/actions/workflows/main.yaml">
23-
<img src="https://github.com/aai-institute/pyDVL/actions/workflows/main.yaml/badge.svg" alt="Build status" >
24-
</a>
25-
<a href="https://codecov.io/gh/aai-institute/pyDVL">
26-
<img src="https://codecov.io/gh/aai-institute/pyDVL/graph/badge.svg?token=VN7DNDE0FV"/>
27-
</a>
28-
<a href="https://zenodo.org/badge/latestdoi/354117916">
29-
<img src="https://zenodo.org/badge/354117916.svg" alt="DOI">
30-
</a>
10+
<a href="https://pypi.org/project/pydvl/"><img src="https://img.shields.io/pypi/v/pydvl.svg" alt="PyPI"></a>
11+
<a href="https://pypi.org/project/pydvl/"><img src="https://img.shields.io/pypi/pyversions/pydvl.svg" alt="Version"></a>
12+
<a href="https://pydvl.org"><img src="https://img.shields.io/badge/docs-All%20versions-009485" alt="documentation"></a>
13+
<a href="https://raw.githubusercontent.com/aai-institute/pyDVL/master/LICENSE"><img alt="License" src="https://img.shields.io/pypi/l/pydvl"></a>
14+
<a href="https://github.com/aai-institute/pyDVL/actions/workflows/main.yaml"><img src="https://github.com/aai-institute/pyDVL/actions/workflows/main.yaml/badge.svg" alt="Build status" ></a>
15+
<a href="https://codecov.io/gh/aai-institute/pyDVL"><img src="https://codecov.io/gh/aai-institute/pyDVL/graph/badge.svg?token=VN7DNDE0FV"/></a>
16+
<a href="https://zenodo.org/badge/latestdoi/354117916"><img src="https://zenodo.org/badge/354117916.svg" alt="DOI"></a>
3117
</p>
3218

3319
**pyDVL** collects algorithms for **Data Valuation** and **Influence Function** computation.
@@ -332,7 +318,8 @@ We currently implement the following papers:
332318
- Schioppa, Andrea, Polina Zablotskaia, David Vilar, and Artem Sokolov.
333319
[Scaling Up Influence Functions](http://arxiv.org/abs/2112.03052).
334320
In Proceedings of the AAAI-22. arXiv, 2021.
335-
321+
- James Martens, Roger Grosse, [Optimizing Neural Networks with Kronecker-factored Approximate Curvature](https://arxiv.org/abs/1503.05671), International Conference on Machine Learning (ICML), 2015.
322+
- George, Thomas, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent, [Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis](https://arxiv.org/abs/1806.03884), Advances in Neural Information Processing Systems 31, 2018.
336323

337324
# License
338325

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
import logging
import os
from pathlib import Path

import mkdocs.plugins

logger = logging.getLogger(__name__)

root_dir = Path(__file__).parent.parent
docs_dir = root_dir / "docs"
contributing_file = root_dir / "CONTRIBUTING.md"
target_filepath = docs_dir / contributing_file.name


@mkdocs.plugins.event_priority(100)
def on_pre_build(config):
    logger.info("Temporarily copying contributing guide to docs directory")
    try:
        if os.path.getmtime(contributing_file) <= os.path.getmtime(target_filepath):
            logger.info(
                f"Contributing guide '{os.fspath(contributing_file)}' hasn't been updated, skipping."
            )
            return
    except FileNotFoundError:
        pass
    logger.info(
        f"Creating symbolic link for '{os.fspath(contributing_file)}' "
        f"at '{os.fspath(target_filepath)}'"
    )
    target_filepath.symlink_to(contributing_file)

    logger.info("Finished copying contributing guide to docs directory")


@mkdocs.plugins.event_priority(-100)
def on_shutdown():
    logger.info("Removing temporary contributing guide in docs directory")
    target_filepath.unlink()
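
The modification-time check used by `on_pre_build` can be tried out in isolation. The sketch below is an assumption-laden illustration rather than part of the commit: the helper `needs_refresh` is hypothetical, and it uses temporary files with explicitly set timestamps instead of the real `CONTRIBUTING.md`.

```python
import os
import tempfile
from pathlib import Path


def needs_refresh(source: Path, target: Path) -> bool:
    """Return True if `target` is missing or older than `source`."""
    try:
        return os.path.getmtime(source) > os.path.getmtime(target)
    except FileNotFoundError:
        # Mirrors the hook: a missing file means there is nothing to skip.
        return True


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "CONTRIBUTING.md"
    target = Path(tmp) / "docs" / "CONTRIBUTING.md"
    source.write_text("guide")
    target.parent.mkdir()
    missing_target = needs_refresh(source, target)

    target.write_text("guide")
    # Set modification times explicitly instead of relying on clock resolution.
    os.utime(source, (1_000, 1_000))
    os.utime(target, (2_000, 2_000))
    fresh_target = needs_refresh(source, target)

    os.utime(source, (3_000, 3_000))
    stale_target = needs_refresh(source, target)
```

Here `missing_target` and `stale_target` are `True`, while `fresh_target` is `False`, which corresponds to the "hasn't been updated, skipping" branch of the hook.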

docs/assets/pydvl.bib

Lines changed: 17 additions & 0 deletions
@@ -342,4 +342,21 @@ @InProceedings{kwon_data_2023
342342
pdf = {https://proceedings.mlr.press/v202/kwon23e/kwon23e.pdf},
343343
url = {https://proceedings.mlr.press/v202/kwon23e.html},
344344
abstract = {Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many Shapley-based data valuation methods have shown promising results in various downstream tasks, however, they are well known to be computationally challenging as it requires training a large number of models. As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data by reusing trained weak learners. Specifically, Data-OOB takes less than $2.25$ hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is $100$. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same important data point as the infinitesimal jackknife influence function when two different points are compared. We conduct comprehensive experiments using 12 classification datasets, each with thousands of sample sizes. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications.}
345+
}
346+
347+
@article{george2018fast,
348+
title={Fast approximate natural gradient descent in a kronecker factored eigenbasis},
349+
author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
350+
journal={Advances in Neural Information Processing Systems},
351+
volume={31},
352+
year={2018}
353+
}
354+
355+
@inproceedings{martens2015optimizing,
356+
title={Optimizing neural networks with kronecker-factored approximate curvature},
357+
author={Martens, James and Grosse, Roger},
358+
booktitle={International conference on machine learning},
359+
pages={2408--2417},
360+
year={2015},
361+
organization={PMLR}
345362
}

docs/css/extra.css

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ a.autorefs-external:hover::after {
6969
.nt-card-image:focus {
7070
filter: invert(32%) sepia(93%) saturate(1535%) hue-rotate(220deg) brightness(102%) contrast(99%);
7171
}
72+
7273
.md-header__button.md-logo {
7374
padding: 0;
7475
}

docs/css/grid-cards.css

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
/* Shadow and Hover */
.grid.cards > ul > li {
  box-shadow: 0 2px 2px 0 rgb(0 0 0 / 14%), 0 3px 1px -2px rgb(0 0 0 / 20%), 0 1px 5px 0 rgb(0 0 0 / 12%);

  &:hover {
    transform: scale(1.05);
    z-index: 999;
    background-color: rgba(0, 0, 0, 0.05);
  }
}

[data-md-color-scheme="slate"] {
  .grid.cards > ul > li {
    box-shadow: 0 2px 2px 0 rgb(4 40 33 / 14%), 0 3px 1px -2px rgb(40 86 94 / 47%), 0 1px 5px 0 rgb(139 252 255 / 64%);

    &:hover {
      transform: scale(1.05);
      z-index: 999;
      background-color: rgba(139, 252, 255, 0.05);
    }
  }
}

docs/css/neoteroi.css

Lines changed: 0 additions & 1 deletion
This file was deleted.
