
Commit 71b781f

Merge pull request #22 from SchlossLab/improve-config
Allow custom hyperparameters and a feature importance toggle in config
2 parents 5130e24 + f07c6c2 commit 71b781f

Some content is hidden: large commits have some content hidden by default.

45 files changed, 359 additions and 130 deletions

.github/workflows/tests.yml

Lines changed: 11 additions & 11 deletions
@@ -8,17 +8,17 @@ on:


 jobs:
-  Formatting:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v2
-      - name: Formatting
-        uses: github/super-linter@v4
-        env:
-          VALIDATE_ALL_CODEBASE: false
-          DEFAULT_BRANCH: main
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          VALIDATE_SNAKEMAKE_SNAKEFMT: true
+  # Formatting:
+  #   runs-on: ubuntu-latest
+  #   steps:
+  #     - uses: actions/checkout@v2
+  #     - name: Formatting
+  #       uses: github/super-linter@v4
+  #       env:
+  #         VALIDATE_ALL_CODEBASE: false
+  #         DEFAULT_BRANCH: main
+  #         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+  #         VALIDATE_SNAKEMAKE_SNAKEFMT: true

   Linting:
     runs-on: ubuntu-latest

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ results/
 .Rproj.user
 *KLS*
 *_test*
+.vscode/

README.md

Lines changed: 2 additions & 0 deletions
@@ -30,13 +30,15 @@ combines the results files, plots performance results
 (cross-validation and test AUROCs, hyperparameter AUROCs from cross-validation, and benchmark performance),
 and renders a simple [R Markdown report](report.Rmd) as a GitHub-flavored markdown file ([example](report-example.md)).

+<!-- Create the rulegraph with workflow/scripts/rulegraph.sh -->
 ![rulegraph](figures/rulegraph.png)

 The DAG shows how calls to `run_ml` can run in parallel if
 snakemake is allowed to run more than one job at a time.
 If we use 100 seeds and 4 ML methods, snakemake would call `run_ml` 400 times.
 Here's a small example DAG if we were to use only 2 seeds and 1 ML method:

+<!-- Create the dag with workflow/scripts/dag.sh -->
 ![dag](figures/dag.png)

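The README's count of `run_ml` calls is the size of the Cartesian product of seeds and ML methods. A minimal Python sketch, assuming one job per (method, seed) pair and seeds numbered 1 to `nseeds` (the numbering and the helper name `run_ml_jobs` are hypothetical; the workflow's actual seed values may differ):

```python
from itertools import product


def run_ml_jobs(ml_methods, nseeds):
    """Return one (method, seed) pair per run_ml call.

    Hypothetical helper: assumes seeds are numbered 1..nseeds.
    """
    seeds = range(1, nseeds + 1)
    return list(product(ml_methods, seeds))


# 4 methods x 100 seeds, as in the README (method names chosen for illustration)
jobs = run_ml_jobs(["glmnet", "rf", "rpart2", "svmRadial"], 100)
print(len(jobs))  # 400

# The small example DAG: 1 method x 2 seeds
print(run_ml_jobs(["glmnet"], 2))  # [('glmnet', 1), ('glmnet', 2)]
```

With `config/default.yml` as changed in this commit (`glmnet` and `rf`, `nseeds: 10`), the same count gives 20 `run_ml` calls.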

config/README.md

Lines changed: 8 additions & 5 deletions
@@ -35,11 +35,14 @@

 1. Edit the configuration file [`config/default.yml`](/config/default.yml).
     - `dataset_csv`: the path to the dataset as a csv file.
-    - `outcome_colname`: column name of the outcomes for the dataset.
-    - `ml_methods`: list of machine learning methods to use. Must be supported by mikropml.
+    - `dataset_csv`: a short name to identify the dataset.
+    - `outcome_colname`: column name of the outcomes or classes for the dataset.
+    - `ml_methods`: list of machine learning methods to use. Must be [supported by mikropml or caret](http://www.schlosslab.org/mikropml/articles/introduction.html#the-methods-we-support).
     - `kfold`: k number for k-fold cross validation during model training.
-    - `ncores`: the number of cores to use for preprocessing and for each `mikropml::run_ml()` call. Do not exceed the number of cores you have available.
-    - `nseeds`: the number of different random seeds to use for training models with `mikropml::run_ml()`.
+    - `ncores`: the number of cores to use for `preprocess_data()`, `run_ml()`, and `get_feature_importance()`. Do not exceed the number of cores you have available.
+    - `nseeds`: the number of different random seeds to use for training models with `run_ml()`. This will result in `nseeds` different train/test splits.
+    - `find_feature_importance`: whether to calculate feature importances with permutation tests (`true` or `false`). If `false`, the plot in the report will be blank.
+    - `hyperparams`: override the default model hyperparameters set by mikropml for each ML method (optional). Leave this blank if you'd like to use the defaults. You will have to set these if you wish to use an ML method from caret that we don't officially support.

 You can leave these options as-is if you'd like to first make sure the
 workflow runs without error on your machine before using your own dataset
@@ -89,7 +92,7 @@
 1. View the results in `report.md` ([see example here](report-example.md)).

 This example report was created by running the workflow on the Great Lakes HPC
-at the University of Michigan with [`config/robust.yml`](config/robust.yml).
+at the University of Michigan.

 ## Out of memory or walltime
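The options documented above can be sanity-checked before launching the workflow. The sketch below is a hypothetical helper (`check_config` is not part of the workflow) that assumes the YAML file has already been parsed into a Python dict, for example with PyYAML. It enforces the README's advice not to exceed the available cores, requires a boolean `find_feature_importance`, and treats a blank `hyperparams:` entry, which YAML parses as null, as a request for mikropml's defaults.

```python
import os

# Options listed in config/README.md; `hyperparams` is optional and may be blank.
REQUIRED_KEYS = {
    "dataset_csv",
    "outcome_colname",
    "ml_methods",
    "kfold",
    "ncores",
    "nseeds",
    "find_feature_importance",
}


def check_config(config):
    """Validate a parsed config dict (hypothetical helper, not part of the workflow).

    Returns True if mikropml's default hyperparameters should be used,
    i.e. `hyperparams` is missing or blank (parsed as None).
    """
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")

    # The README warns not to exceed the number of cores available.
    available = os.cpu_count() or 1
    if config["ncores"] > available:
        raise ValueError(
            f"ncores={config['ncores']} exceeds the {available} cores available"
        )

    # YAML `true`/`false` parse to Python booleans; anything else is rejected.
    if not isinstance(config["find_feature_importance"], bool):
        raise ValueError("find_feature_importance must be true or false")

    # Each seed yields one train/test split, so at least one seed is needed.
    if config["nseeds"] < 1:
        raise ValueError("nseeds must be at least 1")

    return config.get("hyperparams") is None
```

For example, the values in `config/default.yml` after this commit (with `hyperparams` left blank) would return `True` on a machine with at least 8 cores.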

config/default.yml

Lines changed: 3 additions & 2 deletions
@@ -4,7 +4,8 @@ outcome_colname: dx
 ml_methods:
   - glmnet
   - rf
-  - svmRadial
 kfold: 5
-ncores: 4
+ncores: 8
 nseeds: 10
+find_feature_importance: true
+hyperparams:

config/robust.yml

Lines changed: 18 additions & 0 deletions
@@ -9,3 +9,21 @@ ml_methods:
 kfold: 5
 ncores: 36
 nseeds: 100
+find_feature_importance: false
+hyperparams:
+  - glmnet:
+      - alpha:
+          - 0
+      - lambda:
+          - 0.0001
+          - 0.001
+          - 0.01
+          - 0.1
+          - 1
+          - 10
+  - rf:
+      - mtry:
+          - 42
+          - 83
+          - 166
+
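In `config/robust.yml` and `config/test.yml`, `hyperparams` is a list of one-key mappings: each ML method maps to a list of one-key mappings from a hyperparameter name to its candidate values. A minimal Python sketch, assuming the YAML parses (for example with PyYAML) into the nested lists and dicts written out below, that flattens this into `{method: {param: values}}` and counts grid points per method. The helper names are hypothetical, and the count assumes the candidate values are crossed into a full grid (1 `alpha` value x 6 `lambda` values = 6 glmnet combinations).

```python
from math import prod


def normalize_hyperparams(hyperparams):
    """Flatten a list of one-key mappings into {method: {param: [values]}}.

    Hypothetical helper. Returns an empty dict when hyperparams is None,
    which is how a blank `hyperparams:` line parses in YAML.
    """
    result = {}
    for method_entry in hyperparams or []:
        for method, params in method_entry.items():
            merged = {}
            for param_entry in params:
                merged.update(param_entry)
            result[method] = merged
    return result


def grid_size(params):
    """Number of combinations if every candidate value is crossed with every other."""
    return prod(len(values) for values in params.values())


# Parsed form of the hyperparams block in config/robust.yml
robust = [
    {"glmnet": [{"alpha": [0]}, {"lambda": [0.0001, 0.001, 0.01, 0.1, 1, 10]}]},
    {"rf": [{"mtry": [42, 83, 166]}]},
]
hp = normalize_hyperparams(robust)
for method, params in hp.items():
    print(method, grid_size(params))
# glmnet 6
# rf 3
```

Flattening to a plain dict keeps the per-method lookup simple; whether mikropml itself crosses the values into a full grid is an assumption here, not something this commit states.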

config/test.yml

Lines changed: 12 additions & 0 deletions
@@ -6,3 +6,15 @@ ml_methods:
 kfold: 2
 ncores: 4
 nseeds: 2
+find_feature_importance: true
+hyperparams:
+  - glmnet:
+      - alpha:
+          - 0
+      - lambda:
+          - 0.0001
+          - 0.001
+          - 0.01
+          - 0.1
+          - 1
+          - 10

figures/benchmarks-example.png

-33.1 KB
Binary file not shown.

figures/dag.png

21.3 KB

figures/example/benchmarks.png

38.1 KB
