Commit 7eec776

2 parents 0021f08 + a0dc2ee commit 7eec776

15 files changed: 70 additions, 9 deletions

README.md

Lines changed: 6 additions & 1 deletion
@@ -21,7 +21,8 @@ Snakemake automatically builds a directed acyclic graph (DAG) of jobs to figure
 out the dependencies of each of the rules and what order to run them in.
 This workflow preprocesses the example dataset, calls `mikropml::run_ml()`
 for each seed and ML method set in the config file,
-combines the results files, plots performance results,
+combines the results files, plots performance results
+(cross-validation and test AUROCs, hyperparameter AUROCs from cross-validation, and benchmark performance),
 and renders a simple [R Markdown report](report.Rmd) as a GitHub-flavored markdown file ([example](report-example.md)).
 
 ![rulegraph](figures/rulegraph.png)
@@ -117,6 +118,10 @@ Here's a small example DAG if we were to use only 2 seeds and 2 ML methods:
 This example report was created by running the workflow on the Great Lakes HPC
 at the University of Michigan with [`config/config_robust.yml`](config/config_robust.yml).
 
+## Out of memory or walltime
+
+If any of your jobs fail because it ran out of memory, you can increase the memory for the given rule in the [`config/cluster.json`](config/cluster.json) file. For example, if the `combine_hp_performance` rule fails, you can increase the memory from 16GB to, say, 24GB. You can also change other slurm parameters from the defaults in this file (e.g. walltime, number of cores, etc.).
+
 ## More resources
 
 - [mikropml docs](http://www.schlosslab.org/mikropml/)
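
The memory change described in the new README section can also be scripted instead of edited by hand. The following is a minimal sketch, assuming `config/cluster.json` maps rule names to objects with a `pmem` key, as the `config/cluster.json` hunk in this commit suggests; the helper name `set_rule_memory` is hypothetical and not part of the workflow.

```python
import json
from pathlib import Path


def set_rule_memory(cluster_path, rule, pmem):
    """Set the `pmem` value for one rule in a cluster config JSON file.

    Hypothetical helper: it assumes the file is a JSON object keyed by
    rule name, with per-rule settings such as "pmem" stored as strings.
    """
    path = Path(cluster_path)
    cluster = json.loads(path.read_text())
    # Create the rule entry if it is missing; keep any other keys it has.
    cluster.setdefault(rule, {})["pmem"] = pmem
    path.write_text(json.dumps(cluster, indent=2) + "\n")
    return cluster
```

For the example in the README text, the call would be `set_rule_memory("config/cluster.json", "combine_hp_performance", "24GB")`. Note that rewriting the file this way normalizes its indentation.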

Snakefile

Lines changed: 29 additions & 1 deletion
@@ -3,6 +3,7 @@ configfile: 'config/config.yml'
 ncores = config['ncores']
 ml_methods = config['ml_methods']
 kfold = config['kfold']
+outcome_colname = config['outcome_colname']
 
 nseeds = config['nseeds']
 start_seed = 100
@@ -22,6 +23,8 @@ rule preprocess_data:
         "log/preprocess_data.txt"
     benchmark:
         "benchmarks/preprocess_data.txt"
+    params:
+        outcome_colname=outcome_colname
     resources:
         ncores=ncores
     script:
@@ -40,7 +43,7 @@ rule run_ml:
     benchmark:
         "benchmarks/runs/run_ml.{method}_{seed}.txt"
     params:
-        outcome_colname=config['outcome_colname'],
+        outcome_colname=outcome_colname,
         method="{method}",
         seed="{seed}",
         kfold=kfold
@@ -62,6 +65,19 @@ rule combine_results:
     script:
         "code/combine_results.R"
 
+rule combine_hp_performance:
+    input:
+        R='code/combine_hp_perf.R',
+        rds=expand('results/runs/{{method}}_{seed}_model.Rds', seed=seeds)
+    output:
+        rds='results/hp_performance_results_{method}.Rds'
+    log:
+        "log/combine_hp_perf_{method}.txt"
+    benchmark:
+        "benchmarks/combine_hp_perf_{method}.txt"
+    script:
+        "code/combine_hp_perf.R"
+
 rule combine_benchmarks:
     input:
         R='code/combine_benchmarks.R',
@@ -84,6 +100,17 @@ rule plot_performance:
     script:
         "code/plot_perf.R"
 
+rule plot_hp_performance:
+    input:
+        R='code/plot_hp_perf.R',
+        rds=rules.combine_hp_performance.output.rds,
+    output:
+        plot='figures/hp_performance_{method}.png'
+    log:
+        'log/plot_hp_perf_{method}.txt'
+    script:
+        'code/plot_hp_perf.R'
+
 rule plot_benchmarks:
     input:
         R='code/plot_benchmarks.R',
@@ -100,6 +127,7 @@ rule render_report:
         Rmd='report.Rmd',
         R='code/render.R',
         perf_plot=rules.plot_performance.output.plot,
+        hp_plot=expand(rules.plot_hp_performance.output.plot, method = ml_methods),
         bench_plot=rules.plot_benchmarks.output.plot
     output:
         doc='report.md'
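
The two `expand()` calls added in this Snakefile diff do different jobs. In `combine_hp_performance`, the doubled braces keep `{method}` as a wildcard while `{seed}` is filled in, so each method gets its own combined file. In `render_report`, `{method}` itself is expanded over `ml_methods`, so the report requests one plot per method. The sketch below mimics that brace handling with plain `str.format` rather than importing Snakemake. It assumes `seeds` is a range starting at `start_seed` (its definition is not shown in the diff), and the values of `nseeds` and `ml_methods` are hypothetical stand-ins for what the config file provides.

```python
# Illustrative values; the workflow reads the real ones from config/config.yml.
start_seed = 100                 # set in the Snakefile
nseeds = 3                       # hypothetical value for config['nseeds']
ml_methods = ["glmnet", "rf"]    # hypothetical value for config['ml_methods']
seeds = range(start_seed, start_seed + nseeds)  # assumed definition of `seeds`

# combine_hp_performance: {seed} is filled in, {{method}} survives as {method}.
model_pattern = "results/runs/{{method}}_{seed}_model.Rds"
model_inputs = [model_pattern.format(seed=seed) for seed in seeds]

# render_report: the plot pattern is expanded once per ML method.
plot_pattern = "figures/hp_performance_{method}.png"
hp_plots = [plot_pattern.format(method=method) for method in ml_methods]

print(model_inputs)
print(hp_plots)
```

With these values, `model_inputs` still contains the literal `{method}` placeholder in each of its three paths, which Snakemake would later resolve per job, while `hp_plots` holds two concrete file names.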

code/combine_hp_perf.R

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+source("code/log_smk.R")
+
+models <- lapply(snakemake@input[["rds"]], function(x) readRDS(x))
+hp_perf <- mikropml::combine_hp_performance(models)
+hp_perf$method <- snakemake@wildcards[["method"]]
+saveRDS(hp_perf, file = snakemake@output[["rds"]])

code/plot_hp_perf.R

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+source("code/log_smk.R")
+
+hp_perf <- readRDS(snakemake@input[["rds"]])
+hp_plot_list <- lapply(hp_perf$params, function(param){
+  mikropml::plot_hp_performance(hp_perf$dat, !!rlang::sym(param), !!rlang::sym(hp_perf$metric)) + ggplot2::theme_classic() + ggplot2::scale_color_brewer(palette = "Dark2") + ggplot2::labs(title=unique(hp_perf$method))
+})
+hp_plot <- cowplot::plot_grid(plotlist = hp_plot_list)
+ggplot2::ggsave(snakemake@output[["plot"]])

code/preproc.R

Lines changed: 1 addition & 1 deletion
@@ -5,6 +5,6 @@ doFuture::registerDoFuture()
 future::plan(future::multicore, workers = snakemake@resources[["ncores"]])
 
 data_raw <- readr::read_csv(snakemake@input[["csv"]])
-data_processed <- preprocess_data(data_raw, outcome_colname = "dx")
+data_processed <- preprocess_data(data_raw, outcome_colname = snakemake@params[['outcome_colname']])
 
 saveRDS(data_processed, file = snakemake@output[["rds"]])

config/cluster.json

Lines changed: 3 additions & 0 deletions
@@ -18,5 +18,8 @@
     "run_ml": {
         "procs": "{resources.ncores}",
         "pmem": "4GB"
+    },
+    "combine_hp_performance": {
+        "pmem": "16GB"
     }
 }
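
The new `combine_hp_performance` entry sets only `pmem`. That is enough if the remaining settings come from a `__default__` entry, which is the usual Snakemake cluster-config convention but is not visible in this hunk. The sketch below illustrates that assumed merge: defaults first, then the rule's own keys on top. The function name and the default values are hypothetical, and only the two rule entries mirror the diff.

```python
def resolve_cluster_settings(cluster_config, rule_name):
    """Merge `__default__` settings with the overrides for one rule."""
    settings = dict(cluster_config.get("__default__", {}))
    # Keys set for the rule take precedence over the defaults.
    settings.update(cluster_config.get(rule_name, {}))
    return settings


# Hypothetical defaults; only the two rule entries mirror the diff above.
cluster_config = {
    "__default__": {"walltime": "01:00:00", "procs": "1", "pmem": "1GB"},
    "run_ml": {"procs": "{resources.ncores}", "pmem": "4GB"},
    "combine_hp_performance": {"pmem": "16GB"},
}

settings = resolve_cluster_settings(cluster_config, "combine_hp_performance")
print(settings)
```

Under these assumptions the rule keeps the default walltime and core count and only its memory request changes, which is why raising `pmem` alone is a sufficient fix for an out-of-memory failure.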

config/environment.yml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ channels:
   - r
 dependencies:
   - r-base=4
+  - r-cowplot
   - r-doFuture
   - r-foreach
   - r-future
File renamed without changes.
30.5 KB

figures/hp_performance_rf-example.png

29.5 KB
