Commit 1c9dea3

fgvieira authored and Vito Zanotelli committed
Allow for custom URLs (fix issues snakemake#366 and snakemake#2649).

### QC

* [x] I confirm that:

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tool's purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).
1 parent 83e4181 commit 1c9dea3

13 files changed: +56 -24 lines changed

bio/reference/ensembl-annotation/meta.yaml

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@ authors:
   - Johannes Köster
 output:
   - Ensemble GTF or GFF3 anotation file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-annotation/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ rule get_annotation_gz:
         # branch="plants", # optional: specify branch
     log:
         "logs/get_annotation.log",
+    params:
+        url="http://ftp.ensembl.org/pub",
     cache: "omit-software" # save space and time with between workflow caching (see docs)
     wrapper:
         "master/bio/reference/ensembl-annotation"

bio/reference/ensembl-annotation/wrapper.py

Lines changed: 2 additions & 11 deletions
@@ -48,17 +48,8 @@
     )
 
 
-url = "ftp://ftp.ensembl.org/pub/{branch}release-{release}/{out_fmt}/{species}/{species_cap}.{build}.{gtf_release}.{flavor}{suffix}".format(
-    release=release,
-    gtf_release=gtf_release,
-    build=build,
-    species=species,
-    out_fmt=out_fmt,
-    species_cap=species.capitalize(),
-    suffix=suffix,
-    flavor=flavor,
-    branch=branch,
-)
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
+url = f"{url}/{branch}release-{release}/{out_fmt}/{species}/{species.capitalize()}.{build}.{gtf_release}.{flavor}{suffix}"
 
 
 try:
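
The hunk above replaces the hard-coded FTP host with a value read from `params`, falling back to the previous default. A minimal sketch of that pattern outside Snakemake follows. The function name `annotation_url` and the plain `dict` standing in for `snakemake.params` are assumptions for illustration, and the trailing-slash stripping is an extra safeguard that is not part of this commit.

```python
DEFAULT_BASE_URL = "ftp://ftp.ensembl.org/pub"  # default taken from the diff


def annotation_url(
    params, species, build, release, gtf_release, out_fmt, suffix, branch="", flavor=""
):
    """Build the annotation download URL (hypothetical helper, not in the wrapper).

    `params` is a plain dict standing in for `snakemake.params`.
    `branch` is expected to be "" or to end with "/" (e.g. "plants/"),
    and `flavor` to be "" or to end with ".", mirroring the f-string in the diff.
    """
    # Assumption beyond the commit: drop a trailing slash so a base URL such as
    # "http://host/pub/" does not produce a double slash in the final URL.
    base = params.get("url", DEFAULT_BASE_URL).rstrip("/")
    return (
        f"{base}/{branch}release-{release}/{out_fmt}/{species}/"
        f"{species.capitalize()}.{build}.{gtf_release}.{flavor}{suffix}"
    )


if __name__ == "__main__":
    # No "url" given: the FTP default is used.
    print(annotation_url({}, "homo_sapiens", "GRCh38", 105, 105, "gtf", "gtf.gz"))
    # Custom base URL, as in the updated test Snakefile.
    print(
        annotation_url(
            {"url": "http://ftp.ensembl.org/pub"},
            "homo_sapiens", "GRCh38", 105, 105, "gtf", "gtf.gz",
        )
    )
```

Keeping the default in one place means a rule without a `url` entry should behave as before, while mirrors or HTTP endpoints can be selected per rule.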

bio/reference/ensembl-sequence/meta.yaml

Lines changed: 4 additions & 0 deletions
@@ -2,3 +2,7 @@ name: ensembl-sequence
 description: Download sequences (e.g. genome) from ENSEMBL FTP servers, and store them in a single .fasta file.
 authors:
   - Johannes Köster
+output:
+  - fasta file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-sequence/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ rule get_single_chromosome:
         # branch="plants", # optional: specify branch
     log:
         "logs/get_genome.log",
+    params:
+        url="http://ftp.ensembl.org/pub",
     cache: "omit-software" # save space and time with between workflow caching (see docs)
     wrapper:
         "master/bio/reference/ensembl-sequence"

bio/reference/ensembl-sequence/wrapper.py

Lines changed: 2 additions & 1 deletion
@@ -50,8 +50,9 @@
             "invalid datatype, to select a single chromosome the datatype must be dna"
         )
 
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
 spec = spec.format(build=build, release=release)
-url_prefix = f"ftp://ftp.ensembl.org/pub/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
+url_prefix = f"{url}/{branch}release-{release}/fasta/{species}/{datatype}/{species.capitalize()}.{spec}"
 
 success = False
 for suffix in suffixes:
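
The hunk above builds a URL prefix from the configurable base and then tries several suffixes in turn. The loop body is not shown in the diff, so the following is only a sketch of how such a fallback could look. The helper names, the `exists` callback, and the `"{prefix}.{suffix}"` join are assumptions, and no network access is performed.

```python
from typing import Callable, Iterable, Optional


def sequence_url_prefix(base_url, species, datatype, spec, release, branch=""):
    """Mirror the `url_prefix` f-string from the diff (hypothetical helper).

    `spec` is assumed to be already formatted, e.g. "R64-1-1".
    """
    # Assumption beyond the commit: tolerate a trailing slash on the base URL.
    base = base_url.rstrip("/")
    return (
        f"{base}/{branch}release-{release}/fasta/{species}/{datatype}/"
        f"{species.capitalize()}.{spec}"
    )


def first_available(
    url_prefix: str, suffixes: Iterable[str], exists: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate URL for which `exists` is true, else None.

    `exists` is an injected check (for example an HTTP HEAD request in real
    use); it is a stand-in so the sketch runs offline.
    """
    for suffix in suffixes:
        url = f"{url_prefix}.{suffix}"  # assumed way of joining prefix and suffix
        if exists(url):
            return url
    return None


if __name__ == "__main__":
    prefix = sequence_url_prefix(
        "http://ftp.ensembl.org/pub", "saccharomyces_cerevisiae", "dna", "R64-1-1", 98
    )
    # Pretend only the "toplevel" file is present on the server.
    found = first_available(
        prefix,
        ["dna.primary_assembly.fa.gz", "dna.toplevel.fa.gz"],
        exists=lambda u: u.endswith("toplevel.fa.gz"),
    )
    print(found)
```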

bio/reference/ensembl-variation/meta.yaml

Lines changed: 4 additions & 0 deletions
@@ -2,3 +2,7 @@ name: ensembl-variation
 description: Download known genomic variants from ENSEMBL FTP servers, and store them in a single .vcf.gz file.
 authors:
   - Johannes Köster
+output:
+  - VCF file
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)

bio/reference/ensembl-variation/test/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ rule get_variation:
         type="all", # one of "all", "somatic", "structural_variation"
         # chromosome="21", # optionally constrain to chromosome, only supported for homo_sapiens
         # branch="plants", # optional: specify branch
+    params:
+        url="http://ftp.ensembl.org/pub",
     log:
         "logs/get_variation.log",
     cache: "omit-software" # save space and time with between workflow caching (see docs)

bio/reference/ensembl-variation/wrapper.py

Lines changed: 3 additions & 7 deletions
@@ -62,16 +62,12 @@
 
 species_filename = species if release >= 91 else species.capitalize()
 
+url = snakemake.params.get("url", "ftp://ftp.ensembl.org/pub")
 urls = [
-    "ftp://ftp.ensembl.org/pub/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.vcf.gz".format(
-        release=release,
-        species=species,
-        suffix=suffix,
-        species_filename=species_filename,
-        branch=branch,
-    )
+    f"{url}/{branch}release-{release}/variation/vcf/{species}/{species_filename}{suffix}.vcf.gz"
     for suffix in suffixes
 ]
+
 names = [os.path.basename(url) for url in urls]
 
 try:
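
The variation wrapper derives one URL per suffix from the configurable base and takes the local file names from the last path component. A small offline sketch of that step is shown below. The function `variation_urls` and its argument list are assumptions for illustration, while the `release >= 91` rule for the file name and the URL layout are copied from the diff.

```python
import os


def variation_urls(base_url, species, release, suffixes, branch=""):
    """Return (urls, names) for the VCF files to fetch (hypothetical helper)."""
    # From the diff: the file name is lower-case from release 91 onwards and
    # capitalized for older releases.
    species_filename = species if release >= 91 else species.capitalize()
    urls = [
        f"{base_url}/{branch}release-{release}/variation/vcf/"
        f"{species}/{species_filename}{suffix}.vcf.gz"
        for suffix in suffixes
    ]
    # The local file name is the last component of each URL.
    names = [os.path.basename(u) for u in urls]
    return urls, names


if __name__ == "__main__":
    urls, names = variation_urls(
        "http://ftp.ensembl.org/pub", "homo_sapiens", 105, ["", "_somatic"]
    )
    for u, n in zip(urls, names):
        print(n, "<-", u)
```

The list comprehension uses its own loop variable `u` rather than reusing `url`, which keeps the base URL and the per-file URLs visibly separate.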

bio/vep/cache/meta.yaml

Lines changed: 7 additions & 0 deletions
@@ -3,3 +3,10 @@ description: Download VEP cache for given species, build and release.
 url: http://www.ensembl.org/info/docs/tools/vep/index.html
 authors:
   - Johannes Köster
+output:
+  - directory to store the VEP cache
+params:
+  - url: URL from where to download cache data (optional; by default is ``ftp://ftp.ensembl.org/pub``)
+  - species: species to download cache data
+  - build: build to download cache data
+  - release: release to download cache data
