-
Notifications
You must be signed in to change notification settings - Fork 195
Add cosmic database download wrapper #345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
channels: | ||
- bioconda | ||
- conda-forge | ||
- defaults | ||
dependencies: | ||
- curl =7 | ||
- python >=3.6.2 | ||
- requests =2.25 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
name: cosmic-db | ||
description: | | ||
Download cosmic databases from https://cancer.sanger.ac.uk/cosmic/download | ||
authors: | ||
- Till Hartmann | ||
input: | ||
- genome build, cosmic release version, dataset and file name | ||
output: | ||
- database |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
envvars: | ||
"COSMIC_EMAIL", | ||
"COSMIC_PW" | ||
|
||
rule cosmic_download: | ||
output: | ||
"resources/{db}" | ||
params: | ||
build="GRCh38", | ||
dataset="cosmic", | ||
version="v92", | ||
Comment on lines
+9
to
+11
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🛠️ Refactor suggestion Consider making 'build', 'dataset', and 'version' params more flexible. The current implementation hardcodes these values, which might limit the reusability of this rule. Consider making these parameters more flexible by using config values or wildcards. Here's a suggestion for improvement: params:
build=config.get("cosmic_build", "GRCh38"),
dataset=config.get("cosmic_dataset", "cosmic"),
version=config.get("cosmic_version", "v92"), This allows users to override these values in their config file while still providing sensible defaults. |
||
file=lambda wc: wc.db # e.g. "CosmicHGNC.tsv.gz" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe add some comment here explaining the meaning of file? And where to get the possible values from? |
||
log: | ||
"logs/cosmic-db/{db}.log" | ||
wrapper: | ||
"master/bio/cosmic-db" |
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,69 @@ | ||||||||||||||||||||||||||||
"""Snakemake wrapper for trimming paired-end reads using cutadapt.""" | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
Comment on lines
+1
to
+2
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Update the docstring to reflect the correct functionality The current docstring mentions trimming paired-end reads using cutadapt, which doesn't match the functionality of this wrapper. It should describe downloading files from the COSMIC database. Apply this diff to correct the docstring: -"""Snakemake wrapper for trimming paired-end reads using cutadapt."""
+"""Snakemake wrapper for downloading files from the COSMIC database.""" 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
__author__ = "Till Hartmann" | ||||||||||||||||||||||||||||
__copyright__ = "Copyright 2021, Till Hartmann" | ||||||||||||||||||||||||||||
__email__ = "till.hartmann@udo.edu" | ||||||||||||||||||||||||||||
__license__ = "MIT" | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
import requests | ||||||||||||||||||||||||||||
import os | ||||||||||||||||||||||||||||
from typing import List | ||||||||||||||||||||||||||||
from snakemake.shell import shell | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
email = os.environ["COSMIC_EMAIL"] | ||||||||||||||||||||||||||||
password = os.environ["COSMIC_PW"] | ||||||||||||||||||||||||||||
assert email, "$COSMIC_EMAIL is not set" | ||||||||||||||||||||||||||||
assert password, "$COSMIC_PW is not set" | ||||||||||||||||||||||||||||
Comment on lines
+13
to
+16
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Handle missing environment variables to avoid Accessing environment variables with Apply this diff to improve error handling: -email = os.environ["COSMIC_EMAIL"]
-password = os.environ["COSMIC_PW"]
-assert email, "$COSMIC_EMAIL is not set"
-assert password, "$COSMIC_PW is not set"
+email = os.environ.get("COSMIC_EMAIL")
+password = os.environ.get("COSMIC_PW")
+if not email:
+ raise EnvironmentError("$COSMIC_EMAIL environment variable is not set")
+if not password:
+ raise EnvironmentError("$COSMIC_PW environment variable is not set") 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
COSMIC_URL = "https://cancer.sanger.ac.uk/cosmic/file_download" | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
def available_builds() -> List[str]: | ||||||||||||||||||||||||||||
builds = requests.get(COSMIC_URL).json() | ||||||||||||||||||||||||||||
return builds | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
Comment on lines
+21
to
+24
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add error handling to The Apply this diff to handle potential errors: def available_builds() -> List[str]:
- builds = requests.get(COSMIC_URL).json()
+ response = requests.get(COSMIC_URL)
+ response.raise_for_status()
+ builds = response.json()
return builds 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
def available_datasets(build: str) -> List[str]: | ||||||||||||||||||||||||||||
datasets = requests.get(f"{COSMIC_URL}/{build}").json() | ||||||||||||||||||||||||||||
return [d.rpartition("/")[-1] for d in datasets] | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
Comment on lines
+26
to
+29
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add error handling to Similar to Apply this diff: def available_datasets(build: str) -> List[str]:
- datasets = requests.get(f"{COSMIC_URL}/{build}").json()
+ response = requests.get(f"{COSMIC_URL}/{build}")
+ response.raise_for_status()
+ datasets = response.json()
return [d.rpartition("/")[-1] for d in datasets] 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
def available_versions(build: str, dataset: str) -> List[str]: | ||||||||||||||||||||||||||||
versions = requests.get(f"{COSMIC_URL}/{build}/{dataset}").json() | ||||||||||||||||||||||||||||
return [v.rpartition("/")[-1] for v in versions] | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
Comment on lines
+31
to
+34
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add error handling to The Apply this diff: def available_versions(build: str, dataset: str) -> List[str]:
- versions = requests.get(f"{COSMIC_URL}/{build}/{dataset}").json()
+ response = requests.get(f"{COSMIC_URL}/{build}/{dataset}")
+ response.raise_for_status()
+ versions = response.json()
return [v.rpartition("/")[-1] for v in versions] 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
def available_files(build: str, dataset: str, version: str) -> List[str]: | ||||||||||||||||||||||||||||
files = requests.get(f"{COSMIC_URL}/{build}/{dataset}/{version}").json() | ||||||||||||||||||||||||||||
return [f.rpartition("/")[-1] for f in files] | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
Comment on lines
+36
to
+39
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add error handling to Include error handling for HTTP requests in the Apply this diff: def available_files(build: str, dataset: str, version: str) -> List[str]:
- files = requests.get(f"{COSMIC_URL}/{build}/{dataset}/{version}").json()
+ response = requests.get(f"{COSMIC_URL}/{build}/{dataset}/{version}")
+ response.raise_for_status()
+ files = response.json()
return [f.rpartition("/")[-1] for f in files] 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
def download_path(build: str, dataset: str, version: str, file: str) -> str: | ||||||||||||||||||||||||||||
return f"{COSMIC_URL}/{build}/{dataset}/{version}/{file}" | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
build = snakemake.params.get("build", "") | ||||||||||||||||||||||||||||
dataset = snakemake.params.get("dataset", "") | ||||||||||||||||||||||||||||
version = snakemake.params.get("version", "") | ||||||||||||||||||||||||||||
file = snakemake.params.get("file", "") | ||||||||||||||||||||||||||||
log = snakemake.log_fmt_shell(stdout=False, stderr=True) | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
builds = available_builds() | ||||||||||||||||||||||||||||
assert build in builds, f"{build} is not available. Choose one of: {builds}." | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
datasets = available_datasets(build) | ||||||||||||||||||||||||||||
assert dataset in datasets, f"{dataset} is not available. Choose one of: {datasets}." | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
versions = available_versions(build, dataset) | ||||||||||||||||||||||||||||
assert version in versions, f"{version} is not available. Choose one of: {versions}." | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
files = available_files(build, dataset, version) | ||||||||||||||||||||||||||||
assert file in files, f"{file} is not available. Choose one of: {files}." | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
download_url = ( | ||||||||||||||||||||||||||||
requests.get(download_path(build, dataset, version, file), auth=(email, password)) | ||||||||||||||||||||||||||||
.json()["url"] | ||||||||||||||||||||||||||||
.strip() | ||||||||||||||||||||||||||||
) | ||||||||||||||||||||||||||||
Comment on lines
+63
to
+67
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add error handling when retrieving the download URL The code assumes the HTTP request is successful and the response contains a Apply this diff to improve robustness: download_url = (
- requests.get(download_path(build, dataset, version, file), auth=(email, password))
- .json()["url"]
- .strip()
+ response = requests.get(download_path(build, dataset, version, file), auth=(email, password))
+ response.raise_for_status()
+ json_response = response.json()
+ download_url = json_response.get("url", "").strip()
+ if not download_url:
+ raise Exception("Failed to retrieve download URL from the response.")
) 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
shell('curl "{download_url}" -o {snakemake.output[0]} {log}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restructure the input section for clarity and completeness.
The current input description lacks structure and specificity. Consider restructuring it as a YAML list with more detailed information about each parameter, including possible values where applicable.
Here's a suggested improvement based on the information provided in the PR objectives:
Also, consider mentioning the required environment variables
COSMIC_EMAIL
andCOSMIC_PW
in the description or in a separate 'requirements' section.