XGDP

Environment

conda env create --file=environment.yml

Data Preparation

Download the raw data

The data used in this study are available in our Google Drive.

To use the latest dataset instead, download the drug response data in IC50 format (PANCANCER_IC) from GDSC, and the gene expression data (CCLE_expression) from CCLE, listed under mRNA expression.

Preprocess the data

Create a folder called root_folder in your project directory:

mkdir root_folder

Place the PANCANCER_IC data under data/GDSC and the CCLE_expression data under data/CCLE. Choose any value for <branch_num> (it names the output folder) and run the following command to preprocess the data. The processed data will be saved under root_folder/<branch_num>.

python load_data.py <branch_num>
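Before running load_data.py, it can help to confirm that the raw files are where the script expects them. The sketch below is a hypothetical pre-flight check (check_raw_data is not part of this repository). Because the README gives only the dataset names and folders, not exact file names or extensions, it matches files by name prefix.

```python
from pathlib import Path

# Hypothetical check: the README gives dataset names and folders, not exact
# file names or extensions, so files are matched by name prefix.
EXPECTED = {
    "data/GDSC": "PANCANCER_IC",
    "data/CCLE": "CCLE_expression",
}


def check_raw_data(project_dir="."):
    """Map each expected folder to the sorted names of matching files."""
    root = Path(project_dir)
    found = {}
    for folder, prefix in EXPECTED.items():
        path = root / folder
        found[folder] = sorted(p.name for p in path.glob(prefix + "*")) if path.is_dir() else []
    return found


if __name__ == "__main__":
    for folder, files in check_raw_data().items():
        print(f"{folder}: {', '.join(files) if files else 'MISSING'}")
    print("root_folder:", "ok" if Path("root_folder").is_dir() else "MISSING")
```

An empty list for a folder means load_data.py will most likely fail to find that dataset.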

Train the model

python train.py \
        --model <model_num> \
        --branch <branch_num> \
        --do_cv \
        --do_attn

Available models: 0:GCN, 1:GAT, 2:GAT_Edge, 3:GATv2, 4:SAGE, 5:GIN, 6:GINE, 7:WIRGAT, 8:ARGAT, 9:RGCN, 10:FiLM
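A small Python wrapper can make the model indices above less error-prone. This is a sketch, not part of the repository: train_command is a hypothetical helper that only assembles the flags shown above. --do_cv and --do_attn are passed through unchanged; their exact behavior is defined in train.py.

```python
import shlex

# Model indices as listed in this README.
MODELS = {
    0: "GCN", 1: "GAT", 2: "GAT_Edge", 3: "GATv2", 4: "SAGE", 5: "GIN",
    6: "GINE", 7: "WIRGAT", 8: "ARGAT", 9: "RGCN", 10: "FiLM",
}


def train_command(model, branch, do_cv=True, do_attn=True):
    """Build the argument list for train.py from a model index or name."""
    if isinstance(model, str):
        lookup = {name.lower(): idx for idx, name in MODELS.items()}
        model = lookup[model.lower()]  # KeyError for an unknown name
    if model not in MODELS:
        raise ValueError(f"unknown model index {model}")
    cmd = ["python", "train.py", "--model", str(model), "--branch", str(branch)]
    if do_cv:
        cmd.append("--do_cv")
    if do_attn:
        cmd.append("--do_attn")
    return cmd


if __name__ == "__main__":
    # Print the command for every model on branch 0; pass a list to
    # subprocess.run(cmd, check=True) to actually launch training.
    for idx, name in MODELS.items():
        print(f"{name:>8}: {shlex.join(train_command(idx, branch=0))}")
```

For example, train_command("GATv2", 0) yields the same command as `python train.py --model 3 --branch 0 --do_cv --do_attn`.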

Explain the model

Instead of training the models from scratch, you can use the pretrained models in models/. Place them under root_folder/<branch_num>, the folder where you stored the processed data in [Preprocess the data](#preprocess-the-data).
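The copy step can be scripted. This is a hypothetical helper (install_pretrained is not in the repository) that assumes the explanation scripts expect the files from models/ with their relative layout unchanged; check those scripts for the exact file names they load.

```python
import shutil
from pathlib import Path


def install_pretrained(branch_num, models_dir="models", root="root_folder"):
    """Copy every file under models_dir into root/<branch_num>, keeping subfolders."""
    src = Path(models_dir)
    dst = Path(root) / str(branch_num)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Calling install_pretrained(0) after preprocessing with branch 0 would place the checkpoints next to the processed data in root_folder/0.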

Attribute the chemical structures with GNNExplainer

python gnnexplainer.py \
        --model <model_num> \
        --branch <branch_num> \
        --do_attn \
        --explain_type <explain_type>
python draw_gnnexplainer.py \
        --model <model_num> \
        --branch <branch_num> \
        --explain_type <explain_type> \
        --annotation <annotation_type>

Available explanation types: 0:model, 1:phenomenon
Available annotation types: 0:numbers, 1:heatmap, 2:both, 3:functional group-level heatmap
- Numbers: only the numeric saliency scores are displayed
- Heatmap: an atom-level heatmap shows the saliency levels
- Both: both numbers and heatmaps are displayed
- Functional group-level heatmap (recommended): saliency scores are accumulated per functional group rather than per atom. Both numbers and heatmaps are displayed in this mode.
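The functional group-level mode can be illustrated with a small sketch. It assumes that "accumulated" means summed, and that the mapping from groups to atom indices comes from substructure matching (for example RDKit's Chem.MolFromSmarts with Mol.GetSubstructMatches); group_saliency is hypothetical and the repository's own implementation may differ.

```python
from collections import defaultdict


def group_saliency(atom_scores, atom_groups):
    """Sum atom-level saliency scores into functional-group scores.

    atom_scores: one float per atom (list index = atom index).
    atom_groups: dict mapping a group label to the atom indices it covers.
    Atoms not covered by any group are reported individually as "atom <i>".
    An atom in two overlapping groups contributes to both totals.
    """
    totals = defaultdict(float)
    covered = set()
    for label, indices in atom_groups.items():
        for i in indices:
            totals[label] += atom_scores[i]
            covered.add(i)
    for i, score in enumerate(atom_scores):
        if i not in covered:
            totals[f"atom {i}"] += score
    return dict(totals)


if __name__ == "__main__":
    # Acetic acid heavy atoms: 0 = methyl C, 1 = carbonyl C, 2 = =O, 3 = -OH.
    scores = [0.125, 0.25, 0.5, 0.125]
    print(group_saliency(scores, {"carboxylic acid": [1, 2, 3]}))
    # → {'carboxylic acid': 0.875, 'atom 0': 0.125}
```

Summing rather than averaging keeps a group's score comparable to the total atom saliency, at the cost of favoring larger groups.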

Attribute the gene expression values with Integrated Gradients

python integratedGradients.py \
        --model <model_num> \
        --branch <branch_num> \
        --do_attn \
        --iqr_baseline
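Integrated Gradients attributes a prediction F(x) to each input feature by averaging the gradient along the straight line from a baseline x' to the input: IG_i = (x_i - x'_i) * ∫_0^1 ∂F/∂x_i(x' + α(x - x')) dα, and the attributions sum to F(x) - F(x') (completeness). The sketch below approximates the integral with a midpoint Riemann sum on a toy model with an analytic gradient. It is not the repository's code: --iqr_baseline presumably selects a baseline derived from the interquartile range of the expression values, but its exact definition lives in the script, so here the baseline is simply an argument. In a PyTorch model the gradient would come from autograd (Captum's IntegratedGradients is one library implementation); whether the script uses Captum is not stated.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG attributions with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output at `point`.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


# Toy differentiable "model": F(x) = sum_i w_i * x_i**2, gradient 2 * w_i * x_i.
W = [1.0, -2.0, 0.5]


def toy_model(x):
    return sum(w * xi ** 2 for w, xi in zip(W, x))


def toy_grad(x):
    return [2 * w * xi for w, xi in zip(W, x)]


if __name__ == "__main__":
    x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    attr = integrated_gradients(toy_grad, x, baseline)
    # Completeness check: sum(attr) should match F(x) - F(baseline).
    print(attr, sum(attr), toy_model(x) - toy_model(baseline))
```

The choice of baseline matters: features whose value equals the baseline always receive zero attribution, which is why a data-driven baseline can be preferable to all zeros for gene expression.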

Pathway Analysis

Download the gene sets from MSigDB and place them under data/.
Refer to pathway_analysis.ipynb for the pathway analysis experiments based on the gene saliency scores.
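The README does not say which statistical test pathway_analysis.ipynb applies. One common choice is over-representation analysis: take the most salient genes (for example the top N by Integrated Gradients score, an assumption here) and test each MSigDB gene set for enrichment with a one-sided hypergeometric test. The sketch parses gene sets in MSigDB's GMT format (name, description, then member genes, tab-separated); read_gmt and enrich are hypothetical helpers, and gene identifiers must use the same naming scheme (symbols or Entrez IDs) in both inputs.

```python
from math import comb


def read_gmt(text):
    """Parse GMT text: name<TAB>description<TAB>gene1<TAB>gene2 ..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.rstrip("\r").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def hypergeom_sf(overlap, universe, set_size, n_selected):
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_size, n_selected)."""
    upper = min(set_size, n_selected)
    hits = sum(comb(set_size, k) * comb(universe - set_size, n_selected - k)
               for k in range(overlap, upper + 1))
    return hits / comb(universe, n_selected)


def enrich(top_genes, gene_sets, universe_genes):
    """Over-representation p-value per gene set, restricted to the universe."""
    universe_genes = set(universe_genes)
    top = set(top_genes) & universe_genes
    results = {}
    for name, genes in gene_sets.items():
        members = genes & universe_genes
        overlap = len(members & top)
        results[name] = hypergeom_sf(overlap, len(universe_genes), len(members), len(top))
    return dict(sorted(results.items(), key=lambda kv: kv[1]))  # most enriched first
```

With many gene sets, the raw p-values should be corrected for multiple testing (for example Benjamini-Hochberg) before interpretation.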


SCSE-Biomedical-Computing-Group/XGDP
