Create the conda environment from `environment.yml`:

```bash
conda env create --file=environment.yml
```
The data used in this study is available in our Google Drive.
To use the latest datasets instead, download the drug response data in IC50 format (`PANCANCER_IC`) from GDSC, and the gene expression data (`CCLE_expression`) from CCLE under mRNA expression.
- Create a folder called `root_folder` in your project directory:

```bash
mkdir root_folder
```
- Place the `PANCANCER_IC` data under `data/GDSC` and the `CCLE_expression` data under `data/CCLE`. Choose any `<branch_num>` and run the following command to preprocess the data. The processed data will be saved under `root_folder/<branch_num>`.
```bash
python load_data.py <branch_num>
```
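Before preprocessing, it can help to confirm that the folders described above are in place. The following is a minimal sketch; `check_layout` is a hypothetical helper that only checks that `root_folder`, `data/GDSC`, and `data/CCLE` exist and that the two raw-data folders are non-empty. It does not check the file names or formats that `load_data.py` expects.

```python
from pathlib import Path


def check_layout(project_dir: str) -> list[str]:
    """Return a list of layout problems (an empty list means the layout looks OK).

    Hypothetical helper: it only checks the folders named in this README,
    not the file names or formats that load_data.py actually reads.
    """
    root = Path(project_dir)
    problems = []
    if not (root / "root_folder").is_dir():
        problems.append("missing folder: root_folder")
    # Raw inputs: GDSC drug response and CCLE expression data.
    for raw in ("data/GDSC", "data/CCLE"):
        folder = root / raw
        if not folder.is_dir():
            problems.append(f"missing folder: {raw}")
        elif not any(folder.iterdir()):
            problems.append(f"empty folder: {raw}")
    return problems


if __name__ == "__main__":
    for problem in check_layout("."):
        print(problem)
```

Run it from the project directory; no output means all three folders were found.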
```bash
python train.py \
  --model <model_num> \
  --branch <branch_num> \
  --do_cv \
  --do_attn
```
- Available models: 0:GCN, 1:GAT, 2:GAT_Edge, 3:GATv2, 4:SAGE, 5:GIN, 6:GINE, 7:WIRGAT, 8:ARGAT, 9:RGCN, 10:FiLM
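If you want to launch several training runs from Python, the options above can be assembled programmatically. This is a minimal sketch: `train_command` is a hypothetical wrapper that emits only the flags shown in this README, and the branch `"0"` in the usage line is just an example value.

```python
import shlex

# Model numbers as listed in this README.
MODELS = {
    0: "GCN", 1: "GAT", 2: "GAT_Edge", 3: "GATv2", 4: "SAGE", 5: "GIN",
    6: "GINE", 7: "WIRGAT", 8: "ARGAT", 9: "RGCN", 10: "FiLM",
}


def train_command(model_num: int, branch_num: str,
                  do_cv: bool = True, do_attn: bool = True) -> list[str]:
    """Build the argument list for train.py (hypothetical convenience wrapper)."""
    if model_num not in MODELS:
        raise ValueError(f"unknown model number {model_num}; choose from {sorted(MODELS)}")
    argv = ["python", "train.py", "--model", str(model_num), "--branch", str(branch_num)]
    if do_cv:
        argv.append("--do_cv")
    if do_attn:
        argv.append("--do_attn")
    return argv


if __name__ == "__main__":
    # Print a shell-ready command for GATv2 (model 3) on example branch "0".
    print(shlex.join(train_command(3, "0")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting issues when looping over several models.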
Instead of training the models from scratch, you can use the pretrained models under `models/`. Place them under `root_folder/<branch_num>`, where you stored the processed data in [Preprocess the data](#preprocess-the-data).
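A minimal sketch of that copy step, assuming the pretrained files can be copied as-is and that `root_folder/<branch_num>` already holds the preprocessed data. `install_pretrained` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path


def install_pretrained(models_dir: str, root_folder: str, branch_num: str) -> list[Path]:
    """Copy every file in models_dir into root_folder/<branch_num>.

    Hypothetical helper: assumes the pretrained files need no renaming and
    that the branch folder was already created by preprocessing.
    """
    src = Path(models_dir)
    dst = Path(root_folder) / str(branch_num)
    if not dst.is_dir():
        raise FileNotFoundError(f"{dst} not found; preprocess the data first")
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            # copy2 also preserves file metadata such as modification time.
            copied.append(Path(shutil.copy2(f, dst / f.name)))
    return copied
```

Failing early when the branch folder is missing makes it obvious that preprocessing has not been run for that branch yet.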
```bash
python gnnexplainer.py \
  --model <model_num> \
  --branch <branch_num> \
  --do_attn \
  --explain_type <type>
```
```bash
python draw_gnnexplainer.py \
  --model <model_num> \
  --branch <branch_num> \
  --explain_type <type> \
  --annotation <type>
```
- Available explaining types: 0:model, 1:phenomenon
- Available annotation types: 0:numbers, 1:heatmap, 2:both, 3:functional group-level heatmap
  - Numbers: Only the saliency scores are visualized.
  - Heatmap: An atom-level heatmap shows the saliency levels.
  - Both: Both numbers and heatmaps are displayed.
  - Functional group-level heatmap (recommended): The saliency scores are accumulated for each functional group rather than for each atom. Both numbers and heatmaps are displayed in this mode.
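The functional group-level mode can be illustrated with a small sketch. It assumes "accumulated" means summed, and it takes the atom-to-group assignment as given (in practice it would come from substructure matching, which is not shown). `group_saliency` and the example groups are hypothetical.

```python
def group_saliency(atom_scores: list[float], groups: dict[str, list[int]]) -> dict[str, float]:
    """Sum atom-level saliency scores within each functional group.

    atom_scores: per-atom scores, indexed by atom index.
    groups: maps a functional-group label to the atom indices it covers.
    Summation is an assumption about how scores are "accumulated".
    """
    return {name: sum(atom_scores[i] for i in idx) for name, idx in groups.items()}


if __name__ == "__main__":
    # Four atoms: two in a hypothetical alkyl group, two in a hydroxyl group.
    scores = [0.1, 0.2, 0.5, 0.3]
    print(group_saliency(scores, {"alkyl": [0, 1], "hydroxyl": [2, 3]}))
```

Aggregating per group trades atom-level detail for a chemically more interpretable view, which matches why this mode is the recommended one.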
```bash
python integrated_gradients.py \
  --model <model_num> \
  --branch <branch_num> \
  --do_attn \
  --iqr_baseline
```
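For reference, a generic integrated-gradients computation looks like the sketch below (midpoint Riemann-sum approximation). How `--iqr_baseline` constructs its baseline is not described in this README, so here the baseline is simply a parameter; `integrated_gradients` and `grad_fn` are illustrative names, not the script's API.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps: int = 64) -> np.ndarray:
    """Approximate integrated gradients with a midpoint Riemann sum.

    IG_i = (x_i - b_i) * average over alpha in (0, 1) of dF/dx_i(b + alpha * (x - b)).
    grad_fn(z) must return the gradient of the model output w.r.t. input z.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal bins
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


if __name__ == "__main__":
    # Toy model F(z) = sum(z**2), whose gradient is 2z.
    x = np.array([1.0, 2.0, 3.0])
    b = np.array([0.5, 0.0, -1.0])
    attr = integrated_gradients(lambda z: 2 * z, x, b)
    print(attr, attr.sum(), np.sum(x**2) - np.sum(b**2))
```

The attributions sum to F(x) - F(baseline) (the completeness property), so the choice of baseline directly determines what each score is measured against.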
- Download the gene sets from MSigDB and place them under `data/`.
- Refer to `pathway_analysis.ipynb` for the pathway analysis experiments based on the gene saliency scores.
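As an illustration of one common way to relate gene saliency scores to MSigDB gene sets, the sketch below parses GMT-formatted lines and tests whether the top-ranked genes are over-represented in a gene set using a one-sided hypergeometric test. The notebook may use a different method; ranking by absolute saliency and the `top_n` cutoff are assumptions, and all function names are hypothetical.

```python
from math import comb


def read_gmt(lines) -> dict[str, set[str]]:
    """Parse GMT lines: name<TAB>description<TAB>gene1<TAB>gene2 ..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def enrichment(saliency: dict[str, float], gene_set: set[str], top_n: int):
    """Overlap of the top_n genes (by |saliency|) with gene_set, and its p-value."""
    universe = set(saliency)
    ranked = sorted(saliency, key=lambda g: abs(saliency[g]), reverse=True)[:top_n]
    members = gene_set & universe  # only genes that were actually scored
    overlap = len(members.intersection(ranked))
    return overlap, hypergeom_sf(overlap, len(universe), len(members), len(ranked))


if __name__ == "__main__":
    sets = read_gmt(["SET1\tdescription\tA\tB\tZ\n"])
    scores = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
    print(enrichment(scores, sets["SET1"], top_n=2))
```

Restricting the gene set to scored genes keeps the hypergeometric population consistent with the ranked list.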