Modifications to the Histomorphological Phenotype Learning (HPL) pipeline. HPL generates histomorphological phenotype clusters (HPCs) from tiled H&E images via unsupervised learning.
Original HPL paper by Quiros et al. is here: https://www.nature.com/articles/s41467-024-48666-7.
Yumi Briones - yb2612@nyu.edu, Yumi.Briones@nyulangone.org
Jennifer Motter - mottej02@nyu.edu, Jennifer.Motter@nyulangone.org
Alyssa Pradhan - amp10295@nyu.edu, Alyssa.Pradhan@nyulangone.org
VICReg
- README documentation and files to perform HPL-VICReg and HPL-BarlowTwins
ViT
- README documentation and files to perform HPL-ViT
CLIP
- README documentation and files to perform HPL-CLIP and HPL-CONCH
All data are from https://github.com/AdalbertoCq/Histomorphological-Phenotype-Learning.
- For initial training, we used a 250k subsample of LUAD and LUSC samples: LUAD & LUSC 250K subsample
- For complete train, validation, and test sets, we used: LUAD & LUSC datasets
- To get original HPL tile embeddings, we used: LUAD & LUSC tile vector representations
- To get the original HPL-HPC assignments, we used: LUAD vs LUSC type classification and HPC assignments
Point person: Jennifer Motter
Original VICReg paper: https://arxiv.org/pdf/2105.04906
Details: https://github.com/yumibriones/HPL-Modified/blob/main/VICReg/README.md
We changed the self-supervised learning (SSL) method of HPL from Barlow Twins to Variance-Invariance-Covariance Regularization (VICReg).
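For reference, here is a minimal PyTorch-style sketch of the VICReg objective (invariance, variance, and covariance terms). It is illustrative only; the coefficients, architecture, and training setup we actually used are documented in VICReg/README.md.

```python
# Illustrative sketch of the VICReg loss (Bardes et al., 2021); not the exact
# implementation used in our training scripts (see VICReg/README.md for those).
import torch
import torch.nn.functional as F


def off_diagonal(m):
    """Return the flattened off-diagonal elements of a square matrix."""
    k = m.shape[0]
    return m.flatten()[:-1].view(k - 1, k + 1)[:, 1:].flatten()


def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0, eps=1e-4):
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same tiles."""
    n, d = z_a.shape

    # Invariance: mean squared error between the two views.
    inv_loss = F.mse_loss(z_a, z_b)

    # Variance: hinge loss keeping the std of every embedding dimension above 1.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance: push off-diagonal covariance entries toward zero to decorrelate dimensions.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    cov_loss = off_diagonal(cov_a).pow(2).sum() / d + off_diagonal(cov_b).pow(2).sum() / d

    return sim_coeff * inv_loss + std_coeff * var_loss + cov_coeff * cov_loss
```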
Point person: Alyssa Pradhan
Original ViT paper: https://arxiv.org/pdf/2010.11929
Details: https://github.com/yumibriones/HPL-Modified/blob/main/ViT/README.md
We replaced the convolutional neural network (CNN) backbone of HPL with a vision transformer (ViT).
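To illustrate what the swap involves, the sketch below builds a ViT encoder that maps each tile to a single feature vector. The timm dependency and the specific ViT-Small variant are assumptions for the example; the backbone we actually used is described in ViT/README.md.

```python
# Minimal sketch of a ViT tile encoder, assuming timm; the model choice is illustrative.
import timm
import torch

# ViT-Small/16 with the classification head removed -> one feature vector per tile.
vit_encoder = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=0)

tiles = torch.randn(8, 3, 224, 224)  # a batch of 224x224 H&E tiles
embeddings = vit_encoder(tiles)      # (8, 384) tile representations
print(embeddings.shape)
```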
Point person: Yumi Briones
Original CLIP paper: https://arxiv.org/pdf/2103.00020
Details: https://github.com/yumibriones/HPL-Modified/blob/main/CLIP/README.md
To enable multimodal learning, we integrated Contrastive Language-Image Pre-Training (CLIP) by OpenAI (open_clip implementation) into the HPL pipeline.
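For illustration, extracting tile image embeddings with open_clip looks roughly like the sketch below; the model name and pretrained checkpoint are placeholders, not necessarily the ones we used (see CLIP/README.md).

```python
# Sketch of tile embedding extraction with open_clip; model/checkpoint are placeholders.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
model.eval()

image = preprocess(Image.open("tile.png")).unsqueeze(0)  # one H&E tile
with torch.no_grad():
    emb = model.encode_image(image)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize before clustering
print(emb.shape)  # (1, 512) for ViT-B-32
```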
As a bonus, we generated and clustered image embeddings from CONtrastive learning from Captions for Histopathology (CONCH) by the Mahmood Lab (https://github.com/mahmoodlab/CONCH). This is a CLIP-style model that has been trained on over a million histopathology image-caption pairs. A caveat is that pathological information is included in the captions, so clusters generated from this method will not be completely unsupervised.
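CONCH exposes a similar CLIP-style interface. The sketch below follows the loader example in the CONCH repository; the package import, checkpoint identifier, and Hugging Face access requirements are assumptions and may need adjusting.

```python
# Hedged sketch of extracting tile embeddings with CONCH; the loader call follows
# the example in https://github.com/mahmoodlab/CONCH and may need adjusting
# (access to the gated Hugging Face checkpoint is required).
import torch
from PIL import Image
from conch.open_clip_custom import create_model_from_pretrained

model, preprocess = create_model_from_pretrained("conch_ViT-B-16", "hf_hub:MahmoodLab/conch")
model.eval()

image = preprocess(Image.open("tile.png")).unsqueeze(0)
with torch.inference_mode():
    emb = model.encode_image(image)  # CLIP-style image embedding used for clustering
print(emb.shape)
```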
We re-ran UMAP and Leiden clustering on the original HPL embeddings, then repeated this analysis for all HPL modifications (HPL-CLIP, HPL-CONCH, HPL-VICReg, HPL-ViT). Results can be found here: HPL-Modified Results.
Briefly, this is how results were generated:
- Extract embeddings from the original HPL pipeline using extract_embeddings_hpl.ipynb.
- Extract embeddings from VICReg, ViT, and CLIP using the scripts in the corresponding folders.
- Run UMAP/Leiden clustering on the embeddings using run_umap_leiden.py (see the sketch below).*
- Plot the UMAP with clustering results or clinical features overlaid using plot_umap.py.*
*If submitting as a batch job on a high-performance computing cluster, use the corresponding scripts in each folder (VICReg, ViT, CLIP) and adjust file paths accordingly.
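For orientation, the core of the UMAP/Leiden step looks roughly like the sketch below. The file name and HDF5 key are placeholders, and scanpy/anndata (with leidenalg) are assumed dependencies; run_umap_leiden.py is the authoritative version.

```python
# Sketch of UMAP + Leiden clustering on a matrix of tile embeddings using scanpy.
# File name and HDF5 key are placeholders; see run_umap_leiden.py for the real I/O.
import h5py
import numpy as np
import anndata as ad
import scanpy as sc

# Load tile embeddings (n_tiles x n_dims) from an HDF5 file.
with h5py.File("tile_embeddings.h5", "r") as f:
    embeddings = np.asarray(f["embeddings"])  # dataset key is a placeholder

adata = ad.AnnData(X=embeddings)
sc.pp.neighbors(adata, n_neighbors=15, use_rep="X")    # kNN graph on the embeddings
sc.tl.umap(adata)                                      # 2D UMAP for visualization
sc.tl.leiden(adata, resolution=1.0, key_added="hpc")   # Leiden clusters ~ HPCs

umap_coords = adata.obsm["X_umap"]        # coordinates for plotting
hpc_labels = adata.obs["hpc"].to_numpy()  # cluster assignment per tile
```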
We evaluated our models in terms of (1) how similar their clusters are to those of the original HPL pipeline and (2) how well the clusters separate LUAD from LUSC. Evaluation is done here: evaluation.ipynb.
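As an example of how these two criteria could be quantified, the sketch below computes cluster agreement with the original HPL assignments (ARI/NMI) and per-cluster LUAD/LUSC purity. The metric choices are illustrative, not necessarily the ones used in evaluation.ipynb.

```python
# Illustrative evaluation metrics; not necessarily those used in evaluation.ipynb.
import pandas as pd
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def compare_to_hpl(hpl_clusters, new_clusters):
    """Agreement between the original HPL HPC assignments and a modified pipeline's clusters."""
    return {
        "ARI": adjusted_rand_score(hpl_clusters, new_clusters),
        "NMI": normalized_mutual_info_score(hpl_clusters, new_clusters),
    }


def luad_lusc_purity(clusters, labels):
    """Per-cluster fraction of the majority subtype (labels: 'LUAD'/'LUSC' per tile)."""
    df = pd.DataFrame({"cluster": clusters, "label": labels})
    # Values close to 1.0 mean the cluster is dominated by a single subtype.
    return df.groupby("cluster")["label"].agg(lambda s: s.value_counts(normalize=True).max())
```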