
Commit 6fc9e14 (parent 151cc25)

microDL config file documentation (#168)

* Commented microDL config files: Doc directory added with commented config files explaining the parameters used in the microDL workflow.
* Config documentation update based on review: the microDL config file documentation has been updated based on suggestions from @jennyfolkesson, @Christianfoley and @JohannaRahm.
* Updated readme and config documentation: readme files for microDL and for the preprocessing, train and inference modules, as well as the config file documentation, are updated based on review.
* Update preprocessing readme: description of the details available in the json updated.
* Checking error in yaml files: spaces before comments were removed to eliminate errors from the yaml files.
* Added citation file.
* Updated documentation and figures for clarity.
* Formatted inference readme files: the readme files were linted with markdownlint.
* Updated preprocessing config using 2D U-Net: depth of tiles is specifically defined for the 2D U-Net.
* Config files for microDL 2.5D U-Net model: config files are tailored to predict cell membrane using 3D image input and training a 2.5D U-Net model.
* Match preprocessing to training config: preprocessing channels changed to match the channels mentioned in the training and inference config files to avoid confusion.
* Moved all config files to the same folder.
* Updated the paths of configs in the notebooks.
* Config file links attached: attached the links to config files for preprocessing, training and inference for the notebook, 2D model and 2.5D model.
* Check for failed build: reformatting links on readme to check for failed build.
* Added changes based on review: clarified changes added to the documentation based on input from @JohannaRahm and @jennyfolkesson.

Co-authored-by: Shalin Mehta <2934183+mattersoflight@users.noreply.github.com>

32 files changed: +919 -208 lines changed

README.md

Lines changed: 72 additions & 31 deletions
@@ -1,27 +1,66 @@
+# microDL
+
+## robust and efficient virtual staining of label-free microscopy data
+
![Build Status](https://github.com/czbiohub/microDL/workflows/build/badge.svg)
[![Code Coverage](https://codecov.io/gh/czbiohub/microDL/branch/master/graphs/badge.svg)](https://codecov.io/gh/czbiohub/microDL)

-# microDL
+microDL is a deep learning pipeline for efficient 2D and 3D image translation. We commonly use it to virtually stain label-free images, i.e., to predict fluorescence-like images. Label-free imaging visualizes many structures simultaneously. Virtual staining enables identification of diverse structures without extensive human annotation - the annotations are provided by the molecular markers of the structure. This pipeline was originally developed for 3D virtual staining of tissue and cell structures from label-free images of their density and anisotropy: <https://doi.org/10.7554/eLife.55502>. We are currently extending it to enable generalizable virtual staining of nuclei and membrane in diverse imaging conditions and across multiple cell types. We provide a computationally and memory efficient variant of U-Net (2.5D U-Net) for 3D virtual staining.
+
+You can train a microDL model using label-free images and the corresponding fluorescence channels you want to predict. Once the model is trained using the dataset provided, you can use it to predict the same fluorescence channels or segmented masks in other datasets using the label-free images.
+
+In the example below, phase images and corresponding nuclear and membrane stained images are used to train a 2.5D U-Net model.
+The model can be used to predict the nuclear and membrane channels using label-free phase images.
+
+<p align="center">
+<img width="500" src="./figures/virtual_staining.png">
+<p/>
+
+<p align="center">
+<img width="200" src="./figures/nuc_mem.png">
+<p/>
+
+microDL allows you to design, train and evaluate U-Net models. It supports 2D U-Nets for 2D image translation and 2.5D (3D encoder, 2D decoder) U-Nets for 3D image translation.

-microDL allows you to design, train and evaluate U-Net models using just a few YAML config files. It supports 2D, 2.5D (3D encoder, 2D decoder) and 3D U-Nets, as well as 3D networks with anistropic filters. It also supports networks with an encoder plus dense layers for image to vector or image to scalar models. Our hope is that microDL will provide easy to use CLIs for segmentation, regression and classification tasks of microscopy images.
+Our goal is to enable robust translation of images across diverse microscopy methods.

-microDL consists of three modules:
+microDL consists of three modules that are accessible via CLI and customized via a configuration file in YAML format:

-* Preprocessing: normalization, flatfield correction, masking, tiling
-* Training: model creation, loss functions (w/wo masks), metrics, learning rates
-* Inference: on full images or on tiles that can be stitched to full images
+* [Preprocessing](micro_dl/preprocessing/readme.md): normalization, flatfield correction, masking, tiling
+* [Training](micro_dl/train/readme.md): model creation, loss functions (w/wo masks), metrics, learning rates
+* [Inference](micro_dl/inference/readme.md): on full images or on tiles that can be stitched to full images
+
+Note: microDL also supports 3D U-Nets and image segmentation, but we don't use these features frequently and they are the least tested.

## Getting Started

-Assuming your data is already formatted in a way that microDL understands (see Data Format below), you can run preprocessing, training and inference in three command lines.
-For config settings, see module specific readme's in micro_dl/preprocessing, micro_dl/training and micro_dl/inference.
+### Introductory exercise from DL@MBL
+
+If you are new to image translation or U-Nets, start with the [slides](notebooks/dlmbl2022/20220828_DLMBL_ImageTranslation.pdf) from the didactic lecture at [deep learning @ marine biological laboratory](https://www.mbl.edu/education/advanced-research-training-courses/course-offerings/dlmbl-deep-learning-microscopy-image-analysis).
+
+You can download test data and walk through the exercise by following [these instructions](notebooks/dlmbl2022/README.md).
+
+### Using command line interface (CLI)
+
+Refer to the [requirements](#requirements) section to set up the microDL environment.
+
+Build a [docker](#docker) image to set up your microDL environment if the dependencies are not compatible with the hardware environment on your computational facility.
+
+Format your input data to match the microDL [data format](#data-format) requirements.
+
+Once your data is formatted in a way that microDL understands, you can run preprocessing, training and inference in three command lines.
+For config settings, see the module-specific readmes in [micro_dl/preprocessing](micro_dl/preprocessing/readme.md),
+[micro_dl/training](micro_dl/train/readme.md) and
+[micro_dl/inference](micro_dl/inference/readme.md).

```buildoutcfg
python micro_dl/cli/preprocessing_script.py --config <preprocessing yaml config file>
```

```buildoutcfg
python micro_dl/cli/train_script.py --config <train config yml> --gpu <int> --gpu_mem_frac <GPU memory fraction>
```

```buildoutcfg
python micro_dl/cli/inference_script.py --config <inference yaml config file> --gpu <int> --gpu_mem_frac <GPU memory fraction>
```
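For example, a complete run might look like the sketch below; the config file names, the folder they live in, and the GPU memory fraction are placeholders, not paths shipped with this commit.

```sh
# Illustrative end-to-end run; config paths and GPU settings are placeholders
python micro_dl/cli/preprocessing_script.py --config config_files/my_preprocessing_config.yml
python micro_dl/cli/train_script.py --config config_files/my_train_config.yml --gpu 0 --gpu_mem_frac 0.5
python micro_dl/cli/inference_script.py --config config_files/my_inference_config.yml --gpu 0 --gpu_mem_frac 0.5
```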
@@ -31,30 +70,41 @@ python micro_dl/cli/inference_script.py --config <train config yml> --gpu <int>
It is recommended that you run microDL inside a Docker container, especially if you're using shared resources like a GPU server. microDL comes with two Docker images, one for Python 3.6 with CUDA 9 support (which is most likely what
you'll want), and one for Python 3.5 with CUDA 8.0 support. If you're working at the CZ Biohub you should be in the Docker group on our GPU servers Fry/Fry2; if not, you can ask anyone in the data science team to add you. The Python 3.6 image is already built on Fry/Fry2, but if you want to modify it and build your own Docker image/tag somewhere,
you can do so:

```buildoutcfg
docker build -t imaging_docker:gpu_py36_cu90 -f Dockerfile.imaging_docker_py36_cu90 .
```

Now you want to start a Docker container from your image, which is the virtual environment you will run your code in.

```buildoutcfg
nvidia-docker run -it -p <your port>:<exposed port> -v <your dir>:/<dirname inside docker> imaging_docker:gpu_py36_cu90 bash
```

If you look in the Dockerfile, you can see that there are two ports exposed, one is typically used for Jupyter (8888)
and one for Tensorboard (6006). To be able to view these in your browser, you need to map the port with the -p argument.
The -v argument similarly maps directories. You can use multiple -p and -v arguments if you want to map multiple things.
The final 'bash' signifies that you want to run bash (your usual Unix shell).
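For instance, to map Jupyter's exposed port 8888 to the same port on the host and mount a local data folder, a container start could look like the following; the host path is a placeholder.

```sh
# Illustrative container start; the host data directory is a placeholder
nvidia-docker run -it -p 8888:8888 -v /home/<user>/data:/data imaging_docker:gpu_py36_cu90 bash
```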

If you want to launch a Jupyter notebook inside your container, you can do so with the following command:

```buildoutcfg
jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```

Then you can access your notebooks in your browser at:

```buildoutcfg
http://<your server name (e.g. fry)>:<whatever port you mapped to when starting up docker>
```

You will need to copy/paste the token generated in your Docker container.

### Data Format

+Input data should be in the format of single page tiff files. If you use zarr files, you can convert
+them to single page tiff files using the [zarr to single page tiff conversion script](https://github.com/mehta-lab/microDL/blob/master/scripts/hcszarr2single_tif_mp.py).
+
To train directly on datasets that have already been split into 2D frames, the dataset
should have the following structure:

@@ -66,39 +116,30 @@ dir_name
|- im_c***_z***_t***_p***.png
|- ...
```

The image naming convention is as follows (parentheses give the corresponding column names in frames_meta.csv):

* **c** = channel index (channel_idx)
* **z** = slice index in z stack (slice_idx)
* **t** = timepoint index (time_idx)
* **p** = position (field of view) index (pos_idx)
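For example, a hypothetical file named `im_c001_z010_t000_p020.png` would correspond to channel_idx 1, slice_idx 10, time_idx 0 and pos_idx 20 in frames_meta.csv.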

-If you download your dataset from the CZ Biohub imaging database [imagingDB](https://github.com/czbiohub/imagingDB)
-you will get your dataset correctly formatted and can directly input that into microDL.
-If you don't have your data in the imaging database, write a script that converts your
-your data to image files that adhere to the naming convention above, then run
+If your data is not in the zarr or tiff format supported by the preprocessing module, write a script that converts your data to image files that adhere to the naming convention above, then run

```sh
python micro_dl/cli/generate_meta.py --input <directory name>
```

That will generate the frames_meta.csv file you will need for data preprocessing.
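As a concrete illustration (the directory path below is a placeholder), the resulting frames_meta.csv indexes each frame by the channel_idx, slice_idx, time_idx and pos_idx parsed from the file names:

```sh
# Illustrative run on a folder of frames named according to the convention above
python micro_dl/cli/generate_meta.py --input /data/experiment_frames
```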

+Before preprocessing, make sure the z-stacked images are aligned to be centered at the focal plane at all positions. If the focal planes of image stacks imaged
+at different positions in a plate are at different z levels, align them using the [z alignment script](https://github.com/mehta-lab/microDL/blob/master/scripts/align_z_focus.py).

-## Requirements
+## Failure modes

-There is a requirements.txt file we use for continuous integration, and a requirements_docker.txt file we use to build the Docker image. The main packages you'll need are:
-* keras
-* tensorflow
-* cv2
-* Cython
-* matplotlib
-* natsort
-* nose
-* numpy
-* pandas
-* pydot
-* scikit-image
-* scikit-learn
-* scipy
-* testfixtures (for running tests)
+Although deep learning pipelines solve complex computer vision problems with impressive accuracy, they can fail in ways that are not intuitive to human vision. We think that transparent discussion of failure modes of deep learning pipelines is necessary for the field to continue advancing. These [slides](notebooks/dlmbl2022/20220830_DLMBL_FailureModes.pdf) from the DL@MBL 2022 course summarize failure modes of the virtual staining that we have identified, and some ideas for improving its robustness. If microDL fails with your data, please start a discussion via issues on this repository.

+## Requirements

+These are the [required dependencies](requirements.txt) for continuous integration and the [required dependencies](requirements_docker.txt) for building the Docker image.
+This version (1.0.0) assumes a single-page tiff data format and is built on tensorflow 1.13 and keras 2.1.6. The next version will directly read zarr datasets and be re-written using pytorch.
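A minimal environment setup sketch for running microDL outside Docker, assuming conda is available and matching the Python 3.6 / tensorflow 1.13 stack noted above; the environment name and PYTHONPATH step are illustrative:

```sh
# Illustrative setup; assumes conda and a CUDA 9-compatible GPU driver
conda create -n microdl python=3.6 -y
conda activate microdl
git clone https://github.com/mehta-lab/microDL.git
cd microDL
pip install -r requirements.txt
# assumption: the repo root is added to PYTHONPATH so that micro_dl imports resolve
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
```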

citation.cff

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
```yaml
cff-version: 1.2.0
title: >-
  microDL: robust and efficient virtual staining of label-free microscopy data
message: >-
  Please use citation information in this file if you publish results using this pipeline. If you use the pipeline for virtual staining, please also cite our paper:
  https://elifesciences.org/articles/55502
type: software
authors:
  - given-names: Jenny
    family-names: Folkesson
    orcid: "https://orcid.org/0000-0002-4673-0522"
  - given-names: Syuan-Ming
    family-names: Guo
    orcid: "https://orcid.org/0000-0003-3680-0387"
  - given-names: Anitha Priya
    family-names: Krishnan
    orcid: "https://orcid.org/0000-0003-0193-2494"
  - given-names: Christian
    family-names: Foley
    orcid: "https://orcid.org/0000-0002-5964-5060"
  - given-names: Soorya
    family-names: Pradeep
    orcid: "https://orcid.org/0000-0002-0926-1480"
  - given-names: Johanna
    family-names: Rahm
  - given-names: Shalin B.
    family-names: Mehta
    orcid: "https://orcid.org/0000-0002-2542-3582"
```
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```yaml
# Configuration detailing the staining prediction on label-free images

# define the dataset on which you want to use the trained model for virtual staining/segmentation,
# the output formats required and the predicted image quality metrics

# point to the directory where your trained model is saved
model_dir: '/home/Translation_temp_2/'

# directory with images on which you want to perform the prediction, the inference dataset
image_dir: '/home/InferenceData/'

# preprocess_dir contains the preprocessing_info.json file, used to extract information about the normalization step
preprocess_dir: '/home/Processed_temp_1'

# define inference dataset channels
dataset:
  input_channels: [2] # label-free channel used for prediction by the model
  target_channels: [0] # target image channel (fluorescence image) to compare how well the prediction worked
  pos_ids: [0, 1, 3, 4, 6, 8, 10] # may not affect the positions where inference is performed if a data split is defined
  slice_ids: [12, 13, 14] # slices where inference is performed, same condition as above

# define the output image format
images:
  image_format: 'zyx' # order of dimensions of the predicted output image
  image_ext: '.tif' # output images are stored as single page tiff files
  suffix: '25DUnet_membrane' # suffix for the saved output image name
  name_format: sms # 'sms' corresponds to the image naming format 'img_channelname_t***_p***_z***_customfield'; the default naming convention is 'im_c***_z***_t***_p***'
  pred_chan_name: 'pred' # suffix added to the saved output image name
  save_to_image_dir: False # 'False' saves output in the model directory, 'True' in the input image directory
  save_folder_name: predictions # name of the directory created inside the model dir to save output images
  data_split: test # which image set in the train/val/test/all split is used for prediction
  save_figs: True # whether to save a figure panel comparing the predicted and target images

# metrics computed to quantify prediction quality; the values are printed on the output figure panel and saved as text files
metrics:
  metrics: [ssim, corr, r2, mae, mse] # metrics for output image quality check; refer to the readme for details
  metrics_orientations: ['xy'] # compute metrics on 'xy', 'xz' and/or 'yz' slices; xz and yz apply to 3D predictions
```
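A config like the one above is passed to the inference CLI shown in the README; an illustrative invocation follows, where the config path, GPU id and memory fraction are placeholders.

```sh
# Illustrative inference run; the config path and GPU settings are placeholders
python micro_dl/cli/inference_script.py --config config_files/inference_config_25DUnet_membrane.yml --gpu 0 --gpu_mem_frac 0.5
```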
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```yaml
# Configuration detailing the staining prediction on label-free images

# define the dataset on which you want to use the trained model for virtual staining/segmentation,
# the output formats required and the predicted image quality metrics

# point to the directory where your trained model is saved
model_dir: '/home/Translation_temp_2/'

# directory with images on which you want to perform the prediction, the inference dataset
image_dir: '/home/InferenceData/'

# preprocess_dir contains the preprocessing_info.json file, used to extract information about the normalization step
preprocess_dir: '/home/Processed_temp_1'

# define inference dataset channels
dataset:
  input_channels: [2] # label-free channel used for prediction by the model
  target_channels: [1] # target image channel (fluorescence image) to compare how well the prediction worked
  pos_ids: [0, 1, 3, 4, 6, 8, 10] # may not affect the positions where inference is performed if a data split is defined
  slice_ids: [0] # slices where inference is performed, same condition as above

# define the output image format
images:
  image_format: 'zyx' # order of dimensions of the predicted output image
  image_ext: '.tif' # output images are stored as single page tiff files
  suffix: '2DUnet_nucl' # suffix for the saved output image name
  name_format: sms # 'sms' corresponds to the image naming format 'img_channelname_t***_p***_z***_customfield'; the default naming convention is 'im_c***_z***_t***_p***'
  pred_chan_name: 'pred' # suffix added to the saved output image name
  save_to_image_dir: False # 'False' saves output in the model directory, 'True' in the input image directory
  save_folder_name: predictions # name of the directory created inside the model dir to save output images
  data_split: val # which image set in the train/val/test/all split is used for prediction
  save_figs: True # whether to save a figure panel comparing the predicted and target images

# metrics computed to quantify prediction quality; the values are printed on the output figure panel and saved as text files
metrics:
  metrics: [ssim, corr, r2, mae, mse] # metrics for output image quality check; refer to the readme for details
  metrics_orientations: ['xy'] # compute metrics on 'xy', 'xz' and/or 'yz' slices; xz and yz apply to 3D predictions
```
