
Commit 6fc9e14 (parent 151cc25)

microDL config file documentation (#168)

* Commented microDL config files: Doc directory added with commented config files explaining the parameters used in the microDL workflow.
* Config documentation update based on review: the microDL config file documentation has been updated based on suggestions from @jennyfolkesson, @Christianfoley and @JohannaRahm.
* Updated readme and config documentation: readme files for microDL and for the preprocessing, train and inference modules, as well as the config file documentation, are updated based on review.
* Update preprocessing readme: description of the details available in the json updated.
* Checking error in yaml files: spaces before comments were removed to eliminate errors from the yaml files.
* Added citation file.
* Updated documentation and figures for clarity.
* Formatted inference readme files: the readme files were linted with markdownlint.
* Updated preprocessing config using 2D U-Net: depth of tiles is specifically defined for the 2D U-Net.
* Config files for microDL 2.5D U-Net model: config files are tailored to predict cell membrane using 3D image input and training a 2.5D U-Net model.
* Match preprocessing to training config: preprocessing channels changed to match the channels mentioned in the training and inference config files to avoid confusion.
* Moved all config files to the same folder.
* Updated the paths of configs in the notebooks.
* Config file links attached: attached the links to config files for preprocessing, training and inference for the notebook, 2D model and 2.5D model.
* Check for failed build: reformatting links on readme to check for failed build.
* Added changes based on review: clarified changes added to the documentation based on input from @JohannaRahm and @jennyfolkesson.

Co-authored-by: Shalin Mehta <2934183+mattersoflight@users.noreply.github.com>

32 files changed: +919 -208 lines changed

README.md

Lines changed: 72 additions & 31 deletions
@@ -1,27 +1,66 @@
+# microDL
+
+## robust and efficient virtual staining of label-free microscopy data
+
![Build Status](https://github.com/czbiohub/microDL/workflows/build/badge.svg)
[![Code Coverage](https://codecov.io/gh/czbiohub/microDL/branch/master/graphs/badge.svg)](https://codecov.io/gh/czbiohub/microDL)

-# microDL
+microDL is a deep learning pipeline for efficient 2D and 3D image translation. We commonly use it to virtually stain label-free images, i.e., to predict fluorescence-like images. Label-free imaging visualizes many structures simultaneously. Virtual staining enables identification of diverse structures without extensive human annotation - the annotations are provided by the molecular markers of the structure. This pipeline was originally developed for 3D virtual staining of tissue and cell structures from label-free images of their density and anisotropy: <https://doi.org/10.7554/eLife.55502>. We are currently extending it to enable generalizable virtual staining of nuclei and membrane in diverse imaging conditions and across multiple cell types. We provide a computationally and memory efficient variant of U-Net (2.5D U-Net) for 3D virtual staining.
+
+You can train a microDL model using label-free images and the corresponding fluorescence channels you want to predict. Once the model is trained using the dataset provided, you can use it to predict the same fluorescence channels or segmented masks in other datasets using the label-free images.
+
+In the example below, phase images and corresponding nuclear and membrane stained images are used to train a 2.5D U-Net model.
+The model can be used to predict the nuclear and membrane channels using label-free phase images.
+
+<p align="center">
+<img width="500" src="./figures/virtual_staining.png">
+<p/>
+
+<p align="center">
+<img width="200" src="./figures/nuc_mem.png">
+<p/>
+
+microDL allows you to design, train and evaluate U-Net models. It supports 2D U-Nets for 2D image translation and 2.5D (3D encoder, 2D decoder) U-Nets for 3D image translation.

-microDL allows you to design, train and evaluate U-Net models using just a few YAML config files. It supports 2D, 2.5D (3D encoder, 2D decoder) and 3D U-Nets, as well as 3D networks with anistropic filters. It also supports networks with an encoder plus dense layers for image to vector or image to scalar models. Our hope is that microDL will provide easy to use CLIs for segmentation, regression and classification tasks of microscopy images.
+Our goal is to enable robust translation of images across diverse microscopy methods.

-microDL consists of three modules:
+microDL consists of three modules that are accessible via CLI and customized via a configuration file in YAML format:

-* Preprocessing: normalization, flatfield correction, masking, tiling
-* Training: model creation, loss functions (w/wo masks), metrics, learning rates
-* Inference: on full images or on tiles that can be stitched to full images
+* [Preprocessing](micro_dl/preprocessing/readme.md): normalization, flatfield correction, masking, tiling
+* [Training](micro_dl/train/readme.md): model creation, loss functions (w/wo masks), metrics, learning rates
+* [Inference](micro_dl/inference/readme.md): on full images or on tiles that can be stitched to full images
+
+Note: microDL also supports 3D U-Nets and image segmentation, but we don't use these features frequently and they are the least tested.

## Getting Started

-Assuming your data is already formatted in a way that microDL understands (see Data Format below), you can run preprocessing, training and inference in three command lines.
-For config settings, see module specific readme's in micro_dl/preprocessing, micro_dl/training and micro_dl/inference.
+### Introductory exercise from DL@MBL
+
+If you are new to image translation or U-Nets, start with the [slides](notebooks/dlmbl2022/20220828_DLMBL_ImageTranslation.pdf) from the didactic lecture at [deep learning @ marine biological laboratory](https://www.mbl.edu/education/advanced-research-training-courses/course-offerings/dlmbl-deep-learning-microscopy-image-analysis).
+
+You can download test data and walk through the exercise by following [these instructions](notebooks/dlmbl2022/README.md).
+
+### Using command line interface (CLI)
+
+Refer to the [requirements](#requirements) section to set up the microDL environment.
+
+Build a [docker](#docker) image to set up your microDL environment if the dependencies are not compatible with the hardware environment on your computational facility.
+
+Format your input data to match the microDL [data format](#data-format) requirements.
+
+Once your data is formatted in a way that microDL understands, you can run preprocessing, training and inference in three command lines.
+For config settings, see the module-specific readmes in [micro_dl/preprocessing](micro_dl/preprocessing/readme.md),
+[micro_dl/training](micro_dl/train/readme.md) and
+[micro_dl/inference](micro_dl/inference/readme.md).

```buildoutcfg
python micro_dl/cli/preprocessing_script.py --config <preprocessing yaml config file>
```

```buildoutcfg
python micro_dl/cli/train_script.py --config <train config yml> --gpu <int> --gpu_mem_frac <GPU memory fraction>
```

```buildoutcfg
python micro_dl/cli/inference_script.py --config <inference yaml config file> --gpu <int> --gpu_mem_frac <GPU memory fraction>
```
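For example, a complete run might look like the sketch below; the config file names, the folder they live in, and the GPU memory fraction are placeholders, not paths shipped with this commit.

```sh
# Illustrative end-to-end run; config paths and GPU settings are placeholders
python micro_dl/cli/preprocessing_script.py --config config_files/my_preprocessing_config.yml
python micro_dl/cli/train_script.py --config config_files/my_train_config.yml --gpu 0 --gpu_mem_frac 0.5
python micro_dl/cli/inference_script.py --config config_files/my_inference_config.yml --gpu 0 --gpu_mem_frac 0.5
```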
@@ -31,30 +70,41 @@ python micro_dl/cli/inference_script.py --config <train config yml> --gpu <int>
It is recommended that you run microDL inside a Docker container, especially if you're using shared resources like a GPU server. microDL comes with two Docker images, one for Python 3.6 with CUDA 9 support (which is most likely what
you'll want), and one for Python 3.5 with CUDA 8.0 support. If you're working at the CZ Biohub you should be in the Docker group on our GPU servers Fry/Fry2; if not, you can ask anyone in the data science team to add you. The Python 3.6 image is already built on Fry/Fry2, but if you want to modify it and build your own Docker image/tag somewhere,
you can do so:

```buildoutcfg
docker build -t imaging_docker:gpu_py36_cu90 -f Dockerfile.imaging_docker_py36_cu90 .
```

Now you want to start a Docker container from your image, which is the virtual environment you will run your code in.

```buildoutcfg
nvidia-docker run -it -p <your port>:<exposed port> -v <your dir>:/<dirname inside docker> imaging_docker:gpu_py36_cu90 bash
```

If you look in the Dockerfile, you can see that there are two ports exposed, one is typically used for Jupyter (8888)
and one for Tensorboard (6006). To be able to view these in your browser, you need to map the port with the -p argument.
The -v argument similarly maps directories. You can use multiple -p and -v arguments if you want to map multiple things.
The final 'bash' signifies that you want to run bash (your usual Unix shell).
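For instance, to map Jupyter's exposed port 8888 to the same port on the host and mount a local data folder, a container start could look like the following; the host path is a placeholder.

```sh
# Illustrative container start; the host data directory is a placeholder
nvidia-docker run -it -p 8888:8888 -v /home/<user>/data:/data imaging_docker:gpu_py36_cu90 bash
```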

If you want to launch a Jupyter notebook inside your container, you can do so with the following command:

```buildoutcfg
jupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```

Then you can access your notebooks in your browser at:

```buildoutcfg
http://<your server name (e.g. fry)>:<whatever port you mapped to when starting up docker>
```

You will need to copy/paste the token generated in your Docker container.

### Data Format

+Input data should be in the format of single page tiff files. If you use zarr files, you can convert
+them to single page tiff files using the [zarr to single page tiff conversion script](https://github.com/mehta-lab/microDL/blob/master/scripts/hcszarr2single_tif_mp.py).
+
To train directly on datasets that have already been split into 2D frames, the dataset
should have the following structure:

@@ -66,39 +116,30 @@ dir_name
|- im_c***_z***_t***_p***.png
|- ...
```

The image naming convention is as follows (parentheses give the corresponding column names in frames_meta.csv):

* **c** = channel index (channel_idx)
* **z** = slice index in z stack (slice_idx)
* **t** = timepoint index (time_idx)
* **p** = position (field of view) index (pos_idx)
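For example, a hypothetical file named `im_c001_z010_t000_p020.png` would correspond to channel_idx 1, slice_idx 10, time_idx 0 and pos_idx 20 in frames_meta.csv.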

-If you download your dataset from the CZ Biohub imaging database [imagingDB](https://github.com/czbiohub/imagingDB)
-you will get your dataset correctly formatted and can directly input that into microDL.
-If you don't have your data in the imaging database, write a script that converts your
-your data to image files that adhere to the naming convention above, then run
+If your data is not in the zarr or tiff format supported by the preprocessing module, write a script that converts your data to image files that adhere to the naming convention above, then run

```sh
python micro_dl/cli/generate_meta.py --input <directory name>
```

That will generate the frames_meta.csv file you will need for data preprocessing.
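As a concrete illustration (the directory path below is a placeholder), the resulting frames_meta.csv indexes each frame by the channel_idx, slice_idx, time_idx and pos_idx parsed from the file names:

```sh
# Illustrative run on a folder of frames named according to the convention above
python micro_dl/cli/generate_meta.py --input /data/experiment_frames
```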

+Before preprocessing, make sure the z-stacked images are aligned to be centered at the focal plane at all positions. If the focal planes of image stacks imaged
+at different positions in a plate are at different z levels, align them using the [z alignment script](https://github.com/mehta-lab/microDL/blob/master/scripts/align_z_focus.py).

-## Requirements
+## Failure modes

-There is a requirements.txt file we use for continuous integration, and a requirements_docker.txt file we use to build the Docker image. The main packages you'll need are:
-* keras
-* tensorflow
-* cv2
-* Cython
-* matplotlib
-* natsort
-* nose
-* numpy
-* pandas
-* pydot
-* scikit-image
-* scikit-learn
-* scipy
-* testfixtures (for running tests)
+Although deep learning pipelines solve complex computer vision problems with impressive accuracy, they can fail in ways that are not intuitive to human vision. We think that transparent discussion of failure modes of deep learning pipelines is necessary for the field to continue advancing. These [slides](notebooks/dlmbl2022/20220830_DLMBL_FailureModes.pdf) from the DL@MBL 2022 course summarize failure modes of the virtual staining that we have identified, and some ideas for improving its robustness. If microDL fails with your data, please start a discussion via issues on this repository.

+## Requirements

+These are the [required dependencies](requirements.txt) for continuous integration and the [required dependencies](requirements_docker.txt) for building the Docker image.
+This version (1.0.0) assumes a single-page tiff data format and is built on tensorflow 1.13 and keras 2.1.6. The next version will directly read zarr datasets and be re-written using pytorch.
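A minimal environment setup sketch for running microDL outside Docker, assuming conda is available and matching the Python 3.6 / tensorflow 1.13 stack noted above; the environment name and PYTHONPATH step are illustrative:

```sh
# Illustrative setup; assumes conda and a CUDA 9-compatible GPU driver
conda create -n microdl python=3.6 -y
conda activate microdl
git clone https://github.com/mehta-lab/microDL.git
cd microDL
pip install -r requirements.txt
# assumption: the repo root is added to PYTHONPATH so that micro_dl imports resolve
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
```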

citation.cff

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
```yaml
cff-version: 1.2.0
title: >-
  microDL: robust and efficient virtual staining of label-free microscopy data
message: >-
  Please use citation information in this file if you publish results using this pipeline. If you use the pipeline for virtual staining, please also cite our paper:
  https://elifesciences.org/articles/55502
type: software
authors:
  - given-names: Jenny
    family-names: Folkesson
    orcid: "https://orcid.org/0000-0002-4673-0522"
  - given-names: Syuan-Ming
    family-names: Guo
    orcid: "https://orcid.org/0000-0003-3680-0387"
  - given-names: Anitha Priya
    family-names: Krishnan
    orcid: "https://orcid.org/0000-0003-0193-2494"
  - given-names: Christian
    family-names: Foley
    orcid: "https://orcid.org/0000-0002-5964-5060"
  - given-names: Soorya
    family-names: Pradeep
    orcid: "https://orcid.org/0000-0002-0926-1480"
  - given-names: Johanna
    family-names: Rahm
  - given-names: Shalin B.
    family-names: Mehta
    orcid: "https://orcid.org/0000-0002-2542-3582"
```
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```yaml
# Configuration detailing the staining prediction on label-free images

# define the dataset on which you want to use the trained model for virtual staining/segmentation,
# the output formats required and the predicted image quality metrics

# point to the directory where your trained model is saved
model_dir: '/home/Translation_temp_2/'

# directory with images on which you want to perform the prediction, the inference dataset
image_dir: '/home/InferenceData/'

# preprocess_dir contains the preprocessing_info.json file, used to extract information about the normalization step
preprocess_dir: '/home/Processed_temp_1'

# define inference dataset channels
dataset:
  input_channels: [2] # label-free channel used for prediction by the model
  target_channels: [0] # target image channel (fluorescence image) to compare how well the prediction worked
  pos_ids: [0, 1, 3, 4, 6, 8, 10] # may not affect the positions where inference is performed if a data split is defined
  slice_ids: [12, 13, 14] # slices where inference is performed, same condition as above

# define the output image format
images:
  image_format: 'zyx' # order of dimensions of the predicted output image
  image_ext: '.tif' # output images are stored as single page tiff files
  suffix: '25DUnet_membrane' # suffix for the saved output image name
  name_format: sms # 'sms' corresponds to the image naming format 'img_channelname_t***_p***_z***_customfield'; the default naming convention is 'im_c***_z***_t***_p***'
  pred_chan_name: 'pred' # suffix added to the saved output image name
  save_to_image_dir: False # 'False' saves output in the model directory, 'True' in the input image directory
  save_folder_name: predictions # name of the directory created inside the model dir to save output images
  data_split: test # which image set in the train/val/test/all split is used for prediction
  save_figs: True # whether to save a figure panel comparing the predicted and target images

# metrics computed to quantify prediction quality; the values are printed on the output figure panel and saved as text files
metrics:
  metrics: [ssim, corr, r2, mae, mse] # metrics for output image quality check; refer to the readme for details
  metrics_orientations: ['xy'] # compute metrics on 'xy', 'xz' and/or 'yz' slices; xz and yz apply to 3D predictions
```
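A config like the one above is passed to the inference CLI shown in the README; an illustrative invocation follows, where the config path, GPU id and memory fraction are placeholders.

```sh
# Illustrative inference run; the config path and GPU settings are placeholders
python micro_dl/cli/inference_script.py --config config_files/inference_config_25DUnet_membrane.yml --gpu 0 --gpu_mem_frac 0.5
```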
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
```yaml
# Configuration detailing the staining prediction on label-free images

# define the dataset on which you want to use the trained model for virtual staining/segmentation,
# the output formats required and the predicted image quality metrics

# point to the directory where your trained model is saved
model_dir: '/home/Translation_temp_2/'

# directory with images on which you want to perform the prediction, the inference dataset
image_dir: '/home/InferenceData/'

# preprocess_dir contains the preprocessing_info.json file, used to extract information about the normalization step
preprocess_dir: '/home/Processed_temp_1'

# define inference dataset channels
dataset:
  input_channels: [2] # label-free channel used for prediction by the model
  target_channels: [1] # target image channel (fluorescence image) to compare how well the prediction worked
  pos_ids: [0, 1, 3, 4, 6, 8, 10] # may not affect the positions where inference is performed if a data split is defined
  slice_ids: [0] # slices where inference is performed, same condition as above

# define the output image format
images:
  image_format: 'zyx' # order of dimensions of the predicted output image
  image_ext: '.tif' # output images are stored as single page tiff files
  suffix: '2DUnet_nucl' # suffix for the saved output image name
  name_format: sms # 'sms' corresponds to the image naming format 'img_channelname_t***_p***_z***_customfield'; the default naming convention is 'im_c***_z***_t***_p***'
  pred_chan_name: 'pred' # suffix added to the saved output image name
  save_to_image_dir: False # 'False' saves output in the model directory, 'True' in the input image directory
  save_folder_name: predictions # name of the directory created inside the model dir to save output images
  data_split: val # which image set in the train/val/test/all split is used for prediction
  save_figs: True # whether to save a figure panel comparing the predicted and target images

# metrics computed to quantify prediction quality; the values are printed on the output figure panel and saved as text files
metrics:
  metrics: [ssim, corr, r2, mae, mse] # metrics for output image quality check; refer to the readme for details
  metrics_orientations: ['xy'] # compute metrics on 'xy', 'xz' and/or 'yz' slices; xz and yz apply to 3D predictions
```
