A MATLAB implementation of deep hybrid modeling for HEK293 cell culture processes, combining Long Short-Term Memory (LSTM) networks with first-principles equations for bioprocess prediction and optimization.
- Overview
- Features
- System Requirements
- Installation
- Quick Start
- Data Description
- Model Architecture
- Usage Examples
- Output Files
- Customization
- Troubleshooting
- Citation
- License
- Contact
This repository contains the implementation of hybrid neural network models that combine:
- Deep learning: LSTM and Feed-Forward Neural Networks (FFNN)
- First-principles: Mass balance equations and biokinetic relationships
- Bioprocess data: Synthetic HEK293 cell culture datasets
The hybrid approach leverages the strengths of both mechanistic knowledge and data-driven learning to predict bioprocess dynamics with improved accuracy and interpretability.
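The sketch below is a rough illustration of how the two parts interact. It integrates a generic fed-batch mass balance, d(V·c)/dt = F·c_feed + r·V with dV/dt = F. A stand-in function supplies the reaction rates r that a trained network would normally predict. The rate function `nnRates`, the three species, the feed values and the explicit Euler scheme are all illustrative assumptions. They are not the equations implemented in this repository.

```matlab
% Sketch: a data-driven rate model embedded in a first-principles mass balance.
% ASSUMPTIONS: generic fed-batch balance, explicit Euler steps, and a
% hypothetical stand-in "network" nnRates (not a function of this repository).
c     = [20; 2; 0.5];               % initial concentrations (mM), illustrative
V     = 1.0;                        % initial volume (L)
F     = 0.01;                       % constant feed rate (L/h)
cFeed = [100; 0; 0];                % feed concentrations (mM)
dt    = 1.0;                        % time step (h)
nSteps = 240;                       % 240 h, the run length of the synthetic data

% Hypothetical stand-in for the trained network: concentrations -> volumetric
% reaction rates (mM/h). Species 1 is consumed, species 2 and 3 are produced.
nnRates = @(c) [-0.05; 0.03; 0.01] * (c(1) / (c(1) + 1));

m = V * c;                          % amount of each species in the reactor (mmol)
for k = 1:nSteps
    r  = nnRates(c);                % data-driven part: reaction rates
    dm = F * cFeed + r * V;         % first-principles part: mass balance
    m  = m + dt * dm;               % explicit Euler update of the amounts
    V  = V + dt * F;                % volume grows with the feed
    c  = max(m / V, 0);             % back to concentrations, clipped at zero
    m  = c * V;                     % keep amounts consistent with the clipping
end
```

In a real hybrid model, the stand-in `nnRates` would be replaced by the trained LSTM or FFNN. The balance would also include the feeding and sampling terms of the actual process.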
- 🧬 Bioprocess-specific: Optimized for mammalian cell culture (HEK293)
- 🤖 Hybrid AI: Combines LSTM/FFNN with mechanistic equations
- 📊 Pre-trained models: Ready-to-use LSTM and FFNN models included
- 📈 Comprehensive analysis: Automated plotting and statistical evaluation
- 🔄 Flexible training: Easy model retraining with custom parameters
- 📁 Structured data: Well-organized synthetic datasets with DoE design
- MATLAB R2019a or later
- Required MATLAB toolboxes:
  - Deep Learning Toolbox
  - Statistics and Machine Learning Toolbox
  - Optimization Toolbox
- Optional MATLAB toolbox: Curve Fitting Toolbox (recommended)
- RAM: Minimum 8GB (16GB+ recommended for large-scale training)
- Storage: At least 500MB free space
- CPU: Multi-core processor recommended for faster training
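You can check which of these toolboxes are licensed in your installation with a short script like the one below. It relies on MATLAB's `license('test', feature)`, which only reports whether a license is available. Use `ver` to list what is actually installed. The feature names are the standard license identifiers for these toolboxes.

```matlab
% Check license availability for the toolboxes listed above.
features = {'Neural_Network_Toolbox', 'Statistics_Toolbox', ...
            'Optimization_Toolbox', 'Curve_Fitting_Toolbox'};
names    = {'Deep Learning', 'Statistics and Machine Learning', ...
            'Optimization', 'Curve Fitting (optional)'};
hasToolbox = false(1, numel(features));
for k = 1:numel(features)
    % license('test', ...) returns 1 if a license for the feature exists
    hasToolbox(k) = logical(license('test', features{k}));
    fprintf('%-34s %s\n', names{k}, mat2str(hasToolbox(k)));
end
```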
- Clone the repository:
git clone https://github.com/jrcramos/Hybrid-modeling-of-bioreactor-with-LSTM.git
cd Hybrid-modeling-of-bioreactor-with-LSTM
- Open MATLAB and navigate to the repository directory:
cd('/path/to/Hybrid-modeling-of-bioreactor-with-LSTM')
- Add all subdirectories to the MATLAB path:
addpath(genpath(pwd))
To simulate with the pre-trained models:
% Run the main script
hybnet_train_main
% When prompted, select "Simulate" to use pre-trained models
% This will generate plots and analysis using existing LSTM and FFNN models
To train new models from scratch:
% Run the main script
hybnet_train_main
% When prompted, select "New training" to train models from scratch
% Note: Training may take 10-30 minutes depending on your system
- Source: Synthetic HEK293 cell culture data
- Base model: Robitaille et al. (2015) metabolic model
- Design: 3³ factorial Design of Experiments (DoE)
- Experiments: 9 bioreactor runs (Br1-Br9)
- Duration: 240 hours per run
Layout of the nine bioreactor runs in the design space:
               Br5
                |
                |
       Br1 -----+----- Br2
        |               |
        |               |
Br7 ----+      Br9      +---- Br8
        |               |
        |               |
       Br3 -----+----- Br4
                |
                |
               Br6
- data/data.xlsx: Raw experimental data with feed compositions
- data/data.mat: Processed data in MATLAB format
- Pre-trained models: hybrid_LSTM_1.mat, hybrid_FFNN_1.mat
Each experiment data(i) contains:
- data(i).time: Time points (h)
- data(i).conc: Metabolite concentrations (mM)
- data(i).accum: Accumulated masses in the bioreactor
- data(i).m_r: Reacted amounts (metabolic rates)
- data(i).vol: Reactor volume (L)
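The models predict principal components of the reaction rates rather than every rate individually. The sketch below shows such a reduction with a plain SVD. It assumes the rates are arranged as an observations-by-reactions matrix `R`. Here `R` is a random rank-4 placeholder, not data taken from data(i).m_r. The repository's own preprocessing (for example scaling) may differ.

```matlab
% PCA of a rate matrix via SVD (placeholder data, not the HEK293 dataset).
nObs = 50; nRates = 12; npcs = 4;
Z = randn(nObs, npcs);              % hidden low-dimensional drivers
L = randn(npcs, nRates);            % mixing of the drivers into 12 "rates"
R = Z * L;                          % rank-4 placeholder rate matrix

mu = mean(R, 1);                    % mean of each rate (column)
Rc = R - mu;                        % centered rates (implicit expansion)
[~, S, W] = svd(Rc, 'econ');        % Rc = U*S*W'
P = W(:, 1:npcs);                   % loadings: nRates x npcs
scores = Rc * P;                    % principal component scores: nObs x npcs
Rhat = scores * P' + mu;            % rates reconstructed from npcs components
explained = diag(S).^2 / sum(diag(S).^2);   % variance fraction per component
```

Because the placeholder matrix has rank 4, four components reconstruct it essentially exactly. With real data, `explained` indicates how much rate variability a given `npcs` retains.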
Hybrid LSTM model:
- Input layer: 27 features (concentrations + process variables)
- Hidden layers:
  - Dense layer (27 → 10 neurons, ReLU activation)
  - Dense layer (10 → 10 neurons, ReLU activation)
  - LSTM layer (10 → 4 neurons)
- Output: 4 principal components of reaction rates
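The following plain-MATLAB forward pass makes the layer sizes above concrete. It runs 27 → 10 (ReLU) → 10 (ReLU) → LSTM(4) over a sequence of time points. The weights are random placeholders, and the gates follow the standard LSTM formulation. The repository's `hnetLSTMlayer` may store and order its parameters differently.

```matlab
% Forward pass 27 -> 10 (ReLU) -> 10 (ReLU) -> LSTM(4) over T time steps.
% Random placeholder weights: this illustrates shapes, not the trained model.
nIn = 27; nH = 10; nOut = 4; T = 24;
X = randn(nIn, T);                       % one input column per time point

relu = @(a) max(a, 0);
sigm = @(a) 1 ./ (1 + exp(-a));

W1 = 0.1 * randn(nH, nIn);  b1 = zeros(nH, 1);
W2 = 0.1 * randn(nH, nH);   b2 = zeros(nH, 1);
% Stacked LSTM weights for the input, forget, candidate and output gates
Wx = 0.1 * randn(4 * nOut, nH);
Wh = 0.1 * randn(4 * nOut, nOut);
bL = zeros(4 * nOut, 1);

h = zeros(nOut, 1);                      % hidden state
c = zeros(nOut, 1);                      % cell state (the "memory")
Y = zeros(nOut, T);                      % 4 outputs (PC scores) per time step
for t = 1:T
    a1 = relu(W1 * X(:, t) + b1);        % dense layer 1
    a2 = relu(W2 * a1 + b2);             % dense layer 2
    z  = Wx * a2 + Wh * h + bL;          % all gate pre-activations at once
    ig = sigm(z(1:nOut));                % input gate
    fg = sigm(z(nOut+1:2*nOut));         % forget gate
    gg = tanh(z(2*nOut+1:3*nOut));       % candidate cell update
    og = sigm(z(3*nOut+1:end));          % output gate
    c  = fg .* c + ig .* gg;             % update the cell state
    h  = og .* tanh(c);                  % new hidden state
    Y(:, t) = h;
end
```

The recurrent state `h`, `c` is what lets the LSTM use the history of the run, rather than only the current inputs, when predicting the rates.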
Hybrid FFNN model:
- Input layer: 27 features
- Hidden layers:
  - Dense layer (27 → 10 neurons, ReLU activation)
  - Dense layer (10 → 10 neurons, ReLU activation)
  - Dense layer (10 → 4 neurons, ReLU activation)
- Output: 4 principal components of reaction rates
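The feed-forward variant can be sketched the same way. Because there is no recurrent state, all time points can pass through the 27 → 10 → 10 → 4 ReLU stack with one matrix product per layer. The weights are random placeholders, not the trained model.

```matlab
% Forward pass 27 -> 10 -> 10 -> 4 (ReLU on every layer) for T samples at once.
% Random placeholder weights: this illustrates shapes only.
T = 24;
sizes = [27 10 10 4];                    % layer widths from the list above
X = randn(sizes(1), T);                  % one column per time point
relu = @(a) max(a, 0);

A = X;
for k = 1:numel(sizes) - 1
    W = 0.1 * randn(sizes(k+1), sizes(k));
    b = zeros(sizes(k+1), 1);
    A = relu(W * A + b);                 % b is added to every column
end
Y = A;                                   % 4 x T principal-component outputs
```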
Training configuration:
- Algorithm: ADAM optimizer
- Learning rate: Adaptive (starts at 1e-3, decays to 1e-6)
- Iterations: 5000 (default, configurable)
- Training runs: 5 repetitions with different initializations
- Validation: Cross-validation with held-out experiments
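The configuration above states only that ADAM is used with a learning rate that starts at 1e-3 and decays to 1e-6. The sketch below assumes a log-linear (exponential) decay over the iterations. It uses the standard ADAM update with the common defaults β1 = 0.9 and β2 = 0.999. A toy least-squares loss stands in for the hybrid model loss, and the repository's actual schedule may differ.

```matlab
% ADAM with an exponentially decaying learning rate on a toy problem.
% ASSUMPTIONS: the decay shape and the toy loss; neither comes from the repository.
niter = 5000; lr0 = 1e-3; lrEnd = 1e-6;
beta1 = 0.9; beta2 = 0.999; epsAdam = 1e-8;

A = [2 1; 1 3]; y = [1; 2];                % toy loss: 0.5*||A*w - y||^2
lossFun = @(w) 0.5 * sum((A * w - y).^2);
gradFun = @(w) A' * (A * w - y);

w = [0; 0]; m = zeros(2, 1); v = zeros(2, 1);
loss0 = lossFun(w);
for k = 1:niter
    lr = lr0 * (lrEnd / lr0)^((k - 1) / (niter - 1));   % 1e-3 at k=1, 1e-6 at k=niter
    g  = gradFun(w);
    m  = beta1 * m + (1 - beta1) * g;      % first-moment (mean) estimate
    v  = beta2 * v + (1 - beta2) * g.^2;   % second-moment estimate
    mHat = m / (1 - beta1^k);              % bias corrections
    vHat = v / (1 - beta2^k);
    w  = w - lr * mHat ./ (sqrt(vHat) + epsAdam);
end
lossEnd = lossFun(w);
```

Repeating this loop from different random starting points corresponds to the multiple training repetitions listed above.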
% Load and simulate pre-trained LSTM model
hybnet_train_fun('hybrid_LSTM_1');
% Load and simulate pre-trained FFNN model
hybnet_train_fun('hybrid_FFNN_1');
% Define custom model architecture
layers = {
hnetfflayer(27, 20, 'relu'), % Input layer
hnetfflayer(20, 15, 'relu'), % Hidden layer
hnetLSTMlayer(15, 4) % LSTM output layer
};
% Training parameters
niter = 3000; % Number of iterations
nruns = 10; % Number of training repetitions
npcs = 4; % Principal components
% Define training/validation splits
Indtr = [1 2 3 4]; % Training experiments
Indcr = [7 9]; % Validation experiments
% Run training
hybnet_train_fun('my_custom_model', layers, Indtr, Indcr, {niter, nruns, npcs});
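Before launching a long training run, a quick pre-flight check can catch three setup errors. Consecutive layer widths should chain correctly. The last layer should output `npcs` values, an assumption based on the 4-output / npcs = 4 pairing above. Training and validation experiments should not overlap. The sketch below uses a hypothetical `layerDims` matrix of [in out] pairs that mirrors the `hnetfflayer`/`hnetLSTMlayer` arguments. It does not read the layer objects themselves, whose fields are not documented here.

```matlab
% Hypothetical pre-flight check of a custom training setup.
layerDims = [27 20; 20 15; 15 4];        % [in out] for each layer in `layers`
npcs  = 4;
Indtr = [1 2 3 4];                       % training experiments
Indcr = [7 9];                           % validation experiments

inputOk    = layerDims(1, 1) == 27;                               % 27 input features
chainsOk   = all(layerDims(1:end-1, 2) == layerDims(2:end, 1));   % out(k) == in(k+1)
outputOk   = layerDims(end, 2) == npcs;                           % assumed: outputs == npcs
disjointOk = isempty(intersect(Indtr, Indcr));                    % no shared runs
inRangeOk  = all(ismember([Indtr Indcr], 1:9));                   % runs Br1..Br9
assert(inputOk && chainsOk && outputOk && disjointOk && inRangeOk, ...
       'Invalid training setup');
```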
% Regenerate processed data from raw Excel files
cd data
main_data_processing
cd ..
After running the models, you'll get:
- Concentration profiles: Time-series of all metabolites
- Parity plots: Predicted vs. experimental values
- Training curves: Loss evolution during optimization
- Statistical analysis: R², RMSE, AIC metrics
- structures_fit_results.xlsx: Comprehensive model performance metrics
- Model files: Saved trained models (.mat format)
- Figure files: All generated plots
- RMSE: Root Mean Square Error
- R²: Coefficient of determination
- AIC: Akaike Information Criterion
- Cross-validation scores: Validation set performance
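The following sketch computes these metrics for one predicted-versus-measured series. It uses the common least-squares form of AIC, n·ln(SSE/n) + 2k, where k is the number of fitted parameters. The exact definitions behind structures_fit_results.xlsx may differ, and the numbers below are toy values.

```matlab
% RMSE, R^2 and least-squares AIC for one variable (toy values).
yObs  = [1.0; 2.0; 3.0; 4.0];            % measured values
yPred = [1.1; 1.9; 3.2; 3.8];            % model predictions
k = 2;                                   % number of fitted parameters (illustrative)

res  = yObs - yPred;                     % residuals
n    = numel(yObs);
SSE  = sum(res.^2);                      % sum of squared errors
RMSE = sqrt(SSE / n);
R2   = 1 - SSE / sum((yObs - mean(yObs)).^2);
AIC  = n * log(SSE / n) + 2 * k;         % lower is better; penalizes extra parameters
```

For these values, SSE = 0.1, which gives R² = 0.98 and RMSE ≈ 0.158.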
To add new experimental data:
- Add the data to data/data.xlsx, following the existing format
- Run data/main_data_processing.m to generate data.mat
- Update the experiment indices in hybnet_train_main.m
% Example: Deeper LSTM network
layers = {
hnetfflayer(27, 30, 'relu'),
hnetfflayer(30, 20, 'relu'),
hnetLSTMlayer(20, 10),
hnetfflayer(10, 4, 'linear')
};
% In hybnet_train_main.m, modify:
niter = 10000; % More iterations for complex models
nruns = 20; % More repetitions for robustness
npcs = 6;       % More principal components (keep equal to the output size of the last layer)
Problem: "Undefined function or variable" errors
Solution: Ensure all subdirectories are added to the MATLAB path (run this from the repository root):
addpath(genpath(pwd))
Problem: Out of memory during training
Solution:
- Reduce the batch size in the model parameters
- Use fewer training repetitions (nruns)
- Close other MATLAB sessions and memory-intensive applications
Problem: Poor model convergence
Solution:
- Increase the number of iterations (niter)
- Adjust the learning rate in the hnet parameters
- Try different random initializations
Problem: Missing toolbox functions
Solution: Install the required MATLAB toolboxes or contact your MATLAB administrator.
- Check the troubleshooting section above
- Review the original paper for methodological details
- Examine the data/read_me_data_processing.doc file
- Contact the authors (see the Contact section)
Please cite this work as:
@article{ramos2024deep,
title={Deep hybrid modeling of a HEK293 process: Combining long short-term memory networks with first principles equations},
author={Ramos, João RC and Pinto, José and Poiares-Oliveira, Gil and Peeters, Ludovic and Dumas, Patrick and Oliveira, Rui},
journal={Biotechnology and Bioengineering},
pages={1--15},
year={2024},
publisher={Wiley},
doi={10.1002/bit.28668}
}
Paper DOI: https://doi.org/10.1002/bit.28668
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Corresponding Author:
Rui Oliveira
LAQV-REQUIMTE, Department of Chemistry
NOVA School of Science and Technology
NOVA University Lisbon, Portugal
📧 Email: rmo@fct.unl.pt
Authors:
- João R. C. Ramos¹
- José Pinto¹
- Gil Poiares-Oliveira¹
- Ludovic Peeters²
- Patrick Dumas²
- Rui Oliveira¹
¹ LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, 2829-516 Caparica, Portugal
² GSK, 89 rue de l'Institut, 1330 Rixensart, Belgium
This README was designed to help scientists easily understand and reuse this hybrid modeling framework for bioprocess applications.