ENH: final revision adjustments

EPFL-LCSB · Dec 12, 2023 · 4d1c634 · 4d1c634
1 parent 4bd4c89
commit 4d1c634
Show file tree

Hide file tree

Showing 35 changed files with 772 additions and 285 deletions.
diff --git a/README.md b/README.md
@@ -60,3 +60,4 @@ Run times for each of the scripts is provided in the readme file in the scripts
 3. The parameters for the 13 enhanced kinetic models for the second study, eK_trpD9923, are provided in ./study-2-K_trpD9923/data/enhanced_kinetic_models.hdf5
 4. The final set of 34 unique designs from eK_trpD9923 is in the csv file ./study-2-K_trpD9923/output/data/all_unique_designs_eK_trpD9923.csv
 5. The final set of 13 unique designs from eK_trpD9923_d2 is in the csv file ./study-2-K_trpD9923/output/data/all_unique_designs_eK_trpD9923_d2.csv
+6. The source data for all the figures in the paper is provided in the root directory in source_data.xlsx
diff --git a/docker/.gitignore b/docker/.gitignore
@@ -1,3 +1,6 @@
+# solver binaries
+solvers/*
+
 # Gurobi key info
 gurobi.lic
 

diff --git a/source_data.xlsx b/source_data.xlsx
diff --git a/study-1-K_trpD9923/scripts/README.txt b/study-1-K_trpD9923/scripts/README.txt
@@ -20,7 +20,7 @@ in activity for that enzyme to 1e-3)
 - We then store the solution value that NRA gives us.
 - In this manner we obtain a matrix of 41 x 10 NRA solutions
 - The results of this will be used to get the top 5 designs (by average predicted inc. in anthranilate yield as per NRA)
-- The results will also be used to plot figure 4 in the main text and figure S.1 A in supplementary note III
+- The results will also be used to plot figure 4 in the main text and figure S.7a
 - A run takes around 30 minutes.
 
 ** III_get_fold_changes_top_designs
@@ -33,7 +33,7 @@ in a nonlinear setting across different models and the sensitivity of the design
 ** IV_verification_of_top_designs_in_reactor
 - We apply each of the top 5 designs to each kinetic model using the NRA suggested fold changes for that model and that design.
 - We then simulate the response of the models to each design.
-- This is used to plot supplementary figures S.2A and S.2B.
+- This is used to plot supplementary figures S.8a and S.8b.
 - A run takes around 40 minutes.
 
 ** V_expression_sensitivity_of_top_designs
@@ -42,35 +42,44 @@ in a nonlinear setting across different models and the sensitivity of the design
 - Note that for the fold changes we use the mean of the NRA suggested enzyme fold changes for each enzyme in a design across the 10 kinetic models.
 - We can apply to perturbation to either a single enzyme expression about its mean NRA predicted value while keeping the other 2 
 at the mean NRA proposed values.
-- To plot figure S.2C-F you need to run this script 4 times, for perturbing each enzyme individually and then all 3 together
+- To plot figure S.8c-f you need to run this script 4 times, for perturbing each enzyme individually and then all 3 together
 - You can change which enzyme to perturb within the script.
 - Each run consisting of 10x10x5 ODE simulations takes around 4 hours. 
 
-** S.VI_phenotype_perturbation_analysis
+** S.I_phenotype_perturbation_analysis
 - This is a sensitivity analysis test where we want to study how the NRA predicted anthranilate yield and the actual anthranilate production rate in nonlinear simulations
 varies depending on how close / far we are from the reference phenotype.
 - We vary the allowable fold changes in concentration from 2-fold to 20-fold
 - We also vary the allowable fold changes in enzyme activity from 2-fold to 10-fold
 - For each combination of enzyme activity and concentration fold changes, we generate the top design using NRA and simulate it in a bioreactor
 - You can either run this script once or run it in batches on parallel CPUs
 - For a given combination of concentration fold change and enzyme activity change, it takes around 20-30 minutes. Note the lower time, since we only simulate a single design (the top design).
-- The results of this script are used for plotting figure S.7
+- The results of this script are used for plotting figure S.1
 
-** S.VII_double_mutant
-- Two enzymes - DDPA and GLUDy - appear in all of our top 5 targets (see II_design_robustness and III_get_fold_changes_top_design)
-- We wanted to know (prompted by a reviewer) what happens if we only target these two and how this double mutant performs compared to the triple mutant
-- The results of the reactor simulations in this script as used to plot figure S.8
+** S.II_MCA_design_analysis
+- This code examines the impact of constraining the allowable fold change in enzyme activities on MCA based designs
+- It is used for the study presented in Supplementary note II
+- For 3 different allowable fold changes in enz activity (1.2, 2, and 5) we target the top 3 yield control coefficients for anthranilate with respect to glucose that do not adversely impact growth.
+- The script in total takes less than 20 minutes to run.
+- The results are required to get figures S.2, S.3 and S.4
 
-** S.VIII_i_aroGtktA_analysis_design_generation
-- Prompted by reviews, we wanted to find out why our in-silico version of the experimentally implemented strain aroG^fbrtktA had such poor titers compared to the experimental implementation
+** S.V_i_K_trpD9923_d2_analysis_design_generation
+- Prompted by reviews, we wanted to find out why our in-silico version (K_trpD9923_d2) of the experimentally implemented strain aroG^fbrtktA had such poor titers compared to the experimental implementation
 - We use NRA to propose 4 additional modifications (apart from DDPA, TKT1 and TKT2) to increase anthranilate yield for each model
 - We generated designs within 99% of the maximum predicted increase in anthranilate yield for each model
 - The designs are then used by the next script for pruning
 
-** S.VIII_ii_aroGtktA_analysis_design_collation
+** S.V_ii_K_trpD9923_d2_analysis_design_collation
 - After running the previous script, this script takes in all the generated designs, collates them and finds the most frequently occurring one.
 
-** S.VIII_iii_aroGtktA_analysis_design_verification
+** S.V_iii_K_trpD9923_d2_analysis_design_verification
 - In this script, we verify the top design that occurs 6/10 times across the models.
 - We do this in a reactor setting.
-- The output of this script is used to generate figure S.9 in the supplementary material.
+- The output of this script is used to generate figure S.6 in the supplementary material.
+
+** S.VII_double_mutant
+- Two enzymes - DDPA and GLUDy - appear in all of our top 5 targets (see II_design_robustness and III_get_fold_changes_top_design)
+- We wanted to know (prompted by a reviewer) what happens if we only target these two and how this double mutant performs compared to the triple mutant
+- The results of the reactor simulations in this script as used to plot figure S.9
+
+
diff --git a/study-1-K_trpD9923/scripts/S.II_MCA_design_analysis.py b/study-1-K_trpD9923/scripts/S.II_MCA_design_analysis.py
@@ -0,0 +1,219 @@
+"""
+This code is used to run the experiments presented in Supplementary Note II - MCA design.
+For each of the 10 models, we do the following
+1. We identify the yield control coefficients (YCCs) for anthranilate wrt glucose
+2. We then prune the YCCs for those enzymes that have the same sign of the YCC as they do for their
+biomass control coefficient. This ensures that perturbing this enzyme does not negatively impact
+growth
+3. We then choose the top pruned YCCs and perturb them in bioreactor simulations with 1.2 fold,
+2 fold, and 5 fold changes in enzyme activity
+"""
+import numpy as np
+import pandas as pd
+
+from pytfa.io.json import load_json_model
+from pytfa.optim.constraints import *
+
+from skimpy.io.yaml import load_yaml_model
+from skimpy.io.regulation import load_enzyme_regulation
+from skimpy.analysis.oracle.load_pytfa_solution import load_fluxes, load_concentrations, load_equilibrium_constants
+from skimpy.core.parameters import load_parameter_population
+from skimpy.core.solution import ODESolutionPopulation
+from skimpy.utils.tabdict import TabDict
+from skimpy.utils.namespace import NET, QSSA
+from skimpy.simulations.reactor import make_batch_reactor
+
+from NRAplus.core.model import NRAmodel
+from NRAplus.analysis.design import enumerate_solutions
+
+import os
+import glob
+import time as T
+import matplotlib.pyplot as plt
+
+
+SIMULATE_IN_REACTOR = True
+
+# Design parameters
+MAX_FOLD_ENZ_ACTIVITY = [1.2, 2, 5]
+N_ENZ = 3
+
+# Cellular parameters
+CONCENTRATION_SCALING = 1e9  # 1 mol to 1 mmol
+TIME_SCALING = 1  # 1hr
+DENSITY = 1105  # g/L
+GDW_GWW_RATIO = 0.3  # Assumes 70% Water
+flux_scaling_factor = 1e-3 * (GDW_GWW_RATIO * DENSITY) * CONCENTRATION_SCALING / TIME_SCALING
+concentrations_to_plot = { 'anth_e': 136.13 / CONCENTRATION_SCALING,
+                          'biomass_strain_1': 0.28e-12 / 0.05}
+
+# Ode simulation parameters
+TOTAL_TIME = 65
+N_STEPS = 1000
+
+# Paths
+path_to_tmodel = './../models/tfa_model.json'
+path_to_kmodel = './../models/kinetic_model_scaffold.yaml'
+path_to_tfa_samples = './../data/tfa_samples.csv'
+path_to_params = './../data/kinetic_params_top_10_models.hdf5'
+path_to_regulation_data = './../data/allosteric_regulations.csv'
+
+
+# Load models
+tmodel = load_json_model(path_to_tmodel)
+kmodel_draft = load_yaml_model(path_to_kmodel)
+
+# Add regulation data to kinetic model
+df = pd.read_csv(path_to_regulation_data)
+df_regulations_all = df[df['reaction_id'].isin(list(kmodel_draft.reactions.keys()))]
+df_regulations_all = df_regulations_all[df_regulations_all['regulator'].isin(list(kmodel_draft.reactants.keys()))]
+kmodel = load_enzyme_regulation(kmodel_draft, df_regulations_all)
+
+# Get the correct parameter sets
+parameter_population = load_parameter_population(path_to_params)
+kinetic_models = list(parameter_population._index.keys())
+
+# Get the list of parameters for which we want control coefficients
+transport_reactions = ['vmax_forward_' + r.id for r in tmodel.reactions
+                       if (len(r.compartments) > 1)
+                       and not ('í' in r.compartments)
+                       and not r.id.startswith('LMPD_')
+                       and r.id in kmodel.reactions]
+transport_reactions.append('vmax_forward_ATPM')
+parameter_list = TabDict([(k, p.symbol) for k, p in kmodel.parameters.items()
+                          if p.name.startswith('vmax_forward')
+                          and str(p.symbol) not in transport_reactions])
+
+# Prepare the kinetic models for MCA
+kmodel.prepare()
+kmodel.compile_mca(mca_type=NET, sim_type=QSSA, parameter_list=parameter_list, ncpu=4)
+
+# Reactor initialization
+reactor = make_batch_reactor('single_species.yaml', df_regulation= df_regulations_all)
+reactor.compile_ode(add_dilution=False)
+reactor_volume = reactor.models.strain_1.parameters.strain_1_volume_e.value
+
+# Loop through all kinetic models, get NRA designs and test everything in a bioreactor setup
+for this_model in kinetic_models:
+
+    # Get TFA and kinetic model indices
+    list_ix = this_model.split(',')
+    tfa_ix = int(list_ix[0])
+    kin_ix = int(list_ix[1])
+
+    folder_for_output = './../output/mca-study/{}'.format(this_model)
+    if not os.path.exists(folder_for_output):
+        os.makedirs(folder_for_output)
+
+    # Load tfa sample and kinetic parameters into kinetic model
+    tfa_sample = pd.read_csv(path_to_tfa_samples, header=0, index_col=0).iloc[tfa_ix]
+    parameter_set = parameter_population['{}'.format(this_model)]
+    kmodel.parameters = parameter_set
+
+    # Load steady-state fluxes and concentrations into kmodel
+    fluxes = load_fluxes(tfa_sample, tmodel, kmodel,
+                         density=DENSITY,
+                         ratio_gdw_gww=GDW_GWW_RATIO,
+                         concentration_scaling=CONCENTRATION_SCALING,
+                         time_scaling=TIME_SCALING)
+    concentrations = load_concentrations(tfa_sample, tmodel, kmodel,
+                                         concentration_scaling=CONCENTRATION_SCALING)
+
+    # Fetch equilibrium constants
+    load_equilibrium_constants(tfa_sample, tmodel, kmodel,
+                               concentration_scaling=CONCENTRATION_SCALING,
+                               in_place=True)
+
+    """
+    GET TOP CCs
+    """
+    # Get yield control coefficients
+    flux_control_coeff = kmodel.flux_control_fun(fluxes, concentrations, [parameter_set])
+    concentration_control_coeff = kmodel.concentration_control_fun(fluxes, concentrations, [parameter_set])
+    ccc = concentration_control_coeff.mean('sample')
+    fcc = flux_control_coeff.mean('sample')
+    yield_cc = fcc.loc['ANS'] - fcc.loc['GLCtex']
+
+    # Only those enzymes which don't kill biomass can be targets
+    biomass_cc = fcc.loc['LMPD_biomass_c_1_420']
+
+    # Modified yield control coefficients - YCCs that have the same sign for biomass and anthranilate yield
+    enz_to_keep = []
+    for this_enz in yield_cc.index:
+        sign_ycc = np.sign(yield_cc.loc[this_enz])
+        sign_gcc = np.sign(biomass_cc.loc[this_enz])
+
+        if sign_ycc == sign_gcc:
+            enz_to_keep.append(this_enz) # If they have the same directional impact, keep it
+
+    yield_cc = yield_cc.loc[enz_to_keep]
+    highest_ycc = yield_cc.abs().sort_values().index[-N_ENZ:]
+
+    # get combined ccs
+    df_combined = pd.concat([yield_cc, biomass_cc.loc[enz_to_keep]], axis=1)
+    df_combined.loc[df_combined.abs().sort_values(by=0, ascending=False).index].to_csv(folder_for_output + '/combined_ccs.csv')
+
+    ''' 
+    Reactor simulations 
+    '''
+    def reset_reactor():
+        reactor.parametrize(parameter_set, 'strain_1')
+        reactor.initialize(concentrations, 'strain_1')
+        reactor.initial_conditions['biomass_strain_1'] = 0.037 * 0.05 / 0.28e-12
+
+        for met_ in reactor.medium.keys():
+            LC_id = 'LC_' + met_
+            LC_id = LC_id.replace('_L', '-L')
+            LC_id = LC_id.replace('_D', '-D')
+            reactor.initial_conditions[met_] = np.exp(tfa_sample.loc[LC_id]) * 1e9
+
+        # Not nice but necessary
+        reactor.models.strain_1.parameters.strain_1_volume_e.value = reactor_volume
+        reactor.models.strain_1.parameters.strain_1_cell_volume_e.value = 1.0  # 1.0 #(mum**3) look up cell volume bionumbers
+        reactor.models.strain_1.parameters.strain_1_cell_volume_c.value = 1.0  # 1.0 #(mum**3)
+        reactor.models.strain_1.parameters.strain_1_cell_volume_p.value = 1.0  # 1.0 #(mum**3)
+        reactor.models.strain_1.parameters.strain_1_volume_c.value = 0.9 * 1.0  # (mum**3)
+        reactor.models.strain_1.parameters.strain_1_volume_p.value = 0.1 * 1.0  # (mum**3)
+
+    ''' 1. Wildtype '''
+    reset_reactor()
+    start = T.time()
+    sol_ode_wildtype = reactor.solve_ode(np.linspace(0, TOTAL_TIME, N_STEPS),
+                                         solver_type='cvode',
+                                         rtol=1e-9,
+                                         atol=1e-9,
+                                         max_steps=1e9,
+                                         )
+    end = T.time()
+    print("WT: {}".format(end-start))
+
+    ''' 2. MCA based designs '''
+    sols_ode_fcc = []
+    for this_enz_activity in MAX_FOLD_ENZ_ACTIVITY:
+        reset_reactor()
+
+        # Manipulate the top fccs proportionally (proportional to their magnitude)
+        for this_ycc in highest_ycc:
+            reactor.parameters['strain_1_'+this_ycc].value *= \
+                this_enz_activity**(np.sign(yield_cc.loc[this_ycc]))
+
+        start = T.time()
+        sol_ode_fcc = reactor.solve_ode(np.linspace(0, TOTAL_TIME, N_STEPS),
+                                        solver_type='cvode',
+                                rtol=1e-9,
+                                atol=1e-9,
+                                max_steps=1e9,
+                                )
+        end = T.time()
+        print("MCA: {}".format(end-start))
+        sols_ode_fcc.append(sol_ode_fcc)
+
+    sol_odes_all = [sol_ode_wildtype]
+    sol_odes_all.extend(sols_ode_fcc)
+    solpop = ODESolutionPopulation(sol_odes_all, np.arange(0, len(sol_odes_all)))
+    solpop.data.to_csv(folder_for_output + '/ode_solutions.csv')
+    del solpop
+
+    # Print outputs to file
+    yield_cc.loc[highest_ycc].to_csv(folder_for_output + '/top_yccs.csv')
+    biomass_cc.loc[highest_ycc].to_csv(folder_for_output + '/top_yccs_growth.csv')
diff --git a/....VI_phenotype_perturbation_sensitivity.py → ...S.I_phenotype_perturbation_sensitivity.py b/....VI_phenotype_perturbation_sensitivity.py → ...S.I_phenotype_perturbation_sensitivity.py
@@ -6,7 +6,7 @@
 -- enumerate top designs within 99%
 -- implement designs in reactor
 
-Results are used to plot figure S.9 (./plot/figure_S.7.py)
+Results are used to plot figure S.1 (./plot/figure_S.1.py)
 """
 import numpy as np
 import pandas as pd
@@ -118,7 +118,7 @@
             tfa_ix = int(list_ix[0])
 
             # Create folder for output
-            output_folder = './../output/data/S.VI-phenotype-perturbation-sensitivity/{}-fold-conc-change-{}-fold-enz-activity/{}/'.format(this_fold_conc_change,
+            output_folder = './../output/data/S.I-phenotype-perturbation-sensitivity/{}-fold-conc-change-{}-fold-enz-activity/{}/'.format(this_fold_conc_change,
                                                                                                    this_fold_enz_activity,
                                                                                                    this_model
                                                                                                    )

diff --git a/study-1-K_trpD9923/scripts/S.VII_double_mutant.py b/study-1-K_trpD9923/scripts/S.VII_double_mutant.py
@@ -1,6 +1,6 @@
 """
 We found that DDPA and GLUDy appeared in each of our top 5 designs. Hence, we decided to test the performance of just
-this double mutant as suggested by one of the reviewers. The data is used to generate figure S.10 in supplementary note
+this double mutant as suggested by one of the reviewers. The data is used to generate figure S.9 in supplementary note
 S.VII
 """
 import numpy as np
@@ -172,9 +172,7 @@ def reset_reactor():
                           'biomass_strain_1': 0.28e-12 / 0.05}
 
 
-# Plotting - NOTE this is now done separately by the script in the plot folder (figure_S.10)
-import matplotlib.pyplot as plt
-
+# Collating data
 df_sols_design = []
 df_sols_wt = []
 for ix in range(10):
@@ -187,35 +185,6 @@ def reset_reactor():
 
 df_sols_wt = pd.concat(df_sols_wt)
 df_sols_design = pd.concat(df_sols_design)
-#
-# df_wt_mean = df_sols_wt.groupby('time').quantile(0.5)
-# df_wt_upper = df_sols_wt.groupby('time').quantile(0.75)
-# df_wt_lower = df_sols_wt.groupby('time').quantile(0.25)
-#
-# df_designs_mean = df_sols_design.groupby('time').quantile(0.5)
-# df_designs_upper = df_sols_design.groupby('time').quantile(0.75)
-# df_designs_lower = df_sols_design.groupby('time').quantile(0.25)
-
-# for conc, scaling in concentrations_to_plot.items():
-#     time = df_wt_mean.index
-#     # Plot wildtype
-#     plt.plot(time, df_wt_mean[conc] * scaling, color='orange', label='wt')
-#     plt.fill_between(time, df_wt_lower[conc] * scaling, df_wt_upper[conc] * scaling, facecolor='orange',
-#                      interpolate=True,
-#                      alpha=0.1)
-#
-#     # Plot design alternatives
-#     plt.plot(time, df_designs_mean[conc] * scaling, color='blue', label='NOMAD - DDPA + GLUDy')
-#     plt.fill_between(time, df_designs_lower[conc] * scaling, df_designs_upper[conc] * scaling,
-#                      facecolor='blue',
-#                      interpolate=True,
-#                      alpha=0.1)
-#
-#     plt.legend()
-#     plt.xlabel('time (h)')
-#     plt.ylabel(conc.replace('strain_1', '') + ' (g/L)')
-#     plt.savefig(folder_for_figures + '/ddpa_gludy_only_{}.png'.format(conc))
-#     plt.close()
 
 df_sols_wt.to_csv(folder_for_output + '/ode_sols_wt.csv')
 df_sols_design.to_csv(folder_for_output + '/ode_sols_ddpa_gludy.csv')
diff --git a/..._i_aroGtktA_analysis_design_generation.py → ...trpD9923_d2_analysis_design_generation.py b/..._i_aroGtktA_analysis_design_generation.py → ...trpD9923_d2_analysis_design_generation.py
@@ -117,7 +117,7 @@
     tfa_ix = int(list_ix[0])
 
     # Create folder for output
-    output_folder = './../output/data/S.VIII-aroGtktA-analysis/design-generation/{}/'.format(this_model)
+    output_folder = './../output/data/S.V-aroGtktA-analysis/design-generation/{}/'.format(this_model)
     if not os.path.exists(output_folder):
         os.makedirs(output_folder)
 

diff --git a/..._ii_aroGtktA_analysis_design_collation.py → ..._trpD9923_d2_analysis_design_collation.py b/..._ii_aroGtktA_analysis_design_collation.py → ..._trpD9923_d2_analysis_design_collation.py
@@ -10,7 +10,7 @@
 
 
 # Paths
-path_to_data = './../output/data/S.VIII-aroGtktA-analysis/design-generation/{}'
+path_to_data = '../output/data/S.V-aroGtktA-analysis/design-generation/{}'
 kinetic_models = ['1712,6', '1715,6', '1717,6', '2392,6', '2482,8', '3927,1', '4230,7', '4241,5', '4456,7', '4468,6']
 
 # Collate all the designs into one dataframe with all the designs and the corresponding enz activity changes and NRA solution