Skip to content

Commit

Permalink
ENH: final revision adjustments
Browse files Browse the repository at this point in the history
  • Loading branch information
Miskovic committed Dec 12, 2023
1 parent 4bd4c89 commit 4d1c634
Show file tree
Hide file tree
Showing 35 changed files with 772 additions and 285 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,3 +60,4 @@ Run times for each of the scripts is provided in the readme file in the scripts
3. The parameters for the 13 enhanced kinetic models for the second study, eK_trpD9923, are provided in ./study-2-K_trpD9923/data/enhanced_kinetic_models.hdf5
4. The final set of 34 unique designs from eK_trpD9923 is in the csv file ./study-2-K_trpD9923/output/data/all_unique_designs_eK_trpD9923.csv
5. The final set of 13 unique designs from eK_trpD9923_d2 is in the csv file ./study-2-K_trpD9923/output/data/all_unique_designs_eK_trpD9923_d2.csv
6. The source data for all the figures in the paper is provided in the root directory in source_data.xlsx
3 changes: 3 additions & 0 deletions docker/.gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
# solver binaries
solvers/*

# Gurobi key info
gurobi.lic

Expand Down
Binary file added source_data.xlsx
Binary file not shown.
37 changes: 23 additions & 14 deletions study-1-K_trpD9923/scripts/README.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ in activity for that enzyme to 1e-3)
- We then store the solution value that NRA gives us.
- In this manner we obtain a matrix of 41 x 10 NRA solutions
- The results of this will be used to get the top 5 designs (by average predicted inc. in anthranilate yield as per NRA)
- The results will also be used to plot figure 4 in the main text and figure S.1 A in supplementary note III
- The results will also be used to plot figure 4 in the main text and figure S.7a
- A run takes around 30 minutes.

** III_get_fold_changes_top_designs
Expand All @@ -33,7 +33,7 @@ in a nonlinear setting across different models and the sensitivity of the design
** IV_verification_of_top_designs_in_reactor
- We apply each of the top 5 designs to each kinetic model using the NRA suggested fold changes for that model and that design.
- We then simulate the response of the models to each design.
- This is used to plot supplementary figures S.2A and S.2B.
- This is used to plot supplementary figures S.8a and S.8b.
- A run takes around 40 minutes.

** V_expression_sensitivity_of_top_designs
Expand All @@ -42,35 +42,44 @@ in a nonlinear setting across different models and the sensitivity of the design
- Note that for the fold changes we use the mean of the NRA suggested enzyme fold changes for each enzyme in a design across the 10 kinetic models.
- We can apply to perturbation to either a single enzyme expression about its mean NRA predicted value while keeping the other 2
at the mean NRA proposed values.
- To plot figure S.2C-F you need to run this script 4 times, for perturbing each enzyme individually and then all 3 together
- To plot figure S.8c-f you need to run this script 4 times, for perturbing each enzyme individually and then all 3 together
- You can change which enzyme to perturb within the script.
- Each run consisting of 10x10x5 ODE simulations takes around 4 hours.

** S.VI_phenotype_perturbation_analysis
** S.I_phenotype_perturbation_analysis
- This is a sensitivity analysis test where we want to study how the NRA predicted anthranilate yield and the actual anthranilate production rate in nonlinear simulations
varies depending on how close / far we are from the reference phenotype.
- We vary the allowable fold changes in concentration from 2-fold to 20-fold
- We also vary the allowable fold changes in enzyme activity from 2-fold to 10-fold
- For each combination of enzyme activity and concentration fold changes, we generate the top design using NRA and simulate it in a bioreactor
- You can either run this script once or run it in batches on parallel CPUs
- For a given combination of concentration fold change and enzyme activity change, it takes around 20-30 minutes. Note the lower time, since we only simulate a single design (the top design).
- The results of this script are used for plotting figure S.7
- The results of this script are used for plotting figure S.1

** S.VII_double_mutant
- Two enzymes - DDPA and GLUDy - appear in all of our top 5 targets (see II_design_robustness and III_get_fold_changes_top_design)
- We wanted to know (prompted by a reviewer) what happens if we only target these two and how this double mutant performs compared to the triple mutant
- The results of the reactor simulations in this script as used to plot figure S.8
** S.II_MCA_design_analysis
- This code examines the impact of constraining the allowable fold change in enzyme activities on MCA based designs
- It is used for the study presented in Supplementary note II
- For 3 different allowable fold changes in enz activity (1.2, 2, and 5) we target the top 3 yield control coefficients for anthranilate with respect to glucose that do not adversely impact growth.
- The script in total takes less than 20 minutes to run.
- The results are required to get figures S.2, S.3 and S.4

** S.VIII_i_aroGtktA_analysis_design_generation
- Prompted by reviews, we wanted to find out why our in-silico version of the experimentally implemented strain aroG^fbrtktA had such poor titers compared to the experimental implementation
** S.V_i_K_trpD9923_d2_analysis_design_generation
- Prompted by reviews, we wanted to find out why our in-silico version (K_trpD9923_d2) of the experimentally implemented strain aroG^fbrtktA had such poor titers compared to the experimental implementation
- We use NRA to propose 4 additional modifications (apart from DDPA, TKT1 and TKT2) to increase anthranilate yield for each model
- We generated designs within 99% of the maximum predicted increase in anthranilate yield for each model
- The designs are then used by the next script for pruning

** S.VIII_ii_aroGtktA_analysis_design_collation
** S.V_ii_K_trpD9923_d2_analysis_design_collation
- After running the previous script, this script takes in all the generated designs, collates them and finds the most frequently occurring one.

** S.VIII_iii_aroGtktA_analysis_design_verification
** S.V_iii_K_trpD9923_d2_analysis_design_verification
- In this script, we verify the top design that occurs 6/10 times across the models.
- We do this in a reactor setting.
- The output of this script is used to generate figure S.9 in the supplementary material.
- The output of this script is used to generate figure S.6 in the supplementary material.

** S.VII_double_mutant
- Two enzymes - DDPA and GLUDy - appear in all of our top 5 targets (see II_design_robustness and III_get_fold_changes_top_design)
- We wanted to know (prompted by a reviewer) what happens if we only target these two and how this double mutant performs compared to the triple mutant
- The results of the reactor simulations in this script as used to plot figure S.9


219 changes: 219 additions & 0 deletions study-1-K_trpD9923/scripts/S.II_MCA_design_analysis.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,219 @@
"""
This code is used to run the experiments presented in Supplementary Note II - MCA design.
For each of the 10 models, we do the following
1. We identify the yield control coefficients (YCCs) for anthranilate wrt glucose
2. We then prune the YCCs for those enzymes that have the same sign of the YCC as they do for their
biomass control coefficient. This ensures that perturbing this enzyme does not negatively impact
growth
3. We then choose the top pruned YCCs and perturb them in bioreactor simulations with 1.2 fold,
2 fold, and 5 fold changes in enzyme activity
"""
import numpy as np
import pandas as pd

from pytfa.io.json import load_json_model
from pytfa.optim.constraints import *

from skimpy.io.yaml import load_yaml_model
from skimpy.io.regulation import load_enzyme_regulation
from skimpy.analysis.oracle.load_pytfa_solution import load_fluxes, load_concentrations, load_equilibrium_constants
from skimpy.core.parameters import load_parameter_population
from skimpy.core.solution import ODESolutionPopulation
from skimpy.utils.tabdict import TabDict
from skimpy.utils.namespace import NET, QSSA
from skimpy.simulations.reactor import make_batch_reactor

from NRAplus.core.model import NRAmodel
from NRAplus.analysis.design import enumerate_solutions

import os
import glob
import time as T
import matplotlib.pyplot as plt


SIMULATE_IN_REACTOR = True

# Design parameters
MAX_FOLD_ENZ_ACTIVITY = [1.2, 2, 5]
N_ENZ = 3

# Cellular parameters
CONCENTRATION_SCALING = 1e9 # 1 mol to 1 mmol
TIME_SCALING = 1 # 1hr
DENSITY = 1105 # g/L
GDW_GWW_RATIO = 0.3 # Assumes 70% Water
flux_scaling_factor = 1e-3 * (GDW_GWW_RATIO * DENSITY) * CONCENTRATION_SCALING / TIME_SCALING
concentrations_to_plot = { 'anth_e': 136.13 / CONCENTRATION_SCALING,
'biomass_strain_1': 0.28e-12 / 0.05}

# Ode simulation parameters
TOTAL_TIME = 65
N_STEPS = 1000

# Paths
path_to_tmodel = './../models/tfa_model.json'
path_to_kmodel = './../models/kinetic_model_scaffold.yaml'
path_to_tfa_samples = './../data/tfa_samples.csv'
path_to_params = './../data/kinetic_params_top_10_models.hdf5'
path_to_regulation_data = './../data/allosteric_regulations.csv'


# Load models
tmodel = load_json_model(path_to_tmodel)
kmodel_draft = load_yaml_model(path_to_kmodel)

# Add regulation data to kinetic model
df = pd.read_csv(path_to_regulation_data)
df_regulations_all = df[df['reaction_id'].isin(list(kmodel_draft.reactions.keys()))]
df_regulations_all = df_regulations_all[df_regulations_all['regulator'].isin(list(kmodel_draft.reactants.keys()))]
kmodel = load_enzyme_regulation(kmodel_draft, df_regulations_all)

# Get the correct parameter sets
parameter_population = load_parameter_population(path_to_params)
kinetic_models = list(parameter_population._index.keys())

# Get the list of parameters for which we want control coefficients
transport_reactions = ['vmax_forward_' + r.id for r in tmodel.reactions
if (len(r.compartments) > 1)
and not ('í' in r.compartments)
and not r.id.startswith('LMPD_')
and r.id in kmodel.reactions]
transport_reactions.append('vmax_forward_ATPM')
parameter_list = TabDict([(k, p.symbol) for k, p in kmodel.parameters.items()
if p.name.startswith('vmax_forward')
and str(p.symbol) not in transport_reactions])

# Prepare the kinetic models for MCA
kmodel.prepare()
kmodel.compile_mca(mca_type=NET, sim_type=QSSA, parameter_list=parameter_list, ncpu=4)

# Reactor initialization
reactor = make_batch_reactor('single_species.yaml', df_regulation= df_regulations_all)
reactor.compile_ode(add_dilution=False)
reactor_volume = reactor.models.strain_1.parameters.strain_1_volume_e.value

# Loop through all kinetic models, get NRA designs and test everything in a bioreactor setup
for this_model in kinetic_models:

# Get TFA and kinetic model indices
list_ix = this_model.split(',')
tfa_ix = int(list_ix[0])
kin_ix = int(list_ix[1])

folder_for_output = './../output/mca-study/{}'.format(this_model)
if not os.path.exists(folder_for_output):
os.makedirs(folder_for_output)

# Load tfa sample and kinetic parameters into kinetic model
tfa_sample = pd.read_csv(path_to_tfa_samples, header=0, index_col=0).iloc[tfa_ix]
parameter_set = parameter_population['{}'.format(this_model)]
kmodel.parameters = parameter_set

# Load steady-state fluxes and concentrations into kmodel
fluxes = load_fluxes(tfa_sample, tmodel, kmodel,
density=DENSITY,
ratio_gdw_gww=GDW_GWW_RATIO,
concentration_scaling=CONCENTRATION_SCALING,
time_scaling=TIME_SCALING)
concentrations = load_concentrations(tfa_sample, tmodel, kmodel,
concentration_scaling=CONCENTRATION_SCALING)

# Fetch equilibrium constants
load_equilibrium_constants(tfa_sample, tmodel, kmodel,
concentration_scaling=CONCENTRATION_SCALING,
in_place=True)

"""
GET TOP CCs
"""
# Get yield control coefficients
flux_control_coeff = kmodel.flux_control_fun(fluxes, concentrations, [parameter_set])
concentration_control_coeff = kmodel.concentration_control_fun(fluxes, concentrations, [parameter_set])
ccc = concentration_control_coeff.mean('sample')
fcc = flux_control_coeff.mean('sample')
yield_cc = fcc.loc['ANS'] - fcc.loc['GLCtex']

# Only those enzymes which don't kill biomass can be targets
biomass_cc = fcc.loc['LMPD_biomass_c_1_420']

# Modified yield control coefficients - YCCs that have the same sign for biomass and anthranilate yield
enz_to_keep = []
for this_enz in yield_cc.index:
sign_ycc = np.sign(yield_cc.loc[this_enz])
sign_gcc = np.sign(biomass_cc.loc[this_enz])

if sign_ycc == sign_gcc:
enz_to_keep.append(this_enz) # If they have the same directional impact, keep it

yield_cc = yield_cc.loc[enz_to_keep]
highest_ycc = yield_cc.abs().sort_values().index[-N_ENZ:]

# get combined ccs
df_combined = pd.concat([yield_cc, biomass_cc.loc[enz_to_keep]], axis=1)
df_combined.loc[df_combined.abs().sort_values(by=0, ascending=False).index].to_csv(folder_for_output + '/combined_ccs.csv')

'''
Reactor simulations
'''
def reset_reactor():
reactor.parametrize(parameter_set, 'strain_1')
reactor.initialize(concentrations, 'strain_1')
reactor.initial_conditions['biomass_strain_1'] = 0.037 * 0.05 / 0.28e-12

for met_ in reactor.medium.keys():
LC_id = 'LC_' + met_
LC_id = LC_id.replace('_L', '-L')
LC_id = LC_id.replace('_D', '-D')
reactor.initial_conditions[met_] = np.exp(tfa_sample.loc[LC_id]) * 1e9

# Not nice but necessary
reactor.models.strain_1.parameters.strain_1_volume_e.value = reactor_volume
reactor.models.strain_1.parameters.strain_1_cell_volume_e.value = 1.0 # 1.0 #(mum**3) look up cell volume bionumbers
reactor.models.strain_1.parameters.strain_1_cell_volume_c.value = 1.0 # 1.0 #(mum**3)
reactor.models.strain_1.parameters.strain_1_cell_volume_p.value = 1.0 # 1.0 #(mum**3)
reactor.models.strain_1.parameters.strain_1_volume_c.value = 0.9 * 1.0 # (mum**3)
reactor.models.strain_1.parameters.strain_1_volume_p.value = 0.1 * 1.0 # (mum**3)

''' 1. Wildtype '''
reset_reactor()
start = T.time()
sol_ode_wildtype = reactor.solve_ode(np.linspace(0, TOTAL_TIME, N_STEPS),
solver_type='cvode',
rtol=1e-9,
atol=1e-9,
max_steps=1e9,
)
end = T.time()
print("WT: {}".format(end-start))

''' 2. MCA based designs '''
sols_ode_fcc = []
for this_enz_activity in MAX_FOLD_ENZ_ACTIVITY:
reset_reactor()

# Manipulate the top fccs proportionally (proportional to their magnitude)
for this_ycc in highest_ycc:
reactor.parameters['strain_1_'+this_ycc].value *= \
this_enz_activity**(np.sign(yield_cc.loc[this_ycc]))

start = T.time()
sol_ode_fcc = reactor.solve_ode(np.linspace(0, TOTAL_TIME, N_STEPS),
solver_type='cvode',
rtol=1e-9,
atol=1e-9,
max_steps=1e9,
)
end = T.time()
print("MCA: {}".format(end-start))
sols_ode_fcc.append(sol_ode_fcc)

sol_odes_all = [sol_ode_wildtype]
sol_odes_all.extend(sols_ode_fcc)
solpop = ODESolutionPopulation(sol_odes_all, np.arange(0, len(sol_odes_all)))
solpop.data.to_csv(folder_for_output + '/ode_solutions.csv')
del solpop

# Print outputs to file
yield_cc.loc[highest_ycc].to_csv(folder_for_output + '/top_yccs.csv')
biomass_cc.loc[highest_ycc].to_csv(folder_for_output + '/top_yccs_growth.csv')
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
-- enumerate top designs within 99%
-- implement designs in reactor
Results are used to plot figure S.9 (./plot/figure_S.7.py)
Results are used to plot figure S.1 (./plot/figure_S.1.py)
"""
import numpy as np
import pandas as pd
Expand Down Expand Up @@ -118,7 +118,7 @@
tfa_ix = int(list_ix[0])

# Create folder for output
output_folder = './../output/data/S.VI-phenotype-perturbation-sensitivity/{}-fold-conc-change-{}-fold-enz-activity/{}/'.format(this_fold_conc_change,
output_folder = './../output/data/S.I-phenotype-perturbation-sensitivity/{}-fold-conc-change-{}-fold-enz-activity/{}/'.format(this_fold_conc_change,
this_fold_enz_activity,
this_model
)
Expand Down
35 changes: 2 additions & 33 deletions study-1-K_trpD9923/scripts/S.VII_double_mutant.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
"""
We found that DDPA and GLUDy appeared in each of our top 5 designs. Hence, we decided to test the performance of just
this double mutant as suggested by one of the reviewers. The data is used to generate figure S.10 in supplementary note
this double mutant as suggested by one of the reviewers. The data is used to generate figure S.9 in supplementary note
S.VII
"""
import numpy as np
Expand Down Expand Up @@ -172,9 +172,7 @@ def reset_reactor():
'biomass_strain_1': 0.28e-12 / 0.05}


# Plotting - NOTE this is now done separately by the script in the plot folder (figure_S.10)
import matplotlib.pyplot as plt

# Collating data
df_sols_design = []
df_sols_wt = []
for ix in range(10):
Expand All @@ -187,35 +185,6 @@ def reset_reactor():

df_sols_wt = pd.concat(df_sols_wt)
df_sols_design = pd.concat(df_sols_design)
#
# df_wt_mean = df_sols_wt.groupby('time').quantile(0.5)
# df_wt_upper = df_sols_wt.groupby('time').quantile(0.75)
# df_wt_lower = df_sols_wt.groupby('time').quantile(0.25)
#
# df_designs_mean = df_sols_design.groupby('time').quantile(0.5)
# df_designs_upper = df_sols_design.groupby('time').quantile(0.75)
# df_designs_lower = df_sols_design.groupby('time').quantile(0.25)

# for conc, scaling in concentrations_to_plot.items():
# time = df_wt_mean.index
# # Plot wildtype
# plt.plot(time, df_wt_mean[conc] * scaling, color='orange', label='wt')
# plt.fill_between(time, df_wt_lower[conc] * scaling, df_wt_upper[conc] * scaling, facecolor='orange',
# interpolate=True,
# alpha=0.1)
#
# # Plot design alternatives
# plt.plot(time, df_designs_mean[conc] * scaling, color='blue', label='NOMAD - DDPA + GLUDy')
# plt.fill_between(time, df_designs_lower[conc] * scaling, df_designs_upper[conc] * scaling,
# facecolor='blue',
# interpolate=True,
# alpha=0.1)
#
# plt.legend()
# plt.xlabel('time (h)')
# plt.ylabel(conc.replace('strain_1', '') + ' (g/L)')
# plt.savefig(folder_for_figures + '/ddpa_gludy_only_{}.png'.format(conc))
# plt.close()

df_sols_wt.to_csv(folder_for_output + '/ode_sols_wt.csv')
df_sols_design.to_csv(folder_for_output + '/ode_sols_ddpa_gludy.csv')
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@
tfa_ix = int(list_ix[0])

# Create folder for output
output_folder = './../output/data/S.VIII-aroGtktA-analysis/design-generation/{}/'.format(this_model)
output_folder = './../output/data/S.V-aroGtktA-analysis/design-generation/{}/'.format(this_model)
if not os.path.exists(output_folder):
os.makedirs(output_folder)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@


# Paths
path_to_data = './../output/data/S.VIII-aroGtktA-analysis/design-generation/{}'
path_to_data = '../output/data/S.V-aroGtktA-analysis/design-generation/{}'
kinetic_models = ['1712,6', '1715,6', '1717,6', '2392,6', '2482,8', '3927,1', '4230,7', '4241,5', '4456,7', '4468,6']

# Collate all the designs into one dataframe with all the designs and the corresponding enz activity changes and NRA solution
Expand Down
Loading

0 comments on commit 4d1c634

Please sign in to comment.