Skip to content

james-cole/PyBrainAge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyBrainAge

PyBrainAge development

Here we introduce PyBrainAge (beta version) which is a Brain-age model designed to estimate age (brain-age) based on structural T1-weighted MRI brain scans. Below, you will find additional information about the model and instructions on running it with your preprocessed T1-weighted MRI data using FreeSurfer. BETA version: Your feedback/results using our model are most welcome. Please do get in touch.

Dataset

Data used for training the model consisted of a sample size of N=29,175 from, primarily, the UK Biobank dataset, covering a wide age range from 2 to 100 years. The sample included 51.1% female participants, with an average age of 46.9 years and a standard deviation of 24.4 years. The data was obtained from a healthy participant population across 76 different sites. For more in-depth information about the characteristics of this sample, please refer to the Rutherford et al 2022 study: [(https://elifesciences.org/articles/72904)] (Supplementary File 1 contains the complete sample description; University of Michigan datasets were excluded). Before running the model on your data, it may be worth checking if your sample overlaps with the training set as this is an important consideration when interpreting the results.

Warning: Do not use the PyBrainAge model if your input data includes UK Biobank data or other datasets included in the training set (see details above) as this would comprimise the validity of your results.

Model inputs and model training

The model utilised neuroimaging features including cortical thickness and subcortical volumes based on the Destrieux atlas parcellations. These were extracted using FreeSurfer's aparcstats2table function with aparc.a2009s for both the left and right hemispheres and asegstats2table function to extract subcortical volumes, resulting in a total of 187 input features. The dataset was then divided into training and test sets using the train_test_split function from sklearn, with a test size of 0.2. Both training and test sets were standardized using the StandardScaler function.

An Extra Trees regression was used to predict chronological age from the neuroimaging input features in the training set. This was achieved using the ExtraTreesRegressor function from sklearn.

Model testing and performance measure

After training, the model was evaluated using the testing set (N=29,174) which allowed us to assess its performance. The evaluation produced several performance measures, including a Mean Absolute Error (MAE) = 4.7 years, R-squared = .42 and a correlation coefficient, r = .66.

Alt Text

How to use PyBrainAge

To extract brain-age estimates of your structural T1s using PyBrainAge, you will need to follow these steps:

1 Prepare the data

a. Extract the neuroimaging input features based on the Destrieux brain atlas, using Freesurfer (aparcstats2table with aparc.a2009s and asegstats2table).

b. Create a .csv file that contains 189 columns, with the first column being 'ID', the second column 'Age', and the remaining 187 neuroimaging features extracted from the previous step. The precise list of features can be found in ROIS_input_template.txt on this GitHub page. Please ensure that the order and naming of the columns match this template perfectly. We'll refer to this file as 'ROIs.csv.' Please note that Freesurfer uses 'and' instead of '&' in its naming convention, and 'Left-Thalamus-Proper' refers to the 'Left-Thalamus', while 'Right-Thalamus-Proper' refers to the 'Right-Thalamus'.

c. It is advisable to run QC of your neuroimaging data before and after Freesurfer preprocessing (see https://elifesciences.org/articles/72904 for suggestions).

d. Missing ROIs: Occasionally, Freesurfer may produce NaNs for certain ROIs. Pybrainage cannot process data with NaNs. To handle such cases without sacrificing them, you might consider imputing missing values for the affected ROIs. For example, you can use KNNImputer from the sklearn.impute module, especially when dealing with a small number of missing ROIs.

Warning: Do not use the PyBrainAge model if your input data includes UK Biobank data given this was used to train the model. Including UK Biobank data would comprimise the validity of your results.

2 Create a Conda Environment

Create and activate a new Anaconda environment as follows:

conda create  --name pybrainage_env python=3.7 scikit-learn=0.24.2 pandas=1.3.4 numpy=1.20.3
conda activate pybrainage_env 

Here we are specifying precise versions of Python, scikit-learn, pandas, and numpy to operate within the isolated environment, ensuring it does not interfere with your other environments. This approach is essential to eliminate warning messages and potential errors caused by version conflicts, especially concerning scikit-learn.

However, please be aware that even with these configurations, you may encounter the following message. Please ignore!

UserWarning: Trying to unpickle estimator StandardScaler from version 1.2.0 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk 

To verify whether the activation of the "pybrainage_env" environment was successful, execute the following command:

conda env list

You should observe an asterisk (*) next to "pybrainage_env," confirming that you are currently working within the "pybrainage_env" environment you have just created.

3 Run PyBrainAge using predict.py

The script requires the following inputs:

  1. Your ROIs.csv input file.
  2. scaler.pkl (available on this GitHub page).
  3. ExtraTreesModel (downloaded from Zenodo, use this link: https://doi.org/10.5281/zenodo.10406573)

The output will be saved in a PyBrainAge_Output.csv file, containing ID, Age, Brain-Age, and Brain-PAD (refer to the note below for more details on Brain-PAD).

Brain-PAD

Typical Brain-age analyses involve calculating Brain-PAD (Brain Predicted Age Difference), also referred to as BrainGAP, BrainAge Delta, or similar variations. Brain-PAD is determined by subtracting chronological age from Brain-age ("Age" minus "Brain-age" columns, which is already calculated for you in the PyBrainAge_Output.csv file). This metric can be utilised to examine associations with health outcomes. For further detailed discussion, refer to the work of Cole and Franke (2017). For example, larger Brain-PADs (older-appearing brains) have been associated to an increased risk in a future diagnosis of dementia in memory clinic patients (Biondo et al, 2022).

Statistical analyses involving Brain-PAD (or Brain-age) values require minimising the regression-to-the-mean effect by including linear and non-linear terms of age as covariates (see de Lange and Cole, 2020).

Acknowledgements and References

This software was co-created: by Andre Marquand1, Saige Rutherford1, Ayodeji Ijishakin2, Francesca Biondo2 & James Cole2. 1 The Donders Institute, Netherlands; 2 University College London (UCL), United Kingdom.

Citations:

We have not used this software in published work yet. However, if you need a reference for this software, feel free to cite this page or the Rutherford et al paper in the list of references below.

References

Rutherford, Fraza, Dinga, Kia, Wolfers, Zabihi... Beckmann and Marquand (2022). Charting brain growth and aging at high spatial precision. eLife. DOI

Cole and Franke (2017). Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends in Neurosciences. DOI

Biondo, Jewell, Pritchard, Aarsland, Steves, Mueller and Cole (2022). Brain-age is associated with progression to dementia in memory clinic patients. Neuroimage: Clinical. DOI

de Lange & Cole (2020). Commentary: Correction procedures in brain-age prediction. Neuroimage: Clinical DOI

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages