A script for analysing clusters in n-body simulations.
Clusters are identified using DBSCAN, INDICATE and the star's energies. The clusters are then compared to a standard Maschberger IMF model with the given upper and lower bounds using 1-way ks-tests and Cramér-von Mises tests.
Chi-squared tests are also used to compare the histogram of the mass distribution, but the results of these are not statistically valid due to low bin counts - the results are just for curiosity.
python3 ./cluster_tool.py data_file output_dir [options]
Option | Description |
---|---|
data_file |
Path to input data file |
output_dir |
Path to output directory |
Option | Description | Default |
---|---|---|
--data_sep [DATA_SEP] |
Separator used in simulation data | ', ' |
--data_header [DATA_HEADER] |
Path to a text file containing headers for simulation data | None - Read header from data file |
-d [DIMENSIONS], --dimensions [DIMENSIONS] |
No. of dimensions to run the analysis in | 2 |
-e [EPS], --eps [EPS] |
DBSCAN: Maximum distance between stars in a cluster | 3.0 |
-m [MIN_SAMPLES], --min_samples [MIN_SAMPLES] |
DBSCAN: Minimum no. of stars per cluster | 10 |
-u [N_DIST], --n_dist [N_DIST] |
INDICATE: No. of uniform distributions | 1 |
-n [NEAREST_NN], --nearest_nn [NEAREST_NN] |
INDICATE: No. of nearest neighbours | 5 |
--pos_axes [POS_AXES] |
Column names of position axes | 'x,y,z ' |
--vel_axes [VEL_AXES] |
Column names of velocity axes | 'v_x,v_y,v_z ' |
--min_mass [MIN_MASS] |
Minimum mass | 0.1 |
--max_mass [MAX_MASS] |
Maximum mass | 50 |
-f, --force-all-steps |
Force all steps to run - even if they have already been run before. | False |
-h, --help |
Show a help message and exit | N/A |
This program requires Python 3 (ideally >= 3.11), and pip is required to install other Python libraries.
- matplotlib ~= 3.7
- numpy ~= 1.25
- scipy ~= 1.11
- scikit-learn ~= 1.3
- pandas ~= 2.0
You can quickly install these in either of two ways:
-
Install using pip and requirements.txt (this installs them into your system or user packages)
python3 -m pip install -r requirements.txt
-
Install using pipenv and Pipfile.lock (this installs them separately from system or user packages)
python3 -m pip install pipenv pipenv install
Note that using this method, you will have to run the script using pipenv like so:
pipenv run ./cluster_tool.py [...]
The following columns are required:
Name | Description |
---|---|
snapshot |
Snapshot number - must start from 0 |
star_id |
Star's unique id number |
mass |
Mass of the star |
x |
x co-ordinate of the star's position |
y |
y co-ordinate of the star's position |
z |
z co-ordinate of the star's position |
v_x |
x component of the star's velocity |
v_y |
y component of the star's velocity |
v_z |
z component of the star's velocity |
The main output file will have the columns included in the input file plus the following columns:
Name | Description |
---|---|
dbscan_cluster_id |
Initial cluster determined by DBSCAN |
indicate_index |
Star's INDICATE index |
indicate_sig_index |
Significant INDICATE index for that snapshot |
ke |
Star's kinetic energy |
pe |
Star's potential energy |
indicate_clustered |
Whether or not the star's INDICATE index is above the significant index |
bound_clustered |
Whether or not the star is gravitationally bound |
closest_cluster_id |
The nearest cluster to the star as found by DBSCAN |
The first two columns are the snapshot and the cluster id:
Name | Description |
---|---|
snapshot |
Snapshot number |
cluster |
Cluster id |
After that lies the results for the statistical tests performed on the clusters:
Name | Description |
---|---|
no-of-stars |
Number of stars in the cluster |
ks-statistic |
ks-test statistic |
ks-pvalue |
ks-test p-value |
cs-statistic |
Chi-squared test statistic (do not use) |
cs-pvalue |
Chi-squared test p-value (do not use) |
cvm-statistic |
CvM test statistic |
cvm-pvalue |
CvM test p-value |
Clusters are determined by multiple methods, and the column names for the results of each method has a prefix added:
Method | Prefix | Logic |
---|---|---|
DBSCAN | None | Just use dbscan_cluster_id |
DBSCAN + INDICATE | indicate_ |
If indicate_index > indicate_sig_index : use closest_cluster_id ; else use -1 |
DBSCAN + graviationally bound | bound_ |
If (ke + pe ) < 0: use closest_cluster_id ; else use -1 |
DBSCAN + INDICATE + graviationally bound | indicate+bound_ |
If (indicate_index > indicate_sig_index ) and ((ke + pe ) < 0): use closest_cluster_id ; else use -1 |
You can contribute to this project in multiple ways:
- 🐛 Reporting bugs: If you find any bugs, please let us know by opening an issue.
- ✨ Feature requests: If there's a feature that would be really useful to add, let us know on the discussion board.
- 📖 Documentation: If there's any errors or missing parts to any documentation, you can make improvements and open a pull request.
- 🖥️ Code: This project is open source, so anyone is free to modify the code as they wish. If you implement new functionality that may be useful to others, please consider opening a pull request.
There is no guarantee that there are not mistakes in the code! Use at your own risk.
- Maschberger, T., “On the function describing the stellar initial mass function”, Monthly Notices of the Royal Astronomical Society, vol. 429, no. 2, pp. 1725–1733, 2013. doi:10.1093/mnras/sts479.
- Buckner, A. S. M., “The spatial evolution of young massive clusters. I. A new tool to quantitatively trace stellar clustering”, Astronomy and Astrophysics, vol. 622, 2019. doi:10.1051/0004-6361/201832936.
- Blaylock-Squibbs, G. A., Parker, R. J., Buckner, A. S. M., and Güdel, M., “Investigating the structure of star-forming regions using INDICATE”, Monthly Notices of the Royal Astronomical Society, vol. 510, no. 2, pp. 2864–2882, 2022. doi:10.1093/mnras/stab3447.
Cluster Tool is licensed under the MIT License.
The INDICATE module is based off code by George Baylock-Squibbs, which is based off abuckner89/INDICATE by Anne S.M. Buckner (used under the MIT License).