Skip to content

A Fast Branch and Bound Algorithm for the Perfect Tumor Phylogeny Reconstruction Problem

Notifications You must be signed in to change notification settings

algo-cancer/PhISCS-BnB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PhISCS-BnB

PhISCS-BnB is a fast tool for reconstructing the perfect tumor phylogeny using single-cell data via branch and bound.

PhISCS-BnB has been published in Bioinformatics (Proceedings of ISMB 2020) (doi:10.1093/bioinformatics/btaa464). If you find this code useful in your research, please consider citing.

@article{azer2020phiscs,
  doi           = {10.1093/bioinformatics/btaa464},
  url           = {https://doi.org/10.1093/bioinformatics/btaa464},
  year          = 2020,
  month         = jul,
  publisher     = {Oxford University Press ({OUP})},
  volume        = {36},
  number        = {Supplement{\_}1},
  pages         = {i169--i176},
  author        = {Erfan {Sadeqi Azer} and Farid {Rashidi Mehrabadi} and Salem Maliki{\'{c}} and Xuan Cindy Li and Osnat Bartok and Kevin Litchfield and Ronen Levy and Yardena Samuels and Alejandro A Sch\"{a}ffer and E Michael Gertz and Chi-Ping Day and Eva P{\'{e}}rez-Guijarro and Kerrie Marie and Maxwell P Lee and Glenn Merlino and Funda Ergun and S Cenk Sahinalp},
  title         = {{{PhISCS}-{BnB}: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem}},
  journal       = {Bioinformatics}
}

Contents

  1. Installation
  2. Running
  3. Example
  4. Contact

Installation

PhISCS-BnB is written in Python. It supports Python 3. Currently it is intended to be run on POSIX-based systems (only Linux and macOS have been tested).

~$ git clone https://github.com/algo-cancer/PhISCS-BnB.git
~$ cd PhISCS-BnB
~$ pip install -r requirements.txt
~$ python main.py --help

Running

Input

Single-cell input is assumed to be represented in the form of ternary, tab-delimited, matrix with rows corresponding to single-cells and columns corresponding to mutations. We assume that this file contains headers and that matrix is ternary matrix with 0 denoting the absence and 1 denoting the presence of mutation in a given cell, whereas ? represents the lack of information about presence/absence of mutation in a given cell (i.e. missing entry). In order to simplify parsing of the matrix, we also assume that upper left corner equals to string cellID/mutID.

Below is an example of single-cell data matrix. Note that mutation and cell names are arbitrary strings not containing tabs or spaces, however they must be unique.

cellID/mutID  mut0  mut1  mut2  mut3  mut4  mut5  mut6  mut7
cell0         0     0     ?     0     0     0     0     0
cell1         0     ?     1     0     0     0     1     1
cell2         0     0     1     0     0     0     1     1
cell3         1     1     0     0     0     0     0     0
cell4         0     0     1     0     0     0     0     0
cell5         1     0     0     0     0     0     0     0
cell6         0     0     1     0     0     0     1     1
cell7         0     0     1     0     0     0     0     0
cell8         ?     0     0     0     ?     0     ?     1
cell9         0     1     0     0     0     0     0     0

Output

The program will generate a file in OUT_DIR folder (which is set by argument -o or --outDir). This folder will be created automatically if it does not exist.

The output matrix is also a tab-delimited file having the same format as the input matrix, except that eliminated mutations (columns) are excluded (so, in case when mutation elimination is allowed, this matrix typically contains less columns than the input matrix). Output matrix represents genotypes-corrected matrix (where false positives and false negatives from the input are corrected and each of the missing entries set to 0 or 1). Suppose the input file is INPUT_MATRIX.ext, the output matrix will be stored in file OUT_DIR/INPUT_MATRIX.CFMatrix. For example:

 input file: data/ALL2.SC
output file: OUT_DIR/ALL2.CFMatrix

Parameters

Parameter Description Default Mandatory
-i Path to single-cell data matrix file - 🔘
-o Output directory current
-b Bounding algorithm 1
-t Draw output tree with Graphviz -

Example

~$ python main.py -i example/data1.SC -o example -b 2 -t

[02/04 12:53:49] Size: (20, 20)
[02/04 12:53:49] NAValue: 3
[02/04 12:53:49] #Zeros: 226
[02/04 12:53:49] #Ones: 94
[02/04 12:53:49] #NAs: 80
[02/04 12:53:49] Time: 00:00:00
[02/04 12:53:49] #0->1: 10
[02/04 12:53:49] #1->0: 0
[02/04 12:53:49] #na->0: 60
[02/04 12:53:49] #na->1: 20
[02/04 12:53:49] isDone: True
[02/04 12:53:49] The output phylogenetic tree is in 'example' directory!

This is the clonal tree that has been created:

For each node, the number inside the brackets denotes its node id and the number inside the parentheses shows the total number of mutations occurring on the path from the germline (root) to the node (i.e., the total number of mutations harbored by the node). The edge labels represent the number of mutations occurring between a parent and its child node. The complete list of mutations occurring at each edge can be found data1.mutsAtEdges file which contains:

[12]->[11]: mut7 mut8 mut9 mut10 mut11
[12]->[10]: mut0 mut2
[11]->[9]: mut15 mut16 mut17
[10]->[8]: mut1
[9]->[5]: mut18
[8]->[7]: mut3
[8]->[1]: mut12 mut14
[7]->[6]: mut5
[6]->[4]: mut4
[5]->[2]: mut19
[4]->[3]: mut6
[1]->[0]: mut13

Contact

If you have any questions please e-mail us at [email protected] or [email protected].