The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered.
SPP_DCJ is Integer Linear Program-based algorithm to solve the SPP for natural genomes, i.e., genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments.
The following steps show howto run SPP_DCJ with Gurobi.
SPP_DCJ requires (i) a given phylogeny and (ii) a table with candidate adjacencies for all genomes corresponding to nodes of the given phylogeny. The candidate adjacency table has the following columns:
#Species Gene_1 Ext_1 Species Gene_2 Ext_2 Weight
The phylogeny must be given as table format:
#Child Parent
Generate ILP (-a is a parameter of the objective function that is set here to 1):
spp-dcj ilp -a 1 -m example/idmap.txt example/tree.txt example/adjacencies.txt > example/example.ilp
Compute heuristic solution for warm starting the solver (optional)
spp-dcj heuristic -a 1 example/tree.txt example/adjacencies.txt example/idmap.txt > example/example_init.sol
Run ILP (you may omit the InputFile argument in case you don’t want to warm-start the solver)
SPP-DCJ
Introduction
The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered.
SPP_DCJis Integer Linear Program-based algorithm to solve the SPP for natural genomes, i.e., genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments.SPP_DCJis an extension of DING.Installation
From bioconda channel
Make sure you have conda/mamba installed!
From repository
Dependencies
How to run
The following steps show howto run
SPP_DCJwith Gurobi.SPP_DCJrequires (i) a given phylogeny and (ii) a table with candidate adjacencies for all genomes corresponding to nodes of the given phylogeny. The candidate adjacency table has the following columns:#Species Gene_1 Ext_1 Species Gene_2 Ext_2 WeightThe phylogeny must be given as table format:
#Child ParentGenerate ILP (
-ais a parameter of the objective function that is set here to 1):spp-dcj ilp -a 1 -m example/idmap.txt example/tree.txt example/adjacencies.txt > example/example.ilpCompute heuristic solution for warm starting the solver (optional)
spp-dcj heuristic -a 1 example/tree.txt example/adjacencies.txt example/idmap.txt > example/example_init.solRun ILP (you may omit the
InputFileargument in case you don’t want to warm-start the solver)gurobi_cl InputFile=example/example_init.sol ResultFile=example/example.sol example/example.ilpExtract adjacencies from solution
spp-dcj sol2adj example/example.sol example/idmap.txt > example/resolved_adjacencies.txtVisualize candidate and predicted adjacencies
spp-dcj draw -i example/resolved_adjacencies.txt example/adjacencies.txt > example/adjacencies.pdfExample
The code above is included in the directory in
example.