Reconciliation-Assisted Divergence Time Estimation (RADTE / rædˈti:) is a method to date gene trees with the aid of dated species trees.
This program can handle a rooted gene tree containing duplication/loss events.
The divergence times of duplication nodes are estimated while speciation nodes are constrained by transferring known or pre-estimated divergence times from the species tree to the gene tree.
Dependency
R: development started with version 3.5, and the most recent tests used 4.1.
If you want the latest repository code, download the radte.r script by, for example, git or svn, and change the file permission.
You can also download a zipped repository from Code -> Download ZIP above.
# With git
git clone https://github.com/kfuku52/RADTE
cd RADTE
# With svn
svn export https://github.com/kfuku52/RADTE/trunk/radte.r
# Change permission
chmod +x ./radte.r
Options
--species_tree
Species tree with estimated divergence time.
By default, leaves (species) should be labeled as GENUS_SPECIES (e.g., Homo_sapiens).
If --species-parser=taxonomic is used, taxonomically qualified labels such as Dictyostelium_cf_discoideum are also accepted.
The tree is expected to be ultrametric and branch lengths should represent evolutionary time (e.g., million years).
Internal nodes including the root node must be uniquely labeled and the same file should be consistently used for NOTUNG/GeneRax and RADTE.
Don’t know how to label internal nodes? Try this R one-liner.
R -q -e "library(ape); t=read.tree('species_tree_noLabel.nwk'); \
t[['node.label']]=paste0('s',1:Nnode(t)); \
write.tree(t, 'species_tree.nwk')"
--gene_tree
Rooted newick tree. By default, leaves (genes) should be labeled as GENUS_SPECIES_GENEID (e.g., Homo_sapiens_ENSG00000102144).
If --species-parser is switched to taxonomic, regex, or map, gene tips are interpreted with that parser instead.
The tree is expected to be non-ultrametric and branch lengths should represent substitutions per site.
Use the tree that NOTUNG produces because its internal nodes are correctly labeled in accordance with the NOTUNG parsable file, another input for this program.
--notung_parsable
An output file from NOTUNG (tested with version 2.9) can be used to acquire the species–gene relationships in phylogeny reconciliation. See Examples for details.
--generax_nhx
Instead of the NOTUNG output, the NHX tree from GeneRax can also be used as an input. If specified, --gene_tree and --notung_parsable will be ignored. See Examples for details.
The NHX species annotation tag S is required for all nodes and must match species-tree node labels.
The duplication tag D is optional (missing values are treated as non-duplication). Accepted duplication values are Y, YES, TRUE, T, 1; accepted non-duplication values are N, NO, FALSE, F, 0.
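The accepted D values can be sketched as a small classifier. This is only an illustration of the rule stated above, not RADTE's own parser: the function name `classify_d_tag` is hypothetical, and matching is assumed to be exact (uppercase, as listed).

```shell
# Hypothetical helper: classify an NHX D tag value following the lists above.
# Assumption: values are compared exactly as written (no case folding).
classify_d_tag() {
  case "$1" in
    Y|YES|TRUE|T|1)   echo duplication ;;
    N|NO|FALSE|F|0|'') echo non-duplication ;;  # empty = missing tag
    *)                echo invalid; return 1 ;;
  esac
}

classify_d_tag Y    # duplication
classify_d_tag ''   # non-duplication (missing value)
classify_d_tag 0    # non-duplication
```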
--species-parser
Optional species-label parser. Default is legacy.
Supported values are:
legacy: current GENUS_SPECIES / GENUS_SPECIES_GENEID behavior.
taxonomic: accepts taxonomically qualified species labels such as Dictyostelium_cf_discoideum_gene123 or Arabidopsis_thaliana_subsp_lyrata_gene456.
regex: extracts species labels from gene tips using --species-regex.
map: resolves species labels from --species-map-tsv.
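To see why parsers other than legacy exist, here is a rough sketch of the legacy convention. It assumes that legacy parsing simply takes the first two underscore-separated fields of a gene tip label; the helper name `legacy_species` is hypothetical and RADTE's internal logic may differ.

```shell
# Assumed legacy rule: species label = first two underscore-separated fields.
legacy_species() {
  printf '%s\n' "$1" | cut -d_ -f1,2
}

legacy_species Homo_sapiens_ENSG00000102144          # Homo_sapiens
legacy_species Dictyostelium_cf_discoideum_gene123   # Dictyostelium_cf (qualifier is cut off)
```

Under this assumption, a taxonomically qualified label loses part of its species name, which is the case the taxonomic, regex, and map parsers are meant to cover.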
--species-regex
Required when --species-parser=regex.
RADTE uses the first capture group if the regex contains captures; otherwise it uses the full match as the extracted species label.
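The two regex styles can be tried out on a tip label before running RADTE. The sketch below uses `sed -E` and `grep -oE` as stand-ins for the capture-group and full-match behaviors; the patterns and function names are illustrative assumptions, and RADTE itself evaluates the regex in R, whose syntax may differ in details.

```shell
# Capture-group style: only the text inside the first (...) is kept.
extract_with_group() {
  printf '%s\n' "$1" | sed -E -n 's/^([A-Za-z]+_[a-z]+)_.*$/\1/p'
}

# Full-match style: no capture group, so the whole match is kept.
extract_full_match() {
  printf '%s\n' "$1" | grep -oE '^[A-Za-z]+_[a-z]+' | head -n 1
}

extract_with_group Homo_sapiens_ENSG00000102144   # Homo_sapiens
extract_full_match Homo_sapiens_ENSG00000102144   # Homo_sapiens
```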
--species-map-tsv
Required when --species-parser=map.
This should be a tab-delimited file with label and species columns.
An optional taxonomy_query column can be used to override scientific-name conversion.
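A minimal sketch of such a map file and a lookup against it follows. The tip labels (`Hsap_g1`, `Ddis_g7`) and the helper `lookup_species` are hypothetical, and the sketch assumes the label column holds complete gene tip labels.

```shell
# Write a small tab-delimited map with the two required columns.
map_file=$(mktemp)
printf 'label\tspecies\n'                        >  "$map_file"
printf 'Hsap_g1\tHomo_sapiens\n'                 >> "$map_file"
printf 'Ddis_g7\tDictyostelium_cf_discoideum\n'  >> "$map_file"

# Look up the species for one tip label, locating columns by header name.
lookup_species() {
  awk -F'\t' -v key="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["label"]) == key { print $(col["species"]); exit }
  ' "$1"
}

lookup_species "$map_file" Ddis_g7   # Dictyostelium_cf_discoideum
```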
--species_node_bounds_tsv
Optional tab-delimited file for species-tree node age constraints.
This file should contain a node column plus either age or the pair age_min / age_max.
The node names must match the labeled internal/root nodes in --species_tree.
The point age implied by the species-tree branch lengths must fall within the supplied interval.
RADTE transfers these bounds to gene-tree speciation nodes. If the same species-tree node corresponds to multiple gene-tree speciation nodes, mcmctree enforces a shared age parameter, while chronos only reuses the same interval bounds.
Accepted examples:
node age
n1 10
root 30
or
node age_min age_max
n1 8 12
root 28 32
Use age if you want an exact calibration for that species-tree node.
Use age_min / age_max if you want a confidence interval to be propagated to the corresponding gene-tree speciation nodes.
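Before passing a bounds file to RADTE, it may help to check it yourself. The sketch below writes the interval example shown above and verifies the header and that each age_min does not exceed age_max. The helper `check_bounds` is hypothetical and does not reproduce RADTE's own validation (for example, it does not compare the interval with the species-tree point age).

```shell
# Write the interval example as a tab-delimited file.
bounds_file=$(mktemp)
printf 'node\tage_min\tage_max\n' >  "$bounds_file"
printf 'n1\t8\t12\n'              >> "$bounds_file"
printf 'root\t28\t32\n'           >> "$bounds_file"

# Hypothetical sanity check: required columns present, age_min <= age_max.
check_bounds() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) c[$i] = i
      if (!("node" in c)) { print "missing node column"; bad = 1; exit }
      if ("age" in c) mode = "point"
      else if (("age_min" in c) && ("age_max" in c)) mode = "interval"
      else { print "need age or age_min/age_max"; bad = 1; exit }
      next
    }
    mode == "interval" && (($(c["age_min"]) + 0) > ($(c["age_max"]) + 0)) {
      print "age_min exceeds age_max for node " $(c["node"]); bad = 1
    }
    END { exit bad }
  ' "$1"
}

check_bounds "$bounds_file" && echo "bounds file looks consistent"
```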
--max_age
If duplication nodes are deeper than the root node of the species tree, this value is used as the upper limit on the age of the root node.
--chronos_lambda
Passed to chronos for divergence time estimation. See chronos in the ape documentation.
--chronos_model
Passed to chronos for divergence time estimation. Supported values are discrete, relaxed, and correlated.
If an unsupported value is given (e.g., typo like difscrete), RADTE now exits with a clear error and suggestion.
See chronos in the ape documentation.
--pad_short_edge
Prohibit dated branches shorter than this value. If detected, the branch length is readjusted by transferring a small portion of branch length from the parent branch.
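The arithmetic behind this adjustment can be illustrated as follows. This is a sketch under a stated assumption: the amount moved from the parent branch is exactly the deficit needed to reach the threshold, which the text above does not specify. The function `pad_edge` is hypothetical, and the case where the parent itself becomes too short is not handled.

```shell
# $1 = child branch length, $2 = parent branch length, $3 = minimum allowed length
pad_edge() {
  awk -v child="$1" -v parent="$2" -v pad="$3" 'BEGIN {
    if (child < pad) {
      deficit = pad - child   # assumed amount to transfer
      child  += deficit       # child is lengthened to the threshold
      parent -= deficit       # parent is shortened by the same amount
    }
    printf "%.3f %.3f\n", child, parent
  }'
}

pad_edge 0.0002 5 0.001   # 0.001 4.999
pad_edge 2 5 0.001        # 2.000 5.000 (already long enough, unchanged)
```

Because the same amount is added and subtracted, the total root-to-tip distance through the two branches is preserved.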
--allow_constraint_drop
true/false (1/0, yes/no are also accepted). Default is true.
RADTE now first tries to keep all root/speciation constraints by stabilizing conflict-prone bounds.
In this no-drop phase, RADTE also performs multi-seed retries and soft-bound retries (plus alternative chronos model/lambda trials) to avoid numerical failures.
If these chronos retries still fail and --allow_constraint_drop=false, RADTE uses a deterministic no-drop fallback (constraint-preserving node dating without dropping calibration nodes).
If this option is true, RADTE runs the same exhaustive retry pipeline stage-by-stage while dropping constraints in order (RS -> S -> R), moving to the next stage only after the current stage is exhausted.
Set --allow_constraint_drop=false to disable S/R drop stages and keep the run strictly no-drop.
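The stage order described above can be pictured as a simple loop. The sketch below is only a control-flow illustration: `try_stage` is a hypothetical stand-in for one exhaustive retry pipeline, and the variable `SUCCEED_AT` exists only to simulate which stage succeeds. What RADTE does when every stage fails with dropping enabled is not stated here, so the sketch just reports that case.

```shell
# Hypothetical stand-in: a stage "succeeds" only if it equals $SUCCEED_AT.
try_stage() {
  [ "$1" = "$SUCCEED_AT" ]
}

# $1 = value of --allow_constraint_drop (true/false)
run_stages() {
  if [ "$1" = "true" ]; then stages="RS S R"; else stages="RS"; fi
  for stage in $stages; do
    if try_stage "$stage"; then
      echo "$stage"          # constraint set that was finally used
      return 0
    fi
  done
  if [ "$1" = "true" ]; then echo "all-stages-exhausted"; else echo "no-drop-fallback"; fi
  return 1
}

SUCCEED_AT=S
run_stages true             # S
run_stages false || true    # no-drop-fallback
```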
--chronos_attempt_timeout_sec
Per-attempt timeout (seconds) for each chronos call. Use a non-negative number, or inf/none/off to disable per-attempt timeout.
When --allow_constraint_drop=false, RADTE now defaults to 60 seconds to avoid infinite waits and then proceeds to no-drop fallback.
--chronos_total_timeout_sec
Total timeout budget (seconds) across all chronos retries (RS + retry strategies + S/R if enabled). Use a non-negative number, or inf/none/off to disable total budgeting.
When --allow_constraint_drop=false, RADTE now defaults to 300 seconds.
--dating_backend
Dating engine. Supported values are chronos (default) and mcmctree.
chronos uses the current ape::chronos workflow and supports --allow_constraint_drop plus the --chronos_* options.
When speciation-node intervals are supplied through --species_node_bounds_tsv, chronos uses those age.min / age.max bounds but cannot force separate internal nodes to share one estimated age unless the bounds are exact.
mcmctree runs the external PAML program MCMCTree on the reconciled gene tree using the transferred root/speciation calibrations.
For repeated speciation events caused by ancestral duplications, RADTE uses MCMCTree mirror labels (#1, #2, …) so that corresponding gene-tree speciation nodes can share the same age parameter.
The current RADTE integration supports usedata=1 only and requires an MCMCTree-compatible alignment file.
BEAST is not yet integrated.
--mcmctree_seqfile
Required when --dating_backend=mcmctree.
This should be an alignment file readable by MCMCTree whose taxon names exactly match the gene-tree tip labels.
--mcmctree_bin
Optional path to the mcmctree executable. Default is mcmctree in PATH.
--mcmctree_workdir
Optional staging directory for the MCMCTree run. RADTE writes the generated tree/control files there and captures out.txt, mcmc.txt, FigTree.tre, and the stdout/stderr logs.
--mcmctree_usedata
Passed to MCMCTree as usedata.
The current RADTE integration supports only 1.
Output files
See data/example_notung_01 and data/example_generax_01 for example files.
radte_gene_tree_output.nwk
This is the main output file of RADTE. Branch lengths represent the estimated evolutionary time.
Node ages represent the estimated divergence time.
The unit of the branch length is the same as that in the input species tree.
radte_*.pdf
RADTE generates PDF files for the input and output trees, in which nodes are colored (see above examples). Red and blue indicate unconstrained and constrained nodes, respectively.
While the divergence time of blue nodes is transferred from the species tree, that of red nodes is estimated.
When the root node is blue, it means the divergence time is either transferred from the species tree or bounded by --max_age.
radte_calibration_all.tsv
This table contains all identified calibration nodes where the divergence time may be transferred from the species tree to the gene tree.
radte_calibration_used.tsv
This table is a subset of radte_calibration_all.tsv and contains only the calibration nodes that are used to transfer the divergence time.
RADTE first stabilizes risky descendant/ancestor bounds to keep constraints without dropping nodes.
If --allow_constraint_drop=true (default), some calibration points may still be dropped, but only when all no-drop attempts fail.
radte_gene_tree.tsv
This table summarizes gene tree nodes.
In the column event, S and D denote speciation and duplication nodes, respectively, as inferred by NOTUNG or GeneRax.
The root node is indicated as S(R) or D(R).
lower_sp_node and upper_sp_node together indicate the node/branch of the species tree to which the gene tree node is mapped.
constraint_sp_node identifies speciation constraints that correspond to a single labeled species-tree node, and shared_speciation_group marks repeated speciation nodes that are linked to the same species-tree event.
radte_species_tree.tsv
This table summarizes species tree nodes.
When --species_node_bounds_tsv is used, the table also records the transferred age_min and age_max bounds alongside the branch-length point age.
radte_calibrated_nodes.txt
This file records what types of gene tree nodes are constrained in the divergence time estimation.
RADTE first attempts to constrain all available calibration points transferred from the species tree (R, root node; S, speciation node) for the divergence time estimation by chronos from the ape package.
If the estimation succeeded, the content of this file should be RS, because both R and S nodes were used.
If you supply --species_node_bounds_tsv, RADTE may report RS even when the input gene tree has no duplication nodes, because chronos needs to run to satisfy interval constraints.
If the first estimation failed, RADTE retries while preserving RS by stabilizing risky bounds, edge scaling, multi-seed restarts, and soft-bound/alternative-parameter retries.
If all RS retries fail and --allow_constraint_drop=true, RADTE repeats the same exhaustive retry pipeline at S, then at R (order: RS -> S -> R).
This differs from the method described in Fukushima and Pollock (2020), where duplication nodes (D) may be constrained with the upper and lower limits.
radte_mcmctree_*
When --dating_backend=mcmctree is used, RADTE also copies the generated MCMCTree artifacts into the output directory with the prefix radte_mcmctree_ (for example radte_mcmctree_out.txt, radte_mcmctree_mcmc.txt, radte_mcmctree_FigTree.tre, and the generated control/tree files).
Testing
RADTE includes a comprehensive test suite using testthat. To run the tests:
Dependency
In addition to the above dependencies, RADTE needs an output from a phylogeny reconciliation program. NOTUNG and GeneRax are supported.
Installation
Option 1: Bioconda (recommended)
RADTE is available on Bioconda.
Option 2: Source script (development/latest repository version)
If you want the latest repository code, download the radte.r script by, for example, git or svn, and change the file permission. You can also download a zipped repository from Code -> Download ZIP above.
Options
--mcmctree_seqtype
Passed to MCMCTree as seqtype. Default is 0.
--mcmctree_clock
Passed to MCMCTree as clock. Default is 2.
--mcmctree_model
Passed to MCMCTree as model. Default is 0.
--mcmctree_burnin, --mcmctree_sampfreq, --mcmctree_nsample, --mcmctree_ncatG
Passed to MCMCTree as burnin, sampfreq, nsample, and ncatG.
Example 1: RADTE after NOTUNG
For input data, see data/example_notung_01.
species_tree.nwk
gene_tree.nwk.reconciled
radte_gene_tree_output.nwk
Example 2: RADTE after GeneRax
For input data, see data/example_generax_01. For your own data, please run GeneRax and obtain an NHX file for the gene tree. In GeneRax, --rec-model UndatedDTL may not be compatible with RADTE, so please use --rec-model UndatedDL.
species_tree.nwk
gene_tree.nhx
radte_gene_tree_output.nwk
Example 3: transfer species-tree node age CI to gene-tree speciation nodes
Prepare a species-node bounds file.
Run RADTE with the default chronos backend.
This transfers the interval for s2 and s1 to the corresponding gene-tree speciation nodes. With chronos, repeated speciation nodes created by ancestral duplications receive the same interval bounds, but they are not forced to have exactly identical estimated ages unless the bounds are exact.
Check the transferred constraints in:
radte_species_tree.tsv: branch-length point ages and the effective age_min / age_max
radte_gene_tree.tsv: transferred lower_age / upper_age, constraint_sp_node, and shared_speciation_group
Example 4: RADTE with the MCMCTree backend
MCMCTree requires an alignment file whose taxon labels match the gene-tree tips. If you use PHYLIP sequential format, separate each taxon name from the sequence by at least two spaces because MCMCTree is strict about this.
To combine MCMCTree with species-node CI:
When the same labeled species-tree node is mapped to multiple gene-tree speciation nodes, RADTE writes MCMCTree mirror labels (#1, #2, …) so that those nodes share one age parameter.
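For the PHYLIP requirement mentioned in this example, one way to produce a sequential file with two spaces after each taxon name is a small FASTA converter. This is a minimal sketch, not part of RADTE: the function `fasta_to_phylip` and the input labels are hypothetical, and it assumes an aligned FASTA file in which all sequences have equal length.

```shell
# Convert an aligned FASTA file to sequential PHYLIP,
# separating each name from its sequence by two spaces.
fasta_to_phylip() {
  awk '
    BEGIN { n = 0 }
    /^>/ {
      if (name != "") { names[n] = name; seqs[n] = seq; n++ }
      name = substr($0, 2); seq = ""; next
    }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }   # join wrapped sequence lines
    END {
      if (name != "") { names[n] = name; seqs[n] = seq; n++ }
      print " " n " " length(seqs[0])       # header: sequence count and length
      for (i = 0; i < n; i++) print names[i] "  " seqs[i]
    }
  ' "$1"
}

# Tiny illustrative input with hypothetical gene tip labels.
fasta_file=$(mktemp)
printf '>Homo_sapiens_g1\nACGT\nAC\n>Mus_musculus_g1\nACGTAA\n' > "$fasta_file"
fasta_to_phylip "$fasta_file"
```

The taxon names written to the PHYLIP file must still match the gene-tree tip labels exactly, as required above.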
Citation
The prototype of RADTE is described in this publication.
Fukushima K, Pollock DD. 2020. Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nature Communications 11: 4459 (DOI: 10.1038/s41467-020-18090-8)
Licensing
This program is MIT-licensed. See LICENSE for details.