treemaker

A Python library for creating a Newick-formatted tree from a set of classification strings (e.g. a taxonomy).
treemaker is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.
Research in linguistics or cultural evolution often produces or uses tree taxonomies or classifications. However, these are usually not in a format that tree-manipulation programs can read directly. For example, the global taxonomy of languages published by the Ethnologue classifies languages into families and subgroups using a taxonomy string: the language Kalam is classified as “Trans-New Guinea, Madang, Kalam-Kobon”, while Mauwake is classified as “Trans-New Guinea, Madang, Croisilles, Pihom”, and Kare is “Trans-New Guinea, Madang, Croisilles, Kare”. This classification indicates that while all three languages are part of the Madang subgroup of the Trans-New Guinea language family, Kare and Mauwake are more closely related to each other than either is to Kalam, as both belong to the Croisilles subgroup.
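To make the reasoning about shared subgroups concrete, here is a minimal stand-alone sketch in plain Python. It does not use treemaker; the helper names `parse_classification` and `shared_lineage` are hypothetical and exist only for this illustration. It splits each taxonomy string on commas and reports the leading groups that two languages share.

```python
def parse_classification(text):
    """Split a taxonomy string such as 'A, B, C' into a list of group names."""
    return [part.strip() for part in text.split(",") if part.strip()]


def shared_lineage(first, second):
    """Return the leading groups that two classifications have in common."""
    shared = []
    for a, b in zip(first, second):
        if a != b:
            break
        shared.append(a)
    return shared


# Classification strings taken from the Ethnologue example above.
classifications = {
    "Kalam": parse_classification("Trans-New Guinea, Madang, Kalam-Kobon"),
    "Mauwake": parse_classification("Trans-New Guinea, Madang, Croisilles, Pihom"),
    "Kare": parse_classification("Trans-New Guinea, Madang, Croisilles, Kare"),
}

# Kare and Mauwake share three levels; Kare and Kalam share only two.
print(shared_lineage(classifications["Kare"], classifications["Mauwake"]))
print(shared_lineage(classifications["Kare"], classifications["Kalam"]))
```

The longer the shared prefix, the deeper in the tree the two languages' most recent common group sits.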
Other publications use a tabular indented format to demarcate relationships, such as the example in Figure 1 from Stephen Wurm’s classification of his proposed Yele-Solomons language phylum (Wurm 1975).
However, both the taxonomy string and the tabular format are hard to load into software packages that can analyse, compare, visualise and manipulate trees. treemaker aims to make this easy by converting taxonomic data into the Newick and Nexus (Maddison 1997) formats commonly used by phylogenetic manipulation programs.
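For readers unfamiliar with the two formats: a Newick tree is a single parenthesised string ending in a semicolon, and a Nexus file can carry such a string inside a `trees` block. The sketch below is a hand-rolled illustration of that relationship, not treemaker's own writer; the function name `to_nexus` and the tree name `taxonomy` are assumptions made for this example, and treemaker's actual Nexus output may contain additional blocks.

```python
def to_nexus(newick, name="taxonomy"):
    """Wrap a Newick string in a minimal Nexus file with one trees block."""
    newick = newick.strip()
    if not newick.endswith(";"):
        newick += ";"  # Newick strings are terminated by a semicolon
    lines = [
        "#NEXUS",
        "",
        "begin trees;",
        f"    tree {name} = {newick}",
        "end;",
    ]
    return "\n".join(lines) + "\n"


# A hand-written Newick string grouping Kare and Mauwake apart from Kalam.
print(to_nexus("(Kalam,(Kare,Mauwake));"))
```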
Converting a Taxonomy to a Tree:
treemaker can convert a text file with a taxonomy to a tree. These taxonomies can easily be obtained from Ethnologue or manually entered, such as this example from Wurm’s (outdated) classification of Yele-Solomons in Figure 1:
Bilua Yele-Solomons, Central Solomon
Baniata Yele-Solomons, Central Solomon
Lavukaleve Yele-Solomons, Central Solomon
Savosavo Yele-Solomons, Central Solomon
Kazukuru Yele-Solomons, Kazukuru
Guliguli Yele-Solomons, Kazukuru
Dororo Yele-Solomons, Kazukuru
Yele Yele-Solomons
treemaker can then generate a Newick representation:
…which can then be loaded into phylogenetic programs to visualise or manipulate as in Figure 2.
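The conversion itself can be sketched in a few lines of plain Python. This is an illustrative stand-alone version under stated assumptions, not treemaker's implementation: the names `build_tree` and `to_newick` are hypothetical, each input line is assumed to hold a one-word taxon name followed by whitespace and a comma-separated classification, and groups with a single child are collapsed. Taxon order and other details of treemaker's real output may differ.

```python
TAXONOMY = """
Bilua Yele-Solomons, Central Solomon
Baniata Yele-Solomons, Central Solomon
Lavukaleve Yele-Solomons, Central Solomon
Savosavo Yele-Solomons, Central Solomon
Kazukuru Yele-Solomons, Kazukuru
Guliguli Yele-Solomons, Kazukuru
Dororo Yele-Solomons, Kazukuru
Yele Yele-Solomons
"""


def new_node():
    """A tree node: named subgroups plus the taxa attached directly to it."""
    return {"groups": {}, "taxa": []}


def build_tree(lines):
    """Build a nested tree from 'Taxon Group, Subgroup, ...' lines."""
    root = new_node()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the taxon name is the first whitespace-delimited token.
        taxon, classification = line.split(None, 1)
        node = root
        for group in classification.split(","):
            node = node["groups"].setdefault(group.strip(), new_node())
        node["taxa"].append(taxon)
    return root


def render(node):
    """Render a node as Newick, collapsing nodes that have a single child."""
    children = [render(child) for child in node["groups"].values()]
    children += node["taxa"]
    if len(children) == 1:
        return children[0]
    return "(" + ",".join(children) + ")"


def to_newick(root):
    """Return the whole tree as a semicolon-terminated Newick string."""
    return render(root) + ";"


print(to_newick(build_tree(TAXONOMY.splitlines())))
```

Group names such as "Central Solomon" are used only to decide the nesting and are not written as internal node labels in this sketch.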
treemaker has been used to enable the analyses in Bromham et al. (2018) and a number of forthcoming articles.

Installation:
Installation is only a pip install away:
Or from git:
Usage: Command line:
Basic usage:
For example, given a text file:
… then you can build a taxonomy/classification tree from that as follows:
To write to file:
Usage: Library:
Generate a tree manually:
Add from a list:
API Documentation:
The API is documented here.
Running treemaker’s tests:
To run treemaker’s tests simply run:
Version History:
Support:
For questions on how to use or update this, feel free to open an issue. I’ll get to it as soon as I can.
Acknowledgements:
Thank you to Richard Littauer, Mitsuhiro Nakamura, and Dillon Niederhut.
References: