OrthoFinder identifies orthogroups, infers gene trees for all orthogroups, and analyzes the gene trees to identify the rooted species tree. The method subsequently identifies all gene duplication events in the complete set of gene trees, and analyses them at both gene tree and species tree level. OrthoFinder further analyzes all of this phylogenetic information to identify the complete set of orthologs between all species, and provides extensive comparative genomics statistics.
The following commands download the OrthoFinder source code into a local directory named OrthoFinder; use whichever option suits your system.
# Download via git
git clone https://github.com/OrthoFinder/OrthoFinder.git
# Or, on a Linux Intel machine, download orthofinder-linux-intel-3.1.4.tar.gz and unpack it into OrthoFinder
mkdir OrthoFinder && \
wget -qO- https://github.com/OrthoFinder/OrthoFinder/releases/download/v3.1.4/orthofinder-linux-intel-3.1.4.tar.gz | \
tar -xz --strip-components=1 -C OrthoFinder
Next, run the following commands to install OrthoFinder inside the of3_env virtual environment.
cd OrthoFinder
python3 -m venv of3_env # Create a virtual environment named of3_env
. of3_env/bin/activate # Activate of3_env
pip install .
Whether you’ve installed OrthoFinder directly from GitHub or downloaded and set it up locally, the OrthoFinder package will only be available within the of3_env virtual environment. This avoids potential conflicts with Python dependencies.
To deactivate the virtual environment when you are finished, run:
deactivate
To activate the virtual environment you have created, run:
. of3_env/bin/activate
Test your installation
Once you have installed OrthoFinder, you can print the help information and version, and test it on the example data.
orthofinder --help # Print out help information
orthofinder --version # Check the version
orthofinder -f ExampleData # Test OrthoFinder on an example dataset - this should take a few minutes to run.
Uninstalling
To uninstall on conda:
conda deactivate
conda remove -n of3_env --all
To remove the virtual environment where OrthoFinder is installed:
deactivate
cd ..
rm -rf OrthoFinder
Simple Usage
Run OrthoFinder on FASTA format proteomes in <dir>
orthofinder [options] -f <dir>
OrthoFinder requires one FASTA file for each species. Each file should contain the complete set of protein sequences from that species’ genome, with a single representative sequence for each gene.
If your files contain multiple transcript variants for each gene, we provide a script, primary_transcript.py, to extract the longest variant per gene. Run this script on your files before running OrthoFinder:
for f in *fa ; do python primary_transcript.py "$f" ; done
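As a quick check before a run, the sketch below counts the sequences in a FASTA file and flags sequence IDs that occur more than once. It is not part of OrthoFinder: the function names are hypothetical, and treating the first word of each header line as the sequence ID is an assumption made for illustration.

```python
from collections import Counter


def sequence_ids(lines):
    """Yield the ID (first word after '>') of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # An empty header is reported as an empty ID rather than skipped.
            yield fields[0] if fields else ""


def summarise_fasta(lines):
    """Return (number of sequences, sorted list of IDs seen more than once)."""
    counts = Counter(sequence_ids(lines))
    duplicates = sorted(seq_id for seq_id, n in counts.items() if n > 1)
    return sum(counts.values()), duplicates
```

Any iterable of lines works, so an open file handle can be passed directly, for example `summarise_fasta(open("species.fa"))` for a hypothetical file species.fa.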
Advanced Usage - Scaling to Thousands of Species
If you are analysing >100 species, we recommend that you use the scalable implementation.
Add the files for 64 species into one directory <core>
Add the remaining files into another directory <additional>
First, run OrthoFinder on the subset of 64 species
orthofinder [options] -f <core>
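Then, add the additional species to the results of the core run
orthofinder [options] --assign <additional> --core <core_results>
Here <core_results> stands for the directory holding the results of the core run.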
To choose which 64 species to include in the core, aim to capture a broad range of the evolutionary diversity of your species.
Note that this workflow requires the core species to be run with the multiple sequence alignment option (-M msa). You cannot add species to OrthoFinder results that were produced with the -M dendroblast option, which was the default in OrthoFinder 2.
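Once you have decided which species form the core, the files still have to be divided between the two directories. The following sketch only plans that split and does not move anything on disk. The function name is hypothetical, and it assumes each species is identified by the stem of its FASTA file name; the selection of the core species itself remains a biological judgement.

```python
from pathlib import Path


def plan_core_split(fasta_files, core_species):
    """Partition FASTA paths into (core, additional) lists by file stem."""
    core_species = set(core_species)
    stems = {Path(f).stem for f in fasta_files}
    missing = core_species - stems
    if missing:
        # Fail early if a chosen core species has no input file.
        raise ValueError(f"core species without a FASTA file: {sorted(missing)}")
    core = sorted(f for f in fasta_files if Path(f).stem in core_species)
    additional = sorted(f for f in fasta_files if Path(f).stem not in core_species)
    return core, additional
```

The two returned lists correspond to the contents of <core> and <additional> respectively.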
Command-line options
Command-line options for OrthoFinder
Adding additional species
| Parameter | Description |
|---|---|
| --assign <dir1> --core <dir2> | Assign species from <dir1> to existing orthogroups in <dir2>. |
Workflow restart options
| Parameter | Description |
|---|---|
| -b <dir> | Start OrthoFinder from pre-computed BLAST results in <dir>. |
Other options
| Parameter | Description |
|---|---|
| -1 | Only perform one-way sequence search. |
| -z | Don’t trim MSAs (columns >= 90% gap, min. alignment length 500). |
| -y | Split paralogous clades below the root of a HOG into separate HOGs. |
| -h | Print this help text. |
| -v | Print version. |
Output files
From OrthoFinder v3.1.4, N0.tsv is removed from /Phylogenetic_Hierarchical_Orthogroups. Instead, Orthogroups/Orthogroups.tsv contains the orthogroups from N0.tsv.
A standard OrthoFinder run produces a set of files describing the orthogroups, orthologs, gene trees, resolved gene trees, the rooted species tree, gene duplication events, and comparative genomic statistics for the set of species being analysed. These files are located in an intuitive directory structure.
Full details on the output files and directories can be found here. The directories that are useful for most users are
/Orthogroups
Orthogroups.tsv is the main orthogroup file. Each row contains the genes belonging to a single orthogroup. The genes from each orthogroup are organized into columns, one per species.
Orthogroups.txt is a text file with each line showing the genes in a single orthogroup. It differs from Orthogroups.tsv in that it doesn’t show the species which each gene belongs to.
Orthogroups.GeneCount.tsv is a tab separated text file that contains counts of the number of genes for each species in each orthogroup.
Orthogroups_SingleCopyOrthologues.txt is a list of orthogroups that contain exactly one gene per species.
Orthogroups_UnassignedGenes.tsv is a tab separated text file that contains all of the genes that were not assigned to any orthogroup.
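The orthogroup table can be loaded with a few lines of Python. The sketch below is a minimal illustration, not OrthoFinder code. It assumes the first column holds the orthogroup ID, every remaining column is a species, and the genes within a cell are separated by commas; if your version of Orthogroups.tsv carries extra leading columns, adjust the slicing accordingly. The function names are hypothetical.

```python
import csv


def read_orthogroups(lines):
    """Parse a tab-separated orthogroup table into {orthogroup: {species: [genes]}}."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    species = header[1:]  # assumption: all columns after the first are species
    orthogroups = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so that every species has a (possibly empty) cell.
        cells = row[1:] + [""] * (len(species) - len(row) + 1)
        orthogroups[row[0]] = {
            sp: [gene.strip() for gene in cell.split(",") if gene.strip()]
            for sp, cell in zip(species, cells)
        }
    return orthogroups


def single_copy(orthogroups):
    """Return the orthogroups with exactly one gene in every species."""
    return sorted(
        og for og, by_species in orthogroups.items()
        if all(len(genes) == 1 for genes in by_species.values())
    )
```

Passing an open file handle for Orthogroups.tsv to `read_orthogroups` and the result to `single_copy` should give the same kind of list that Orthogroups_SingleCopyOrthologues.txt describes, under the stated assumptions.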
/Phylogenetic_Hierarchical_Orthogroups
Each file contains the phylogenetic hierarchical orthogroups (HOGs) for a different node of the species tree.
Each row of a file contains the genes belonging to a single orthogroup.
Each species is represented by a single column.
N0.tsv from the old version is now Orthogroups/Orthogroups.tsv
/Orthologues
Each species has a sub-directory that in turn contains a file for each pairwise species comparison, listing the orthologs between that species pair.
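To summarise one of these pairwise files, you could count how many rows are one-to-one, one-to-many, and so on. The sketch below assumes a tab-separated layout with a header row, an orthogroup ID in the first column, and the comma-separated genes of the two species in the second and third columns. That layout is an assumption to verify against your own output, and the function name is hypothetical.

```python
import csv
from collections import Counter


def classify_orthologs(lines):
    """Count rows by relationship type, e.g. 'one-to-one' or 'one-to-many'."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        if len(row) < 3:
            continue
        n_a = sum(1 for gene in row[1].split(",") if gene.strip())
        n_b = sum(1 for gene in row[2].split(",") if gene.strip())
        if n_a == 0 or n_b == 0:
            continue  # ignore incomplete rows
        kind = ("one" if n_a == 1 else "many") + "-to-" + ("one" if n_b == 1 else "many")
        counts[kind] += 1
    return counts
```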
/Comparative_Genomics_Statistics
Files containing summary statistics across all orthogroups, as well as comparisons between each pair of species.
/Resolved_Gene_Trees
A rooted phylogenetic tree inferred for each orthogroup with 4 or more sequences and resolved using the OrthoFinder hybrid species-overlap/duplication-loss coalescent model.
/Species_Tree
SpeciesTree_rooted.txt is a species tree inferred using STAG or ASTRAL-Pro.
SpeciesTree_rooted_node_labels.txt is the same tree, but with node labels instead of support values. This labelled version is useful for interpreting and analysing the results of the gene duplication analyses.
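For a quick look at the species and node labels in such a tree, a regular expression over the Newick string is often enough. The sketch below is a deliberately simple illustration: it assumes unquoted labels without spaces or brackets, and the tree in the usage note is invented. For anything more involved, a proper tree library is the safer choice.

```python
import re

# A leaf label directly follows '(' or ','; an internal label directly follows ')'.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")
INTERNAL = re.compile(r"\)\s*([^(),:;\s]+)")


def newick_labels(newick):
    """Return (leaf labels, internal node labels) in the order they appear."""
    return LEAF.findall(newick), INTERNAL.findall(newick)
```

For the made-up tree `((A:0.1,B:0.2)N1:0.3,C:0.4)N0;` this returns the leaves A, B, C and the internal labels N1, N0. Applied to a tree that stores support values at internal nodes, the second list would hold those values instead.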
/Gene_Duplication_Events
Duplications.tsv has a row for each gene duplication event, with information on the orthogroup in which it occurred, the species that contain the duplicated gene, the node in the species tree on which the gene duplication event occurred, and the support score for the gene duplication event.
SpeciesTree_Gene_Duplications_0.5_Support.txt provides a summation of the above duplications over the branches of the species tree.
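A per-node tally of this kind can be sketched from Duplications.tsv directly. The column names "Species Tree Node" and "Support" used below are assumptions about the file header, so check them against your own file and pass different names if needed. The 0.5 threshold mirrors the value in the file name above, and the function name is hypothetical.

```python
import csv
from collections import Counter


def duplications_per_node(lines, min_support=0.5,
                          node_col="Species Tree Node", support_col="Support"):
    """Count duplication events per species tree node with support >= min_support."""
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row[support_col]) >= min_support:
            counts[row[node_col]] += 1
    return counts
```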
/Orthogroup_Sequences
A FASTA file for each orthogroup giving the amino acid sequences for each gene in the orthogroup.
Latest additions
The current version of OrthoFinder has several major changes compared to OrthoFinder version 2 (Emms & Kelly 2019).
New workflow for scalability
The --core --assign workflow uses the SHOOT algorithm to create profiles for previously computed orthogroups, and adds new genes to these orthogroups without requiring a costly all-versus-all sequence search. Genes that cannot be assigned using the SHOOT approach are analysed using a standard OrthoFinder workflow.
Phylogenetic Hierarchical Orthogroups
OrthoFinder has now extended its phylogenetic approach to orthogroups, allowing orthogroups to be defined for each node within the species tree. This significantly increases the accuracy of orthogroups, and enables users to perform orthogroup analyses for any clade of species in the species tree.
Citation
Latest
[1] David M Emms, Yi Liu, Laurence Belcher, Jonathan Holmes, Steven Kelly, 2025. OrthoFinder: scalable phylogenetic orthology inference for comparative genomics. bioRxiv.
Introduced the SHOOT method to perform phylogenetic gene search
[2] Emms, D.M., Kelly, S. SHOOT: phylogenetic gene search and ortholog inference. Genome Biol 23, 85 (2022).
Introduced the phylogenetic inference of orthologs, including rooted gene and species trees, and gene duplication events
[3] Emms, D.M., Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20, 238 (2019).
Introduced the STRIDE method to root an unrooted species tree.
[4] Emms DM, Kelly S. STRIDE: Species Tree Root Inference from Gene Duplication Events. Mol Biol Evol. 2017 Dec 1;34(12):3267-3278.
Introduced the STAG method of species tree inference
[5] D.M. Emms, S. Kelly, 2017. STAG: Species Tree Inference from All Genes. bioRxiv.
Introduced the orthogroup inference method
[6] Emms, D.M., Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16, 157 (2015).
System Requirements
Operating system
OrthoFinder was designed to run on Linux (including WSL2).
We have tested OrthoFinder v3.1 on Debian 12.9, CentOS 8, macOS 14.4.1, and macOS 13.2.1.
Dependencies
Python >=3.11
Diamond >=2.1.7,<2.2
Famsa >=2.2.3
Fasttree >=2.1.11
Numpy >=2.3.2
Scipy >=1.16
Biopython >=1.85
Rich >=14.1.0
Scikit-learn >=1.7.1
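Before installing, you may want to confirm that the interpreter meets the minimum Python version listed above and that the external programs can be found on your PATH. The sketch below uses only the standard library. The function names are hypothetical, and the executable names in the usage note are assumptions, since the names installed on your system may differ.

```python
import shutil
import sys


def python_ok(minimum=(3, 11), version=None):
    """Return True if the given (or running) Python version meets the minimum."""
    current = tuple(sys.version_info[:2] if version is None else version)
    return current >= tuple(minimum)


def missing_executables(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]
```

For example, `missing_executables(["diamond", "famsa", "FastTree"])` returns whichever of those assumed executable names are not installed.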
Meet the team
OrthoFinder was developed by David Emms & Steve Kelly
OrthoFinder



For more information please visit our website.
Installation
Install in conda (recommended)
The simplest way to install OrthoFinder is through conda. If you’re unfamiliar with conda, this tutorial offers a beginner-friendly introduction.
Alternatively, you can install via GitHub, or download the source code and install locally.
Command-line options for OrthoFinder
Method choices
| Parameter | Default | Options |
|---|---|---|
| -M | msa | dendroblast, msa |
| -S | diamond | blast, diamond, diamond_ultra_sens, blastp, mmseqs, blastn |
| -A (with -M msa) | famsa | famsa, mafft, muscle |
| -T (with -M msa) | fasttree | fasttree, fasttree_fastest, raxml, iqtree |
| -I | 1.2 | 1-10 |
Input options
-d, -s
Output options
-X, -n <txt>, -o <txt>
Parallel processing options
| Parameter | Default |
|---|---|
| -t | All available |
| -a | 16 or t/8 (whichever lower) |
Workflow stopping options
-op
Current members of the OrthoFinder team:
Yi Liu, Jonathan Holmes, Laurie Belcher