Yleaf-pipelines: Pipeline-optimized Y-chromosomal haplogroup inference from NGS data
Note: This is a fork of the original Yleaf software with customizations to facilitate integration with Nextflow and other bioinformatics pipelines.
Original authors: Arwin Ralf, Diego Montiel Gonzalez, Kaiyin Zhong and Manfred Kayser
Pipeline adaptation: Alaina Hardie (@trianglegrrl, Toronto, ON, Canada)
Department of Genetic Identification
Erasmus MC University Medical Centre Rotterdam, The Netherlands
Pipeline adaptations
This fork of Yleaf includes customizations that make it more easily integrated into Nextflow and other bioinformatics pipelines, including:
Command-line parameters to specify custom reference genomes (-fg, -yr)
Other pipeline-friendly modifications while maintaining the original functionality
These adaptations aim to streamline the integration of Yleaf into automated workflows while preserving the core functionality and accuracy of the original software.
Future plans
In the future, we will attempt to maintain all of the core functionality of the original Yleaf, with
optimizations for use in Nextflow pipelines.
We endeavour to stay in release sync with the official Yleaf repo. As such, our first release is
3.3.0, which includes the additional command-line parameters described above that are required for
basic smooth operation within pipelines.
Requirements
Operating system: Linux only.
Internet connection: required on the first run to download the reference genome. Alternatively, you
can configure your own references.
Data storage: we recommend more than 8 GB of free storage for installation.
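In an automated pipeline these requirements can be checked before Yleaf runs. The following is a minimal sketch and not part of Yleaf itself; the 8 GB threshold comes from the recommendation above, and the variable names are hypothetical:

```shell
# Pre-flight check sketch (not part of Yleaf): operating system and free disk space.
required_kb=$((8 * 1024 * 1024))   # 8 GB expressed in 1024-byte blocks

os="$(uname -s)"
if [ "$os" != "Linux" ]; then
    echo "warning: Yleaf supports Linux only, found $os" >&2
fi

# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is the available space.
free_kb="$(df -Pk . | awk 'NR==2 {print $4}')"
if [ "${free_kb:-0}" -lt "$required_kb" ]; then
    echo "warning: less than 8 GB free in $(pwd)" >&2
fi
```

The check only warns, so a pipeline can decide for itself whether to stop.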
Installation
The easiest way to get Yleaf up and running is by using a conda environment.
# first clone this repository to get the environment_yleaf.yaml
git clone https://github.com/genid/Yleaf.git
cd Yleaf
# create the conda environment from the .yaml file; the environment will be called yleaf
conda env create --file environment_yleaf.yaml
# activate the environment
conda activate yleaf
# pip install the cloned yleaf into your environment. Using the -e flag allows you to modify the config file in your cloned folder
pip install -e .
# verify that Yleaf is installed correctly. You can call this command from any directory on your system
Yleaf -h
Alternatively, install everything manually:
# install python and libraries
apt-get install python3.6
pip3 install pandas
pip3 install numpy
# install minimap2, the aligner used for FASTQ files
sudo apt-get install minimap2
# install SAMtools
wget https://github.com/samtools/samtools/releases/download/1.4.1/samtools-1.4.1.tar.bz2 -O samtools.tar.bz2
tar -xjvf samtools.tar.bz2
cd samtools-1.4.1/
./configure
make
make install
# clone the yleaf repository
git clone https://github.com/genid/Yleaf.git
# pip install the yleaf repository
cd Yleaf
pip install -e .
# verify that Yleaf is installed correctly. You can call this command from any directory on your system
Yleaf -h
After installation you can open the yleaf/config.txt file and add custom paths for the files listed there. Yleaf will then either skip the download on the first run, if the files already exist at those paths, or download the files to the locations you provided. This also lets you use a custom reference. Keep in mind that custom reference files may cause problems, particularly in combination with existing data files. Positions are based on either hg38 or hg19.
Usage and examples
Below are minimal working examples of how to use Yleaf with different input files. Additional options
let you tune how strict Yleaf is, report private mutations, and draw a graph showing
the position of the predicted haplogroups of all your samples in the haplogroup tree.
Note: In version 3.0 we switched to using YFull (v10.01) for the underlying tree structure of the haplogroups.
This also means that predictions are a bit different compared to earlier versions.
Yleaf: FASTQ (raw reads)
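A minimal sketch of a FASTQ run, assuming the `-fastq`, `-o` and `--reference_genome` options of upstream Yleaf 3.x and hypothetical file names:

```shell
# Hypothetical input and output names; adjust to your data.
fastq="raw_reads.fastq"
outdir="fastq_output"

# Assumed upstream Yleaf 3.x options: -fastq, -o, --reference_genome (hg19 or hg38).
yleaf_cmd="Yleaf -fastq $fastq -o $outdir --reference_genome hg38"

# Run only when Yleaf is on PATH and the input exists; otherwise print the command.
if command -v Yleaf >/dev/null 2>&1 && [ -f "$fastq" ]; then
    # Unquoted on purpose so the string splits into arguments (paths must not contain spaces).
    $yleaf_cmd
else
    echo "dry run: $yleaf_cmd"
fi
```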
Yleaf: BAM or CRAM format
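For aligned input, a hedged sketch that picks the input flag from the file extension. The `-bam` and `-cram` options are assumed from upstream Yleaf 3.x, and the file names are hypothetical:

```shell
# Hypothetical aligned input; change to your own BAM or CRAM file.
input="file.bam"
outdir="bam_output"

# Choose the input flag from the extension (-bam and -cram are assumed upstream options).
case "$input" in
    *.bam)  input_flag="-bam" ;;
    *.cram) input_flag="-cram" ;;
    *)      echo "unsupported input: $input" >&2; input_flag="" ;;
esac

yleaf_cmd="Yleaf $input_flag $input -o $outdir --reference_genome hg19"

# Run only when Yleaf is on PATH and the input exists; otherwise print the command.
if command -v Yleaf >/dev/null 2>&1 && [ -n "$input_flag" ] && [ -f "$input" ]; then
    # Unquoted on purpose so the string splits into arguments.
    $yleaf_cmd
else
    echo "dry run: $yleaf_cmd"
fi
```

Set `--reference_genome` to the build your reads were aligned against, since positions are based on either hg38 or hg19.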
With drawing predicted haplogroups in a tree and showing all private mutations
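A sketch of the same BAM run with the two extra outputs switched on. The short flags `-dh` (draw haplogroups) and `-p` (private mutations) are assumed from upstream Yleaf 3.x; confirm them with `Yleaf -h` before relying on them:

```shell
# Hypothetical input and output names.
input="file.bam"
outdir="bam_output"

# Assumed upstream flags: -dh draws the haplogroup tree, -p reports private mutations.
extra_flags="-dh -p"

yleaf_cmd="Yleaf -bam $input -o $outdir --reference_genome hg19 $extra_flags"

# Run only when Yleaf is on PATH and the input exists; otherwise print the command.
if command -v Yleaf >/dev/null 2>&1 && [ -f "$input" ]; then
    # Unquoted on purpose so the string splits into arguments.
    $yleaf_cmd
else
    echo "dry run: $yleaf_cmd"
fi
```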
Using custom reference genomes
You can specify custom reference genomes instead of using the default downloaded ones:
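A minimal sketch with hypothetical paths; the `-bam`, `-o` and `--reference_genome` options are assumed from upstream Yleaf 3.x, while `-fg` and `-yr` are the parameters added by this fork:

```shell
# Hypothetical paths to the custom references.
full_ref="/path/to/custom/full_genome.fa"
y_ref="/path/to/custom/y_chromosome.fa"

# Both references must carry a FASTA extension (.fa, .fasta or .fna).
refs_ok=1
for ref in "$full_ref" "$y_ref"; do
    case "$ref" in
        *.fa|*.fasta|*.fna) ;;
        *) echo "not a FASTA file name: $ref" >&2; refs_ok=0 ;;
    esac
done

yleaf_cmd="Yleaf -bam file.bam -o output_dir --reference_genome hg38 -fg $full_ref -yr $y_ref"

# Run only when Yleaf and all files are available; otherwise print the command.
if command -v Yleaf >/dev/null 2>&1 && [ "$refs_ok" -eq 1 ] \
    && [ -f file.bam ] && [ -f "$full_ref" ] && [ -f "$y_ref" ]; then
    # Unquoted on purpose so the string splits into arguments.
    $yleaf_cmd
else
    echo "dry run: $yleaf_cmd"
fi
```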
Where:
-fg or --full_genome_reference specifies the path to a custom full genome reference file
-yr or --y_chromosome_reference specifies the path to a custom Y chromosome reference file
Both references must be in FASTA format (.fa, .fasta, or .fna).
Extracting Y chromosome from a reference genome
If you have a full genome reference but need to extract just the Y chromosome, use the included extraction tool:
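As an alternative sketch that relies only on SAMtools rather than the included tool, `samtools faidx` can pull a single contig out of a FASTA file. The file names are hypothetical, and the contig name (`chrY` here, `Y` in some assemblies) depends on your reference:

```shell
# Hypothetical file names; set y_contig to the Y chromosome name used in your reference.
genome="full_genome.fa"
y_contig="chrY"
y_out="y_chromosome.fa"

if command -v samtools >/dev/null 2>&1 && [ -f "$genome" ]; then
    # faidx prints the requested contig to stdout in FASTA format.
    samtools faidx "$genome" "$y_contig" > "$y_out"
else
    echo "dry run: samtools faidx $genome $y_contig > $y_out"
fi
```

The resulting file can then be passed to Yleaf with `-yr`.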
Additional information
For a more comprehensive manual please have a look at the yleaf_manual.
If you have a bug to report or a question about installation, consider sending an email to a.ralf at erasmusmc.nl or creating an issue on GitHub.
References and Supporting Information
A. Ralf, et al., Yleaf: software for human Y-chromosomal haplogroup inference from next generation sequencing data (2018).
https://academic.oup.com/mbe/article/35/5/1291/4922696
Acknowledgments
All credit for the original Yleaf software and methodology goes to Arwin Ralf, Diego Montiel Gonzalez, Kaiyin Zhong, Manfred Kayser and the Department of Genetic Identification at Erasmus MC University Medical Centre Rotterdam. This fork builds upon their excellent work to enhance pipeline compatibility.