AS-IBDNe
Snakemake pipeline for running ancestry-specific IBDNe
1. Set Up:
Creating conda environment (first time only):
Activating conda environment (before running pipeline):
2. Required input files and structure of pipeline directory
Please create a working directory in which to run the Snakefile. The directory should be structured as follows. Items marked with an asterisk must be located exactly where specified, because the Snakefile expects them there. The locations of non-asterisked items are passed to the Snakefile by the user, so they can be placed anywhere in the working directory.
The input dataset to this pipeline is a single bim/bed/fam fileset containing both the admixed individuals and the reference individuals.
3. QC beforehand
4. Config file
The config file contains all the file paths that change with every run of the Snakemake pipeline.
Example file:
Explanation of config input parameters:
chr_gmap: path to the genetic map files for individual chromosomes. Each file has 3 space-delimited columns: (1) position (bp), (2) rate (cM/Mb), (3) map position (cM).
mincM: minimum length, in cM, of the IBD segments used by IBDNe.
colors: comma-delimited list of colors for the output plots (karyograms and IBDNe), assigned to ancestries in alphabetical order. For example, if the reference populations are CHB,GBR,LWK,NAMA, then the colors “green,purple,red,blue” will be assigned as CHB=green, GBR=purple, LWK=red, NAMA=blue.
5. Run Snakemake
# Dry run: always run first with the -n flag to check that the workflow will execute properly
nice /share/hennlab/progs/miniconda3/bin/snakemake --configfile config.yaml -j 20 -n
# Generate the rule graph (one node per rule; use --dag instead for the full job DAG)
/share/hennlab/progs/miniconda3/bin/snakemake --configfile config.yaml -j 20 -n --rulegraph | dot -Tpng > rulegraph.png
# Run pipeline
nice /share/hennlab/progs/miniconda3/bin/snakemake --configfile config.yaml -j 20
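A malformed chr_gmap file may only trigger an error partway through a run. The sketch below is a hypothetical helper, not part of the pipeline, that checks one map file against the three-column layout described for chr_gmap (position in bp, rate in cM/Mb, map position in cM) before you start the dry run. It assumes that bp positions are strictly increasing, that cM positions never decrease, and that a single non-numeric header line may be present; adjust these assumptions to match your map files.

```shell
# Hypothetical helper (not part of the pipeline): checks one chr_gmap file,
# assuming 3 space-delimited columns: bp position, rate (cM/Mb), map (cM).
# Prints "OK" on success, or the first problem found and returns 1.
check_gmap() {
  awk '
    # Skip a header line if the first field of line 1 is not an integer.
    NR == 1 && $1 !~ /^[0-9]+$/ { next }
    NF != 3 { print "line " NR ": expected 3 columns, found " NF; bad = 1; exit }
    seen && $1 + 0 <= prev_bp { print "line " NR ": bp positions not strictly increasing"; bad = 1; exit }
    seen && $3 + 0 < prev_cm { print "line " NR ": cM positions decrease"; bad = 1; exit }
    { prev_bp = $1 + 0; prev_cm = $3 + 0; seen = 1 }
    END {
      if (bad) exit 1
      if (!seen) { print "no data lines found"; exit 1 }
      print "OK"
    }
  ' "$1"
}

# Example (hypothetical directory and file names):
# for f in maps/chr*.map; do printf "%s: " "$f"; check_gmap "$f"; done
```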
6. Checking concordance between rfmix and admixture
If your reference panel has more than 3 populations, it's recommended that you run admixture and compare its results to the rfmix results as a check. To do this, use Alicia Martin’s script lai_global.py to convert the bed files output by the rule msp_to_bed into global ancestry proportions, and compare these to the admixture output Q files.
Alicia’s script: https://github.com/armartin/ancestry_pipeline#estimate-global-ancestry-proportions-from-local-ancestry-inference
Admixture tutorial: https://github.com/hennlab/training/blob/main/common_analyses/admixture_pong.md
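lai_global.py is the script to use for this step. Purely to illustrate what the comparison involves, the sketch below computes ancestry proportions for one individual from their local-ancestry bed files, weighting each segment by its length in bp, and reports the largest absolute difference from that individual's row in an admixture Q file (one row per individual, in .fam order). These are hypothetical helpers, not Alicia Martin's script. The bed column layout (chrom, start, end, ancestry) and the bp weighting are assumptions, and because admixture clusters are unlabeled, the Q columns must first be reordered by hand to match the alphabetical ancestry order.

```shell
# Hypothetical helper: global ancestry proportions from one individual's
# local-ancestry bed files (assumed columns: chrom start end ancestry ...).
# Each segment is weighted by its length in bp. Prints "ANCESTRY fraction"
# lines sorted alphabetically; prints nothing if no usable segments exist.
global_ancestry() {
  awk '
    # Only use lines whose start/end look like integer bp coordinates.
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      len = $3 - $2
      bp[$4] += len
      total += len
    }
    END {
      if (total <= 0) exit 1
      for (a in bp) printf "%s %.4f\n", a, bp[a] / total
    }
  ' "$@" | LC_ALL=C sort -k1,1
}

# Hypothetical helper: largest absolute difference between the proportions
# on stdin (as printed by global_ancestry) and one Q-file row passed as a
# string. Assumes the Q columns are already in the same ancestry order.
max_abs_diff() {
  awk -v q="$1" '
    BEGIN { split(q, qs, " ") }
    {
      d = $2 - qs[NR]
      if (d < 0) d = -d
      if (d > m) m = d
    }
    END { printf "%.4f\n", m }
  '
}

# Example (hypothetical file names; Q row 1 = first individual in the .fam):
# global_ancestry sample1_A.bed sample1_B.bed | max_abs_diff "$(sed -n 1p data.3.Q)"
```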
7. Output plots
This pipeline outputs an rfmix karyogram plot for every admixed individual and one IBDNe plot showing effective population size estimates for each reference population.
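The plot colors come from the colors config parameter, which is paired with the ancestries sorted alphabetically rather than in the order you happen to list them. As a hedged sketch (a hypothetical helper, not code from the pipeline), the following reproduces that pairing so you can preview which color each ancestry will get in the karyograms and the IBDNe plot. It assumes plain byte-order sorting (LC_ALL=C), which may differ from the pipeline's own sort for mixed-case population names, and it refuses to pair lists of different lengths.

```shell
# Hypothetical helper: pairs colors with ancestries sorted alphabetically.
# Usage: assign_colors "ANC1,ANC2,..." "color1,color2,..."
# Prints ANCESTRY=color lines; returns 1 if the list lengths differ.
assign_colors() {
  printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort |
    awk -v cols="$2" '
      BEGIN { n = split(cols, c, ",") }
      { anc[NR] = $0 }
      END {
        if (NR != n) {
          printf "error: %d ancestries but %d colors\n", NR, n > "/dev/stderr"
          exit 1
        }
        for (i = 1; i <= n; i++) print anc[i] "=" c[i]
      }
    '
}

# Example using the populations and colors from the config section:
# assign_colors "NAMA,LWK,CHB,GBR" "green,purple,red,blue"
# prints CHB=green, GBR=purple, LWK=red, NAMA=blue (one per line)
```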
8. Acknowledgements and sources: