Hominid Ancestral Population analysis (HARP) mode (set “–HARP_flag” to 1 )
Install through Bioconda (The latest version 1.2.4):
conda install MARS
(Please ensure channels are properly setup for bioconda before installing)
MARS_step1 --help
MARS_step2 --help
# You can also check the below corresponding scripts for more details
Install through Github:
git clone https://github.com/maiziex/MARS.git
cd MARS
chmod +x install.sh
./install.sh
Dependencies through Github install:
MARS utilizes Python3, paftools, SAMtools, and minimap2. To be able to execute the above programs by typing their name on the command line, the program executables must be in one of the directories listed in the PATH environment variable (“.bashrc”).
Or you could just run “./install.sh” to install them, but make sure you have installed “conda” and “wget” first.
To use MARS, please also make sure install the below python packages through pip or conda. pip install Bio Conda install pysam Conda install pandas conda install minimap2 conda install openpyxl
–assembly_dir: “Aquila_results_4samples” is the input folder where you store the diploid assembled contig files for each sample by Aquila/Aquila_stLFR.
–ref_file: “refdata-GRCh38-2.1.0/fasta/genome.fa” is the human reference fasta (hg38) file.
–sample_list: ‘HG00250’,’HG00353’,’HG00512’ … are the sample names corresponding to your contig files, which is the prefix of the contig files.
*Optional parameters
–out_dir: default = ./MARS_step1_results, it is the folder name you can define to store the final results.
–SV_len: default = 20, it is the SV size you can define.
–num_threads: default = 2, it is the number of threads you can define to perform assembly-based variant calling, which corresponds to number of samples.
–chr_start –chr_end: For example: use “–chr_start 1 –chr_end 5” will perform results for chromsomes 1,2,3,4,5. Use “–chr_start 22 –chr_end 22” will only assembly chromosome 22.
Step 2: Merge SV and Generate the multiple samples alignment-based SV files
–in_dir: “MARS_step1_results” is the folder to store SV calling results from step1.
–assembly_dir: “Aquila_results_4samples” is the input folder where you store the diploid assembled contig files for each sample.
–ref_file: “refdata-GRCh38-2.1.0/fasta/genome.fa” is the human reference fasta (hg38) file.
–sample_list: ‘HG00250’,’HG00353’,’HG00512’… are the sample names corresponding to your contig files, which is the prefix of the contig files.
*Optional parameters
–out_dir: default = ./MARS_step2_results, it is the folder name you can define to store the final results.
–SV_len: default = 20, it is the SV size you can define.
–num_threads_bychr: default = 2, it is the number of chromosomes you can define to perform MARS parallely.
–chr_start –chr_end: For example: use “–chr_start 1 –chr_end 5” will perform results for chromsomes 1,2,3,4,5. Use “–chr_start 22 –chr_end 22” will only assembly chromosome 22.
–gnomad_flag_linked_snp; -gnomad_flag: default = 0. If flag set to 1, MARS will output linked dbSNP from gnomad for each SV.
–HARP_flag: default = 0. If flag set to 1, it will output SV with ancestral state: derived deletions and derived insertions.
–Ape_ref_list: If –HARP_flag set to 1, the users need to iniatize the Ape reference genomes they want to use. “Gorilla_gorilla_ref.fasta”, “pan_troglodytes_ref.fasta”, “pongo_abelii_ref.fasta”, and “macaca_mulatta_ref.fasta” are the reference fasta files for each Ape. Each reference file is seperately by comma (“,”)
Output files:
1. In the “MARS_step2_results/Final_tables” folder:
2. In the “MARS_step2_results/MSA_SV_results_chr*/MSA_SV_files” folder, it includes txt/html files for each SV:
SV with 10bp left and right flanking regions around breakpoints, for example a txt file called by “chr22_29430067_29430067_ins_SV_Ref_bk_2.txt”,
“chr22_29430067_29430067_ins” is the name of the SV which corresponds to the first field/column in “SV_msa_table_chr*.txt” file.
SV with around 500bp left and right flanking regions: a MSA html file:
a html file
Multiple Alignment-based Refinement of Svs (MARS)
Hominid Ancestral Population analysis (HARP) mode (set “–HARP_flag” to 1 )
Install through Bioconda (The latest version 1.2.4):
(Please ensure channels are properly setup for bioconda before installing)
Install through Github:
Dependencies through Github install:
MARS utilizes Python3, paftools, SAMtools, and minimap2. To be able to execute the above programs by typing their name on the command line, the program executables must be in one of the directories listed in the PATH environment variable (“.bashrc”).
Or you could just run “./install.sh” to install them, but make sure you have installed “conda” and “wget” first.
To use MARS, please also make sure install the below python packages through pip or conda.
pip install Bio
Conda install pysam
Conda install pandas
conda install minimap2
conda install openpyxl
Example data in Zenodo:
Please download the example data from Zenodo.
Running The Code:
Put the “MARS/bin” in the “.bashrc” file, and source the “.bashrc” file
Or use the fullpath of “MARS_step1.py” and “MARS_step2.py”
Step 1: Assembly-based structural variants calling for population
*Required parameters
–assembly_dir: “Aquila_results_4samples” is the input folder where you store the diploid assembled contig files for each sample by Aquila/Aquila_stLFR.
–ref_file: “refdata-GRCh38-2.1.0/fasta/genome.fa” is the human reference fasta (hg38) file.
–sample_list: ‘HG00250’,’HG00353’,’HG00512’ … are the sample names corresponding to your contig files, which is the prefix of the contig files.
*Optional parameters
–out_dir: default = ./MARS_step1_results, it is the folder name you can define to store the final results.
–SV_len: default = 20, it is the SV size you can define.
–num_threads: default = 2, it is the number of threads you can define to perform assembly-based variant calling, which corresponds to number of samples.
–chr_start –chr_end: For example: use “–chr_start 1 –chr_end 5” will perform results for chromsomes 1,2,3,4,5. Use “–chr_start 22 –chr_end 22” will only assembly chromosome 22.
Step 2: Merge SV and Generate the multiple samples alignment-based SV files
OR add “-gnomad_flag”
OR add “–HARP_flag”
OR add “–HARP_flag” and “-gnomad_flag”
*Required parameters
–in_dir: “MARS_step1_results” is the folder to store SV calling results from step1.
–assembly_dir: “Aquila_results_4samples” is the input folder where you store the diploid assembled contig files for each sample.
–ref_file: “refdata-GRCh38-2.1.0/fasta/genome.fa” is the human reference fasta (hg38) file.
–sample_list: ‘HG00250’,’HG00353’,’HG00512’… are the sample names corresponding to your contig files, which is the prefix of the contig files.
*Optional parameters
–out_dir: default = ./MARS_step2_results, it is the folder name you can define to store the final results.
–SV_len: default = 20, it is the SV size you can define.
–num_threads_bychr: default = 2, it is the number of chromosomes you can define to perform MARS parallely.
–chr_start –chr_end: For example: use “–chr_start 1 –chr_end 5” will perform results for chromsomes 1,2,3,4,5. Use “–chr_start 22 –chr_end 22” will only assembly chromosome 22.
–gnomad_flag_linked_snp; -gnomad_flag: default = 0. If flag set to 1, MARS will output linked dbSNP from gnomad for each SV.
–gnomad_dir: If “gnomad_flag” set to 1, the users need to download gnomad VCF files and give a path to the folder which stores these gnomAD VCF files. (For example: use “wget https://storage.googleapis.com/gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.10.liftover_grch38.vcf.bgz" to download gnomAD VCF file for chr10) For details, please check gnomAD downloads website)
–HARP_flag: default = 0. If flag set to 1, it will output SV with ancestral state: derived deletions and derived insertions.
–Ape_ref_list: If –HARP_flag set to 1, the users need to iniatize the Ape reference genomes they want to use. “Gorilla_gorilla_ref.fasta”, “pan_troglodytes_ref.fasta”, “pongo_abelii_ref.fasta”, and “macaca_mulatta_ref.fasta” are the reference fasta files for each Ape. Each reference file is seperately by comma (“,”)
Output files:
1. In the “MARS_step2_results/Final_tables” folder:
2. In the “MARS_step2_results/MSA_SV_results_chr*/MSA_SV_files” folder, it includes txt/html files for each SV:
Troubleshooting:
Please submit issues on the github page for MARS.