Purpose: Calculate the Dosage-score of each genomic region from the NGS alignment data.
Required options
-r1 Fasta Full path of reference fasta file
-g Txt Genome information file
-b Txt BAM information file
-o Directry Output directory
Additional options
-r2 Fasta Full path of reference fasta file with repeat region replaced by N [None]
-l Link Link file generated by GetTwoGenomeSyn.pl in NGenomeSyn package [None]
-t Int Thread [1]
-minMQ Int Minimum mapping quality [60]
-minBQ Int Minimum base quality [13]
-window_size Int Sliding window size [2000000]
-step_size Int Step size [500000]
command : dosage_score_plot
Purpose: Plot the Dosage-score.
Required options
-r1 Fasta Full path of reference fasta file used to "dosage-score" command
-b Txt BAM information file used to "dosage-score" command
-o Directry Output directory used to "dosage-score" command
Additional options
-chr_position Str:Int:Int Specific chromosomal regions focused for plotting. [Whole genome]
-window_size Int Sliding window size used to "dosage-score" command [2000000]
-c Txt Plot color file [None]
-min_rate Float Excloud position less than minimum ratio (=analysis/window size) [0.05]
The input file format
Genome information file (-g)
This is a genome informational file sepalated tab.
Example:
A A01,A02,A03,A04,A05,A06,A07,A08,A09,A10
C C1,C2,C3,C4,C5,C6,C7,C8,C9
Colom 1: Genome name.
Colom 2: Chromosome names listed in reference fasta separated by “,”.
BAM information file (-b)
This is a sample informational file sepalated tab.
Example:
Rapa /Full/path/Rapa.bam Control A
Oleracea /Full/path/Oleracea.bam Control C
Napus /Full/path/Napus.bam Test A,C
BC1F1 /Full/path/BC1F1.bam Test A
Colom 1: Sample name.
Colom 2: Full path of BAM file.
Colom 3: Control or Test. “Control” samples are diploid species used ancistrial filter. Positions that exhibit coverage depth =0 in selected genomes and coverage depth >0 are unselected genomes are subsequently filtered out in “Control” samples.
Colom 4: The genome name entered in the 1st column of the Genome information file where the dose of the sample is expected to be 2. If the dose is expected to be 2 for multiple genomes such as allotetraploid, you can enter 2 genome separated by “,”. Dosage-score calculation uses median values for selected genomic regions for correction.
Reference fasta file with repeat region replaced by N (-ref2)
This is a fasta file with repeat region replaced by N used repeat filter. If a file is specified, N regions are excluded from the analysis.
Example:
Fasta file that combines those generated by repeat masker for each fasta file of each subgenome
Link file (-l)
This is a link file generated by GetTwoGenomeSyn.pl in NGenomeSyn used multi- and non-homologous filter.
If a file is specified, only the specified area will be used for analysis.
This file lists the chromosome regions to be analyzed based on the link file.
A01.1.2000000 etc. in 2_sliding_window_average/region_txt
This file is created for each window based on the reference array. The file name is the chromosome name and the start and end positions of the window. The regions where homolog regions are defined between subgenomes by link files are listed.
Colom 1: Chr name.
Colom 2: Start position.
Colom 3: End position.
A01.1.2000000 etc. in 2_sliding_window_average/sliging_window_total
This file is created for each window based on the reference array. The file name is the chromosome name and the start and end positions of the window.
Colom 1: Number of positions analyzed after filtering.
Colom 2~: Total coverage depth for each sample.
sliging_window_average.txt in 2_sliding_window_average
Average coverage depth list for each window. Windows with analysis position =0 are excluded by the filter.
Colom 1: Chr name.
Colom 2: Position.
Colom 3: Number of positions analyzed after filtering.
Colom 4: Average coverage depth for each sample.
dosage_score.txt in 3_dosage_score
Dosage-score list for each window.
Colom 1: Chr name.
Colom 2: Position.
Colom 3: Number of positions analyzed after filtering.
Colom 4: Dosage-score for each sample.
filtered_dosage_score.txt in 4_plot
A file listing only the windows where the number of analysis positions per window exceeds the min_rate value. Same format as 3_dosage_score/dosage_score.txt
Rapa.png etc. in 4_plot
A png file plotting the Dosage-score for each sample.
Dosage-score
Table of contents
Introduction of DNAMarkMaker
Dosage-score is a tool for estimating dosage of each genomic region from Next-Generation Sequencing (NGS) alignment data.
Installation
Dependencies
Installation using bioconda
You can install DNAMarkMaker using bioconda.
Alternatively, if you want to create DNAMarkMaker specific environment.
Usage
command : dosage_score
Purpose: Calculate the Dosage-score of each genomic region from the NGS alignment data.
Required options
Additional options
command : dosage_score_plot
Purpose: Plot the Dosage-score.
Required options
Additional options
The input file format
Genome information file (-g)
This is a genome informational file sepalated tab.
Example:
Colom 1: Genome name. Colom 2: Chromosome names listed in reference fasta separated by “,”.
BAM information file (-b)
This is a sample informational file sepalated tab.
Example:
Colom 1: Sample name. Colom 2: Full path of BAM file. Colom 3: Control or Test. “Control” samples are diploid species used ancistrial filter. Positions that exhibit coverage depth =0 in selected genomes and coverage depth >0 are unselected genomes are subsequently filtered out in “Control” samples. Colom 4: The genome name entered in the 1st column of the Genome information file where the dose of the sample is expected to be 2. If the dose is expected to be 2 for multiple genomes such as allotetraploid, you can enter 2 genome separated by “,”. Dosage-score calculation uses median values for selected genomic regions for correction.
Reference fasta file with repeat region replaced by N (-ref2)
This is a fasta file with repeat region replaced by N used repeat filter. If a file is specified, N regions are excluded from the analysis. Example:
Fasta file that combines those generated by repeat masker for each fasta file of each subgenome
Link file (-l)
This is a link file generated by GetTwoGenomeSyn.pl in NGenomeSyn used multi- and non-homologous filter. If a file is specified, only the specified area will be used for analysis.
Example:
Plot color file (-c)
This is a color informational file sepalated tab. You can specify a color for each chromosome when plotting. By default all plots are black.
Example:
Colom 1: Chr name. Colom 2: Html color code.
The example of execution
dosage-score
dosage-score-plot
The output file format
Output folder by example command
info.txt in log
This file describes the information entered
analysis_region.txt in 1_linked_regions
This file lists the chromosome regions to be analyzed based on the link file.
A01.1.2000000 etc. in 2_sliding_window_average/region_txt
This file is created for each window based on the reference array. The file name is the chromosome name and the start and end positions of the window. The regions where homolog regions are defined between subgenomes by link files are listed.
Colom 1: Chr name.
Colom 2: Start position.
Colom 3: End position.
A01.1.2000000 etc. in 2_sliding_window_average/sliging_window_total
This file is created for each window based on the reference array. The file name is the chromosome name and the start and end positions of the window.
Colom 1: Number of positions analyzed after filtering.
Colom 2~: Total coverage depth for each sample.
sliging_window_average.txt in 2_sliding_window_average
Average coverage depth list for each window. Windows with analysis position =0 are excluded by the filter.
Colom 1: Chr name.
Colom 2: Position.
Colom 3: Number of positions analyzed after filtering.
Colom 4: Average coverage depth for each sample.
dosage_score.txt in 3_dosage_score
Dosage-score list for each window.
Colom 1: Chr name.
Colom 2: Position.
Colom 3: Number of positions analyzed after filtering.
Colom 4: Dosage-score for each sample.
filtered_dosage_score.txt in 4_plot
A file listing only the windows where the number of analysis positions per window exceeds the min_rate value. Same format as 3_dosage_score/dosage_score.txt
Rapa.png etc. in 4_plot
A png file plotting the Dosage-score for each sample.