The .sh and .py scripts in the halfdeep directory must be in your $PATH. One
way to accomplish that is to add this to your shell startup file (e.g. .bashrc):
export PATH=path_to_halfdeep:${PATH}
Expected directory layout
I’ve tried to set this all up to be similar to the VGP assembly pipeline, given
the fArcCen1 Assembly Tutorial and the pipeline scripts for minimap2.
Inputs are the assemblies and the fastq files (*.fasta.gz and *.fastq.gz in the
directory layout shown below). The assembly should have one of these file
extensions: .fa, .fasta, .fsa_nt, .fa.gz, .fasta.gz, or .fsa_nt.gz.
The final output is a halfdeep.dat file. This is a list of
intervals (origin 1, closed) that the process has
called as ‘covered at half depth’.
cd genomic_data/assembly_curated
minimap2 -x map-pb -d mBalMus1.pri.cur.20190618.idx mBalMus1.pri.cur.20190618.fasta.gz
... (and so on, for each assembly)
cd genomic_data
ls pacbio/*.fastq.gz > input.fofn
bam_depth.sh assembly_curated/mBalMus1.pri.cur.20190618.fasta.gz 1
bam_depth.sh assembly_curated/mBalMus1.pri.cur.20190618.fasta.gz 2
... (and so on, for 1..n where n is the number of fastq files)
halfdeep.sh assembly_curated/mBalMus1.pri.cur.20190618.fasta.gz
Bam_depth determines which fastq file by reading line i of input.fofn, where i
is the number given on the command line.
Plotting
Open R with the working directory at, e.g.
genomic_data/halfdeep/mBalMus1.pri.cur.20190618.
HalfDeep
Dependencies
minimap2
samtools
genodsp, version 0.0.8 or newer (https://github.com/rsharris/genodsp)
Those must all be installed and in your
$PATH.Installation
Copy the files from the repository:
The .sh and .py scripts in the halfdeep directory must be in your
$PATH. One way to accomplish that is to add this to your shell startup file (e.g. .bashrc):Expected directory layout
I’ve tried to set this all up to be similar to the VGP assembly pipeline, given the fArcCen1 Assembly Tutorial and the pipeline scripts for minimap2.
Inputs are the assemblies and the fastq files (*.fasta.gz and *.fastq.gz in the directory layout shown below). The assembly should have one of these file extensions: .fa, .fasta, .fsa_nt, .fa.gz, .fasta.gz, or .fsa_nt.gz.
The final output is a halfdeep.dat file. This is a list of intervals (origin 1, closed) that the process has called as ‘covered at half depth’.
Running
Bam_depth determines which fastq file by reading line i of input.fofn, where i is the number given on the command line.
Plotting
Open R with the working directory at, e.g. genomic_data/halfdeep/mBalMus1.pri.cur.20190618.
Then, in R:
To plot to the screen:
To plot to a file (currently only supports pdf):
To plot just a few scaffolds: