Software tool to analyse STR loci from long read data. STRcount can count the number of repeats in a repeat expansion and give you the count in a tabular format for further downstream analysis.
Release notes
0.1.0: Initial release with tools updated on Pypi and ready to use.
Dependencies
Developed and tested on Python 3.7.10. Dependencies include:
The config file should be in the following format:
chr
begin
end
name
repeat
prefix
suffix
chr9
27573527
27573544
c9orf72
GGCCCC
<150bp_left_flank>
<150bp_right_flank>
This format is the same as the format used in STRique as found here.
The 150bp flanking region is suggested, longer flanks will require more time to execute, smaller flanks will take less time but the results may be less accurate.
Usage
If installed using pip or from source, you will be able to use it using STRcount else if you have installed to develop, you will be able to use it using python src/STRcount/STRcount.py
STRcount [-h] --reference REFERENCE --fastq FASTQ --config CONFIG
--output OUTPUT [--min-identity MIN_IDENTITY]
[--min-aligned-fraction MIN_ALIGNED_FRACTION]
[--write-non-spanned]
[--repeat_orientation REPEAT_ORIENTATION]
[--prefix_orientation PREFIX_ORIENTATION]
[--suffix_orientation SUFFIX_ORIENTATION]
[--cleanup CLEANUP] [--output_directory OUTPUT_DIRECTORY]
[--multiseed-DP MULTISEED_DP]
[--precise-clipping PRECISE_CLIPPING]
optional arguments:
-h, --help show this help message and exit
--reference REFERENCE
the reference from which the STR Graph will be
generated
--fastq FASTQ the baseaclled reads in fastq format
--config CONFIG the config file
--output OUTPUT the output file
--min-identity MIN_IDENTITY
only use reads with identity greater than this
--min-aligned-fraction MIN_ALIGNED_FRACTION
require alignments cover this proportion of the query
sequence
--write-non-spanned do not require the reads to span the prefix/suffix
region
--repeat_orientation REPEAT_ORIENTATION
the orientation of the repeat string. + or -
--prefix_orientation PREFIX_ORIENTATION
the orientation of the prefix, + or -
--suffix_orientation SUFFIX_ORIENTATION
the orientation of the suffix, + or -
--cleanup CLEANUP do you want to clean up the temporary file?
--output_directory OUTPUT_DIRECTORY
the output directory for all output and temporary
files
--multiseed-DP MULTISEED_DP
Aligner option
--precise-clipping PRECISE_CLIPPING
Aligner option: use arg as the identity threshold for
a valid alignment.
Output
The output is in a .tsv format that will look something like this:
| read_name | strand | spanned | count | align_score | identity | query_aligned_fraction |
| — | — | — | — | — | — | — |
read_name: The name of the read that is currently being proccessed
strand: The strand on which the primary alignment has been detected
spanned: If set to 1, it means that the read spanned the repeat locus and the flanking sequence
count: The number of repeat motifs detected at the locus for that particular read
align_score: The alignment score as given by GraphAligner
identity: The percentage identity as given by GraphAligner
query_aligned_fraction: This signifies how much of the query sequence is covered by the alignment
STRcount
Software tool to analyse STR loci from long read data. STRcount can count the number of repeats in a repeat expansion and give you the count in a tabular format for further downstream analysis.
Release notes
Dependencies
Developed and tested on Python 3.7.10. Dependencies include:
Installation instructions
Install GraphAligner
STRcount requires and uses GraphAligner as it’s alignment tool. To install this you could either install it using Anaconda:
conda install -c bioconda graphalignerOr you could install it from source using the instructions here
Installation using pip
Use the following command to install STRcount and all dependencies:
Installation from source
Installation to develop
To develop using STRcount, you will need to create a conda environment or python virtual environment, then perform the following steps:
Config file format
The config file should be in the following format:
Usage
If installed using pip or from source, you will be able to use it using
STRcountelse if you have installed to develop, you will be able to use it usingpython src/STRcount/STRcount.pyOutput
The output is in a
.tsvformat that will look something like this: | read_name | strand | spanned | count | align_score | identity | query_aligned_fraction | | — | — | — | — | — | — | — |Contact
Sabiq Chaudhary
License
MIT