unitig-counter
Uses a compressed de Bruijn graph (implemented in GATB) to count unitigs in bacterial populations.
Details
This is a slightly modified version of the unitig and graph steps in DBGWAS software, repurposed for input into pyseer.
NB: We cannot offer support for unitig-counter; it is provided ‘as-is’. Please consider using unitig-caller instead, which offers the same functionality.
Citation
If you use this, please cite the DBGWAS paper:
Jaillard M., Lima L. et al. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLOS Genetics. 14, e1007758 (2018). doi:10.1371/journal.pgen.1007758.
List of changes
Changes the format of the output from step1 from bugwas matrix to pyseer input (Rtab or kmers).
[…] step2 and step3 in DBGWAS.
Usage
Run:
Where strain_list.txt is a list of input files (assemblies) with a header, for example:
ID Path
6925_1_49 assemblies/6925_1#49.contigs_velvet.fa
6925_1_50 assemblies/6925_1#50.contigs_velvet.fa
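If you have a directory of assemblies, a small script can write this file for you. The following is a minimal sketch, not part of unitig-counter: the function name `build_strain_list` is hypothetical, the tab separator is an assumption (the example above only shows whitespace), and the ID rule (file name up to the first dot, with `#` replaced by `_`) is inferred from the example rows.

```python
from pathlib import Path


def build_strain_list(assembly_dir, out_path="strain_list.txt", pattern="*.fa", sep="\t"):
    """Write a two-column strain list (ID, Path) with a header line.

    Assumptions (not confirmed by the tool's documentation):
    columns are separated by `sep`, and each ID is the file name up to
    the first dot with '#' replaced by '_'.
    """
    rows = []
    for fasta in sorted(Path(assembly_dir).glob(pattern)):
        sample_id = fasta.name.split(".")[0].replace("#", "_")
        rows.append((sample_id, str(fasta)))

    # Refuse to write a list in which two files map to the same ID.
    ids = [sample_id for sample_id, _ in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate sample IDs derived from file names")

    with open(out_path, "w") as handle:
        handle.write(f"ID{sep}Path\n")
        for sample_id, path in rows:
            handle.write(f"{sample_id}{sep}{path}\n")
    return rows
```

Calling `build_strain_list("assemblies")` from the directory above `assemblies/` should give rows like those shown in the example.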
Output is in output/unitigs.txt and can be used with --kmers in pyseer. You can also test just the
unique patterns in output/unitigs.unique_rows.txt with the --Rtab option.
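To illustrate what "unique patterns" means, the sketch below groups unitigs that are present in exactly the same set of samples. It assumes each line of unitigs.txt looks like `SEQUENCE | sampleA:1 sampleB:1`, which is the k-mer layout pyseer reads; this layout is an assumption here, so check a few lines of your own output first. The function names are hypothetical.

```python
from collections import defaultdict


def parse_unitig_line(line):
    """Split 'SEQUENCE | sampleA:1 sampleB:1' into (sequence, frozenset of samples).

    The line layout is an assumption about the output format.
    """
    sequence, _, calls = line.strip().partition("|")
    # Drop the trailing ':count' from each call to keep only the sample name.
    samples = frozenset(call.rsplit(":", 1)[0] for call in calls.split())
    return sequence.strip(), samples


def group_by_pattern(lines):
    """Map each distinct presence pattern to the unitigs that share it."""
    patterns = defaultdict(list)
    for line in lines:
        if line.strip():
            sequence, samples = parse_unitig_line(line)
            patterns[samples].append(sequence)
    return dict(patterns)
```

Unitigs that share a pattern are indistinguishable in an association test, which is why testing one row per pattern reduces the number of tests.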
Cleaning up output
Some unitigs in the output may span multiple input contigs. If you wish to restrict your unitig calls to those appearing in assembled contigs, you can either:
Run unitig-caller on the input genomes, using the unitig calls from your run.
Run the script in the gatb/bcalm package, which will cut unitigs that span multiple contigs.
Thanks to @rchikhi and @apredeus for discovering and fixing this.
Extracting distances
To get the shortest sequence distance between two unitigs:
Extending unitigs
Short unitigs can be extended by following paths in the graph to neighbouring nodes. This can help map
sequences which on their own are difficult to align unambiguously.
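As a rough illustration of the idea (not the tool's actual implementation), the sketch below walks from one node to its successors in a toy graph and builds the extended sequences. It assumes adjacent nodes overlap by k-1 bases, ignores reverse complements, and uses a hypothetical `successors` dictionary in place of the real graph.

```python
def extend_right(start, successors, k, max_length):
    """Enumerate sequences obtained by walking from `start` to successor nodes.

    Adjacent nodes are assumed to overlap by k-1 bases, so each step
    appends node[k-1:]. A walk stops at a dead end, at a node already on
    the walk, or once the sequence has reached max_length.
    """
    results = []
    stack = [(start, start, {start})]  # (current node, sequence so far, visited nodes)
    while stack:
        node, sequence, visited = stack.pop()
        next_nodes = [n for n in successors.get(node, []) if n not in visited]
        if not next_nodes or len(sequence) >= max_length:
            if sequence != start:
                results.append(sequence)
            continue
        for nxt in next_nodes:
            stack.append((nxt, sequence + nxt[k - 1:], visited | {nxt}))
    return sorted(results)


# Toy graph with k = 4: each successor starts with the last 3 bases of its parent.
toy_successors = {
    "ACGTA": ["GTACC", "GTATT"],
    "GTACC": ["ACCGG"],
}
extensions = extend_right("ACGTA", toy_successors, k=4, max_length=20)
```

A branching node gives more than one possible extension, which is why a single input unitig can have several candidates.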
Create a file unitigs.txt with the unitigs to extend (probably your significantly associated hits)
and run:
The output extended.txt will contain possible extensions, comma separated, with lines corresponding to unitigs
in the input. See the help for more options.
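Because the lines of extended.txt correspond to the lines of unitigs.txt, the two files can be read side by side. The following is a minimal sketch under that assumption, with one unitig per input line and an empty output line meaning no extension was found; the function names are hypothetical.

```python
def pair_extensions(unitig_lines, extended_lines):
    """Pair each input unitig with the extensions on the matching output line."""
    unitigs = [line.strip() for line in unitig_lines]
    extended = [line.strip() for line in extended_lines]
    if len(unitigs) != len(extended):
        raise ValueError("unitigs and extensions have different numbers of lines")

    pairs = []
    for unitig, extension_field in zip(unitigs, extended):
        # Extensions are comma separated; an empty field yields an empty list.
        extensions = [seq for seq in extension_field.split(",") if seq]
        pairs.append((unitig, extensions))
    return pairs


def longest_extension(unitig, extensions):
    """Return the longest extension, or the unitig itself if there are none."""
    return max(extensions, key=len, default=unitig)
```

Both arguments can be open file handles, for example `pair_extensions(open("unitigs.txt"), open("extended.txt"))`. Taking the longest extension is only one possible choice for mapping; you may prefer to map every candidate.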
Python
A similar Python script can be found in unitig-graph:
Install
Recommended installation is through conda:
If the package cannot be found, ensure your channels are set up correctly for bioconda.
For compilation from source, see INSTALL.md.