Telometer
v1.1: a simple regular-expression-based method for measuring telomere length from long-read sequencing
Dependencies: pysam, pandas, regex, samtools, minimap2, scipy (and their associated dependencies)
Simple Usage:
Description
Telometer uses regular expressions to find telomeric repeats and measures each telomere from the sequencing adapter to the last telomeric repeat, matching the canonical motif 5’-TTAGGG-3’ or its reverse complement, 5’-CCCTAA-3’. Because ONT reads are noisy and frequently miscall telomeric repeats in stereotypical ways (see Tan et al., Genome Biology 2022), Telometer originally also counted these frequently miscalled motifs as telomeric repeats. Since this code was first written, however, improved telomere basecalling has been integrated into the default R10 dorado basecalling model. R10 chemistry with dorado high-accuracy basecalling is now the recommended combination for telomere measurement, and support for R9 has been deprecated.
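The adapter-to-last-repeat measurement can be illustrated with a minimal sketch. It assumes the read has already been oriented so the telomere starts at offset `start` (standing in for the end of the adapter), and it uses a hypothetical `max_gap` tolerance to decide where the run of repeats ends. This is an illustration of the idea, not Telometer's actual implementation, and the names are ours.

```python
import re


def telomere_length(seq: str, start: int = 0, motif: str = "TTAGGG", max_gap: int = 20) -> int:
    """Return the distance from `start` to the end of the last `motif` match
    in a run of repeats whose consecutive matches are at most `max_gap` bp apart.

    `start` stands in for the end of the sequencing adapter; `max_gap` is a
    hypothetical tolerance for noisy bases between repeats.
    """
    pattern = re.compile(motif)
    run_end = start  # end of the last repeat accepted into the run
    for match in pattern.finditer(seq, start):
        if match.start() - run_end > max_gap:
            break  # gap too large: treat the telomeric run as finished
        run_end = match.end()
    return run_end - start
```

For reads carrying the C-rich strand, the same function can be called with `motif="CCCTAA"`. Stopping at the first large gap (rather than taking the last match anywhere in the read) keeps distant interstitial repeats from inflating the length.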
In addition, Telometer only searches reads that align to the first or last several thousand base pairs of their reference chromosome, and only measures telomeres from reads longer than 1000 bp, so that every analyzed read is long enough to likely contain an intact telomere. It then checks that the first or last 100 bp of each read are telomere-rich, to ensure that measurements come from terminal rather than interstitial telomeric sequence.
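The read-selection rules can be sketched as three checks: read length, alignment near a chromosome end, and a telomere-rich read end. The 5000 bp `window` (for "several thousand base pairs") and the 50% `min_fraction` threshold for "telomere-rich" are hypothetical values chosen for illustration; Telometer's actual thresholds may differ.

```python
import re

# Canonical repeat on either strand.
TELOMERE_RE = re.compile(r"TTAGGG|CCCTAA")


def telomere_fraction(segment: str) -> float:
    """Fraction of `segment` covered by non-overlapping canonical repeats."""
    if not segment:
        return 0.0
    covered = sum(m.end() - m.start() for m in TELOMERE_RE.finditer(segment))
    return covered / len(segment)


def near_chromosome_end(ref_start: int, ref_end: int, chrom_len: int, window: int = 5000) -> bool:
    """True if the alignment touches the first or last `window` bp (hypothetical size)."""
    return ref_start < window or ref_end > chrom_len - window


def passes_read_filters(seq: str, ref_start: int, ref_end: int, chrom_len: int,
                        min_read_len: int = 1000, window: int = 5000,
                        edge: int = 100, min_fraction: float = 0.5) -> bool:
    if len(seq) <= min_read_len:
        return False  # too short to be likely to hold an intact telomere
    if not near_chromosome_end(ref_start, ref_end, chrom_len, window):
        return False  # aligns away from the chromosome ends
    # Require a telomere-rich read end so the measured repeats are terminal, not interstitial.
    return (telomere_fraction(seq[:edge]) >= min_fraction
            or telomere_fraction(seq[-edge:]) >= min_fraction)
```

Here `ref_start` and `ref_end` follow the usual 0-based, end-exclusive alignment coordinates. For whole-genome data, `min_read_len` would be raised to 4000.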
By default, Telometer only considers reads longer than 1000 bp; this minimum is recommended for telomere capture libraries. For whole-genome sequencing, raise it to 4000 bp with --minreadlen.
In humans, the subtelomere/telomere boundary tends to contain stretches of highly variable length made up of telomere motifs that differ from the canonical TTAGGG by a single mismatch (see Stephens and Kocher, 2024). Occasionally, these stretches are embedded within a longer run of canonical motifs. The latest version of Telometer counts such internal stretches of 1-bp-mismatch variants as part of the telomere length, although this may change as we learn more about measuring telomeres by long-read sequencing.
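One way to count internal variant stretches is to match both the canonical repeat and its single-substitution variants, but to end the telomere at the last canonical repeat. Variant stretches flanked by canonical repeats are then included, while a trailing variant stretch at the boundary is not. This is a simplified sketch under those assumptions (substitutions only, a hypothetical `max_gap` tolerance), not Telometer's actual logic.

```python
import re

CANONICAL = "TTAGGG"


def one_mismatch_variants(motif):
    """All sequences that differ from `motif` by exactly one substitution."""
    variants = []
    for i, base in enumerate(motif):
        for alt in "ACGT":
            if alt != base:
                variants.append(motif[:i] + alt + motif[i + 1:])
    return variants


# Canonical repeat plus its 18 single-substitution variants; every alternative
# is a distinct 6-mer, so at most one can match at any given position.
REPEAT_RE = re.compile("|".join([CANONICAL] + one_mismatch_variants(CANONICAL)))


def telomere_length_with_variants(seq, start=0, max_gap=20):
    """Distance from `start` to the end of the last canonical repeat in a run of
    canonical or 1-mismatch repeats (hypothetical `max_gap` between repeats)."""
    run_end = start
    last_canonical_end = start
    for match in REPEAT_RE.finditer(seq, start):
        if match.start() - run_end > max_gap:
            break  # the run of repeat-like sequence has ended
        run_end = match.end()
        if match.group() == CANONICAL:
            last_canonical_end = match.end()  # variants before this point are counted
    return last_canonical_end - start
```

Because ONT errors also include insertions and deletions, a production version would need a more tolerant matcher; the `regex` package listed in the dependencies supports fuzzy matching for that purpose.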
For a benchtop protocol for telomere capture library preparation in simplex or multiplex, see TelometerLibraryPrep.docx in this repository.
If this code or library prep method is helpful, please cite the original article:
Sanchez, S. E. et al. Digital telomere measurement by long-read sequencing distinguishes healthy aging from disease. Nature Communications 2024
Output Structure
Workflow
Install Telometer
Download the latest human T2T assembly (chm13v2.0.fa) from https://github.com/marbl/CHM13
Append the Stong et al. 2014 subtelomere assemblies (stong_subtels.fa, provided with the source) to the T2T assembly and index the combined reference.
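Appending the subtelomere assemblies amounts to concatenating two FASTA files. The sketch below does this in Python and guards against a missing final newline, which would otherwise glue the next `>` header onto the last sequence line. The function name and the output filename in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def append_fastas(sources, out_path):
    """Concatenate FASTA files into `out_path`, adding a newline after any
    file whose last byte is not already a newline."""
    out_path = Path(out_path)
    with out_path.open("wb") as out:
        for src in sources:
            with open(src, "rb") as fh:
                shutil.copyfileobj(fh, out)
                size = fh.seek(0, 2)  # seek to end; returns the file size
                if size > 0:
                    fh.seek(-1, 2)
                    if fh.read(1) != b"\n":
                        out.write(b"\n")
    return out_path
```

For example, `append_fastas(["chm13v2.0.fa", "stong_subtels.fa"], "chm13v2.0_stong.fa")` writes the combined reference, which can then be indexed, for instance with `samtools faidx chm13v2.0_stong.fa`.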
Align the FASTQ reads to the combined Stong + T2T-CHM13 v2.0 reference with minimap2.
Convert the SAM output to BAM, then sort and index it.
To save space, we recommend deleting the initial SAM and unsorted BAM files and compressing the sorted BAMs.
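The alignment and conversion steps can be scripted roughly as follows. This sketch assumes minimap2's `map-ont` preset for ONT reads and a hypothetical `<prefix>.sorted.bam` naming scheme; neither is taken from Telometer itself. It defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def alignment_commands(ref_fa, reads_fastq, prefix, threads=8):
    """Build align -> SAM-to-BAM -> sort -> index commands for one sample.
    Returns (commands, intermediate_files, sorted_bam)."""
    sam, bam, sorted_bam = f"{prefix}.sam", f"{prefix}.bam", f"{prefix}.sorted.bam"
    commands = [
        # map-ont is minimap2's Oxford Nanopore preset (assumed here)
        ["minimap2", "-ax", "map-ont", "-t", str(threads), "-o", sam, ref_fa, reads_fastq],
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
    ]
    return commands, [sam, bam], sorted_bam


def run_alignment(ref_fa, reads_fastq, prefix, threads=8, dry_run=True, keep_intermediates=False):
    commands, intermediates, sorted_bam = alignment_commands(ref_fa, reads_fastq, prefix, threads)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    if not dry_run and not keep_intermediates:
        # Delete the SAM and unsorted BAM once the sorted, indexed BAM exists
        for path in intermediates:
            Path(path).unlink(missing_ok=True)
    return sorted_bam
```

Setting `dry_run=False` runs the commands, which requires minimap2 and samtools on the `PATH`.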
Run Telometer
Minimal test data subsampled from a telomere capture experiment is included with source. To test:
The minimal dataset should produce measurements with the following summary statistics under Telometer's default settings:
Other options:
--bam, -b
--output, -o
--minreadlen, -m
--maxgaplen, -g
--threads, -t
--memlimit, -l