sizemeup is a simple tool to retrieve the genome size for a given species name or tax ID. It utilizes
known genome sizes available from NCBI’s Assembly Reports
in combination with user-provided genome sizes that may not be available from NCBI.
Contributing
If you have a species of interest that is not available in the NCBI Assembly Reports, please
consider submitting an issue so that we can get it added to sizemeup. Otherwise, if you have
ideas to improve sizemeup, please feel free to share them!
Installation
You can install sizemeup using conda.
Available Commands
sizemeup
sizemeup is the main tool that outputs the known genome size for a given species name or tax ID.
Usage
sizemeup --help
Usage: sizemeup [OPTIONS]
sizemeup - A simple tool to determine the genome size of an organism
╭─ Required Options ────────────────────────────────────────────────────────────────────╮
│ * --query -q TEXT The species name or taxid to determine the size of [required] │
│ * --sizes -z TEXT The built in sizes file to use [required] │
╰───────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ──────────────────────────────────────────────────────────────────╮
│ --outdir -o PATH Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output files [default: sizemeup] │
│ --silent Only critical errors will be printed │
│ --verbose Increase the verbosity of output │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────╯
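For anyone calling sizemeup from a script rather than the shell, here is a minimal Python sketch of how the options in the help text might be combined. The helper names (`build_sizemeup_command`, `expected_output_path`, `run_sizemeup`) and the example file paths are hypothetical and not part of sizemeup. Only the flags come from the help text, and the sketch assumes the `{PREFIX}-sizemeup.txt` report is written inside the `--outdir` directory.

```python
import subprocess
from pathlib import Path


def build_sizemeup_command(query, sizes, outdir="./", prefix="sizemeup"):
    """Assemble the argument list for a sizemeup call (flags taken from --help)."""
    return [
        "sizemeup",
        "--query", str(query),    # species name or tax ID
        "--sizes", str(sizes),    # sizes file to use
        "--outdir", str(outdir),  # directory to write output
        "--prefix", str(prefix),  # prefix for output files
    ]


def expected_output_path(outdir="./", prefix="sizemeup"):
    """Assumed report location: {PREFIX}-sizemeup.txt inside --outdir."""
    return Path(outdir) / f"{prefix}-sizemeup.txt"


def run_sizemeup(query, sizes, outdir="./", prefix="sizemeup"):
    """Run sizemeup (it must be installed and on PATH) and return the report path.

    check=True raises CalledProcessError on a non-zero exit status; how sizemeup
    exits when a species is not found is not covered here.
    """
    subprocess.run(build_sizemeup_command(query, sizes, outdir, prefix), check=True)
    return expected_output_path(outdir, prefix)
```

A call might look like `run_sizemeup("Staphylococcus aureus", "sizes.tsv", outdir="results", prefix="saureus")`, where `sizes.tsv` stands in for whichever sizes file you use.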
Example
If the --query value is found, a table is printed to STDOUT as well as to a file named {PREFIX}-sizemeup.txt, where {PREFIX} is the value of the --prefix option (default sizemeup).
Here is an example of the output file:
However, if a species is not found, you get the following output:
sizemeup-build
sizemeup-build is a helper tool used to build the genome size database for sizemeup. To do this it:
Note: This tool isn’t necessary for most users, just a simple way to update the database on your own or at new releases of sizemeup.
In the end, it produces a TSV file with the following columns:
name - the species name
tax_id - the NCBI tax id
category - the category of the species (e.g. bacteria, virus)
size - the genome size in base pairs
source - the source of the genome size (e.g. ncbi, user)
method - the method used to determine the genome size (e.g. automatic, manual)
Citing sizemeup
If you make use of sizemeup in your analysis, please cite the following:
Petit III RA, Fearing T, Rowley C. sizemeup: A simple tool to determine the genome size of an organism (GitHub)
AllTheBacteria
Hunt M., Lima L., Shen W., Lees J., Iqbal Z. AllTheBacteria - all bacterial genomes assembled, available and searchable. bioRxiv 2024.03.08.584059
Motivation and Naming
Talking with Taylor, we have a workflow in Bactopia called teton for human read scrubbing and taxonomic classification. After running teton, the idea was to run Bactopia to analyze the samples. However, Bactopia requires a genome size for each sample in order to calculate coverage and a few other metrics. Sure, we could manually look up the genome size for each sample, but that would be tedious and time consuming. We decided to develop sizemeup to handle looking up genome sizes for us. In addition, this paves the way for users of Bactopia to use teton + sizemeup to easily mix species within their runs. In other words, sizemeup was built to support the Bactopia workflow (but you can use it for whatever!).
As for the name, I wanted something fun and catchy. It’s a simple tool to retrieve the genome size of a given species name, so I thought “sizemeup” would work!
Funding
Support for this project came (in part) from the Wyoming Public Health Division.