Tang et al. (2024) JCVI: A Versatile Toolkit for Comparative Genomics
Analysis. iMeta
Contents
The following modules are available as generic bioinformatics handling
methods.
algorithms
Linear programming solver with SCIP and GLPK.
Supermap: find a set of non-overlapping anchors in BLAST or NUCMER output.
Longest or heaviest increasing subsequence.
Matrix operations.
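To illustrate the longest increasing subsequence idea listed above, here is a minimal standalone sketch based on patience sorting with binary search. It is not the code from the algorithms module; the function name `longest_increasing_subsequence` is hypothetical and chosen only for this example.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return one longest strictly increasing subsequence of `values`."""
    tail_indices = []  # tail_indices[k]: index of the smallest tail of a run of length k + 1
    tail_values = []   # values at those indices, kept sorted so bisect can be used
    previous = [None] * len(values)  # back-pointers used to rebuild the subsequence

    for i, value in enumerate(values):
        k = bisect_left(tail_values, value)
        if k == len(tail_values):
            tail_indices.append(i)
            tail_values.append(value)
        else:
            tail_indices[k] = i
            tail_values[k] = value
        previous[i] = tail_indices[k - 1] if k > 0 else None

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_indices[-1] if tail_indices else None
    while i is not None:
        result.append(values[i])
        i = previous[i]
    return result[::-1]
```

The sketch runs in O(n log n) time. A heaviest (weighted) variant would need a different bookkeeping structure and is not shown here.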
apps
GenBank Entrez accession, Phytozome, Ensembl and SRA downloader.
Calculate (non)synonymous substitution rate between gene pairs.
Basic phylogenetic tree construction using PHYLIP, PhyML, or RAxML, and visualization.
Wrapper for BLAST+, LASTZ, LAST, BWA, BOWTIE2, CLC, CDHIT, CAP3, etc.
formats
Currently supports .ace format (phrap, cap3, etc.), .agp
(goldenpath), .bed format, .blast output, .btab format,
.coords format (nucmer output), .fasta format, .fastq
format, .fpc format, .gff format, obo format (ontology),
.psl format (UCSC blat, GMAP, etc.), .posmap format (Celera
assembler output), .sam format (read mapping), .contig
format (TIGR assembly format), etc.
graphics
BLAST or synteny dot plot.
Histogram using R and ASCII art.
Paint regions on a set of chromosomes.
Macro-synteny and micro-synteny plots.
Ribbon plots from whole genome alignments.
utils
Grouper can be used as a disjoint-set data structure.
range contains common range operations, like overlap
and chaining.
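For readers unfamiliar with the term, a disjoint-set (union-find) structure tracks which items belong to the same group as pairs are merged. The sketch below is a generic standalone illustration, not the Grouper implementation or its API; the class name `DisjointSet` and its method names are assumptions made for this example.

```python
class DisjointSet:
    """Minimal union-find structure (illustrative only, not JCVI's Grouper)."""

    def __init__(self):
        self.parent = {}  # maps each item to its parent; roots point to themselves

    def find(self, item):
        """Return the representative of the group containing `item`."""
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def join(self, *items):
        """Merge the groups of all given items into one group."""
        if not items:
            return
        root = self.find(items[0])
        for other in items[1:]:
            self.parent[self.find(other)] = root

    def joined(self, a, b):
        """Return True if `a` and `b` are in the same group."""
        return self.find(a) == self.find(b)

    def groups(self):
        """Return the current groups as a list of lists."""
        grouped = {}
        for item in self.parent:
            grouped.setdefault(self.find(item), []).append(item)
        return list(grouped.values())


if __name__ == "__main__":
    families = DisjointSet()
    families.join("geneA", "geneB")
    families.join("geneB", "geneC")
    families.join("geneX", "geneY")
    print(families.joined("geneA", "geneC"))
    print(sorted(sorted(group) for group in families.groups()))
```

A typical use in comparative genomics is clustering genes into families from pairwise hits: each hit joins two genes, and the groups are read off at the end.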
If installed successfully, you can check the version with:
jcvi --version
Usage
Use python -m to call any of the modules installed with JCVI.
Most of the modules in this package contain multiple actions. Taking
the fasta module as an example:
Usage:
python -m jcvi.formats.fasta ACTION
Available ACTIONs:
clean | Remove irregular chars in FASTA seqs
diff | Check if two fasta records contain same information
extract | Given fasta file and seq id, retrieve the sequence in fasta format
fastq | Combine fasta and qual to create fastq file
filter | Filter the records by size
format | Trim accession id to the first space or switch id based on 2-column mapping file
fromtab | Convert 2-column sequence file to FASTA format
gaps | Print out a list of gap sizes within sequences
gc | Plot G+C content distribution
identical | Given 2 fasta files, find all exactly identical records
ids | Generate a list of headers
info | Run `sequence_info` on fasta files
ispcr | Reformat paired primers into isPcr query format
join | Concatenate a list of seqs and add gaps in between
longestorf | Find longest orf for CDS fasta
pair | Sort paired reads to .pairs, rest to .fragments
pairinplace | Starting from fragment.fasta, find if adjacent records can form pairs
pool | Pool a bunch of fastafiles together and add prefix
qual | Generate dummy .qual file based on FASTA file
random | Randomly take some records
sequin | Generate a gapped fasta file for sequin submission
simulate | Simulate random fasta file for testing
some | Include or exclude a list of records (also performs on .qual file if available)
sort | Sort the records by IDs, sizes, etc.
summary | Report the real no of bases and N's in fasta files
tidy | Normalize gap sizes and remove small components in fasta
translate | Translate CDS to proteins
trim | Given a cross_match screened fasta, trim the sequence
trimsplit | Split sequences at lower-cased letters
uniq | Remove records that are the same
To use one of these actions, call it by name:
python -m jcvi.formats.fasta extract
This will tell you the options and arguments it expects.
Feel free to check out other scripts in the package; it is not just
for FASTA.
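If you want to drive these modules from a script, the `python -m jcvi.<module> ACTION` pattern described above can be wrapped in a small helper. This is a sketch under stated assumptions: `jcvi_command` is a hypothetical helper written for this example, not part of JCVI, and the command is only executed when the `jcvi` package is importable in the current interpreter.

```python
import importlib.util
import subprocess
import sys


def jcvi_command(module, action=None, *args):
    """Build the argument list for `python -m jcvi.<module> [ACTION] [args...]`."""
    command = [sys.executable, "-m", f"jcvi.{module}"]
    if action is not None:
        command.append(action)
    command.extend(str(arg) for arg in args)
    return command


if __name__ == "__main__":
    # Calling an action without arguments asks it for its options.
    command = jcvi_command("formats.fasta", "extract")
    print(" ".join(command))
    # Only run the command when JCVI is installed for this interpreter.
    if importlib.util.find_spec("jcvi") is not None:
        subprocess.run(command, check=False)
```

Using `sys.executable` rather than a bare `python` keeps the call inside whichever environment the script itself is running in.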
JCVI: A Versatile Toolkit for Comparative Genomics Analysis
Collection of Python libraries to parse bioinformatics files, or perform computation related to assembly, annotation, and comparative genomics.
In addition to the generic modules, there are modules that contain domain-specific methods.
assembly
annotation
compara
Applications
Please visit the wiki for full-fledged applications.
Dependencies
JCVI requires Python 3, between v3.9 and v3.12.
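A quick way to confirm that the current interpreter falls in this range is a small check like the one below. It is a sketch, not part of JCVI: the helper name `is_supported` is hypothetical, and treating both ends of the range as inclusive is an assumption about the wording above.

```python
import sys

# Assumption: both ends of the documented range are inclusive.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported(version=None):
    """Return True if `version` (default: the running interpreter) is in range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported() else "outside the documented range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```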
Some graphics modules require the ImageMagick library.
On macOS, this can be installed using Conda (see next section). If you are using a Linux system (e.g. Ubuntu), you can install ImageMagick using apt-get:
See the Wand docs for instructions on installing ImageMagick on other systems.
A few modules may ask for the locations of external programs if the executable cannot be found in your
PATH. The external programs that are often used are:
Managing dependencies with uv
You can use uv to create and manage a project environment from
pyproject.toml and uv.lock. Run commands inside the managed environment with
uv run, for example:
Managing dependencies with Conda
You can use the YAML files in this repo to create an environment with basic JCVI dependencies.
If you are new to Conda, we recommend the Miniforge distribution.
After activating the Conda environment, install JCVI using one of the following options.
Installation
Installation options
Test Installation