fgbio
A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
Visit us at Fulcrum Genomics to learn more about how we can power your Bioinformatics with fgbio and beyond.
This readme document is mostly for developers/contributors and those attempting to build the project from source. Detailed user documentation is available on the project website including tool usage and documentation of metrics produced. Detailed developer documentation can be found here.
Quick Installation
The conda package manager (configured with bioconda channels) can be used to quickly install fgbio:
conda install fgbio
To install fgbio without extra dependencies (e.g. R), use the command:
conda install fgbio-minimal
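The phrase "configured with bioconda channels" refers to a one-time channel setup. The sketch below follows the channel configuration recommended in the bioconda documentation and then installs the package. It assumes conda is already on your PATH; the function name install_fgbio is hypothetical and not part of fgbio, and the channel settings it writes persist in your conda configuration.

```shell
# Hypothetical helper (not part of fgbio): configure channels, then install.
install_fgbio() {
  # Package defaults to fgbio; pass fgbio-minimal to skip extra dependencies.
  local pkg="${1:-fgbio}"
  # Channel setup as recommended by bioconda; these settings are persistent.
  conda config --add channels bioconda &&
  conda config --add channels conda-forge &&
  conda config --set channel_priority strict &&
  # --yes skips the interactive confirmation prompt.
  conda install --yes "${pkg}"
}
```

Calling install_fgbio installs the full package, while install_fgbio fgbio-minimal installs the variant without extra dependencies. Wrapping the steps in a function keeps the channel setup and the install together so that a failure in any step stops the sequence.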
Goals
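There are many toolkits available for analyzing genomic data; fgbio does not aim to be all things to all people but is specifically focused on providing:
Robust, well-tested tools.
An easy-to-use command line.
Clear and thorough documentation for each tool.
Open source development for the benefit of the community and our clients.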
Overview
fgbio is a set of command-line tools for bioinformatic and genomic data analysis. The tools within fgbio are used by our customers and others both for ad-hoc data analysis and within production pipelines. These tools typically operate on read-level data (e.g. FASTQ, SAM, or BAM) or variant-level data (e.g. VCF or BCF). They range from simple tools that filter reads in a BAM file to tools that compute consensus reads from reads with the same molecular index/tag. See the list of tools for more detail.
List of tools
For a full list of available tools please see the tools section of the project website.
Below we highlight a few tools that you may find useful.
Tools for working with Unique Molecular Indexes (UMIs, aka Molecular IDs or Molecular Barcodes):
FastqToBam, AnnotateBamWithUmis, ExtractUmisFromBam, and CopyUmiFromReadName.
CorrectUmis, GroupReadsByUmi, CallMolecularConsensusReads, CallDuplexConsensusReads, and FilterConsensusReads.
CollectDuplexSeqMetrics and ReviewConsensusVariants.

FastqToBam, ZipperBams, and DemuxFastqs (see fqtk, our Rust re-implementation for sample demultiplexing).
FilterBam, ClipBam, RandomizeBam, SortBam, SetMateInformation, and UpdateReadGroups.
ErrorRateByReadPosition.
EstimatePoolingFractions.
EstimateRnaSeqInsertSize.
CollectAlternateContigNames.
UpdateFastaContigNames, UpdateVcfContigNames, UpdateGffContigNames, UpdateIntervalListContigNames, and UpdateDelimitedFileContigNames.
PickIlluminaIndices and PickLongIndices.
FindTechnicalReads and FindSwitchbackReads.
MakeMixtureVcf and MakeTwoSampleMixtureVcf.
Building
Cloning the Repository
Git LFS is used to store large files used in testing fgbio. To compile and run the tests, you must install Git LFS. To retrieve the large files, either:
Clone the repository after installing Git LFS, or
In a previously cloned repository, run the following once: git lfs install && git lfs pull
After initial setup, regular git commands (e.g. pull, fetch, push) will also operate on large files and no special handling is needed.
To clone the repository:
git clone https://github.com/fulcrumgenomics/fgbio.git
Running the build
fgbio is built using sbt.
Use sbt assembly to build an executable jar in target/scala-2.13/.
Tests may be run with sbt test.
Command line
Run java -jar target/scala-2.13/fgbio-<version>.jar to see the commands supported. Use java -jar target/scala-2.13/fgbio-<version>.jar <command> to see the help message for a particular command.
Include fgbio in your project
You can include fgbio in your project using:
"com.fulcrumgenomics" %% "fgbio" % "1.0.0"
for the latest released version or (buyer beware):
for the latest development snapshot.
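Returning to the locally built jar from the Command line section: because the file name under target/scala-2.13/ includes the version, a script that runs the assembled jar has to discover that name first. The following is a minimal sketch of one way to do that. The function name find_fgbio_jar is hypothetical and not part of fgbio, and the default directory is simply the path quoted in the build instructions.

```shell
# Hypothetical helper (not part of fgbio): print the newest assembled jar.
find_fgbio_jar() {
  # Default build directory is an assumption taken from the build instructions.
  local build_dir="${1:-target/scala-2.13}"
  local jar
  # ls -t lists newest first; if nothing matches, ls fails and jar stays empty.
  jar="$(ls -t "${build_dir}"/fgbio-*.jar 2>/dev/null | head -n 1)"
  if [ -z "${jar}" ]; then
    echo "No fgbio jar in ${build_dir}; run 'sbt assembly' first." >&2
    return 1
  fi
  printf '%s\n' "${jar}"
}
```

With this in place, java -jar "$(find_fgbio_jar)" lists the supported commands without hard-coding a version. Picking the most recently modified jar is a convenience choice for development checkouts where several builds may sit side by side.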
Contributing
Contributions are welcome and encouraged. We will do our best to provide an initial response to any pull request or issue within one week. For urgent matters, please contact us directly.
See Contributing for more details.
Authors
License
fgbio is open source software released under the MIT License.
Sponsorship
Become a sponsor
As a free and open source project, fgbio relies on the support of the community of users for its development. If you work for an organization that uses and benefits from fgbio, please consider supporting fgbio. There are different ways to do so, such as employing people to work on fgbio, funding the project, or becoming a sponsor to support the broader ecosystem. Please email contact@fulcrumgenomics.com to discuss.
Sponsors
Sponsors provide support for fgbio through direct funding or employing contributors. Public sponsors include:
The full list of sponsors supporting fgbio is available on the sponsor page.