insilicoSV is a versatile framework for structural variant (SV) simulation,
which models SVs using a simple and flexible grammar, allowing users to define standard and custom genome
rearrangements, as well as encode genome placement constraints.
Key features:
- Built-in support for 26 types of structural variants (simple and complex), small indels, and SNPs
- Fine-grained genome placement control, allowing SVs (or specific SV breakpoints) to be constrained to specific regions of interest (with multiple placement modes available to specify how the SV should overlap with each region) or to avoid specific regions (i.e., category-specific blacklists)
- Integration of user-provided SVs
- Fine-grained size simulation, allowing independent configuration of inter-breakpoint distances in complex SVs
- Modular SV definitions, allowing any number of different SV categories to be defined and simulated in the same genome by combining a variety of attributes (e.g., type, size, placement constraints)
- Customizable WDL pipeline with support for genome simulation, read simulation, alignment, and visualization
Illustration of SV classes predefined in insilicoSV and their grammatical notation (a),
supported SV placement constraints (b), Samplot visualization of short-read alignments at the site of a simulated
complex delINVdel event (c), Samplot visualization of short-read alignments at the site of a simulated
grammatically-specified custom SV event (d):
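To make the grammatical notation more concrete, here is a toy Python sketch of how a rule such as `ABC -> b` (delINVdel) or `ABC -> aBBBc` could be read. It assumes that each uppercase letter names a reference segment, that a lowercase letter denotes the inverted (reverse-complemented) copy of that segment, and that letters absent from the right-hand side are deleted. The function names `apply_rule` and `reverse_complement` are hypothetical and are not part of insilicoSV; the sketch also ignores any other symbols the actual grammar may support.

```python
# Toy interpreter for "LHS -> RHS" rearrangement rules.
# ASSUMPTION: uppercase = segment copied as-is, lowercase = inverted copy,
# symbols missing from the RHS = deleted. This is not insilicoSV's code.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_rule(rule: str, segments: dict[str, str]) -> str:
    """Apply a rule like 'ABC -> b' to named reference segments."""
    lhs, rhs = (side.strip() for side in rule.split("->"))

    # Every segment named on the left-hand side must have a sequence.
    for symbol in lhs:
        if symbol not in segments:
            raise KeyError(f"no sequence provided for segment {symbol!r}")

    pieces = []
    for symbol in rhs:
        name = symbol.upper()
        if name not in lhs:
            raise ValueError(f"{symbol!r} does not refer to a source segment")
        source = segments[name]
        # Lowercase letters emit the inverted copy of the segment.
        pieces.append(source if symbol.isupper() else reverse_complement(source))
    return "".join(pieces)
```

With the segments `{"A": "AAC", "B": "GGT", "C": "TTA"}`, the rule `ABC -> b` yields `ACC` (A and C deleted, B inverted), and `ABC -> aBBBc` yields `GTTGGTGGTGGTTAA`.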
To run insilicoSV:
> insilicosv -c <path/to/config.yaml>
Recommended workflows
1. Create a new directory.
2. Create a new YAML config file in this directory.
3. Populate the YAML config file with the parameters specific to this experiment (see Input guidelines and Use Cases).
4. Run insilicoSV, providing the path to the config file as input. insilicoSV will automatically create output files in the YAML file directory.
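The workflow above can also be scripted. The following is a minimal Python sketch, not part of insilicoSV: the helper names `prepare_experiment`, `build_command`, and `run_insilicosv` are hypothetical, and the config contents are passed in as text so that no parameter names are guessed here (see Input guidelines for the actual parameters). The only detail taken from this README is the `insilicosv -c <path/to/config.yaml>` invocation.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_experiment(experiment_dir, config_text, config_name="config.yaml"):
    """Create the experiment directory and write the YAML config into it."""
    exp = Path(experiment_dir)
    exp.mkdir(parents=True, exist_ok=True)
    config_path = exp / config_name
    config_path.write_text(config_text)
    return config_path


def build_command(config_path):
    """Build the command line shown in the quick start."""
    return ["insilicosv", "-c", str(config_path)]


def run_insilicosv(config_path):
    """Run insilicoSV; output files are created next to the config file."""
    if shutil.which("insilicosv") is None:
        raise FileNotFoundError("insilicosv executable not found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_command(config_path), check=True)
```

A typical use would be to call `prepare_experiment("experiments/run1", config_text)` with the YAML for the experiment, then pass the returned path to `run_insilicosv`. Keeping one directory per config keeps each run's outputs separate, since the outputs are written to the config file's directory.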
Two customizable WDL pipelines are also provided to automatically simulate synthetic genomes and reads
and produce alignments for downstream analysis. Each pipeline can be configured to (1) simulate one or multiple genomes,
(2) simulate one or multiple read datasets (currently supported platforms: Illumina, PacBio, and ONT)
from these genomes, (3) align the reads, and (4) visualize the alignments at the simulated SV sites.
See WDL for more information.
Documentation
For detailed information about insilicoSV features, along with usage examples,
please refer to the following documentation sections:
insilicoSV: grammar-based structural variant simulation and placement
Table of Contents
Overview
Installation
User Guide
Quick start
Recommended workflows
Detailed documentation
Installation
Prerequisite: Python 3.9+
Install:
> pip install .
Authors
Nick Jiang - nickj@berkeley.edu
Chris Rohlicek - crohlice@broadinstitute.org
Ilya Shlyakhter - ilya@broadinstitute.org
Enzo Battistella - ebattist@broadinstitute.org
Victoria Popic - vpopic@broadinstitute.org