annonars
Genome annotation with Rust and RocksDB.
Running the CLI
You can enable the annonars CLI by building the project with the `cli` feature (easiest done with `--all-features`):

Working with TSV Files
When built with the `cli` feature, `annonars` allows you to import variant annotations from TSV files into RocksDB databases, such as the variant annotation TSVs provided by CADD or dbNSFP. Variants are specified in SPDI representation as described in Holmes et al. 2020. All variants in one file refer to the same genome build.

You can import TSV files using `tsv import`. For example, to import the “CADD with all annotations” file, you can use the following:

This will:
- Set the genome release to `grch37`.
- Name the database `cadd` in version `1.6`.
- Use the columns `Chrom`, `Pos`, `Ref`, and `Alt` to specify the variant.
- Treat the values (`NA`, `.`, `-`) as missing values.

When run, `annonars` will first try to infer the schema from the first 100,000 rows. It will then import the data into a RocksDB database. The resulting schema will be dumped in JSON format. If necessary, you can also specify a JSON file with a schema to use as a seed for the schema inference. You might need to do this if you see an `"Unknown"` type in the schema. At the end, the database will be compacted, which may take some time but is necessary to reduce the size of the database and to ensure that it can be opened in read-only mode.

After everything is done, you will have to manually look for a file matching `*.log` in the output RocksDB directory. This is the RocksDB write-ahead log (WAL) file and can be safely deleted (it should be zero-sized if everything went well).

Here is how you can import dbNSFP. Note that you will have to build one RocksDB database per genome release that you want to use for lookup.
`annonars` can use tabix indices to speed up database building. If there is a `.tbi` file for each of the input files, then `annonars` will use it and perform the import in parallel based on genome windows. Otherwise, `annonars` will import all input files in parallel (yet read through each file sequentially). By default, one thread per CPU core on the system is used. You can control the number of threads by setting the environment variable `RAYON_NUM_THREADS`.

You can query the RocksDB databases using `tsv query`, either based on a variant, a position (all variants at the position), or a region. Note that `annonars` uses SPDI-style coordinates (1-based, inclusive) for all queries. You can optionally prefix your query with a genome release (the comparison is case-insensitive) and `annonars` will check whether the database matches the genome release.

Examples:
Developer Notes
The `v1` token in the protobuf schema refers to the internal version of the protocol buffer and not the version of, e.g., gnomAD.

Building from scratch
To reduce compile times, we recommend using a pre-built version of `rocksdb`, either from the system package manager or e.g. via `conda`:

In either case, either define `ROCKSDB_LIB_DIR` and `SNAPPY_LIB_DIR` in `.cargo/config.toml` or set them as environment variables pointing to the appropriate paths:
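As a minimal sketch of the environment-variable route: the directories below are assumptions, not values taken from this project. The snippet guesses that the libraries live under `$CONDA_PREFIX/lib` when a conda environment is active and under `/usr/lib` otherwise; some distributions use a different directory, so adjust as needed.

```shell
# Assumption: librocksdb and libsnappy sit in "<prefix>/lib", where <prefix> is
# the active conda environment if there is one, and /usr otherwise.
LIB_PREFIX="${CONDA_PREFIX:-/usr}"

# Export both variables so that a later cargo invocation can see them.
export ROCKSDB_LIB_DIR="$LIB_PREFIX/lib"
export SNAPPY_LIB_DIR="$LIB_PREFIX/lib"
```

Variables exported this way only last for the current shell session, whereas entries in `.cargo/config.toml` apply to every build of the checkout.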
By default, the environment variables are defined in `.cargo/config.toml` as described above, so they may need adjusting if you are not using the system package manager.

To build the project, run:
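A plain release build with standard Cargo would look like the sketch below. It is assumed to be run from the repository root; adding `--all-features` follows the earlier advice for enabling the `cli` feature and may not be needed for a library-only build.

```shell
# Optimized build; --all-features also switches on the `cli` feature.
cargo build --release --all-features
```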
To install the project locally, run:
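One way to do this with standard Cargo is sketched below, again assuming the command is run from the repository root. The `--all-features` flag is included on the assumption that the installed binary should have the `cli` feature enabled.

```shell
# Install the binary from the current checkout into Cargo's bin directory
# (usually ~/.cargo/bin).
cargo install --path . --all-features
```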