目录
dependabot[bot]

build(deps): bump uv from 0.10.9 to 0.11.6 (#256)

Bumps uv from 0.10.9 to 0.11.6.

Release notes

Sourced from uv's releases.

0.11.6

Release Notes

Released on 2026-04-09.

This release resolves a low severity security advisory in which wheels with malformed RECORD entries could delete arbitrary files on uninstall. See GHSA-pjjw-68hj-v9mw for details.

Bug fixes

  • Do not remove files outside the venv on uninstall (#18942)
  • Validate and heal wheel RECORD during installation (#18943)
  • Avoid uv cache clean errors due to Win32 path normalization (#18856)

Install uv 0.11.6

Install prebuilt binaries via shell script

curl --proto '=https' --tlsv1.2 -LsSf
https://releases.astral.sh/github/uv/releases/download/0.11.6/uv-installer.sh
| sh

Install prebuilt binaries via powershell script

powershell -ExecutionPolicy Bypass -c "irm
https://releases.astral.sh/github/uv/releases/download/0.11.6/uv-installer.ps1
| iex"

Download uv 0.11.6

File Platform Checksum
uv-aarch64-apple-darwin.tar.gz Apple Silicon macOS checksum
uv-x86_64-apple-darwin.tar.gz Intel macOS checksum
uv-aarch64-pc-windows-msvc.zip ARM64 Windows checksum
uv-i686-pc-windows-msvc.zip x86 Windows checksum
uv-x86_64-pc-windows-msvc.zip x64 Windows checksum
uv-aarch64-unknown-linux-gnu.tar.gz ARM64 Linux checksum
uv-i686-unknown-linux-gnu.tar.gz x86 Linux checksum
uv-powerpc64le-unknown-linux-gnu.tar.gz PPC64LE Linux checksum
uv-riscv64gc-unknown-linux-gnu.tar.gz RISCV Linux checksum
uv-s390x-unknown-linux-gnu.tar.gz S390x Linux checksum
uv-x86_64-unknown-linux-gnu.tar.gz x64 Linux checksum
uv-armv7-unknown-linux-gnueabihf.tar.gz ARMv7 Linux checksum
uv-aarch64-unknown-linux-musl.tar.gz ARM64 MUSL Linux checksum
uv-i686-unknown-linux-musl.tar.gz x86 MUSL Linux checksum
uv-riscv64gc-unknown-linux-musl.tar.gz RISCV MUSL Linux checksum
uv-x86_64-unknown-linux-musl.tar.gz x64 MUSL Linux checksum
uv-arm-unknown-linux-musleabihf.tar.gz ARMv6 MUSL Linux (Hardfloat) checksum
uv-armv7-unknown-linux-musleabihf.tar.gz ARMv7 MUSL Linux checksum

... (truncated)

Changelog

Sourced from uv's changelog.

0.11.6

Released on 2026-04-09.

This release resolves a low severity security advisory in which wheels with malformed RECORD entries could delete arbitrary files on uninstall. See GHSA-pjjw-68hj-v9mw for details.

Bug fixes

  • Do not remove files outside the venv on uninstall (#18942)
  • Validate and heal wheel RECORD during installation (#18943)
  • Avoid uv cache clean errors due to Win32 path normalization (#18856)

0.11.5

Released on 2026-04-08.

Python

  • Add CPython 3.13.13, 3.14.4, and 3.15.0a8 (#18908)

Enhancements

  • Fix build_system.requires error message (#18911)
  • Remove trailing path separators in path normalization (#18915)
  • Improve error messages for unsupported or invalid TLS certificates (#18924)

Preview features

  • Add exclude-newer to [[tool.uv.index]] (#18839)
  • uv audit: add context/warnings for ignored vulnerabilities (#18905)

Bug fixes

  • Normalize persisted fork markers before lock equality checks (#18612)
  • Clear junction properly when uninstalling Python versions on Windows (#18815)
  • Report error cleanly instead of panicking on TLS certificate error (#18904)

Documentation

  • Remove the legacy PIP_COMPATIBILITY.md redirect file (#18928)
  • Fix uv init example-bare --bare examples (#18822, #18925)

0.11.4

Released on 2026-04-07.

Enhancements

  • Add support for --upgrade-group (#18266)
  • Merge repeated archive URL hashes by version ID (#18841)

... (truncated)

Commits

Dependabot compatibility
score

Dependabot will resolve any conflicts with this PR as long as you don’t alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the Security Alerts page.

Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

5天前683次提交

CI Zenodo DOI Paper DOI:10.1093/bioinformatics/btac810 install with bioconda

logo dark logo light

VCF/BCF transformation using python expressions

vembrane allows to transform VCF/BCF files by specifying flexible Python expressions over INFO or FORMAT field, CHROM, POS, ID, REF, ALT, QUAL, FILTER, and the annotation field ANN/CSQ. Supported transformations include filtering, tagging, annotation, sorting, and conversion to tabular (CSV/TSV), structured formats (JSON/JSONL/YAML), and FHIR. vembrane relies on pysam or cyvcf2 for reading/writing VCF/BCF files.

See https://vembrane.github.io for a high-level overview. For a comparison with similar tools have a look at the vembrane benchmarks.

Installation

vembrane is available in bioconda and can either be installed into an existing conda environment with mamba install -c conda-forge -c bioconda vembrane or into a new named environment mamba create -n environment_name -c conda-forge -c bioconda vembrane. Alternatively, if you are familiar with git and uv, clone this repository and run uv sync. See docs/develop.md for further details.

Subcommands

vembrane provides several subcommands for different tasks:

  • filter: Filters VCF/BCF files based on flexible Python expressions. See the detailed documentation below.

     vembrane filter 'CHROM == "chr3" and ANN["Consequence"].any_is_a("frameshift_variant")' variants.bcf
  • tag: A non-destructive version of filter. Instead of removing records, it adds a user-defined tag to the FILTER field of records that match an expression. For more details, see docs/tag.md.

    vembrane tag --tag quality_at_least_30="QUAL >= 30" variants.vcf
  • table: Creates tabular (TSV/CSV or parquet) files from VCF/BCF data. The columns can be defined with flexible Python expressions. For more details, see docs/table.md.

    vembrane table 'CHROM, POS, 10**(-QUAL/10), ANN["CLIN_SIG"]' input.vcf > table.tsv
  • annotate: Annotates VCF/BCF files with data from an external table-like file (e.g., TSV, CSV), based on genomic coordinates. For more details, see docs/annotate.md.

    vembrane annotate example.yaml example.bcf > annotated.vcf
  • sort: Sorts VCF/BCF files. The sort order can be defined via flexible Python expressions and thus be based on any field and annotation that occurs in the VCF/BCF (e.g. impact or clinical significance). The primary use case is variant prioritization. For more details, see docs/sort.md.

    vembrane sort input.vcf 'round(ANN["gnomad_AF"], 1), -ANN["REVEL"]' > prioritized.vcf
  • structured: Converts VCF records into structured formats like JSON, JSONL, or YAML using a flexible YTE template. For more details, see docs/structured.md.

    vembrane structured template.yml input.vcf --output output.json
  • fhir: Converts VCF records into FHIR observations. For more details, see docs/fhir.md.

    vembrane fhir tumor GRCh38 --profile mii_molgen_v2025.0.0 --output-fmt json --annotation-key ANN < sample.vcf > sample-tumor.fhir.json

vembrane filter

Full documentation is also available in docs/filter.md.

Usage

vembrane takes two positional arguments: The filter expression and the input file; the latter may be omitted to read from stdin instead, making it easy to use vembrane in pipe chains.

usage: vembrane filter [options] expression [input vcf/bcf]

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Output file. If not specified, output is written to STDOUT.
  --output-fmt {vcf,bcf,uncompressed-bcf}, -O {vcf,bcf,uncompressed-bcf}
                        Output format.
  --annotation-key FIELDNAME, -k FIELDNAME
                        The INFO key for the annotation field. This defaults to "ANN", but tools might
                        use other field names. For example, default VEP annotations can be parsed by
                        setting "CSQ" here.
  --aux NAME=PATH, -a NAME=PATH
                        Path to an auxiliary file containing a set of symbols.
  --context CONTEXT     Python statement defining a context for given Python expressions.
                        Extends eventual definitions given via --context-file.
  --context-file CONTEXT_FILE
                        Path to Python script defining a context for given Python expressions.
  --ontology PATH       Path to an ontology in OBO format. 
                        The ontology is loaded into memory and can be used in expressions via the SO symbol.
                        May be compressed with gzip, bzip2 or xz.
                        Defaults to built-in ontology (from sequenceontology.org).
  --keep-unmatched      Keep all annotations of a variant if at least one of them
                        passes the expression (mimics SnpSift behaviour).
  --preserve-order      Ensures that the order of the output matches that of the input.
                        This is only useful if the input contains breakends (BNDs)
                        since the order of all other variants is preserved anyway.

Filter expression

The filter expression can be any valid python expression that evaluates to a value of type bool. If you want to use truthy values, you need to wrap the expression in bool(), or aggregate multiple values via any() or all().

However, functions and symbols available have been restricted to the following:

  • all, any
  • abs, len, max, min, round, sum
  • enumerate, filter, iter, map, next, range, reversed, sorted, zip
  • dict, list, set, tuple
  • bool, chr, float, int, ord, str
  • Any function or symbol from math
  • Any function from statistics
  • Regular expressions via re
  • custom functions:
    • without_na(values: Iterable[T]) -> Iterable[T] (keep only values that are not NA)
    • replace_na(values: Iterable[T], replacement: T) -> Iterable[T] (replace values that are NA with some other fixed value)
    • genotype related:
      • count_hom, count_het , count_any_ref, count_any_var, count_hom_ref, count_hom_var
      • is_hom, is_het, is_hom_ref , is_hom_var
      • has_ref, has_var

Available fields

The following VCF fields can be accessed in the filter expression:

Name Type Interpretation Example expression
INFO Dict[str, Any¹] INFO field -> Value INFO["DP"] > 0
ANN² Dict[str, Any³] ANN field -> Value² ANN["SYMBOL"] == "CDH2"²
CHROM str Chromosome Name CHROM == "chr2"
POS int Chromosomal position (1-based) 24 < POS < 42
END int Chromosomal end position (1-based, inclusive, NA for breakends); also accessible via INFO["END"] 24 < END < 42
ID str Variant ID ID == "rs11725853"
REF str Reference allele REF == "A"
ALT str Alternative allele⁴ ALT == "C"
QUAL float Quality QUAL >= 60
FILTER List[str] Filter tags "PASS" in FILTER
FORMAT Dict[str, Dict[str, Any¹]] Format -> (Sample -> Value) FORMAT["DP"][SAMPLES[0]] > 0
SAMPLES List[str] [Sample] "Tumor" in SAMPLES
INDEX int Index of variant in the file INDEX < 10

¹ depends on type specified in VCF header

² if your VCF defines annotations under a key other than ANN (e.g. VEP’s CSQ) you have to specify this via the --annotation-key flag (e.g. --annotation-key CSQ). You can (and should, for portability) still use ANN in your expressions then (although the given annotation key works as well, e.g. CSQ["SYMBOL"]).

³ for the usual snpeff and vep annotations, custom types have been specified; any unknown ANN field will simply be of type str. If something lacks a custom parser/type, please consider filing an issue in the issue tracker.

⁴ vembrane does not handle multi-allelic records itself. Instead, such files should be preprocessed by either of the following tools (preferably even before annotation):

Examples

  • Only keep annotations and variants where gene equals “CDH2” and its impact is “HIGH”:

    vembrane filter 'ANN["SYMBOL"] == "CDH2" and ANN["Annotation_Impact"] == "HIGH"' variants.bcf
  • Only keep variants with quality at least 30:

    vembrane filter 'QUAL >= 30' variants.vcf
  • Only keep annotations and variants where feature (transcript) is ENST00000307301:

    vembrane filter 'ANN["Feature"] == "ENST00000307301"' variants.bcf
  • Only keep annotations and variants where protein position is less than 10:

    vembrane filter 'ANN["Protein_position"].start < 10' variants.bcf
  • Only keep variants where the ID matches the regex pattern ^rs[0-9]+:

    vembrane filter 'bool(re.search("^rs[0-9]+", ID or ""))' variants.vcf
  • Only keep variants where mapping quality is exactly 60:

    vembrane filter 'INFO["MQ"] == 60' variants.bcf
  • Only keep annotations and variants where CLIN_SIG contains “pathogenic”, “likely_pathogenic” or “drug_response”:

    vembrane filter \
      'any(entry in ANN["CLIN_SIG"]
           for entry in ("pathogenic", "likely_pathogenic", "drug_response"))' \
      variants.vcf

    Using set operations, the same may also be expressed as:

    vembrane filter \
      'not {"pathogenic", "likely_pathogenic", "drug_response"}.isdisjoint(ANN["CLIN_SIG"])' \
      variants.vcf
  • Filter on sample specific values:

    • by sample name:
      vembrane filter 'FORMAT["DP"]["specific_sample_name"] > 0' variants.vcf
    • by sample index:
      vembrane filter 'FORMAT["DP"][0] > 0' variants.vcf
    • by sample name based on the index in the list of SAMPLES:
      vembrane filter 'FORMAT["DP"][SAMPLES[0]] > 0' variants.vcf
    • using all or a subset of SAMPLES:
      vembrane filter 'mean(FORMAT["DP"][s] for s in SAMPLES) > 10' variants.vcf
  • Filter on genotypes for specific samples (named “kid”, “mom”, “dad”):

    vembrane filter \
      'is_het("kid") and is_hom_ref("mom") and is_hom_ref("dad") and \
       all(FORMAT["DP"][s] > 10 for s in ["kid", "mom", "dad"])' \
      variants.vcf
  • Explicitly access the GT field for the first sample in the file:

    vembrane filter 'FORMAT["GT"][0] == (1, 1)' variants.vcf

Custom ANN types

vembrane parses entries in the annotation field (ANN or whatever you specify under --annotation-key) as outlined in docs/ann_types.md.

Missing values in annotations

If a certain annotation field lacks a value, it will be replaced with the special value of NA. Comparing with this value will always result in False, e.g. ANN["MOTIF_POS"] > 0 will always evaluate to False if there was no value in the “MOTIF_POS” field of ANN (otherwise the comparison will be carried out with the usual semantics).

For fields with custom types, such as ANN["Protein_position"] which is of type PosRange with attributes start, end and length, trying to access ANN["Protein_position"].start will result in NA if there was no value for "Protein_position" in the annotation of the respective record, i.e. the access will return NA instead of raising an AttributeError. In general, any attribute access on NA will result in NA (and issue a warning to stderr).

Since you may want to use the regex module to search for matches, NA also acts as an empty str, such that re.search("nothing", NA) returns nothing instead of raising an exception.

Explicitly handling missing/optional values in INFO or FORMAT fields can be done by checking for NA, e.g.: INFO["DP"] is NA.

Handling missing/optional values in fields other than INFO or FORMAT can be done by checking for None, e.g ID is not None.

Sometimes, multi-valued fields may contain missing values; in this case, the without_na function can be convenient, for example: mean(without_na(FORMAT['DP'][s] for s in SAMPLES)) > 2.3. It is also possible to replace NA with some constant value with the replace_na function: mean(replace_na((FORMAT['DP'][s] for s in SAMPLES), 0.0)) > 2.3

Auxiliary files

vembrane supports additional files, such as lists of genes or ids with the --aux NAME=path/to/file option. The file should contain one item per line and is parsed as a set. For example vembrane filter --aux genes=genes.txt "ANN['SYMBOL'] in AUX['genes']" variants.vcf will keep only records where the annotated symbol is in the set specified in genes.txt.

Custom context

In addition to the default context for expressions given to vembrane, consisting of variables containing the VCF/BCF record information, helper functions and useful Python builtins (like any()), it is possible to define additional custom context via the flags

  • --context STATEMENT: providing a Python statement,
  • --context-file PATH: providing a Python script to evaluate (considering the parent directory of the script as search path for eventual additional imports).

In both cases, the provided input is evaluated with Python. Any global variables (or functions) become available in the Python expressions that you provide to vembrane for filtering or the other subcommands. Note that the code you provide as context is not sandboxed and should be trusted. Carefully review any code you get from the internet or AI.

An example use case is the import of additional Python modules. For example, you could use this to randomly subsample 10% of the VCF/BCF records given in a file input.vcf:

vembrane filter --context "import random; random.seed(42)" "random.random() < 0.1" input.vcf > subsampled.vcf

Ontologies

vembrane supports ontologies in OBO format. The ontology is loaded into memory and can be accessed in the filter expression via the SO symbol. This enables filtering based on relationships between ontology terms. For example, vembrane filter --ontology so.obo 'ANN["Consequence"].any_is_a("intron_variant")' will keep only records where at least one of the consequences is an intron variant or a subtype thereof. If no ontology is provided, the built-in ontology from sequenceontology.org (date: 2024-06-06) is loaded automatically if the SO symbol is accessed.

There are three relevant classes/types:

  • Term: Represents a term in the ontology. It inherits from str and can be used as such.
  • Consequences: Represents a list of terms. It inherits from list and can be used as such.
  • SO: Represents the ontology itself. It is a singleton and can be used to access the ontology.

The following functions are available for ontologies, where term is a single Term and terms is a Consequences object:

  • SO.get_id(term: Term) -> str: Convert from term name (e.g. stop_gained) to accession (e.g. SO:0001587).
  • SO.get_term(id_: str) -> Term: Convert from accession (e.g. SO:0001587) to term name (e.g. stop_gained).
  • terms.most_specific_terms() -> Consequences: Narrow down the list of terms to the most specific ones, e.g. frameshift_variant&frameshift_truncation&intron_variant&splice_site_variant&splice_donor_5th_base_variant will lead to frameshift_truncation&splice_donor_5th_base_variant.
  • term.ancestors() -> Consequences: Get all ancestral levels of a term, all the way to the ontology’s root node.
  • term.descendants() -> Consequences: Get all descendant levels of a term, all the way to the ontology’s respective leave nodes.
  • term.parents() -> Consequences: Get immediate parents of a term.
  • term.children() -> Consequences: Get immediate children of a term.
  • term.is_a(parent: Term) -> bool: Check if there is a path from term to parent, i.e. whether term is the parent type or a subtype of it.
  • terms.any_is_a(parent: Term) -> bool: Check if any of the terms is a subtype of parent.
  • term.is_ancestor(other: Term) -> bool: Check if term is an ancestor of other.
  • term.is_descendant(other: Term) -> bool: Check if term is a descendant of other. (Same as is_a)
  • term.path_length(target: Term) -> int | None: Get the shortest path length from term to target or vice versa. Returns None if no path exists.

Citation

Check the “Cite this repository” entry in the sidebar for citation options.

Also, please read should-I-cite-this-software for background.

Authors

  • Marcel Bargull (@mbargull)
  • Jan Forster (@jafors)
  • Till Hartmann (@tedil)
  • Johannes Köster (@johanneskoester)
  • Elias Kuthe (@eqt)
  • David Lähnemann (@dlaehnemann)
  • Felix Mölder (@felixmoelder)
  • Christopher Schröder (@christopher-schroeder)
关于

用于高效处理和过滤VCF文件的Python库

20.6 MB
邀请码
    Gitlink(确实开源)
  • 加入我们
  • 官网邮箱:gitlink@ccf.org.cn
  • QQ群
  • QQ群
  • 公众号
  • 公众号

版权所有:中国计算机学会技术支持:开源发展技术委员会
京ICP备13000930号-9 京公网安备 11010802032778号