Ensembl GenomIO
Pipelines to turn basic genomic data into Ensembl cores and back.
This is a multilanguage (Perl, Python) repo providing eHive pipelines and various scripts (see below) to prepare genomic data and load it as an Ensembl core database, or to dump such core databases as file bundles.
Bundles themselves consist of genomic data in various formats (e.g. fasta, gff3, json) and should follow the corresponding specification.
Installation and configuration
This repository is publicly available on PyPI, so it can be easily installed with your preferred Python package manager, e.g.:
pip install ensembl-genomio
Prerequisites
Pipelines are intended to be run inside the Ensembl production environment. Please make sure you have all the proper credentials, keys, etc. set up.
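After a PyPI install, one quick way to confirm that the package is visible to the current Python interpreter is to query its distribution metadata. The sketch below uses only the standard library; the helper name `installed_version` is illustrative and is not part of ensembl-genomio.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ensembl-genomio" is the PyPI name used in the install command above.
    version = installed_version("ensembl-genomio")
    if version is None:
        print("ensembl-genomio is not installed in this environment")
    else:
        print(f"ensembl-genomio {version} is installed")
```

This only checks the Python side; it says nothing about Perl modules or production credentials.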
Get repo and install
Clone:
Install the Python part (of the pipelines) and test it:
Update your Perl environment (if you need to)
Optional installation
If you need an “editable” install of the Python package, use the ‘-e’ option.
To install additional dependencies (e.g. [docs] or [cicd]), append a [<tag>] string to the package name.
For the list of tags, see [project.optional-dependencies] in pyproject.toml.
Additional steps to use automated generation of the documentation
Install the Python part with the [docs] tag.
Run the mkdocs build command.
Nextflow installation
Please refer to the “Installation” section of the Nextflow pipelines document.
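For the pip-based options described under “Optional installation” (the ‘-e’ flag and the [<tag>] extras), the following is a minimal sketch of how the install target can be composed. The helper names are hypothetical, and the code only builds the argument list; it does not run pip.

```python
import sys
from typing import Iterable, List


def requirement_with_extras(target: str, tags: Iterable[str] = ()) -> str:
    """Return 'target[tag1,tag2]', or target unchanged when no tags are given."""
    unique = sorted({tag.strip() for tag in tags if tag.strip()})
    return f"{target}[{','.join(unique)}]" if unique else target


def pip_install_args(
    target: str, tags: Iterable[str] = (), editable: bool = False
) -> List[str]:
    """Build a 'python -m pip install' argument list for the given target."""
    args = [sys.executable, "-m", "pip", "install"]
    if editable:
        args.append("-e")  # editable install, as described above
    args.append(requirement_with_extras(target, tags))
    return args


if __name__ == "__main__":
    # Assumes the repo was cloned into ./ensembl-genomio (illustrative path).
    print(" ".join(pip_install_args("./ensembl-genomio", ["docs"], editable=True)[1:]))
```

Passing the resulting list to `subprocess.run` would perform the installation; the available tags are whatever `[project.optional-dependencies]` in pyproject.toml defines.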
Pipelines
Initialising and running eHive-based pipelines
Pipelines are derived from Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf, from Bio::EnsEMBL::Hive::PipeConfig::EnsemblGeneric_conf, or from Bio::EnsEMBL::EGPipeline::PipeConfig::EGGeneric_conf (see documentation).
The same Perl class prefix is used for every pipeline: Bio::EnsEMBL::EGPipeline::PipeConfig::.
N.B. Don’t forget to specify the -reg_file option for the beekeeper.pl -url $url -reg_file $REG_FILE -loop command.
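The naming convention and the beekeeper reminder above can be sketched as two small helpers: one expands a short pipeline config name into the full Perl class name, the other assembles the `beekeeper.pl` arguments and refuses to proceed without a registry file. The helper names, the `Example_conf` config name, the URL, and the registry path are all placeholders for illustration; only the class prefix and the `-url`, `-reg_file`, and `-loop` options come from the text.

```python
import shlex
from typing import List

# Class prefix quoted in the text above.
PIPECONFIG_PREFIX = "Bio::EnsEMBL::EGPipeline::PipeConfig::"


def pipeline_class(short_name: str) -> str:
    """Expand a short config name into the full Perl class name."""
    if short_name.startswith(PIPECONFIG_PREFIX):
        return short_name
    return PIPECONFIG_PREFIX + short_name


def beekeeper_args(url: str, reg_file: str, loop: bool = True) -> List[str]:
    """Build the beekeeper.pl argument list, always including -reg_file."""
    if not reg_file:
        raise ValueError("a registry file is required (-reg_file)")
    args = ["beekeeper.pl", "-url", url, "-reg_file", reg_file]
    if loop:
        args.append("-loop")
    return args


if __name__ == "__main__":
    # "Example_conf" is a hypothetical config name, not a real pipeline.
    print(pipeline_class("Example_conf"))
    # Placeholder pipeline database URL and registry path.
    url = "mysql://user:password@host:3306/example_pipeline_db"
    print(shlex.join(beekeeper_args(url, "/path/to/registry.pm")))
```

Building the command as a list keeps quoting explicit, which helps when the URL contains characters that a shell would otherwise interpret.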