Compiler requirement: gcc (Linux: >= 4.8.1; macOS: Apple clang version >= 14.0.0)
Python libraries requirement
cython (>=0.29.17)
numpy (>=1.16.6, <=1.20.3)
argparse (>=1.1)
scipy (>=1.1.0, <1.5.4)
tqdm (>=4.46.0)
scikit-learn (>=0.22.2, <=0.24.2)
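The version ranges above can be checked against the active Python environment before compiling. The following is a minimal sketch, not part of EMVC-2: the helper names (`parse_version`, `satisfies`, `check_installed`) are hypothetical, and the version parsing is deliberately simplified (it only looks at the first three numeric components, so pre-release tags are ignored). Note that argparse ships with the Python standard library, so it may be reported as not found in the package metadata even though it is available.

```python
import re
from importlib import metadata

# Version ranges copied from the "Python libraries requirement" list.
REQUIREMENTS = {
    "cython": ">=0.29.17",
    "numpy": ">=1.16.6,<=1.20.3",
    "argparse": ">=1.1",
    "scipy": ">=1.1.0,<1.5.4",
    "tqdm": ">=4.46.0",
    "scikit-learn": ">=0.22.2,<=0.24.2",
}

_COMPARE = {
    ">=": lambda have, bound: have >= bound,
    "<=": lambda have, bound: have <= bound,
    "==": lambda have, bound: have == bound,
    ">": lambda have, bound: have > bound,
    "<": lambda have, bound: have < bound,
}


def parse_version(text):
    """Turn '1.20.3' into (1, 20, 3), padding short versions with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    have = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*(\S+)\s*", clause)
        if match is None:
            raise ValueError("cannot parse requirement clause: %r" % clause)
        op, bound = match.groups()
        if not _COMPARE[op](have, parse_version(bound)):
            return False
    return True


def check_installed(requirements=REQUIREMENTS):
    """Map each required package to 'ok' or a short description of the problem."""
    report = {}
    for name, spec in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not found in package metadata"
            continue
        if satisfies(installed, spec):
            report[name] = "ok"
        else:
            report[name] = "%s is outside %s" % (installed, spec)
    return report


if __name__ == "__main__":
    for name, status in check_installed().items():
        print("%s: %s" % (name, status))
```

For stricter version handling, a dedicated library such as `packaging` would be a more robust choice than the simplified tuple comparison used here.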
emvc-2 [-h] -i BAM_FILE -r REF_FILE [-p THREADS] [-t ITERATIONS] [-m LEARNERS] [-v VERBOSE] -o OUT_FILE

optional arguments:
  -h, --help            show this help message and exit
  -i BAM_FILE, --bam_file BAM_FILE
                        The bam file
  -r REF_FILE, --ref_file REF_FILE
                        The reference fasta file
  -p THREADS, --threads THREADS
                        The number of parallel threads (default 8)
  -t ITERATIONS, --iterations ITERATIONS
                        The number of EM iterations (default 5)
  -m LEARNERS, --learners LEARNERS
                        The number of learners (default 7)
  -v VERBOSE, --verbose VERBOSE
                        Make output verbose (default 0)
  -o OUT_FILE, --out_file OUT_FILE
                        The output file name
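When emvc-2 is driven from a script, the options listed above can be assembled programmatically. The sketch below is only an illustration: the function name `build_emvc2_command` and the file paths are hypothetical, and the output file extension is an assumption. Only the flags and default values are taken from the help text.

```python
import shlex


def build_emvc2_command(bam_file, ref_file, out_file, threads=8,
                        iterations=5, learners=7, verbose=0,
                        executable="emvc-2"):
    """Return the argument list for one emvc-2 run.

    Defaults mirror the help text: 8 threads, 5 EM iterations,
    7 learners, verbosity 0. Use executable="./emvc-2" for a source build.
    """
    if threads < 1 or iterations < 1 or learners < 1:
        raise ValueError("threads, iterations and learners must be positive")
    return [
        executable,
        "-i", bam_file,
        "-r", ref_file,
        "-p", str(threads),
        "-t", str(iterations),
        "-m", str(learners),
        "-v", str(verbose),
        "-o", out_file,
    ]


# Hypothetical paths, for illustration only.
command = build_emvc2_command("example/example.bam", "hs37d5.fa", "output.vcf")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`; building a list rather than a single string avoids shell quoting problems with unusual file names.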
EMVC-2
An efficient single-nucleotide variant (SNV) caller based on the expectation-maximization algorithm. EMVC-2 is implemented in C and uses a Python wrapper.
Supported platforms: Linux, macOS
Authors: Guillermo Dufort y Álvarez, Martí Xargay, Idoia Ochoa, and Alba Pages-Zamora
Contact: gdufort@fing.edu.uy
Install with Conda
To install directly from source, follow the instructions in the next section.
EMVC-2 is available on conda via the bioconda channel. See this page for installation instructions for conda. Once conda is installed, we recommend creating an environment with python=3.8.1:
Then run the following command to install emvc-2.
Note that if emvc-2 is installed this way, it should be invoked with the command emvc-2 rather than ./emvc-2. The bioconda help page shows the commands if you wish to install emvc-2 in an environment.
Install from source code
Download repository
Requirements
Software requirements
Compiler requirement
Python libraries requirement
Compiling the candidate_variants_finder and installing python dependencies
The following instructions will create the candidate_variants_finder executable in the root directory, which is needed to run EMVC-2, and install the required python dependencies. To compile candidate_variants_finder you need to have the gcc compiler.
On Linux (Ubuntu or CentOS), gcc is usually installed by default; if it is not, run the following:
On macOS, install the GCC compiler (the Xcode command line tools can be installed with xcode-select --install):
To check that the gcc compiler is properly installed on your system, run:
On Linux
The output should be the description of the installed software.
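The same check can be scripted and compared against the minimum versions stated in the compiler requirement (gcc >= 4.8.1 on Linux, Apple clang >= 14.0.0 on macOS). This is a rough sketch under assumptions: the function names are hypothetical, and the parsing is a heuristic based on typical first lines of `gcc --version` output, which may differ on other distributions.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the compiler requirement in this README.
MIN_GCC = (4, 8, 1)
MIN_APPLE_CLANG = (14, 0, 0)


def parse_compiler_version(first_line):
    """Return (kind, version tuple) from the first line of `gcc --version`."""
    clang = re.search(r"clang version (\d+)\.(\d+)\.(\d+)", first_line)
    if clang:
        return "clang", tuple(int(g) for g in clang.groups())
    # GNU gcc usually prints "gcc (<vendor info>) X.Y.Z ...".
    gcc = re.search(r"\)\s+(\d+)\.(\d+)\.(\d+)", first_line)
    if gcc:
        return "gcc", tuple(int(g) for g in gcc.groups())
    return None, None


def compiler_is_recent_enough(first_line):
    """Compare the parsed version with the documented minimum."""
    kind, version = parse_compiler_version(first_line)
    if kind == "clang":
        return version >= MIN_APPLE_CLANG
    if kind == "gcc":
        return version >= MIN_GCC
    return False


def check_local_compiler():
    """Run `gcc --version` if gcc is on PATH and check its first output line."""
    if shutil.which("gcc") is None:
        return False
    result = subprocess.run(["gcc", "--version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return bool(lines) and compiler_is_recent_enough(lines[0])


if __name__ == "__main__":
    print("compiler ok" if check_local_compiler() else "compiler missing or too old")
```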
To compile candidate_variants_finder and install the required Python dependencies, run:
Install samtools
To install samtools, you can use conda:
or follow the instructions in the GitHub repository.
Usage
Usage example
We add an example folder with a test file to run a simple example of the tool. The hs37d5 reference file must be downloaded following the instructions detailed in the following section for the example to work.
To run the variant caller with 8 threads on the example file example.bam:
Original paper datasets information
To test the performance of the EMVC-2 SNV caller, we ran experiments on the following datasets.
Downloading the datasets and the reference genome
To download a dataset, run the download_files.sh script with the dataset name as a parameter. For example, to download ERR262997 run:
To download the human reference genome version hs37d5 run:
The scripts use the command curl to perform the download. To install curl on macOS run:
To install curl on Ubuntu or CentOS run:
Alignment information
To obtain alignment information in BAM format for each pair of FASTQ files, we recommend using BWA.
To install bwa with conda run:
To align a pair of FASTQ files against a reference genome using BWA run:
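As an illustration of this step, the sketch below chains a standard BWA-MEM alignment with samtools sorting and indexing from Python. It is an assumption about a typical workflow, not necessarily the exact commands or options used for the paper's experiments; the function names and all file names are hypothetical.

```python
import subprocess


def build_alignment_commands(reference, fastq_1, fastq_2, out_bam, threads=8):
    """Return the four commands of a typical paired-end BWA-MEM workflow."""
    index_ref = ["bwa", "index", reference]
    align = ["bwa", "mem", "-t", str(threads), reference, fastq_1, fastq_2]
    # "-" makes samtools sort read the SAM stream from standard input.
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    index_bam = ["samtools", "index", out_bam]
    return index_ref, align, sort, index_bam


def run_alignment(reference, fastq_1, fastq_2, out_bam, threads=8):
    """Index the reference, align the reads, and write a sorted, indexed BAM."""
    index_ref, align, sort, index_bam = build_alignment_commands(
        reference, fastq_1, fastq_2, out_bam, threads)
    subprocess.run(index_ref, check=True)
    # Pipe `bwa mem` output straight into `samtools sort`.
    mem = subprocess.Popen(align, stdout=subprocess.PIPE)
    subprocess.run(sort, stdin=mem.stdout, check=True)
    mem.stdout.close()
    if mem.wait() != 0:
        raise RuntimeError("bwa mem exited with a non-zero status")
    subprocess.run(index_bam, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_alignment_commands(
            "hs37d5.fa", "reads_1.fastq", "reads_2.fastq", "aligned.bam"):
        print(" ".join(cmd))
```

Calling `run_alignment` requires both bwa and samtools on the PATH; the resulting sorted BAM file is the kind of input passed to emvc-2 with the -i option.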