This repository contains svhip.py, a script for predicting functional RNA elements from multiple sequence alignments. It supports generating training data, training machine learning models, slicing alignments into windows, and predicting classes (coding, non-coding, other) for alignment windows.
--threads (int, default: max(CPU_COUNT-1, 1)): Number of threads to allocate.
--seed (int, default: random integer at startup): Seed controlling randomized behavior (e.g. shuffling).
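The two defaults above can be pictured with the following minimal sketch. It is not taken from svhip.py: the function names default_threads and default_seed and the upper bound of the seed range are assumptions made for illustration.

```python
import os
import random


def default_threads() -> int:
    """Leave one core free, but never go below one thread."""
    # os.cpu_count() may return None on some platforms; fall back to 1.
    cpu_count = os.cpu_count() or 1
    return max(cpu_count - 1, 1)


def default_seed(upper: int = 2**32 - 1) -> int:
    """Draw a random integer seed; the upper bound is an assumed value."""
    return random.randint(0, upper)


threads = default_threads()
seed = default_seed()
# A dedicated generator built from the seed makes a run repeatable.
rng = random.Random(seed)
```

Passing an explicit seed instead of the random default is what makes shuffling-based steps reproducible between runs.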
Task overview:
data → Generation of training data from multiple sequence alignments.
train → Train a prediction model on data generated with the ‘data’ command.
windows → Cut an alignment into overlapping windows in preparation for prediction.
predict → Run a model prediction on windows generated with the ‘windows’ command.
hexcalibrate → Train a hexamer frequency model which can be used for coding potential assessment.
svhip data
Purpose
Generate training data from coding and/or noncoding input sequences.
Align sequences (requires Clustal Omega), optionally generate a negative set (SISSIz if available, otherwise column shuffling), slice alignments into windows, and compute features into a TSV.
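The window-slicing step mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the function name slice_alignment, the dictionary representation of an alignment, and the step size of 40 columns are all hypothetical (the source only states a default window length of 120).

```python
from typing import Dict, Iterator


def slice_alignment(
    alignment: Dict[str, str], window_length: int = 120, step: int = 40
) -> Iterator[Dict[str, str]]:
    """Yield overlapping column windows of an aligned sequence set."""
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    # An alignment shorter than one window yields a single, shorter window.
    last_start = max(n_columns - window_length, 0)
    # Trailing columns are skipped when they do not fit the step grid.
    for start in range(0, last_start + 1, step):
        end = start + window_length
        yield {name: seq[start:end] for name, seq in alignment.items()}


toy = {"seq1": "A" * 200, "seq2": "C" * 200}
windows = list(slice_alignment(toy))
```

With 200 columns, a window length of 120 and a step of 40, the windows start at columns 0, 40 and 80, so consecutive windows share 80 columns.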
Behavior
Requires at least one of --noncoding or --coding.
Checks for clustalo availability; checks SISSIz availability unless --shuffle-control is given.
Seeds RNG with --seed.
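The availability checks described above could look roughly like this. It is a sketch, not the script's code: the helper name missing_tools is hypothetical, and it assumes the SISSIz executable is literally named SISSIz on PATH.

```python
import shutil
from typing import List


def missing_tools(shuffle_control: bool = False) -> List[str]:
    """Return the external tools that cannot be found on PATH."""
    required = ["clustalo"]
    # SISSIz is only looked up when column shuffling was not requested.
    if not shuffle_control:
        required.append("SISSIz")
    return [tool for tool in required if shutil.which(tool) is None]


missing = missing_tools(shuffle_control=True)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```

Running such a check before starting data generation gives an early, readable message instead of a failure in the middle of the alignment step.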
Options
--noncoding (string): Input directory with FASTA file(s) of noncoding sequences (Requires at least 1 of ‘noncoding’, ‘coding’).
--coding (string): Input directory with FASTA file(s) of coding sequences (Requires at least 1 of ‘noncoding’, ‘coding’).
--other (string): Input directory with FASTA file(s) of random (intergenic) sequences (will be auto-generated via randomization if not supplied).
-o, --outfile (string): Name for the output file (Required).
-N, --negative (string): Path to a specific negative dataset; if empty, a negative set is auto-generated.
-d, --max-id (float, default: 0.95): Remove sequences above this identity threshold during preprocessing (interpreted as proportion; help text mentions percent).
-n, --num-sequences (int, default: 100): Number of sequences input alignments will be optimized towards.
-l, --window-length (int, default: 120): Window length for slicing alignments into overlapping windows.
-s, --samples (int, default: 10): Number of sampling runs per alignment/sequence count.
-a, --sample-attempts (int, default: 1000): Number of sampling attempts per alignment.
-c, --shuffle-control (store_true, default: False): Use simpler column-based shuffling instead of SISSIz.
-H, --hexamer-model (string, default: hexamer_models/Human_hexamer.tsv): Path to the statistical hexamer model to use.
-S, --no-structural-filter (string, default: False): Set to True to disable filtering windows by statistical significance of structure. Note: defined as action="store" (string) although conceptually a boolean toggle.
-T, --tree (string, default: None): Path to a Newick-formatted species tree for the alignment. If None, a tree may be estimated.
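The note on -S is easier to see with a small argparse experiment. The parser below is a stand-alone sketch that mirrors only the documented definition (a stored string with default False); how svhip.py itself interprets the value is not stated here, and the helper as_bool is a hypothetical way of normalizing such a value.

```python
import argparse

parser = argparse.ArgumentParser(prog="svhip-sketch")
# Mirrors the documented definition: a stored string with default False.
parser.add_argument("-S", "--no-structural-filter", action="store", default=False)

args = parser.parse_args(["-S", "False"])
# The parsed value is the string "False", which is truthy in Python.
print(repr(args.no_structural_filter), bool(args.no_structural_filter))


def as_bool(value) -> bool:
    """Interpret common spellings of a boolean passed as a string."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


print(as_bool(args.no_structural_filter))
```

Because any non-empty string is truthy, a plain truth test on a string-valued option would treat "False" as enabled; following the option description and passing True only when the filter should be disabled avoids that ambiguity.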
Clustal Omega (clustalo) must be available in PATH for data generation and alignment steps.
SISSIz is used for negative control generation when available; if not present or if --shuffle-control is set, a simpler column-shuffling approach is used instead.
Randomization is controlled by --seed. If not provided, a random seed is generated at start.
When using predict with --bed, ensure the input is MAF to include genomic coordinates.
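The column-shuffling fallback and the role of the seed mentioned in the notes above can be illustrated with a naive sketch. It is an assumption about the general idea only: the function name shuffle_columns is hypothetical, and the script's actual procedure may differ (for example in how gaps or conservation patterns are treated).

```python
import random
from typing import List


def shuffle_columns(alignment: List[str], seed: int) -> List[str]:
    """Permute alignment columns with a seeded RNG, keeping each column intact."""
    if not alignment:
        return []
    n_columns = len(alignment[0])
    if any(len(row) != n_columns for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    order = list(range(n_columns))
    # The same seed always produces the same column order.
    random.Random(seed).shuffle(order)
    return ["".join(row[i] for i in order) for row in alignment]


toy = ["ACGU-A", "ACGUUA"]
control = shuffle_columns(toy, seed=42)
```

Since whole columns are moved together, the per-column composition of the alignment is preserved while the order of positions along the alignment is randomized.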
SVHIP
Usage
Global options (available for all tasks)
Task overview:
svhip data
Purpose
Behavior
Options
Example
svhip train
Purpose
Options
SVM hyperparameters (when model=SVM and optimization enabled)
Random Forest hyperparameters (when model=RF and optimization enabled)
Example
svhip windows
Purpose
Options
Example
svhip predict
Purpose
Options
Example
svhip hexcalibrate
Purpose
Options
Example
External tools and notes