SignalP 6.0
Signal peptide prediction model based on a Bert protein language model encoder and a conditional random field (CRF) decoder.
This is the development codebase. If you are looking for the prediction service, go to https://services.healthtech.dtu.dk/service.php?SignalP-6.0.
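To illustrate what a CRF decoder does on top of per-residue encoder scores, here is a minimal Viterbi decoding sketch in plain Python. It is not the implementation in src/signalp6/models; the function name viterbi_decode, the score layout, and the toy numbers are assumptions made for illustration only.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path for one sequence.

    emissions:   list of per-position score lists, shape (seq_len, n_labels)
    transitions: transitions[i][j] is the score of moving from label i to label j
    Both inputs are hypothetical stand-ins for encoder outputs and CRF parameters.
    """
    n_labels = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for emission in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for arriving in label j at this position.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emission[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example with two labels: a strong penalty on switching labels keeps
# the path in label 0, even though positions 2 and 3 slightly prefer label 1.
emissions = [[3, 0], [0, 1], [0, 1]]
sticky = [[0, -10], [-10, 0]]
print(viterbi_decode(emissions, sticky))
```

With all transition scores set to zero the decoder simply follows the per-position maxima, which shows how the transition matrix is what lets a CRF enforce a consistent label sequence.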
The installation instructions for the installable SignalP 6.0 prediction tool can be found here.
Install in editable mode with pip install -e ./ to experiment.
Data
The training dataset as well as the full dataset before homology partitioning are in data. The directory additionally contains the extended vocabulary of the ProtTrans BertTokenizer used.
Training
You can find the training script in scripts/train_model.py. The PyTorch model is in src/signalp6/models. Please refer to the training script source for the meaning of all parameters.
A basic training command looks like this:
Other things in package
training_utils contains parts that were used to fit the model, e.g. data loading and regularization.
utils contains other utilities, such as functions to calculate metrics or region statistics.
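As a rough idea of what a region statistic can look like, the sketch below run-length encodes a per-residue label string into region lengths. It does not reproduce the functions in utils; the function name region_lengths and the label alphabet (N, H, C for the signal peptide regions, O for everything else) are hypothetical.

```python
from itertools import groupby


def region_lengths(labels):
    """Collapse a per-residue label string into (label, length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]


# Hypothetical prediction: 3 n-region, 7 h-region, 3 c-region residues,
# followed by 4 residues outside the signal peptide.
print(region_lengths("NNN" + "HHHHHHH" + "CCC" + "OOOO"))
# → [('N', 3), ('H', 7), ('C', 3), ('O', 4)]
```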