PSOSP (Prophage SOS dependency Predictor) is a novel bioinformatics tool that predicts prophage induction modes by analyzing the heterology index (HI) of LexA protein binding to target DNA, classifying prophages as SOS-dependent prophages (SdPs) or SOS-independent prophages (SiPs).
Webserver
An online platform for rapid prediction of prophage induction modes is available at https://vee-lab.sjtu.edu.cn/PSOSP/. Upload your host and phage genomes there to obtain the prediction results.
Background
Principle
Temperate phages integrate into the bacterial host genome as prophages. Under normal conditions, the LexA protein binds to the SOS box within the prophage, repressing the expression of phage-related genes and maintaining the lysogenic state. Upon external stimuli (such as exposure to DNA-damaging agents), the RecA protein is activated, leading to the self-cleavage of LexA and its dissociation from the SOS box. This relieves prophage repression, triggering the temperate phage to enter the lytic cycle and thereby facilitating its proliferation.
Workflow
LexA & Canonical SOS Box (CSB) Identification: Scanning the host genome to identify the LexA protein and the canonical SOS boxes (CSBs) located upstream of the lexA gene
Heterology Index (HI) Calculation: Identifying potential SOS boxes (PSBs) across the bacterial genome, calculating the HI for each PSB, and establishing classification thresholds (HIc1 and HIc2) from Mean Shift clustering results
PSB Scan in Prophage: Scanning for PSBs within prophage promoter regions and determining the minimum HI (HImin)
Prophage Categorization: Evaluating the ability of LexA to bind prophage promoter regions by comparing HImin with the thresholds
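The HI calculation and the final threshold comparison can be sketched as follows. This is a minimal illustration, not PSOSP's implementation: it assumes the commonly used heterology index formula, HI = Σ ln((n_cons + 0.5) / (n_act + 0.5)), summed over box positions, and it assumes that a prophage whose HImin falls between HIc1 and HIc2 is reported as SOS-uncertain (SuPs). The box sequences and threshold values are toy data, and the function names are hypothetical.

```python
import math
from collections import Counter

def position_counts(known_boxes):
    """Count the bases seen at each position of aligned, equal-length boxes."""
    length = len(known_boxes[0])
    if any(len(box) != length for box in known_boxes):
        raise ValueError("all known boxes must have the same length")
    return [Counter(box[i].upper() for box in known_boxes) for i in range(length)]

def heterology_index(candidate, counts):
    """Assumed formula: sum over positions of ln((n_cons + 0.5) / (n_act + 0.5)).

    n_cons is the count of the most frequent base at that position in the
    known boxes; n_act is the count of the candidate's base at that position.
    """
    if len(candidate) != len(counts):
        raise ValueError("candidate length must match the known boxes")
    hi = 0.0
    for base, column in zip(candidate.upper(), counts):
        n_cons = max(column.values())
        n_act = column.get(base, 0)
        hi += math.log((n_cons + 0.5) / (n_act + 0.5))
    return hi

def classify(hi_min, hic1, hic2):
    """Compare HImin with the two thresholds (the SuPs branch is an assumption)."""
    if hi_min <= hic1:
        return "SdPs"
    if hi_min >= hic2:
        return "SiPs"
    return "SuPs"

# Toy data only: three made-up 16-bp boxes and hypothetical thresholds.
toy_boxes = ["CTGTATATATATACAG", "CTGTATGAGCATACAG", "CTGTATATAAAAACAG"]
counts = position_counts(toy_boxes)
candidates = ["CTGTATATAAAAACAG", "CTGAATGCGCTTTCAG"]
hi_min = min(heterology_index(c, counts) for c in candidates)
print(classify(hi_min, hic1=5.0, hic2=9.0))
```

A lower HI means the candidate is closer to the consensus of the known boxes, which is why the minimum HI across a prophage's promoter regions is the value compared with the thresholds.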
Experimental validation
We have validated PSOSP’s accuracy using 14 experimentally confirmed bacteriophages spanning 10 viral families (including 2 Peduoviridae, 3 Inoviridae, and 9 distinct novel families), with their hosts covering 7 bacterial genera (Salmonella, Escherichia, Vibrio, Pseudomonas, Serratia_J, Hafnia, and Shewanella) across 3 bacterial orders (Enterobacterales, Enterobacterales_A, and Pseudomonadales). Notably, all PSOSP predictions for these bacteriophages were consistent with the experimental evidence, demonstrating the tool’s versatility and reliability across broad taxonomic ranges.
Significance for prophage isolation
We propose that future phage isolation efforts first use PSOSP to determine the prophage type:
For SdPs, conventional SOS-inducing agents (e.g., mitomycin C (MMC), UV) remain appropriate.
For SiPs, SOS-independent inducers such as DPO, C4-HSL, EDTA, and pyocyanin, or physical factors such as varying salinity, temperature, and pH, should be considered.
Instructions
Input requirements
For host:
Host Taxon Suitability: PSOSP is primarily suitable for Gammaproteobacteria, including multiple critical pathogens such as Vibrio cholerae, Pseudomonas aeruginosa, Yersinia pestis, Escherichia coli, Salmonella enterica, Shigella spp., and Klebsiella spp. Compatible bacterial genera can be viewed on the Statistics page of the PSOSP online website.
Genome Quality: We advise using host genomes with a completeness score above 90%, since low-quality genomes may be missing the LexA protein, which leads to poor results. You can assess the completeness of your genome assembly using CheckM2.
Multi-Contig Genomes: If the host genome consists of multiple contigs, ensure the input host genome file contains all contigs (i.e., provide the genome assembly as a single multi-contig file).
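Before running a prediction, it can help to confirm that the host FASTA file really holds every contig of the assembly. The sketch below is a small, dependency-free check and not part of PSOSP; the function name `summarize_fasta` is hypothetical.

```python
def summarize_fasta(path):
    """Return {record_id: sequence_length} for every record in a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                if current in lengths:
                    raise ValueError(f"duplicate record id: {current!r}")
                lengths[current] = 0
            elif current is None:
                raise ValueError("sequence data found before the first header")
            else:
                lengths[current] += len(line)
    return lengths

def report(path):
    """Print the contig count and total length of a FASTA file."""
    lengths = summarize_fasta(path)
    print(f"{path}: {len(lengths)} contig(s), {sum(lengths.values())} bp in total")
```

Calling `report` on the host file and comparing the contig count with the assembly report is usually enough to catch a file that contains only one contig of a fragmented genome. The same function lists the record identifiers in a multi-prophage input file.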
For prophage:
Genome Quality: PSOSP utilizes CheckV for quality assessment. Predictions are reliable for prophages with >=90% CheckV completeness; accuracy decreases for completeness <90%. If you are certain your phage genome is complete, you may disregard the CheckV results in the output file (as CheckV assessments can occasionally be inaccurate).
Multiple Inputs: The input phage genome file can contain sequences for multiple prophages.
Host Association: Input phages must either be integrated into the corresponding host genome or capable of infecting the host. Predicting regulatory relationships between mismatched phage-host pairs is meaningless.
Prediction Quality: High for phage completeness between 90%-100% or for phages predicted as SdPs; Medium for phage completeness between 50%-90%; Low for phage completeness lower than 50%.
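The quality tiers above can be written as a small function. This is a sketch of the stated rule rather than PSOSP's own code. The function name is hypothetical, and the handling of the boundary values (exactly 90% counted as High, exactly 50% counted as Medium) is an assumption, since the stated ranges overlap at those points.

```python
def prediction_quality(completeness, prediction_result):
    """Map CheckV completeness (percent) and the predicted class to a quality tier.

    Assumed boundaries: completeness >= 90 is High, 50 <= completeness < 90 is
    Medium, and anything below 50 is Low. Phages predicted as SdPs are always
    High, as stated in the text.
    """
    if prediction_result == "SdPs" or completeness >= 90:
        return "High"
    if completeness >= 50:
        return "Medium"
    return "Low"

# Hypothetical example values, not real PSOSP output.
for completeness, result in [(97.5, "SiPs"), (62.0, "SiPs"), (30.0, "SdPs"), (30.0, "SiPs")]:
    print(completeness, result, prediction_quality(completeness, result))
```

The SdPs shortcut reflects the idea that finding a strong SOS box in a partial genome is already informative, whereas failing to find one in an incomplete genome is not.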
PSOSP: Prophage SOS-dependency Predictor
Dependencies
Installation
(1) conda (recommended, easiest way to install)
install conda and add channels (If already installed, please skip)
install PSOSP
test installation:
psosp test
usage:
psosp -h
If you need CheckV results, please download the CheckV database.
(2) git (install dependencies mentioned above first)
test installation:
psosp test
Input files
PSOSP needs two files as input:
-hf: host genome in FASTA format
-vf: phage genome in FASTA format
Other parameters:
-wd: working path to save result files
-faa: host protein sequences in FASTA format (optional)
-db: CheckV reference database path (optional)
How to run
Users need only specify the required parameters:
If installed through conda:
Using the example data on GitHub or Zenodo for a test:
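A run can also be scripted. The sketch below assembles a command line from the parameters listed under "Input files" (`-hf`, `-vf`, `-wd`, `-faa`, `-db`). It assumes `psosp` is on the PATH after installation, and the helper name and file names are hypothetical placeholders. The working directory is always passed here as a design choice of the sketch, not because the tool is known to require it.

```python
import shlex

def build_psosp_command(host_fasta, phage_fasta, workdir, host_faa=None, checkv_db=None):
    """Build an argument list for psosp using only the flags listed in this README."""
    cmd = ["psosp", "-hf", host_fasta, "-vf", phage_fasta, "-wd", workdir]
    if host_faa is not None:
        cmd += ["-faa", host_faa]   # optional host protein sequences
    if checkv_db is not None:
        cmd += ["-db", checkv_db]   # optional CheckV reference database path
    return cmd

# Placeholder file names; replace them with your own paths.
command = build_psosp_command("host.fasta", "phage.fasta", "psosp_results")
print(shlex.join(command))

# To actually run the tool (requires a working PSOSP installation):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.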
Outputs
In this example, the results of PSOSP’s analysis will be written to the test/test-result directory, which will look like this:
virus_wp2-phage-sp1-sp2-sp3_checkv: directory containing the CheckV results
host_wp2.fna.faa_lexa_blast.tsv: BLAST result for the LexA protein
host_wp2.fna_whole_genome_HI.tsv: HI clusters of all potential SOS boxes in the host genome, obtained with the MeanShift method
host_wp2_prodigal.faa: protein sequences produced by Prodigal
virus_wp2-phage-sp1-sp2-sp3_prediction.tsv: prediction results of PSOSP
A detailed overview of virus_wp2-phage-sp1-sp2-sp3_prediction.tsv:
host: Input host filename
virus: Phage identifier in the input FASTA
prediction_result: Induction mode predicted by PSOSP. SiPs: SOS-independent Prophage; SuPs: SOS-uncertain Prophage; SdPs: SOS-dependent Prophage
prediction quality: High for phage completeness between 90%-100% or for phages predicted as SdPs; Medium for phage completeness between 50%-90%; Low for phage completeness lower than 50%
completeness: Estimated phage completeness (CheckV)
contamination: Estimated phage contamination (CheckV)
viral-HI(min): Minimal HI in the phage genome
box-seq: Sequence of the potential SOS box with the minimal HI
box-seq_start_pos: Box start position in the phage genome
box-seq_strand: + (forward) or - (reverse)
confidence_window_lower: Threshold HIc1 (HImin ≤ HIc1 → SdP)
confidence_window_upper: Threshold HIc2 (HImin ≥ HIc2 → SiP)
blast_status: 'Blast_OK' (LexA homologs found) or '-' if absent
fimo_status: 'Fimo_OK' (SOS box detected upstream of LexA) or '-' if absent
Citation
Hao, Yali, Mujie Zhang, Xinjuan Lei, Chengrui Zhu, Taoliang Zhang, Yanping Zheng, Xiang Xiao, and Huahua Jian. 2025. “PSOSP Uncovers Pervasive SOS‐Independent Prophages With Distinct Genomic and Host Traits in Bacterial Genomes.” iMeta e70073. https://doi.org/10.1002/imt2.70073