MyPMFs
Postic G., Hamelryck T., Chomilier J., Stratmann D.
Generate statistical potentials from a user-defined list of protein structures
INSTALL
Type ‘make’ in the terminal. This will create executable binaries named ‘scoring’ and ‘training’.
GET HELP
Run each program without any argument (or with the -h option).
EXAMPLES
Case#1
This will create a statistical potential for each residue pair, with residues represented by their alpha carbons (n=210; *.nrg files).
The ‘myPotentials/’ output directory will also contain three tab-separated values (.tsv) files with statistics about the training dataset:
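The count n=210 is consistent with one potential per unordered pair of the 20 standard amino acids, self-pairs included (20 x 21 / 2 = 210). A minimal Python sketch of that enumeration, assuming this is how the pairs are counted; the three-letter codes and variable names are illustrative and not taken from the program:

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids (three-letter codes).
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]

# Unordered pairs, including identical residues (e.g. ALA-ALA).
residue_pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))
print(len(residue_pairs))  # 210
```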
Note: The same results can be obtained with the following command:
Unlike the -l argument, -L does not require a list of native protein structures (i.e. a list of PDB codes). This allows a set of decoys to be used as input (each with an arbitrary filename).
This will calculate the pseudo-energy of the structure 1BKR by using the previously computed potentials.
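As a rough illustration of what scoring with precomputed potentials involves, the Python sketch below sums a tabulated pair potential over all residue pairs of a structure. It is only a sketch under stated assumptions: the bin width, the data layout and every name in it are hypothetical, and the actual program may filter pairs or bin distances differently.

```python
import math

BIN_WIDTH = 0.5  # hypothetical bin width, in angstroms


def pair_key(res_a, res_b):
    """Order-independent key for a residue pair."""
    return tuple(sorted((res_a, res_b)))


def pseudo_energy(residues, potentials, bin_width=BIN_WIDTH):
    """Sum the tabulated potential over all residue pairs.

    residues: list of (residue_name, (x, y, z)) tuples, one atom per residue.
    potentials: dict mapping pair_key -> list of energies, one per distance bin.
    Pairs without a table, or beyond the last bin, contribute nothing.
    """
    total = 0.0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            (name_i, xyz_i), (name_j, xyz_j) = residues[i], residues[j]
            distance = math.dist(xyz_i, xyz_j)
            bin_index = int(distance // bin_width)
            table = potentials.get(pair_key(name_i, name_j), [])
            if bin_index < len(table):
                total += table[bin_index]
    return total


# Toy example with made-up numbers (not real potentials).
toy_potentials = {("ALA", "GLY"): [2.0, 0.5, -1.0, -0.2]}
toy_residues = [("GLY", (0.0, 0.0, 0.0)), ("ALA", (1.2, 0.0, 0.0))]
print(pseudo_energy(toy_residues, toy_potentials))  # -1.0
```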
Case#2
This will create statistical potentials, with residues represented by their beta carbons (-r CB). Each potential will be plotted as an SVG file (-p). The interatomic squared distances used for the calculations are written into *.dat files (-g).
Note: Any previously created ‘myPotentials/parameters.log’ file will be overwritten.
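The pseudo-energy of 1BKR will be calculated with cubic-interpolated potentials (-c). These interpolated potentials will be plotted as SVG files (-p). Two TSV files will be written (-w):
the pseudo-energy and distance for each atomic pair (data.tsv);
the pseudo-energy for each residue of the protein sequence (energy_[WINDOW_SIZE].tsv).
All these data are written into the ‘myResults’ directory (-o myResults).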
Notes:
the default representation is now CB (beta carbons), as defined in ‘myPotentials/parameters.log’;
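To give an idea of what cubic interpolation of a binned potential (-c) means, here is a short Python sketch using Catmull-Rom segments. This is a swapped-in, generic cubic scheme chosen for brevity; the interpolation actually used by the program is not described here, and the energies, bin width and function names are made up for the example.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic Catmull-Rom interpolation between p1 (t=0) and p2 (t=1)."""
    return 0.5 * (
        2.0 * p1
        + (p2 - p0) * t
        + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t ** 2
        + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t ** 3
    )


def interpolate(energies, bin_width, distance):
    """Interpolate a binned potential at an arbitrary distance.

    energies[i] is taken as the value at distance i * bin_width (needs at
    least two values). Distances outside the table are clamped to its ends,
    and end points are repeated so every segment has four control points.
    """
    last = len(energies) - 1
    position = min(max(distance / bin_width, 0.0), float(last))
    i = min(int(position), last - 1)
    t = position - i
    p0 = energies[max(i - 1, 0)]
    p3 = energies[min(i + 2, last)]
    return catmull_rom(p0, energies[i], energies[i + 1], p3, t)


toy_energies = [3.0, 1.0, -0.5, -1.0, -0.3, 0.0]  # made-up values
print(interpolate(toy_energies, 1.0, 2.0))  # -0.5, a tabulated point
print(interpolate(toy_energies, 1.0, 2.5))  # a value between two bins
```

The curve passes through every tabulated value, so the interpolated potential agrees with the original one at the bin positions.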
Case#3
Same training as Case#1, but with kernel density estimation (KDE). Here, we use an Epanechnikov kernel (-k e), and the kernel bandwidth is selected with the Sheather-Jones direct plug-in method (-b SJ-dpi). Each potential will be plotted as an SVG file (-p).
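For readers unfamiliar with KDE, the following Python sketch evaluates a density estimate with an Epanechnikov kernel. It assumes a fixed, hand-picked bandwidth: the Sheather-Jones selection is not implemented here, and the sample distances and function names are invented for illustration.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(samples, x, bandwidth):
    """Kernel density estimate at x for a fixed bandwidth."""
    n = len(samples)
    return sum(epanechnikov((x - s) / bandwidth) for s in samples) / (n * bandwidth)


toy_distances = [4.1, 4.8, 5.0, 5.3, 6.7, 7.2, 7.4, 9.0]  # made-up values
h = 1.0  # hypothetical bandwidth; not selected by Sheather-Jones here
print(round(kde(toy_distances, 5.0, h), 4))
```

Compared with a histogram, the estimate varies smoothly with distance, and the bandwidth plays the role of the bin width.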
Only residues 10A to 20A of 1BKR will be processed (-q). A Z-score will be computed to evaluate the absolute structural quality (-z); the more negative the Z-score, the better the model. This Z-score will be computed from 2000 random sequence decoys (-s 2000).
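The Z-score described above can be sketched in Python as the native pseudo-energy expressed in standard deviations from the mean decoy energy. This is an illustration under assumptions: the decoys are generated here by shuffling the sequence, the population standard deviation is used, and the energies are made-up numbers; the program may differ on each of these points.

```python
import random
import statistics


def z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy distribution."""
    mean = statistics.mean(decoy_energies)
    std = statistics.pstdev(decoy_energies)  # population standard deviation
    return (native_energy - mean) / std


def shuffled_sequences(sequence, count, seed=0):
    """Generate random sequence decoys by shuffling the residue order."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(count):
        residues = list(sequence)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


toy_decoy_energies = [-10.0, -12.0, -8.0, -11.0, -9.0]  # made-up values
print(z_score(-16.0, toy_decoy_energies))  # about -4.24
```

A native energy well below the decoy average gives a strongly negative Z-score, which matches the "more negative is better" reading.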
Case#4
After any training:
$ ./scoring -l example/list2.txt -d myPotentials/
Multiple inputs: a pseudo-energy will be calculated for each of the 25 structures listed in ‘example/list2.txt’. The chain name is provided for two structures in this list; by default (no chain name given), all chains found will be processed.
Case#5
This trains the reference state separately (-W) on all atoms (-r allatom). A ‘frequencies.ref’ file is created, which can then be used (-R) to train a statistical potential.
Thus, the observed frequencies are trained on backbones, while the reference state is trained on all atoms.
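To make the roles of the observed frequencies and the reference state concrete, the Python sketch below computes per-bin pseudo-energies as -kT ln(f_obs / f_ref). It assumes the usual inverse-Boltzmann form for statistical potentials; the counts, the kT value and the function name are invented for the example and are not taken from the program.

```python
import math


def pseudo_energies(observed_counts, reference_counts, kT=1.0):
    """Per-bin pseudo-energy: -kT * ln(f_obs / f_ref).

    Counts are normalized to frequencies first. Bins where either count is
    zero are reported as None, since the logarithm is undefined there.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        f_obs = obs / obs_total
        f_ref = ref / ref_total
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


observed = [5, 40, 30, 25]         # made-up counts for the observed set
reference = [100, 200, 300, 400]   # made-up counts for a separate reference set
print(pseudo_energies(observed, reference))
```

Because the two count tables are independent inputs, nothing prevents them from coming from different atom representations, as in this case.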
Contact: guillaume.postic@u-paris.fr