Coding Potential Calculator
Introduction
Coding Potential Calculator (CPC) is a Support Vector Machine-based classifier that assesses the protein-coding potential of a transcript (i.e. whether a cDNA/RNA transcript could encode a peptide or not) based on six biologically meaningful sequence features. It takes nucleotide FASTA sequences as input and generates output about the coding status and the "supporting evidence" for each sequence.
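Because CPC expects nucleotide FASTA, a quick sanity check before a run can catch protein or malformed input early. The sketch below is a hypothetical helper (is_nucleotide_fasta is not part of CPC); it assumes the file starts with a ">" header and accepts only IUPAC nucleotide letters in sequence lines, so gap characters or Windows line endings will make it fail.

```shell
# Hypothetical helper, not part of CPC: returns 0 if FILE looks like
# nucleotide FASTA, i.e. it is non-empty, its first line is a ">" header,
# and every non-blank, non-header line uses only IUPAC nucleotide letters
# (A C G T U N plus ambiguity codes R Y S W K M B D H V, either case).
is_nucleotide_fasta() {
    file=$1
    [ -s "$file" ] || return 1
    # The first line must be a FASTA header.
    head -n 1 "$file" | grep -q '^>' || return 1
    # Drop headers and blank lines; any remaining non-nucleotide character fails.
    if grep -v '^>' "$file" | grep -v '^[[:space:]]*$' \
        | grep -q '[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv]'; then
        return 1
    fi
    return 0
}
```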
Prerequisites:
NCBI BLAST package: a local version can be downloaded from http://www.ncbi.nlm.nih.gov/blast/
A relatively comprehensive protein database. Either UniRef90 or NCBI nr should work.
The database should be named "prot_db" and placed under the data/ subdirectory.
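To confirm the BLAST prerequisite before installing, a check along these lines may help. require_cmds is a hypothetical helper, not part of CPC. formatdb is the command used in the database step of this README, and blastall is the legacy BLAST search program; whether your CPC release actually calls blastall is an assumption worth verifying in bin/run_predict.sh.

```shell
# Hypothetical helper, not part of CPC: report every command in "$@" that
# is not found on PATH, and return non-zero if any is missing.
# Assumed usage for legacy NCBI BLAST: require_cmds formatdb blastall
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Checking all commands before returning, rather than stopping at the first failure, lists everything that still needs installing in one pass.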
Install:
Unpack the tarball:
tom@linux$ gzip -dc cpc-0.9-r2.tar.gz | tar xf -
Build third-party packages:
tom@linux$ cd cpc-0.9-r2
tom@linux$ export CPC_HOME="$PWD"
tom@linux$ cd libs/libsvm
tom@linux$ gzip -dc libsvm-2.81.tar.gz | tar xf -
tom@linux$ cd libsvm-2.81
tom@linux$ make clean && make
tom@linux$ cd ../..
tom@linux$ gzip -dc estate.tar.gz | tar xf -
tom@linux$ cd estate
tom@linux$ make clean && make
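After the build, a quick check can confirm that libsvm compiled. This is a hypothetical sketch (check_libsvm_build is not part of CPC) and assumes the libsvm 2.81 Makefile leaves svm-train, svm-predict and svm-scale in libs/libsvm/libsvm-2.81; the estate outputs are not checked because this README does not name them.

```shell
# Hypothetical helper, not part of CPC: given the CPC root directory
# (e.g. "$CPC_HOME"), check that the libsvm binaries were built.
# Assumption: libsvm 2.81's Makefile builds svm-train, svm-predict and
# svm-scale in its own source directory.
check_libsvm_build() {
    dir=$1/libs/libsvm/libsvm-2.81
    status=0
    for bin in svm-train svm-predict svm-scale; do
        if [ ! -x "$dir/$bin" ]; then
            echo "not built: $dir/$bin" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run it as check_libsvm_build "$CPC_HOME" in the same shell where CPC_HOME was exported.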
Format the BLAST database, name it "prot_db", and put it under $CPC_HOME/data/:
tom@linux$ cd $CPC_HOME/data
tom@linux$ formatdb -i (your_fasta_file) -p T -n prot_db
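To confirm that formatdb produced the database where CPC looks for it, something like the following may be used. check_prot_db is hypothetical, not part of CPC; it assumes formatdb -p T writes prot_db.phr, prot_db.pin and prot_db.psq for a single-volume database, and a prot_db.pal alias file when a large database such as NCBI nr is split into volumes.

```shell
# Hypothetical helper, not part of CPC: check that a formatted protein
# database named "prot_db" exists in DIR (e.g. "$CPC_HOME/data").
# Assumption: a single-volume database has .phr/.pin/.psq files, while a
# multi-volume one is tied together by a prot_db.pal alias file.
check_prot_db() {
    dir=$1
    if [ -f "$dir/prot_db.pal" ]; then
        return 0    # multi-volume database: the alias file lists the volumes
    fi
    for ext in phr pin psq; do
        if [ ! -f "$dir/prot_db.$ext" ]; then
            echo "missing $dir/prot_db.$ext" >&2
            return 1
        fi
    done
    return 0
}
```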
Run the prediction:
tom@linux$ cd $CPC_HOME
tom@linux$ bin/run_predict.sh (input_seq) (result_in_table) (working_dir) (result_evidence)
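Because the prediction is started from $CPC_HOME, relative paths to your own files can break. The hypothetical wrapper below (cpc_predict and abspath are not part of CPC) checks the argument count and the input file, converts the four paths to absolute ones, and then calls bin/run_predict.sh from $CPC_HOME as in the command above. Whether run_predict.sh creates working_dir itself is not stated here, so the wrapper leaves that to the script.

```shell
# Hypothetical helpers, not part of CPC.
# abspath: prefix a relative path with the current directory.
abspath() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "${PWD%/}" "$1" ;;
    esac
}

# cpc_predict: validate arguments, then run bin/run_predict.sh from
# $CPC_HOME with absolute paths, matching the README's invocation order.
cpc_predict() {
    if [ "$#" -ne 4 ]; then
        echo "usage: cpc_predict input_seq result_in_table working_dir result_evidence" >&2
        return 2
    fi
    if [ -z "${CPC_HOME:-}" ] || [ ! -x "$CPC_HOME/bin/run_predict.sh" ]; then
        echo "CPC_HOME must point to the CPC directory containing bin/run_predict.sh" >&2
        return 1
    fi
    input=$(abspath "$1")
    table=$(abspath "$2")
    work=$(abspath "$3")
    evidence=$(abspath "$4")
    if [ ! -s "$input" ]; then
        echo "input FASTA missing or empty: $input" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left unchanged.
    (cd "$CPC_HOME" && bin/run_predict.sh "$input" "$table" "$work" "$evidence")
}
```

For example, after export CPC_HOME=/path/to/cpc-0.9-r2, running cpc_predict my_transcripts.fa my_result.tab my_work my_evidence from your own directory passes absolute paths for all four arguments (the file names here are placeholders).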
=========
See the website for a tutorial and more details: http://cpc.cbi.pku.edu.cn
Contact: cpc@mail.cbi.pku.edu.cn