FASTG to Protein Library

This package generates a candidate protein library in two phases:

  1. Parsing a FASTG file to create graph traversals of longer stretches of DNA

    • The FASTG file is parsed into a directed graph, and a depth-first search (DFS) is run
        over all connected edges. The DNA sequences along each DFS path are then concatenated.
    • Each concatenated DNA sequence is transcribed to mRNA, translated, and split into
        candidate proteins at each stop codon. A single DFS traversal can therefore yield
        several candidate protein sequences.
    • Protein sequences are filtered on length and amino acid redundancy.
    • Protein sequences are cleaved into peptide sequences.
    • DFS traversals, proteins and peptides are stored in a SQLite database. The linking
        relationship between all three is maintained in the DB.
    • A FASTA file of peptides is written for the user. This FASTA file is meant to be
        searched against MS/MS data.
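
The traversal and translation steps above can be sketched as follows. This is a minimal illustration, not the package's actual code: the in-memory `Graph` layout, the function names `dfs_paths`, `translate` and `candidate_proteins`, and the `min_len` filter are all assumptions made for the example. It also assumes edge sequences are joined end to end with no overlap trimming, reads only the forward frame 0, and looks codons up directly on the DNA alphabet (T instead of U), which gives the same result as transcribing to mRNA first.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Standard genetic code with codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Hypothetical graph layout: edge name -> (DNA sequence, successor edge names).
Graph = Dict[str, Tuple[str, List[str]]]


def dfs_paths(graph: Graph, start: str) -> Iterator[List[str]]:
    """Yield every maximal simple path from `start`, explored depth first."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        # Skip edges already on the path so cycles cannot loop forever.
        successors = [n for n in graph[path[-1]][1] if n not in path]
        if not successors:
            yield path
        for nxt in successors:
            stack.append(path + [nxt])


def translate(dna: str) -> str:
    """Translate frame 0 of `dna`; '*' is a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def candidate_proteins(graph: Graph, start: str, min_len: int = 2) -> Iterator[str]:
    """Concatenate the DNA on each path, translate it, and split at stop codons."""
    for path in dfs_paths(graph, start):
        dna = "".join(graph[name][0] for name in path)
        for fragment in translate(dna).split("*"):
            if len(fragment) >= min_len:  # assumed stand-in for the length filter
                yield fragment


# Toy graph: e1 branches to e2 and e3.
toy_graph: Graph = {
    "e1": ("ATGGCC", ["e2", "e3"]),
    "e2": ("TAAATGAAA", []),
    "e3": ("AAATGA", []),
}
proteins = sorted(candidate_proteins(toy_graph, "e1"))
```

A real implementation would presumably also consider the reverse complement and the other reading frames; the sketch leaves those out to keep the control flow visible.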
  2. Using verified peptides as a filter to produce a final candidate protein library

    • The user invokes the code with
      • the SQLite database produced in phase 1
      • a list of peptide sequences or a peptide FASTA file
    • The submitted peptides are expected to have been verified against MS/MS data, i.e. they
        are peptide sequences that were found and identified.
    • The verified peptides are used to filter the proteins in the database; the proteins
        that pass the filter become the final library.
    • The verified peptides are also used to score each protein on
      • coverage
      • percentage of verified versus total associated peptides
    • Final user output
      • SQLite database
      • Protein score text file, comma delimited
      • Filtered protein FASTA file
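
The two scores in phase 2 can be sketched as below. This is an illustration under stated assumptions, not the package's scoring code: the function names `coverage`, `score_protein` and `filter_library` are hypothetical, coverage is taken to mean the fraction of residues overlapped by at least one verified peptide, and a protein is kept if at least one of its peptides was verified. In the real workflow the protein-to-peptide links would come from the SQLite database rather than from a dictionary.

```python
from typing import Dict, Iterable, List, Set, Tuple


def coverage(protein: str, verified: Iterable[str]) -> float:
    """Fraction of residues in `protein` covered by any verified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in verified:
        if not pep:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


def score_protein(
    protein: str, peptides: Iterable[str], verified: Set[str]
) -> Tuple[float, float]:
    """Return (coverage, percent of the protein's peptides that were verified)."""
    unique = set(peptides)
    hits = unique & verified
    percent = 100.0 * len(hits) / len(unique) if unique else 0.0
    return coverage(protein, hits), percent


def filter_library(
    links: Dict[str, List[str]], verified: Set[str]
) -> Dict[str, Tuple[float, float]]:
    """Keep proteins with at least one verified peptide, mapped to their scores."""
    scores = {prot: score_protein(prot, peps, verified) for prot, peps in links.items()}
    return {prot: s for prot, s in scores.items() if s[1] > 0}


# Toy data: protein sequence -> peptides cleaved from it (hypothetical values).
links = {
    "MAKRGGKLL": ["MAK", "R", "GGK", "LL"],
    "MSTK": ["MSTK"],
}
verified_peptides = {"MAK", "GGK", "PEPTIDE"}
library = filter_library(links, verified_peptides)
```

Each `(coverage, percent)` pair could then be written as one row of the comma-delimited score file, with the surviving protein sequences going to the filtered FASTA file.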
