GOATOOLS: A Python library for Gene Ontology analyses
How to cite
goatools compare_gos, which can be used with or without grouping.
Contents
This package contains a Python library to:
Process over- and under-representation of certain GO terms, based on Fisher’s exact test, with numerous multiple-correction routines, including locally implemented routines for Bonferroni, Sidak, Holm, and false discovery rate. Also included are multiple test corrections from statsmodels: FDR Benjamini/Hochberg, FDR Benjamini/Yekutieli, Holm-Sidak, Simes-Hochberg, Hommel, FDR 2-stage Benjamini-Hochberg, FDR 2-stage Benjamini-Krieger-Yekutieli, FDR adaptive Gavrilov-Benjamini-Sarkar, Bonferroni, Sidak, and Holm.
Process the obo-formatted file from the Gene Ontology website. The data structure is a directed acyclic graph (DAG) that allows easy traversal from leaf to root.
Read GO Association files:
Print descendant counts and/or information content for a list of GO terms
Get parents or ancestors for a GO term with or without optional relationships, including printing details about a GO ID’s parents
Get children or descendants for a GO term with or without optional relationships
Compare two or more lists of GO IDs
Plot GO hierarchies
Write GO hierarchies to an ASCII text file
Group GO terms for easier viewing
Map GO terms (or protein products with multiple associations to GO terms) to GOslim terms (analogous to the map2slim.pl script supplied by geneontology.org)
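The DAG described above can be illustrated without goatools. The following is a minimal stdlib sketch, not goatools' parser: it reads [Term] stanzas from a small hand-made OBO fragment (the IDs are real GO IDs but the file is heavily abbreviated, so do not rely on these edges), follows is_a links from a leaf toward the root, and counts descendants per term. The helper names parse_obo, ancestors, and descendant_counts are hypothetical. It ignores relationship tags, obsolete terms, and alt_id entries, all of which a real parser has to handle.

```python
# Hand-made, abbreviated OBO fragment for illustration only.
OBO_TEXT = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0008152
name: metabolic process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0044237
name: cellular metabolic process
is_a: GO:0009987 ! cellular process
is_a: GO:0008152 ! metabolic process
"""


def parse_obo(text):
    """Return {go_id: {"name": str, "parents": set}} for each [Term] stanza."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"name": None, "parents": set()} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Keep only the parent ID, dropping the trailing "! name" comment.
            current["parents"].add(value.split("!", 1)[0].strip())
    return terms


def ancestors(terms, go_id):
    """All terms reachable by following is_a links upward (leaf toward root)."""
    seen = set()
    stack = list(terms[go_id]["parents"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms[parent]["parents"])
    return seen


def descendant_counts(terms):
    """Number of distinct descendants of every term."""
    counts = dict.fromkeys(terms, 0)
    for go_id in terms:
        # Each term adds one to every distinct ancestor it has.
        for anc in ancestors(terms, go_id):
            counts[anc] += 1
    return counts


terms = parse_obo(OBO_TEXT)
print(sorted(ancestors(terms, "GO:0044237")))
# ['GO:0008150', 'GO:0008152', 'GO:0009987']
print(descendant_counts(terms)["GO:0008150"])  # 3
```

Because GO:0044237 has two parents, the traversal keeps a seen set so that the shared root is visited only once.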
Installation
Make sure your Python version is >= 3.7, and download an .obo file of the most current GO:
or an .obo file for the most current GO Slim terms (e.g. generic GOslim):
PyPI
To install the development version:
Bioconda
Dependencies
When installing via PyPI or Bioconda as described above, all dependencies are automatically downloaded. Alternatively, you can manually install:
scipy.stats.fisher_exact
statsmodels (optional) for access to a variety of statistical tests for GOEA
Cookbook
run.sh contains example cases using the installed goatools CLI.
Find GO enrichment of genes under study
See examples in find_enrichment
The goatools find_enrichment command takes as arguments files containing:
gene names in a study
gene names in the population (or in another study if --compare is specified)
an association file that maps a gene name to GO categories
Please look at the tests/data folder to see examples of how to make these files. When ready, the command looks like:
The command can filter on the significance of (e)nrichment or (p)urification. It can report various multiple-testing-corrected p-values as well as the false discovery rate.
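As a rough sketch of what such an analysis computes, the stdlib code below runs a toy enrichment on six made-up genes. It assumes an association layout of one gene per line followed by a tab and semicolon-separated GO IDs (check the files in tests/data for the real format), uses placeholder GO IDs and a hypothetical parent map, propagates annotations to ancestors unless propagate=False (in the spirit of the default that --no_propagate_counts turns off), computes a two-sided Fisher's exact p-value with math.comb instead of scipy.stats.fisher_exact, and labels each term e or p by comparing the study and population ratios. None of the function names are goatools APIs.

```python
from math import comb

# Hypothetical inputs. The association layout (gene, tab, semicolon-separated
# GO IDs) is an assumption; GO:A, GO:B, GO:C, GO:ROOT are placeholders.
ASSOC_TEXT = "g1\tGO:A\ng2\tGO:A;GO:B\ng3\tGO:B\ng4\tGO:C\ng5\tGO:C\ng6\tGO:C\n"
PARENTS = {"GO:A": {"GO:ROOT"}, "GO:B": {"GO:ROOT"}, "GO:C": {"GO:ROOT"}}


def read_associations(text):
    """Map each gene to its set of GO IDs."""
    gene2gos = {}
    for line in text.splitlines():
        if line.strip():
            gene, gos = line.split("\t")
            gene2gos[gene] = set(gos.split(";"))
    return gene2gos


def with_ancestors(go_ids, parents):
    """The GO IDs plus all of their ancestors (count propagation)."""
    result, stack = set(), list(go_ids)
    while stack:
        go_id = stack.pop()
        if go_id not in result:
            result.add(go_id)
            stack.extend(parents.get(go_id, ()))
    return result


def count_genes(genes, gene2gos, parents, propagate=True):
    """Per GO ID, the number of the given genes annotated to it."""
    counts = {}
    for gene in genes:
        gos = gene2gos.get(gene, set())
        if propagate:
            gos = with_ancestors(gos, parents)
        for go_id in gos:
            counts[go_id] = counts.get(go_id, 0) + 1
    return counts


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
             for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1)}
    # Sum every table that is no more likely than the observed one.
    cutoff = probs[a] * (1 + 1e-7)
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def goea(study, population, gene2gos, parents, propagate=True):
    """List (go_id, label, study ratio, population ratio, uncorrected p)."""
    study_n, pop_n = len(study), len(population)
    study_counts = count_genes(study, gene2gos, parents, propagate)
    pop_counts = count_genes(population, gene2gos, parents, propagate)
    rows = []
    for go_id in sorted(pop_counts):
        s_hits, p_hits = study_counts.get(go_id, 0), pop_counts[go_id]
        # Study genes are assumed to be a subset of the population.
        pval = fisher_two_sided(s_hits, study_n - s_hits,
                                p_hits - s_hits, pop_n - p_hits - (study_n - s_hits))
        s_ratio, p_ratio = s_hits / study_n, p_hits / pop_n
        # "." marks equal ratios (a choice made for this sketch only).
        label = "e" if s_ratio > p_ratio else "p" if s_ratio < p_ratio else "."
        rows.append((go_id, label, f"{s_hits}/{study_n}", f"{p_hits}/{pop_n}", pval))
    return rows


gene2gos = read_associations(ASSOC_TEXT)
population = sorted(gene2gos)
study = ["g1", "g2"]
for row in goea(study, population, gene2gos, PARENTS):
    print(row)
```

With propagate=False, the placeholder GO:ROOT no longer appears in the results, because no gene is annotated to it directly.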
The e in the “Enrichment” column means “enriched”: the concentration of the GO term in the study group is significantly higher than in the population. The “p” stands for “purified”: a significantly lower concentration of the GO term in the study group than in the population.
Important note: by default, goatools find_enrichment propagates counts to all the parents of a GO term. As a result, users may find terms in the output that are not present in their association file. Use --no_propagate_counts to disable this behavior.
Write GO hierarchy
goatools wr_hier: Given a GO ID, write the hierarchy below (default) or above (--up) the given GO term.
Plot GO lineage
goatools go_plot: -r)
goatools plot_go_term can plot the lineage of a certain GO term, by:
This command will plot the following image.
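The lineage of a GO term is the term plus every ancestor reachable through its parent links. The sketch below, which is not how plot_go_term works internally, collects those edges from a hypothetical child-to-parent map with placeholder IDs and writes them as Graphviz DOT text. Saving the text to a file such as lineage.dot and running `dot -Tpng lineage.dot -o lineage.png` would render it, assuming Graphviz is installed.

```python
# Hypothetical child -> parents map (is_a links only); IDs are placeholders.
PARENTS = {
    "GO:leaf": {"GO:mid1", "GO:mid2"},
    "GO:mid1": {"GO:root"},
    "GO:mid2": {"GO:root"},
    "GO:root": set(),
}


def lineage_edges(parents, go_id):
    """Collect every (child, parent) edge on the paths from go_id to the root."""
    edges, seen, stack = set(), set(), [go_id]
    while stack:
        child = stack.pop()
        if child in seen:
            continue
        seen.add(child)
        for parent in parents.get(child, ()):
            edges.add((child, parent))
            stack.append(parent)
    return edges


def to_dot(edges, highlight):
    """Render edges as Graphviz DOT text, filling the queried term."""
    lines = [
        "digraph lineage {",
        "  rankdir=BT;",  # children at the bottom, root at the top
        f'  "{highlight}" [style=filled, fillcolor=lightblue];',
    ]
    for child, parent in sorted(edges):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(lineage_edges(PARENTS, "GO:leaf"), "GO:leaf"))
```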
If you would rather style the graph yourself, use the --gml option to generate GML output, which can then be loaded into external graph-editing software such as Cytoscape. The following image is produced by importing the GML file into Cytoscape using yFile orthogonal layout and solid VizMapping. Note that the GML reader plugin may need to be downloaded and installed in the plugins folder of Cytoscape:
Map GO terms to GOslim terms
See goatools map_to_slim for usage. As arguments it takes the gene ontology files:
go-basic.obo
goslim_generic.obo (or any other GOslim file)
The script either maps one GO term to its GOslim terms, or maps protein products with multiple associations to all of their GOslim terms.
To determine the GOslim terms for a single GO term, you can use the following command:
To determine the GOslim terms for protein products with multiple associations:
Where the association file has the same format as used for goatools find_enrichment.
The implementation is similar to map2slim.
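The map2slim idea can be sketched in plain Python. For a GO term, every slim term among the term and its ancestors is a candidate. Walking each path from the term up to the root, the first slim term met is a direct candidate, and any slim term found further up on some path is treated as covered by a more specific one. This sketch follows that idea with a hypothetical DAG and slim set. The helper names are illustrative, and the rule used to combine several GO IDs of one gene product (drop direct terms that another term covers) is an assumption, not a description of the goatools script.

```python
# Hypothetical DAG (child -> parents, is_a only) and GOslim set; IDs are placeholders.
PARENTS = {
    "GO:x": {"GO:s1", "GO:m"},
    "GO:m": {"GO:s2"},
    "GO:s1": {"GO:s2"},
    "GO:s2": {"GO:root"},
    "GO:root": set(),
}
SLIM = {"GO:s1", "GO:s2", "GO:root"}


def paths_to_root(parents, go_id):
    """Every path from go_id up to a parentless term, listed bottom to top."""
    if not parents.get(go_id):
        return [[go_id]]
    return [[go_id] + path
            for parent in sorted(parents[go_id])
            for path in paths_to_root(parents, parent)]


def map_to_slim(parents, slim, go_id):
    """Return (direct, all) slim terms for go_id, in the spirit of map2slim."""
    all_slim, covered = set(), set()
    for path in paths_to_root(parents, go_id):
        found_lower = False
        for term in path:  # bottom -> top
            if term in slim:
                all_slim.add(term)
                if found_lower:
                    # A more specific slim term sits below this one on this path.
                    covered.add(term)
                found_lower = True
    return all_slim - covered, all_slim


def map_gene_to_slim(parents, slim, go_ids):
    """Combine several GO IDs; drop direct terms another ID covers (assumed rule)."""
    direct, every, covered = set(), set(), set()
    for go_id in go_ids:
        d, a = map_to_slim(parents, slim, go_id)
        direct |= d
        every |= a
        covered |= a - d
    return direct - covered, every


direct, every = map_to_slim(PARENTS, SLIM, "GO:x")
print(sorted(direct), sorted(every))  # ['GO:s1'] ['GO:root', 'GO:s1', 'GO:s2']
```

GO:s2 is not direct for GO:x even though it is the first slim term on the path through GO:m, because on the other path it lies above the more specific slim term GO:s1.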
Technical notes
Available statistical tests for calculating uncorrected p-values
For calculating uncorrected p-values, we use SciPy:
scipy.stats.fisher_exact
Available multiple test corrections
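For orientation, four of the classic corrections named in this section fit in a few lines of plain Python. This is a sketch, not goatools' code: bonferroni, sidak, and holm follow the textbook definitions, and benjamini_hochberg corresponds to the step-up procedure behind the statsmodels fdr_bh option. goatools' own resampling-based fdr is not reproduced here.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def sidak(pvals):
    """Sidak: 1 - (1 - p)^m for each p-value."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Holm step-down: scale the k-th smallest (0-based) by (m - k), keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """BH step-up FDR: q_(k) = min over j >= k of p_(j) * m / j (1-based ranks)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
for name, method in [("bonferroni", bonferroni), ("sidak", sidak),
                     ("holm", holm), ("fdr_bh", benjamini_hochberg)]:
    print(name, [round(q, 4) for q in method(pvals)])
```

Holm and Benjamini-Hochberg both walk the sorted p-values and carry a running maximum or minimum, which keeps the adjusted values in the same order as the raw ones.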
We have implemented several significance tests:
bonferroni, Bonferroni correction
sidak, Sidak correction
holm, Holm correction
fdr, false discovery rate (FDR) implementation using resampling
Additional methods are available if statsmodels is installed:
sm_bonferroni, Bonferroni one-step correction
sm_sidak, Sidak one-step correction
sm_holm-sidak, Holm-Sidak step-down method using Sidak adjustments
sm_holm, Holm step-down method using Bonferroni adjustments
simes-hochberg, Simes-Hochberg step-up method (independent)
hommel, Hommel closed method based on Simes tests (non-negative)
fdr_bh, FDR correction with Benjamini/Hochberg (non-negative)
fdr_by, FDR correction with Benjamini/Yekutieli (negative)
fdr_tsbh, two-stage FDR correction (non-negative)
fdr_tsbky, two-stage FDR correction (non-negative)
fdr_gbs, FDR adaptive Gavrilov-Benjamini-Sarkar
In total, 15 tests are available, which can be selected using the --method option. Please note that the default FDR (fdr) uses a resampling strategy, which may lead to slightly different q-values between runs.
iPython Notebooks
Optional attributes
definition
Run an Ontology Enrichment Analysis (GOEA)
goea_nbt3102 human phenotype ontologies
Show how many study genes are associated with RNA, translation, mitochondria, and ribosomes
goea_nbt3102_group_results
Report level and depth counts of a set of GO terms
report_depth_level
Find all human protein-coding genes associated with cell cycle
cell_cycle
Calculate annotation coverage of GO terms on various species
annotation_coverage
Determine the semantic similarities between GO terms
semantic_similarity semantic_similarity_wang
Obsolete GO terms are loaded upon request
godag_obsolete_terms
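Among the semantic similarity measures, Resnik similarity is one of the simplest: the information content of a term is IC(t) = ln(N / n_t), where n_t is the number of annotations to t (propagated to ancestors) and N is the count at the root, and the similarity of two terms is the IC of their most informative common ancestor. The sketch below uses made-up counts and a hypothetical DAG; it is not the code from the semantic_similarity notebooks, and Wang's graph-based measure from semantic_similarity_wang is not covered.

```python
import math

# Hypothetical DAG (child -> parents) and made-up annotation counts,
# already propagated so that every parent's count covers its children.
PARENTS = {
    "GO:a": {"GO:c"},
    "GO:b": {"GO:c"},
    "GO:c": {"GO:root"},
    "GO:d": {"GO:root"},
    "GO:root": set(),
}
COUNTS = {"GO:a": 10, "GO:b": 20, "GO:c": 40, "GO:d": 60, "GO:root": 100}


def with_ancestors(parents, go_id):
    """The term itself plus all of its ancestors."""
    seen, stack = set(), [go_id]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def info_content(counts, go_id, root="GO:root"):
    """IC = ln(N / n_t), i.e. -ln of the term's annotation frequency."""
    return math.log(counts[root] / counts[go_id])


def resnik(parents, counts, term1, term2):
    """IC of the most informative common ancestor of the two terms."""
    common = with_ancestors(parents, term1) & with_ancestors(parents, term2)
    return max(info_content(counts, t) for t in common)


print(round(resnik(PARENTS, COUNTS, "GO:a", "GO:b"), 4))  # 0.9163 (ln 2.5)
print(resnik(PARENTS, COUNTS, "GO:a", "GO:d"))  # 0.0
```

Terms that share only the root get similarity 0, since the root is annotated to everything and carries no information.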
Want to Help?
Prior to submitting your pull request, please add a test which verifies your code, and run:
Items that we know we need include:
Add code coverage runs
Edit tests in the makefile under the comment
Help setting up documentation. We are using Sphinx and Python docstrings to create documentation. For documentation practice, use make targets:
To remove practice documentation:
Once you are happy with the documentation do:
Star History
Copyright (C) 2010-2021, Haibao Tang et al. All rights reserved.