How to Perform Gene Ontology (GO) Analysis: A Practical Guide

You’ve finished an RNA-seq differential expression analysis and you have 1,200 differentially expressed genes. Now what? Gene Ontology (GO) enrichment analysis is how you turn that list into biological insight.

What Gene Ontology actually is

The Gene Ontology project organizes gene functions into three structured vocabularies:

  • Biological Process (BP): What the gene contributes to (e.g., “cell cycle”, “DNA repair”)
  • Molecular Function (MF): What the gene does at the molecular level (e.g., “ATP binding”, “kinase activity”)
  • Cellular Component (CC): Where the gene product is located (e.g., “nucleolus”, “mitochondrial outer membrane”)

GO terms are organized as a directed acyclic graph (DAG): terms become more specific as you go deeper, and, unlike in a simple tree, a term can have more than one parent.
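One practical consequence of the graph structure is that annotations are usually propagated upward: a gene annotated to a specific term also counts toward every ancestor of that term. The sketch below illustrates this on a toy graph. The term names and edges are simplified for illustration and are not copied from the real ontology, and the helper names `ancestors` and `propagate` are hypothetical.

```python
# Toy GO-like DAG: each term maps to its direct parents (child -> parents).
# Names and edges are illustrative only, not the real ontology.
PARENTS = {
    "double-strand break repair": ["DNA repair"],
    "DNA repair": ["DNA metabolic process", "response to DNA damage"],
    "DNA metabolic process": ["biological_process"],
    "response to DNA damage": ["biological_process"],
    "biological_process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Expand each gene's direct annotations to include all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in direct_annotations.items()
    }


# "geneA" is a placeholder gene name.
expanded = propagate({"geneA": ["double-strand break repair"]})
print(sorted(expanded["geneA"]))
```

Note that "DNA repair" has two parents here, which is what makes the structure a DAG rather than a tree.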

The two main types of analysis

Over-representation analysis (ORA)

Tests whether a defined gene list (your significantly DE genes) contains more genes annotated to a given GO term than you would expect by chance, given the background. Uses a hypergeometric test or, equivalently, a one-sided Fisher’s exact test. Best for: when you have a clean list of “interesting” genes.
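For a single GO term, the test reduces to one tail probability of the hypergeometric distribution. Here is a minimal sketch using only the standard library. The function name `hypergeom_enrichment_p` and the example counts are assumptions for illustration, not values from a real dataset.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) under the hypergeometric distribution.

    k: genes in your list annotated to the term
    n: size of your gene list
    K: genes in the background annotated to the term
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 1,200 DE genes, 40 of them annotated to a term
# that covers 200 of the 14,000 background genes.
p = hypergeom_enrichment_p(k=40, n=1200, K=200, N=14000)
print(f"expected by chance: {1200 * 200 / 14000:.1f} genes, p = {p:.3g}")
```

In practice you would run this once per GO term and then correct the resulting p-values for multiple testing.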

Gene set enrichment analysis (GSEA)

Uses the entire ranked gene list (typically ranked by a signed metric such as log fold-change, a test statistic, or the sign of the fold-change × −log10 p-value) and tests whether genes in a given set tend to cluster at the top or bottom of the ranking. No significance threshold is required. Best for: capturing coordinated subtle changes in pathways without arbitrary cutoffs.
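The core of the method is a running sum that walks down the ranked list, stepping up at genes in the set and down at genes outside it. The enrichment score is the largest deviation from zero. The following is a simplified sketch of that idea, not the Broad implementation: it omits the permutation step that real GSEA uses to assess significance, and the function name, gene names, and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score in the spirit of GSEA.

    ranked_genes: genes sorted from most up- to most down-regulated.
    scores: ranking metric for each gene, in the same order.
    weight: exponent on |score| for hits; 0 gives an unweighted walk.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses

    running, best = 0.0, 0.0
    for score, is_hit in zip(scores, hits):
        if is_hit:
            running += abs(score) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder genes and scores, already sorted from high to low.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, metric, {"g1", "g2"}))  # set at the top
print(enrichment_score(genes, metric, {"g5", "g6"}))  # set at the bottom
```

A positive score means the set is concentrated near the top of the ranking, a negative score near the bottom.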

Common tools

  • g:Profiler: Web-based, supports many organisms, intuitive output
  • Enrichr: Massive library of annotation sets beyond GO
  • DAVID: Long-standing, web-based, but updates have been slow
  • clusterProfiler (R/Bioconductor): ORA + GSEA in one package
  • GSEA (Broad): The original GSEA implementation, with MSigDB pathway collections
  • WebGestalt: Combines ORA, GSEA, and network methods

Choosing the right background

The most common GO analysis mistake is using the wrong background gene list. The background should be all genes that could have been detected in your experiment, not “all human genes.” For RNA-seq, this is typically the set of genes that passed expression filtering, not the genome at large. Too broad a background typically makes p-values artificially small, so terms look more enriched than they really are.
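To get a feel for how much the background matters, the sketch below tests the same gene list and the same overlap against two different backgrounds. All counts are hypothetical and chosen only for illustration, and `enrichment_p` is an illustrative helper rather than a function from any GO tool.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Hypergeometric tail P(X >= k): k hits in a list of n genes,
    with K annotated genes in a background of N."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical experiment: 1,200 DE genes, 30 of them annotated to the term.
# Background 1: the 14,000 genes that passed expression filtering,
#               250 of which carry the annotation.
p_expressed = enrichment_p(k=30, n=1200, K=250, N=14000)

# Background 2: 20,000 genes genome-wide, 300 of which carry the annotation.
p_genome = enrichment_p(k=30, n=1200, K=300, N=20000)

print(f"expressed-gene background: p = {p_expressed:.3g}")
print(f"genome-wide background:    p = {p_genome:.3g}")
```

With these numbers the expected overlap by chance drops from about 21 genes to 18 when the background is widened, so the same 30 observed genes look more surprising than they should.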

Multiple testing correction

You’re testing thousands of GO terms. Always use FDR (Benjamini-Hochberg) correction. Look at adjusted p-values, not raw ones.
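Most tools apply this correction for you, but it helps to see what it does. Below is a minimal standard-library sketch of the Benjamini-Hochberg adjustment. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, which is why a term that looks significant on its own can fail once thousands of terms are tested together.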

Reducing redundancy

GO is hierarchical, so enriched terms often overlap. These tools help summarize them:

  • REVIGO: Clusters semantically similar terms
  • simplifyEnrichment (R): Clusters GO terms by semantic similarity (binary cut is its default method)
  • CirGO or GO-Figure!: Visualize summarized term clusters
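The tools above work from semantic similarity within the ontology. As a much cruder stand-in that shows the general idea, the sketch below collapses terms whose enriched gene sets overlap heavily, keeping the most significant term from each overlapping group. This is a gene-overlap heuristic, not what REVIGO or simplifyEnrichment do internally. The function names, threshold, and gene names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two gene sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_redundant_terms(results, max_overlap=0.5):
    """Greedily keep terms, most significant first, skipping any term
    whose genes overlap too much with an already kept term.

    results: list of (term, adjusted_p, genes) tuples.
    """
    kept = []
    for term, adj_p, genes in sorted(results, key=lambda r: r[1]):
        if all(jaccard(genes, k_genes) <= max_overlap for _, _, k_genes in kept):
            kept.append((term, adj_p, genes))
    return kept


# Placeholder enrichment results with made-up gene names.
results = [
    ("mitotic cell cycle", 0.002, {"A", "B", "C"}),
    ("cell cycle", 0.001, {"A", "B", "C", "D"}),
    ("DNA repair", 0.010, {"X", "Y"}),
]
for term, adj_p, _ in drop_redundant_terms(results):
    print(term, adj_p)
```

Here "mitotic cell cycle" is dropped because three of its genes are shared with the more significant "cell cycle" term.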

Beyond GO

  • KEGG: Curated pathways, including metabolism and signaling
  • Reactome: Detailed, curated pathways with reactions
  • MSigDB Hallmark: 50 gene sets representing well-defined biological states and processes
  • WikiPathways: Community-curated

Common pitfalls

  • Using all human genes as background when your assay only detects about 14,000 of them
  • Reporting raw p-values instead of FDR-adjusted
  • Listing 100 redundant terms instead of summarizing
  • Inferring causality from enrichment — enrichment is correlation, not mechanism
  • Ignoring direction — separately analyze up- and down-regulated genes if they differ biologically
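For the last point, splitting the DE results by direction before running enrichment is a small step. Here is a minimal sketch, assuming your results are available as (gene, log2 fold-change, adjusted p-value) tuples. The function name, the 0.05 cutoff, and the example rows are illustrative assumptions.

```python
def split_by_direction(de_results, alpha=0.05):
    """Split significant DE genes into up- and down-regulated lists.

    de_results: iterable of (gene, log2_fold_change, adjusted_p) tuples.
    """
    up, down = [], []
    for gene, log2fc, adj_p in de_results:
        if adj_p >= alpha:
            continue  # not significant, leave out of both lists
        if log2fc > 0:
            up.append(gene)
        elif log2fc < 0:
            down.append(gene)
    return up, down


# Made-up DE results with placeholder gene names.
de_results = [
    ("geneA", 2.1, 0.001),
    ("geneB", -1.7, 0.003),
    ("geneC", 0.4, 0.400),
    ("geneD", -2.5, 0.020),
]
up_genes, down_genes = split_by_direction(de_results)
print("up:", up_genes)
print("down:", down_genes)
```

Each list can then be tested separately against the same background.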

GO analysis is fast, free, and informative — but only if done with the right background, multiple-testing correction, and redundancy reduction. Treat enrichment as a hypothesis-generating tool, not a definitive answer.
