Bringing a new drug to market is one of the longest, most expensive endeavors in industry. A typical program takes 10–15 years from target selection to approval, and widely cited estimates put the cost above $2 billion per approved drug once the cost of failed programs is counted. Understanding the phases helps you place your work within the larger flow.
Phase 1: Target identification and validation
The earliest stage. Researchers identify a biological target — a protein, gene, or pathway — whose modulation is hypothesized to treat a disease. Common methods:
- Genetic association studies (GWAS, exome sequencing of disease cohorts)
- CRISPR screens (loss/gain-of-function)
- Multi-omics analysis of patient samples
- Mechanism-based reasoning from disease biology
Validation requires showing that target modulation produces a disease-relevant phenotype in cellular and animal models.
Phase 2: Hit identification
Once a target is validated, researchers look for molecules (small molecules, antibodies, oligonucleotides) that engage it. Approaches:
- High-throughput screening (HTS) against a library of millions of compounds
- Structure-based drug design (SBDD) using crystal structures or AlphaFold models
- Fragment-based drug discovery (FBDD) screening small fragments and growing them
- Computational virtual screening against billions of virtual molecules
- DNA-encoded libraries (DELs) for ultra-large library screening
Phase 3: Hit-to-lead
Initial hits typically bind too weakly, lack selectivity, or have poor physicochemical properties. Hit-to-lead optimization improves potency (often into the nanomolar range), selectivity, and basic ADME properties, with the goal of identifying a few promising scaffolds.
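The potency and selectivity criteria above can be sketched as a simple triage filter. This is a minimal illustration, not a standard workflow: the `Compound` class, the compound names, the assay values, and the cutoffs (100 nM potency, 30-fold selectivity) are all hypothetical. The only fixed relationship used is the definition pIC50 = -log10(IC50 in mol/L).

```python
import math
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    ic50_target_nm: float     # potency against the intended target, in nM
    ic50_offtarget_nm: float  # potency against a related off-target, in nM

    @property
    def pic50(self) -> float:
        # pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 mol/L
        return 9.0 - math.log10(self.ic50_target_nm)

    @property
    def selectivity_fold(self) -> float:
        # Higher means the compound prefers the intended target
        return self.ic50_offtarget_nm / self.ic50_target_nm


def is_lead_like(c: Compound, max_ic50_nm: float = 100.0,
                 min_selectivity: float = 30.0) -> bool:
    # Hypothetical cutoffs; real programs set these per target and assay
    return (c.ic50_target_nm <= max_ic50_nm
            and c.selectivity_fold >= min_selectivity)


# Made-up assay results for one hit and two analogs
compounds = [
    Compound("hit-A", 5_000.0, 8_000.0),
    Compound("analog-A7", 40.0, 2_400.0),
    Compound("analog-A9", 12.0, 90.0),
]

leads = [c.name for c in compounds if is_lead_like(c)]
print(leads)
```

With these made-up numbers only `analog-A7` passes: `hit-A` is too weak, and `analog-A9` is potent but only 7.5-fold selective.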
Phase 4: Lead optimization
Iterative chemistry to balance potency, selectivity, pharmacokinetics, safety, and physicochemical properties. Hundreds to thousands of analogs are synthesized and tested. Output: 1–2 candidate molecules nominated for development.
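One way to make "balancing" concrete is a multi-parameter score that combines several properties into a single number. The sketch below is an assumption, not a method described here: it uses linear desirability ramps and a geometric mean, and the property names, ranges, and values are hypothetical.

```python
import math


def desirability(value, low, high, higher_is_better=True):
    """Map a raw value onto 0..1 with a linear ramp between low and high."""
    frac = (value - low) / (high - low)
    frac = min(1.0, max(0.0, frac))
    return frac if higher_is_better else 1.0 - frac


def mpo_score(scores):
    """Geometric mean, so one very poor property drags the whole score down."""
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical measurements for a single analog
analog = {"pic50": 8.0, "selectivity_fold": 100.0, "clearance_ml_min_kg": 20.0}

scores = [
    desirability(analog["pic50"], 6.0, 9.0),
    desirability(analog["selectivity_fold"], 10.0, 100.0),
    # Lower clearance is treated as better in this sketch
    desirability(analog["clearance_ml_min_kg"], 10.0, 50.0,
                 higher_is_better=False),
]

print(round(mpo_score(scores), 3))
```

A geometric mean is used here rather than an arithmetic one so that an analog cannot compensate for a failing property with excellent values elsewhere, which matches the idea of balance rather than maximizing any single parameter.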
Phase 5: Preclinical development
Studies required before human testing:
- In vivo efficacy in disease-relevant animal models
- Pharmacokinetics across multiple species
- GLP toxicology studies in two species (typically rodent + non-rodent)
- Safety pharmacology on cardiovascular, respiratory, CNS systems
- Genotoxicity (Ames test, micronucleus, etc.)
- CMC (chemistry, manufacturing, controls) to make GMP-grade drug
This stage culminates in an IND (Investigational New Drug) filing in the US or a CTA (Clinical Trial Application) in Europe.
Phase 6: Clinical trials
| Phase | Participants | Goal |
|---|---|---|
| Phase I | 20–80 (often healthy) | Safety, dose-finding, PK |
| Phase II | 100–300 | Efficacy signal, optimal dose |
| Phase III | 1,000–3,000+ | Pivotal efficacy and safety |
| Phase IV | Variable | Post-marketing surveillance |
Phase 7: Regulatory approval
The sponsor submits a New Drug Application (NDA) or Biologics License Application (BLA) to the FDA, or a Marketing Authorisation Application (MAA) to the EMA. Review typically takes 6–12 months. Approval is granted if the benefit-risk balance is favorable for the proposed indication.
Phase 8: Post-approval
Manufacturing scale-up, commercial launch, post-marketing surveillance, lifecycle management (label extensions, new formulations, additional indications).
Attrition rates
Of every 10,000 compounds that enter discovery screening, roughly:
- ~250 enter preclinical development
- ~5 enter Phase I
- ~1 reaches the market
Among compounds that reach the clinic, the biggest losses occur in clinical Phase II (efficacy not confirmed) and Phase III (insufficient differentiation or unexpected safety findings).
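The funnel numbers above translate directly into stage-to-stage survival fractions. The short calculation below uses only the approximate counts from the list and reads the 10,000 figure as the starting pool of compounds; the stage labels are shorthand chosen for this sketch.

```python
# Approximate counts per stage, taken from the list above
funnel = [
    ("starting compounds", 10_000),
    ("preclinical development", 250),
    ("Phase I", 5),
    ("market", 1),
]


def transition_rates(stages):
    """Return (from_stage, to_stage, fraction surviving) for adjacent stages."""
    return [
        (src, dst, dst_count / src_count)
        for (src, src_count), (dst, dst_count) in zip(stages, stages[1:])
    ]


for src, dst, rate in transition_rates(funnel):
    print(f"{src} -> {dst}: {rate:.2%}")

# Fraction of the starting pool that reaches the market
overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.4%}")
```

On these round numbers, about 2.5% of starting compounds reach preclinical development, about 2% of those enter Phase I, and about 1 in 5 Phase I entrants reaches the market, for an overall rate of 1 in 10,000.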
How modern approaches are reshaping the pipeline
- AI accelerates target ID, hit identification, and property prediction
- Targets supported by human genetic evidence succeed in the clinic at higher rates
- Biomarker-driven trials reduce Phase II/III risk
- Precision medicine narrows indications and improves efficacy
- Adaptive trial designs improve efficiency
The pipeline is long, but understanding it helps you frame your work — whether you’re at the bench studying a target or designing a clinical program.



