Data Sources & Methodology

How we collect and process pharmaceutical data

Data updated: Jul 17, 2026

Our Approach

TheraRadar aggregates data from official government sources, supplemented by the Open Targets Platform for genetic evidence. We do not scrape commercial databases or estimate data. Every data point can be traced back to FDA, ClinicalTrials.gov, SEC, CMS, or Open Targets.

Primary Data Sources

Data Type	Source	Update Frequency	Coverage
FDA Drug Approvals	FDA Drugs@FDA	Weekly	8,621 drugs
Drug Labels	FDA DailyMed	Monthly	Indications, MOA, warnings
Patents & Exclusivity	FDA Orange Book	Monthly	Small molecule patents
Biologics	FDA Purple Book	Monthly	Biosimilars, reference products
Clinical Trials	ClinicalTrials.gov	Weekly	~147,000 indexed trials
SEC Filings	SEC EDGAR	Weekly	8-K, 10-K, 10-Q filings
PDUFA Dates	SEC 8-K, 10-K, and 10-Q Filings (extracted)	Weekly	FDA target action dates
Medicare Spending	CMS Part D Data	Annual	Drug spending 2013-2022
Drug Targets	FDA DailyMed Labels	Monthly	866 targets from drug labels
Genetic Evidence	Open Targets	Quarterly	GWAS associations, disease links

Drug Data Processing

Source: We parse the FDA Drugs@FDA bulk data files, which contain all FDA-approved drug applications (NDAs, ANDAs, BLAs) since 1939.

Enrichment: Drug labels from DailyMed are parsed to extract indications, mechanism of action, targets, warnings, and adverse reactions.

Normalization: Brand names and generic names are normalized. Company names are standardized across acquisitions and name changes.

Clinical Trials Index

Source: We maintain a local index of ~147,000 clinical trials from ClinicalTrials.gov for dashboard analytics, covering interventional Phase 1-4 trials started from 2008 onwards.

Live queries: The Trials Explorer queries ClinicalTrials.gov in real time and searches all 440,000+ trials in the full CT.gov database, not just our indexed subset.

Filtering: Our index focuses on interventional drug trials in Phase 1-4 (excluding device, behavioral, observational, and non-phased studies). The Trials Explorer live search has no such filter and returns all matching studies.
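
Illustrative sketch: The index filter described above could be expressed roughly as follows. The Trial record, its field names, and the enum-style string values are hypothetical stand-ins for whatever the real pipeline uses. Leaving early Phase 1 out of "Phase 1-4" is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: phases are stored as upper-case labels; early Phase 1 is left out.
INDEXED_PHASES = {"PHASE1", "PHASE2", "PHASE3", "PHASE4"}


@dataclass
class Trial:
    """Hypothetical, simplified trial record (not the real CT.gov schema)."""
    nct_id: str
    study_type: str                 # e.g. "INTERVENTIONAL" or "OBSERVATIONAL"
    intervention_types: List[str]   # e.g. ["DRUG"], ["DEVICE"], ["BEHAVIORAL"]
    phases: List[str]               # e.g. ["PHASE1", "PHASE2"], or ["NA"]
    start_date: Optional[date]


def in_index(trial: Trial, earliest_start: date = date(2008, 1, 1)) -> bool:
    """Return True if the trial would pass the index filter sketched here."""
    if trial.study_type != "INTERVENTIONAL":
        return False  # drops observational studies
    if "DRUG" not in trial.intervention_types:
        return False  # drops device-only and behavioral-only studies
    if not INDEXED_PHASES.intersection(trial.phases):
        return False  # drops non-phased studies
    # Trials with no recorded start date are excluded in this sketch.
    return trial.start_date is not None and trial.start_date >= earliest_start
```

A Phase 1/2 trial that lists both phases passes the filter because only one matching phase is required.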

Indication Taxonomy

Manual curation: We maintain a hand-curated taxonomy that maps FDA indication text and clinical trial conditions to standardized therapeutic areas.

Coverage: See our taxonomy dashboard for current mapping coverage and completeness metrics.

Therapeutic areas: Oncology, CNS, Cardiovascular, Metabolic, Infectious Disease, Immunology, Respiratory, Rare Disease, and more.
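
Illustrative sketch: A hand-curated mapping of this kind can be pictured as a lookup table plus a coverage ratio. The two entries below are examples chosen for illustration and are not taken from the actual taxonomy. The function names are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical excerpt of a curated table: normalized condition text -> therapeutic area.
TAXONOMY = {
    "type 2 diabetes mellitus": "Metabolic",
    "non-small cell lung cancer": "Oncology",
}


def therapeutic_area(condition_text: str) -> Optional[str]:
    """Look up a therapeutic area; None means the text is not mapped yet."""
    key = " ".join(condition_text.lower().split())  # lower-case, collapse whitespace
    return TAXONOMY.get(key)


def mapping_coverage(condition_texts: Iterable[str]) -> float:
    """Fraction of the given condition strings that resolve to an area."""
    texts = list(condition_texts)
    if not texts:
        return 0.0
    mapped = sum(1 for text in texts if therapeutic_area(text) is not None)
    return mapped / len(texts)
```

Returning None for unmapped text, rather than guessing, keeps gaps visible so they can be curated by hand.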

PDUFA Date Extraction

Method: We use SEC EDGAR full-text search to find PDUFA-related keywords in 8-K, 10-K, and 10-Q filings.

Keywords: "PDUFA", "Prescription Drug User Fee Act", "FDA target action date", "FDA goal date"
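
Illustrative sketch: Once a filing's text is in hand, the keywords above can be matched locally with a regular expression. This is not the EDGAR search itself. The function name and the context-window size are assumptions made for the example.

```python
import re
from typing import List

PDUFA_KEYWORDS = [
    "PDUFA",
    "Prescription Drug User Fee Act",
    "FDA target action date",
    "FDA goal date",
]

# Escape each word and allow any whitespace between words, so phrases that
# wrap across lines in a filing still match. Matching is case-insensitive.
PDUFA_PATTERN = re.compile(
    "|".join(
        r"\s+".join(re.escape(word) for word in keyword.split())
        for keyword in PDUFA_KEYWORDS
    ),
    re.IGNORECASE,
)


def find_pdufa_mentions(text: str, context: int = 80) -> List[str]:
    """Return a whitespace-normalized snippet around each keyword hit."""
    snippets = []
    for match in PDUFA_PATTERN.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        snippets.append(" ".join(text[start:end].split()))
    return snippets
```

The snippets only locate candidate passages. Reading the actual date out of a passage is a separate step.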

Limitations: Not all companies disclose PDUFA dates. We supplement with manual entries for known upcoming decisions.

Patent & Exclusivity Data

Orange Book: Contains patents and exclusivity for small molecule drugs (NDAs). Updated monthly by FDA.

Purple Book: Contains biosimilar interchangeability and reference product designations for biologics (BLAs).

Limitation: The Orange Book covers drugs approved under NDAs, so most biologics do not appear in it. Patent cliff analysis focuses on drugs with Medicare Part D spending data.

Drug Targets & Mechanism of Action

Source: Drug targets are extracted from FDA-approved drug labels via DailyMed using AI-assisted extraction. The "Mechanism of Action" and "Clinical Pharmacology" sections contain target information.

What we extract: Gene symbols (like GLP1R, EGFR, TNF) identified from label text using LLM analysis. We normalize these to standard HGNC gene symbols where possible.
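
Illustrative sketch: Normalizing extracted symbols "where possible" can be pictured as a lookup against approved symbols and known aliases, returning nothing when a symbol cannot be resolved. The tiny tables below are hypothetical excerpts. A real pipeline would presumably load the full HGNC symbol and alias lists.

```python
from typing import Optional

# Hypothetical excerpt of approved HGNC symbols.
APPROVED_SYMBOLS = {"GLP1R", "EGFR", "TNF", "ERBB2"}

# Hypothetical excerpt of aliases seen in label text -> approved symbol.
ALIASES = {
    "HER2": "ERBB2",
    "GLP-1R": "GLP1R",
    "TNF-ALPHA": "TNF",
}


def normalize_symbol(raw: str) -> Optional[str]:
    """Map an extracted symbol to an approved one, or None if unresolved."""
    # Simplification: upper-casing would mangle the few approved symbols
    # that contain lower-case letters.
    key = raw.strip().upper()
    if key in APPROVED_SYMBOLS:
        return key
    return ALIASES.get(key)
```

Unresolved symbols come back as None so they can be reviewed instead of being stored under a guessed name.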

Coverage: 866 unique drug targets from ~1,700 drugs with target annotations. Not all FDA-approved drugs have explicit target information in their labels.

Limitation: Target extraction depends on label text quality. Older drugs and some generics may lack detailed mechanism information. Investigational drugs (not yet approved) are not included.

Genetic Evidence

Source: Open Targets Platform integrates genetic associations from GWAS studies, rare disease genetics, and functional genomics.

What we show: For each drug target, we display diseases with genetic evidence linking variants in that gene to disease risk (e.g., GLP1R variants associated with Type 2 diabetes).

Why it matters: Drugs targeting genes with strong genetic evidence have historically shown higher clinical trial success rates. Genetic validation de-risks target selection.

Scoring: Open Targets provides association scores (0-1) based on evidence strength, study size, and statistical significance.
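
Illustrative sketch: With scores on a 0-1 scale, a display layer might keep only associations above a cutoff and rank them by score. The Association record, the function name, and the 0.5 default cutoff are assumptions for this example, not values used by Open Targets or by this site.

```python
from typing import Iterable, List, NamedTuple


class Association(NamedTuple):
    """Hypothetical record for one target-disease association."""
    target: str    # gene symbol, e.g. "GLP1R"
    disease: str   # disease name
    score: float   # association score on a 0-1 scale


def strong_associations(
    associations: Iterable[Association], min_score: float = 0.5
) -> List[Association]:
    """Keep associations scoring at least min_score, highest score first."""
    rows = list(associations)
    for row in rows:
        if not 0.0 <= row.score <= 1.0:
            raise ValueError(f"score out of range for {row.target}/{row.disease}")
    kept = [row for row in rows if row.score >= min_score]
    return sorted(kept, key=lambda row: row.score, reverse=True)
```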

AI in the Workflow

Where AI helps: Several data pipelines use AI-assisted extraction. Drug targets are extracted from DailyMed labels using LLM analysis. PDUFA dates are extracted from SEC EDGAR full-text using LLM-aided keyword analysis. Brief research involves AI-assisted aggregation of primary sources — FDA approval letters, SEC filings, peer-reviewed publications, press releases — and AI-assisted draft generation.

Where it doesn't go: Editorial judgment, source verification, and the final voice on every published Brief are mine. Every specific number, date, trial result, or regulatory decision cited in a Brief is verified against the primary source before publishing. Cross-trial comparisons, head-to-head positioning, and competitive claims are written and stress-tested by hand.

Why this is transparent: AI accelerates the work — research aggregation, data pipeline maintenance, draft scaffolding — but the editorial product is mine. If you find an error, it's mine to fix. Let me know.

Known Limitations

• Drug names: ClinicalTrials.gov uses base names (e.g., "Fludarabine") while FDA uses salt forms (e.g., "FLUDARABINE PHOSPHATE"). We maintain manual mappings but some may be missing.
• Company attribution: FDA lists the original sponsor, not necessarily the current manufacturer. We attempt to track acquisitions but may miss some.
• Indication mapping: Our taxonomy covers major indications but some rare conditions may not be mapped. See coverage metrics on the taxonomy dashboard.
• PDUFA completeness: We capture dates disclosed in SEC filings. Private companies generally do not file with the SEC, and public companies do not always disclose their dates.
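
Illustrative sketch: The drug-name mismatch in the first limitation above can be handled with a manual mapping, as described. The suffix-stripping fallback shown here is an added heuristic for illustration, not the site's stated method. The mapping entry comes from the example above. The suffix list and function names are hypothetical.

```python
# Manual mapping: FDA salt-form name -> base name used by ClinicalTrials.gov.
MANUAL_MAP = {"FLUDARABINE PHOSPHATE": "FLUDARABINE"}

# Hypothetical, incomplete list of salt words to strip as a fallback.
SALT_SUFFIXES = ("HYDROCHLORIDE", "PHOSPHATE", "SODIUM", "SULFATE", "MESYLATE")


def base_name(name: str) -> str:
    """Reduce a drug name to an upper-case base name for matching."""
    cleaned = " ".join(name.upper().split())
    if cleaned in MANUAL_MAP:
        return MANUAL_MAP[cleaned]
    # Heuristic fallback: drop one trailing salt word. This can be wrong when
    # the salt form is clinically distinct, so curated entries take priority.
    for suffix in SALT_SUFFIXES:
        if cleaned.endswith(" " + suffix):
            return cleaned[: -len(suffix) - 1]
    return cleaned


def same_drug(ctgov_name: str, fda_name: str) -> bool:
    """Compare a ClinicalTrials.gov name with an FDA name by base name."""
    return base_name(ctgov_name) == base_name(fda_name)
```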

Found an error?

If you notice incorrect data, please report it and we'll investigate. Include the drug/company name and what you believe is incorrect.