Methodology — The Lift

1 · Ingestion

How sources enter the system

Each source has its own connector. Connectors are independent — one source breaking, changing its API, or going offline never blocks the rest. Connectors run on a per-source schedule and write to a per-source staging area before any normalization happens.
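A minimal Python sketch of the isolation property, assuming each connector is a plain zero-argument callable that returns raw records. `run_connectors`, `StagingArea`, and every source id except `who_gho` are hypothetical, and per-source scheduling is left out. A failing connector gets an error entry and leaves whatever its source had already staged untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical connector shape: a zero-argument callable returning raw records for one source.
Connector = Callable[[], List[dict]]


@dataclass
class StagingArea:
    """Per-source staging: raw records land here, keyed by source, before normalization."""
    records: Dict[str, List[dict]] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


def run_connectors(connectors: Dict[str, Connector], staging: StagingArea) -> None:
    """Run every connector in isolation so one failure never blocks the rest."""
    for source_id, fetch in connectors.items():
        try:
            # Only overwrite a source's staging slot when its fetch succeeds.
            staging.records[source_id] = fetch()
            staging.errors.pop(source_id, None)
        except Exception as exc:  # broken API, schema change, source offline
            staging.errors[source_id] = f"{type(exc).__name__}: {exc}"


def offline_source() -> List[dict]:
    raise ConnectionError("source offline")


staging = StagingArea()
run_connectors(
    {
        "who_gho": lambda: [{"source_record_id": "VAC_DTP3", "value": 84}],
        "example_offline": offline_source,
        "example_empty": lambda: [],
    },
    staging,
)
print(sorted(staging.records))  # ['example_empty', 'who_gho']
print(staging.errors)           # {'example_offline': 'ConnectionError: source offline'}
```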

Three ingestion patterns

Pattern	When	Sources
REST API pull	Source publishes a structured API. Highest fidelity, lowest maintenance.	World Bank · UN SDG · WHO GHO · ITU · IUCN · UN IGME
CSV download	Source publishes annual or quarterly bulk datasets. Stable schema, predictable cadence.	OWID · UNESCO UIS · IHME GBD · Gapminder · IEA
Scrape + LLM extract	Source publishes narrative reports or PDFs. Structured scrape pulls report URLs and metadata; Claude extraction pulls (where, when, actor, metric, magnitude) into the normalized schema.	IPCC · GAVI impact reports · UN Special Rapporteur findings · regulator approval notices
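One way to tie the scrape-and-extract pattern back to its source, sketched in Python under stated assumptions: an extracted tuple is accepted only if all five fields are filled and it carries a verbatim supporting quote that actually occurs in the report text. The `ExtractedGain` type, the quote requirement, and the example report are hypothetical illustrations, not the project's extraction prompt or schema. The verbatim check is deliberately strict; whitespace or punctuation drift in a quote will cause a rejection.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

REQUIRED = ("where", "when", "actor", "metric", "magnitude")


@dataclass(frozen=True)
class ExtractedGain:
    """One (where, when, actor, metric, magnitude) tuple pulled from a report."""
    where: Optional[str]
    when: Optional[str]
    actor: Optional[str]
    metric: Optional[str]
    magnitude: Optional[float]
    source_url: str
    quote: Optional[str] = None  # verbatim span from the report that supports the tuple


def missing_fields(e: ExtractedGain) -> List[str]:
    """Names of required fields the extraction left empty (a magnitude of 0 counts as filled)."""
    return [name for name in REQUIRED if getattr(e, name) in (None, "")]


def accept(e: ExtractedGain, source_text: str) -> bool:
    """Accept only complete tuples whose supporting quote occurs verbatim in the source text."""
    return not missing_fields(e) and bool(e.quote) and e.quote in source_text


report = "In 2023 the Country X health ministry reported measles vaccine coverage of 91%."
e = ExtractedGain("Country X", "2023", "Country X health ministry",
                  "measles_vaccine_coverage", 91.0, "https://example.org/report",
                  quote="measles vaccine coverage of 91%")
print(accept(e, report))                           # True
print(missing_fields(replace(e, magnitude=None)))  # ['magnitude']
```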

Reviewer in the loop

For LLM-extracted gain records, the first several hundred extractions per source go through human review against the original source text. Errors found in review are used to recalibrate the extraction prompt. After calibration, review continues on a sample of extractions at a configurable rate.

Optimistic over-extraction is the failure mode to watch for. A record of gains is under even stronger temptation to overclaim than a record of harms, so the human review loop is especially load-bearing here.
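The review policy can be sketched in Python as a per-extraction decision: everything inside the calibration window is reviewed, and afterwards each extraction is sampled independently. The function name, the 300-extraction window (standing in for "the first several hundred"), and the 5% rate are hypothetical choices for illustration. Seeding the random generator makes a sampling run replayable, which helps when a reviewer wants to reconstruct why a given record was or wasn't checked.

```python
import random


def needs_human_review(extraction_index: int, calibration_count: int,
                       sample_rate: float, rng: random.Random) -> bool:
    """Decide whether one source's n-th extraction (0-based) goes to a reviewer.

    Everything in the calibration window is reviewed; afterwards each extraction
    is sampled independently with probability `sample_rate`.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if extraction_index < calibration_count:
        return True
    return rng.random() < sample_rate


rng = random.Random(42)  # seeded so a sampling run can be replayed
CALIBRATION = 300        # hypothetical stand-in for "the first several hundred"
flags = [needs_human_review(i, CALIBRATION, 0.05, rng) for i in range(2000)]
print(all(flags[:CALIBRATION]))  # True
print(sum(flags[CALIBRATION:]))  # expected value 85 (1700 x 0.05); exact count depends on the seed
```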

2 · Normalization

The shared gain-record schema

Every source-record, regardless of origin, is mapped to the schema below before entering the gain store. The original record is preserved alongside the normalized form — no data is dropped during translation.

One source-record:

{
  "id": "who_gho-vac.dtp3.2023.global",
  "source_id": "who_gho",
  "source_record_id": "VAC_DTP3",
  "source_url": "https://ghoapi.azureedge.net/api/VAC_DTP3",
  "ingested_at": "2026-05-14T06:00:00Z",
  "country_iso": "GLB",
  "country_name": "Global",
  "actor_raw": "WHO/UNICEF Estimates of National Immunization Coverage",
  "actor_normalized": "who.headquarters",
  "actor_relationship_to_state": "multilateral",
  "event_type": "vaccine_coverage_gain",
  "date": "2023",
  "magnitude": { "value": 84, "unit": "percent" },
  "direction": "up_is_good",
  "summary": "DTP3 coverage of one-year-olds: 84% globally",
  "source_caveats": "WHO point estimates; IHME GBD modeled estimates differ slightly",
  "gain_id": "gain-2023-glb-dtp3-coverage",
  "confidence": 0.92
}

The full schema definition with every enum value lives in schema.json. It is versioned: any breaking change to the shape requires a major-version bump and a migration script for existing records.
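A minimal Python sketch of how major-version migrations might chain, assuming each stored record carries a `schema_version` string (that field is hypothetical; it does not appear in the example record) and that one migration script exists per major-version step. The `date` to `event_date` rename is an invented breaking change used only to show the mechanics. Migrations build new dicts, so the stored original is never mutated, in keeping with the rule that no data is dropped.

```python
from typing import Callable, Dict

Record = Dict[str, object]
Migration = Callable[[Record], Record]

# Hypothetical registry: MIGRATIONS[n] upgrades a record from major version n to n + 1.
MIGRATIONS: Dict[int, Migration] = {}


def migration(from_major: int) -> Callable[[Migration], Migration]:
    """Decorator that registers a migration script for one major-version step."""
    def register(fn: Migration) -> Migration:
        MIGRATIONS[from_major] = fn
        return fn
    return register


@migration(from_major=1)
def v1_to_v2(record: Record) -> Record:
    # Hypothetical breaking change for illustration: "date" renamed to "event_date".
    out = {k: v for k, v in record.items() if k != "date"}
    out["event_date"] = record["date"]
    out["schema_version"] = "2.0.0"
    return out


def upgrade(record: Record, target_major: int) -> Record:
    """Apply migrations one major step at a time; the input dict is never mutated."""
    current = int(str(record.get("schema_version", "1.0.0")).split(".")[0])
    while current < target_major:
        if current not in MIGRATIONS:
            raise LookupError(f"no migration script from major version {current}")
        record = MIGRATIONS[current](record)
        current += 1
    return record
```

A missing step raises instead of silently skipping, which is one way to enforce "no breaking change without a migration script".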

Entity resolution

Actor names such as "World Health Organization", "WHO Headquarters", "WHO Strategic Advisory Group of Experts", and "WHO IVB" need to resolve to a canonical actor identity so that contributions and patterns are visible across sources. Instrument and metric names across reports need the same resolution.

Approach: embedding-based candidate matching against a canonical actor table, with a Claude verification pass on candidates that the embedding alone can't separate confidently. The actor table classifies each actor by type (state agency, multilateral, civil society, private sector, researcher), and that classification carries its own provenance chain.
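The three-way split can be sketched in Python with toy vectors standing in for embedding-model output. The thresholds (`accept_at`, `margin`, `verify_above`) and the `example.unicef` id are hypothetical; only `who.headquarters` comes from the example record. A confident match needs both a high score and a clear lead over the runner-up, so two near-identical candidates go to verification rather than being auto-merged.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_actor(query: List[float], canonical: Dict[str, List[float]],
                  accept_at: float = 0.92, margin: float = 0.05,
                  verify_above: float = 0.60) -> Tuple[str, Optional[str]]:
    """Classify one raw actor name's embedding against the canonical actor table.

    ("matched", id): confident embedding match, no verification needed.
    ("verify", id):  plausible but ambiguous; send to the verification pass.
    ("new", None):   nothing close; candidate for a new canonical actor.
    """
    scored = sorted(((cosine(query, vec), actor_id) for actor_id, vec in canonical.items()),
                    reverse=True)
    if not scored:
        return ("new", None)
    best_score, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score >= accept_at and best_score - runner_up >= margin:
        return ("matched", best_id)
    if best_score >= verify_above:
        return ("verify", best_id)
    return ("new", None)


table = {"who.headquarters": [1.0, 0.0, 0.0], "example.unicef": [0.0, 1.0, 0.0]}
print(resolve_actor([0.99, 0.05, 0.0], table))  # ('matched', 'who.headquarters')
print(resolve_actor([0.0, 0.0, 1.0], table))    # ('new', None)
```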

3 · Gain resolution

Linking source-records into gain records

When OWID, the World Bank, and the UN IGME all report on the same child-mortality reduction, The Lift creates one canonical gain with three linked source-records — not three duplicate entries, and not one record that hides the others.

Candidate matching by metric + actor + date + summary embedding, within a configurable window.
Claude verification pass on candidates: "Are these source-records describing the same gain? Yes / No / Probably / Related but distinct."
Canonical gain generated: best-supported date, scope, magnitude range across sources, full source-record list.
If magnitude figures across confirming sources differ by >50%, tier flips to disputed and all source figures surface in the public record.
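The canonical-gain step can be sketched in Python as a function that keeps every source figure, reports a range rather than a mean, and flags a dispute. "Differ by >50%" is read here as (max - min) / min > 0.5; the text doesn't pin down the denominator, so that reading is an assumption, as are the `SourceFigure` type, the source ids, and the example values. The sketch also assumes positive figures already converted to one unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceFigure:
    source_id: str
    value: float
    unit: str


def canonical_magnitude(figures: List[SourceFigure], dispute_ratio: float = 0.5) -> dict:
    """Summarize confirming figures as a range plus per-source values, never a mean.

    Reads "differ by >50%" as (max - min) / min > dispute_ratio; that reading is an assumption.
    """
    if not figures:
        raise ValueError("need at least one source figure")
    units = {f.unit for f in figures}
    if len(units) != 1:
        raise ValueError(f"convert to one unit first: {sorted(units)}")
    values = [f.value for f in figures]
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("this ratio test assumes positive magnitudes")
    return {
        "magnitude_range": (lo, hi),
        "unit": next(iter(units)),
        "source_figures": {f.source_id: f.value for f in figures},  # every figure stays visible
        "tier_override": "disputed" if (hi - lo) / lo > dispute_ratio else None,
    }


agree = canonical_magnitude([SourceFigure("source_a", 4.9, "percent"),
                             SourceFigure("source_b", 5.0, "percent"),
                             SourceFigure("source_c", 5.1, "percent")])
split = canonical_magnitude([SourceFigure("source_a", 2.0, "percent"),
                             SourceFigure("source_b", 3.5, "percent")])
print(agree["tier_override"], split["tier_override"])  # None disputed
```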

Why preservation matters

Other aggregators often merge source-records and lose the provenance — a magnitude figure shows up with no clear path back to which source said what. The Lift keeps every source-record addressable: a gain record points to its sources, each source has its own URL and methodology disclosure, and a reader can always reconstruct the chain.

4 · Verification tiering

How records are classified

Every published record carries a tier. The tier is computed, not editorial — it follows from the source count, source agreement, and time window of corroboration. The adjudicated tier includes any decision by a regulator or status authority of record (FDA, EMA, IUCN status committees, treaty depositories, peace-accord signings).

Multi-source

Two or more independent sources, magnitude figures agreeing within ±20% (or all sources reporting in non-numeric terms with compatible descriptions), all within a corroboration window.

Adjudicated

Confirmed by a regulator or status authority of record: FDA/EMA approval, IUCN Red List status change, UN Treaty Collection deposit, signed peace accord, formal disease-elimination certification. The strongest tier: a durable, time-stamped fact.

Disputed

Multiple sources cover the same gain but materially disagree on magnitude or significance. Tier is preserved as disputed; the disagreement itself is surfaced in the public record.

Single-source

Reported by one source; not yet corroborated. Surfaced anyway, with explicit single-source tag. This is information, not noise — suppressing single-source reports means losing real signal from places and topics where coverage is thin.
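Because the tier is computed rather than editorial, it can be sketched as a pure function. This Python version assumes the caller passes one numeric figure per independent source, already filtered to the corroboration window; that adjudication outranks every other tier; and that "agreeing within ±20%" means every figure lies within 20% of the median. The non-numeric "compatible descriptions" case is not handled. Treating everything outside the ±20% band as disputed is also an assumption. With two sources, this reading lines up with the >50% dispute threshold used in gain resolution: 80 and 120 each sit 20% from their median of 100 and differ by exactly 50%.

```python
from statistics import median
from typing import List


def compute_tier(figures: List[float], adjudicated: bool = False,
                 agree_within: float = 0.20) -> str:
    """Compute a tier from numeric magnitude figures, one per independent source,
    already filtered to those inside the corroboration window.

    Assumptions: adjudication outranks everything else; agreement means every figure
    lies within +/- agree_within of the median; non-numeric descriptions aren't handled.
    """
    if adjudicated:
        return "adjudicated"
    if not figures:
        raise ValueError("no source figures and no adjudication")
    if len(figures) < 2:
        return "single_source"
    m = median(figures)
    if all(abs(v - m) <= agree_within * abs(m) for v in figures):
        return "multi_source"
    return "disputed"


print(compute_tier([100.0, 110.0, 95.0]))  # multi_source
print(compute_tier([100.0, 200.0]))        # disputed
```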

Why we don't average

Averaging diverging magnitude estimates produces a number that has no methodological backing — it's not what any source published, and it implies a precision the underlying data doesn't support.

So when sources disagree, the record carries the range and the methodology behind each estimate. The honest finding field in the public record synthesizes what can be said responsibly. Readers see what each source published; they don't get a fabricated consensus.

5 · Costs alongside gains

Every gain has a costs field

Empty is allowed; missing is not. This is the discipline that distinguishes a gains record from progress-movement marketing. If a gain has documented environmental, equity, or distributional costs, those costs sit on the same record — not on a separate page, and not in a footnote.

Haber-Bosch: feeds roughly half of humanity; eutrophication, nitrogen runoff, and energy-intensive manufacturing are co-listed costs.
Green Revolution: famine reduction across South Asia; aquifer depletion, monoculture pesticide exposure, and smallholder displacement are co-listed.
Smallpox eradication: ~5M deaths/year averted continuously since 1980; no significant ongoing cost identified — empty is correct.
Renewables expansion: displaces fossil generation; mining footprints (lithium, cobalt, rare earths) and end-of-life recycling are co-listed.
GLP-1 agonists: cardiovascular and renal benefit; cost-access disparities, muscle loss, and supply-driven scarcity for people with type 2 diabetes are co-listed.

The discipline cuts both ways. A "gain" with so much offsetting cost that the net is unclear should not be tiered as a gain at all — it should appear with the disagreement framing of the disputed tier.
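The "empty is allowed; missing is not" rule amounts to distinguishing an explicit empty list from an absent or null field. A minimal Python validator, assuming the field is named `costs` and each cost is an object with a `description` (both hypothetical, since the example source-record doesn't show the field):

```python
from typing import List

_MISSING = object()  # sentinel so an absent key differs from an explicit empty list


def check_costs(record: dict) -> List[str]:
    """Validate the costs field: [] (no cost identified) passes; absent or null fails."""
    costs = record.get("costs", _MISSING)
    if costs is _MISSING or costs is None:
        return ["costs field missing: use [] to state that no cost was identified"]
    if not isinstance(costs, list):
        return ["costs must be a list"]
    return [f"costs[{i}] needs a description"
            for i, cost in enumerate(costs)
            if not isinstance(cost, dict) or not cost.get("description")]


smallpox = {"gain_id": "example-smallpox", "costs": []}
haber_bosch = {"gain_id": "example-haber-bosch", "costs": [{"description": "nitrogen runoff"}]}
print(check_costs(smallpox), check_costs(haber_bosch))  # [] []
print(check_costs({"gain_id": "example-no-field"}))     # one error: the field is absent
```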

6 · Magnitude scoring

How a gain's reach is rated

A gain's magnitude field is what drives its prominence on the page — dot size on the timeline, ordering in the events table, weight in the country composite. The scoring formula is published, contested in the open, and revised in the corrections log when assumptions change.

Magnitude formula (v1)

magnitude = log10(people_reached) × durability_factor × evidence_factor

People reached: the count of people whose lives are materially affected by the gain. For disease eradications, this is prevalence × population × years of effect, which makes the eradication figure a person-years count rather than a head count.
Durability factor: 1.0 for irreversible (eradication, ratification deposit, regulator approval), 0.7 for institutional commitment, 0.4 for a single-year measurement.
Evidence factor: 1.0 for multi-source agreement, 0.8 for single-source, 0.6 for disputed.

The formula is intentionally coarse. It exists to keep visualizations honest, not to produce a final ranking. Readers should see the magnitude as approximate.
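The v1 formula transcribes directly into Python. The factor values come from the definitions above; the dictionary key names and the example reach of one million people are hypothetical. The v1 list gives no evidence factor for the adjudicated tier, so this sketch raises instead of guessing one. Because of the log10, a tenfold increase in reach adds one point before the factors apply.

```python
import math

# Factor tables transcribed from the v1 definitions; the key names are hypothetical.
DURABILITY = {"irreversible": 1.0, "institutional_commitment": 0.7, "single_year": 0.4}
EVIDENCE = {"multi_source": 1.0, "single_source": 0.8, "disputed": 0.6}


def magnitude_v1(people_reached: float, durability: str, evidence: str) -> float:
    """magnitude = log10(people_reached) x durability_factor x evidence_factor."""
    if people_reached < 1:
        raise ValueError("people_reached must be >= 1 (log10 goes negative below 1)")
    if evidence not in EVIDENCE:
        # v1 gives no evidence factor for the adjudicated tier, so none is guessed here.
        raise KeyError(f"no v1 evidence factor for {evidence!r}")
    return math.log10(people_reached) * DURABILITY[durability] * EVIDENCE[evidence]


print(magnitude_v1(1_000_000, "irreversible", "multi_source"))              # 6.0
print(magnitude_v1(1_000_000, "institutional_commitment", "single_source"))  # ~3.36
```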

7 · Caveat disclosure

Every source has a known shape

Source caveats are metadata, not flaws. Every record carries the caveat disclosure of its source. The reader can choose to weight a national-government claim about its own development gains differently than an independent NGO measurement of the same indicator — and the system surfaces the information needed to make that judgment.

National governments on their own gains: reported and tagged as such; corroborated against independent sources before the tier flips to multi-source.
Private-sector announcements (drug approvals, renewable capacity): recorded only when there is a regulator filing or independent verification.
UN/multilateral aggregations: generally high-fidelity; the underlying country reports inherit each country's caveats.
NGO impact reports: generally high-fidelity; mission framing disclosed (e.g., GAVI's program scope, WHO program priorities).
Long historical reconstructions (Gapminder, OWID): useful for narrative scope; uncertainty bands disclosed; not used as primary source for any specific gain entry.
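Two of these rules translate into checks, sketched here in Python under assumptions: private-sector announcements are admitted only with a regulator filing or independent verification, and a government's claim about its own gain can reach multi-source only alongside at least one source that is not a government self-report. The `SourceRecord` fields, the source-type labels, and the tag name are hypothetical; broader source-independence checks are not modeled.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SourceRecord:
    source_id: str
    source_type: str  # hypothetical labels: "national_government", "private_sector", "ngo", ...
    about_own_state: bool = False
    regulator_filing: bool = False
    independently_verified: bool = False


def is_self_report(r: SourceRecord) -> bool:
    return r.source_type == "national_government" and r.about_own_state


def admissible(r: SourceRecord) -> bool:
    """Private-sector announcements need a regulator filing or independent verification."""
    if r.source_type == "private_sector":
        return r.regulator_filing or r.independently_verified
    return True


def caveat_tags(r: SourceRecord) -> List[str]:
    """Caveats travel with the record as metadata rather than being resolved away."""
    return ["government_self_report"] if is_self_report(r) else []


def may_flip_to_multi(sources: List[SourceRecord]) -> bool:
    """Two admissible sources, at least one of which is not a government self-report."""
    admitted = [s for s in sources if admissible(s)]
    return len(admitted) >= 2 and any(not is_self_report(s) for s in admitted)


gov = SourceRecord("example_stats_office", "national_government", about_own_state=True)
ngo = SourceRecord("example_survey", "ngo")
print(may_flip_to_multi([gov, ngo]))  # True
```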

8 · Public corrections log

Everything that changes, recorded

When a magnitude is revised, a record is removed, or a tier is reclassified, the change writes a row to a queryable public log. The page links to this log from every record. An optimistic number is especially worth retracting on the same record where it was celebrated.

What gets logged

Magnitude figure revised (before/after, source citation, reason).
Tier reclassified (single → multi, multi → disputed, etc.).
Source-record added or removed from a gain.
Gain merged into another, or split apart.
Costs field updated (cost added, cost downgraded, cost retracted).
Record retracted (with explicit retraction notice preserved at original URL).
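A minimal Python sketch of an append-only log covering the change types listed above. The class and field names are hypothetical, storage and the public query interface are out of scope, and preserving a retraction notice at the original URL is not modeled. The one enforced rule is that every row needs a non-blank reason; rows are only ever appended.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, List, Optional, Tuple


class Change(Enum):
    MAGNITUDE_REVISED = "magnitude_revised"
    TIER_RECLASSIFIED = "tier_reclassified"
    SOURCE_RECORD_ADDED = "source_record_added"
    SOURCE_RECORD_REMOVED = "source_record_removed"
    GAIN_MERGED = "gain_merged"
    GAIN_SPLIT = "gain_split"
    COSTS_UPDATED = "costs_updated"
    RECORD_RETRACTED = "record_retracted"


@dataclass(frozen=True)
class CorrectionRow:
    gain_id: str
    change: Change
    before: Any
    after: Any
    reason: str
    citation: Optional[str]
    logged_at: str  # ISO 8601, UTC


class CorrectionsLog:
    """Append-only log: rows are added and read, never edited or deleted."""

    def __init__(self) -> None:
        self._rows: List[CorrectionRow] = []

    def record(self, gain_id: str, change: Change, before: Any, after: Any,
               reason: str, citation: Optional[str] = None) -> CorrectionRow:
        if not reason.strip():
            raise ValueError("every change needs a stated reason")
        row = CorrectionRow(gain_id, change, before, after, reason, citation,
                            datetime.now(timezone.utc).isoformat())
        self._rows.append(row)
        return row

    def history(self, gain_id: str) -> Tuple[CorrectionRow, ...]:
        """All changes to one gain, oldest first."""
        return tuple(r for r in self._rows if r.gain_id == gain_id)


log = CorrectionsLog()
log.record("example-gain", Change.TIER_RECLASSIFIED, "single_source", "multi_source",
           "second independent source added")
```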

9 · Governance

How the project stays honest over time

A gains record has a distinctive failure mode — capture by interests whose products are part of the record. The governance design has to refuse that capture explicitly.

Open data, open methodology. Every ingestion script, every normalization rule, every magnitude scoring weight is public. The data the public sees is the data the public can audit.
No corporate-sponsored gains. Foundation- and reader-supported. Vaccine manufacturers, renewable-energy firms, and pharmaceutical developers do not fund the record their products appear in.
Independent advisory board. Drawn from progress studies, public health, conservation biology, and rights work. Reviews magnitude scoring weights and caveat disclosures.
Public corrections log. All changes recorded with reason and timestamp. Reviewable by anyone.
Costs alongside gains, always. No exception for politically popular categories. A gain whose costs we won't name isn't a gain that's been honestly measured.