Every entry in The Lift passes through four stages: ingestion from named sources, normalization to a shared schema, gain resolution across sources, and verification tiering. This document is the spec for each stage, written so any reader can audit the path from source document to public claim.
Each source has its own connector. Connectors are independent — one source breaking, changing its API, or going offline never blocks the rest. Connectors run on a per-source schedule and write to a per-source staging area before any normalization happens.
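The isolation property described above can be sketched as follows. This is a minimal illustration, not the project's actual runner: the function name `run_connectors`, the connector names, and the result layout are all hypothetical, and the staged record is a made-up placeholder.

```python
from typing import Callable, Dict, List


def run_connectors(connectors: Dict[str, Callable[[], List[dict]]]) -> Dict[str, dict]:
    """Run each connector in isolation so one failure never blocks the rest."""
    report: Dict[str, dict] = {}
    for name, fetch in connectors.items():
        try:
            # Each connector writes only to its own per-source staging slot.
            report[name] = {"ok": True, "staged": fetch(), "error": None}
        except Exception as exc:
            # Record the failure and move on to the next source.
            report[name] = {"ok": False, "staged": [], "error": str(exc)}
    return report


def fetch_ok() -> List[dict]:
    # Placeholder record, not real data.
    return [{"indicator": "example_indicator", "value": 1.0}]


def fetch_broken() -> List[dict]:
    raise RuntimeError("API changed")


report = run_connectors({"source_a": fetch_ok, "source_b": fetch_broken})
```

In this sketch `source_b` fails, but `source_a` is still staged, which is the behavior the paragraph above requires.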
| Pattern | When | Sources |
|---|---|---|
| REST API pull | Source publishes a structured API. Highest fidelity, lowest maintenance. | World Bank · UN SDG · WHO GHO · ITU · IUCN · UN IGME |
| CSV download | Source publishes annual or quarterly bulk datasets. Stable schema, predictable cadence. | OWID · UNESCO UIS · IHME GBD · Gapminder · IEA |
| Scrape + LLM extract | Source publishes narrative reports or PDFs. Structured scrape pulls report URLs and metadata; Claude extraction pulls (where, when, actor, metric, magnitude) into the normalized schema. | IPCC · GAVI impact reports · UN Special Rapporteur findings · regulator approval notices |
For LLM-extracted gain records, the first several hundred extractions per source go through human review against the original source text. Errors found in review are used to recalibrate the extraction prompt. After calibration, human review continues on a sample of extractions at a configurable rate.
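One way to express the review policy is a small routing function. This is a sketch under stated assumptions: the calibration count of 300 stands in for "several hundred", the 5% default sampling rate is invented for illustration, and the function name is hypothetical.

```python
import random
from typing import Optional


def needs_human_review(
    extraction_index: int,
    calibration_count: int = 300,   # assumed stand-in for "several hundred"
    sample_rate: float = 0.05,      # assumed default; the spec says only "configurable"
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if this extraction (0-based, per source) should be reviewed."""
    if extraction_index < calibration_count:
        # During calibration every extraction is reviewed.
        return True
    rng = rng if rng is not None else random.Random()
    # After calibration, review a random sample at the configured rate.
    return rng.random() < sample_rate
```

Passing a seeded `random.Random` makes the sampling reproducible, which helps when auditing which extractions were reviewed.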
Optimistic over-extraction is the failure mode to watch for. The temptation to overclaim is even stronger for a record of gains than for a record of harms, so the human review loop is especially load-bearing here.
Every source-record, regardless of origin, is mapped to the schema below before entering the gain store. The original record is preserved alongside the normalized form — no data is dropped during translation.
The full schema definition with every enum value lives in schema.json. It is versioned: any breaking change to the shape requires a major-version bump and a migration script for existing records.
Actor names such as "World Health Organization", "WHO Headquarters", "WHO Strategic Advisory Group of Experts", and "WHO IVB" need to resolve to a canonical actor identity so that contributions and patterns are visible across sources. The same problem applies to instrument and metric names across reports.
Approach: embedding-based candidate matching against a canonical actor table; Claude verification on candidates the embedding alone can't separate confidently. The actor table is structured (state agency, multilateral, civil society, private sector, researcher), with the classification subject to its own provenance chain.
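The two-step approach above might look like the sketch below. Everything in it is an assumption made for illustration: the toy three-dimensional vectors stand in for real embeddings, the `accept` and `margin` thresholds are invented, the actor ids are placeholders, and the LLM verification step is represented only by the `needs_verification` outcome.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve_actor(
    query: Sequence[float],
    table: Dict[str, Sequence[float]],
    accept: float = 0.90,   # assumed threshold
    margin: float = 0.05,   # assumed gap required over the runner-up
) -> Tuple[str, List[str]]:
    """Return (outcome, candidate ids) for one actor mention."""
    scored = sorted(
        ((cosine(query, vec), actor_id) for actor_id, vec in table.items()),
        reverse=True,
    )
    if not scored:
        return ("no_match", [])
    best_score, best_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score >= accept and best_score - runner_up >= margin:
        # The embedding alone separates the best candidate confidently.
        return ("match", [best_id])
    # Otherwise hand the close candidates to a verification step.
    close = [actor_id for score, actor_id in scored if score >= accept - margin]
    return ("needs_verification", close) if close else ("no_match", [])


table = {"actor_a": [1.0, 0.0, 0.0], "actor_b": [0.95, 0.31, 0.0], "actor_c": [0.0, 0.0, 1.0]}
clear = resolve_actor([0.0, 0.05, 0.99], table)
ambiguous = resolve_actor([0.99, 0.15, 0.0], table)
```

Here `clear` resolves directly to `actor_c`, while `ambiguous` sits too close to both `actor_a` and `actor_b` and is routed to verification.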
When OWID, the World Bank, and the UN IGME all report on the same child-mortality reduction, The Lift creates one canonical gain with three linked source-records — not three duplicate entries, and not one record that hides the others.
Other aggregators often merge source-records and lose the provenance — a magnitude figure shows up with no clear path back to which source said what. The Lift keeps every source-record addressable: a gain record points to its sources, each source has its own URL and methodology disclosure, and a reader can always reconstruct the chain.
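The shape of "one canonical gain, many addressable source-records" can be sketched with two small data classes. The class and field names here are hypothetical and are not taken from schema.json; the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SourceRecord:
    source: str                 # e.g. "OWID", "World Bank", "UN IGME"
    url: str                    # where this source published the claim
    methodology_url: str        # the source's methodology disclosure
    magnitude: Optional[float]  # None when the source reports non-numerically


@dataclass
class Gain:
    gain_id: str
    title: str
    sources: List[SourceRecord] = field(default_factory=list)

    def link(self, record: SourceRecord) -> None:
        """Attach a source-record without merging or overwriting the others."""
        if record not in self.sources:
            self.sources.append(record)


gain = Gain("gain-001", "Child-mortality reduction")
for name in ("OWID", "World Bank", "UN IGME"):
    gain.link(SourceRecord(name, f"https://example.org/{name}", f"https://example.org/{name}/method", None))
```

Because each `SourceRecord` keeps its own URL and methodology link, a reader can walk from the gain back to what each source said.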
Every published record carries a tier. The tier is computed, not editorial — it follows from the source count, source agreement, and time window of corroboration. The adjudicated tier includes any decision by a regulator or status authority of record (FDA, EMA, IUCN status committees, treaty depositories, peace-accord signings).
Two or more independent sources, magnitude figures agreeing within ±20% (or all sources reporting in non-numeric terms with compatible descriptions), all within a corroboration window.
Confirmed by a regulator or status authority of record — FDA/EMA approval, IUCN Red List status change, UN Treaty Collection deposit, signed peace accord, formal disease-elimination certification. The strongest tier; durable, time-stamped fact.
Multiple sources cover the same gain but materially disagree on magnitude or significance. Tier is preserved as disputed; the disagreement itself is surfaced in the public record.
Reported by one source; not yet corroborated. Surfaced anyway, with explicit single-source tag. This is information, not noise — suppressing single-source reports means losing real signal from places and topics where coverage is thin.
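Since the tier is computed rather than editorial, the rules above can be sketched as a function. This is a simplified illustration under several assumptions: the label `corroborated` is an assumed name for the two-or-more-agreeing-sources tier, "within ±20%" is interpreted as every figure lying within 20% of the median, sources are assumed to be independent, and the corroboration window and non-numeric reports are left out.

```python
from statistics import median
from typing import List


def compute_tier(
    magnitudes: List[float],
    adjudicated: bool = False,
    tolerance: float = 0.20,
) -> str:
    """Compute a verification tier from per-source magnitude figures."""
    if adjudicated:
        # A decision by a regulator or status authority of record wins outright.
        return "adjudicated"
    if not magnitudes:
        raise ValueError("a published record needs at least one source")
    if len(magnitudes) == 1:
        return "single-source"
    center = median(magnitudes)
    if center == 0:
        agree = all(m == 0 for m in magnitudes)
    else:
        # Assumed reading of "within ±20%": relative distance from the median.
        agree = all(abs(m - center) / abs(center) <= tolerance for m in magnitudes)
    return "corroborated" if agree else "disputed"
```

For example, `compute_tier([100.0, 110.0, 95.0])` yields `corroborated`, while `compute_tier([100.0, 200.0])` yields `disputed`.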
Averaging diverging magnitude estimates produces a number that has no methodological backing — it's not what any source published, and it implies a precision the underlying data doesn't support.
So when sources disagree, the record carries the range and the methodology behind each estimate. The honest finding field in the public record synthesizes what can be said responsibly. Readers see what each source published; they don't get a fabricated consensus.
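A sketch of the "range, not average" rule follows. The function name, the dictionary keys, and the example values are all hypothetical; the point is only that the summary exposes the low and high figures with each source's methodology and never computes a mean.

```python
from typing import Dict, List


def summarize_disagreement(estimates: List[Dict]) -> Dict:
    """Summarize diverging estimates as a range plus per-source detail."""
    if not estimates:
        raise ValueError("need at least one estimate")
    values = [e["value"] for e in estimates]
    return {
        # The range is what the sources actually published, end to end.
        "range": (min(values), max(values)),
        # Each estimate stays attributed to its source and methodology.
        "estimates": [
            {"source": e["source"], "value": e["value"], "methodology": e["methodology"]}
            for e in estimates
        ],
    }


summary = summarize_disagreement([
    {"source": "source_a", "value": 40.0, "methodology": "household survey"},
    {"source": "source_b", "value": 55.0, "methodology": "modelled estimate"},
])
```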
A record's documented costs may be empty, but they may not be missing. This is the discipline that distinguishes a gains record from progress-movement marketing. If a gain has documented environmental, equity, or distributional costs, those costs sit on the same record, not on a separate page and not in a footnote.
The discipline cuts both ways. A "gain" with so much offsetting cost that the net is unclear should not be tiered as a gain at all — it should appear with the disagreement framing of the disputed tier.
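The "empty is allowed, missing is not" rule can be enforced at validation time. The sketch below assumes the costs live in a list-valued field named `costs`; that field name is an assumption, not something taken from schema.json.

```python
def validate_costs(record: dict) -> list:
    """Accept an explicit empty list of costs; reject an absent or null field."""
    if "costs" not in record or record["costs"] is None:
        # Missing means nobody checked, which is different from "no known costs".
        raise ValueError("record must state its documented costs, even if empty")
    if not isinstance(record["costs"], list):
        raise TypeError("costs must be a list")
    return record["costs"]
```

With this check, `{"costs": []}` passes, while a record with no `costs` key is rejected before it reaches the gain store.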
A gain's magnitude field is what drives its prominence on the page — dot size on the timeline, ordering in the events table, weight in the country composite. The scoring formula is published, contested in the open, and revised in the corrections log when assumptions change.
magnitude = log10(people_reached) × durability_factor × evidence_factor
The formula is intentionally coarse. It exists to keep visualizations honest, not to produce a final ranking. Readers should treat the magnitude as approximate.
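The formula translates directly into code. In the sketch below, the function signature and the guard against values under 1 are assumptions; the spec here does not state the allowed ranges of `durability_factor` and `evidence_factor`, so the example simply passes 1.0 for both.

```python
import math


def magnitude(people_reached: float, durability_factor: float, evidence_factor: float) -> float:
    """magnitude = log10(people_reached) x durability_factor x evidence_factor"""
    if people_reached < 1:
        # log10 is negative or undefined below 1; treated as invalid in this sketch.
        raise ValueError("people_reached must be at least 1")
    return math.log10(people_reached) * durability_factor * evidence_factor


one_million = magnitude(1_000_000, 1.0, 1.0)
ten_million = magnitude(10_000_000, 1.0, 1.0)
```

Because of the logarithm, reaching ten times as many people adds one unit before the factors are applied, rather than multiplying the score by ten. That compression is part of what keeps the formula coarse.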
Source caveats are metadata, not flaws. Every record carries the caveat disclosure of its source. The reader can choose to weight a national-government claim about its own development gains differently than an independent NGO measurement of the same indicator — and the system surfaces the information needed to make that judgment.
When a magnitude is revised, a record is removed, or a tier is reclassified, the change writes a row to a queryable public log. The page links to this log from every record. Optimistic numbers are especially worth retracting on the same record they were celebrated on.
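A queryable log of this kind could be as simple as one append-only table. The sketch below uses an in-memory SQLite database from the Python standard library; the table name, column names, and change-type labels are hypothetical, chosen to mirror the three kinds of change listed above.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE corrections (
        record_id   TEXT NOT NULL,
        change_type TEXT NOT NULL CHECK (change_type IN
            ('magnitude_revised', 'record_removed', 'tier_reclassified')),
        old_value   TEXT,
        new_value   TEXT,
        reason      TEXT NOT NULL,
        logged_at   TEXT NOT NULL
    )
    """
)


def log_correction(conn, record_id, change_type, old_value, new_value, reason):
    """Append one row per change; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO corrections VALUES (?, ?, ?, ?, ?, ?)",
        (record_id, change_type, old_value, new_value, reason,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


log_correction(conn, "gain-001", "tier_reclassified", "single-source", "disputed",
               "second source published a materially different figure")
rows = conn.execute(
    "SELECT change_type, old_value, new_value FROM corrections WHERE record_id = ?",
    ("gain-001",),
).fetchall()
```

Keying every row by `record_id` is what lets each record page link to its own slice of the log.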
A gains record has a distinctive failure mode — capture by interests whose products are part of the record. The governance design has to refuse that capture explicitly.