Data, methods & sources

Everything here is built from public data with an open, reproducible pipeline. This page documents every source, how each figure is computed, what has been verified exactly versus what is directional, and the honest limitations. The whole pipeline is on GitHub and refreshes monthly.

Sources

Source

What it provides

Access & caveats

Activate companies directory

224 ventures: name, cohort year, hub, verticals, fellows, websites, and the human-written Critical Need / Technology Vision / Potential Impact.

Softr app over an Airtable base; harvested via headless browser (JS-rendered). The complete public directory (re-harvested, unchanged at 224). Activate's marketing pages cite ~235 total, a broader count that includes ventures not in the public directory.

Activate fellows directory

292 fellows: biography, cohort, hub, company, LinkedIn. The bios state degree + university for ~92%.

Separate Softr/Airtable table, same harvest. The complete public directory; Activate cites ~294 total.

USAspending.gov

Federal awards, per-company non-dilutive outcomes, per-space ecosystem funding, and agency breakdowns.

Public API. Recipient exact-match + $25M/award cap to exclude institutional name collisions.

OpenAlex

Founder research footprints (works, citations, h-index, pre-founding topics, affiliations) and field publication velocity.

Public API (large open scholarly index). Name disambiguation is field/domain-aware; 100 of 224 companies resolved, 112 founders.

SEC EDGAR

Form D filings as a private-capital-raised signal.

Full-text search API. Presence/absence only.

IRS Form 990 (ProPublica)

Activate's own revenue, expenses, and net assets by year.

Public nonprofit filings. EIN 47-5502184.

The Engine (engine.xyz)

MIT's 'Tough Tech' venture firm (founded 2016, invests for equity). Its 57 public portfolio companies, mapped onto Activate's verticals for a model-vs-model comparison.

Harvested via headless browser; their taxonomy is coarser, so the comparison is directional positioning.

How each figure is computed

Federal non-dilutive funding
$227.2M across 102 companies. USAspending assistance awards (grants + cooperative agreements) whose recipient name exactly matches the company, summed, with a $25M/award cap to drop institutional collisions. DOD SBIR contracts are a known gap.
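A minimal sketch of the matching-and-cap rule described above, not the pipeline's actual code. The `Award` record, the `normalize` helper, and the sample rows are hypothetical. It assumes "exact match" tolerates only case and whitespace differences, and that an award above the cap is dropped entirely rather than clipped.

```python
from dataclasses import dataclass

AWARD_CAP_USD = 25_000_000  # per-award cap stated in the text


@dataclass
class Award:
    recipient: str     # recipient name as it appears on the award record
    amount_usd: float  # obligated amount in dollars


def normalize(name: str) -> str:
    # Assumption: "exact match" ignores case and repeated whitespace only.
    return " ".join(name.casefold().split())


def federal_total(company: str, awards: list[Award]) -> float:
    # Sum awards whose normalized recipient equals the company name,
    # skipping any single award above the cap (likely an institutional collision).
    target = normalize(company)
    return sum(
        a.amount_usd
        for a in awards
        if normalize(a.recipient) == target and a.amount_usd <= AWARD_CAP_USD
    )


# Hypothetical records for illustration only.
awards = [
    Award("Acme Fusion Inc", 1_200_000),
    Award("ACME FUSION INC", 800_000),
    Award("Acme Fusion Inc", 40_000_000),      # above the cap, dropped
    Award("Acme Fusion Institute", 500_000),   # different name, not matched
]
total = federal_total("Acme Fusion Inc", awards)
```

Because near-miss names are never matched, a rule like this can only undercount, which is why the total reads as a lower bound.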
Research momentum (radar x-axis)
Each field's share of all global publications (OpenAlex) in 2021–24 vs its 2013–16 baseline. Normalized to share so it isn't fooled by the index growing. Fields are keyword-defined, so the ranking is the signal, not the exact multiple.
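The share normalization can be sketched as below. The function names and the publication counts are hypothetical; only the two time windows come from the text.

```python
def share(field_count: int, total_count: int) -> float:
    # Fraction of all indexed publications that belong to the field.
    return field_count / total_count


def research_momentum(field_recent: int, total_recent: int,
                      field_base: int, total_base: int) -> float:
    # Ratio of the field's share in 2021-24 to its share in 2013-16.
    return share(field_recent, total_recent) / share(field_base, total_base)


# Hypothetical counts: the field triples while the whole index doubles.
momentum = research_momentum(
    field_recent=3_000, total_recent=2_000_000,
    field_base=1_000, total_base=1_000_000,
)
raw_growth = 3_000 / 1_000
```

In this made-up case the raw count grows 3x but the share grows only about 1.5x, which is the distortion from index growth that the normalization removes.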
Federal funding momentum (radar bubble / Space Forecast)
USAspending obligations matching a per-space keyword, recent (FY22–25) vs prior (FY16–19) windows. Keyword-matched against award text; directional.
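A rough sketch of the window comparison, assuming momentum is expressed as the ratio of recent to prior obligations. The tuple layout, the substring match, and the sample awards are hypothetical simplifications.

```python
from typing import Optional

RECENT_FY = range(2022, 2026)  # FY22-25
PRIOR_FY = range(2016, 2020)   # FY16-19


def funding_momentum(awards: list[tuple[int, str, float]],
                     keyword: str) -> Optional[float]:
    # Each award is (fiscal_year, award_text, obligation_usd).
    kw = keyword.casefold()
    recent = sum(usd for fy, text, usd in awards
                 if fy in RECENT_FY and kw in text.casefold())
    prior = sum(usd for fy, text, usd in awards
                if fy in PRIOR_FY and kw in text.casefold())
    # No prior-window funding means the ratio is undefined.
    return recent / prior if prior else None


# Hypothetical awards for illustration only.
sample = [
    (2017, "Solid-state battery pilot line", 2_000_000.0),
    (2023, "Battery recycling scale-up", 5_000_000.0),
    (2024, "Grid battery storage demonstration", 1_000_000.0),
    (2023, "Fusion diagnostics", 9_000_000.0),
]
battery_momentum = funding_momentum(sample, "battery")
```

A plain substring match picks up unrelated awards that happen to mention the keyword, which is one reason the figure is treated as directional.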
Activate presence (radar y-axis)
Share of the in-view ventures in each vertical (cross-filters with the dashboard).
Whitespace / sourcing opportunity
Research momentum × federal-funding momentum, discounted by Activate's existing presence. A prompt to investigate, not a directive.
Founder research profile
OpenAlex author match with field/domain-aware disambiguation and a senior-namesake guard (reject any match with h-index > 60, since no recent founder is a 100-h-index professor). Unresolved founders are non-academic or genuinely absent from OpenAlex.
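The guard can be sketched as a filter over candidate author records. The `Candidate` type and the `field_match` flag are hypothetical stand-ins for the field/domain-aware check. Leaving ambiguous cases unresolved is an assumption, not a documented rule.

```python
from dataclasses import dataclass
from typing import Optional

MAX_H_INDEX = 60  # threshold stated in the text


@dataclass
class Candidate:
    name: str
    h_index: int
    field_match: bool  # hypothetical: topics agree with the company's domain


def resolve(candidates: list[Candidate]) -> Optional[Candidate]:
    # Keep candidates in a plausible field and below the senior-namesake ceiling.
    plausible = [c for c in candidates
                 if c.field_match and c.h_index <= MAX_H_INDEX]
    # Assumption: resolve only when exactly one candidate survives.
    return plausible[0] if len(plausible) == 1 else None


# Hypothetical namesakes for illustration only.
namesakes = [
    Candidate("J. Smith", 85, True),   # senior namesake, rejected by the guard
    Candidate("J. Smith", 9, True),    # plausible founder
    Candidate("J. Smith", 12, False),  # wrong field
]
resolved = resolve(namesakes)
```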
Typical-founder profile
Median + interquartile range of citations, h-index, and years from first paper to founding across the 112 resolved founders. Descriptive baseline, not a model.
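For reference, a median and interquartile range can be computed with Python's standard library as sketched here. The h-index values are hypothetical, and quartile conventions differ slightly between tools, so small discrepancies against another implementation are expected.

```python
from statistics import quantiles


def summarize(values: list[float]) -> tuple[float, float, float]:
    # Returns (Q1, median, Q3) using the library's default "exclusive" method.
    q1, med, q3 = quantiles(values, n=4)
    return q1, med, q3


# Hypothetical h-index values for illustration only.
h_indexes = [2, 5, 8, 12, 20, 31, 45]
q1, med, q3 = summarize(h_indexes)
iqr = q3 - q1
```

The median and IQR are used instead of a mean because citation-type metrics are heavily skewed by a few outliers.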
Selection loop closure
The 89 companies with a cited founder, split at the median citation count, compared on the rate of winning federal funding. At this N the two rates are within a couple of points of each other, which is reported as the null result it is.
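A minimal sketch of the median split, with hypothetical data. Placing ties at the median in the lower half is an assumption.

```python
from statistics import median


def rate(flags: list[bool]) -> float:
    # Share of companies in the group that won federal funding.
    return sum(flags) / len(flags) if flags else float("nan")


def split_rates(companies: list[tuple[int, bool]]) -> tuple[float, float]:
    # Each company is (founder_citation_count, won_federal_funding).
    cut = median(c for c, _ in companies)
    low = [won for c, won in companies if c <= cut]   # ties go to the lower half
    high = [won for c, won in companies if c > cut]
    return rate(low), rate(high)


# Hypothetical companies for illustration only.
companies = [
    (10, True), (50, False), (120, True),
    (400, False), (900, True), (2500, True),
]
low_rate, high_rate = split_rates(companies)
```

In this made-up sample both halves win funding at the same rate, which is what a null result looks like in this comparison.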
Fellow background
Degree level and universities parsed from the bios with regex heuristics (88% PhD; 92% name a university). The top-school ranking is robust; individual parses can miss.
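The kind of regex heuristic involved might look like the sketch below. These patterns are illustrative assumptions, not the pipeline's actual expressions, and they miss many real cases (abbreviations such as MIT, non-English names, multi-campus systems).

```python
import re

# Hypothetical patterns for illustration only.
PHD_RE = re.compile(r"\b(?:Ph\.?\s?D|doctorate|doctoral)\b", re.IGNORECASE)
UNIVERSITY_RE = re.compile(
    r"\b(University of [A-Z][\w&-]*(?: [A-Z][\w&-]*)*"
    r"|(?:[A-Z][\w&-]* )+University)\b"
)


def parse_bio(bio: str) -> dict:
    # Returns whether a doctorate is mentioned and the first university-like name.
    match = UNIVERSITY_RE.search(bio)
    return {
        "phd": bool(PHD_RE.search(bio)),
        "university": match.group(1) if match else None,
    }


first = parse_bio(
    "She earned her Ph.D. in chemical engineering from the University of Michigan."
)
second = parse_bio("He studied physics and holds an MS from Stanford University.")
```

Heuristics like these can mislabel individual bios, so the output is more trustworthy in aggregate than row by row.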
Discipline → space
Field of study parsed from each bio, mapped to the fellow's company verticals.
Emerging Science (bottom-up topics)
Growth in each OpenAlex topic's share of publications (2016–17 vs 2023–24) across the deep-tech domains, crossed with federal funding momentum (USAspending, curated keywords, only counted when current funding ≥$20M so tiny-base ratios can't dominate) and against the topics Activate fellows publish in. Ranked by opportunity = research × funding. Share-normalized; growth above ~8x dropped as a coverage artifact; some tagging noise remains, so it is a candidate surface, not a ranking.
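The filtering and ranking steps can be sketched as follows. The `Topic` record and the sample values are hypothetical, and dropping a topic outright when its funding is below the floor is one reading of the rule. The comparison against fellows' own topics is not modeled here.

```python
from dataclasses import dataclass

FUNDING_FLOOR_USD = 20_000_000  # minimum current funding stated in the text
GROWTH_CAP = 8.0                # growth above roughly 8x is a coverage artifact


@dataclass
class Topic:
    name: str
    research_growth: float      # share ratio, 2023-24 vs 2016-17
    funding_momentum: float     # recent vs prior federal funding
    current_funding_usd: float


def opportunity(t: Topic) -> float:
    # Opportunity = research x funding, as described above.
    return t.research_growth * t.funding_momentum


def candidate_surface(topics: list[Topic]) -> list[Topic]:
    kept = [
        t for t in topics
        if t.research_growth <= GROWTH_CAP
        and t.current_funding_usd >= FUNDING_FLOOR_USD
    ]
    return sorted(kept, key=opportunity, reverse=True)


# Hypothetical topics for illustration only.
topics = [
    Topic("topic A", 3.0, 2.0, 50_000_000),
    Topic("topic B", 12.0, 4.0, 80_000_000),  # growth above the cap, dropped
    Topic("topic C", 5.0, 6.0, 5_000_000),    # funding below the floor, dropped
    Topic("topic D", 2.0, 4.0, 30_000_000),
]
ranked_names = [t.name for t in candidate_surface(topics)]
```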
Hub specialization
A hub's share of a vertical minus the portfolio-wide share (over/under-index in percentage points).
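As a worked sketch, the over/under-index can be computed like this. It assumes "share" means the fraction of a hub's ventures in a vertical and, for simplicity, one vertical per venture. Hub and vertical names are hypothetical.

```python
def over_index(ventures: list[tuple[str, str]], hub: str, vertical: str) -> float:
    # Each venture is (hub, vertical). Result is in percentage points.
    in_hub = [v for h, v in ventures if h == hub]
    hub_share = in_hub.count(vertical) / len(in_hub)
    overall_share = sum(1 for _, v in ventures if v == vertical) / len(ventures)
    return 100 * (hub_share - overall_share)


# Hypothetical portfolio for illustration only.
ventures = [
    ("Hub A", "energy"), ("Hub A", "energy"), ("Hub A", "energy"), ("Hub A", "materials"),
    ("Hub B", "energy"), ("Hub B", "bio"), ("Hub B", "bio"), ("Hub B", "materials"),
]
hub_a_energy = over_index(ventures, "Hub A", "energy")
hub_b_energy = over_index(ventures, "Hub B", "energy")
```

Here Hub A has 75% of its ventures in energy against 50% portfolio-wide, so it over-indexes by 25 points, while Hub B under-indexes by the same amount.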
Model & finances
Revenue, expenses, net assets straight from the Form 990s; the 7.2× is FY2019 → FY2023 revenue.

Verified exactly vs. directional

Verified to the number (recomputed from the source records): company and fellow counts, 88% PhD, the 7.2× revenue growth from the 990s, the cohort-shift percentages, the convergence pair counts, and the hub over-index figures.

Entity-matched public-record estimate: the $227.2M federal total is summed from USAspending assistance awards matched to each company by exact recipient name, with a $25M/award cap to drop institutional collisions. The top recipients are confirmed real Activate companies with real DOE grants, but exact-name matching is conservative and can miss awards (11 companies show NSF-API awards not captured in the all-agency total, and DOD SBIR contracts are excluded). Treat it as a careful lower-bound estimate, not an exact figure.

Directional (trust the ranking, not the magnitude): all keyword-based momentum, research velocity, federal funding by space, and the peer-funder comparison, where the other funder's taxonomy differs from Activate's. These are labeled as such in the interface.

Reported as null: the selection loop closure. The hypothesis that founder research depth predicts funding outcomes does not hold at the current sample, and the dashboard says so rather than dressing up an outlier-driven average.

Limitations

Fields are defined by keyword queries, not clean discipline boundaries, so magnitudes are approximate.
Name-matching (founders to OpenAlex, companies to federal recipients) has finite precision; guards reduce but don't eliminate error.
Bio parsing is heuristic; aggregate distributions are reliable, individual rows can be wrong.
Founder-level analyses run on 112 resolved founders across 100 companies (89 in the citation-depth split); they are descriptive and hypothesis-generating, not predictive.
Identity demographics (gender, race, age) are deliberately not inferred; the signal here is scientific and career depth, used to widen discovery, not to gate it.
It is a point-in-time snapshot refreshed monthly, not a live feed.

Read the synthesis in the Point of View, or explore the live data in the dashboard.