Position: The government dataset is still online but missing entire years

Heather · December 8, 2025, 9:22pm

Example: The EPA’s Air Quality Data Disappearances. During the Trump administration, EPA pages that hosted particulate matter, ozone, and climate science datasets were removed, redesigned, or “redirected to nowhere” without notice. Researchers found that entire datasets had been replaced by placeholder pages with text like “coming soon.” Independent watchdogs such as the Environmental Data & Governance Initiative (EDGI) documented over 5,000 federal web pages altered or removed, including vital PM2.5 datasets. Many of these datasets weren’t formally withdrawn; they were relocated, hidden, or de-linked.

Play: Recover

Recover missing years from the End of Term Web Archive, which captures federal websites at the end of each presidential term. Search archived EPA dataset pages from prior administrations to restore missing tables. https://archive.org/details/EndOfTerm2024WebCrawls
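
One practical way to search those captures is the Wayback Machine's CDX API, since End of Term crawls are generally replayable through the Wayback Machine. The sketch below builds a CDX query for a page over a range of years, keeps only successful and content-distinct captures, and turns the results into raw-content snapshot URLs. It assumes the public endpoint at web.archive.org/cdx/search/cdx; the helper names are hypothetical, and only `find_snapshots` touches the network.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(target_url, from_year, to_year):
    """Build a CDX query for successful, de-duplicated captures of target_url."""
    params = {
        "url": target_url,
        "output": "json",
        "from": str(from_year),
        "to": str(to_year),
        "filter": "statuscode:200",  # keep only captures that returned HTTP 200
        "collapse": "digest",        # skip captures whose content did not change
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def snapshot_urls(cdx_rows):
    """Turn CDX JSON rows (first row is the header) into raw-content snapshot URLs."""
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts_idx = header.index("timestamp")
    orig_idx = header.index("original")
    # The "id_" suffix requests the originally archived bytes, without the replay banner.
    return [f"https://web.archive.org/web/{r[ts_idx]}id_/{r[orig_idx]}" for r in rows]


def find_snapshots(target_url, from_year, to_year):
    """Network call: list archived captures of target_url between the two years."""
    with urllib.request.urlopen(build_cdx_query(target_url, from_year, to_year)) as resp:
        body = resp.read().decode("utf-8")
    return snapshot_urls(json.loads(body)) if body.strip() else []
```

For example, `find_snapshots("https://www.epa.gov/outdoor-air-quality-data", 2016, 2017)` would list captures of that page from those years; the URL is only an illustrative target, so substitute the page or download link you are actually missing.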

Play: Rebuild

Rebuild missing years by blending state air-monitoring feeds with NASA satellite aerosol optical depth (AOD) to reconstruct the missing EPA exposure years. https://earthdata.nasa.gov/
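
A minimal sketch of the blending step, assuming you already have annual PM2.5 values for a monitor and a matching annual AOD value for the same location: fit an ordinary least-squares line PM2.5 = intercept + slope × AOD on years where both exist, then predict the years where only AOD is available. Published AOD-to-PM2.5 models are usually far richer (daily data, meteorology, land use, mixed effects), so treat this single linear calibration as an illustration only; the function names are hypothetical.

```python
from statistics import fmean


def fit_aod_calibration(pairs):
    """Least-squares fit of PM2.5 = intercept + slope * AOD on (aod, pm25) pairs."""
    if len(pairs) < 2:
        raise ValueError("need at least two co-located AOD/PM2.5 observations")
    xs = [a for a, _ in pairs]
    ys = [p for _, p in pairs]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("AOD values are constant; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def rebuild_missing(monitor_pm, satellite_aod):
    """
    monitor_pm: {year: PM2.5 or None}; satellite_aod: {year: AOD}.
    Calibrates on years with both values, then predicts years that have AOD only.
    Returns {year: (value, source)} so reconstructed years stay labelled.
    """
    overlap = [(satellite_aod[y], pm) for y, pm in monitor_pm.items()
               if pm is not None and y in satellite_aod]
    intercept, slope = fit_aod_calibration(overlap)
    out = {}
    for year in sorted(set(monitor_pm) | set(satellite_aod)):
        pm = monitor_pm.get(year)
        if pm is not None:
            out[year] = (pm, "monitor")
        elif year in satellite_aod:
            out[year] = (intercept + slope * satellite_aod[year], "aod_model")
    return out
```

Keeping a `source` label on every value makes it easy to show readers which years are measured and which are modelled.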

Play: Supplement

Supplement gaps with university repositories and citizen-science networks like Safecast. Community sensor networks often keep continuous air-quality records even when federal portals break. https://safecast.org/
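
Community sensors usually read differently from regulatory monitors, so their values should be corrected before they fill a gap. The sketch below assumes a federal monitor and a nearby community sensor that overlap on some dates, estimates a constant offset from the overlap, and uses the corrected community reading only where the federal value is missing. A constant offset is the simplest possible correction (real low-cost-sensor corrections often depend on humidity and concentration), and the function name and `min_overlap` threshold are hypothetical.

```python
from statistics import fmean


def fill_from_community(federal, community, min_overlap=3):
    """
    federal, community: {date: PM2.5 or None}.
    Estimates offset = mean(federal - community) on dates where both report,
    then fills federal gaps with community + offset.
    Returns {date: (value, source)}.
    """
    overlap = [federal[d] - community[d] for d in federal
               if federal[d] is not None and community.get(d) is not None]
    if len(overlap) < min_overlap:
        raise ValueError("not enough overlapping readings to estimate a correction")
    offset = fmean(overlap)
    merged = {}
    for d in sorted(set(federal) | set(community)):
        f, c = federal.get(d), community.get(d)
        if f is not None:
            merged[d] = (f, "federal")  # federal data always wins when present
        elif c is not None:
            merged[d] = (c + offset, "community_corrected")
    return merged
```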

Play: Bayesian Model

Use Bayesian hierarchical gap-filling to estimate missing years with principled uncertainty. Borrow strength across nearby monitors, years, and locations to create defensible estimates. https://mc-stan.org/
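
A full hierarchical model would normally be written in Stan. As a lighter stand-in, here is an empirical-Bayes normal-normal sketch of the same partial-pooling idea: each monitor's true level is drawn around a regional mean (spread `tau`), and each observed year scatters around that level (spread `sigma`). A missing year gets the monitor's posterior mean, pulled toward the regional mean when the monitor has few observations, plus a predictive standard deviation. It assumes the analyst supplies `sigma`, estimates `tau` by a simple method of moments if not given, and, unlike a full Stan model, ignores uncertainty in the regional mean and `tau` and has no year or spatial terms.

```python
import math
from statistics import fmean, pvariance


def partial_pool_fill(series, sigma, tau=None):
    """
    series: {monitor: {year: annual PM2.5 or None}}.
    sigma: assumed year-to-year SD within a monitor.
    tau: SD of true monitor levels around the regional mean; estimated if None.
    Returns {monitor: {year: (posterior_mean, predictive_sd)}} for missing years.
    """
    observed = {m: [v for v in yrs.values() if v is not None] for m, yrs in series.items()}
    means = [fmean(v) for v in observed.values() if v]
    mu = fmean(means)  # regional mean of the monitor means
    if tau is None:
        # Spread of monitor means minus the spread expected from sampling noise alone.
        noise = fmean(sigma ** 2 / len(v) for v in observed.values() if v)
        tau = math.sqrt(max(pvariance(means) - noise, 1e-6))
    filled = {}
    for m, yrs in series.items():
        obs = observed[m]
        precision = 1 / tau ** 2 + len(obs) / sigma ** 2
        post_mean = (mu / tau ** 2 + sum(obs) / sigma ** 2) / precision
        pred_sd = math.sqrt(1 / precision + sigma ** 2)  # add back year-to-year noise
        filled[m] = {y: (post_mean, pred_sd) for y, v in yrs.items() if v is None}
    return filled
```

A monitor with no observed years falls back to the regional mean with the widest interval, which is exactly the "borrowing strength" behaviour the play describes.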

Play: Smoothing Model

Apply multi-year smoothing with confidence intervals to stabilize incomplete time series. Rolling averages and uncertainty ranges help avoid over-interpreting gappy year-by-year data. https://cran.r-project.org/web/packages/zoo/index.html
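
The linked zoo package provides rolling functions for this in R; the sketch below is a standard-library Python stand-in. It computes a centered rolling mean over a yearly series that may contain `None` for missing years, skips windows with too few observed values, and attaches a mean ± z × standard-error interval. The window size, `min_obs`, and z = 1.96 are illustrative assumptions, and a normal-approximation interval is rough for windows of only a few years.

```python
import math
from statistics import fmean, stdev


def rolling_summary(values, window=3, min_obs=2, z=1.96):
    """
    Centered rolling mean over a yearly series with None for missing years.
    Returns one entry per position: (mean, low, high), or None when the window
    holds fewer than min_obs observed values.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if min_obs < 2:
        raise ValueError("min_obs must be at least 2 to estimate spread")
    half = window // 2
    out = []
    for i in range(len(values)):
        # Window is truncated at the series edges rather than padded.
        chunk = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        if len(chunk) < min_obs:
            out.append(None)
            continue
        m = fmean(chunk)
        se = stdev(chunk) / math.sqrt(len(chunk))
        out.append((m, m - z * se, m + z * se))
    return out
```

Returning `None` for thin windows, instead of a mean built from one year, keeps a single surviving value from masquerading as a stable trend.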

Anything we should add?