Data Center Stress Index

Who bears the cost
when a data center moves in?

Every new data center consumes water, draws power from the grid, and occupies land. But the communities hosting them rarely have the data to evaluate whether they are getting a fair deal. This tool changes that.

What questions does this dashboard answer?

1. Which counties bear the heaviest resource burden from data centers — in water, energy, and land?
2. Are communities getting fair economic returns — jobs and wages — relative to the resources consumed?
3. How transparent are states about permitting, environmental review, and tax incentives for data centers?
4. Who actually owns these facilities — and how much of the industry is controlled by foreign entities?
5. What would happen to county stress scores if proposed federal or state legislation were enacted?
6. Where are the geographic hot spots — regions where high-burden data center counties cluster together?

What we calculate

Every U.S. county hosting a data center receives four scores, each normalized on a 0–1 scale:

Resource Burden Score (RBS)

Measures the combined strain on local water, energy, and land. Higher = more burden on community resources. Weighted: 40% water, 35% energy, 25% land.

Regulatory Opacity Score (ROS)

Measures how transparent the state regulatory environment is. Higher = less accountability in permitting, tax incentives, and environmental review.

Economic Return Score (ERS)

Measures what the community gets back in local jobs and wages per megawatt of installed capacity. Higher = better return. Uses Monte Carlo simulation for confidence intervals.

Composite Stress Index (CSI)

Blends all three scores: 50% burden, 30% opacity, 20% inverted return. Counties receive letter grades A through F. This is the single number that ranks communities by overall data center stress.

How we got the data

The entire index is built from 19 public data sources, all free. No proprietary datasets. No gated APIs. Total cost: under $500.

Facilities: PNNL DC Atlas, FracTracker Alliance, Epoch AI
Water Stress: WRI Aqueduct 4.0, spatially joined to counties
Energy: EIA-923 generation + company sustainability reports
Land Use: PNNL sqft, Census TIGER ALAND, MW-to-acreage estimates
Employment: BLS QCEW (NAICS 518210), imputed where suppressed
Regulatory: State Regulatory Index (42 jurisdictions), EPA FLIGHT
Boundaries: Census TIGER shapefiles, congressional districts
Ownership: V8 facility traits (191 foreign-owned across 14 countries)
Policy: 12 legislative proposals decomposed into formula parameters

What we know we don’t know

Transparency requires honesty about gaps. Here is what we cannot yet measure:

Energy is estimated, not metered. No public dataset tracks facility-level electricity consumption. V9 estimates consumption using LBNL’s 50% utilization methodology (down from V8’s 100% assumption). Operator-reported data from Meta (18 facilities) and Google/AWS PUE is used where available. For most facilities, consumption is modeled from nameplate MW.
247 counties lack acreage data. Not all facility sources report square footage or lot size. We estimate from MW capacity using industry heuristics (50 acres per 100 MW).
~65% of employment data is suppressed. BLS withholds county-level QCEW data to protect employer confidentiality. We impute from state totals proportional to MW capacity.
Behind-the-meter gas is invisible. An estimated 56 GW of on-site natural gas generation is not captured by any federal energy dataset.
Water consumption is now estimated from cooling type. V9 uses WUE benchmarks (Siddik et al. 2021/2022) to estimate gallons per MWh by cooling method. Meta’s reported per-facility water data validates the approach for 15 US facilities. However, cooling method is unknown for most facilities, and the national weighted average (0.3 L/kWh per Siddik et al. 2021) is used as default.
419 of 670 counties lack precise facility coordinates. Most facilities plot at county centroids. Individual addresses are not in public datasets.
425 facilities appear as both operating and proposed. FracTracker and V8 source data overlap. Deduplication is ongoing.
The State Regulatory Index covers 42 of 50 states. Eight states lack sufficient data for all six transparency variables.
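The cooling-type water estimate described above can be sketched in a few lines. Only the 0.30 L/kWh default comes from the text (Siddik et al. 2021 national weighted average); the per-method WUE values and function names below are illustrative placeholders, not the project's actual benchmarks.

```python
# Illustrative WUE benchmarks in liters per kWh by cooling method. Only the
# 0.30 default is grounded in the text (Siddik et al. 2021 national average);
# the per-method values here are placeholders for the shape of the lookup.
WUE_L_PER_KWH = {"evaporative": 1.8, "closed_loop": 0.1}

def annual_water_liters(consumption_mwh, cooling_method=None):
    """Water use = consumption (MWh) x 1,000 kWh/MWh x WUE (L/kWh).

    Falls back to the national weighted average when the cooling
    method is unknown, as for most facilities.
    """
    wue = WUE_L_PER_KWH.get(cooling_method, 0.30)
    return consumption_mwh * 1000 * wue
```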

Methodological limitations

Peer review identified the following structural limitations. We disclose them here because transparency is a core value of this project.

Composite weights are policy judgments. The weights that combine water, energy, and land into the Resource Burden Score (0.4/0.35/0.25) and the weights that combine RBS, ROS, and ERS into the composite (0.5/0.3/0.2) reflect the analyst’s judgment about relative importance. They are not derived from the data. PCA validation shows the resulting rankings are robust (ρ > 0.96 vs. data-driven weights), but users should explore the entropy-weighted toggle for a purely data-driven alternative.
Water stress is not water consumption. WRI Aqueduct measures basin-level water stress — the ratio of withdrawals to available supply — not facility-specific water use. A data center with closed-loop cooling in a stressed basin scores the same as an evaporative facility. Facility-level gallons-per-MWh estimates by cooling type are planned for V9.
Economic data is heavily imputed. BLS suppresses ~65% of county-level employment data to protect employer confidentiality. The Economic Return Score for most counties is modeled from state totals, not observed. Counties with high imputation are flagged as Tier C data quality — interpret their ERS with caution.
Regulatory scoring has a single rater. The State Regulatory Index was scored by one analyst-AI team. An inter-rater reliability pilot (3 raters, 10 states, Cohen’s Weighted Kappa target ≥ 0.60) is planned but not yet complete. Until validated, SRI scores should be treated as preliminary.
Letter grades are relative rankings. The A-through-F grades use percentile thresholds, which means exactly 20% of counties fall into each tier by definition. An “A” means better than 80% of DC-hosting counties, not that a county has no data center impact. There are no absolute “safe” thresholds.
No external validation yet. The DCSI has not been compared against established environmental justice indices (EJScreen, SVI). This comparison is planned for V9 to demonstrate whether the DCSI captures impacts that existing tools miss.

Explore the dashboard

Each tab focuses on a different dimension of the data center landscape:

DCSI National
Interactive national map with county-level stress scores, filtering by ownership, classification, and capacity type. Includes Sankey ownership flows, burden breakdowns, and scatter analysis.
DCSI State
A one-stop-shop for any state: governor, research summary, county-level map, facility table, congressional districts, and ownership breakdown. Toggle between plain language, technical, and policy brief.
Policy Impact
Simulate how 12 real legislative proposals would change county stress scores. See proposed data centers on the map and explore what each would mean for local communities.
Methodology
Every formula, every weight, every assumption — fully disclosed. Includes PCA validation, entropy weighting sensitivity analysis, and Monte Carlo error propagation methodology.
Data Sources
Complete inventory of all 19 public data sources with file details, coverage statistics, and version information.
AI Accountability
Every AI-introduced error caught by the analyst during development. Severity, category, fix status, and the lesson: the analyst never abdicated to the machine.
Built in public. Under $500. All data free.
A project by Anna R. Dudley
DCSI National Overview

32 counties show high resource burden with below-average economic return

The Data Center Stress Index (DCSI) is a county-level composite ranking of U.S. counties hosting data centers. It measures three dimensions: the resource burden each facility places on local water, energy, and land; the regulatory opacity surrounding permitting and operations; and the economic return the community receives in jobs and wages per megawatt of installed capacity. Use this dashboard to explore which communities bear the heaviest costs — and which receive the least in return.

670
Counties with facilities
3,954
Facilities tracked
--
Est. TWh/yr consumed
--
Est. billion gal/yr water
754
Proposed / Under Review
45
Foreign HQ flagged
Priority
View
Filter
Resource Burden ranks counties by the combined strain data centers place on water supply (WRI Aqueduct), energy grid (EIA-923), and land use (PNNL/USGS). Economic Return flips the lens to rank by local jobs and wages generated per MW of capacity (BLS QCEW).
Scores shows the raw composite index for each county. Hot-Spot Clusters highlights geographic concentrations of high-stress counties using spatial autocorrelation (Moran's I). Entropy Weights lets the data determine component weights instead of fixed formulas.
2026
Facilities: 4,502 · Avg CSI: -- · Proposed: 658
Filtered by:
Clear all filters
Composite Stress Index
Low concern → High concern
Layers
Commercial · Government · Academic
Zoom in to see individual data centers

Interactive choropleth showing the Composite Stress Index for U.S. counties with data center facilities. Green = low stress, red = high stress. Sourced from PNNL, FracTracker, Epoch AI, WRI Aqueduct, EIA-923, and BLS. Scroll to zoom.

We have provided the most accurate information we had access to. See the Data Sources tab.
Ownership to Stress Flow
Click any node to filter the dashboard

This Sankey diagram traces who owns data center capacity, where that capacity is headquartered, and how it maps to resource stress tiers. Each flow line represents MW capacity moving from a corporate operator through a headquarters country to a stress grade tier. Click any corporation or country node to cross-filter every chart on the dashboard.

Ownership data sourced from PNNL IM3 Atlas and Epoch AI. Facility-to-operator mapping may be incomplete for smaller or privately held operators. See the Data Sources tab for details.
Showing US-headquartered flows only
Facility Ownership
Click a segment to filter

Market share of U.S. data center capacity by parent company headquarters country or corporate operator. Foreign-owned facilities may face additional CFIUS scrutiny. Data sourced from PNNL IM3 Atlas and Epoch AI ownership records.

Ownership attribution reflects the best available public records. Some facilities are owned through subsidiaries or holding companies that may obscure the ultimate parent. See the Data Sources tab.
Resource Burden Breakdown
Click a state to see the math and filter

Each bar decomposes a state's aggregate resource burden into water stress, grid load, and land use components, the three pillars of the RBS formula (40/35/25 weighting). Water stress comes from WRI Aqueduct 4.0, grid load from EIA-923 plant-level generation, and land use from PNNL facility footprints over USGS NLCD developed area. Click any state to see the raw data and calculation.

Scores are normalized to a 0-to-1 scale. Underlying data quality varies by county and source. See the Data Sources tab for provider details.
Stress by Congressional District
Ranked by composite stress score

Districts ranked by composite stress index. This view connects resource burden to political accountability. Every district bar maps to a specific representative who can champion or block data center policy. District boundaries from Census TIGER/Line shapefiles, legislators from the @unitedstates project and Open States.

District-to-county mapping uses spatial overlay with a 1% area threshold to filter boundary slivers. See the Data Sources tab.
Economic Return Leaders
Counties with highest Jobs + Wages per MW

Counties delivering the most local economic value per megawatt of data center capacity. High ERS counties demonstrate that data centers can generate meaningful employment. Employment and wage data sourced from BLS Quarterly Census of Employment and Wages (NAICS 518210). MW capacity from PNNL, Epoch AI, FracTracker, and hand-researched overrides.

BLS suppresses employment data in counties with fewer than three reporting establishments to protect confidentiality. See the Data Sources tab.
Burden vs. Return: Which Counties Are Getting a Raw Deal?
Counties in the top-left quadrant bear the highest resource burden with the lowest economic return
Scroll to zoom / drag to pan / double-click to reset

Reading this chart: Each bubble is a county hosting data center facilities. Bubble size reflects total MW capacity.

The top-left quadrant is the danger zone: high resource burden (water, energy, land) paired with low economic return (few jobs, low wages per MW). These counties subsidize the digital economy without proportional benefit.

The bottom-right quadrant is the sweet spot: modest resource consumption alongside meaningful local employment and wages.

Click any county bubble to see its full scorecard, elected officials, and facility-level breakdown.

Scatter plot of Resource Burden Score (x-axis) vs. Economic Return Score (y-axis) for all counties with data center facilities. RBS is derived from WRI Aqueduct water stress, EIA-923 energy data, and USGS land cover. ERS is derived from BLS QCEW employment and wage data. Bubble color reflects the Composite Stress Index. Use your scroll wheel to zoom into dense clusters.

Scores reflect the best available public data. Some values are estimated where direct measurements are unavailable. See the Data Sources tab for complete sourcing.
DCSI State

Explore data center stress at the state level

Select a state to see its counties, facilities, ownership breakdown, resource burden, and economic return. All charts below filter to the selected state.

--
Counties
--
Facilities
--
Total MW
--
Proposed / Planned
--
Foreign HQ
--
Avg Stress Score
Governor
--
--
State Research Summary
Select a state to see a research summary.
Audience
Composite Stress Index
Low concern → High concern
Facility Ownership
Within selected state
County Resource Burden
Water / Grid / Land breakdown per county
County Economic Return
Jobs + Wages per MW
Congressional Districts
Stress by district within state
Burden vs. Return
County-level scatter for selected state
Scroll to zoom / drag to pan / double-click to reset

Reading this chart: Each bubble is a county. Size reflects MW capacity.

The top-left quadrant = high burden, low return. The bottom-right = low burden, high return.

All Facilities
Select a state to see facilities

Policy Impact Lens

Select a policy proposal to see how it would change county stress scores. The map adjusts to show before/after impact. Proposed data centers are shown with dashed outlines. Use your scroll wheel to zoom into any region of the map.

Select a policy to see impact
Composite Stress Index
Low concern → High concern
Existing · Proposed
Policy impact scores are modeled estimates based on bill text and regulatory intent. They are not predictions. See the Data Sources tab for underlying data.
Score Impact Summary
How the selected policy changes county scores
Select a policy above to see a before/after analysis of how it would affect county stress scores, which counties move between tiers, and which proposed data centers would be blocked or modified.
Counties Most Affected
Top counties where scores change the most
State Regulatory Transparency (ROS)
How transparent is each state's data center permitting, environmental review, and tax incentive process?

The Regulatory Opacity Score (ROS) measures how much information states disclose about data center permitting, environmental review, utility rates, tax incentives, and ownership. Higher opacity means less public transparency. Scores are derived from a 6-variable State Regulatory Index across 42 DC-hosting jurisdictions. Click any bar to filter the dashboard to that state.

Proposed Data Centers
Facilities announced or under construction, not yet operational
Cancelled & Withdrawn Projects
Facilities that were cancelled, withdrawn, denied, or rejected — evidence that democratic process and community voice can shape outcomes
Follow the Money

Public Subsidies to Data Center Operators

$16.2 billion in disclosed tax incentives, abatements, and megadeals tracked by Good Jobs First’s Subsidy Tracker. Five companies — Amazon, Apple, Meta, Google, and Microsoft — account for 86% of all disclosed subsidy value. An additional 299 records have undisclosed amounts, meaning the true total is significantly higher.

$16.2B
Disclosed value
706
Subsidy records
86%
To top 5 firms
299
Undisclosed amounts
Top Recipients by Disclosed Value
Top States by Disclosed Value
Subsidy Awards by Year
Context: Indiana’s $8.5B total is dominated by a single Amazon megadeal ($8.3B in 2024). Washington’s 258 records reflect the state’s long-running data center tax incentive program. Virginia, despite hosting more data centers than any other state, shows only $141M in disclosed subsidies — likely reflecting nondisclosure agreements (NDAs) that prevent public reporting. The 299 undisclosed-value records suggest significant subsidy activity that is invisible to the public.
Source: Good Jobs First Subsidy Tracker. Full-text search for “data center” across 722,000 records. Some records may include non-DC companies with data center mentions in program notes. State-by-state comparisons are limited by uneven disclosure practices. As noted by Good Jobs First: “Due to uneven disclosure, it is NOT appropriate to make state-by-state comparisons.” Data current through November 2025.

Methodology

How the DCSI is calculated, what data it uses, and where human judgment enters.

The Analytical Question

For every U.S. county hosting a data center: what resource burden is that facility placing on the community, and what economic return is the community receiving?

Score Architecture

Resource Burden Score (RBS)

RBS = 0.4 x Water + 0.35 x Energy + 0.25 x Land

Percentile-rank normalization across all 670 DC-hosting counties. Water draws on WRI Aqueduct 4.0 baseline water stress with seasonal indicators. Energy is estimated facility-level consumption (MW × utilization × 8,760 hours) following LBNL methodology, supplemented with EIA-923 generation where local power plants exist. Land is facility footprint as a share of county land area, with MW-derived estimates for counties missing observed acreage.
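The percentile-rank normalization and fixed 40/35/25 weighting can be sketched as follows. Function names are illustrative, and the ordinal tie-breaking is an assumption, not necessarily how the pipeline handles ties.

```python
import numpy as np

def percentile_rank(values):
    """Normalize raw values to [0, 1] by percentile rank across counties.

    Uses ordinal ranks (ties broken by position); the real pipeline
    may resolve ties differently.
    """
    values = np.asarray(values, dtype=float)
    ranks = values.argsort().argsort()        # rank of each county
    return ranks / (len(values) - 1)          # 0 = lowest, 1 = highest

def resource_burden(water, energy, land):
    """RBS = 0.4 x Water + 0.35 x Energy + 0.25 x Land, each normalized."""
    return (0.40 * percentile_rank(water)
            + 0.35 * percentile_rank(energy)
            + 0.25 * percentile_rank(land))
```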

Regulatory Opacity Score (ROS)

County ROS = State Regulatory Index x (1 + Pushback Modifier + COI Modifier)

6-variable State Regulatory Index across 42 DC-hosting jurisdictions: environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, and ownership disclosure. Each variable scored 0–3 against statute citations. Community pushback applies a −10% modifier; the Company Opacity Index applies up to a −5% modifier where local operators are demonstrably more transparent.
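Combining the modifiers described above, a minimal sketch of the county-level calculation (illustrative names; the −0.05 × (1 − county average COI) form follows the COI integration note later in this page):

```python
def county_ros(sri, pushback=False, county_avg_coi=None):
    """County ROS = State Regulatory Index x (1 + pushback_mod + coi_mod).

    Documented community pushback applies a -10% modifier; counties whose
    operators are more transparent earn up to a -5% credit via the
    Company Opacity Index (coi_mod = -0.05 x (1 - county average COI)).
    """
    pushback_mod = -0.10 if pushback else 0.0
    coi_mod = 0.0 if county_avg_coi is None else -0.05 * (1 - county_avg_coi)
    return sri * (1 + pushback_mod + coi_mod)
```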

Economic Return Score (ERS)

ERS = 0.5 x Jobs_per_MW + 0.5 x Wage_Premium

Wage Premium is data center industry average pay divided by county all-industry average pay. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals. Counties are tagged Tier A (high confidence), B (moderate), or C (imputed inputs). Mean wage premium across DC counties: ~1.9x.
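The Monte Carlo error propagation can be sketched like this, assuming normally distributed input error; the 25% relative error, seed, and function name are illustrative assumptions, not the project's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

def ers_credible_interval(jobs_score, wage_premium_score,
                          rel_err=0.25, n_iter=1000):
    """ERS = 0.5 x Jobs_per_MW + 0.5 x Wage_Premium, with uncertainty.

    Draws each (already-normalized) input from a normal distribution
    with an assumed relative error, then reports the mean ERS and a
    90% credible interval from the simulated distribution.
    """
    jobs = rng.normal(jobs_score, rel_err * jobs_score, n_iter)
    wage = rng.normal(wage_premium_score, rel_err * wage_premium_score, n_iter)
    ers = 0.5 * jobs + 0.5 * wage
    lo, hi = np.percentile(ers, [5, 95])
    return ers.mean(), (lo, hi)
```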

Composites

Burden Mode: 0.5 x RBS + 0.3 x ROS + 0.2 x (1 - ERS)
Return Mode: 0.5 x (1 - ERS) + 0.3 x RBS + 0.2 x ROS

Company Opacity Index (COI)

12-indicator binary scoring of corporate transparency across 142 data center companies. Indicators span environmental reporting, energy disclosure, water disclosure, PUE reporting, renewable targets, cooling technology, community engagement, tax incentive disclosure, beneficial ownership, supply chain transparency, third-party audits, and incident reporting.

COI = 1.0 − (indicators_disclosed / 12)

Scale: 0 = fully transparent, 1 = fully opaque. Tiers: Transparent (≤0.25), Open (0.26–0.50), Partial (0.51–0.65), Opaque (0.66–0.80), Dark (>0.80). County-level COI is capacity-weighted (larger facilities contribute more) and feeds the ROS calculation as a transparency credit of up to −5%.
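The facility formula and the capacity weighting can be sketched as (illustrative names):

```python
def facility_coi(indicators_disclosed, total=12):
    """COI = 1 - disclosed/12: 0 = fully transparent, 1 = fully opaque."""
    return 1.0 - indicators_disclosed / total

def county_coi(cois, capacities_mw):
    """Capacity-weighted county COI: larger facilities contribute more."""
    total_mw = sum(capacities_mw)
    return sum(c * mw for c, mw in zip(cois, capacities_mw)) / total_mw
```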

Energy Efficiency Score (EES)

Facility-level composite derived from PUE, grid dependency, cooling impact, backup emissions, and a transparency penalty. Quality flags: measured (real data for 3+ components), partial (1–2 real data points), insufficient (fully imputed — default 0.593).

EES = f(PUE_efficiency, grid_dependency, cooling_impact, backup_emissions) × (1 − transparency_penalty)

Higher EES = more efficient. EES is currently displayed as an informational overlay and is not factored into the composite CSI; integration into RBS Energy is under evaluation.

Data Quality Tiers

Every county receives a data quality tier reflecting how much of its scoring input comes from observed source data versus estimates or imputations.

Tier A: ≥3 of 4 dimensions observed  |  Tier B: 2 of 4 observed  |  Tier C: ≤1 observed

The four dimensions assessed are: (1) water stress (observed via Aqueduct vs. state-median imputed), (2) energy (EIA-923 supply data vs. MW demand proxy), (3) land (source acreage vs. MW-estimated), and (4) employment (BLS QCEW observed vs. state-residual imputed). Counties with Tier C data have the majority of their score driven by estimates and should be interpreted with caution. The tier is displayed in the county tooltip, scorecard, and sidebar.
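The tier assignment reduces to counting observed dimensions (a minimal sketch; the function name is illustrative):

```python
def quality_tier(water_obs, energy_obs, land_obs, employment_obs):
    """Tier A: >=3 of 4 dimensions observed; B: exactly 2; C: <=1.

    Each argument is True when that input comes from observed source
    data rather than an estimate or imputation.
    """
    n = sum([water_obs, energy_obs, land_obs, employment_obs])
    if n >= 3:
        return "A"
    return "B" if n == 2 else "C"
```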

Spatial Autocorrelation (Moran's I)

Measures whether data center stress clusters geographically or distributes randomly. A positive Moran's I indicates clustering: counties near high-stress counties tend to also be high-stress, suggesting regional infrastructure strain rather than isolated incidents. Computed on the CSI values using queen contiguity weights from the Census TIGER county shapefile.

I = (N / W) x (Sum_ij w_ij(x_i - x_bar)(x_j - x_bar)) / (Sum_i(x_i - x_bar)^2)

Local Moran's I (LISA) identifies specific hot-spot and cold-spot clusters. Counties flagged as hot spots (High-High) are surrounded by other high-stress counties. These are the regional pressure zones where infrastructure strain compounds.
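The global statistic above can be computed with a plain weights matrix; this sketch assumes a dense binary contiguity matrix rather than whatever sparse representation the pipeline uses.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for county values x under spatial weights w.

    w[i, j] > 0 where counties i and j are neighbors (the DCSI uses
    queen contiguity from the TIGER shapefile). Positive I means
    high-stress counties cluster near other high-stress counties.
    """
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)
```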

Entropy-Weighted Scoring

Rather than fixed weights (0.4/0.35/0.25), entropy weighting lets the data determine how much each component contributes to the composite score. Components with more variation across counties receive higher weights; components where all counties score similarly receive lower weights. This prevents a uniformly high-stress component from dominating the index without adding discriminatory power.

w_j = (1 - E_j) / Sum(1 - E_k), where E_j = -Sum p_ij ln(p_ij) / ln(n)

Both fixed and entropy-weighted composites are computed. The dashboard defaults to fixed weights (which are more interpretable for policy audiences) but allows toggling to entropy-weighted for analytical rigor.
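The entropy-weight formula translates to a short matrix computation (illustrative name; assumes non-negative component scores):

```python
import numpy as np

def entropy_weights(X):
    """Data-driven component weights from Shannon entropy.

    X: counties x components matrix of non-negative scores. Components
    that vary more across counties carry more information (lower
    entropy) and get higher weight; a component on which all counties
    score alike gets weight ~0.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    p = X / X.sum(axis=0)                              # column shares p_ij
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    E = -plogp.sum(axis=0) / np.log(n)                 # E_j in [0, 1]
    return (1 - E) / (1 - E).sum()                     # w_j, summing to 1
```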

Granger Causality Testing

Tests whether data center facility announcements (from FracTracker timeline data) Granger-cause changes in county-level water stress scores (from Aqueduct monthly data). If past facility announcements help predict future water stress beyond what water stress's own history predicts, this provides statistical evidence of a causal link between data center expansion and resource degradation.

F-test: H0: DC announcements do not Granger-cause water stress changes

Applied county-by-county where sufficient time-series data exists (2015 to 2026). Results reported as significant/not significant with lag selection via AIC.

Grading Methodology — Percentile Thresholds

Counties receive letter grades A through F based on their percentile rank within all 670 data-center-hosting counties:

A = 0–20th percentile  |  B = 20–40th  |  C = 40–60th  |  D = 60–80th  |  F = 80–100th

Percentile grading is a relative ranking, not an absolute threshold. An “A” grade means a county has lower composite stress than 80% of DC-hosting counties — it does not mean the county experiences no resource burden. Tier C data quality counties (where ≤1 of 4 scoring dimensions uses observed data) should be interpreted with particular caution; their grades are driven primarily by estimates and imputations.

What This Tool Does Not Do

Does not rank counties without facilities. Does not make causal claims. Does not generate regulatory findings. It informs judgment; it does not replace it.

Methodology Updates & Known Limitations

A running ledger of the choices, validations, and trade-offs that shaped each version of the index. Kept here so the methodology body stays focused on what the score is, not what it has been.

Weights are policy-weighted, not empirically derived

The primary weights (0.4/0.35/0.25 for RBS; 0.5/0.3/0.2 for CSI) reflect the analyst’s judgment about relative importance. PCA, entropy, and equal-weight validation runs all produce rankings highly correlated with the policy-weighted version (Spearman ρ > 0.96), but the choice of weights is ultimately a value judgment. The dashboard’s entropy-weighted toggle exposes a purely data-driven alternative.

V9 Energy — LBNL methodology

Estimated facility-level consumption uses LBNL methodology (MW × utilization × 8,760 hours). Utilization defaults to 50% per LBNL 2024 national average, adjusted by facility type (hyperscale 58%, colocation 50%, enterprise 43%). PUE sourced from operator reports where available (Google per-campus, Meta fleet 1.08, AWS per-region, Microsoft 1.16) or estimated from facility type. Replaces V8’s 100% utilization with flat PUE 1.3. EIA-923 generation data supplemented with an MW-based demand proxy for 89 counties with data centers but no local power plants. Renewable energy credit reduces energy burden for facilities reporting above-median renewable sourcing (up to −10%).
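The core estimate reduces to one line per facility; this sketch uses the type-specific utilization figures quoted above and omits the PUE and renewable-credit adjustments, which the text describes as separate steps.

```python
# Type-specific utilization rates, per the LBNL 2024 figures quoted above.
UTILIZATION = {"hyperscale": 0.58, "colocation": 0.50, "enterprise": 0.43}

def annual_mwh(nameplate_mw, facility_type="colocation"):
    """Estimated annual consumption: MW x utilization x 8,760 hours.

    Unknown facility types fall back to the 50% national average.
    PUE and renewable-sourcing adjustments are not modeled here.
    """
    return nameplate_mw * UTILIZATION.get(facility_type, 0.50) * 8760
```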

V8 Land — MW-derived acreage estimates

For 247 counties missing facility footprint data, land use is estimated using an industry heuristic of ~20 acres per 100 MW. Source acreage from PNNL and FracTracker is preferred where available.

V8 ROS — Single-rater scoring

All 6 regulatory variables across 42 jurisdictions were scored by a single rater (anna_claude). Inter-rater reliability pilot (Cohen’s Weighted Kappa > 0.60 per variable) deferred. Cronbach’s Alpha was dropped as a gate because the six variables are formative (independent dimensions), not reflective.

V8 ROS — COI integration

Company Opacity Index now acts as a transparency modifier on ROS: ROS × (1 + pushback_mod + COI_mod) where COI_mod = −0.05 × (1 − county_avg_coi). Counties whose operators are demonstrably more transparent receive up to a 5% reduction in regulatory opacity, recognizing that corporate transparency partially compensates for weak state regulation.

EES is informational, not part of CSI

The Energy Efficiency Score is currently displayed as an overlay only. It is not factored into the composite CSI. Integration into the RBS Energy sub-score is under evaluation for a future version.

CUSUM removed in V8

CUSUM change-point detection was removed because the available Aqueduct data covers a single year of seasonal variation, not a multi-year time series. Running CUSUM on 12 monthly values detected seasonal patterns (summer vs. winter), not genuine acceleration of water stress. The 38% flag rate confirmed this was noise, not signal. CUSUM has been replaced by the data quality tier system and facility-level mapping.

Grading is relative, not absolute

Percentile grading means exactly 20% of counties will always fall into each tier. There are no empirically validated “safe” or “critical” stress levels for data center hosting. Natural-breaks (Jenks) classification is under evaluation as a potential alternative.

V7 ERS — Wage Premium replaces raw wages

ERS now uses Wage Premium (DC industry average pay divided by county all-industry average pay) instead of raw wages_per_mw. Mean wage premium across DC counties is approximately 1.9x. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals with Tier A/B/C confidence flags.

Reference

Data Sources

Last source scan: April 13, 2026  ·  5 sources have new data ready for integration (FracTracker, Epoch AI, PNNL IM3 projections, EIA-923 January 2026, BLS QCEW Q3 2025).  1 source at risk: EPA GHGRP proposed rule to terminate the program entirely.
PNNL IM3 Data Center Atlas
Free · Updated Feb 5, 2026
1,479 U.S. data center facility locations with county FIPS, coordinates, operator, and square footage. February 2026 update adds projected data center locations alongside existing sites, with a growth-projection component based on fiber, electricity, and water infrastructure analysis — a new forecasting layer not present in earlier versions.
Source: MSD Live / PNNL (im3.pnnl.gov/datacenter-atlas) · Join key: state_id + county_id to FIPS · Files: 5
Epoch AI Data Center Dataset
Free · Updated Apr 9–10, 2026
Frontier AI data centers with verified MW capacity, H100-equivalent GPU counts (2.5M total, 13 US sites), capital cost, and construction timelines. Both the Frontier Data Centers CSV and Frontier Data Center Timelines CSV were refreshed April 9–10, 2026. Five AI data centers are projected to reach 1 GW capacity in 2026. Tracks the largest AI compute installations including xAI Colossus (425 MW), OpenAI Stargate Abilene (590 MW), Anthropic-Amazon New Carlisle (1,092 MW), and Meta Prometheus (695 MW). Tier 1 source for MW corrections.
Source: epoch.ai (CC-BY licensed) · Join key: Facility name + address geocoding · Files: 5 (centers, timelines, chillers, cooling towers, ZIP)
EIA-923 Power Plant Operations
Free · January 2026 monthly available
Plant-level electricity generation, fuel consumption, and environmental data. January 2026 monthly release published March 24, 2026. Annual 2024 final data released September 2025; 2025 annual expected June 2026. Covers 2015 to 2026 across three schedule types for time-series energy analysis.
Source: eia.gov · Join key: Plant ID to EIA-860 to County · Files: 32
EIA-860 Plant Location Crosswalk
Free
Maps every power plant to its county, latitude, longitude, and utility. Essential join table connecting EIA-923 generation data to county-level geography.
Source: eia.gov · Join key: Plant Code to County · Files: 13
WRI Aqueduct 4.0 Water Risk Atlas
Free
Global water stress scores: baseline annual, baseline monthly (for CUSUM velocity detection), and future projections (2030/2050/2080 scenarios).
Source: World Resources Institute · Join key: Spatial overlay to County FIPS · Files: 3 CSV + GDB
BLS QCEW: NAICS 518210 & 5182
Free · Q3 2025 available
County-level employment and wage data for data processing/hosting (518210) and broader parent code (5182). Q3 2025 release (March 10, 2026) now available. Q4 2025 scheduled for June 2, 2026. Covers 2020 to Q3 2025 with all ownership types.
Source: bls.gov/cew · Join key: area_fips · Files: 12
EPA FLIGHT: Emissions by Unit
Free · ⚠ AT RISK
Unit-level greenhouse gas emissions including diesel generator hours from Subparts C, D, and AA. Critical risk: EPA has proposed a rule to end the GHGRP entirely. If finalized, this data source — a foundational input to the Regulatory Opacity Score's diesel generator component — would disappear permanently. The 2025 reporting deadline has also been extended from March 31 to October 30, 2026, delaying the next data release. No facility-level data has been published since the last cycle. Monitor closely for DCSI implications.
Source: EPA GHGRP · Join key: Facility to County · Files: 1 (.xlsb)
FracTracker National DC Tracker
Free · New dashboard — April 7, 2026
FracTracker launched a redesigned "Open U.S. Data Centers Tracker Dashboard" on April 7, 2026 with new data on project status, scale, and impacts on energy systems and communities. Covers 1,446+ facilities with status (657 proposed, 529 operating, 119 approved/under construction, 52 expanding, 46 suspended, 43 cancelled). Includes community pushback documentation (172 facilities flagged), NDA tracking, MW capacity, cooling type, and power source. April 2026 dashboard adds 44 new fields including resistance status, advocacy information, and source URLs.
Source: FracTracker Alliance / ArcGIS · Join key: County + lat/lon · Files: 4 (main CSV + PEC Virginia + Sci4GA layers)
USGS NLCD Land Cover
Free
National Land Cover Database raster (1985 to 2023) for land cover change analysis. Requires zonal statistics processing with county shapefile to extract developed-land area per county.
Source: USGS / ScienceBase · Join key: Raster to County polygon overlay · Files: 5 (992 MB TIF)
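The zonal-statistics step described above can be sketched with synthetic arrays. This is a simplification, not the pipeline's actual code: it assumes the NLCD raster and county polygons have already been rasterized onto the same grid (real runs would use a library such as rasterstats against the TIGER shapefile), and the function name is hypothetical.

```python
import numpy as np

# NLCD classes 21-24 are the "developed" categories
# (open space through high intensity).
DEVELOPED = (21, 22, 23, 24)

def developed_area_by_county(landcover, county_ids, cell_area_sqkm=0.0009):
    """Return {county FIPS: developed area in sq km} from two aligned grids.

    landcover  -- 2-D array of NLCD class codes
    county_ids -- 2-D array of county FIPS per pixel (0 = outside any county)
    cell_area_sqkm -- area of one 30 m x 30 m NLCD cell (0.0009 sq km)
    """
    dev_mask = np.isin(landcover, DEVELOPED)
    out = {}
    for fips in np.unique(county_ids):
        if fips == 0:
            continue
        cells = int(np.sum(dev_mask & (county_ids == fips)))
        out[int(fips)] = cells * cell_area_sqkm
    return out

# Tiny synthetic example: a 2x3 grid split between two counties.
lc  = np.array([[21, 41, 23], [11, 24, 42]])
cty = np.array([[51107, 51107, 51059], [51059, 51059, 51059]])
areas = developed_area_by_county(lc, cty)
```

County 51107 contains one developed pixel (class 21); county 51059 contains two (classes 23 and 24).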
Census TIGER County Shapefile
Free
2025 county boundary polygons for all 3,200+ U.S. counties. Used for spatial joins (Aqueduct to County, NLCD to County).
Source: Census Bureau · Join key: GEOID (FIPS) · Files: 7
Congressional & State Legislators
Free
Federal legislators from @unitedstates project. State legislators from Open States (51 state files). Used for "Contact Your Rep" feature in county modal.
Source: github.com/unitedstates + Open States · Files: 52
MW Capacity Overrides (Web Research)
Curated
Hand-researched MW power capacity corrections for facilities where the default square-footage estimate diverges significantly from publicly reported values. Sources include utility interconnection filings, operator press releases, engineering reports (PASE, DPR), and data center tracking databases (Baxtel, Aterio, interconnection.fyi). This override layer exists because campus-level square footage often includes non-IT space (offices, cooling infrastructure, parking), causing the standard 6 MW/100K sqft conversion to overestimate by 3 to 20 times for hyperscale facilities.
Source: Multiple (utility filings, operator sites, trade press) · Join key: Facility name + State + County · Files: mw_overrides_v2.csv (97 entries) + mw_web_research_v2.csv (452 entries) · Priority: Overrides sqft-derived estimates in pipeline
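The override logic amounts to a simple priority rule over the 6 MW per 100K sqft heuristic. A minimal sketch (function name and example figures are illustrative, not pipeline values):

```python
def estimate_mw(sqft, override=None, mw_per_100k_sqft=6.0):
    """Estimate facility capacity in MW.

    A hand-researched override takes priority over the square-footage
    heuristic, which overestimates hyperscale campuses whose footprints
    include offices, cooling plant, and parking.
    """
    if override is not None:
        return override
    return (sqft / 100_000) * mw_per_100k_sqft

# A hypothetical 1.2M sqft hyperscale campus: the heuristic gives 72 MW,
# but a utility interconnection filing reports 24 MW, so the override wins.
heuristic = estimate_mw(1_200_000)
corrected = estimate_mw(1_200_000, override=24.0)
```

Here the heuristic overestimates by 3x, at the low end of the 3-to-20x range noted above.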
Ownership Affiliation Overrides (Web Research)
Curated
Hand-researched ownership corrections for 228 data center facilities where the original source listed operators as "Unknown" or "Other." Identifies the actual corporate operator, headquarters country, and flags foreign-owned entities. Sources include company press releases, SEC filings, utility interconnection records, and industry databases. Foreign-HQ operators identified include NTT (Japan), Cologix (Canada), Nebius/ex-Yandex (Netherlands), EdgeConneX/EQT (Sweden), Eneus Energy (UK), and MineOne (China).
Source: Multiple (company filings, press releases, industry databases) · Join key: Facility name + State + County · Files: 1 (ownership_overrides.csv) · Priority: Overrides "Unknown" operators in pipeline
Proposed Data Centers Database (Curated)
Curated
Comprehensive dataset of 754 proposed, approved, and under-construction data center facilities in the United States. Includes power source identification (grid, solar, nuclear, natural gas), expected completion dates, MW capacity, operator, and community impact notes. Nuclear-powered facilities are flagged for the Policy Impact Lens toggle. Derived from FracTracker Alliance data enriched with web research.
Source: FracTracker Alliance + web research · Join key: Facility name + State · Files: 1 (proposed_data_centers.csv) · Nuclear flagged: 3 facilities
PNNL Sqft Supplement
Curated
Fills missing square footage values for 57 PNNL facilities using public filings, satellite imagery measurements, and operator disclosures. Enables MW estimation where only building footprints were available.
Files: pnnl_sqft_supplement.csv (57 entries) · Priority: Fills gaps in PNNL Atlas
PNNL Exclusion List
Curated
Flags 9 PNNL entries that are not commercial data centers (government HPC, university research computing, or misidentified facilities). Government/academic facilities are excluded from ROS and ERS scoring but retained in RBS.
Files: pnnl_exclusion_list.csv (9 entries) · Impact: Classification-based scoring exclusions
State Regulatory Index (V7)
Curated
6-variable state-level transparency index for 42 DC-hosting jurisdictions. Variables: permit_transparency, environmental_review, energy_disclosure, water_disclosure, tax_incentive_accountability, ownership_disclosure. Scored 0–3 per variable with statute citations. Mean index 0.220, range 0.056–0.556. Pipeline integration pending inter-rater reliability pilot.
Source: State statutes, administrative codes, regulatory dockets · Files: state_regulatory_index.csv (42 entries) · Documentation: ROS_SCORING_SUMMARY_TIER1.md (10 states), ROS_SCORING_SUMMARY_TIER2.md (32 states)
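The index arithmetic implied above (six variables scored 0-3, normalized to 0-1) can be sketched as follows. The function name is hypothetical, and how the pipeline inverts transparency into opacity is an assumption not stated here.

```python
ROS_VARIABLES = ("permit_transparency", "environmental_review",
                 "energy_disclosure", "water_disclosure",
                 "tax_incentive_accountability", "ownership_disclosure")

def transparency_index(scores):
    """Normalize six 0-3 statute-cited variable scores to a 0-1 index."""
    assert set(scores) == set(ROS_VARIABLES)
    total = sum(scores.values())
    return total / (3 * len(ROS_VARIABLES))  # maximum possible is 18

# Illustrative state: modest permit and tax-incentive transparency,
# some environmental review, no disclosure requirements.
example = dict(zip(ROS_VARIABLES, (1, 2, 0, 0, 1, 0)))
idx = transparency_index(example)
```

An index of 4/18 (about 0.222) sits right at the reported mean of 0.220, consistent with the low-transparency baseline the range suggests.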
MW Cross-Validation Audit
Audit
18-facility cross-validation comparing our researched MW values against independent sources (utility filings, industry databases). Tracks match rates, discrepancies, and items requiring human review for data quality assurance.
Files: mw_cross_validation.csv (18 entries) · Purpose: Data quality verification
V8 Unified Facility Dataset
V8 Primary
5,151 facilities with 43 columns. Merges PNNL Atlas + FracTracker with Company Opacity Index (142 companies), Energy Efficiency Scores, sustainability traits (power_source, renewable_pct, cooling_method), and data quality tiers (T1/T2/T3). Enriched with company sustainability reports, utility filings, and industry databases. 14 facility misattributions and 288 county-state mismatches corrected.
Files: v8/facility_traits_merged.csv (5,151 × 43) · Coverage: 670 counties, 142 companies scored · Key metrics: Mean COI 0.744, Mean EES 0.547
V9 Energy Estimation Sources
V9.1
Facility-level electricity consumption estimates using LBNL 2024 utilization factors (50% national average, adjusted by type: hyperscale 58%, colocation 50%, enterprise 43%). PUE sourced from operator reports: Google per-campus (16 US sites, TTM PUE 1.04–1.14), Meta fleet (1.08, plus 15 facilities with real 2024 MWh data), AWS per-region (3 US regions, PUE 1.12–1.15), Microsoft (1.16). EIA-861 county commercial demand used for validation.
Source: LBNL 2024, Google/Meta/AWS sustainability reports, EIA-861 · Files: 6 CSV (google_pue, aws_pue_wue, meta_facility, utilization_factors, water_benchmarks, eia861_county)
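A plausible reading of the estimation model above is annual site energy = IT capacity x utilization x hours x PUE; the pipeline's exact formula may differ, and the function name and example facility are illustrative.

```python
# LBNL 2024 utilization factors by facility type (50% national average).
UTILIZATION = {"hyperscale": 0.58, "colocation": 0.50, "enterprise": 0.43}
HOURS_PER_YEAR = 8760

def annual_site_mwh(mw_it, facility_type, pue):
    """Estimate annual site electricity in MWh.

    mw_it -- installed IT capacity in MW
    pue   -- power usage effectiveness (total site power / IT power)
    """
    util = UTILIZATION.get(facility_type, 0.50)  # fall back to the average
    return mw_it * util * HOURS_PER_YEAR * pue

# Illustrative 24 MW hyperscale facility at Meta's reported fleet PUE of 1.08:
mwh = annual_site_mwh(24, "hyperscale", pue=1.08)
```

For this example the estimate is roughly 132,000 MWh per year, which EIA-861 county commercial demand can then sanity-check.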
V9 Water Estimation Sources
V9.1
Facility-level water consumption estimates using WUE benchmarks by cooling type: evaporative towers 1.8 L/kWh (475 gal/MWh), hybrid 0.9 L/kWh, direct evaporative 0.2 L/kWh, air-cooled 0. Based on Siddik et al. 2021/2022 and Uptime Institute benchmarks. 10 Meta facilities use real 2024 reported water data instead of estimates.
Source: Siddik et al. 2021, Uptime Institute, Meta 2024 Sustainability Report · Files: water_consumption_benchmarks.csv
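The WUE benchmarks above convert directly from estimated energy to water. A minimal sketch, assuming the multiplication is energy in kWh times liters per kWh (the function name is hypothetical):

```python
# Water usage effectiveness benchmarks by cooling type, liters per kWh.
# 1.8 L/kWh is equivalent to ~475 gal/MWh.
WUE_L_PER_KWH = {
    "evaporative_towers": 1.8,
    "hybrid": 0.9,
    "direct_evaporative": 0.2,
    "air_cooled": 0.0,
}

def annual_water_megaliters(annual_mwh, cooling_type):
    """Estimate on-site cooling water (megaliters/yr) from annual energy."""
    wue = WUE_L_PER_KWH[cooling_type]
    return annual_mwh * 1000 * wue / 1e6  # kWh x L/kWh, scaled to ML

# Illustrative facility consuming ~131,694 MWh/yr with evaporative towers:
ml = annual_water_megaliters(131_694, "evaporative_towers")
```

That works out to roughly 237 megaliters (about 63 million gallons) a year, which is why cooling type dominates the water component of burden.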
Good Jobs First Subsidy Tracker
Free · New
706 data center subsidy records across all U.S. states. $16.2 billion in disclosed tax credits, abatements, and megadeals. Top recipients: Amazon ($9.0B), Apple ($1.5B), Meta ($1.5B), Google ($1.4B), Microsoft ($684M). Covers 2000–2025 subsidy awards including property tax abatements, sales tax exemptions, and infrastructure grants.
Source: Good Jobs First / Subsidy Tracker · Join key: State + Parent Company · Records: 706 (407 with disclosed values)
Total: 22 sources · All built from free, public data

About This Project

Intelligence-grade analysis. Built in public. Under $500. The analyst never abdicated to the machine.

The Data Center Stress Index answers a question that should be simple: when a data center moves into a county, what does that county actually get in return?

Every county in the United States that hosts a data center is scored across three dimensions: the resource burden it bears (water consumption, grid load, and land use), the transparency of its regulatory environment (permitting visibility and diesel generator reliance), and the economic return it receives (local jobs and wages per megawatt of installed capacity). These three scores combine into a single composite index that ranks counties on a letter-grade scale from A (low burden, high return) to F (high burden, low return).

How It Works

The index draws from 22 public data sources, all free, totaling more than 1 GB of raw data. The V9.1 facility dataset (4,670 facilities across 50+ variables) merges PNNL, FracTracker (April 2026), and Epoch AI (April 2026) records with company-level transparency, efficiency metrics, and estimated energy and water consumption. Water stress scores come from the World Resources Institute's Aqueduct 4.0 atlas, spatially joined to county boundaries using Census TIGER shapefiles. Energy burden comes from the EIA-923 plant-level generation reports, crosswalked to counties through EIA-860 plant locations. Land use intensity is calculated from PNNL facility footprints divided by county total land area (Census TIGER ALAND). Employment and wage data come from the Bureau of Labor Statistics Quarterly Census of Employment and Wages (NAICS 518210 and all-industry), and regulatory opacity is derived from a 6-variable State Regulatory Index covering 42 jurisdictions. Foreign ownership attribution uses a structured field from the V8 facility dataset, covering 191 foreign-owned facilities across 14 countries.

Each component is normalized to a 0-to-1 scale and combined using expert weights validated against three alternative weighting schemes (PCA-derived, entropy, and equal-weight). The Resource Burden Score uses 40% water, 35% energy, and 25% land; PCA validation confirms water and energy dominate variance. The Regulatory Opacity Score uses the 6-variable State Regulatory Index (permit transparency, environmental review, energy disclosure, water disclosure, tax incentive accountability, and ownership disclosure) across 42 jurisdictions, modulated by a community pushback modifier. The Economic Return Score combines jobs per MW and a wage premium ratio (data center pay vs. county all-industry average), with Monte Carlo error propagation producing 90% credible intervals and confidence tiers. The composite index blends the three scores (50% burden, 30% opacity, 20% inverted return) into a single ranking. V8 adds a Company Opacity Index (12-indicator corporate transparency) and an Energy Efficiency Score (PUE, grid dependency, cooling impact) at the facility level.
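The weighting arithmetic above can be written out directly. The weights are the published ones; the letter-grade cutoffs below are illustrative placeholders, since the published grade thresholds are documented on the Methodology page, not here.

```python
def resource_burden(water, energy, land):
    """RBS: 40% water, 35% energy, 25% land (each input already 0-1)."""
    return 0.40 * water + 0.35 * energy + 0.25 * land

def composite_stress(rbs, ros, ers):
    """CSI: 50% burden, 30% opacity, 20% inverted economic return."""
    return 0.50 * rbs + 0.30 * ros + 0.20 * (1 - ers)

def letter_grade(csi, cuts=(0.2, 0.4, 0.6, 0.8)):
    """Map a 0-1 CSI to A-F. Cutoffs here are illustrative only."""
    for grade, cut in zip("ABCD", cuts):
        if csi < cut:
            return grade
    return "F"

# Illustrative county: severe water stress, heavy grid load, modest land
# take, opaque state rules, weak jobs return.
rbs = resource_burden(water=0.9, energy=0.7, land=0.3)
csi = composite_stress(rbs, ros=0.6, ers=0.25)
```

Note that a low Economic Return Score raises the composite because return is inverted before blending: communities getting less back score as more stressed.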

Advanced Techniques

Beyond the core index, the tool applies several analytical methods to surface patterns that a simple ranking would miss. CUSUM (cumulative sum) change-point detection flags counties where water stress is accelerating rather than stable. Moran's I spatial autocorrelation identifies geographic clusters of high-stress counties, revealing regional infrastructure strain rather than isolated incidents. Entropy-weighted scoring offers a data-driven alternative to fixed weights, giving more influence to components that vary most across counties. And Granger causality testing examines whether data center facility announcements statistically predict future changes in local water stress.
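Of the techniques above, CUSUM is the easiest to show in miniature. This sketch is a generic one-sided CUSUM on a standardized series, not the project's implementation; the slack and threshold parameters (k, h) are conventional illustrative defaults.

```python
import statistics

def cusum_flags(series, k=0.5, h=4.0):
    """One-sided CUSUM change detection on a standardized series.

    Accumulates upward deviations beyond slack k (in standard deviations)
    and flags every index where the running sum exceeds threshold h.
    """
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0
    s, flags = 0.0, []
    for i, x in enumerate(series):
        z = (x - mean) / sd
        s = max(0.0, s + z - k)  # reset to 0 whenever evidence vanishes
        if s > h:
            flags.append(i)
    return flags

# Stable water stress for 20 months, then a sustained upward shift:
series = [0.30] * 20 + [0.45] * 10
flagged = cusum_flags(series)
```

The detector stays silent through the stable period and fires a few steps after the shift begins, which is the point: it flags sustained acceleration, not single-month noise.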

Policy Scenarios

The Policy Impact Lens lets users simulate the effect of 12 real legislative proposals on county stress scores. Each policy is decomposed into the specific formula parameters it would affect, with percentage adjustments derived from bill text and regulatory intent. The policies span the full spectrum from federal moratoriums that would halt all new construction to executive orders that accelerate permitting on federal land. This is not prediction; it is structured scenario analysis designed to inform advocacy, journalism, and legislative debate.
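The decomposition step can be sketched as percentage adjustments applied to formula parameters, followed by renormalization. This is an assumed shape for the mechanism, not the lens's actual code, and the bill and numbers below are invented for illustration.

```python
def apply_policy(weights, adjustments):
    """Apply a policy's percentage adjustments to formula parameters,
    then renormalize so the weights still sum to 1."""
    adjusted = {k: w * (1 + adjustments.get(k, 0.0))
                for k, w in weights.items()}
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}

# Hypothetical bill mandating water-use disclosure and recycling,
# modeled as a 30% reduction in the water component of resource burden:
base = {"water": 0.40, "energy": 0.35, "land": 0.25}
scenario = apply_policy(base, {"water": -0.30})
```

Rerunning the county scores under `scenario` instead of `base` is what turns bill text into a before/after comparison rather than a prediction.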

What This Tool Does Not Do

The DCSI does not rank counties that have no data center facilities. It does not make causal claims about whether data centers caused resource degradation. It does not generate regulatory findings or legal conclusions. It informs judgment; it does not replace it. Every data source is public, every formula is disclosed, and every assumption is documented in the Methodology page.

About the Author

The Data Center Stress Index is a project by Anna R. Dudley. The entire tool was built using publicly available data for under $500. No proprietary datasets. No gated APIs. No corporate sponsorship. The purpose is to put the same analytical capability in the hands of local officials, community advocates, and journalists that the industry's own lobbyists already have.

For questions, speaking inquiries, or data requests, contact Anna R. Dudley. Power moves before policy does.

AI Accountability
Dedication

This page is dedicated to my dear friend and mentor, J.R. Four years ago, sitting in an old basement, you taught me how to set up my first virtual machine and how to install packages. In this new age of AI, you continue to teach the entire floor about the importance of never abdicating to the machine. Let AI work for you. Don’t let it think for you.

Errors Caught by the Analyst

Every score, label, formula and visualization in this dashboard was reviewed by a human before it shipped. The errors logged below were caught during that review — some by Anna, some by collaborators, none by automated tests. They are kept in public for the same reason climate scientists publish their model uncertainty: the only way to trust a number is to know how it was made.

If you find a number on this dashboard that looks wrong, that is the experience this page is supposed to make possible. Tell us, and you will be added to the log.

--
Total Errors Logged
--
Critical Severity
--
High Severity
--
Fixed
--
Open / Partial
What Counts as an Error

An “error” in this log is anything an AI assistant produced that would have shipped wrong if a human hadn’t caught it. That includes:

Scoring bugs
Wrong formulas, double-counted variables, normalization that destroyed the distribution, weights that didn’t sum to 1.
Data joins
FIPS code mismatches, county name collisions, units that weren’t converted, suppressed BLS values silently treated as zero.
Hallucinated facts
Plausible-sounding citations that didn’t exist, fabricated wage figures, statute numbers that weren’t real.
Visualization lies
Color scales that exaggerated, axes that were truncated without disclosure, sample counties presented as the full dataset.
Statistical overreach
CUSUM run on seasonal data, Granger causality reported without checking sample size, p-values without corrections.
Silent failures
Rows dropped from joins without warning, encoding errors that produced empty fields, defaults that masked missing data.
Errors by Severity
Errors by Category
Error Discovery Timeline
How These Errors Are Caught
  1. Read every output. Every script that produces a number is run, the output is opened, and at least one row is checked against the source data by hand. If the number can’t be reproduced from the source, it doesn’t ship.
  2. Spot-check known counties. Loudoun, Maricopa, Dallas, Santa Clara, Quincy — counties with publicly reported MW or water draw — are used as anchors. If the dashboard says Loudoun has 50 MW, something has gone wrong.
  3. Distribution sanity checks. Histograms of every score before and after normalization. If the distribution is flat, bimodal, or all values pile at 0/1, that’s a normalization bug.
  4. Cite-or-cut. Any factual claim in the methodology, error log, or county profile must trace back to a public source. Anything that can’t cite gets cut.
  5. Public log. Errors are written down here, in public, before they are fixed. The log is the receipt.
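Check 3 above is mechanical enough to automate. A minimal sketch of such a sanity check, with a hypothetical function name and illustrative thresholds:

```python
def normalization_sanity(scores, tol=1e-9):
    """Flag common normalization bugs in a list of 0-1 scores:
    a flat distribution, values escaping [0, 1], or mass piling
    at the endpoints."""
    lo, hi = min(scores), max(scores)
    issues = []
    if hi - lo < tol:
        issues.append("flat distribution: min-max divides by ~0")
    if lo < -tol or hi > 1 + tol:
        issues.append("values escape [0, 1]")
    at_edges = sum(1 for s in scores if s < tol or s > 1 - tol)
    if at_edges > len(scores) // 2:
        issues.append("most values pile at 0/1")
    return issues

healthy = normalization_sanity([0.1, 0.4, 0.7])   # no issues
broken  = normalization_sanity([0.5, 0.5, 0.5])   # flat input
```

A check like this runs in seconds; the histogram review it supplements still catches the bugs a threshold cannot, like a distribution that is plausible but wrong.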
Why Public Error Logs Matter

Most data dashboards are presented as finished objects. The methodology page tells you the formula, the source, the date — and asks you to trust that the implementation matched. AI-assisted analysis breaks that assumption. Code that looks right can be wrong in ways that are completely invisible until you check the answer against reality.

Listing what went wrong — including the things that were embarrassing or that took weeks to find — is the only honest way to show the work. If a county’s grade changes between versions, the reason should be readable. If a formula was wrong for three weeks, the fact that it was wrong should be on the same page as the corrected number. That’s the contract.

The Full Log

Each entry includes the error ID, severity, status, the description of what was wrong, and how it was fixed. Sorted newest first. Filterable by status above.