Every new data center consumes water, draws power from the grid, and occupies land. But the communities hosting them rarely have the data to evaluate whether they are getting a fair deal. This tool changes that.
Every U.S. county hosting a data center receives four scores, each normalized on a 0–1 scale:
Resource Burden Score (RBS): measures the combined strain on local water, energy, and land. Higher = more burden on community resources. Weighted 40% water, 35% energy, 25% land.
Regulatory Opacity Score (ROS): measures how opaque the state regulatory environment is. Higher = less accountability in permitting, tax incentives, and environmental review.
Economic Return Score (ERS): measures what the community gets back in local jobs and wages per megawatt of installed capacity. Higher = better return. Uses Monte Carlo simulation for confidence intervals.
Composite Stress Index (CSI): blends all three scores (50% burden, 30% opacity, 20% inverted return). Counties receive letter grades A through F. This is the single number that ranks communities by overall data center stress.
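The blend can be sketched in a few lines of Python (the function name is illustrative; the production pipeline percentile-normalizes each input before blending):

```python
def composite_stress_index(rbs, ros, ers):
    """Blend the three 0-1 scores into the composite index.

    ERS is inverted because a higher economic return should lower
    overall stress. Weights: 50% burden, 30% opacity, 20% inverted
    return.
    """
    return 0.50 * rbs + 0.30 * ros + 0.20 * (1.0 - ers)

# A county with heavy burden, opaque regulation, and weak return:
high_stress = composite_stress_index(rbs=0.9, ros=0.8, ers=0.1)
# A county with light burden, open regulation, and strong return:
low_stress = composite_stress_index(rbs=0.1, ros=0.2, ers=0.9)
assert high_stress > low_stress
```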
The entire index is built from 19 public data sources, all free. No proprietary datasets. No gated APIs. Total cost: under $500.
Transparency requires honesty about gaps. Here is what we cannot yet measure:
Peer review identified the following structural limitations. We disclose them here because transparency is a core value of this project.
Each tab focuses on a different dimension of the data center landscape:
The Data Center Stress Index (DCSI) is a county-level composite ranking of U.S. counties hosting data centers. It measures three dimensions: the resource burden each facility places on local water, energy, and land; the regulatory opacity surrounding permitting and operations; and the economic return the community receives in jobs and wages per megawatt of installed capacity. Use this dashboard to explore which communities bear the heaviest costs — and which receive the least in return.
Interactive choropleth showing the Composite Stress Index for U.S. counties with data center facilities. Green = low stress, red = high stress. Sourced from PNNL, FracTracker, Epoch AI, WRI Aqueduct, EIA-923, and BLS. Scroll to zoom.
This Sankey diagram traces who owns data center capacity, where that capacity is headquartered, and how it maps to resource stress tiers. Each flow line represents MW capacity moving from a corporate operator through a headquarters country to a stress grade tier. Click any corporation or country node to cross-filter every chart on the dashboard.
Market share of U.S. data center capacity by parent company headquarters country or corporate operator. Foreign-owned facilities may face additional CFIUS scrutiny. Data sourced from PNNL IM3 Atlas and Epoch AI ownership records.
Each bar decomposes a state's aggregate resource burden into water stress, grid load, and land use components, the three pillars of the RBS formula (40/35/25 weighting). Water stress comes from WRI Aqueduct 4.0, grid load from EIA-923 plant-level generation, and land use from PNNL facility footprints over USGS NLCD developed area. Click any state to see the raw data and calculation.
Districts ranked by composite stress index. This view connects resource burden to political accountability. Every district bar maps to a specific representative who can champion or block data center policy. District boundaries from Census TIGER/Line shapefiles, legislators from the @unitedstates project and Open States.
Counties delivering the most local economic value per megawatt of data center capacity. High ERS counties demonstrate that data centers can generate meaningful employment. Employment and wage data sourced from BLS Quarterly Census of Employment and Wages (NAICS 518210). MW capacity from PNNL, Epoch AI, FracTracker, and hand-researched overrides.
Reading this chart: Each bubble is a county hosting data center facilities. Bubble size reflects total MW capacity.
The top-left quadrant is the danger zone: high resource burden (water, energy, land) paired with low economic return (few jobs, low wages per MW). These counties subsidize the digital economy without proportional benefit.
The bottom-right quadrant is the sweet spot: modest resource consumption alongside meaningful local employment and wages.
Click any county bubble to see its full scorecard, elected officials, and facility-level breakdown.
Scatter plot of Resource Burden Score (x-axis) vs. Economic Return Score (y-axis) for all counties with data center facilities. RBS is derived from WRI Aqueduct water stress, EIA-923 energy data, and USGS land cover. ERS is derived from BLS QCEW employment and wage data. Bubble color reflects the Composite Stress Index. Use your scroll wheel to zoom into dense clusters.
Select a state to see its counties, facilities, ownership breakdown, resource burden, and economic return. All charts below filter to the selected state.
Reading this chart: Each bubble is a county. Size reflects MW capacity.
The top-left quadrant = high burden, low return. The bottom-right = low burden, high return.
Select a policy proposal to see how it would change county stress scores. The map adjusts to show before/after impact. Proposed data centers are shown with dashed outlines. Use your scroll wheel to zoom into any region of the map.
The Regulatory Opacity Score (ROS) measures how much information states disclose about data center permitting, environmental review, utility rates, tax incentives, and ownership. Higher opacity means less public transparency. Scores are derived from a 6-variable State Regulatory Index across 42 DC-hosting jurisdictions. Click any bar to filter the dashboard to that state.
$16.2 billion in disclosed tax incentives, abatements, and megadeals tracked by Good Jobs First’s Subsidy Tracker. Five companies — Amazon, Apple, Meta, Google, and Microsoft — account for 86% of all disclosed subsidy value. An additional 299 records have undisclosed amounts, meaning the true total is significantly higher.
How the DCSI is calculated, what data it uses, and where human judgment enters.
For every U.S. county hosting a data center: what resource burden are its facilities placing on the community, and what economic return is the community receiving?
Percentile-rank normalization across all 670 DC-hosting counties. Water draws on WRI Aqueduct 4.0 baseline water stress with seasonal indicators. Energy is estimated facility-level consumption (MW × utilization × 8,760 hours) following LBNL methodology, supplemented with EIA-923 generation where local power plants exist. Land is facility footprint as a share of county land area, with MW-derived estimates for counties missing observed acreage.
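A minimal sketch of the two steps named above. The rank function breaks ties by input order rather than averaging them, and the energy estimate follows the stated MW × utilization × 8,760-hour formula:

```python
def percentile_rank(values):
    """Normalize raw county values to a 0-1 scale by rank position
    across all counties (minimal sketch: ties are broken by input
    order rather than averaged)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = pos / (n - 1) if n > 1 else 0.0
    return ranks

def estimated_annual_mwh(mw, utilization=0.50):
    """LBNL-style facility energy estimate: MW x utilization x
    8,760 hours per year. Utilization defaults to the 50% national
    average and is adjusted by facility type upstream."""
    return mw * utilization * 8760

assert estimated_annual_mwh(100) == 438_000  # MWh/year for a 100 MW site
```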
6-variable State Regulatory Index across 42 DC-hosting jurisdictions: environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, and ownership disclosure. Each variable scored 0–3 against statute citations. Community pushback applies a −10% modifier; the Company Opacity Index applies up to a −5% modifier where local operators are demonstrably more transparent.
Wage Premium is data center industry average pay divided by county all-industry average pay. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals. Counties are tagged Tier A (high confidence), B (moderate), or C (imputed inputs). Mean wage premium across DC counties: ~1.9x.
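The error-propagation step can be illustrated with a toy version. The equal-weight blend of the two inputs and the 15% relative error are assumptions for the example, not the production formula:

```python
import random
import statistics

def ers_credible_interval(jobs_per_mw, wage_premium,
                          rel_err=0.15, n_iter=1000, seed=42):
    """Monte Carlo error propagation for an illustrative ERS blend.

    Each iteration perturbs both inputs with Gaussian noise at
    rel_err relative standard deviation, then reports the mean and
    a 90% credible interval from the empirical distribution.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_iter):
        j = jobs_per_mw * (1 + rng.gauss(0, rel_err))
        w = wage_premium * (1 + rng.gauss(0, rel_err))
        draws.append(0.5 * j + 0.5 * w)
    draws.sort()
    lo = draws[int(0.05 * n_iter)]   # 5th percentile
    hi = draws[int(0.95 * n_iter)]   # 95th percentile
    return statistics.mean(draws), (lo, hi)

mean, (lo, hi) = ers_credible_interval(jobs_per_mw=1.2, wage_premium=1.9)
assert lo < mean < hi
```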
12-indicator binary scoring of corporate transparency across 142 data center companies. Indicators span environmental reporting, energy disclosure, water disclosure, PUE reporting, renewable targets, cooling technology, community engagement, tax incentive disclosure, beneficial ownership, supply chain transparency, third-party audits, and incident reporting.
Scale: 0 = fully transparent, 1 = fully opaque. Tiers: Transparent (≤0.25), Open (0.26–0.50), Partial (0.51–0.65), Opaque (0.66–0.80), Dark (>0.80). County-level COI is capacity-weighted (larger facilities contribute more) and feeds the ROS calculation as a transparency credit of up to −5%.
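The capacity weighting and tier mapping described above can be sketched directly (function names are illustrative):

```python
def county_coi(facilities):
    """Capacity-weighted Company Opacity Index for a county.

    `facilities` is a list of (mw_capacity, coi) pairs; larger
    facilities contribute proportionally more. COI runs 0 (fully
    transparent) to 1 (fully opaque).
    """
    total_mw = sum(mw for mw, _ in facilities)
    if total_mw == 0:
        return None
    return sum(mw * coi for mw, coi in facilities) / total_mw

def coi_tier(score):
    """Map a COI score onto its disclosure tier."""
    if score <= 0.25: return "Transparent"
    if score <= 0.50: return "Open"
    if score <= 0.65: return "Partial"
    if score <= 0.80: return "Opaque"
    return "Dark"

# A 400 MW opaque hyperscale site outweighs a 100 MW transparent colo:
score = county_coi([(400, 0.8), (100, 0.2)])
assert coi_tier(score) == "Opaque"
```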
Facility-level composite derived from PUE, grid dependency, cooling impact, backup emissions, and a transparency penalty. Quality flags: measured (real data for 3+ components), partial (1–2 real data points), insufficient (fully imputed — default 0.593).
Higher EES = more efficient. EES is currently displayed as an informational overlay and is not factored into the composite CSI; integration into RBS Energy is under evaluation.
Every county receives a data quality tier reflecting how much of its scoring input comes from observed source data versus estimates or imputations.
The four dimensions assessed are: (1) water stress (observed via Aqueduct vs. state-median imputed), (2) energy (EIA-923 supply data vs. MW demand proxy), (3) land (source acreage vs. MW-estimated), and (4) employment (BLS QCEW observed vs. state-residual imputed). Counties with Tier C data have the majority of their score driven by estimates and should be interpreted with caution. The tier is displayed in the county tooltip, scorecard, and sidebar.
Measures whether data center stress clusters geographically or distributes randomly. A positive Moran's I indicates clustering: counties near high-stress counties tend to also be high-stress, suggesting regional infrastructure strain rather than isolated incidents. Computed on the CSI values using queen contiguity weights from the Census TIGER county shapefile.
Local Moran's I (LISA) identifies specific hot-spot and cold-spot clusters. Counties flagged as hot spots (High-High) are surrounded by other high-stress counties. These are the regional pressure zones where infrastructure strain compounds.
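A compact, pure-Python version of the global statistic under binary contiguity weights (the production computation derives queen-contiguity weights from the full TIGER county shapefile):

```python
def morans_i(values, weights):
    """Global Moran's I on county CSI values.

    `weights` is a binary contiguity matrix (weights[i][j] = 1 when
    counties i and j share a border). Positive I indicates spatial
    clustering of similar values.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)

# Four counties on a line: two high-stress neighbors beside two
# low-stress neighbors, so similar values cluster and I > 0.
csi = [0.9, 0.8, 0.2, 0.1]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
assert morans_i(csi, w) > 0
```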
Rather than fixed weights (0.4/0.35/0.25), entropy weighting lets the data determine how much each component contributes to the composite score. Components with more variation across counties receive higher weights; components where all counties score similarly receive lower weights. This prevents a uniformly high-stress component from dominating the index without adding discriminatory power.
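A standard entropy-weighting implementation, shown on toy data. Components whose county scores vary more end up with higher weight:

```python
import math

def entropy_weights(columns):
    """Data-driven weights for composite components.

    `columns` maps component name -> list of normalized (0-1)
    county scores. Components with more dispersion have lower
    entropy and therefore receive higher weight.
    """
    n = len(next(iter(columns.values())))
    k = 1.0 / math.log(n)
    degrees = {}
    for name, col in columns.items():
        total = sum(col)
        # Shannon entropy of each county's share; skip zeros to
        # guard against log(0).
        e = -k * sum((v / total) * math.log(v / total)
                     for v in col if v > 0)
        degrees[name] = 1.0 - e  # degree of divergence
    s = sum(degrees.values())
    return {name: d / s for name, d in degrees.items()}

w = entropy_weights({
    "water":  [0.1, 0.9, 0.2, 0.8],   # high variation across counties
    "energy": [0.5, 0.5, 0.5, 0.5],   # no variation
})
assert w["water"] > w["energy"]
```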
Both fixed and entropy-weighted composites are computed. The dashboard defaults to fixed weights (which are more interpretable for policy audiences) but allows toggling to entropy-weighted for analytical rigor.
Tests whether data center facility announcements (from FracTracker timeline data) Granger-cause changes in county-level water stress scores (from Aqueduct monthly data). If past facility announcements help predict future water stress beyond what water stress's own history predicts, this provides statistical evidence of a causal link between data center expansion and resource degradation.
Applied county-by-county where sufficient time-series data exists (2015 to 2026). Results reported as significant/not significant with lag selection via AIC.
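The core of the test can be sketched as a one-lag F statistic: fit the restricted model (water stress on its own lag) and the unrestricted model (adding the lagged announcement series), then compare residual sums of squares. This is a simplified illustration; the production pipeline selects lags via AIC and reports significance:

```python
import numpy as np

def granger_f(y, x, lag=1):
    """One-lag Granger F statistic: does lagged x improve the
    prediction of y beyond y's own lag? Compare the result to an
    F(1, df) critical value for significance."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Y = y[lag:]
    ones = np.ones(len(Y))
    # Restricted model: constant + y's own lag.
    Xr = np.column_stack([ones, y[:-lag]])
    # Unrestricted model: additionally include x's lag.
    Xu = np.column_stack([ones, y[:-lag], x[:-lag]])
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return float(resid @ resid)
    rss_r, rss_u = rss(Xr), rss(Xu)
    df = len(Y) - Xu.shape[1]
    return (rss_r - rss_u) / (rss_u / df)

# Synthetic check: y is driven by lagged x plus noise, so x should
# Granger-predict y far more strongly than the reverse.
rng = np.random.default_rng(0)
x = rng.random(60)
y = np.concatenate([[0.0], x[:-1]]) + 0.05 * rng.standard_normal(60)
assert granger_f(y, x) > granger_f(x, y)
```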
Counties receive letter grades A through F based on their percentile rank within all 670 data-center-hosting counties:
Percentile grading is a relative ranking, not an absolute threshold. An “A” grade means a county has lower composite stress than 80% of DC-hosting counties — it does not mean the county experiences no resource burden. Tier C data quality counties (where ≤1 of 4 scoring dimensions uses observed data) should be interpreted with particular caution; their grades are driven primarily by estimates and imputations.
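The five-tier scheme implied above (20% of counties per tier, with "E" skipped per U.S. grading convention) can be written as a lookup. The exact cut points here are inferred from that description and should be treated as an assumption:

```python
def letter_grade(csi_percentile):
    """Quintile letter grade from a county's CSI percentile rank
    (0 = lowest composite stress, 1 = highest) among DC-hosting
    counties. Cut points assume five equal 20% tiers."""
    if csi_percentile < 0.20: return "A"
    if csi_percentile < 0.40: return "B"
    if csi_percentile < 0.60: return "C"
    if csi_percentile < 0.80: return "D"
    return "F"

assert letter_grade(0.10) == "A"   # lower stress than 80% of peers
assert letter_grade(0.95) == "F"
```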
Does not rank counties without facilities. Does not make causal claims. Does not generate regulatory findings. Informs judgment; it does not replace it.
A running ledger of the choices, validations, and trade-offs that shaped each version of the index. Kept here so the methodology body stays focused on what the score is, not what it has been.
The primary weights (0.4/0.35/0.25 for RBS; 0.5/0.3/0.2 for CSI) reflect the analyst’s judgment about relative importance. PCA, entropy, and equal-weight validation runs all produce rankings highly correlated with the policy-weighted version (Spearman ρ > 0.96), but the choice of weights is ultimately a value judgment. The dashboard’s entropy-weighted toggle exposes a purely data-driven alternative.
Estimated facility-level consumption uses LBNL methodology (MW × utilization × 8,760 hours). Utilization defaults to 50% per LBNL 2024 national average, adjusted by facility type (hyperscale 58%, colocation 50%, enterprise 43%). PUE sourced from operator reports where available (Google per-campus, Meta fleet 1.08, AWS per-region, Microsoft 1.16) or estimated from facility type. This replaces V8's assumption of 100% utilization and a flat PUE of 1.3. EIA-923 generation data is supplemented with an MW-based demand proxy for 89 counties with data centers but no local power plants. A renewable energy credit reduces energy burden for facilities reporting above-median renewable sourcing (up to −10%).
For 247 counties missing facility footprint data, land use is estimated using an industry heuristic of ~20 acres per 100 MW. Source acreage from PNNL and FracTracker is preferred where available.
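The fallback described above reduces to a small helper (the function name is illustrative):

```python
def estimated_acres(mw, source_acres=None):
    """Facility land footprint: prefer observed acreage from
    PNNL/FracTracker; otherwise fall back to the industry heuristic
    of ~20 acres per 100 MW."""
    if source_acres is not None:
        return source_acres
    return mw * 20 / 100

assert estimated_acres(250) == 50.0                    # heuristic path
assert estimated_acres(250, source_acres=42.0) == 42.0  # observed path
```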
All 6 regulatory variables across 42 jurisdictions were scored by a single rater (anna_claude). Inter-rater reliability pilot (Cohen’s Weighted Kappa > 0.60 per variable) deferred. Cronbach’s Alpha was dropped as a gate because the six variables are formative (independent dimensions), not reflective.
Company Opacity Index now acts as a transparency modifier on ROS: ROS × (1 + pushback_mod + COI_mod) where COI_mod = −0.05 × (1 − county_avg_coi). Counties whose operators are demonstrably more transparent receive up to a 5% reduction in regulatory opacity, recognizing that corporate transparency partially compensates for weak state regulation.
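The modifier formula above, with the −10% community pushback modifier from the ROS methodology, translates directly into code:

```python
def adjusted_ros(base_ros, county_avg_coi, pushback=False):
    """Apply the community pushback and COI transparency modifiers
    to the base Regulatory Opacity Score.

    COI runs 0 (transparent) to 1 (opaque), so counties whose
    operators are more transparent earn a larger credit, capped at
    5%. Documented community pushback subtracts a further 10%.
    """
    pushback_mod = -0.10 if pushback else 0.0
    coi_mod = -0.05 * (1 - county_avg_coi)
    return base_ros * (1 + pushback_mod + coi_mod)

# Fully transparent operators (COI = 0) earn the full 5% credit:
assert abs(adjusted_ros(0.6, county_avg_coi=0.0) - 0.57) < 1e-9
# Fully opaque operators earn no credit:
assert abs(adjusted_ros(0.6, county_avg_coi=1.0) - 0.60) < 1e-9
```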
The Energy Efficiency Score is currently displayed as an overlay only. It is not factored into the composite CSI. Integration into the RBS Energy sub-score is under evaluation for a future version.
CUSUM change-point detection was removed because the available Aqueduct data covers a single year of seasonal variation, not a multi-year time series. Running CUSUM on 12 monthly values detected seasonal patterns (summer vs. winter), not genuine acceleration of water stress. The 38% flag rate confirmed this was noise, not signal. CUSUM has been replaced by the data quality tier system and facility-level mapping.
Percentile grading means exactly 20% of counties will always fall into each tier. There are no empirically validated “safe” or “critical” stress levels for data center hosting. Natural-breaks (Jenks) classification is under evaluation as a potential alternative.
ERS now uses Wage Premium (DC industry average pay divided by county all-industry average pay) instead of raw wages_per_mw. Mean wage premium across DC counties is approximately 1.9x. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals with Tier A/B/C confidence flags.
Intelligence-grade analysis. Built in public. Under $500. The analyst never abdicated to the machine.
The Data Center Stress Index answers a question that should be simple: when a data center moves into a county, what does that county actually get in return?
Every county in the United States that hosts a data center is scored across three dimensions: the resource burden it bears (water consumption, grid load, and land use), the transparency of its regulatory environment (permitting visibility, tax incentive accountability, and environmental review), and the economic return it receives (local jobs and wages per megawatt of installed capacity). These three scores combine into a single composite index that ranks counties on a letter-grade scale from A (low burden, high return) to F (high burden, low return).
The index draws from 22 public data sources, all free, totaling more than 1 GB of raw data. The V9.1 facility dataset (4,670 facilities across 50+ variables) merges PNNL, FracTracker (April 2026), and Epoch AI (April 2026) records with company-level transparency, efficiency metrics, and estimated energy and water consumption. Water stress scores come from the World Resources Institute's Aqueduct 4.0 atlas, spatially joined to county boundaries using Census TIGER shapefiles. Energy burden comes from the EIA-923 plant-level generation reports, crosswalked to counties through EIA-860 plant locations. Land use intensity is calculated from PNNL facility footprints divided by county total land area (Census TIGER ALAND). Employment and wage data come from the Bureau of Labor Statistics Quarterly Census of Employment and Wages (NAICS 518210 and all-industry), and regulatory opacity is derived from a 6-variable State Regulatory Index covering 42 jurisdictions. Foreign ownership attribution uses a structured field from the V8 facility dataset, covering 191 foreign-owned facilities across 14 countries.
Each component is normalized to a 0-to-1 scale and combined using expert weights validated against three alternative weighting schemes (PCA-derived, entropy, and equal-weight). The Resource Burden Score uses 40% water, 35% energy, and 25% land; PCA validation confirms water and energy dominate variance. The Regulatory Opacity Score uses a 6-variable State Regulatory Index covering environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, and ownership disclosure across 42 jurisdictions, modulated by a community pushback modifier. The Economic Return Score combines jobs per MW and a wage premium ratio (data center pay vs. county all-industry average), with Monte Carlo error propagation producing 90% credible intervals and confidence tiers. The composite index blends these three scores (50% burden, 30% opacity, 20% inverted return) to produce a single ranking. V8 adds a Company Opacity Index (12-indicator corporate transparency) and Energy Efficiency Score (PUE, grid dependency, cooling impact) at the facility level.
Beyond the core index, the tool applies several analytical methods to surface patterns that a simple ranking would miss. Moran's I spatial autocorrelation identifies geographic clusters of high-stress counties, revealing regional infrastructure strain rather than isolated incidents. Entropy-weighted scoring offers a data-driven alternative to fixed weights, giving more influence to components that vary most across counties. And Granger causality testing examines whether data center facility announcements statistically predict future changes in local water stress. (An earlier CUSUM change-point method was retired when the single-year Aqueduct series proved too short to distinguish genuine acceleration from seasonality; see the methodology ledger.)
The Policy Impact Lens lets users simulate the effect of 12 real legislative proposals on county stress scores. Each policy is decomposed into the specific formula parameters it would affect, with percentage adjustments derived from bill text and regulatory intent. The policies span the full spectrum from federal moratoriums that would halt all new construction to executive orders that accelerate permitting on federal land. This is not prediction; it is structured scenario analysis designed to inform advocacy, journalism, and legislative debate.
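The decomposition can be illustrated with a toy policy record. The policy name, parameter names, and adjustment values below are hypothetical, chosen only to show the mechanism:

```python
# Hypothetical policy definition (names and numbers illustrative):
POLICY = {
    "name": "State water disclosure mandate",
    "adjustments": {"ros": -0.08, "rbs_water": -0.03},
}

def apply_policy(scores, policy):
    """Return a copy of a county's component scores with the
    policy's percentage adjustments applied, clamped to [0, 1]."""
    out = dict(scores)
    for key, delta in policy["adjustments"].items():
        out[key] = min(1.0, max(0.0, out[key] * (1 + delta)))
    return out

before = {"ros": 0.70, "rbs_water": 0.50}
after = apply_policy(before, POLICY)
assert after["ros"] < before["ros"]
```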
The DCSI does not rank counties that have no data center facilities. It does not make causal claims about whether data centers caused resource degradation. It does not generate regulatory findings or legal conclusions. It informs judgment; it does not replace it. Every data source is public, every formula is disclosed, and every assumption is documented in the Methodology page.
The Data Center Stress Index is a project by Anna R. Dudley. The entire tool was built using publicly available data for under $500. No proprietary datasets. No gated APIs. No corporate sponsorship. The purpose is to put the same analytical capability in the hands of local officials, community advocates, and journalists that the industry's own lobbyists already have.
For questions, speaking inquiries, or data requests, contact Anna R. Dudley. Power moves before policy does.
This page is dedicated to my dear friend and mentor, J.R. Four years ago, sitting in an old basement, you taught me how to set up my first virtual machine and how to install packages. In this new age of AI, you continue to teach the entire floor about the importance of never abdicating to the machine. Let AI work for you. Don’t let it think for you.
Every score, label, formula and visualization in this dashboard was reviewed by a human before it shipped. The errors logged below were caught during that review — some by Anna, some by collaborators, none by automated tests. They are kept in public for the same reason climate scientists publish their model uncertainty: the only way to trust a number is to know how it was made.
If you find a number on this dashboard that looks wrong, that is the experience this page is supposed to make possible. Tell us, and you will be added to the log.
An “error” in this log is anything an AI assistant produced that would have shipped wrong if a human hadn’t caught it. That includes:
Most data dashboards are presented as finished objects. The methodology page tells you the formula, the source, the date — and asks you to trust that the implementation matched. AI-assisted analysis breaks that assumption. Code that looks right can be wrong in ways that are completely invisible until you check the answer against reality.
Listing what went wrong — including the things that were embarrassing or that took weeks to find — is the only honest way to show the work. If a county’s grade changes between versions, the reason should be readable. If a formula was wrong for three weeks, the fact that it was wrong should be on the same page as the corrected number. That’s the contract.
Each entry includes the error ID, severity, status, the description of what was wrong, and how it was fixed. Sorted newest first. Filterable by status above.