DataCommons Client Expert

A scientific computing skill for querying Google's Data Commons, an open knowledge graph that aggregates statistical data from the UN, WHO, CDC, Census Bureau, and 100+ other sources into a unified API. DataCommons Client Expert helps you retrieve statistical observations, explore entities, and build data-driven analyses across demographics, economics, health, and environment.

When to Use This Skill

Choose DataCommons Client Expert when:

Querying statistical data across demographics, economics, health, or climate
Comparing indicators across countries, states, or cities
Exploring time-series data from official government sources
Building data analyses that combine multiple public data sources

Consider alternatives when:

You need real-time market data (use financial APIs)
You need raw census microdata (use IPUMS or Census Bureau directly)
You need custom survey data (use specific survey data repositories)
You need non-statistical geographic data (use OpenStreetMap)

Quick Start


claude "Compare life expectancy across G7 countries over the last 20 years"


import datacommons as dc
import datacommons_pandas as dcpd

# Get life expectancy for G7 countries
g7 = ["country/USA", "country/GBR", "country/FRA",
      "country/DEU", "country/ITA", "country/JPN", "country/CAN"]

df = dcpd.build_time_series_dataframe(
    g7,
    "LifeExpectancy_Person"
)
# Rows are places and columns are dates, so slice columns to get the last 20 dates
print(df.iloc[:, -20:])

# Get multiple indicators for one place
us_stats = dc.get_stat_all(
    ["country/USA"],
    ["Count_Person", "Median_Income_Person",
     "LifeExpectancy_Person", "UnemploymentRate_Person"]
)

Core Concepts

Data Commons Concepts

Concept	Description	Example
Entity (DCID)	Unique identifier for a place/thing	`country/USA`, `geoId/06`
Statistical Variable	Measured quantity	`Count_Person`, `Median_Income_Person`
Observation	Single data point (value + date)	Population in 2020: 331M
Property	Attribute of an entity	`name`, `containedInPlace`
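
The DCID patterns in the table can be built programmatically rather than typed by hand. The following is a minimal sketch, assuming the `country/<ISO alpha-3 code>` and `geoId/<FIPS code>` patterns seen in the examples above; the helper names `country_dcid` and `geoid_dcid` are hypothetical and not part of any Data Commons library.

```python
from typing import Optional


def country_dcid(iso3: str) -> str:
    """Build a country DCID such as 'country/USA' from a 3-letter code."""
    code = iso3.strip().upper()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a 3-letter country code, got {iso3!r}")
    return f"country/{code}"


def geoid_dcid(state_fips: int, county_fips: Optional[int] = None) -> str:
    """Build a US geoId DCID: 2-digit state FIPS, plus an optional 3-digit county FIPS."""
    if not 1 <= state_fips <= 99:
        raise ValueError(f"state FIPS out of range: {state_fips}")
    dcid = f"geoId/{state_fips:02d}"
    if county_fips is not None:
        if not 1 <= county_fips <= 999:
            raise ValueError(f"county FIPS out of range: {county_fips}")
        dcid += f"{county_fips:03d}"
    return dcid


print(country_dcid("usa"))   # country/USA
print(geoid_dcid(6))         # geoId/06
print(geoid_dcid(6, 37))     # geoId/06037
```

Zero-padding is the detail most often missed: `geoId/6` and `geoId/06` are different strings, and only the padded form matches the examples in this document.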

Python API Methods


import datacommons as dc

# Explore entities
dc.get_property_values(["country/USA"], "name")
# → {"country/USA": ["United States of America"]}

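dc.get_property_values(["country/USA"], "containedInPlace")
# → Places that contain the USA (outgoing edges are followed by default)

dc.get_places_in(["country/USA"], "State")
# → {"country/USA": [...]}: DCIDs of the US states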

# Get statistical data
dc.get_stat_value("country/USA", "Count_Person")
# → Most recent population count

dc.get_stat_series("country/USA", "Count_Person")
# → Time series keyed by date string, e.g. "2010" → ~309M, "2015" → ~321M, "2020" → ~331M

# Find statistical variable IDs
# Note: search_statvar() does not appear in the documented `datacommons` package API.
# Look up variable IDs in the Statistical Variable Explorer on datacommons.org instead.

Building Comparative Datasets


import datacommons_pandas as dcpd
import pandas as pd

# Compare US states on multiple indicators
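# FIPS codes 03, 07, 14, 43, and 52 are unassigned, so skip them (leaves 50 states + DC)
UNUSED_FIPS = {3, 7, 14, 43, 52}
states = [f"geoId/{i:02d}" for i in range(1, 57) if i not in UNUSED_FIPS]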

indicators = {
    "Count_Person": "Population",
    "Median_Income_Person": "Median Income",
    "Count_Person_Employed": "Employed",
    "LifeExpectancy_Person": "Life Expectancy",
}

dfs = {}
for var_id, label in indicators.items():
    df = dcpd.build_time_series_dataframe(states, var_id)
    dfs[label] = df

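# Get the latest non-missing value per state: forward-fill across the date
# columns first, since the newest date may be empty for some states.
# Values in one row can therefore come from different years.
latest = pd.DataFrame({
    label: df.ffill(axis=1).iloc[:, -1] for label, df in dfs.items()
})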
print(latest.sort_values("Median Income", ascending=False).head(10))

Configuration

Parameter	Description	Default
`api_key`	Google Data Commons API key	None (limited without key)
`default_date`	Default observation date	Latest available
`place_type`	Default entity type	`Country`
`output_format`	DataFrame or dict	`dataframe`
`cache_responses`	Cache API responses locally	`true`

Best Practices

Use DCIDs instead of names. Always reference entities by their DCID (country/USA, geoId/06037) rather than names. Names can be ambiguous — "Georgia" could be a US state or a country. DCIDs are unambiguous.
Search for the right statistical variable. Data Commons has thousands of statistical variables. Look up the exact variable ID before querying, for example in the Statistical Variable Explorer on datacommons.org. "Population" maps to Count_Person, not Population_Total or similar guesses.
Check data availability before large queries. Not all variables are available for all places or time periods. Query a single entity first to verify the variable exists, then scale to your full entity list.
Use datacommons_pandas for analysis. The pandas wrapper (dcpd) returns DataFrames directly, which integrate naturally with analysis workflows. The base datacommons library returns dictionaries that require manual conversion.
Cite the underlying data source. Data Commons aggregates from many sources (Census, WHO, etc.). Identify and cite the original data source for your specific variable, not just "Data Commons."
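
The "check availability before large queries" practice can be sketched as a small guard around the bulk call. This is a minimal illustration, not the skill's own implementation: `has_data` and `fetch_if_available` are hypothetical helpers, the fetchers are injected so the example runs offline with stubs, and it assumes the single-value fetcher raises `ValueError` when nothing comes back (check this against the behavior of your installed `datacommons` version).

```python
from typing import Any, Callable, Iterable, Optional


def has_data(place: str, stat_var: str,
             fetch_one: Callable[[str, str], Any]) -> bool:
    """Return True if a single-place lookup succeeds for this variable."""
    try:
        fetch_one(place, stat_var)
    except ValueError:  # assumption: raised when no observation exists
        return False
    return True


def fetch_if_available(places: Iterable[str], stat_var: str,
                       fetch_one: Callable[[str, str], Any],
                       fetch_many: Callable[[list, str], Any]) -> Optional[Any]:
    """Probe the first place, then run the bulk query only if the probe succeeds."""
    places = list(places)
    if not places or not has_data(places[0], stat_var, fetch_one):
        return None
    return fetch_many(places, stat_var)


# Offline stubs standing in for real API calls (values are placeholders, not real data).
_FAKE = {("country/USA", "Count_Person"): 1.0}


def stub_fetch_one(place: str, stat_var: str) -> float:
    if (place, stat_var) not in _FAKE:
        raise ValueError("No data in response.")
    return _FAKE[(place, stat_var)]


def stub_fetch_many(places: list, stat_var: str) -> dict:
    return {p: _FAKE.get((p, stat_var)) for p in places}


print(fetch_if_available(["country/USA", "country/CAN"], "Count_Person",
                         stub_fetch_one, stub_fetch_many))
print(fetch_if_available(["country/USA"], "Not_A_Variable",
                         stub_fetch_one, stub_fetch_many))
```

With the real libraries, `dc.get_stat_value` and `dcpd.build_time_series_dataframe` would be natural candidates for `fetch_one` and `fetch_many`. A successful probe only shows that the variable exists for the first place; other places in the list can still come back empty.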

Common Issues

Statistical variable returns no data. The variable may not exist for your entity type or time period. Pass your candidate variables to dc.get_stat_all() to see which of them come back with data for your entity. Check that the DCID format is correct for your geography level.

Time series has gaps. Different sources publish at different frequencies. Annual data may have gaps where the source didn't report. Use pandas interpolation for visualization, but clearly note interpolated vs. actual values in analysis.
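
One way to keep interpolated and reported values apart is to carry a boolean flag alongside the filled series. The sketch below uses a made-up annual series (the numbers are illustrative, not real Data Commons values) and assumes the missing years are already present in the index as NaN.

```python
import pandas as pd

# Hypothetical annual series with two unreported years (illustrative values only).
series = pd.Series(
    [100.0, None, 120.0, None, 140.0],
    index=["2016", "2017", "2018", "2019", "2020"],
    name="value",
)

filled = pd.DataFrame({
    # Linear interpolation between reported points; limit_area="inside"
    # prevents extrapolating past the first or last reported value.
    "value": series.interpolate(method="linear", limit_area="inside"),
    # True where the original observation was missing.
    "interpolated": series.isna(),
})

print(filled)
```

Linear interpolation here treats the rows as evenly spaced. If a year is absent from the index altogether rather than present as NaN, reindex to the full range of years first so the gap becomes visible.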

Conflicting values from different sources. Data Commons may have multiple observations for the same variable from different sources. Specify a measurement_method or observation_period filter if you need consistency from a single source.
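
When a bulk response carries several series for the same variable, you can pick one by its measurement method. This is a sketch under stated assumptions: the nested shape below (`sourceSeries` entries with `val` and `measurementMethod` keys) is what `dc.get_stat_all()` is assumed to return, so verify it against a real response; `series_for_method` is a hypothetical helper, and the sample values are placeholders rather than real observations.

```python
from typing import Dict


def series_for_method(stat_all: dict, place: str, stat_var: str,
                      measurement_method: str) -> Dict[str, float]:
    """Return the date → value mapping of the first series with the given method."""
    source_series = (
        stat_all.get(place, {}).get(stat_var, {}).get("sourceSeries", [])
    )
    for series in source_series:
        if series.get("measurementMethod") == measurement_method:
            return dict(series.get("val", {}))
    return {}  # no series uses the requested method


# Placeholder response in the assumed get_stat_all() shape (not real data).
sample = {
    "country/USA": {
        "Count_Person": {
            "sourceSeries": [
                {"val": {"2019": 100.0, "2020": 101.0},
                 "measurementMethod": "CensusACS5yrSurvey"},
                {"val": {"2019": 102.0, "2020": 103.0},
                 "measurementMethod": "CensusPEPSurvey"},
            ]
        }
    }
}

print(series_for_method(sample, "country/USA", "Count_Person", "CensusPEPSurvey"))
```

For single lookups the same choice can be made server-side: `dc.get_stat_value()` and `dc.get_stat_series()` accept `measurement_method` and `observation_period` keyword arguments.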
