DataCommons Client Expert
Production-ready skill for working with the Data Commons platform. Includes structured workflows, validation checks, and reusable patterns for scientific computing.
A scientific computing skill for querying Google's Data Commons — an open knowledge graph aggregating statistical data from UN, WHO, CDC, Census Bureau, and 100+ other sources into a unified API. DataCommons Client Expert helps you retrieve statistical observations, explore entities, and build data-driven analyses across demographics, economics, health, and environment.
When to Use This Skill
Choose DataCommons Client Expert when:
- Querying statistical data across demographics, economics, health, or climate
- Comparing indicators across countries, states, or cities
- Exploring time-series data from official government sources
- Building data analyses that combine multiple public data sources
Consider alternatives when:
- You need real-time market data (use financial APIs)
- You need raw census microdata (use IPUMS or Census Bureau directly)
- You need custom survey data (use specific survey data repositories)
- You need non-statistical geographic data (use OpenStreetMap)
Quick Start
claude "Compare life expectancy across G7 countries over the last 20 years"
    import datacommons as dc
    import datacommons_pandas as dcpd

    # Life expectancy time series for the G7 countries
    # (rows are places, columns are dates)
    g7 = [
        "country/USA", "country/GBR", "country/FRA", "country/DEU",
        "country/ITA", "country/JPN", "country/CAN",
    ]
    df = dcpd.build_time_series_dataframe(g7, "LifeExpectancy_Person")
    # Dates are columns, so slice the last 20 columns; tail() would select rows (places)
    print(df.iloc[:, -20:])

    # Get multiple indicators for one place
    us_stats = dc.get_stat_all(
        ["country/USA"],
        ["Count_Person", "Median_Income_Person",
         "LifeExpectancy_Person", "UnemploymentRate_Person"],
    )
Core Concepts
Data Commons Concepts
| Concept | Description | Example |
|---|---|---|
| Entity (DCID) | Unique identifier for a place/thing | country/USA, geoId/06 |
| Statistical Variable | Measured quantity | Count_Person, Median_Income_Person |
| Observation | Single data point (value + date) | Population in 2020: 331M |
| Property | Attribute of an entity | name, containedInPlace |
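The DCID examples in the table follow a simple pattern: country/ plus an ISO 3166-1 alpha-3 code for countries, and geoId/ plus the FIPS code for US geographies (two digits for a state, five for a county). A minimal sketch of building such identifiers, assuming you start from numeric FIPS codes. The helper names state_dcid, county_dcid, and country_dcid are hypothetical and not part of any Data Commons library.

```python
def state_dcid(state_fips: int) -> str:
    """Return the DCID for a US state from its numeric FIPS code (hypothetical helper)."""
    if not 1 <= state_fips <= 99:
        raise ValueError(f"state FIPS code out of range: {state_fips}")
    # State codes are always zero-padded to two digits, e.g. 6 -> "06"
    return f"geoId/{state_fips:02d}"


def county_dcid(state_fips: int, county_fips: int) -> str:
    """Return the DCID for a US county: 2-digit state code + 3-digit county code."""
    if not 1 <= county_fips <= 999:
        raise ValueError(f"county FIPS code out of range: {county_fips}")
    return f"{state_dcid(state_fips)}{county_fips:03d}"


def country_dcid(iso3: str) -> str:
    """Return the DCID for a country from its three-letter ISO code."""
    if len(iso3) != 3 or not iso3.isalpha():
        raise ValueError(f"expected a three-letter ISO code, got {iso3!r}")
    return f"country/{iso3.upper()}"


print(state_dcid(6))        # geoId/06
print(county_dcid(6, 37))   # geoId/06037
print(country_dcid("usa"))  # country/USA
```

Building the string is not the same as confirming the entity exists, so it is still worth checking a generated DCID against Data Commons before using it in a large query.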
Python API Methods
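    import datacommons as dc

    # Explore entities
    dc.get_property_values(["country/USA"], "name")
    # → {"country/USA": ["United States of America"]}
    dc.get_property_values(["country/USA"], "containedInPlace")
    # → Places that contain the USA (the property is followed outward by default)
    dc.get_places_in(["country/USA"], "State")
    # → US states contained in the USA

    # Get statistical data
    dc.get_stat_value("country/USA", "Count_Person")
    # → Most recent population count
    dc.get_stat_series("country/USA", "Count_Person")
    # → Time series keyed by date string: {"2010": ~309M, "2015": ~321M, "2020": ~331M, ...}

    # Search for statistical variables
    # NOTE: search_statvar may not be part of the documented datacommons API;
    # confirm it exists in your installed version before relying on it.
    dc.search_statvar("unemployment rate")
    # → List of matching statistical variable IDs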
Building Comparative Datasets
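    import datacommons as dc
    import datacommons_pandas as dcpd
    import pandas as pd

    # Compare US states on multiple indicators.
    # State FIPS codes are not contiguous (03, 07, 14, 43, and 52 are unused),
    # so ask Data Commons for the state list instead of generating geoId/01..56.
    states = dc.get_places_in(["country/USA"], "State")["country/USA"]

    indicators = {
        "Count_Person": "Population",
        "Median_Income_Person": "Median Income",
        "Count_Person_Employed": "Employed",
        "LifeExpectancy_Person": "Life Expectancy",
    }

    # One DataFrame per indicator (rows are places, columns are dates)
    dfs = {
        label: dcpd.build_time_series_dataframe(states, var_id)
        for var_id, label in indicators.items()
    }

    # Latest non-missing value per state: forward-fill across the date
    # columns, then take the last column
    latest = pd.DataFrame({
        label: df.ffill(axis=1).iloc[:, -1] for label, df in dfs.items()
    })
    print(latest.sort_values("Median Income", ascending=False).head(10))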
Configuration
| Parameter | Description | Default |
|---|---|---|
| api_key | Google Data Commons API key | None (limited without key) |
| default_date | Default observation date | Latest available |
| place_type | Default entity type | Country |
| output_format | DataFrame or dict | dataframe |
| cache_responses | Cache API responses locally | true |
Best Practices
- Use DCIDs instead of names. Always reference entities by their DCID (country/USA, geoId/06037) rather than by name. Names can be ambiguous: "Georgia" could be a US state or a country. DCIDs are unambiguous.
- Search for the right statistical variable. Data Commons has thousands of statistical variables, so look up the exact variable ID before querying, for example with dc.search_statvar() if your installed client provides it, or in the Statistical Variable Explorer on datacommons.org. "Population" maps to Count_Person, not Population_Total or similar guesses.
- Check data availability before large queries. Not all variables are available for all places or time periods. Query a single entity first to verify the variable exists, then scale to your full entity list.
- Use datacommons_pandas for analysis. The pandas wrapper (dcpd) returns DataFrames directly, which integrate naturally with analysis workflows. The base datacommons library returns dictionaries that require manual conversion.
- Cite the underlying data source. Data Commons aggregates from many sources (Census, WHO, etc.). Identify and cite the original data source for your specific variable, not just "Data Commons."
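The "check availability first" practice can be sketched as a small wrapper that probes one place before querying the rest. This is an illustration under stated assumptions, not part of the Data Commons client: fetch_if_available is a hypothetical name, the fetcher is passed in as a parameter, and it is assumed to return an empty dict when a variable has no data. The fake_fetch stand-in and its values are placeholders so the example runs offline.

```python
from typing import Callable, Dict, Iterable

# Any function that returns a {date: value} series for (place, variable).
# Assumption: with the datacommons package installed, dc.get_stat_series
# could be passed here.
SeriesFetcher = Callable[[str, str], Dict[str, float]]


def fetch_if_available(
    fetch: SeriesFetcher, places: Iterable[str], stat_var: str
) -> Dict[str, Dict[str, float]]:
    """Probe the first place; query the remaining places only if it has data."""
    place_list = list(places)
    if not place_list:
        return {}
    first = place_list[0]
    probe = fetch(first, stat_var)
    if not probe:
        # Fail early instead of issuing many requests that return nothing
        raise LookupError(f"{stat_var!r} returned no data for {first!r}")
    results = {first: probe}
    for place in place_list[1:]:
        results[place] = fetch(place, stat_var)
    return results


def fake_fetch(place: str, stat_var: str) -> Dict[str, float]:
    """Offline stand-in for a real fetcher; the values are placeholders."""
    known = {
        ("country/USA", "Count_Person"): {"2020": 1.0},
        ("country/CAN", "Count_Person"): {"2020": 2.0},
    }
    return known.get((place, stat_var), {})


series = fetch_if_available(fake_fetch, ["country/USA", "country/CAN"], "Count_Person")
print(sorted(series))
```

Probing only the first place is a deliberate simplification: a variable can exist for some places and not others, so a stricter version might probe one place per geography level.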
Common Issues
Statistical variable returns no data. The variable may not exist for your entity type or time period. Call dc.get_stat_all() with your candidate variables to see which of them return data for your entity. Check that the DCID format is correct for your geography level.
Time series has gaps. Different sources publish at different frequencies. Annual data may have gaps where the source didn't report. Use pandas interpolation for visualization, but clearly note interpolated vs. actual values in analysis.
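One way to keep interpolated and actual values apart is to carry a flag alongside each value. The sketch below uses plain linear interpolation from the standard library as a stand-in for pandas interpolation, so that it runs without extra dependencies. It assumes an annual series keyed by four-digit year strings. fill_annual_gaps is a hypothetical helper name, and the sample numbers are placeholders, not real observations.

```python
from typing import Dict, List, Tuple


def fill_annual_gaps(series: Dict[str, float]) -> List[Tuple[int, float, bool]]:
    """Fill missing years between observations by linear interpolation.

    Returns (year, value, is_interpolated) rows in ascending year order.
    Years before the first or after the last observation are not extrapolated.
    """
    observed = {int(year): value for year, value in series.items()}
    if not observed:
        return []
    years = sorted(observed)
    rows: List[Tuple[int, float, bool]] = []
    for left, right in zip(years, years[1:]):
        rows.append((left, observed[left], False))
        span = right - left
        for year in range(left + 1, right):
            # Weight by how far this year sits between the two observations
            weight = (year - left) / span
            value = observed[left] + weight * (observed[right] - observed[left])
            rows.append((year, value, True))
    rows.append((years[-1], observed[years[-1]], False))
    return rows


# Placeholder series with 2011 missing
for year, value, interpolated in fill_annual_gaps({"2010": 10.0, "2012": 14.0, "2013": 15.0}):
    label = "interpolated" if interpolated else "actual"
    print(year, value, label)
```

Keeping the flag in the output makes it straightforward to style interpolated points differently in a chart or to drop them before computing statistics.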
Conflicting values from different sources. Data Commons may have multiple observations for the same variable from different sources. Pass a measurement_method or observation_period filter to dc.get_stat_value() or dc.get_stat_series() if you need consistency from a single source.
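To apply the same source filter on every call, one option is to bind it once with functools.partial. This is a sketch, not the client's own mechanism: pin_source is a hypothetical helper, the fetcher is injected so the example runs offline, and the filter value shown is illustrative. Check the variable's page on Data Commons for the values that actually apply.

```python
from functools import partial
from typing import Any, Callable, Dict, Optional


def pin_source(
    fetch: Callable[..., Any],
    *,
    measurement_method: Optional[str] = None,
    observation_period: Optional[str] = None,
) -> Callable[..., Any]:
    """Return a fetcher that always passes the same source filters."""
    filters: Dict[str, str] = {}
    if measurement_method is not None:
        filters["measurement_method"] = measurement_method
    if observation_period is not None:
        filters["observation_period"] = observation_period
    if not filters:
        raise ValueError("pin at least one of measurement_method or observation_period")
    return partial(fetch, **filters)


def fake_fetch(place: str, stat_var: str, **filters: str) -> Dict[str, str]:
    """Offline stand-in that echoes its arguments instead of calling the API."""
    return {"place": place, "stat_var": stat_var, **filters}


# Assumption: with the datacommons package installed, dc.get_stat_series
# could be passed instead of fake_fetch.
acs_only = pin_source(fake_fetch, measurement_method="CensusACS5yrSurvey")
print(acs_only("geoId/06", "Median_Income_Person"))
```

Binding the filter in one place avoids the situation where one call in an analysis silently falls back to a different source than the others.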