DataCommons Client Expert

Production-ready skill for working with the Data Commons platform. Includes structured workflows, validation checks, and reusable patterns for scientific data analysis.

Skill | Cliptics | scientific | v1.0.0 | MIT

DataCommons Client Expert

A scientific computing skill for querying Google's Data Commons — an open knowledge graph aggregating statistical data from UN, WHO, CDC, Census Bureau, and 100+ other sources into a unified API. DataCommons Client Expert helps you retrieve statistical observations, explore entities, and build data-driven analyses across demographics, economics, health, and environment.

When to Use This Skill

Choose DataCommons Client Expert when:

  • Querying statistical data across demographics, economics, health, or climate
  • Comparing indicators across countries, states, or cities
  • Exploring time-series data from official government sources
  • Building data analyses that combine multiple public data sources

Consider alternatives when:

  • You need real-time market data (use financial APIs)
  • You need raw census microdata (use IPUMS or Census Bureau directly)
  • You need custom survey data (use specific survey data repositories)
  • You need non-statistical geographic data (use OpenStreetMap)

Quick Start

claude "Compare life expectancy across G7 countries over the last 20 years"
    import datacommons as dc
    import datacommons_pandas as dcpd

    # DCIDs for the G7 countries
    g7 = [
        "country/USA", "country/GBR", "country/FRA", "country/DEU",
        "country/ITA", "country/JPN", "country/CAN",
    ]

    # Rows are places and columns are dates, so slice the columns to get
    # the 20 most recent observation dates for every country
    df = dcpd.build_time_series_dataframe(g7, "LifeExpectancy_Person")
    print(df.iloc[:, -20:])

    # Get multiple indicators for one place
    us_stats = dc.get_stat_all(
        ["country/USA"],
        ["Count_Person", "Median_Income_Person",
         "LifeExpectancy_Person", "UnemploymentRate_Person"],
    )

Core Concepts

Data Commons Concepts

| Concept | Description | Example |
|---|---|---|
| Entity (DCID) | Unique identifier for a place/thing | country/USA, geoId/06 |
| Statistical Variable | Measured quantity | Count_Person, Median_Income_Person |
| Observation | Single data point (value + date) | Population in 2020: 331M |
| Property | Attribute of an entity | name, containedInPlace |
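
The DCID examples in the table follow a predictable pattern for common place types. The sketch below assumes the `country/<three-letter code>` and `geoId/<FIPS code>` conventions seen above. The helper names `country_dcid`, `us_state_dcid`, and `us_county_dcid` are hypothetical and not part of any Data Commons library.

```python
def country_dcid(iso3: str) -> str:
    """Build a country DCID from a three-letter code, e.g. 'usa' -> 'country/USA'."""
    code = iso3.strip().upper()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"expected a three-letter country code, got {iso3!r}")
    return f"country/{code}"


def us_state_dcid(state_fips: int) -> str:
    """Build a US state DCID from its FIPS code, zero-padded to two digits."""
    return f"geoId/{state_fips:02d}"


def us_county_dcid(state_fips: int, county_fips: int) -> str:
    """Build a US county DCID: two-digit state code plus three-digit county code."""
    return f"geoId/{state_fips:02d}{county_fips:03d}"


print(country_dcid("usa"))     # country/USA
print(us_state_dcid(6))        # geoId/06
print(us_county_dcid(6, 37))   # geoId/06037
```

Building DCIDs from codes rather than typing them by hand avoids padding mistakes such as `geoId/6`.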

Python API Methods

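    import datacommons as dc

    # Explore entities
    dc.get_property_values(["country/USA"], "name")
    # → {"country/USA": ["United States of America"]}

    dc.get_property_values(["country/USA"], "containedInPlace")
    # → Places that contain the USA (the default direction is outgoing)

    dc.get_places_in(["country/USA"], "State")
    # → DCIDs of the US states contained in the USA

    # Get statistical data
    dc.get_stat_value("country/USA", "Count_Person")
    # → Most recent population count

    dc.get_stat_series("country/USA", "Count_Person")
    # → Time series keyed by date string: {"2010": 309M, "2015": 321M, "2020": 331M, ...}

    # Search for statistical variables
    # NOTE: search_statvar does not appear among the documented functions of
    # the datacommons package; verify it exists in your installed version, or
    # look up variable IDs in the Statistical Variable Explorer on datacommons.org
    dc.search_statvar("unemployment rate")
    # → List of matching statistical variable IDs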

Building Comparative Datasets

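    import datacommons_pandas as dcpd
    import pandas as pd

    # Compare US states on multiple indicators.
    # State DCIDs are "geoId/" plus a two-digit FIPS code; some codes in
    # the 01-56 range are unassigned.
    states = [f"geoId/{i:02d}" for i in range(1, 57)]

    indicators = {
        "Count_Person": "Population",
        "Median_Income_Person": "Median Income",
        "Count_Person_Employed": "Employed",
        "LifeExpectancy_Person": "Life Expectancy",
    }

    # One DataFrame per indicator: rows are places, columns are dates
    dfs = {}
    for var_id, label in indicators.items():
        dfs[label] = dcpd.build_time_series_dataframe(states, var_id)

    # Latest non-missing value per state: forward-fill along the date axis,
    # then take the last column (a plain iloc[:, -1] can pick up NaN for
    # states that lack the most recent date)
    latest = pd.DataFrame({
        label: df.ffill(axis=1).iloc[:, -1] for label, df in dfs.items()
    })
    print(latest.sort_values("Median Income", ascending=False).head(10))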

Configuration

| Parameter | Description | Default |
|---|---|---|
| api_key | Google Data Commons API key | None (limited without key) |
| default_date | Default observation date | Latest available |
| place_type | Default entity type | Country |
| output_format | DataFrame or dict | dataframe |
| cache_responses | Cache API responses locally | true |
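
As an illustration, the parameters above can be mirrored in a small typed container so that defaults and allowed values are checked in one place. This is a sketch only: the `SkillConfig` class is hypothetical, and how the skill actually reads its configuration is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillConfig:
    """Hypothetical container mirroring the configuration table."""
    api_key: Optional[str] = None       # None means keyless, limited access
    default_date: Optional[str] = None  # None means "latest available"
    place_type: str = "Country"
    output_format: str = "dataframe"    # "dataframe" or "dict"
    cache_responses: bool = True

    def __post_init__(self) -> None:
        # Reject values the table does not list for output_format
        if self.output_format not in ("dataframe", "dict"):
            raise ValueError(
                f"output_format must be 'dataframe' or 'dict', got {self.output_format!r}"
            )


# Override only what differs from the defaults
config = SkillConfig(place_type="State", output_format="dict")
print(config)
```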

Best Practices

  1. Use DCIDs instead of names. Always reference entities by their DCID (country/USA, geoId/06037) rather than names. Names can be ambiguous — "Georgia" could be a US state or a country. DCIDs are unambiguous.

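  2. Search for the right statistical variable. Data Commons has thousands of statistical variables, so confirm the exact variable ID before querying, for example in the Statistical Variable Explorer on datacommons.org. (dc.search_statvar() does not appear among the documented functions of the datacommons Python package, so verify it exists in your installed version before relying on it.) "Population" maps to Count_Person, not Population_Total or similar guesses.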

  3. Check data availability before large queries. Not all variables are available for all places or time periods. Query a single entity first to verify the variable exists, then scale to your full entity list.

  4. Use datacommons_pandas for analysis. The pandas wrapper (dcpd) returns DataFrames directly, which integrate naturally with analysis workflows. The base datacommons library returns dictionaries that require manual conversion.

  5. Cite the underlying data source. Data Commons aggregates from many sources (Census, WHO, etc.). Identify and cite the original data source for your specific variable, not just "Data Commons."
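
The availability check in practice 3 can be sketched as a probe-then-scale helper. The names `try_fetch` and `probe_then_fetch` are hypothetical. The fetcher is passed in as a callable, so the sketch runs against a stand-in here. It assumes "no data" is signalled either by a `ValueError` or by a NaN/None value, which should be checked against the client you actually use.

```python
import math
from typing import Callable, Dict, Iterable, Optional


def try_fetch(fetch_value: Callable[[str, str], float],
              place: str, stat_var: str) -> Optional[float]:
    """Return the value for (place, stat_var), or None when there is no data."""
    try:
        value = fetch_value(place, stat_var)
    except ValueError:
        # Assumption: the fetcher raises ValueError when nothing is available
        return None
    if value is None or math.isnan(value):
        return None
    return value


def probe_then_fetch(fetch_value: Callable[[str, str], float],
                     places: Iterable[str], stat_var: str) -> Dict[str, float]:
    """Check the first place before querying the rest of the list."""
    places = list(places)
    if not places:
        return {}
    first = try_fetch(fetch_value, places[0], stat_var)
    if first is None:
        raise LookupError(f"{stat_var!r} has no data for probe place {places[0]!r}")
    results = {places[0]: first}
    for place in places[1:]:
        value = try_fetch(fetch_value, place, stat_var)
        if value is not None:
            results[place] = value
    return results


# Stand-in fetcher with synthetic values (not real statistics)
SAMPLE = {("geoId/06", "Count_Person"): 1.0, ("geoId/48", "Count_Person"): 2.0}


def fake_fetch(place: str, stat_var: str) -> float:
    if (place, stat_var) not in SAMPLE:
        raise ValueError("No data")
    return SAMPLE[(place, stat_var)]


found = probe_then_fetch(fake_fetch, ["geoId/06", "geoId/03", "geoId/48"], "Count_Person")
print(found)
```

With the real library, `fetch_value` could be `datacommons.get_stat_value`, which takes the place and variable as its first two arguments.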

Common Issues

Statistical variable returns no data. The variable may not exist for your entity type or time period. Pass your candidate variables to dc.get_stat_all() to see which of them return data for your entity, and check that the DCID format is correct for your geography level.
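
One way to read a `get_stat_all()` result is to list, per place, the variables that came back with at least one non-empty series. The sketch below assumes the nested response shape `{place: {stat_var: {"sourceSeries": [{"val": {...}, ...}]}}}`; confirm this against the response from your installed version. The helper name `variables_with_data` is hypothetical, and the sample response is hand-written with synthetic values.

```python
from typing import Any, Dict, List


def variables_with_data(response: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each place to the variables that have at least one non-empty series."""
    available: Dict[str, List[str]] = {}
    for place, by_var in response.items():
        found = []
        for stat_var, payload in by_var.items():
            # Assumption: series live under "sourceSeries", values under "val"
            series_list = (payload or {}).get("sourceSeries", [])
            if any(series.get("val") for series in series_list):
                found.append(stat_var)
        available[place] = sorted(found)
    return available


# Hand-written stand-in for a response (synthetic values)
sample_response = {
    "geoId/06": {
        "Count_Person": {"sourceSeries": [{"val": {"2020": 1.0}}]},
        "Median_Income_Person": {"sourceSeries": []},
        "LifeExpectancy_Person": {},
    }
}

print(variables_with_data(sample_response))
```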

Time series has gaps. Different sources publish at different frequencies. Annual data may have gaps where the source didn't report. Use pandas interpolation for visualization, but clearly note interpolated vs. actual values in analysis.
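
A minimal sketch of gap filling with pandas that keeps track of which points were interpolated. The series values are synthetic and only illustrate the mechanics.

```python
import pandas as pd

# Synthetic annual series with two missing years (not real statistics)
raw = pd.Series(
    [10.0, None, 14.0, None, 20.0],
    index=["2016", "2017", "2018", "2019", "2020"],
    dtype="float64",
)

# Linear interpolation treats the observations as evenly spaced
filled = raw.interpolate(method="linear")

# Keep a flag column so interpolated points stay distinguishable from reported ones
result = pd.DataFrame({
    "value": filled,
    "interpolated": raw.isna() & filled.notna(),
})
print(result)
```

Carrying the `interpolated` flag alongside the values makes it easy to style those points differently in a chart or to exclude them from summary statistics.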

Conflicting values from different sources. Data Commons may have multiple observations for the same variable from different sources. Specify a measurement_method or observation_period filter if you need consistency from a single source.
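
The filters mentioned above can be passed as keyword arguments when fetching a series. The wrapper below is a sketch: `fetch_consistent_series` is a hypothetical name, and it assumes `datacommons.get_stat_series` accepts `measurement_method` and `observation_period` keywords. The measurement method string used in the demo is an example value to check against your data. The library is imported lazily, so the demo runs against a stand-in.

```python
from typing import Callable, Dict, Optional


def fetch_consistent_series(
    place: str,
    stat_var: str,
    measurement_method: Optional[str] = None,
    observation_period: Optional[str] = None,
    fetch: Optional[Callable[..., Dict[str, float]]] = None,
) -> Dict[str, float]:
    """Fetch one series, pinning the source filters explicitly."""
    if fetch is None:
        # Imported lazily so the sketch can run without the package installed
        import datacommons as dc
        fetch = dc.get_stat_series
    return fetch(
        place,
        stat_var,
        measurement_method=measurement_method,
        observation_period=observation_period,
    )


# Stand-in fetcher that records the filters it receives (synthetic value)
calls = []


def stub_fetch(place: str, stat_var: str, **filters: Optional[str]) -> Dict[str, float]:
    calls.append((place, stat_var, filters))
    return {"2020": 1.0}


series = fetch_consistent_series(
    "geoId/06",
    "Count_Person",
    measurement_method="CensusACS5yrSurvey",  # example value, verify for your variable
    fetch=stub_fetch,
)
print(series, calls[0][2])
```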
