GeoPandas Expert

Battle-tested skill for working with geospatial data using the GeoPandas Python library. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill | Cliptics | scientific | v1.0.0 | MIT

A scientific computing skill for geospatial data analysis using GeoPandas, the Python library that extends pandas with spatial operations on geometric types. It builds on shapely for geometry, fiona or pyogrio for file access, and matplotlib for plotting.

When to Use This Skill

Choose GeoPandas Expert when:

  • Working with shapefiles, GeoJSON, or other geospatial vector data
  • Performing spatial joins, overlays, and geometric operations
  • Creating choropleth maps and spatial visualizations
  • Analyzing geographic distributions and spatial relationships

Consider alternatives when:

  • You need raster data processing (use rasterio or xarray)
  • You need interactive web maps (use Folium or Leaflet)
  • You need large-scale geospatial processing (use PostGIS or Apache Sedona)
  • You need GPS trajectory analysis (use MovingPandas)

Quick Start

    claude "Load a shapefile, perform a spatial join, and create a choropleth map"

    import geopandas as gpd
    import matplotlib.pyplot as plt

    # Read geospatial data.
    # Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0.
    # On newer versions, pass the paths of your own files to gpd.read_file().
    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    cities = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))

    print(f"Countries: {len(world)}")
    print(f"CRS: {world.crs}")
    print(f"Columns: {list(world.columns)}")

    # Spatial join: which country is each city in?
    cities_with_country = gpd.sjoin(cities, world, how="left", predicate="within")
    print(f"\nCities with country info: {len(cities_with_country)}")

    # Choropleth map
    fig, ax = plt.subplots(figsize=(15, 10))
    world.plot(
        column="gdp_md_est",
        cmap="YlOrRd",
        legend=True,
        legend_kwds={"label": "GDP (millions USD)"},
        ax=ax,
        edgecolor="black",
        linewidth=0.5,
    )
    ax.set_title("World GDP")
    ax.set_axis_off()
    plt.savefig("world_gdp.png", dpi=150, bbox_inches="tight")

Core Concepts

GeoDataFrame Operations

| Operation | Method | Description |
|---|---|---|
| Read | gpd.read_file() | Load shapefile, GeoJSON, GeoPackage |
| Write | gdf.to_file() | Save to geospatial format |
| CRS Transform | gdf.to_crs() | Reproject coordinate system |
| Spatial Join | gpd.sjoin() | Join based on spatial relationship |
| Overlay | gpd.overlay() | Geometric set operations |
| Buffer | gdf.buffer(distance) | Create buffer zones |
| Dissolve | gdf.dissolve(by=col) | Aggregate by attribute |

Geometric Operations

    # gdf1 and gdf2 are placeholders for two polygon GeoDataFrames in the same CRS.

    # Buffer: create 50 km zones around cities
    cities_buffered = cities.to_crs(epsg=3857)  # project so units are meters
    # Web Mercator stretches distances away from the equator;
    # prefer a local UTM zone when the buffer size must be accurate.
    cities_buffered["geometry"] = cities_buffered.buffer(50_000)  # 50 km

    # Intersection: find overlapping areas
    intersection = gpd.overlay(gdf1, gdf2, how="intersection")

    # Union: merge geometries
    union = gpd.overlay(gdf1, gdf2, how="union")

    # Dissolve: merge polygons by attribute, summing only the numeric columns
    numeric = world[["continent", "geometry", "pop_est", "gdp_md_est"]]
    continents = numeric.dissolve(by="continent", aggfunc="sum")

    # Area and centroid calculations (both require a projected CRS)
    world_projected = world.to_crs(epsg=6933)  # equal area
    world_projected["area_km2"] = world_projected.area / 1e6
    world_projected["centroid"] = world_projected.geometry.centroid
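The overlay calls can be tried on synthetic data. The following is a minimal sketch, assuming two hypothetical 2x2 squares that overlap in a 1x1 corner; the column names and the choice of EPSG:3857 are only there so both layers share a projected CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Two made-up squares: (0,0)-(2,2) and (1,1)-(3,3).
gdf1 = gpd.GeoDataFrame({"zone": ["a"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")
gdf2 = gpd.GeoDataFrame({"site": ["b"]}, geometry=[box(1, 1, 3, 3)], crs="EPSG:3857")

# Intersection keeps only the shared corner.
intersection = gpd.overlay(gdf1, gdf2, how="intersection")

# Union keeps every piece: the shared corner plus what is unique to each layer.
union = gpd.overlay(gdf1, gdf2, how="union")

print(intersection.area.sum())
print(union.area.sum())
print(len(union))
```

The union result carries the attribute columns of both inputs, with missing values where a piece came from only one layer.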

Coordinate Reference Systems

    # Check CRS
    print(gdf.crs)  # e.g., EPSG:4326 (WGS84)

    # Transform to a different CRS
    gdf_mercator = gdf.to_crs(epsg=3857)    # Web Mercator
    gdf_equal_area = gdf.to_crs(epsg=6933)  # Equal area
    gdf_utm = gdf.to_crs(epsg=32633)        # UTM Zone 33N

    # Common CRS codes
    # EPSG:4326: WGS84 (lat/lon, GPS coordinates)
    # EPSG:3857: Web Mercator (web maps, meters)
    # EPSG:6933: Equal Area (for area calculations)
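To make the effect of reprojection concrete, here is a small sketch that measures the same pair of points in degrees and in meters. The two cities and their approximate coordinates are assumptions chosen because both fall inside UTM Zone 33N.

```python
import geopandas as gpd
from shapely.geometry import Point

# Approximate coordinates (lon, lat) for two cities; illustrative values only.
towns = gpd.GeoDataFrame(
    {"name": ["Berlin", "Prague"]},
    geometry=[Point(13.405, 52.52), Point(14.4378, 50.0755)],
    crs="EPSG:4326",
)

# Planar distance on raw lon/lat values: a number in degrees, not a length.
a_ll, b_ll = towns.geometry.iloc[0], towns.geometry.iloc[1]
degrees_apart = a_ll.distance(b_ll)

# Reproject to UTM Zone 33N so coordinates are in meters.
towns_utm = towns.to_crs(epsg=32633)
a_utm, b_utm = towns_utm.geometry.iloc[0], towns_utm.geometry.iloc[1]
meters_apart = a_utm.distance(b_utm)

print(f"{degrees_apart:.3f} degrees")
print(f"{meters_apart / 1000:.1f} km")
```

The degree figure has no fixed conversion to kilometers because a degree of longitude shrinks toward the poles, which is why the projected result is the one to report.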

Configuration

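| Parameter | Description | Default |
|---|---|---|
| crs | Coordinate reference system | File-defined |
| driver | Output file format | Inferred from the file extension; ESRI Shapefile if there is none |
| encoding | Character encoding | Engine- and driver-dependent (usually utf-8) |
| spatial_index | Build R-tree spatial index | True |
| predicate | Spatial relationship type | intersects |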
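The predicate setting changes which pairs count as a match. As a minimal sketch, assuming a made-up square and three made-up points, the default "intersects" accepts a point lying on the boundary while "within" does not.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical layers sharing one projected CRS.
zone = gpd.GeoDataFrame({"zone": ["square"]}, geometry=[box(0, 0, 2, 2)], crs="EPSG:3857")
points = gpd.GeoDataFrame(
    {"label": ["inside", "on_edge", "outside"]},
    geometry=[Point(1, 1), Point(2, 1), Point(3, 3)],
    crs="EPSG:3857",
)

# Default predicate: any shared point counts, including the boundary.
touching = gpd.sjoin(points, zone, how="inner", predicate="intersects")

# Stricter predicate: the point must lie in the polygon's interior.
strictly_inside = gpd.sjoin(points, zone, how="inner", predicate="within")

print(sorted(touching["label"]))
print(list(strictly_inside["label"]))
```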

Best Practices

  1. Always check and set CRS before spatial operations. Spatial joins and distance calculations require matching CRS. Use gdf.to_crs() to reproject. Area and distance calculations require a projected CRS (not lat/lon), such as UTM or an equal-area projection.

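  2. Use spatial indexing for large datasets. GeoPandas builds the spatial index lazily, the first time gdf.sindex is accessed, and gpd.sjoin() uses it automatically. For repeated spatial queries on large datasets, the index avoids comparing every geometry against every other one. Access gdf.sindex once up front to build it, and use gdf.sindex.query() for custom lookups.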

  3. Project to appropriate CRS for measurements. WGS84 (EPSG:4326) is for storage and display, not measurement. For distance calculations, use UTM for your region. For area calculations, use an equal-area projection. Using lat/lon degrees for distances gives wrong results.

  4. Dissolve before plotting large datasets. Rendering 100,000 individual polygons is slow. Dissolve by the attribute you're visualizing to reduce geometry count. This also produces cleaner maps.

  5. Use GeoPackage instead of Shapefile for modern workflows. Shapefiles have limitations: 2GB file size limit, 10-character column names, no null values. GeoPackage (.gpkg) supports all data types, large files, and multiple layers in one file.
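The spatial indexing practice can be sketched as a direct index query. The example assumes a hypothetical 10x10 grid of points and a rectangular search window; sindex.query() returns integer positions, which are then used with iloc.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up data: 100 points on an integer grid from (0, 0) to (9, 9).
grid = gpd.GeoDataFrame(
    geometry=[Point(x, y) for x in range(10) for y in range(10)],
    crs="EPSG:3857",
)

# Search window covering grid coordinates 3, 4, and 5 on each axis.
window = box(2.5, 2.5, 5.5, 5.5)

# The index is built on first access to .sindex; query() returns row positions.
hits = grid.sindex.query(window, predicate="intersects")
nearby = grid.iloc[hits]

print(len(nearby))
```

Querying the index first and then working only on the returned rows is the same pattern gpd.sjoin() applies internally, so a manual query is mainly useful for custom filtering logic.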

Common Issues

Spatial join returns more rows than expected. A point can fall within multiple overlapping polygons, and the join emits one row per match. Use how="left" to keep every left-hand row, then deduplicate on the left index by keeping the first match or, for polygon-to-polygon joins, the match with the largest overlap area.
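A minimal sketch of the keep-the-first-match approach, assuming two made-up overlapping zones and three made-up points. It relies on the left join preserving the left-hand index, so repeated index values mark the duplicated rows.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical zones A and B overlap in the square (1,1)-(2,2).
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 2, 2), box(1, 1, 3, 3)],
    crs="EPSG:3857",
)
points = gpd.GeoDataFrame(
    {"label": ["both", "only_a", "none"]},
    geometry=[Point(1.5, 1.5), Point(0.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)

# The point in the overlap matches twice; the unmatched point is kept with NaN.
joined = gpd.sjoin(points, zones, how="left", predicate="within")

# Keep one row per original point by dropping repeated left-index values.
deduped = joined[~joined.index.duplicated(keep="first")]

print(len(joined), len(deduped))
```

Which match counts as "first" is not meaningful by itself, so sort the joined frame by a priority column before deduplicating if the choice matters.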

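CRS mismatch warning or error in spatial operations. Both GeoDataFrames must have the same CRS. Depending on the operation, GeoPandas may only warn and still return a result, and that result is not meaningful. Use gdf2 = gdf2.to_crs(gdf1.crs) to align them before spatial joins, overlays, or comparisons.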
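The alignment step might look like the sketch below. The district polygon, the sensor point, and their coordinates are assumptions; the point layer is deliberately delivered in Web Mercator to mimic a mismatched source.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical polygon layer in WGS84.
districts = gpd.GeoDataFrame(
    {"district": ["central"]},
    geometry=[box(13.0, 52.0, 14.0, 53.0)],
    crs="EPSG:4326",
)

# Hypothetical point layer that arrives in Web Mercator.
sensors = gpd.GeoDataFrame(
    {"sensor": ["s1"]},
    geometry=[Point(13.405, 52.52)],
    crs="EPSG:4326",
).to_crs(epsg=3857)

# Align the point layer to the polygon layer before joining.
if sensors.crs != districts.crs:
    sensors = sensors.to_crs(districts.crs)

joined = gpd.sjoin(sensors, districts, how="inner", predicate="within")
print(joined[["sensor", "district"]])
```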

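Geometry column contains invalid geometries. Shapefiles sometimes contain self-intersecting or otherwise invalid polygons. Check with gdf.geometry.is_valid, then use gdf["geometry"] = gdf.geometry.buffer(0) to fix common geometry errors before spatial operations. Note that buffer(0) can silently drop part of a self-intersecting polygon; on GeoPandas 0.12 or later, gdf.geometry.make_valid() is the safer repair.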
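As an alternative to the buffer(0) trick, the sketch below repairs a deliberately invalid "bow-tie" polygon with GeoSeries.make_valid(), which is assumed to be available (GeoPandas 0.12 or later). The polygon and the id column are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A self-intersecting bow-tie: the ring crosses itself at (1, 1).
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
shapes = gpd.GeoDataFrame({"id": [1]}, geometry=[bowtie], crs="EPSG:3857")

# Count invalid geometries before the repair.
invalid_before = int((~shapes.geometry.is_valid).sum())

# make_valid() splits the bow-tie into two valid triangles.
shapes["geometry"] = shapes.geometry.make_valid()

invalid_after = int((~shapes.geometry.is_valid).sum())
print(invalid_before, invalid_after)
print(shapes.geometry.iloc[0].geom_type)
```

The repaired geometry may change type (here a single polygon becomes a multi-part geometry), so downstream code that expects plain polygons may need an explode() step.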
