Cookbook: Python & SQL
This reference provides essential snippets for cleaning, enriching, and transforming EUBUCCO data using GeoPandas or DuckDB.
Identifiers
Determining Block ID
EUBUCCO IDs are formatted as {uuid}-{index}. The prefix can be extracted to identify building blocks, i.e. clusters of adjacent buildings.
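A minimal pandas sketch of the prefix extraction described above. The column name `id`, the new column `block_id`, and the sample values are assumptions for illustration. Because a UUID itself contains hyphens, the split is done once from the right so that only the trailing index is removed.

```python
import pandas as pd

# Hypothetical sample; the "id" column name and values are assumptions.
df = pd.DataFrame({
    "id": [
        "123e4567-e89b-12d3-a456-426614174000-0",
        "123e4567-e89b-12d3-a456-426614174000-1",
        "9b2f6c1e-0d3a-4c55-8f7e-2a1b3c4d5e6f-0",
    ]
})

# Split once from the right: the UUID contains hyphens,
# so only the final "-{index}" suffix is stripped.
df["block_id"] = df["id"].str.rsplit("-", n=1).str[0]

# Number of buildings in each block
block_sizes = df.groupby("block_id").size()
```

Buildings that share a `block_id` belong to the same block, so `block_sizes` can be used to separate detached buildings (size 1) from larger clusters.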
Determining NUTS 0, 1, or 2 Region
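One possible approach, sketched under the assumption that each building carries a NUTS 3 code in a column named `region_id` (a hypothetical name, as are the sample values). NUTS codes are hierarchical: two characters identify the country (NUTS 0), and each further level adds one character, so the coarser levels can be obtained by truncating the code.

```python
import pandas as pd

# Hypothetical sample; a "region_id" column holding NUTS 3 codes is an assumption.
df = pd.DataFrame({"region_id": ["DE212", "FRK26", "ITC4C"]})

# Truncate the NUTS 3 code to obtain the coarser levels.
df["nuts0"] = df["region_id"].str[:2]  # country
df["nuts1"] = df["region_id"].str[:3]
df["nuts2"] = df["region_id"].str[:4]
```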
Attribute Metadata
Handling Confidence Values
Attributes taken from authoritative data have no explicit confidence score. Fill these missing values with 1.0 so that authoritative records are not dropped by confidence-based quality filters.
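A short sketch of the fill step in pandas. The column names (`height`, `height_confidence`) and the `_confidence` suffix convention are assumptions; adapt them to the actual schema.

```python
import pandas as pd

# Hypothetical sample; column names are assumptions.
df = pd.DataFrame({
    "height": [12.0, 8.5, 20.0],
    "height_confidence": [0.8, None, None],  # None: no score provided
})

# Treat a missing confidence score as full confidence.
confidence_cols = [c for c in df.columns if c.endswith("_confidence")]
df[confidence_cols] = df[confidence_cols].fillna(1.0)

# A threshold filter now keeps the rows that had no score.
reliable = df[df["height_confidence"] >= 0.9]
```

Note that this fills every missing score, so it is only appropriate when a missing score really does mean the value is authoritative rather than absent.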
Attribute Source Comparison
Determine whether an attribute was merged from an external source or estimated using ML by checking whether the attribute's source differs from the geometry's source.
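The comparison can be sketched as a simple boolean flag. The column names `geometry_source` and `height_source` and the source labels are hypothetical; the flag only states that the two sources differ, not which of the two cases applies.

```python
import pandas as pd

# Hypothetical sample; column names and source labels are assumptions.
df = pd.DataFrame({
    "geometry_source": ["gov-cadastre", "osm", "osm"],
    "height_source": ["gov-cadastre", "gov-lidar", None],
})

# True where the attribute has a source and it differs from the geometry's source.
# The notna() guard keeps rows without an attribute source from being flagged.
df["height_source_differs"] = (
    df["height_source"].notna()
    & (df["height_source"] != df["geometry_source"])
)
```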
Custom Building Type Harmonization
Map raw subtypes from the source datasets to a custom building type classification.
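A minimal sketch using a lookup dictionary. The column name `subtype_raw`, the subtype values, and the target classes are all placeholders for illustration; unmapped subtypes fall back to `"other"`.

```python
import pandas as pd

# Hypothetical sample; column name and category labels are assumptions.
df = pd.DataFrame({"subtype_raw": ["apartments", "warehouse", "school", "kiosk"]})

# Custom classification: raw subtype -> harmonized type
custom_types = {
    "apartments": "residential",
    "house": "residential",
    "warehouse": "industrial",
    "school": "public",
}

# Subtypes missing from the mapping become "other".
df["custom_type"] = df["subtype_raw"].map(custom_types).fillna("other")
```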
Geometry
Decoding WKB and WKT
Geometries are stored as Well-Known Binary (WKB). Use these methods for manual decoding or to export human-readable Well-Known Text (WKT).
import pandas as pd
import geopandas as gpd
from shapely import wkb
# Load raw parquet (geometry is binary)
df = pd.read_parquet("data.parquet")
# Fast decoding of WKB column to Shapely objects
gdf = gpd.GeoDataFrame(
    df,
    # The column holds WKB bytes, so decode with from_wkb (not from_wkt)
    geometry=gpd.GeoSeries.from_wkb(df["geometry"]),
    # OR geometry=df["geometry"].apply(wkb.loads),
    crs="EPSG:3035"
)
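For the WKT export mentioned above, a small sketch with Shapely: each WKB value is decoded and its `.wkt` representation stored in a new column. The in-memory sample and the column name `geometry_wkt` are assumptions used to keep the example self-contained.

```python
import pandas as pd
from shapely import wkb
from shapely.geometry import Polygon

# Hypothetical sample: one square footprint stored as WKB bytes.
footprint = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
df = pd.DataFrame({"geometry": [footprint.wkb]})

# Decode each WKB value and re-encode it as human-readable WKT.
df["geometry_wkt"] = df["geometry"].apply(lambda b: wkb.loads(b).wkt)
```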
Coordinate Reference System (CRS) Transformation
Convert building geometries from the projected CRS ETRS89-extended / LAEA Europe (EPSG:3035) to WGS84 (EPSG:4326, longitude/latitude in degrees).
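A brief sketch of the reprojection with GeoPandas `to_crs`. The sample point is the projection origin of EPSG:3035 (false easting 4321000, false northing 3210000), which corresponds to 10°E, 52°N; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Sample point at the EPSG:3035 projection origin (10°E, 52°N).
gdf = gpd.GeoDataFrame(geometry=[Point(4321000, 3210000)], crs="EPSG:3035")

# Reproject to WGS84. In the result, x is longitude and y is latitude.
gdf_wgs84 = gdf.to_crs("EPSG:4326")

lng = gdf_wgs84.geometry.x.iloc[0]
lat = gdf_wgs84.geometry.y.iloc[0]
```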
Centroid Generation
Replace building footprints with centroids to reduce computational overhead for point-in-polygon operations or visualization.
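A minimal sketch of the replacement with GeoPandas. The sample footprint and the `id` value are placeholders. Computing centroids while the data is still in the projected CRS (EPSG:3035) avoids the inaccuracy of centroids computed on degree coordinates.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical sample: a single 10 m x 10 m footprint.
gdf = gpd.GeoDataFrame(
    {"id": ["a-0"]},
    geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])],
    crs="EPSG:3035",
)

# Returns a new GeoDataFrame whose geometry column holds the centroids;
# the attribute columns and the CRS are kept.
points = gdf.set_geometry(gdf.geometry.centroid)
```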
H3 Grid Aggregation
Assign buildings to cells of the hexagonal H3 grid to aggregate them for analysis.