Metadata Files
EUBUCCO includes comprehensive metadata files to support data exploration, validation, analysis, and reproducibility. These files are organized into four main categories:
| Category | Files | Description |
|---|---|---|
| Data Files | eubucco_lat_lon.parquet | Lightweight building dataset |
| Boundaries | LAU-cities-2016.parquet, NUTS-regions-2016.parquet | Administrative boundaries used |
| Statistics | city-stats.parquet, region-stats.parquet, prediction-eval-metrics.parquet | Aggregated statistics and evaluation metrics |
| Reference Tables | building-type-harmonization.csv, input-datasets-metadata.xlsx | Source dataset information and type mappings |
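As a starting point for exploring these files, the sketch below records the categories from the table and picks a suitable reader for each file based on its extension. The helper names (`reader_name`, `load`) and the local directory layout are hypothetical. It also assumes that every Parquet file except the lightweight centroid file carries a geometry column that `geopandas.read_parquet` can read, and that `pandas.read_excel` has an Excel engine such as `openpyxl` available.

```python
from pathlib import Path

# Categories and file names as listed in the table above.
METADATA_FILES = {
    "Data Files": ["eubucco_lat_lon.parquet"],
    "Boundaries": ["LAU-cities-2016.parquet", "NUTS-regions-2016.parquet"],
    "Statistics": [
        "city-stats.parquet",
        "region-stats.parquet",
        "prediction-eval-metrics.parquet",
    ],
    "Reference Tables": [
        "building-type-harmonization.csv",
        "input-datasets-metadata.xlsx",
    ],
}

# Assumption: only the centroid file is a plain (non-geospatial) table.
TABULAR_PARQUET = {"eubucco_lat_lon.parquet"}


def reader_name(filename: str) -> str:
    """Return the name of the pandas/geopandas reader suited to a file."""
    path = Path(filename)
    suffix = path.suffix.lower()
    if suffix == ".parquet":
        if path.name in TABULAR_PARQUET:
            return "pandas.read_parquet"
        return "geopandas.read_parquet"
    if suffix == ".csv":
        return "pandas.read_csv"
    if suffix == ".xlsx":
        return "pandas.read_excel"
    raise ValueError(f"Unsupported file type: {filename}")


def load(directory: str, filename: str):
    """Load one metadata file from a local directory (hypothetical layout)."""
    path = Path(directory) / filename
    name = reader_name(filename)
    if name == "geopandas.read_parquet":
        import geopandas as gpd  # imported lazily; only needed for geometry

        return gpd.read_parquet(path)
    import pandas as pd

    readers = {
        "pandas.read_parquet": pd.read_parquet,
        "pandas.read_csv": pd.read_csv,
        "pandas.read_excel": pd.read_excel,
    }
    return readers[name](path)
```

With the files downloaded to a local folder, `load("metadata", "city-stats.parquet")` would return a GeoDataFrame under these assumptions.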
Data Files
eubucco_lat_lon.parquet
A lightweight version of the EUBUCCO v0.2 dataset containing only building centroids (latitude/longitude coordinates) and footprint area, without full footprint geometries.
Boundary Files
LAU-cities-2016.parquet
City (LAU) administrative boundaries used in EUBUCCO v0.2, based on 2016 boundaries.
Contents: city_id, region_id and geometry for each city.
NUTS-regions-2016.parquet
Regional (NUTS 0-3) administrative boundaries used in EUBUCCO v0.2, based on 2016 boundaries.
Contents: region_id, region_name, and geometry for each NUTS region at all levels (0-3).
NUTS Code Discrepancies
EUBUCCO uses modified NUTS 2016 boundaries with two regional merges (DEB33 → DEB3H, UKD73 → UKD47) and one reconstructed region: UKN1 was missing from the official download and was rebuilt from its NUTS 3 components.
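When joining EUBUCCO tables with external data keyed on official NUTS 2016 codes, the two merges need to be accounted for. The sketch below reads each arrow as "official code → code under which the merged region appears in EUBUCCO", which is an assumption about the notation. The function name is hypothetical.

```python
# Merges copied from the text above. Reading the arrow as
# "official code -> EUBUCCO code" is an assumption.
NUTS_MERGES = {"DEB33": "DEB3H", "UKD73": "UKD47"}


def to_eubucco_region_id(nuts_code: str) -> str:
    """Map an official NUTS 2016 code to the code assumed to be used in EUBUCCO."""
    code = nuts_code.strip().upper()
    # Codes not affected by a merge are returned unchanged.
    return NUTS_MERGES.get(code, code)
```

Applying this to the external table's region column before a join keeps rows for DEB33 and UKD73 from being dropped as unmatched.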
Statistics Files
city-stats.parquet
Comprehensive building stock statistics aggregated at the city (LAU) level. This GeoDataFrame includes city geometry and provides detailed metrics for each city.
Administrative Information
- city_id, region_id (NUTS 3), country
- City geometry (CRS: EPSG:3035)
Building Counts by Source
- n_gov, n_osm, n_msft: Total counts by geometry source
Data Quality Indicators
- Ground truth counts: Buildings with attributes from the same source as geometry
  - n_gt_type, n_gt_subtype, n_gt_height, n_gt_floors, n_gt_construction_year
- Merged attribute counts: Buildings with attributes merged from different sources
  - n_merged_type, n_merged_subtype, n_merged_height, n_merged_floors, n_merged_construction_year
- Estimated attribute counts: Buildings with estimated attributes
  - n_estimated_type, n_estimated_subtype, n_estimated_height, n_estimated_floors
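These counts can be combined into a per-attribute provenance breakdown. The following is a minimal sketch that works on a single row represented as a dictionary; the function name is hypothetical and the example values are made up for illustration. Shares are computed relative to buildings that have the attribute from any of the three origins, not relative to all buildings in the city.

```python
def provenance_shares(row: dict, attribute: str) -> dict:
    """Share of ground truth, merged, and estimated values for one attribute."""
    counts = {
        "ground_truth": row.get(f"n_gt_{attribute}", 0),
        "merged": row.get(f"n_merged_{attribute}", 0),
        # No estimated count is listed for construction_year, hence the default.
        "estimated": row.get(f"n_estimated_{attribute}", 0),
    }
    total = sum(counts.values())
    if total == 0:
        return {origin: 0.0 for origin in counts}
    return {origin: count / total for origin, count in counts.items()}


# Made-up values for a single city, for illustration only.
example_row = {"n_gt_height": 600, "n_merged_height": 150, "n_estimated_height": 250}
shares = provenance_shares(example_row, "height")
```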
Building Type Distributions
- Main types: Residential, non-residential (counts and areas)
- Subtypes: Commercial, industrial, agricultural, public, others, detached, semi-detached, terraced, apartment (counts and areas)
Attribute Distributions
- Height bins: 0-5m, 5-10m, 10-20m, >20m
- Floor bins: 0-3, 4-6, >6 floors
- Construction year bins: ≤1900, 1901-1970, 1971-2000, >2000
- Footprint area bins: 0-25m², 25-100m², 100-500m², >500m²
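To reproduce these bins on individual buildings, a value can be assigned to its bin with a sorted list of edges. The sketch below copies the bin labels from the list above. Whether a value that falls exactly on an edge (for example a height of 5 m) belongs to the lower or the upper bin is not stated here, so the code assumes upper edges are inclusive, in line with the "≤1900" and ">2000" labels. The attribute keys and function name are hypothetical.

```python
from bisect import bisect_left

# Upper edges and labels copied from the bins listed above.
BINS = {
    "height": ([5, 10, 20], ["0-5m", "5-10m", "10-20m", ">20m"]),
    "floors": ([3, 6], ["0-3", "4-6", ">6"]),
    "construction_year": (
        [1900, 1970, 2000],
        ["≤1900", "1901-1970", "1971-2000", ">2000"],
    ),
    "footprint_area": ([25, 100, 500], ["0-25m²", "25-100m²", "100-500m²", ">500m²"]),
}


def bin_label(attribute: str, value: float) -> str:
    """Return the bin label for a value (assumption: upper edges are inclusive)."""
    edges, labels = BINS[attribute]
    # bisect_left puts a value equal to an edge into the lower bin.
    return labels[bisect_left(edges, value)]
```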
Area Metrics
- Total footprint area and floor area
- Breakdowns by source (gov, osm, msft), type, and subtype
region-stats.parquet
Building stock statistics aggregated at the regional (NUTS 3) level. Contains the same metrics as city-level statistics but aggregated to NUTS 3 regions, including regional geometry.
prediction-eval-metrics.parquet
Regional (NUTS 2) evaluation metrics for building attribute estimation models. This GeoDataFrame provides comprehensive performance metrics for predicted building attributes.
Administrative Information
- region_id (NUTS 2), country
- Regional geometry (CRS: EPSG:3035)
Sample Sizes
- n: Total number of buildings
- n_gt_binary_type, n_gt_type, n_gt_residential_type, n_gt_height, n_gt_floors: Ground truth counts per attribute
Categorical Variable Metrics
For binary_type, type, and residential_type:
- Overall classification metrics: F1 score (macro and micro), Cohen's kappa
- Per-class F1 scores: Individual F1 scores for each building type/subtype
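For readers less familiar with these metrics, the sketch below computes per-class, macro, and micro F1 along with Cohen's kappa from their standard definitions. It illustrates what the reported numbers mean; it is not the code used to produce the file, and the labels in the example are made up.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1, macro F1, micro F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0
    # Macro: unweighted mean over classes. Micro: pooled counts.
    macro = sum(per_class.values()) / len(per_class)
    total_tp = sum(tp.values())
    micro = 2 * total_tp / (2 * total_tp + sum(fp.values()) + sum(fn.values()))
    return per_class, macro, micro


def cohens_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if expected == 1:
        return float("nan")  # undefined when only one class occurs
    return (observed - expected) / (1 - expected)


# Made-up labels, for illustration only.
y_true = ["residential"] * 4 + ["non-residential"] * 2
y_pred = ["residential", "residential", "residential", "non-residential",
          "non-residential", "residential"]
per_class, macro_f1, micro_f1 = f1_scores(y_true, y_pred)
kappa = cohens_kappa(y_true, y_pred)
```

Macro F1 weights every class equally, so it is the more sensitive of the two to poor performance on rare building types. For single-label classification, micro F1 equals overall accuracy.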
Continuous Variable Metrics
For height and floors:
- Overall metrics: MAE, RMSE, R² score
- Binned metrics: MAE and RMSE by value ranges
  - Height: 0-5m, 5-10m, 10-20m, >20m
  - Floors: 0-3, 3-6, >6
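The overall and binned metrics can be illustrated with a short sketch based on the standard definitions of MAE, RMSE, and R². Two assumptions are made here that the description above does not confirm: errors are grouped by the ground truth value (not the predicted value), and upper bin edges are inclusive. The sample heights are made up.

```python
import math
from bisect import bisect_left

# Height bins as listed above (assumption: upper edges inclusive).
HEIGHT_EDGES = [5, 10, 20]
HEIGHT_LABELS = ["0-5m", "5-10m", "10-20m", ">20m"]


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when all true values are identical.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def binned_mae(y_true, y_pred, edges=HEIGHT_EDGES, labels=HEIGHT_LABELS):
    """MAE per bin, grouping by the true value (an assumption)."""
    groups = {}
    for t, p in zip(y_true, y_pred):
        groups.setdefault(labels[bisect_left(edges, t)], []).append(abs(p - t))
    return {label: sum(errs) / len(errs) for label, errs in groups.items()}


# Made-up heights in metres, for illustration only.
y_true = [3.0, 6.0, 12.0, 24.0]
y_pred = [4.0, 6.0, 10.0, 21.0]
overall = regression_metrics(y_true, y_pred)
per_bin = binned_mae(y_true, y_pred)
```

Binned errors matter because tall buildings are rare: an overall MAE dominated by low-rise buildings can hide much larger errors in the >20m bin.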
External Validation
For height:
- Microsoft height comparison: Metrics comparing predicted height with heights from Microsoft's GlobalMLBuildingFootprints dataset
Reference Tables
building-type-harmonization.csv
Mapping table showing how building types from various source datasets are harmonized to the standardized EUBUCCO building type classification.
input-datasets-metadata.xlsx
Comprehensive metadata table providing detailed information about all source datasets used in EUBUCCO v0.2.
Contents
- Dataset identification: Name, country, geographic coverage
- Access information: Data owner, license, access date, download links or procedures
- Technical details: File format, data structure, attribute availability
- Integration information: Processing workflow and integration approach