Data Schema Reference
Core Data Fields
| Group | Attribute | Type | Definition | Example |
|---|---|---|---|---|
| Identifiers | id | string | Unique ID composed of a Block ID and sequence number. | 3ba70f9963714924-0 |
| | region_id | string | NUTS3 regional identifier. | FR931 |
| | city_id | string | Local Administrative Unit (LAU) identifier for the city. | FR93066 |
| Attributes | type | category | Binary usage type. | residential |
| | subtype | category | Detailed usage type. | terraced |
| | height | float | Distance in meters from ground floor to top of building. | 23.9 |
| | floors | float | Total number of above-ground floors. | 3.5 |
| | construction_year | integer | Year construction finished (not renovation year). | 1963 |
| Geometry | geometry | binary | Footprint geometry projected in ETRS89 (EPSG:3035) encoded as WKB. Units in meters. | 01030000... |
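To illustrate the `id` layout described in the table, the sketch below splits an identifier into its Block ID and sequence number. It assumes the two parts are joined by the last hyphen, as in the example value; the names `BuildingId` and `split_building_id` are hypothetical and not part of the dataset tooling.

```python
from typing import NamedTuple


class BuildingId(NamedTuple):
    block_id: str
    sequence: int


def split_building_id(building_id: str) -> BuildingId:
    """Split an id such as '3ba70f9963714924-0' into Block ID and sequence number.

    Assumption: the sequence number is the integer after the last hyphen.
    """
    block_id, sep, sequence = building_id.rpartition("-")
    if not sep or not block_id or not sequence.isdigit():
        raise ValueError(f"unexpected id format: {building_id!r}")
    return BuildingId(block_id, int(sequence))


print(split_building_id("3ba70f9963714924-0"))
```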
Auxiliary Data Fields
| Group | Attribute | Type | Definition | Example |
|---|---|---|---|---|
| Confidence | type_confidence | float | Relative intersection of footprints (merged) or calibrated class probability (predicted). Range: [0, 1]. | 0.95 |
| | subtype_confidence | float | Relative intersection of footprints (merged) or calibrated class probability (predicted). Range: [0, 1]. | 0.64 |
| | height_confidence_lower | float | Min source value (merged) or lower 95% bootstrap CI (predicted). | 7.5 |
| | height_confidence_upper | float | Max source value (merged) or upper 95% bootstrap CI (predicted). | 10.5 |
| | floors_confidence_lower | float | Min source value (merged) or lower 95% bootstrap CI (predicted). | 2.7 |
| | floors_confidence_upper | float | Max source value (merged) or upper 95% bootstrap CI (predicted). | 3.2 |
| | construction_year_confidence_lower | integer | Min source year (merged) or lower 95% bootstrap CI (predicted). | 1990 |
| | construction_year_confidence_upper | integer | Max source year (merged) or upper 95% bootstrap CI (predicted). | 2000 |
| Sources | geometry_source | category | Origin of footprint geometry. | gov-france |
| | type_source | category | Origin of type attribute. | osm |
| | subtype_source | category | Origin of subtype attribute. | estimated |
| | height_source | category | Origin of height attribute. | estimated |
| | floors_source | category | Origin of floors attribute. | gov-france |
| | construction_year_source | category | Origin of construction_year attribute. | gov-france |
| Source IDs | geometry_source_id | string | Primary identifier from the source provider. | BATIMENT000... |
| | type_source_ids | array | List of IDs from source(s) contributing to the type attribute. | ['osm_123'] |
| | subtype_source_ids | array | List of IDs from source(s) contributing to the subtype attribute. | ['osm_123'] |
| | height_source_ids | array | List of IDs from source(s) contributing to the height attribute. | ['ign_456'] |
| | floors_source_ids | array | List of IDs from source(s) contributing to the floors attribute. | ['ign_456'] |
| | construction_year_source_ids | array | List of IDs from source(s) contributing to the construction_year. | ['ign_456'] |
| Source values | subtype_raw | string | The original, unmapped building use type from the source dataset. | Einfamilienhaus |
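The `<attr>_confidence_lower` and `<attr>_confidence_upper` pairs can be read as an interval around the attribute value. The helper below is a minimal sketch of computing the interval width while tolerating missing bounds; the function name `interval_width` is hypothetical, and treating both `None` and NaN as "missing" is an assumption about how the data is loaded.

```python
import math
from typing import Optional


def interval_width(lower: Optional[float], upper: Optional[float]) -> Optional[float]:
    """Return upper - lower, or None when either bound is missing (None or NaN)."""
    for bound in (lower, upper):
        if bound is None or math.isnan(bound):
            return None
    return upper - lower


# Example values from the height_confidence_lower / height_confidence_upper rows.
print(interval_width(7.5, 10.5))
# Missing bounds yield None instead of a NaN width.
print(interval_width(float("nan"), float("nan")))
```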
Incorrect OSM Source IDs
The `geometry_source_id` and `<attr>_source_ids` fields currently contain a sequential index rather than the original OSM ID.
NUTS Code Discrepancies
EUBUCCO uses modified NUTS 2016 boundaries with two regional merges (DEB33 → DEB3H, UKD73 → UKD47) and one reconstructed region: UKN1 was missing from the official download and was reconstructed from its NUTS 3 components.
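When joining external data keyed by official NUTS 2016 codes, the two merges may need to be applied first. The sketch below assumes the arrows mean that the first region was merged into the second; the names `NUTS_MERGES` and `to_dataset_region_id` are hypothetical.

```python
# Region merges listed above, read as: official code -> code used in this dataset.
# The direction of the mapping is an assumption based on the arrow notation.
NUTS_MERGES = {
    "DEB33": "DEB3H",
    "UKD73": "UKD47",
}


def to_dataset_region_id(nuts_code: str) -> str:
    """Map a NUTS 3 code to the region_id used here; unaffected codes pass through."""
    return NUTS_MERGES.get(nuts_code, nuts_code)


print(to_dataset_region_id("DEB33"))
print(to_dataset_region_id("FR931"))
```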
Example
| id | region_id | city_id | type | subtype | height | floors | construction_year | type_confidence | subtype_confidence | height_confidence_lower | height_confidence_upper | floors_confidence_lower | floors_confidence_upper | construction_year_confidence_lower | construction_year_confidence_upper | geometry_source | type_source | subtype_source | height_source | floors_source | construction_year_source | geometry_source_id | type_source_ids | subtype_source_ids | height_source_ids | floors_source_ids | construction_year_source_ids | subtype_raw | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| aec95c88db604a13-0 | FRF31 | FR54029 | residential | detached | 3.3 | 1.0 | nan | 1.0 | 0.99 | 3.2 | 3.5 | 1.0 | 1.0 | <NA> | <NA> | gov-france | osm | estimated | estimated | osm | <NA> | BATIMENT0000000334652469 | ['lorraine-latest_1372503'] | <NA> | <NA> | ['lorraine-latest_1372503'] | <NA> | Indifférencié | POLYGON ((402...)) |
| 9d1c73d618ef419e-0 | FRF31 | FR54037 | residential | detached | 5.7 | 2.0 | nan | 0.86 | 0.84 | 5.3 | 6.2 | 2.0 | 2.0 | <NA> | <NA> | gov-france | osm | estimated | estimated | osm | <NA> | BATIMENT0000002101827291 | ['lorraine-latest_1661552'] | <NA> | <NA> | ['lorraine-latest_1661552'] | <NA> | Indifférencié | POLYGON ((404...)) |
| ce19c9e4976d46f6-2 | FRF31 | FR54039 | non-residential | others | 3.8 | 2.0 | nan | 0.84 | 0.84 | nan | nan | 2.0 | 2.0 | <NA> | <NA> | gov-france | osm | osm | gov-france | osm | <NA> | BATIMENT0000000334989890 | ['lorraine-latest_805209'] | ['lorraine-latest_805209'] | <NA> | ['lorraine-latest_804619'] | <NA> | Indifférencié | POLYGON ((407...)) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Technical Notes
Building Type Classification
- We use a two-tier hierarchy to categorize building usage. The `type` field provides a high-level binary split, while the `subtype` field contains the granular class.
  - Type: `residential`. Subtypes: `detached` (single-family), `semi-detached` (duplex), `terraced` (row house), and `apartment` (multi-family).
  - Type: `non-residential`. Subtypes: `industrial`, `commercial`, `public`, `agricultural`, and `others` (e.g. garages).
- The `subtype_raw` column preserves the original source classification string prior to this harmonization. Refer to the metadata file on building type mapping for more details.
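The two-tier hierarchy above can be written down as a lookup table, which is handy for checking that a `type` and `subtype` pair is consistent. This is a minimal sketch; the names `SUBTYPES_BY_TYPE` and `is_consistent` are hypothetical.

```python
# Subtypes allowed under each type, as listed in the classification above.
SUBTYPES_BY_TYPE = {
    "residential": {"detached", "semi-detached", "terraced", "apartment"},
    "non-residential": {"industrial", "commercial", "public", "agricultural", "others"},
}


def is_consistent(building_type: str, subtype: str) -> bool:
    """Return True if the subtype belongs to the given type."""
    return subtype in SUBTYPES_BY_TYPE.get(building_type, set())


print(is_consistent("residential", "terraced"))
print(is_consistent("residential", "industrial"))
```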
Geometry Encoding
- The `geometry` column is stored as Well-Known Binary (WKB). This is a compact, machine-readable format optimized for GIS tools like PostGIS, QGIS, and GeoPandas.
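To show what the WKB bytes contain, the sketch below decodes a plain 2D polygon using only the Python standard library. It assumes standard WKB without an embedded SRID and handles only the Polygon type; the function name `parse_wkb_polygon` and the sample square are hypothetical and are not taken from the dataset.

```python
import struct
from typing import List, Tuple

Ring = List[Tuple[float, float]]


def parse_wkb_polygon(wkb: bytes) -> List[Ring]:
    """Decode a plain 2D WKB polygon into a list of rings of (x, y) tuples."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 3:  # 3 = Polygon in the WKB specification
        raise ValueError(f"expected a 2D polygon, got geometry type {geom_type}")
    (num_rings,) = struct.unpack_from(endian + "I", wkb, 5)
    offset = 9
    rings: List[Ring] = []
    for _ in range(num_rings):
        (num_points,) = struct.unpack_from(endian + "I", wkb, offset)
        offset += 4
        # Each point is two 8-byte doubles (x, y), in meters for EPSG:3035.
        coords = struct.unpack_from(f"{endian}{2 * num_points}d", wkb, offset)
        offset += 16 * num_points
        rings.append(list(zip(coords[0::2], coords[1::2])))
    return rings


# Hypothetical 10 m x 10 m square footprint, built only to exercise the parser.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
sample = struct.pack("<BII", 1, 3, 1) + struct.pack("<I", len(square))
sample += b"".join(struct.pack("<dd", x, y) for x, y in square)

print(sample.hex()[:8])  # same prefix as the example value in the schema table
print(parse_wkb_polygon(sample))
```

In practice a GIS library would normally do this decoding, for example `shapely.wkb.loads` or `geopandas.GeoSeries.from_wkb`; the manual parser is only meant to make the byte layout visible.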
Attribute Sources
- Attribute source categories are `osm`, `msft`, `gov-<region>`, and `estimated` (where `<region>` represents the specific regional authoritative dataset identifier).
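A source label can be split into its category and, for governmental sources, the region part. The sketch below assumes labels follow exactly the four patterns listed above; the function name `parse_source` is hypothetical.

```python
from typing import Optional, Tuple


def parse_source(source: str) -> Tuple[str, Optional[str]]:
    """Return (category, region); region is None unless the label is gov-<region>."""
    if source in {"osm", "msft", "estimated"}:
        return source, None
    prefix = "gov-"
    if source.startswith(prefix) and len(source) > len(prefix):
        return "gov", source[len(prefix):]
    raise ValueError(f"unknown source label: {source!r}")


print(parse_source("gov-france"))
print(parse_source("osm"))
```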
Attribute Uncertainty
- Ground Truth: `NaN` in a `<attr>_confidence` column indicates the attribute was provided directly by the geometry source; no merging or machine learning estimation was required.
- Interpretation: For `type_confidence` and `subtype_confidence`, a value of 0.6 implies that 60% of buildings in that cohort are statistically expected to be correctly classified.
- Methodology: For details on how the uncertainty of attribute estimation and merging is quantified, please refer to the Uncertainty Section.
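Under the interpretation above, summing calibrated confidences over a set of buildings gives the expected number of correct classifications. The sketch below does this while counting NaN entries separately, since those attributes come straight from the geometry source; the function name `summarize_confidence` is hypothetical, and treating `None` like NaN is an assumption about how the data is loaded.

```python
import math
from typing import Iterable, Optional, Tuple


def summarize_confidence(values: Iterable[Optional[float]]) -> Tuple[float, int, int]:
    """Return (expected correct count, scored count, count from the geometry source)."""
    expected, scored, direct = 0.0, 0, 0
    for value in values:
        if value is None or math.isnan(value):
            direct += 1  # NaN: attribute provided directly by the geometry source
        else:
            expected += value  # calibrated probability of a correct classification
            scored += 1
    return expected, scored, direct


# Illustrative confidences plus one NaN entry (values chosen for the example).
expected, scored, direct = summarize_confidence([1.0, 0.86, 0.84, float("nan")])
print(f"{expected:.2f} of {scored} scored buildings expected correct; {direct} direct")
```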
Attribute Merging
- Source Mismatch: If `geometry_source` and `<attr>_source` differ, the attribute has been merged between datasets.
- Data Fusion: If `<attr>_source_ids` contains multiple values, the final value has been aggregated from multiple source buildings.
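The two rules above can be combined into a small provenance check for a single attribute. This is a sketch under assumptions: the labels `direct`, `merged`, `fused`, `estimated`, and `missing` are hypothetical, as is the function name, and an `estimated` source is treated as a prediction rather than a merge.

```python
from typing import Optional, Sequence


def attribute_provenance(
    geometry_source: str,
    attr_source: Optional[str],
    attr_source_ids: Optional[Sequence[str]],
) -> str:
    """Classify how one attribute value was obtained, following the rules above."""
    if attr_source is None:
        return "missing"
    if attr_source == "estimated":
        return "estimated"  # assumption: predicted values are not counted as merges
    if len(attr_source_ids or []) > 1:
        return "fused"  # aggregated from multiple source buildings
    if attr_source != geometry_source:
        return "merged"  # attribute taken from a different dataset than the geometry
    return "direct"


print(attribute_provenance("gov-france", "osm", ["lorraine-latest_1372503"]))
print(attribute_provenance("gov-france", "gov-france", None))
```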
Attribute Prediction
- See the Prediction Evaluation section for an assessment of attribute estimation quality.