Challenges of large open-source datasets for building detection in Africa

Written by Sara Verbič. Work performed by Sara Verbič, Devis Peressutti, Nejc Vesel, Matej Batič, Žiga Lukšič, Jan Geršak, Matic Lubej and Nika Oman Kadunc.

We wanted to create a large training dataset for automated building detection in Africa, so we reviewed open-source datasets for that purpose. Machine-generated datasets lack the accuracy needed to serve as reference training data and depend heavily on the satellite imagery they were inferred from. Manual labelling on the chosen target satellite imagery remains the most accurate way to create ML-ready training datasets, although it is very laborious and expensive.

World population and settlements continue to grow rapidly, and much of this transition is especially noticeable in developing countries. Identifying the locations and footprints of buildings provides data for various practical and scientific purposes, such as population mapping, urban management, and environmental sciences. This data is especially useful in developing regions, where alternative data sources, especially from local authorities, may not be available.

Different methods for estimating the location and extent of buildings are viable. Although very accurate, manual processing of an aerial or satellite image is not scalable: it is time-consuming and laborious. Machine learning (ML), computer vision and remote sensing have come a long way in automatically and reliably delineating buildings, partly thanks to the increasing availability of very high-resolution imagery (with a spatial resolution of less than 1 m). But open-source imagery with sufficient resolution, which would be around 50 cm, is usually only available for a small number of locations around the globe. This presents a challenge because it is crucial to ensure the training dataset is geographically diverse and includes a wide range of rural and urban locations with different building styles. However, not even ML methods coupled with very high-resolution imagery, such as imagery provided by Airbus Pleiades and Maxar WorldView, are sufficient to derive accurate estimates of building footprints. This is due to limitations of the input imagery, mostly the challenges posed by optical satellite image acquisitions, as shown in Figure 1. In addition, differences in spectral, temporal and spatial resolution between satellite image providers need to be considered when selecting the appropriate target imagery.

Fig 1. Examples of different acquisition conditions for the same location in Maxar WorldView imagery (WorldView © 2020 MAXAR Technologies). The examples show differences in sun azimuth and elevation angle, and in satellite viewing angle, which affect how buildings are depicted and the shadows they cast. ML models might struggle to cope with such variations in appearance.

On the other hand, we need to keep in mind that reference labels used in ML may be subject to errors. In addition, we need to admit that identifying buildings remains a challenging task in many scenarios, considering:

geological and vegetation features that can be confused with built-up structures;
areas characterised by small buildings, which may appear only a few pixels wide at this resolution;
buildings constructed with natural materials that tend to blend in with the surrounding rural or desert areas;
clusters of buildings that are very close together and may not be easily told apart.

Such scenarios are common in Africa, which accounts for about 20% of the Earth’s total land area and presents a wide variety of terrain and building types. Africa’s scarcity of reference data makes the validity of the building footprints all the more important.
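To make the small-building case concrete, here is a minimal sketch of how many pixels a building spans at a given ground sampling distance. The function name and the example widths are our own illustrative assumptions, not measurements from the locations we reviewed.

```python
def pixels_across(building_width_m: float, gsd_m: float = 0.5) -> float:
    """Approximate number of pixels a building spans along one side.

    gsd_m is the ground sampling distance (metres per pixel); 0.5 m
    corresponds to the 50 cm imagery discussed in the text.
    """
    if gsd_m <= 0:
        raise ValueError("gsd_m must be positive")
    return building_width_m / gsd_m


# Illustrative widths only (hypothetical values).
for width_m in (2.0, 4.0, 10.0):
    print(f"{width_m} m wide -> {pixels_across(width_m):.0f} px at 0.5 m per pixel")
```

A hut a couple of metres wide covers only a handful of pixels per side, which leaves very little signal for a model to separate it from rocks or vegetation.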

In the following, we focus on reviewing a few data sources and labels openly available for Africa, namely Microsoft’s Building Footprints (MBF [1]), Google’s Open Buildings (GOB [2]) and the Replicable AI for Microplanning (RAMP [3]) datasets. We review these datasets with the intent of using them as reference labels for our Hierarchical Detector (HIECTOR) [4]. As we are interested in running HIECTOR on the entire continent of Africa, we will be looking at different regions and areas. If you are interested in the topic of building segmentation, we recommend the excellent overview provided in Azavea’s blog-post series [5]. We are also aware of and have looked at other open-source datasets, such as the SpaceNet challenge datasets [6] and the manual building footprints available on Radiant Earth MLHub [7]. However, these datasets have limited spatial coverage, which limits their use as training datasets for predicting buildings on the entire continent. A further resource of open-source labels worth mentioning is the Datasets section of the curated Satellite Imagery Deep Learning repository [8].

Like many things, the analysed data sources have their pros and cons. Their main advantage is that they are accessible and can often be used under permissive licences, suitable for both academic and commercial applications. In general, built-up areas are detected, but the accuracy of the building footprint polygons differs significantly between datasets and locations. More on that later! Focusing first on the detection of built-up areas, the MBF dataset does not cover the whole of Africa, meaning that building footprints are missing for certain areas. Microsoft did not process imagery if tiles were dated before 2014 or had a low probability of detection [9]. The downside of both the MBF and GOB datasets is that the imagery acquisition dates are unknown. GOB predictions were created in August 2022 [2], but the most recent image for some locations was at that time several years old or not available at all, and the dataset contains no information about the year of acquisition of the satellite imagery used. The MBF dataset carries an image acquisition date attribute for each building footprint where the vintage of the imagery used could be deduced. However, our locations of interest did not have such information. All that is known is that the imagery used is from Bing Maps, including Maxar and Airbus imagery taken between 2014 and 2022 [9]. The lack of this information makes working with the data much more difficult, as it is not possible to interpret the data on the appropriate underlying imagery. The same must be considered when assessing the quality of building footprint labels, because datasets do not necessarily reflect the state of the (latest) underlying satellite imagery. In our case, this applies to the MBF, GOB and HIECTOR datasets.

To see how building footprint polygons differ between datasets and types of location, we focus on four locations of interest in different regions of Africa. They were chosen to represent different types of settlements: rural, urban, and high-rise. As an additional point of comparison, we also include labels detected using HIECTOR, which are for the moment only available for Dakar, Senegal.

A general definition of a building footprint is a polygon, or set of polygons, representing a specific building in the physical world, providing a ground-centred representation of a building’s location, shape, dimensions, and area [10]. Getting all this information from overhead satellite imagery might not be possible, so algorithms often provide an approximation of the footprint, depending on the image acquisition conditions and the shape of the building. For instance, for some of the high-rise buildings shown here, shadows and the orientation of the building occlude the actual building footprint. In other cases, such as terraced houses or blocks, the separation of building footprints does not correspond to physically visible features. For this reason, some automated algorithms are more successful in detecting and delineating building rooftops rather than the actual footprint.
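A rough way to see why rooftops and footprints diverge for tall buildings is to estimate how far the roof appears shifted from the base in an off-nadir image. The sketch below uses a simple flat-terrain approximation (shift ≈ height × tan of the viewing angle measured from vertical). This is our own simplifying assumption for illustration, not a model used by any of the datasets reviewed, and the function names are hypothetical.

```python
import math


def rooftop_offset_m(building_height_m: float, view_angle_deg: float) -> float:
    """Approximate horizontal shift of a roof relative to its base.

    Assumes flat terrain and a viewing angle measured from the vertical
    (0 degrees means the satellite looks straight down).
    """
    return building_height_m * math.tan(math.radians(view_angle_deg))


def rooftop_offset_px(building_height_m: float, view_angle_deg: float,
                      gsd_m: float = 0.5) -> float:
    """Same shift expressed in pixels for a given ground sampling distance."""
    return rooftop_offset_m(building_height_m, view_angle_deg) / gsd_m


# Hypothetical example: a 30 m tall building viewed 20 degrees off vertical.
shift_m = rooftop_offset_m(30.0, 20.0)
shift_px = rooftop_offset_px(30.0, 20.0)
print(f"roof shifted by about {shift_m:.1f} m ({shift_px:.0f} px at 0.5 m per pixel)")
```

Even moderate viewing angles move the roof outline of a high-rise by many pixels, so a polygon traced around the roof and one traced around the base can both look "correct" on the image while describing different things.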

For a fair comparison, we visualise the datasets on the corresponding satellite imagery used to infer the building footprints. Comparison across different satellite imagery is difficult due to differences in image acquisition conditions and processing. However, despite these challenges, the estimated building footprints should provide a reliable estimate of the actual building position regardless of the image they were derived from.

Below you can find image examples taken from the larger areas of interest (AOI) that we investigated. Red bounding boxes surrounding buildings represent predictions from Microsoft’s Building Footprints dataset, with Bing Maps as the underlying satellite imagery. Green bounding boxes represent Google’s Open Buildings dataset, whose underlying imagery is Google Earth satellite imagery. RAMP predictions are marked with yellow bounding boxes, with detection results obtained using high-resolution Pleiades imagery. Unlike the MBF and GOB datasets, the RAMP project provides the model and excellent instructions to derive the footprint polygons for any AOI. In addition, the location of interest in Dakar includes detections from HIECTOR, which are marked with blue bounding boxes and were also obtained using Pleiades imagery.

Fig 2. Bounding box color and underlying satellite imagery source for each dataset.

What first catches the eye is that the building footprints obtained through the RAMP prediction model have a distinctively amorphous shape, lacking well-defined edges, and consequently do not accurately represent the ground truth. Another limitation of RAMP is the incomplete extraction of larger buildings, as well as the inability to fully encompass the visible structure depicted in the satellite imagery, which is visible in some building footprints. This is of course a limitation of the model and not of the imagery, and is likely due to a lack of generalisation to new areas. One issue observed for GOB in all locations is that partial buildings are predicted, probably due to the stitching of satellite tiles from different acquisition takes.

The MBF dataset tends to be less precise when detecting large buildings, as these structures are often combined into a single block of polygons, resulting in a loss of detail. This factor must be considered particularly in densely populated areas where attached buildings are common. The RAMP prediction model is subject to a similar issue, but with smaller buildings.

Fig 3. Example location from Serrekunda, Gambia. Images show the considered open-source datasets overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Areas featuring tall buildings generally exhibit higher accuracy due to their orderly designs. Such a scenario is observable in districts of Cairo. However, despite the homogeneity of structures, certain building footprints in the GOB dataset are fragmented. This can be attributed to the presence of smaller constructions situated on the rooftops of high-rise buildings. Such structures have diverse reflection characteristics and varying roof heights, leading to their identification as individual buildings. In contrast, MBF groups high-rise buildings together in the same bounding box, despite their clear separation. The GOB dataset also struggles to represent high-rise buildings accurately: in this specific location, its bounding boxes vary in their delineation. Specifically, certain bounding boxes capture the outline of the building’s roof, while others delineate the outline of the structure on the ground.

Fig 4. Example location from Cairo, Egypt. Images show the considered open-source datasets overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

Quickly changing scenery is not a rare occurrence in Africa, so datasets and satellite imagery do not always reflect the current state of the area of interest. This is highlighted by the observed differences in detections and the temporal diversity of the underlying satellite imagery in the following comparison. As previously mentioned, it is unclear which specific dates the MBF and GOB datasets refer to, which can create difficulties in using these two datasets. A notable issue is that the model fails to detect numerous objects, including both smaller objects in the north and larger objects in the south of the location of interest. This presents a challenge to the accuracy and reliability of the model.

Fig 5. Example location from Gatumba, Bujumbura. Images show the considered open-source datasets overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS). In this example, large differences in land cover can be seen between images, making it difficult to assess the temporal veracity.

The limitations of outdated datasets and inaccuracies in building detection models are clearly evident in this particular location of interest. The MBF dataset only includes buildings that were present before 2017, while the RAMP prediction model shows significant inaccuracies in detecting buildings in this location, with numerous buildings going undetected and several false detections of larger size.

Comparing the number of detections across the datasets, it is evident that the GOB dataset stands out with a higher number of smaller detections, some of which may not actually be buildings, but rather rocks or vegetation. Detections in the dataset are already filtered and include only those with a confidence score of 0.6 or greater. Google recommends filtering the detections based on confidence scores to achieve a desired precision level depending on the application. The dataset quality varies per location, and Google provides a CSV file with suggested score thresholds to obtain the recommended precision level for each download tile.
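As a minimal sketch of the filtering step described above, the snippet below keeps only detections at or above a chosen confidence score. The sample rows are made up, the column names (`latitude`, `longitude`, `area_in_meters`, `confidence`) are our assumption about the CSV layout, and the 0.7 threshold is a placeholder: in practice the value would come from the per-tile thresholds that Google suggests.

```python
import csv
import io

# Made-up sample rows; the column layout is an assumption, not copied from a real tile.
SAMPLE_CSV = """latitude,longitude,area_in_meters,confidence
14.7167,-17.4677,120.5,0.92
14.7170,-17.4680,8.3,0.61
14.7172,-17.4683,45.0,0.74
"""


def filter_by_confidence(rows, threshold):
    """Keep detections whose confidence score is at or above the threshold."""
    return [row for row in rows if float(row["confidence"]) >= threshold]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_by_confidence(rows, threshold=0.7)  # placeholder threshold
print(f"kept {len(kept)} of {len(rows)} detections")
```

Raising the threshold trades recall for precision, so it tends to remove many of the small, doubtful detections at the cost of also dropping some real small buildings.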

Fig 6. Example location from Modderspruit, South Africa. Images show the considered open-source datasets overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, MBF, GOB and RAMP datasets shown together on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

In Dakar, we selected an urban area of interest, where buildings are densely positioned in close proximity to one another. Comparing the datasets, we observed that HIECTOR’s detections were the most comprehensive. However, there is still much room for improvement, as some of the bounding boxes overlap and there are some false detections, such as parking spaces and random sections of roads. The RAMP prediction model was largely unsuccessful in extracting individual building footprints. Most of the detected footprints contain multiple buildings, which poses a significant challenge for accurate analysis and evaluation of the dataset. The MBF dataset presents a comparable challenge, albeit with fewer such instances observed. In addition, its main drawback is that a lot of buildings were not detected. Analysing the GOB dataset proved difficult due to the different viewing angle of the underlying satellite imagery. Nevertheless, the high frequency of smaller detections remains a persistent issue.

Fig 7. Example location from Dakar, Senegal. Images show the considered open-source datasets overlaid onto the imagery used for their inference for a fairer comparison. Top-left, MBF dataset shown in red on Bing Maps imagery. Top-right, GOB dataset shown in green on Google Earth imagery. Bottom-left, RAMP detection shown in yellow on Airbus Pleiades imagery. Bottom-right, HIECTOR predictions shown in blue on Airbus Pleiades imagery from 2021 (© CNES 2021, Distribution AIRBUS DS).

The above review was carried out to assess open-source building footprint datasets for use as training datasets over a large AOI, i.e., Africa, for our own building detection model HIECTOR. The presented quality assessment might not apply to other use-cases, for instance a rough estimation of buildings in a given area. Nonetheless, for our use-case, we would like to offer the following suggestions and warnings:

Consider in advance which satellite imagery will be used as the base layer, and be aware of the variations introduced by different acquisition conditions, particularly if you plan to use multiple sources of images.
Manually labelled or validated building footprints provide the most accurate estimation of building footprints, although their spatial coverage is very limited. Make sure to check the open-source datasets for manually labelled data.
If you target large areas and manually labelled footprints are not an option, consider machine-generated datasets. However, the accuracy and coverage of machine-generated building footprints vary greatly across regions, so make sure to evaluate their accuracy using the target imagery of choice.
Although machine-generated datasets might not be accurate enough to be used as training labels, they could provide a starting point to speed up manual labelling and validation. This, again, depends on the region and on the complexity of the buildings and landscape being depicted.
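One possible way to carry out the accuracy check suggested above is to match machine-generated detections against a small set of manually labelled references and report precision and recall. The sketch below is a simplified illustration, not the evaluation we ran: it uses axis-aligned bounding boxes instead of full polygons, a greedy one-to-one matching, and an IoU threshold of 0.5, all of which are our own assumptions. The coordinates are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(predictions: List[Box], references: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match each prediction to its best unmatched reference box."""
    unmatched = list(references)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda ref: iou(pred, ref), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)  # each reference can be matched only once
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(references) if references else 0.0
    return precision, recall


# Made-up boxes: one good detection, one false detection, one missed building.
references = [(0, 0, 10, 10), (20, 0, 30, 10)]
predictions = [(1, 1, 10, 10), (50, 50, 60, 60)]
precision, recall = precision_recall(predictions, references)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because the references should be labelled on the target imagery, the same detections can score differently depending on which imagery the labels were drawn on, which is exactly why the evaluation needs to be repeated per region and per image source.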

Accurate and up-to-date building footprint data is crucial for various practical and scientific purposes. New technologies have made it possible to automatically delineate buildings. However, limitations of the input imagery and reference labels still pose challenges, particularly in developing areas where accurate data may be scarce. To address this issue, we explored various open-source datasets available for Africa. We identified some of their drawbacks and showed that the quality of the datasets varies from location to location, and we believe it is very important to evaluate the suitability and limitations of these datasets for specific regions and applications. Further efforts are needed to improve the accuracy and coverage of such datasets; nevertheless, they provide a promising path towards more accurate and comprehensive building footprint data, especially for regions where alternative data sources may not be available.

[1] https://www.microsoft.com/en-us/maps/building-footprints

[2] https://sites.research.google/open-buildings

[3] https://rampml.global/

[4] https://github.com/sentinel-hub/hiector

[5] https://www.azavea.com/blog/2022/10/26/automated-building-footprint-extraction-open-datasets/

[6] https://spacenet.ai/datasets/

[7] https://mlhub.earth/datasets?tags=building+footprints

[8] https://github.com/satellite-image-deep-learning/datasets

[9] https://github.com/microsoft/GlobalMLBuildingFootprints

[10] https://www.safegraph.com/blog/building-footprint