DBF R&D

Can we create a prediction system for buildings based on location data?

With Sayjel Vijay Patel and Kam-Ming Mark Tam

Project Focus 🧿

In this project we explored how to apply machine learning (ML) to augment the architectural briefing process. Anyone can access troves of free data describing building type, tenants, property values, and energy use by location. With ML, it is possible to identify hidden relationships between these datasets and use them to predict the requirements of a new building at a specific location. We tried to predict:

  1. Building Attributes using Random Forest Classifiers
  2. Building Footprint using Conditional Generative Adversarial Nets (cGANs)

How to predict design attributes by location? A complete workflow

Data Pre-processing 🗃️

This was the most important step in the project. We used OpenStreetMap (OSM), a collaborative, free, global mapping project whose API provides 2D and 3D geodata about different cities in JSON format. OSM is a 'structured dataset', meaning that textual and numeric elements such as coordinates, street names, postal codes, building geometry, and building attributes are organized in table format. The data was cleaned and preprocessed according to the needs of each ML model, as described below.

Random Forest Classifier

The OSM data has a wide variety of features, and finding the useful ones was a tough task. The structured format, however, made it easier to train a supervised ML model, the random forest classifier. The target variable for this experiment was Building Attribute Type (i.e., building use). To predict the target variable for a new development site, we trained the model on the following OSM features: building perimeter, number of neighbor buildings, postal code, street name, site area, site perimeter, and number of buildings within a 50 m, 100 m, and 200 m radius. To learn more about this approach, check out our Towards Data Science article.
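One of the features listed above, the number of buildings within a given radius, can be sketched in a few lines. This is a minimal illustration rather than the project's actual pipeline: the function name `count_within_radius`, the use of building centroids, and the toy coordinates are all assumptions, and the coordinates are assumed to be in a projected system measured in metres.

```python
import math

def count_within_radius(site, buildings, radii=(50.0, 100.0, 200.0)):
    """Count buildings whose centroid lies within each radius of the site.

    Hypothetical helper: `site` and each entry of `buildings` are (x, y)
    centroids in metres (assumed projected coordinates, not lat/lon).
    """
    counts = {}
    for radius in radii:
        counts[radius] = sum(
            1 for centroid in buildings if math.dist(site, centroid) <= radius
        )
    return counts

# Toy data for illustration only.
site = (0.0, 0.0)
buildings = [(10.0, 0.0), (30.0, 30.0), (60.0, 60.0), (150.0, 0.0), (300.0, 0.0)]

features = count_within_radius(site, buildings)
print(features)
```

Each count becomes one numeric column in the training table, alongside the other features such as site area and perimeter.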

cGANs

Feature engineering was done using Shapely, a Python package for manipulating and analyzing planar geometric objects, and PySAL (Python Spatial Analysis Library), for geospatial analysis and for creating new urban features. The structured data was then converted to a GeoDataFrame using GeoPandas in order to plot the building footprints and contextual data such as neighbors, streets, and building sites.
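To make the geometric features concrete, here is a dependency-free sketch of how a site's area and perimeter can be derived from its outline. In Shapely the same quantities are exposed as the `area` and `length` properties of a `Polygon`; the helper names and the rectangular site below are hypothetical, and coordinates are assumed to be in metres.

```python
import math

def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula."""
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def polygon_perimeter(coords):
    """Sum of edge lengths, including the closing edge."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))

# Hypothetical 20 m x 10 m rectangular site.
site_outline = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]

site_area = polygon_area(site_outline)
site_perimeter = polygon_perimeter(site_outline)
print(site_area, site_perimeter)
```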

One set of input and output images used for model training

Model Architecture (cGANs) 🧩

The input for training the cGAN was a set of images of the features mentioned above, created using Matplotlib, a popular Python data visualization library. After analyzing the initial results, we filtered the data and grouped it into categories based on similarities in geometry, address, program, and city, both to improve performance and to optimize the cGAN model.
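The idea of an input/output image pair can be illustrated without any plotting library. The sketch below rasterizes a site outline and a building footprint onto the same pixel grid, giving one binary "input" image and one binary "target" image. It is a simplified stand-in for the Matplotlib rendering described above, not the project's code: the function names, the 8 x 8 grid, and the geometries are all assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, size, extent):
    """Return a size x size grid of 0/1 sampled at pixel centres over [0, extent]."""
    step = extent / size
    grid = []
    for row in range(size):
        y = (row + 0.5) * step
        grid.append([
            1 if point_in_polygon((col + 0.5) * step, y, polygon) else 0
            for col in range(size)
        ])
    return grid

# Hypothetical geometry: an 8 m square site containing a 4 m square footprint.
site = [(0.0, 0.0), (8.0, 0.0), (8.0, 8.0), (0.0, 8.0)]
footprint = [(2.0, 2.0), (6.0, 2.0), (6.0, 6.0), (2.0, 6.0)]

input_image = rasterize(site, size=8, extent=8.0)        # what the generator is conditioned on
target_image = rasterize(footprint, size=8, extent=8.0)  # what it should learn to produce

for row in target_image:
    print("".join("#" if pixel else "." for pixel in row))
```

Row 0 of each grid corresponds to the lowest y values, so the grid would need flipping vertically to match the usual top-down image convention.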

The cGAN model architecture used to predict the building footprint based on site context

Results 🧪

Random Forest Classifier

The random forest classifier achieved a prediction accuracy of 61.1%. The accuracy was low in this example because we used only 100 data points in our training data.
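For reference, prediction accuracy is simply the share of held-out samples whose predicted class matches the true one, and with few samples the estimate itself is noisy. The sketch below computes both the accuracy and a rough binomial standard error. The labels are invented for illustration and are not the project's data; the function names are likewise hypothetical.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def standard_error(acc, n):
    """Binomial standard error of an accuracy estimated from n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)

# Invented labels, for illustration only.
y_true = ["residential", "commercial", "residential", "retail", "residential"]
y_pred = ["residential", "residential", "residential", "retail", "commercial"]

acc = accuracy(y_true, y_pred)
se = standard_error(acc, len(y_true))
print(f"accuracy = {acc:.1%}, standard error = {se:.1%}")
```

The standard error shrinks with the square root of the sample count, which is one reason a larger dataset would give a more trustworthy figure.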

cGANs

Results of the cGAN without considering site shape as an input

Results from the cGAN model for a small, narrow site in New York City