DBF R&D

📄 Read more about the project in the Medium article

Though artificial general intelligence (AGI) is years away, great progress is being made in Semantic AI, one aspect of AGI that focuses on consuming, interpreting, and classifying information shared by humans. This project explores how Semantic AI can help city designers move beyond objective data to draw insights from various experiential data sources.

Project Focus 🧿

In this project we tried to address major challenges faced during urban analysis, such as incomplete and outdated datasets and the cost of APIs. We recognized a need for contextual data sources to improve the accuracy and potency of our Neighborhood Quality Index.

Semantic AI helped us extract neighborhood-specific keywords and summaries from alternative datasets such as social media reviews. We integrated this methodology to evaluate neighborhood quality using qualitative data for any location.

Summarized information about the Kallang neighborhood, Singapore. This popular, centrally located neighborhood has the most Airbnb listings and reviews in Singapore.

Data Collection 📦

Collecting valuable and relevant data sources is a tedious, costly, and time-consuming process for urban designers. For this project we used the open dataset Inside Airbnb, which compiles publicly available Airbnb listings and reviews. We chose it because the information on Airbnb focuses on practical, location-specific insights that help customers rate, rank, and compare different properties based on criteria such as proximity to public transport stations, noise levels, parks, and other important characteristics. In turn, user reviews can create a better perception of neighborhoods, districts, and even cities. An additional benefit is the global availability of these datasets, which are crowd-sourced from individual reviews and opinions.

The dataset includes detailed listings and reviews in a tabular format and the neighborhoods of Singapore in GeoJSON format. Since our project focuses on mining insights from qualitative inputs, we extracted features like "Neighborhood Overview" and "Guest Review" for further analysis.
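As a rough illustration of this extraction step, the sketch below groups neighborhood overviews and guest reviews by neighborhood using only the Python standard library. The column names (`id`, `neighbourhood_cleansed`, `neighborhood_overview`, `listing_id`, `comments`) are assumptions about the Inside Airbnb CSV exports, the function name is hypothetical, and the sample rows are made up for the example; this is not the project's actual code.

```python
import csv
import io
from collections import defaultdict


def merge_text_by_neighborhood(listings_file, reviews_file):
    """Collect neighborhood overviews and guest reviews per neighborhood."""
    texts = defaultdict(list)
    listing_to_hood = {}

    # Each listing contributes its host-written neighborhood overview.
    for row in csv.DictReader(listings_file):
        hood = row["neighbourhood_cleansed"]  # assumed column name
        listing_to_hood[row["id"]] = hood
        overview = (row.get("neighborhood_overview") or "").strip()
        if overview:
            texts[hood].append(overview)

    # Each review is attached to the neighborhood of the listing it refers to.
    for row in csv.DictReader(reviews_file):
        hood = listing_to_hood.get(row["listing_id"])
        comment = (row.get("comments") or "").strip()
        if hood and comment:
            texts[hood].append(comment)

    return {hood: " ".join(parts) for hood, parts in texts.items()}


# Tiny in-memory stand-ins for listings.csv and reviews.csv (made-up rows).
listings_csv = io.StringIO(
    "id,neighbourhood_cleansed,neighborhood_overview\n"
    "1,Kallang,Close to the MRT station.\n"
    "2,Bedok,Quiet area near the park.\n"
)
reviews_csv = io.StringIO(
    "listing_id,comments\n"
    "1,Great food nearby.\n"
    "2,Very peaceful street.\n"
    "1,Easy walk to the stadium.\n"
)

merged = merge_text_by_neighborhood(listings_csv, reviews_csv)
print(merged["Kallang"])
```

With real data, the two `io.StringIO` objects would be replaced by opened CSV files; the result is one merged text per neighborhood, ready for summarization.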

Text Summarization 📝

The next step was to summarize the user-generated guest reviews and neighborhood overviews, written by guests, travelers, and hosts, for different neighborhoods of Singapore using Natural Language Processing (NLP). This task was done with the Natural Language Toolkit (NLTK).

Model pipeline showing the process of converting merged text data into a meaningful summarized paragraph

First, we concatenated the reviews for each neighborhood and removed special characters, extra spaces, and stop-words. Next, we tokenized the text, splitting it into sentences and words so that each word can be counted within the sentence it belongs to. We then measured the occurrence and frequency of keywords and passages within the reviews. Finally, we extracted the sentences and keywords that scored highest by frequency to create a summary of the most common elements of a location. The resulting summary informs a more qualitative understanding of the neighborhood, in contrast to more objective map data.
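The steps above can be sketched as a small frequency-based extractive summarizer. The project used NLTK, but its exact calls are not given here, so this sketch swaps in plain regular expressions for the tokenizers and a tiny hand-written stop-word list. All function names, the scoring rule (summed relative word frequency), and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "was", "to", "of", "in",
              "it", "for", "with", "very", "this", "that", "on", "at"}


def clean(text):
    """Drop special characters and collapse extra whitespace."""
    text = re.sub(r"[^A-Za-z0-9.!?' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (stand-in for a real tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    """Lowercased word tokens with stop-words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=2, num_keywords=5):
    """Return (summary, keywords) built from the most frequent words."""
    text = clean(text)
    sentences = split_sentences(text)
    freq = Counter(words(text))
    if not freq:
        return "", []
    top = max(freq.values())
    # Score each sentence by the summed relative frequency of its words.
    scores = {i: sum(freq[w] / top for w in words(s))
              for i, s in enumerate(sentences)}
    # Keep the highest-scoring sentences, restored to their original order.
    best = sorted(sorted(scores, key=scores.get, reverse=True)[:num_sentences])
    summary = " ".join(sentences[i] for i in best)
    keywords = [w for w, _ in freq.most_common(num_keywords)]
    return summary, keywords


reviews = (
    "Great food near the MRT station. The room was clean. "
    "Food stalls and the MRT station are a short walk away. "
    "Check-in was smooth."
)
summary, keywords = summarize(reviews, num_sentences=2, num_keywords=3)
print(summary)
print(keywords)
```

Summing word scores favors longer sentences; dividing each score by the sentence length is a common alternative if short, dense sentences should rank higher.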

In this new data layer, user experience is translated into feedback and reviews for many other users, adding another dimension to urban analytics. We believe these insights will help city designers choose the right building type and function to enhance a particular location.

Next Steps 💡

This project was later expanded with the help of generative AI. Transformer-based Large Language Models (LLMs) were used to generate and evaluate neighborhood design proposals based on the principles, frameworks, and research of urban design experts.

*A research paper on the topic, titled "Generative Pre-Trained Transformers for 15-Minute City Design", has been submitted to CAADRIA'23.