The building industry is exploding with data sources that impact the energy performance of the built environment and health and well-being of occupants. Spreadsheets just don’t cut it anymore as the sole analytics tool for professionals in this field. Participating in mainstream data science courses might provide skills such as programming and statistics, however the applied context to buildings is missing, which is the most important part for beginners.
This course focuses on the development of data science skills for professionals specifically in the built environment sector. It targets architects, engineers, construction and facilities managers with little or no previous programming experience. An introduction to data science skills is given in the context of the building life cycle phases. Participants will use large, open data sets from the design, construction, and operations of buildings to learn and practice data science techniques.
Essentially this course is designed to add new tools and skills to supplement spreadsheets. Major technical topics include data loading, processing, visualization, and basic machine learning using the Python programming language, the Pandas data analytics and sci-kit learn machine learning libraries, and the web-based Colaboratory environment. In addition, the course will provide numerous learning paths for various built environment-related tasks to facilitate further growth.
Section 1: Introduction to Course and Python Fundamentals – In this introduction, an overview of key Python concepts is covered as well as the motivating factors for building industry professionals to learn to code. The NZEB at the NUS School of Design and Environment is introduced as an example of a building that uses various data science-related technologies in its design, construction, and operations.
Section 2: Introduction to the Pandas Data Analytics Library and Design Phase Application Example – The foundational functions of Pandas are demonstrated in the context of the integrated design process through the processing of data from parametric EnergyPlus models. Further future learning path examples are introduced for the Design Phase including building information modeling (BIM) using Revit or Rhino, spatial analytics, and building performance modeling Python libraries.
Section 3: Pandas Analysis of Time-Series Data from IoT and Construction Phase Application Example – Time-series analysis Pandas functions are demonstrated in the Construction Phase through the analysis of hourly IoT data from electrical energy meters. Further future learning path examples are introduced for the Construction Phase including project management, building management system (BMS) data analysis, and digital construction such as robotic fabrication.
Section 4: Statistics and Visualization Basics and Operations Phase Application Example – Various statistical aggregations and visualization techniques using Pandas and the Seaborn library are demonstrated on Operations Phase occupant comfort data from the ASHRAE Thermal Comfort Database II. Further future learning path examples are introduced for the Operations Phase including energy auditing, IoT analysis, and occupant detection and reinforcement learning.
Section 5: Introduction to Machine Learning for the Built Environment – This concluding section gives an overview of the motivations and opportunities for the use of prediction in the built environment. Prediction, classification, and clustering using the sci-kit learn library is demonstrated on electrical meter and occupant comfort data. The course is concluded with suggestions on more in-depth Python, Data Science, and Statistics courses on EDx.
Development of this curriculum was led by Dr. Clayton Miller with support from NUS students Ananya Joshi, Charlene Tan, Chun Fu, James Zhan, Mahmoud Abdelrahman, Matias Quintana, Miguel Martin, and Vanessa Neo.