Research Proposal: Sea Level Rise Effect on Coastal Housing Prices

By BozandtheBozzers: Andrew Bosland, Will Rothpletz, Carter Karinshak, and Linh Thai

Research Question

Over the past two decades, climate change has become a topical issue that many people worry about because of its implications for our planet and quality of life. One effect of rising temperatures is the melting of polar ice caps and glaciers, which adds large volumes of freshwater to the oceans and gradually raises sea levels. Sea level rise brings numerous problems, one of which is its impact on coastal cities and homes. As sea levels rise, they erode beaches, reduce usable land, and jeopardize the integrity of structures that were not built for much higher sea levels. Rising sea levels are a major concern for many coastal homeowners, and we plan to analyze this problem further.

In this project, we want to analyze the correlation between global sea level change and listing prices for U.S. houses in coastal regions in order to measure the relationship between rising sea levels and coastal housing prices. Specifically, we want to compare listing prices for houses in different zip codes within the same coastal region. By comparing housing prices for zip codes directly on the coastline with those further inland, we can determine whether there is a correlation between housing prices and sea level rise. We also want to examine how the magnitude of this impact varies with how far inland a particular coastal zip code is. Finally, using the results of our analysis, we will attempt to predict coastal property listing values given the estimated rise in U.S. coastal sea levels over the next 10 years, using projections provided by the National Oceanic and Atmospheric Administration (NOAA).

To answer our question, we located a dataset from Data World that contains Zillow’s median home listing price for every zip code in the United States from January 2010 to September 2017. Each column holds the median listing price for a given month, and we plan to analyze the data from 2013 to 2017 because considerably more data was recorded in the later years of this dataset. Additionally, from Kaggle we located data on global sea level rise over nearly three decades (1993–2021). The primary variable we will use from this dataset is GMSL (Global Isostatic Adjustment (GIA) not applied) variation (mm) with respect to 20-year TOPEX/Jason collinear mean reference. Because GIA is not applied, this series does not correct for the slow vertical movement of the Earth’s crust, such as the ongoing rebound of land that was once weighed down by ice sheets. The 20-year TOPEX/Jason collinear mean reference is the baseline for each measurement: variations are reported relative to the mean sea surface averaged over 20 years along the TOPEX/Jason satellite tracks. Over the satellite record, this series shows an average rise of about 3.3 mm/year.
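A trend such as 3.3 mm/year is the slope of an ordinary least-squares line fitted to sea level against time. The sketch below shows that calculation in plain Python, assuming each sea level observation has been reduced to a (decimal year, GMSL in mm) pair; the function name `linear_trend` and the synthetic series are hypothetical and do not come from the Kaggle file.

```python
def linear_trend(times, values):
    """Ordinary least-squares slope of values against times.

    With times in decimal years and values in mm, the slope is in mm/year.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all observations share the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical synthetic series: roughly 10-day samples rising exactly 3.3 mm/year.
times = [1993 + k / 36.5 for k in range(365)]
values = [-40.0 + 3.3 * (t - 1993) for t in times]
print(round(linear_trend(times, values), 2))
```

On real data the fitted slope would also absorb seasonal and year-to-year variability, which is one reason the smoothed GMSL series may be useful alongside the raw one.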

For our hypothesis, we predict that sea level rise has a statistically significant impact on the listing prices of homes in U.S. coastal cities when compared to home prices in adjacent inland zip codes. To test this hypothesis, our null hypothesis is that the change in median coastal home listing prices associated with a 1 mm rise in global sea level is equal to 0, or β1 = 0, with all other variables held constant. Our alternative hypothesis is that this association is not equal to 0, or β1 ≠ 0, meaning that some relationship exists between these variables.
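The test of β1 = 0 can be sketched with a simple bivariate ordinary least-squares regression. The plain-Python version below is a minimal sketch, not our final model: it regresses a price measure on GMSL and returns the t-statistic for β1, but it omits the controls implied by "all other variables held constant". The function name `slope_t_test` and the sample numbers are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Fit y = beta0 + beta1 * x + error by OLS.

    Returns (beta0, beta1, t_stat), where t_stat tests H0: beta1 = 0.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    se_beta1 = math.sqrt(sse / (n - 2) / sxx)
    if se_beta1 == 0:
        # Perfect fit: report 0 for a flat line, otherwise an unbounded t.
        t_stat = 0.0 if beta1 == 0 else math.copysign(math.inf, beta1)
        return beta0, beta1, t_stat
    return beta0, beta1, beta1 / se_beta1


# Hypothetical data: GMSL (mm) and a coastal price measure for six months.
gmsl = [0, 1, 2, 3, 4, 5]
price_change = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9]
b0, b1, t = slope_t_test(gmsl, price_change)
print(f"beta1 = {b1:.3f}, t = {t:.1f}")
```

The t-statistic is compared against a t distribution with n − 2 degrees of freedom. Because both sea level and home prices tend to trend upward over time, a bivariate slope like this can reflect a shared time trend; comparing coastal zip codes against adjacent inland zip codes is one way to control for that.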

Necessary Data

As explained above, we currently have two datasets to merge for further investigation and analysis. The first dataset, sourced from Data World, contains the median monthly house listing price at the city and zip code level. It also includes listing data for non-coastal areas and for the nation as a whole, which will serve as a benchmark for our comparison and analysis. This dataset currently covers 2010–2017. The second dataset, from Kaggle, records the change in sea level as observed by satellites. It includes 9 variables and 1048 observations for the period 1993–2021. Variables of interest include GMSL (Global Isostatic Adjustment (GIA) not applied) variation (mm) with respect to 20-year TOPEX/Jason collinear mean reference and Smoothed (60-day Gaussian type filter) GMSL (GIA not applied) variation (mm). We will also examine whether applying GIA has any effect on our analysis. The original measurements come from NASA satellites, and we intend to look further into NASA's records for more recent data. After merging these two datasets, each observation in our final dataset will be a zip code–month. The necessary variables are the date, the median housing prices for select coastal cities, the month-over-month percentage change in price, and sea level data (both raw and smoothed).
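The month-over-month percentage change can be computed per zip code once its prices are sorted by month. The sketch below assumes each zip code's series is a chronologically ordered Python list with `None` marking months Zillow did not report; the function name `month_over_month_change` and the sample prices are hypothetical.

```python
def month_over_month_change(prices):
    """Percentage change from each month to the next.

    prices: median listing prices in chronological order, with None for
    missing months. Returns a list of the same length; an entry is None when
    the current or previous month is missing (or the previous price is 0).
    """
    if not prices:
        return []
    changes = [None]  # the first month has no previous month to compare to
    for prev, curr in zip(prices, prices[1:]):
        if prev is None or curr is None or prev == 0:
            changes.append(None)
        else:
            changes.append((curr - prev) / prev * 100.0)
    return changes


# Hypothetical prices for one placeholder zip code over four consecutive months.
series = {"xxxx1": [200000, 202000, None, 210000]}
for zip_code, prices in series.items():
    print(zip_code, month_over_month_change(prices))
```

Computing the change only between adjacent reported months avoids treating a multi-month gap as if it were a one-month change.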

For further data collection, we are considering expanding and refining our criteria for selecting coastal cities so that they represent different U.S. coastal regions (e.g., Northeast, Mid-Atlantic, Southeast, Gulf of Mexico). We expect that the correlation may differ depending on how close a zip code is to the coast.

Our final dataset may look like this:

Zip Code  City   Month  Year  Price  ChangePrice  TotalWeightedObservations  GMSL_noGIA  SmoothedGMSL_noGIA  GMSL_GIA  SmoothedGMSL_GIA
xxxx1     Miami  Jan    2013  $      %
xxxx2     Miami  Feb    2013  $      %
xxxx3     NY     Jan    2013  $      %
xxxx4     NY     Feb    2013  $      %

Currently, we plan to analyze the median monthly listing price of houses in a specific coastal zip code and compare it to the median house price in the adjacent zip codes further inland. To organize our data, we will create two subfolders within our inputs folder to compartmentalize our analysis. The first folder will contain only information on the prices of homes in coastal cities, along with a file outlining our criteria for what counts as a coastal city. Within this folder, we will group our data into folders by city/region so that our zip codes are organized in an easily digestible format. Each regional folder will in turn contain one folder per coastal zip code we are examining, holding the median house prices for that coastal zip code and the zip codes adjacent to it. Because each folder contains only the information for its region or zip code, we believe this structure will make our analysis significantly easier to perform. The second subfolder will contain all of the data we have collected on sea level rise, along with any information someone should know before using the sea level data. We believe that organizing our folders this way will not only make our repository easier for us to use, but will also help future visitors understand and interact with our work.
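The comparison between a coastal zip code and its adjacent inland zip codes can be expressed as a monthly price premium. The sketch below assumes prices are stored as dictionaries keyed by month strings such as "2013-01"; the function name `coastal_premium`, the premium definition (coastal price relative to the simple average of inland prices), and the placeholder zip codes are hypothetical choices, not a settled part of our method.

```python
def coastal_premium(coastal_prices, inland_prices_by_zip):
    """Monthly premium (%) of a coastal zip code over its inland neighbors.

    coastal_prices: {"YYYY-MM": median listing price} for the coastal zip code.
    inland_prices_by_zip: {inland zip code: {"YYYY-MM": median listing price}}.
    The premium is coastal / mean(inland) - 1, in percent. Months with no
    usable coastal or inland data are skipped.
    """
    premiums = {}
    for month in sorted(coastal_prices):
        coastal = coastal_prices[month]
        inland = [
            series[month]
            for series in inland_prices_by_zip.values()
            if series.get(month) is not None
        ]
        if coastal is None or not inland:
            continue
        inland_mean = sum(inland) / len(inland)
        if inland_mean == 0:
            continue
        premiums[month] = (coastal / inland_mean - 1.0) * 100.0
    return premiums


# Hypothetical placeholder zip codes and prices.
coastal = {"2013-01": 500000, "2013-02": 510000}
inland = {
    "xxxx5": {"2013-01": 400000, "2013-02": 400000},
    "xxxx6": {"2013-01": 400000},
}
print(coastal_premium(coastal, inland))
```

Tracking a premium rather than raw prices lets regional price movements that affect both coastal and inland zip codes largely cancel out, which is the intuition behind comparing adjacent zip codes.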

Transforming our data into its final form should not be a difficult task for our group, for several reasons. While our analysis of any correlations we discover will be complex, the data itself is simple in nature and comes well organized from common sources. For example, all of our housing price data comes from Zillow and is already broken out by zip code and month, so it should be easy to clean and manipulate as needed. However, this file is missing a significant amount of data for certain years, so we will need to take some time to determine which data is usable for our analysis. We will create a new variable for the percentage change in housing price at each observation to better measure and demonstrate the magnitude of change in listing values. The sea level data from NASA/Kaggle is also simple, and we will only need a small portion of it. The only concern we have noted with this dataset relates to changing its columns. The main issue we will likely encounter is reconciling the timelines of the two datasets, since the housing data is recorded monthly while the sea level data is recorded roughly every 10 days. Thus, our biggest task will likely be removing unnecessary data and then cleaning the remaining data so that we can merge the two datasets on a timeline that will provide fruitful results for our analysis. To accomplish this, we will likely have to transform our sea level data so that it follows a consistent and useful timeline for our analysis, most likely a monthly one.
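Aligning the two timelines could work by averaging the sea level observations within each calendar month and then joining them to the zip code–month housing rows. The sketch below assumes each sea level observation is a (decimal year, GMSL in mm) pair and each housing row is a dictionary with `zip`, `year`, `month`, and `price` keys; these field names and the helper functions are hypothetical. Mapping a decimal year to a month by splitting the year into twelve equal parts is an approximation that can misplace observations near month boundaries.

```python
from collections import defaultdict


def decimal_year_to_month(t):
    """Map a decimal year such as 2013.54 to a (year, month) pair.

    Approximation: each month is treated as exactly 1/12 of a year.
    """
    year = int(t)
    month = min(int((t - year) * 12) + 1, 12)
    return year, month


def monthly_means(observations):
    """Average (decimal_year, gmsl_mm) observations within each calendar month."""
    buckets = defaultdict(list)
    for t, gmsl_mm in observations:
        buckets[decimal_year_to_month(t)].append(gmsl_mm)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def merge_on_month(housing_rows, sea_level_by_month):
    """Attach the monthly mean GMSL to each zip code-month housing row.

    Rows whose month has no sea level data are dropped (an inner join).
    """
    merged = []
    for row in housing_rows:
        key = (row["year"], row["month"])
        if key in sea_level_by_month:
            merged.append({**row, "gmsl_mm": sea_level_by_month[key]})
    return merged


# Hypothetical observations and housing rows.
sea_level = monthly_means([(2013.02, 10.0), (2013.05, 12.0), (2013.10, 20.0)])
housing = [
    {"zip": "xxxx1", "year": 2013, "month": 1, "price": 300000},
    {"zip": "xxxx1", "year": 2013, "month": 2, "price": 303000},
    {"zip": "xxxx1", "year": 2013, "month": 3, "price": 305000},
]
for row in merge_on_month(housing, sea_level):
    print(row)
```

An inner join keeps only months present in both datasets; if sea level data were missing for a month, the corresponding housing rows would be dropped rather than filled with a guessed value.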

Revisions

  • Fixed typos.
  • Updated our Research Question section to include the analysis of different zip codes and their adjacent zip codes.
  • Updated our Necessary Data section to describe the new dataset on different zip codes and their respective housing prices.