Chapter 3 Data transformation

Importing the data into R was straightforward. We downloaded the data from Redfin, extracted it, and read it directly into an R data frame. We then converted a few variables to factor and date classes and used dplyr to keep only the columns and the time range (2018-01-01 to 2021-10-31) we wanted to focus on. Here are the first 6 rows of the dataset.

##   period_begin period_end table_id          state state_code   property_type
## 1   2021-10-01 2021-10-31       18 North Carolina         NC All Residential
## 2   2021-10-01 2021-10-31       17       Columbia         DC All Residential
## 3   2021-10-01 2021-10-31       46  West Virginia         WV All Residential
## 4   2021-10-01 2021-10-31       12     New Jersey         NJ All Residential
## 5   2021-10-01 2021-10-31       21        Georgia         GA All Residential
## 6   2021-10-01 2021-10-31       39      Louisiana         LA All Residential
##   median_sale_price median_list_price median_ppsf median_list_ppsf homes_sold
## 1            327800            340000         174              180      11101
## 2            706000            660000         529              528        840
## 3            282400            274300         152              151        496
## 4            409900            417100         251              255      11670
## 5            328200            330900         160              165      13306
## 6            257700            279300         153              160       2312
##   new_listings inventory months_of_supply avg_sale_to_list sold_above_list
## 1         9768     17638              1.6        1.0132368       0.5203135
## 2         1000      2362              2.8        1.0034682       0.3666667
## 3          518       842              1.7        0.9947753       0.3447581
## 4        11197     26931              2.3        1.0110519       0.5131962
## 5        13305     17973              1.4        1.0010887       0.4439351
## 6         2057      3595              1.6        0.9841027       0.2564879
##   off_market_in_two_weeks parent_metro_region
## 1               0.3020888        South Region
## 2               0.3279045        South Region
## 3               0.3634018        South Region
## 4               0.2899781    Northeast Region
## 5               0.5239125        South Region
## 6               0.5555043        South Region
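
The cleaning steps described at the start of this chapter can be sketched roughly as follows. This is a minimal illustration rather than the exact code we ran: the toy `raw` data frame stands in for the downloaded Redfin file, its values and the `unused_column` variable are made up, and the choice of which columns to convert to factors is an assumption. It also assumes that the column subset is done with `select()` and the date restriction with `filter()`.

```r
library(dplyr)

# Toy stand-in for the raw Redfin data (values are invented, not real figures).
# In practice this data frame would come from reading the extracted download.
raw <- data.frame(
  period_begin      = c("2017-12-01", "2018-01-01", "2021-10-01", "2021-11-01"),
  period_end        = c("2017-12-31", "2018-01-31", "2021-10-31", "2021-11-30"),
  state             = rep("Georgia", 4),
  state_code        = rep("GA", 4),
  property_type     = rep("All Residential", 4),
  median_sale_price = c(100000, 110000, 120000, 130000),
  unused_column     = 1:4,  # hypothetical column we do not want to keep
  stringsAsFactors  = FALSE
)

housing <- raw %>%
  # Convert character columns to date and factor classes
  mutate(
    period_begin  = as.Date(period_begin),
    period_end    = as.Date(period_end),
    state         = factor(state),
    state_code    = factor(state_code),
    property_type = factor(property_type)
  ) %>%
  # Keep only the columns of interest
  select(period_begin, period_end, state, state_code,
         property_type, median_sale_price) %>%
  # Keep only periods that fall within 2018-01-01 to 2021-10-31
  filter(period_begin >= as.Date("2018-01-01"),
         period_end   <= as.Date("2021-10-31"))

head(housing)
```

Note that `select()` only picks columns; restricting rows to a date range requires a row operation such as `filter()`, which is why the two steps are separate in this sketch.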