Final Project – DATA 73000

For this project, I wanted to learn more about the trees in New York City by exploring the 2015 Tree Census data from NYC Open Data. I was interested in trends in the city's tree population, such as species, health status, and tree density in each community district. I also wanted to see whether the density and health of trees correlated with other quality-of-life variables in each community district, such as health and economic conditions.

My interest in this topic comes from my work researching the relationship between public health and the built environment, particularly access to quality green space. Through my work (and also just from my own experience living in the city!), I’ve learned that green space contributes positively to mental wellbeing, but also that wealthier areas tend to have more access to amenities such as quality parks. I’m personally interested in exploring these datasets to see whether these patterns are reflected in the NYC Tree Census. Other audiences could include my co-workers or other researchers interested in this topic.

My datasets

To create these visualizations, I used three separate datasets. The main dataset was the NYC Tree Census from NYC Open Data, collected in 2015 by NYC Parks employees and a team of volunteers. I also sourced data from two other datasets: the Community Health Profiles from NYC Health, which aggregate data by community district from the 2019-2020 Community Health Survey, and median household income by community district from the 2020 US Census. These data were collected by NYC DOHMH and the US Census Bureau, respectively. From these datasets I used three variables: median household income (from the US Census); a “health score,” the percentage of residents who reported their own health as “excellent,” “very good,” or “good”; and physical activity, the percentage of residents who reported exercising in the past 30 days (both from the Community Health Survey).

A potential source of bias is the Community Health Survey, which has a relatively small sample size; the website cautions against using these data to characterize a community district. These data are also self-reported, although self-perceived health is still a very valuable measure of general wellbeing.

One compromise I made was using aggregate data rather than the full datasets, because of the size of the Census and Community Health Survey data. Since I considered this information supplementary to the main dataset (the NYC Tree Census), I did not want to take on the volume of data that would come with using all three large datasets. However, if I were primarily investigating the economic or health conditions of New Yorkers by community district, or looking at a smaller geographic area, I would opt to use the full datasets. Instead, I created my own dataset with community board number, health score, physical activity, and median household income, and then joined it with the Tree Census dataset by community district in Tableau.
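The join itself happened in Tableau, but the same logic can be sketched in Python for anyone working outside Tableau. This is a minimal sketch, not the exact setup used in this project: it assumes each Tree Census record is a dict with a community board field named `cb_num` (an assumed column name) holding a three-digit code such as "503", and that the hand-built profile table is a dict keyed by that same code. `join_by_community_board` is a hypothetical helper; trees whose board has no profile keep `None` for the profile fields, as in a left join.

```python
from typing import Dict, List


def join_by_community_board(
    trees: List[dict],
    cb_profiles: Dict[str, dict],
    key: str = "cb_num",  # assumed name of the census community board column
) -> List[dict]:
    """Left-join each tree record to the profile for its community board."""
    # Collect every profile field so unmatched trees still get all columns.
    profile_fields = sorted({field for p in cb_profiles.values() for field in p})
    joined = []
    for tree in trees:
        # Board codes are compared as strings, e.g. "503" for Staten Island CB 3.
        profile = cb_profiles.get(str(tree[key]), {})
        row = dict(tree)  # copy so the input records are not modified
        for field in profile_fields:
            row[field] = profile.get(field)  # None when the board has no profile
        joined.append(row)
    return joined
```

Keeping the profile table small (one row per community board) and looking it up per tree mirrors the choice described above of using aggregate data instead of the full survey and census files.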

Visualizations

#1: What are the most popular tree species throughout the five boroughs? This visualization shows the top 10 most prevalent tree species in NYC by borough. I wanted to create this visualization first because this information is what drew me to the dataset in the first place: I regularly look up trees in the NYC Tree Census when I’m walking through the city and see one I’m curious about. Through this visualization, I can see that the highest volume of trees seems to be in Queens, with the London Planetree being the most prevalent species. I was also surprised to learn that a majority of NYC’s cherry trees can be found in Queens!
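Counts like these are straightforward to compute from the raw census records. The sketch below assumes each record is a dict with `spc_common` (common species name) and `borough` fields, which are assumed column names, and reads "top 10 by borough" as the ten most common species citywide, each split by borough. The helper `top_species_borough_breakdown` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def top_species_borough_breakdown(
    trees: Iterable[dict], n: int = 10
) -> Dict[str, Counter]:
    """Find the n most common species citywide, then count each one by borough."""
    trees = list(trees)  # iterated twice, so materialize the records
    # Skip records with no species recorded (blank or missing spc_common).
    citywide = Counter(t["spc_common"] for t in trees if t.get("spc_common"))
    # Dicts preserve insertion order, so keys come out most common first.
    breakdown = {species: Counter() for species, _ in citywide.most_common(n)}
    for t in trees:
        species = t.get("spc_common")
        if species in breakdown:
            breakdown[species][t["borough"]] += 1
    return breakdown
```

Each value is a per-borough `Counter`, which maps directly onto a stacked or grouped bar chart with one bar per species.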

#2: Which NYC ZIP codes have the most trees? Next, I further investigated the location of trees in NYC by creating a map that breaks down the number of trees by ZIP code. I was curious to see which neighborhoods or boroughs had more trees and where they were lacking. Because trees are a key contributor to urban design, public health, and air quality, their location is very important. Through this map, I found that the most trees were located in Staten Island, which is mostly suburban, while the fewest trees appeared to be in Manhattan, the most densely populated borough. In the other boroughs (the Bronx, Queens, and Brooklyn), trees tended to be located on the outskirts of the city, closer to the more suburban areas. This map was important for visualizing the location of the trees spatially in the city. Creating it also prompted further questions that I explore in later visualizations, particularly whether a neighborhood’s number of trees correlates with other aspects of life in the neighborhood, such as health and income.
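The per-ZIP totals behind a map like this amount to a grouped count. In this sketch, `postcode` and `status` are assumed field names and `trees_per_zip` is a hypothetical helper; the optional `statuses` filter (for example `{"Alive"}`) is an assumption about how one might restrict the count, not necessarily what the map did.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def trees_per_zip(
    trees: Iterable[dict], statuses: Optional[Set[str]] = None
) -> List[Tuple[str, int]]:
    """Count tree records per ZIP code, with the most-treed ZIP codes first.

    If `statuses` is given, only records whose status is in that set are counted.
    """
    counts = Counter(
        str(t["postcode"])  # normalize ZIP codes to strings for consistent keys
        for t in trees
        if statuses is None or t.get("status") in statuses
    )
    return counts.most_common()
```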

#3: Is the number of neighborhood trees related to the health of neighborhood residents? Next, I explored the relationship between the number of trees and two health-related variables: the average “health score,” or the percentage of residents who self-reported their health as good or better, and “physical activity,” or the percentage of residents who self-reported exercising in the past 30 days. I chose a scatterplot so I could keep every data point visible while checking whether health and number of trees are correlated. Ultimately, using these variables, I did not find a correlation between health and number of trees. However, the plot did reveal an outlier: CB 503 in Staten Island, which had far more trees than any other community board in the city.
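One way to put a number on "no correlation" is the Pearson coefficient r, computed across community boards and then recomputed with the outlier left out to check whether a single board drives the result. This is a minimal sketch assuming the per-board tree counts and health scores have already been paired in a dict keyed by board code; `pearson_r` and `correlation_excluding` are hypothetical helpers, and r only captures linear relationships.

```python
from math import sqrt
from typing import Dict, Iterable, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def correlation_excluding(
    points: Dict[str, Tuple[float, float]], exclude: Iterable[str] = ()
) -> float:
    """Pearson r over (tree_count, health_score) pairs keyed by community board,
    optionally leaving some boards out (e.g. an outlier such as "503")."""
    skip = set(exclude)
    kept = [xy for cb, xy in points.items() if cb not in skip]
    return pearson_r([x for x, _ in kept], [y for _, y in kept])
```

Comparing `correlation_excluding(points)` with `correlation_excluding(points, ["503"])` shows how much one extreme board moves the coefficient.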

#4: Do wealthier neighborhoods have healthier trees? Lastly, I explored the relationship between the wealth of a neighborhood (measured by median household income by community board) and the health of its trees (measured by the number of dead trees within the community board). My reasoning was that neighborhoods with higher-income residents may have better maintenance of their trees. However, this visualization did not make a strong case for that hypothesis. I grouped the five highest- and five lowest-income community boards, and though the poorer community boards generally did have more dead trees than the richer ones, they all fell in the middle of the distribution of dead trees. The raw number of dead trees is more likely a function of the total number of trees in a community board than of the proportion of trees that have died and not been cleared. I chose a bar graph so I could sort the community boards from lowest to highest number of dead trees and see where the highest- and lowest-income boards fell.
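Because raw counts of dead trees track the size of each board's tree population, dividing by total trees gives a rate that is easier to compare across boards. The sketch assumes records carry `cb_num` and a `status` field whose value is "Dead" for dead trees (both assumptions about the census columns); `dead_tree_summary` and `rank_by` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dead_tree_summary(trees: Iterable[dict]) -> Dict[str, Tuple[int, int, float]]:
    """Per community board: (dead trees, total trees, share of trees that are dead)."""
    total: Counter = Counter()
    dead: Counter = Counter()
    for t in trees:
        cb = str(t["cb_num"])
        total[cb] += 1
        if t.get("status") == "Dead":  # assumed status label for dead trees
            dead[cb] += 1
    return {cb: (dead[cb], total[cb], dead[cb] / total[cb]) for cb in total}


def rank_by(summary: Dict[str, Tuple[int, int, float]], by_share: bool = True) -> List[str]:
    """Community boards ordered from lowest to highest dead-tree share (or raw count)."""
    idx = 2 if by_share else 0
    return sorted(summary, key=lambda cb: summary[cb][idx])
```

A board with many trees can have more dead trees than a smaller board while having a lower dead share, so sorting by count and by share may place the highest- and lowest-income boards differently.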

Next steps

As is often the case for a beginner data visualizer, I see so many additional steps I could take with these data and topics. Unfortunately, I did not find a strong relationship between the number or health of trees and resident health or income levels. I would love to look at other measures of green space (e.g., number of neighborhood parks or community gardens) and explore their correlation with other economic measures (e.g., property value) or health measures (e.g., life expectancy, access to healthcare).

A major issue I ran into was creating my map visualization. In my other visualizations, I analyzed the data by borough or community board, but I built the map from ZIP code data, which was inconsistent with the rest of the project and not ideal. I found a community board shapefile on NYC Open Data, but had a lot of difficulty loading it into Tableau and using it with my dataset. Instead, I used a preloaded ZIP code shapefile to give a rough idea of where trees were located throughout the city. If I were to continue refining this project, I would definitely work with the community board shapefile to get community board boundaries on my map.