Categories
Uncategorized

Final Project

For this project, I was interested in learning more about the trees in New York City by exploring the 2015 Tree Census data from NYC Open Data. I was interested in exploring trends relating to the tree population throughout the city, such as species, health status, and density of trees in each community district. Additionally, I wanted to see if there were any correlations between density/health of trees and other quality of life-related variables in the community district, such as health and economic conditions.

My interest in this topic comes from my work researching the relationship between public health and the built environment, particularly access to quality green space. Through my work (and also just from my own experience living in the city!), I’ve learned that green space positively contributes to mental wellbeing, but also that wealthier areas tend to have more access to amenities such as quality parks. I’m personally interested in exploring these data sets to see if these correlations are reflected in the NYC Tree Census. Other audience members could be my co-workers, or other researchers who are interested in this topic.

My datasets

In order to create these visualizations, I used three separate datasets. The main dataset I used was the NYC Tree Census from NYC OpenData. This data was collected in 2015 by NYC Parks employees and a team of volunteers. Additionally, I sourced data from two other datasets: Community Health Profiles from NYC Health, which is aggregated data by community district from the 2019-2020 Community Health Survey, and median household income by community district from the 2020 US Census. These data were collected by NYC DOHMH and the US Census Bureau, respectively. The three variables I used from these datasets were median household income from the US Census, a “health score” which represents the percentage of residents that reported their own health as “excellent,” “very good,” or “good,” and physical activity, which is the percentage of residents who have exercised in the past 30 days from the Community Health Survey.

One potential source of bias is the Community Health Survey, which has a relatively small sample size; the website cautions against using these data to characterize a community district. These data are also self-reported, although self-perceived health is still a very valuable measure of general wellbeing.

A compromise I made was using aggregate data rather than the full datasets, because of the size of the Census and Community Health Survey data. Because I considered this supplementary information to the main dataset (NYC Tree Census), I did not want to tackle the volume of data that would come with using all three large datasets. However, if I were primarily investigating topics such as the economic or health conditions of New Yorkers by community district, or looking at a smaller geographical region, I would opt to use the full datasets. Instead, I created my own dataset with community board number, health score, physical activity, and median household income. I then joined this dataset with the Tree Census dataset by community district in Tableau.
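I did this join in Tableau, but the same idea can be sketched in a few lines of Python. This is only an illustration: the field names (community_board, health_score, physical_activity, median_income) and every value below are hypothetical stand-ins, not the real columns or numbers from either dataset.

```python
# Hypothetical, made-up rows; the real datasets use different column names and values.
tree_rows = [
    {"tree_id": 1, "community_board": 101, "status": "Alive"},
    {"tree_id": 2, "community_board": 101, "status": "Dead"},
    {"tree_id": 3, "community_board": 503, "status": "Alive"},
]

district_rows = [
    {"community_board": 101, "health_score": 80.0,
     "physical_activity": 75.0, "median_income": 90000},
    {"community_board": 503, "health_score": 82.0,
     "physical_activity": 70.0, "median_income": 95000},
]


def join_by_district(trees, districts, key="community_board"):
    """Left join: attach each district's aggregate indicators to its tree rows."""
    lookup = {district[key]: district for district in districts}
    joined = []
    for tree in trees:
        # Trees whose district has no indicator row are kept unchanged.
        extra = lookup.get(tree[key], {})
        joined.append({**tree, **extra})
    return joined


joined = join_by_district(tree_rows, district_rows)
for row in joined:
    print(row)
```

Every tree row keeps its own fields and picks up the three district-level indicators, which is why a small hand-built table was enough for this project.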

Visualizations

#1: What are the most popular tree species throughout the five boroughs? This visualization describes the top 10 most prevalent tree species in NYC by borough. I wanted to create this visualization first because it was this information that drew me to the dataset initially. I use the NYC Tree Census data regularly as I am walking through the city and see a tree that I’m curious about. Through creating this visualization, I can see that the highest volume of trees seems to be in Queens, with the London Planetree being the most prevalent tree species. I was also surprised to learn that a majority of NYC’s cherry trees can be found in Queens!

#2: Which NYC zipcodes have the most trees? Next, I further investigated the location of trees in NYC by creating a map that breaks down the number of trees by zipcode. I was curious to see which neighborhoods or boroughs had more trees and where they were lacking. Because trees are a key contributor to urban design, public health, and air quality, their location is very important. Through this map, I found that the most trees were located in Staten Island, which is mostly suburban, while the fewest trees appeared to be in Manhattan, the most densely populated borough. In the other boroughs (The Bronx, Queens, Brooklyn), the trees tended to be located on the outskirts of the city, closer to the more suburban areas. This map was important for visualizing where trees are located spatially in the city. Creating this map also prompted further questions that I explore in later visualizations, particularly whether a neighborhood’s number of trees correlates with other aspects of life in the neighborhood, such as health and income.

#3: Is the number of neighborhood trees related to the health of neighborhood residents? As a next step in creating these visualizations, I explored the relationship between the number of trees and two health-related variables: the average “health score,” or the percentage of residents who self-reported their health as good or better, and “physical activity,” or the percentage of residents who self-reported that they have exercised in the past 30 days. I chose to visualize this data using a scatterplot to retain every data point, while also investigating whether or not there is a correlation between health and number of trees. Ultimately, using these variables, I did not find a correlation between health and number of trees. However, the scatterplot did show an outlier in CB 503 in Staten Island, which had far more trees than any other community board in the city.
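I judged the correlation by eye from the scatterplot. One way to put a number on it would be a Pearson correlation coefficient, sketched below. The function is a plain implementation of the standard formula, and the two lists are made-up placeholder values (one entry per community board), not figures from my data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, one per community board (not real data).
tree_counts = [12000, 8500, 30000, 15000, 9800]
health_scores = [78.0, 81.5, 80.0, 74.0, 85.0]

r = pearson_r(tree_counts, health_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 0 would match what the scatterplot suggested, while values near 1 or -1 would indicate a strong linear relationship. A single extreme point like CB 503 can pull this number noticeably, so it is worth checking the result with and without it.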

#4: Do wealthier neighborhoods have healthier trees? Lastly, I was interested in exploring the relationship between the wealth of a neighborhood (measured by median household income by community board) and the health of neighborhood trees (measured by the number of dead trees within the community board). My reasoning behind this question was that neighborhoods with higher-income residents may have better maintenance of the trees in their neighborhood. However, I did not find a strong case for this hypothesis through this visualization. I grouped the five highest and five lowest income community boards in this visualization, and though the poorer community boards generally did have more dead trees than the richer ones, they all fell in the middle of the distribution when it came to the number of dead trees. The raw number of dead trees more likely reflects the total number of trees in a community board than the proportion of trees that have died and not been cleared. I chose a bar graph to represent these data so I could sort the community boards from lowest to highest number of dead trees to see where the highest and lowest income CBs fell.
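Since the raw count of dead trees mostly tracks how many trees a community board has, a fairer comparison would be the share of each board's trees that are dead. Below is a minimal sketch of that calculation; it assumes each tree record has a community_board field and a status field whose value is "Dead" for dead trees, which are naming assumptions on my part and may not match the actual column names.

```python
from collections import Counter


def dead_tree_share(rows):
    """Return {community_board: fraction of that board's trees marked dead}."""
    totals = Counter()
    dead = Counter()
    for row in rows:
        board = row["community_board"]  # assumed field name
        totals[board] += 1
        if row["status"] == "Dead":     # assumed field name and value
            dead[board] += 1
    return {board: dead[board] / totals[board] for board in totals}


# Made-up example rows, not real census records.
sample_rows = [
    {"community_board": 101, "status": "Alive"},
    {"community_board": 101, "status": "Alive"},
    {"community_board": 101, "status": "Alive"},
    {"community_board": 101, "status": "Dead"},
    {"community_board": 503, "status": "Alive"},
    {"community_board": 503, "status": "Dead"},
]

for board, share in sorted(dead_tree_share(sample_rows).items()):
    print(board, f"{share:.0%}")
```

Sorting the bar graph by this share instead of by the raw count would show whether lower-income boards really have a larger fraction of dead trees.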

Next steps

As usual for a beginner data visualizer, I see so many additional steps I could take with these data and these topics. Unfortunately, I did not find a strong relationship between number/health of trees and resident health/income levels. I would love to look at other measures of green space (e.g. number of neighborhood parks, community gardens), and explore their correlation to other economic (e.g. property value) or health measures (e.g. life expectancy, access to healthcare). A major issue I ran into was creating my map visualization. In the other visualizations, I investigated my data by borough or community board. However, I created my map visualization by zipcode, which was inconsistent with the rest of the project and not ideal. I found a community board shapefile on NYC Open Data, but had a lot of difficulty loading it into Tableau and using it in conjunction with my dataset. Instead, I used a preloaded shapefile (zipcode) to give a rough idea of where trees were located throughout the city. If I were to continue working to perfect this project, I would definitely work with the community board shapefile to get community board boundaries on my map.


Project 2

My research question for this project was how I move around the city. Over the course of one week (March 28-April 3), I wanted to explore when, where, how long, how far, and by what mode I travel. I also collected data on outside factors that may influence my transportation choices, such as temperature and precipitation.

This question interests me because, as a New York City resident, I have a variety of transportation options available to me, such as trains, Citibikes, buses, and walking, that are not available to people who live in most parts of the country, who are generally restricted to car travel. Because I have so many options, I was curious which modes of transportation I tend to gravitate towards, as well as how long and how far I tend to travel. Living in a densely populated location such as Manhattan, I would hypothesize that I don’t have to travel very far to go to work, run errands, and see my friends in comparison to someone living in a less densely populated area.

The sole audience of this visualization is myself. I don’t think anyone else would be interested in knowing this information, but I am personally curious about learning more about my habits, so this was a fun exercise to complete.

How far do I travel each day?

This visualization shows the total distance I traveled each day, broken down by transportation type. I chose to use a stacked bar chart because it visualizes the data by day as well as by transportation type. This chart shows that I travel farthest on Tuesdays, Wednesdays, and Thursdays, which are days that I travel uptown for school and work. On other days of the week, I stay closer to home and prefer walking or biking to get around.

How long do I spend traveling each day?

This visualization shows the total time spent traveling each day, broken down by transportation type. As with the previous visualization, I chose to use a stacked bar chart because it visualizes the data by day as well as by transportation type. I chose to visualize these variables because they act as a contrast to the previous distance graph. While the previous graph showed that I covered the most distance by train by a wide margin, this graph shows that I spent a proportionally larger amount of time on slower modes of transportation like walking and biking, even though I didn’t cover as much ground with them.
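The contrast between distance and time can also be expressed as each mode's share of the weekly totals. The sketch below assumes a trip log where each entry has a mode, a distance in miles, and a duration in minutes; those field names and the sample trips are invented for illustration and are not my actual log.

```python
from collections import defaultdict


def shares_by_mode(trips):
    """Return each mode's share of total distance and of total travel time."""
    distance = defaultdict(float)
    minutes = defaultdict(float)
    for trip in trips:
        distance[trip["mode"]] += trip["miles"]
        minutes[trip["mode"]] += trip["minutes"]
    total_distance = sum(distance.values())
    total_minutes = sum(minutes.values())
    if total_distance == 0 or total_minutes == 0:
        return {}  # nothing to compare
    return {
        mode: {
            "distance_share": distance[mode] / total_distance,
            "time_share": minutes[mode] / total_minutes,
        }
        for mode in distance
    }


# Invented sample trips, not my recorded data.
sample_trips = [
    {"mode": "train", "miles": 8.0, "minutes": 30},
    {"mode": "walk", "miles": 2.0, "minutes": 30},
]

for mode, share in shares_by_mode(sample_trips).items():
    print(mode, share)
```

A mode whose time share is much larger than its distance share is a slow mode, which is the pattern the two stacked bar charts show for walking and biking.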

How far did I travel using each mode of transportation in total this week?

This visualization shows how far I traveled by each mode of transportation in total throughout the entire week of data collection. I used a bubble chart because I wanted a simple, overall view of the total distance traveled by each type of transportation. I covered the most ground using a train, which comes across in the scale of this bubble compared to the other transportation types. In contrast to the bar charts, the simplicity of this graph conveys that I chose trains, rather than walking, biking, or cars, for longer trips.

Next Steps

There are so many options to further explore this topic! I think the visualizations would tell a much more interesting story if I were able to collect data over a longer period of time (1) to have more data, and (2) to have data that spans across the many seasonal fluctuations New York experiences in a year. I would be very interested to visualize this data by season.

For this project, I collected data on temperature and precipitation to see if they would correlate with my transportation habits, but they ultimately did not, so I didn’t end up using these data in my visualizations. Again, if I had more data to work with, I think these variables could tell a more interesting story. For a larger-scale project, I would also like to explore my most visited locations throughout the city on a map. To tell this story, I would add the coordinates of the start and end points of my trips as variables.

Lastly, I would be interested in comparing my travel habits to someone living in a vastly different area. For example, I’d love to compare my data to someone living in a very hot or cold climate, or a rural area.


Project 1: NYC Rodent Population

The objective of this investigation was to get a better understanding of the rodent population throughout New York City: specifically, the location and time of year where they are most likely to be seen. I looked at 311 complaints that reported rodent sightings (mice and rats) in all five boroughs from 2018-2023 in order to identify any potential change over time, and to have a sizable data set of five complete years. This topic was of interest to me as a resident of NYC as well as someone who works in the public health field. Rodents can be a nuisance to residents and business owners, as well as a serious public health concern.

The audience of my visualizations could be anyone who is interested in the rodent population of NYC. This could be people living in, or looking to move to/within the City and wanting to live in an area with a low or declining rodent population. The Department of Sanitation would also be an appropriate audience for these visualizations if they are interested in the outcome of their mitigation efforts, or where they should focus their efforts moving forward.

Where are rodents most likely to be seen in the NYC boroughs?

In this visualization, I found that Brooklyn consistently had the highest number of rodent sightings between 2018 and 2023, while Staten Island had by far the lowest. This visualization also shows that, other than a dip in 2020 (possibly because people were outside less during COVID-19, and because of changes to restaurant operations), rodent sightings are on the rise in Brooklyn and Queens, while they are declining in Manhattan, the Bronx, and Staten Island.

I chose to visualize this data using a line chart because I wanted to demonstrate change over time, a continuous variable. I broke the data down by borough, because I was interested to see any differences between them.

Are people seeing more mice or rats throughout the year?

This graph shows that rat sightings are significantly higher than mouse sightings, and that mouse sightings are consistent throughout the year, while rat sightings peak in the warmer months. This graph shows that rats, not mice, are the major concern for NYC residents, and that they are an issue particularly during the summer.

I chose a line graph to visualize these data because I wanted to show change over time. This time, however, I looked at the average change within a year, rather than change from year to year. By visualizing the data this way, I could identify seasonal changes as well as average out any outlying years, such as 2020.
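The averaging step can be sketched as follows: group the monthly complaint counts by calendar month and average across years. The input format (a dictionary keyed by year and month) and the numbers are hypothetical, chosen only to show the calculation.

```python
from collections import defaultdict


def average_by_month(counts):
    """Average counts across years for each calendar month.

    counts maps (year, month) -> number of sightings in that month.
    """
    per_month = defaultdict(list)
    for (_year, month), count in counts.items():
        per_month[month].append(count)
    return {month: sum(values) / len(values)
            for month, values in sorted(per_month.items())}


# Hypothetical counts, not real 311 data.
sample_counts = {
    (2018, 1): 10,
    (2019, 1): 20,
    (2018, 7): 30,
    (2019, 7): 50,
}

for month, avg in average_by_month(sample_counts).items():
    print(month, avg)
```

Averaging this way smooths out an unusual year such as 2020, at the cost of hiding any year-to-year trend, which is why I kept the year-to-year view as a separate chart.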

Which community boards have the highest frequency of rodent sightings?

Through this visualization, I found that the community boards with the highest rodent sightings tended to be located in Manhattan and Brooklyn, while the lowest rodent sightings were primarily in Queens. This was interesting to me, because it contrasted with what I found in the borough-level visualization, which showed that Staten Island had by far the lowest number of rodent sightings. This can be explained by the low population density of Staten Island, as it only has three community boards, far fewer than the other four boroughs.

I chose a bar graph to visualize these data because I wanted to visually break down the data by borough and rank the community boards from highest to lowest. Ideally, a map would’ve been a great way to visualize these data. The boroughs would have been distinct because they would’ve been shown geographically on a map, and it would be easier to identify hot spots if there were a cluster of community boards with a large number of rodent sightings.

How does my community board compare to the surrounding areas when it comes to rodent sightings?

Finally, I wanted to look into a personal aspect of this data by visualizing the rodent sightings in my community board (Manhattan 03) in comparison to the bordering community boards. Through this visualization, I learned two things: 1) the frequency of rodent sightings throughout the year by month, and 2) how the rodent sightings in my own community board compare to the surrounding areas. Through creating this visualization, I found that my community board had consistently higher rodent sightings throughout the year when compared to bordering community boards.

Chart type was a limitation in creating this visualization. Ideally, these data would have been expressed as a map, so the audience could get a better idea of where these community boards are located within the city and in relation to one another. As a next step, I would also like to further break down the rodent sightings within my own neighborhood by street intersection. It would be interesting to see exactly where the rodent sightings are located, and what may be drawing them there (parks, restaurants, etc.).