I hypothesize that the number of traffic-related injuries on a street depends on features of that street. One example is the number of lanes the road has.
I explored this relationship directly by joining two datasets: the NYPD motor collision dataset, which gives the number of persons injured on a given street, and the New York City LION street shapefile, which contains properties of the streets, including the number of lanes. From the joined data I obtained the average number of injuries on streets with a given number of lanes, as shown in the plot below.
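The join and averaging step can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function name, the `street_id` join key, and the toy records are all hypothetical, and it assumes that streets with no recorded collisions count as zero injuries and that collisions on streets missing from the street table are dropped.

```python
from collections import defaultdict
from statistics import mean


def mean_injuries_by_lanes(collisions, streets):
    """Average total injuries per street, grouped by number of lanes.

    collisions: iterable of (street_id, persons_injured) pairs (hypothetical layout).
    streets: dict mapping street_id -> number of lanes (hypothetical layout).
    """
    # Sum injuries per street, keeping only streets present in the street table.
    injuries_per_street = defaultdict(int)
    for street_id, injured in collisions:
        if street_id in streets:
            injuries_per_street[street_id] += injured

    # Group the per-street totals by lane count.
    # Assumption: a street with no recorded collisions contributes zero.
    totals_by_lanes = defaultdict(list)
    for street_id, lanes in streets.items():
        totals_by_lanes[lanes].append(injuries_per_street.get(street_id, 0))

    return {lanes: mean(totals) for lanes, totals in sorted(totals_by_lanes.items())}


# Made-up toy records, for illustration only.
toy_streets = {"A": 1, "B": 1, "C": 2}
toy_collisions = [("A", 2), ("A", 1), ("B", 0), ("C", 1), ("D", 5)]
toy_result = mean_injuries_by_lanes(toy_collisions, toy_streets)
```

In practice the join key would have to be whatever field the two datasets actually share, and matching collision records to street segments is likely the hardest part of the real pipeline.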
The data are fit with a second-order polynomial, which describes them very well (p-value of 0.000104, adjusted R-squared of 0.9064). The plot includes the 95% confidence intervals of the fit.
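As a rough sketch of this kind of fit, the snippet below fits a second-order polynomial with NumPy and computes the adjusted R-squared by hand. The function name and the synthetic input are assumptions for illustration, and it does not reproduce the values reported above. The p-value and confidence intervals are not computed here; they would need a statistics package.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c, with R-squared and adjusted R-squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Coefficients are returned highest power first: [a, b, c].
    coeffs = np.polyfit(x, y, deg=2)
    fitted = np.polyval(coeffs, x)

    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    # Adjusted R-squared penalizes the two predictors (x and x**2).
    n, p = len(x), 2
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coeffs, r2, adj_r2


# Synthetic, noise-free data for illustration only (not the collision data).
lanes = np.arange(0, 7)
synthetic_y = 2 * lanes**2 - 3 * lanes + 1
coeffs, r2, adj_r2 = fit_quadratic(lanes, synthetic_y)
```

With only a handful of distinct lane counts, the adjusted R-squared is the more honest of the two measures, since it accounts for the degrees of freedom used by the quadratic term.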
Interestingly, streets with between 2 and 6 lanes have the fewest injuries on average, and the relationship between the number of lanes and the mean number of injuries is not simply linear. From this figure we can conclude that a large number of injuries occur on streets with many lanes, which is not entirely surprising, but also on streets with just one lane, which is surprising.
The LION shapefile contains a number of other features of streets in New York City that can be explored, such as the number of parking lanes. One of my goals is to generate data-driven suggestions, for example estimating how converting a driving lane to a parking lane would affect the total number of injuries or the average traffic in the area.