Motivation
After reading one of my regular blogs, Ben Wellington’s IQuantNY, I was intrigued to find that there is a spike in the number of robberies that occur between 3 and 4 PM. Does this coincide with the time that NYC schools usually let out? If so, we would expect the spike to occur on weekdays and not on weekends. According to the blog post, it does indeed.
I decided to explore this for myself, though I wanted to see whether the robberies occurred near schools. My hypothesis is that robberies which occur after school (between 3 and 4 PM) happen close to schools. I gathered the NYC felony data from the NYC Open Data website here, and the NYC school data here.
If a large number of robberies occur close to schools, this adds weight to the hypothesis. I will try to confirm it by grouping the robberies using k-means clustering and seeing whether many of the cluster centres overlap with, or lie close to, schools. I make the assumption that school children "diffuse" evenly from their school in all directions, such that their mean location is the school itself.
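To make the clustering idea concrete, here is a minimal sketch in Python (not the R code used for the analysis). It runs a plain k-means on latitude/longitude pairs and then measures how far each cluster centre is from its nearest school. The coordinates are made up, the function names are hypothetical, and clustering on raw degrees with a fixed, evenly spaced starting guess is a simplification chosen for reproducibility.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on (lat, lon) tuples.

    Assumption: squared Euclidean distance on raw degrees, which is only a
    rough approximation at city scale. Starting centres are evenly spaced
    points from the input list rather than random draws.
    """
    centres = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centres[i][0]) ** 2 + (p[1] - centres[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        new_centres = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres

def nearest_school_m(point, schools):
    """Distance in metres from a point to the closest school."""
    return min(haversine_m(point, s) for s in schools)

# Made-up data: two tight groups of robberies and two schools.
offsets = [(0, 0), (0.001, 0), (-0.001, 0), (0, 0.001), (0, -0.001)]
robberies = ([(40.70 + dy, -74.00 + dx) for dy, dx in offsets]
             + [(40.80 + dy, -73.90 + dx) for dy, dx in offsets])
schools = [(40.7005, -74.0005), (40.85, -73.95)]

centres = kmeans(robberies, k=2)
for c in centres:
    print(c, round(nearest_school_m(c, schools)), "m to nearest school")
```

A cluster centre that sits within a short walk of a school supports the hypothesis; one that is kilometres away does not.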
I will perform the analysis in R rather than Python, since mapping is quick and easy in R using the ggmap library. In the end, I will transfer my dataset to Carto, as the maps it produces are pretty and its tooltips are interactive, intuitive and useful for displaying information to users.
Next, I will compare the distances from robberies to their closest schools with the same distances measured from locations drawn uniformly at random across New York City. If there is a statistically significant difference between the two, we can say that where robberies occur depends on the distance to the closest school.
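The comparison can be sketched as follows, again in Python rather than the R used for the analysis. The significance test is my own stand-in: a permutation test on the difference in mean nearest-school distance, which is one of several reasonable choices. The bounding box is a rough approximation of the city limits, so points drawn from it are uniform over a rectangle (including water), not over NYC's land area. All names and coordinates are hypothetical.

```python
import math
import random

def distance_m(a, b):
    """Approximate distance in metres between (lat, lon) points (equirectangular)."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dy = math.radians(b[0] - a[0])
    dx = math.radians(b[1] - a[1]) * math.cos(mean_lat)
    return 6371000 * math.hypot(dx, dy)

def nearest_distances(points, schools):
    """Distance from each point to its closest school."""
    return [min(distance_m(p, s) for s in schools) for p in points]

def uniform_points(n, bbox, rng):
    """Draw n points uniformly from a (lat_min, lat_max, lon_min, lon_max) box."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [(rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max))
            for _ in range(n)]

def permutation_p_value(a, b, n_perm=999, seed=0):
    """Two-sided permutation test on the difference in means of a and b."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Count shuffles whose difference is at least as large as the observed one.
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up inputs; the real analysis would load the felony and school datasets.
NYC_BBOX = (40.49, 40.92, -74.27, -73.68)  # rough, assumed city bounding box
schools = [(40.70, -74.00), (40.80, -73.90)]
robberies = [(40.701, -74.001), (40.699, -73.999), (40.801, -73.901), (40.799, -73.899)]

random_locations = uniform_points(200, NYC_BBOX, random.Random(1))
p = permutation_p_value(nearest_distances(robberies, schools),
                        nearest_distances(random_locations, schools))
print("p-value:", p)
```

A small p-value here would suggest that robberies sit closer to (or further from) schools than randomly placed points do.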
The R source code can be found on my GitHub here, and a link to the Carto profile is here.