Large Scale Ubiquitous Data Sources for Crime Prediction

If you have a question about this talk, please contact Alexander Vetterl.

In this talk, I will present two approaches to geographical crime profiling that leverage machine learning techniques and large-scale ubiquitous data sources. I will briefly touch on their motivation in criminology and urban studies, as well as on their challenges and limitations.

The first work mines large-scale human mobility data to craft an extensive set of features for predicting yearly crime counts in New York City. Traditional crime models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is an opportunity to improve such models with data that serve as better proxies for human presence in cities. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and check-ins, subway rides, and taxi rides improve on baseline models that rely only on census data. The proposed ensemble machine learning models achieve absolute R² values of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This indicates that, in addition to the residential population of an area, its ambient population is strongly predictive of the area’s crime levels. We take a closer look at the main crime categories and find that the predictive gain from the human dynamics features varies across crime types: such features bring the biggest boost in the case of grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss the top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy.
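To make the two evaluation settings above concrete, here is a minimal sketch of how R² and the two kinds of out-of-sample splits could be computed. It is an illustration under stated assumptions, not the study's actual pipeline: the record layout (dictionaries with `area` and `year` keys) and the function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def geographic_split(records, test_areas):
    """Hold out whole areas: the model never sees the test areas (assumed layout)."""
    train = [r for r in records if r["area"] not in test_areas]
    test = [r for r in records if r["area"] in test_areas]
    return train, test


def temporal_split(records, test_year):
    """Hold out a later year: train on earlier years, test on `test_year`."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test
```

In a temporal split the test areas have been seen in earlier years, whereas a geographic split asks the model to generalize to unseen areas, which is typically the harder of the two tasks.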

The second work investigates a forecasting approach for daily burglary risk within a region of Switzerland characterized by significantly lower levels of urbanization than the areas analyzed in prevailing crime prediction research. The lower levels of urbanization, in combination with high spatial and temporal granularity, pose a significant challenge to building the accurate prediction models necessary to derive feasible and effective preventive actions, e.g. in the form of police patrols. We employ machine learning methods that allow for the integration of diverse fine-grained data on the demographic, geographic, economic, temporal, and meteorological characteristics of the environment, alongside past burglary events. We propose an approach that addresses the sparsity of the data and significantly outperforms the baseline implementation of a prospective hotspot model, which makes use only of historical crime data and is an industry standard. For instance, when the coverage of the predicted areas is set to 5% of the total studied area, the model is able to predict the burglaries committed on a specific day within a four-hectare rectangular area with an average hit ratio of 57%, compared to the 36% hit ratio of the baseline. This research has direct implications for decision makers in charge of resource allocation for crime prevention.
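The hit ratio at a fixed coverage can be sketched as follows. This is a minimal illustration that assumes one common definition, namely the share of a day's burglaries that fall inside the highest-risk grid cells making up the chosen fraction of the study area. The talk does not spell out its exact definition, and the function and parameter names here are hypothetical.

```python
def hit_ratio(risk_scores, burglary_counts, coverage=0.05):
    """Share of burglaries captured by the top `coverage` fraction of cells.

    risk_scores     -- predicted risk per grid cell for one day (assumed input)
    burglary_counts -- observed burglaries per grid cell on that day
    Returns None when no burglary occurred, since the ratio is undefined.
    """
    if len(risk_scores) != len(burglary_counts):
        raise ValueError("one risk score is needed per grid cell")
    total = sum(burglary_counts)
    if total == 0:
        return None
    # Flag at least one cell, otherwise the top `coverage` share of cells.
    n_flagged = max(1, int(coverage * len(risk_scores)))
    ranked = sorted(range(len(risk_scores)),
                    key=lambda i: risk_scores[i], reverse=True)
    hits = sum(burglary_counts[i] for i in ranked[:n_flagged])
    return hits / total
```

Averaging this value over all days with at least one burglary would give an average hit ratio comparable in spirit to the figures quoted above.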

Bio: Cristina is a PhD candidate at the Department of Management, Technology, and Economics (D-MTEC) of the Swiss Federal Institute of Technology in Zurich (ETH Zurich). She holds an M.Sc. with Honors in Software Engineering from the Technical University of Munich and a B.Sc. in Computer Science from Leibniz University of Hanover. Her research interests revolve around information systems, computational social science, and applied machine learning, with a focus on crime and fear of crime.

This talk is part of the Computer Laboratory Security Seminar series.
