In this work we introduce a fast big data approach for road incident hot spot identification using Apache Spark. We implement an existing immuno-inspired mechanism, SeleSup, as a series of MapReduce-like operations. SeleSup consists of a number of iterations that remove data redundancies, resulting in the detection of areas with a high likelihood of vehicle incidents. It has been successfully applied to large datasets; however, as the size of the data increases to millions of instances, its performance drops significantly. Our objective is therefore to re-conceptualise the method for big data. In this paper we present the new implementation, the challenges faced when converting the method to the Apache Spark platform, and the outcomes obtained. For our experiments we employ a large dataset containing hundreds of thousands of Heavy Goods Vehicle incidents, collected via telematics. Results show a significant improvement in performance with no detriment to the accuracy of the method.
Triguero, I., Figueredo, G. P., Mesgarpour, M., Garibaldi, J. M., & John, R. (2017). Vehicle incident hot spots identification: an approach for big data. https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.329