Driving Pattern Risk Score: how do we build it?

May 5, 2020

Road fatalities are still going to increase in the coming decade

There were around 800 million registered cars in 2011, and the number had grown to around one billion by 2018. It is estimated to exceed 2 billion by 2050. With more vehicles congesting the roads, the likelihood of accidents rises as well. For example, road traffic fatalities per 100,000 population are 10.3 in Europe, 18.5 in South East Asia, and 24.1 in Africa.

Factors of fatalities

The causes of accidents fall into three categories:

bad weather or bad infrastructure (rain, potholes on the road),
vehicle malfunctioning (manufacturing defects or wear and tear) or
human factors (physiological or behavioural).

Physiological mistakes typically stem from driver fatigue, while behavioural mistakes take many forms: distracted driving, aggressive driving, road rage, hard acceleration, hard braking, harsh cornering, and speeding. Numerous research articles and sociological studies of road fatalities identify these behaviours as precursors of fatal and non-fatal road accidents.

Given that human factors, environmental conditions, and driving manner can be identified and predicted, we are building a machine learning-based driving pattern risk engine. It assesses how risky a driver's behaviour is at a given moment, and we aim to predict the likelihood that the driver will have an accident in the near future of the same drive.

Features For The Risk Engine

Multiple qualitative and quantitative studies indicate that the following vehicle dynamics parameters are the most significant for identifying risky driving: speed, acceleration, braking, and cornering. We break these parameters down as follows:

To acquire these candidate features we use smartphone hardware sensors: the accelerometer, gyroscope, and magnetometer. Our setup is described in Driving Pattern Analytics in the Smartphone Age: https://medium.com/@stemicbrain/driving-analytics-in-smartphone-age-143fdcc682ab
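To make the idea of turning sensor readings into driving features concrete, here is a minimal sketch that counts hard acceleration and hard braking events in a series of longitudinal acceleration samples. The thresholds, the function name count_harsh_events, and the assumption that the samples are already aligned with the vehicle's direction of travel are illustrative choices, not a description of our production pipeline.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds in m/s^2, chosen for illustration only.
HARD_ACCEL_THRESHOLD = 3.0
HARD_BRAKE_THRESHOLD = -3.0


@dataclass
class EventCounts:
    hard_accelerations: int = 0
    hard_brakes: int = 0


def count_harsh_events(longitudinal_accel: List[float]) -> EventCounts:
    """Count harsh events in longitudinal acceleration samples (m/s^2).

    Consecutive samples beyond the same threshold count as one event.
    """
    counts = EventCounts()
    previous_state = "normal"
    for a in longitudinal_accel:
        if a >= HARD_ACCEL_THRESHOLD:
            state = "accel"
        elif a <= HARD_BRAKE_THRESHOLD:
            state = "brake"
        else:
            state = "normal"
        # Only count the transition into a harsh state, not every sample in it.
        if state != previous_state:
            if state == "accel":
                counts.hard_accelerations += 1
            elif state == "brake":
                counts.hard_brakes += 1
        previous_state = state
    return counts


if __name__ == "__main__":
    samples = [0.5, 3.2, 3.5, 1.0, -3.4, -3.1, 0.0, -4.0]
    print(count_harsh_events(samples))
```

In practice, raw smartphone readings would first need filtering and reorientation from the phone's axes to the vehicle's axes; this sketch skips those steps.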

As soon as a driver starts a journey, the server begins receiving the various attributes of the behaviour data at regular intervals. The data from all drivers at any given moment is so large that traditional data management tools cannot store or process it efficiently. We use big data technology to handle this data deluge, which is characterized by huge volume, high velocity, and veracity.

The other element that adds significant value to the risk score definition is the amount of distracted driving. Here we distinguish the following phases:

The last set of parameters covers environmental and contextual data: weather conditions, time of day, and road type (country road, city road). Road type matters because freeways and arterials have significantly different traffic flow characteristics, and drivers may behave differently in terms of car-following and lane-changing or be influenced by the driving context (traffic, social activity nearby).

Research in Machine Learning for Driving Risk Scoring

We have reviewed numerous scientific and industry studies to identify the ML models and features they apply. Here are some of the most important ones:

Yu et al. propose a system called “Fine-grained abnormal driving behavior detection and identification system, D3” to detect abnormal driving behaviour in real time with high accuracy. SVM and neural network (NN) algorithms are used to detect the abnormality. D3 achieved an average total accuracy of 95.36 percent with the SVM classifier model and 96.88 percent with the NN classifier model.
Shi et al. consider only the speed parameter in their model. K-means clustering and neural network algorithms are used.
Liu et al. have designed a system called “Deep Sparse Auto Encoder (DSAE)” that extracts hidden features for visualization. Their driving behavior visualization method, called a driving color map, maps the extracted 3-D hidden features to the red, green, and blue color space. The parameters used are hard braking and hard cornering. Deep learning algorithms are used.
Daptardar et al. have experimented with a new technique that uses a Hidden Markov Model (HMM) to detect lateral maneuvers and a jerk energy-based technique to detect longitudinal maneuvers. The parameters used are hard acceleration and hard braking. The accuracy of the system is 95%.
Zhao et al. base their model assessment on time of usage, distance driven, and driving behaviour.
Tselentis et al. focus on harsh braking and acceleration events, together with exposure measured by annual mileage and the time of day of travel.
Hu et al. have developed a model using a locally designed neural network and real-world Vehicle Test Data (VTD). The parameters used are speed, hard braking, and hard acceleration. The lack of real-time driving data is considered a drawback of the VTD-based system.
Zhou et al. have identified aggressive/risky driving behavior patterns on horizontal curves using real field Basic Safety Messages (BSM) data. The parameters used are hard acceleration and hard braking. The Private Usage-Based Scoring (Pri-UBS) algorithm and the Probabilistic Usage Data Audition (Pro-UDA) protocol are used to identify the abnormality. The authors note that many environmental factors, such as real-time traffic and traffic regulations, could influence driving speed.
Nai et al. use the Fuzzy Risk Mode and Effect Analysis (FRMEA) method. The risk modes used are jerking at low speed, constantly changing speed, and jerking at high speed. The parameters used are speed, hard acceleration, and hard braking.
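Several of the studies above rely on jerk, the rate of change of acceleration, as a signal of harsh longitudinal maneuvers. The sketch below computes a generic jerk energy over a window of acceleration samples: the sum of squared finite-difference jerk multiplied by the sampling interval. This is a textbook-style definition assumed for illustration; it is not claimed to be the exact formulation used by Daptardar et al., and the function name jerk_energy is hypothetical.

```python
from typing import List


def jerk_energy(accel: List[float], dt: float) -> float:
    """Approximate jerk energy of an acceleration window.

    accel: acceleration samples in m/s^2, evenly spaced in time.
    dt: sampling interval in seconds (assumed constant).
    Returns the sum of squared jerk (m/s^3) times dt.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    energy = 0.0
    # Finite-difference jerk between each pair of consecutive samples.
    for prev, curr in zip(accel, accel[1:]):
        jerk = (curr - prev) / dt
        energy += jerk * jerk * dt
    return energy


if __name__ == "__main__":
    smooth = [0.0, 0.1, 0.2, 0.3]   # gentle, steady acceleration
    harsh = [0.0, 1.0, 1.0, -1.0]   # abrupt changes
    print(jerk_energy(smooth, dt=0.5), jerk_energy(harsh, dt=0.5))
```

A window with abrupt changes in acceleration yields a much higher value than a smooth one, which is what makes the quantity usable as a feature for maneuver detection.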

Based on this research and the data we have collected, we are identifying the setup that best fits our business needs and context. We will report on our progress and research results in upcoming blog posts.

Conclusion

Telematics data on driving performance is very important for identifying driver risk, and we put considerable effort into collecting and interpreting it correctly. Nevertheless, road fatality data indicates that driver behaviour and context are the major factors in driving risk.

Taking this into consideration, we build our risk score in a holistic manner. It is based on vehicle driving parameters, distracted user behaviour while driving, weather conditions in the area, and driving context such as the traffic volume surrounding our target and social activities (e.g. riots, demonstrations, concerts, or other mass gatherings) around the target.
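One simple way to picture such a holistic score is a weighted combination of normalized component scores. The sketch below is an assumption made for illustration only: the component names, the weights, and the 0 to 100 scale are hypothetical, and a real model would learn the relationship from data rather than use fixed weights.

```python
from typing import Dict

# Hypothetical weights for illustration; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "vehicle_dynamics": 0.40,  # speed, acceleration, braking, cornering
    "distraction": 0.30,       # distracted behaviour while driving
    "weather": 0.15,           # weather conditions in the area
    "context": 0.15,           # traffic volume, nearby gatherings
}


def composite_risk_score(components: Dict[str, float]) -> float:
    """Combine component risks (each in [0, 1]) into a score in [0, 100]."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        score += weight * value
    return 100.0 * score


if __name__ == "__main__":
    trip = {
        "vehicle_dynamics": 0.5,
        "distraction": 1.0,
        "weather": 0.0,
        "context": 0.2,
    }
    print(round(composite_risk_score(trip), 2))
```

Keeping each component on the same scale makes it easy to see which factor drives the total, and the weights are the natural place to apply locale-specific tuning.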

Currently, we are researching and tuning all these internal and external parameters for the locale where the users drive. We believe that localizing the model will pay off, because the context in which fleets operate matters a lot.

About Stemic Drive

Stemic Drive is a next-generation usage-based insurance (UBI) technology platform. We manage the full life cycle of data, from acquisition to transformation to analytics to insights for predictive, dynamic insurance pricing. We provide a risk score for drivers and fleets. We target insurance companies that want to move into UBI for their clients, and we provide the full toolset and platform to run a UBI business smoothly. Visit www.stemicdrive.com