Application of Machine Learning in Accident Data Analysis: A Case Study Using Self-report Questionnaire

Author: Tahereh Manouchehri
Author: Reza Fereidooni
Author: Seyyed Taghi Heydari
Author: Kamran Bagheri Lankarani
ORCID: Seyyed Taghi Heydari [0000-0001-7711-1137]
ORCID: Kamran Bagheri Lankarani [0000-0002-7524-9017]
Issued Date: 2025-06-30
Abstract:
Background: Traffic accidents remain a critical global public health issue, causing numerous fatalities and injuries every year.
Objectives: This study explores the application of machine learning (ML) to traffic accident data obtained from self-report questionnaires, with the aim of identifying factors that influence the incidence and severity of accidents.
Methods: In this cross-sectional study, approximately 660 participants completed the questionnaire; 43 questionnaires were incomplete or invalid and were excluded, and the remaining 617 participants answered all questions in full. Participants were selected by convenience sampling from five districts in Shiraz to ensure diversity, including outreach to taxi and heavy vehicle terminals. Trained researchers administered the questionnaires face to face, and all responses were self-reported. The dataset collected from the 617 participants covers demographics, vehicle and road features, personality traits, driving habits, and risky driving behavior. The questionnaire incorporated multiple validated instruments capturing driving behavior, demographics (such as age, gender, marital status, education, and income), and habits (e.g., driving duration, cellphone use, fatigue, and substance use). ML methods, including random forest and SHapley Additive exPlanations (SHAP) analysis, were employed to identify factors influencing both the occurrence and severity of accidents. The C5.0 algorithm was used to extract specific patterns, while prediction tasks were addressed using random forest, support vector machine (SVM), logistic regression, and Naive Bayes algorithms.
Results: The random forest algorithm highlighted income, driving time, working time, age, duration of non-stop driving, type of law enforcement, openness, normlessness, sensation seeking, and vehicle safety as factors that significantly influence the occurrence of accidents. For accident severity, important predictors included driving time, non-stop driving, working time, age, aggressive violations, income, road quality, type of law enforcement, driving while tired, vehicle safety, foreign car status, and vehicle comfort. Additionally, the C5.0 algorithm revealed specific patterns, such as high normlessness and extended driving hours, that increase the likelihood of accidents, while factors such as low normlessness and balanced income acted as protective elements.
Conclusions: The findings highlight the impact of lifestyle and work-related factors, as well as certain personality traits of drivers, on the incidence and severity of accidents. Although the results should not be taken at face value because they rely on self-reported data, the study supports the application of ML in the analysis of accident data. It also advocates strategies such as social and economic interventions, psychological assessments, enhanced road safety education, and customized regulatory measures based on individual risk assessments to prevent traffic accidents effectively.
DOI: https://doi.org/10.5812/semj-158678
Keyword: Traffic Accidents
Keyword: Predictive Analytics
Keyword: Machine Learning
Keyword: Feature Selection
Publisher: Brieflands
Title: Application of Machine Learning in Accident Data Analysis: A Case Study Using Self-report Questionnaire
Type: Research Article
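
Illustrative note on the pattern extraction mentioned in the abstract: decision tree learners in the C4.5/C5.0 family choose splits with an entropy-based criterion. The sketch below is not the authors' code and does not use the study's data. It computes plain information gain (a simplification of the criteria such learners use) on a handful of synthetic records. The feature names `normlessness` and `long_hours`, the helper functions, and all values are hypothetical and chosen only to show how one candidate split can be ranked above another.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, feature, target):
    """Reduction in target entropy after splitting rows on one categorical feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Synthetic toy records (assumption): NOT data from the study.
toy_rows = [
    {"normlessness": "high", "long_hours": "yes", "accident": "yes"},
    {"normlessness": "high", "long_hours": "yes", "accident": "yes"},
    {"normlessness": "high", "long_hours": "no", "accident": "yes"},
    {"normlessness": "low", "long_hours": "yes", "accident": "no"},
    {"normlessness": "low", "long_hours": "no", "accident": "no"},
    {"normlessness": "low", "long_hours": "no", "accident": "no"},
]

# Rank the two hypothetical features by how well they separate the target.
gains = {
    feature: information_gain(toy_rows, feature, "accident")
    for feature in ("normlessness", "long_hours")
}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

In this toy set the `normlessness` split separates the two outcome classes completely, so it receives the larger gain. A real analysis would use the questionnaire data and a full tree learner rather than a single split.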

Files

Original bundle

Name: semj-26-6-158678-publish-pdf.pdf
Size: 553.59 KB
Format: Adobe Portable Document Format
Description: Article/s PDF