Journal: Computer Science & Information Technology

Coping with Class Imbalance in Classification of Traffic Crash Severity Based on Sensor and Road Data: A Feature Selection and Data Augmentation Approach



Abstract

This paper presents machine learning-based approaches to classification of historical traffic crashes in Kansas by severity, applied to a data set consisting of highway geometry, weather, and road sensor data. The goal of this work is to identify relevant features using a variety of loss measures and algorithms for feature selection. This is shown to facilitate the discovery of the most relevant sensors for the task of learning to predict severe crashes (those involving bodily injury). The key technical challenges are to cope with class imbalance (as a 75% majority of crashes are non-severe) and a highly correlated and redundant set of features from multiple coalesced sources. The major novel contributions of this work are the development of a random oversampling strategy for data augmentation, combined with the systematic application of multiple feature selection measures over a range of supervised inductive learning models and algorithms. Positive results from this approach, on a data set of 277 initial ground features and 20,000 vehicle crashes collected over 9 years (2007–2015) by the Kansas Department of Transportation (KDOT), included models trained using 30 features (out of 277) that achieve cross-validation precision and recall comparable to those obtained using the full set of features. These and other results point towards potential use of feature selection findings and the resultant models in planning future road construction.
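
The abstract names random oversampling as the data augmentation strategy but does not give its details. As a rough illustration only, the generic form of the technique can be sketched as below: minority-class rows are duplicated at random until every class matches the majority class in size. The function name `random_oversample`, the toy rows, and the label strings are assumptions made for this sketch and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Generic random oversampling; NOT the paper's exact procedure, which the
    abstract does not specify.
    """
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap (empty for the majority class).
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data with the 75% / 25% split mentioned in the abstract.
rows = [[i, i % 3] for i in range(8)]
labels = ["non-severe"] * 6 + ["severe"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

When this kind of resampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated rows do not leak into the held-out fold and inflate precision and recall.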
