Coping with Class Imbalance in Classification of Traffic Crash Severity Based on Sensor and Road Data: A Feature Selection and Data Augmentation Approach

Deepti Lamba; Gregory Newmark

Abstract

This paper presents machine learning-based approaches to classification of historical traffic crashes in Kansas by severity, applied to a data set consisting of highway geometry, weather, and road sensor data. The goal of this work is to identify relevant features using a variety of loss measures and algorithms for feature selection. This is shown to facilitate the discovery of the most relevant sensors for the task of learning to predict severe crashes (those involving bodily injury). The key technical challenges are to cope with class imbalance (as a 75% majority of crashes are non-severe) and a highly correlated and redundant set of features from multiple coalesced sources. The major novel contributions of this work are the development of a random oversampling strategy for data augmentation, combined with the systematic application of multiple feature selection measures over a range of supervised inductive learning models and algorithms. Positive results from this approach, on a data set of 277 initial ground features and 20,000 vehicle crashes collected over 9 years (2007–2015) by the Kansas Department of Transportation (KDOT), included models trained using 30 features (out of 277) that achieve cross-validation precision and recall comparable to those obtained using the full set of features. These and other results point towards potential use of feature selection findings and the resultant models in planning future road construction.
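The two ideas named in the abstract, random oversampling of the minority class and filter-style feature selection, can be illustrated with a minimal Python sketch. The abstract does not describe the authors' exact procedure, so everything here is an assumption: the function names `random_oversample` and `top_k_features`, the toy data, and the use of absolute Pearson correlation as the ranking score (a stand-in for the paper's unspecified "loss measures") are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with
    replacement) until every class matches the largest class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def top_k_features(rows, labels, k):
    """Return the column indices of the k highest-scoring features.
    The score is an illustrative choice, not one taken from the paper."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 6 non-severe (0) and 2 severe (1) crashes,
# mirroring the 75% / 25% split mentioned in the abstract.
X = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=0)
selected = top_k_features(X, y, k=2)
```

In a cross-validation setting, oversampling would normally be applied only to the training portion of each fold, since duplicated rows that leak into the held-out portion inflate precision and recall.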

Bibliographic details

Source: Computer Science & Information Technology, 2019, Issue 6, 13 pages
Authors: Deepti Lamba; Gregory Newmark
Original format: PDF
Keywords: Machine Learning; Class Imbalance; Predictive Analytics; Feature Selection; Data Augmentation; Traffic Engineering

Related literature

1. A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion (vol 168, pg 365, 2015) [J]. Liu Zhen, Wang Ruoyu, Tao Ming. Neurocomputing, 2016, issue JANa1.
2. A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion [J]. Liu Zhen, Wang Ruoyu, Tao Ming. Neurocomputing, 2015, issue nova30.
3. Classification of motor vehicle crash injury severity: A hybrid approach for imbalanced data [J]. Jeong Heejin, Jang Youngchan, Bowman Patrick J. Accident Analysis & Prevention, 2018, issue NOVa.
4. Feature selection and Ensemble Hierarchical Cluster-based Under-sampling approach for extremely imbalanced datasets: Application to gene classification [C]. Soltani Sima, Sadri Javad, Torshizi Hassan Ahmadi. International eConference on Computer and Knowledge Engineering (ICCKE), 2011.
5. Data-driven Bayesian method-based traffic crash driver injury severity formulation, analysis, and inference [D]. Chen, Cong. 2015.
6. Sparse Proteomics Analysis – a compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data [O]. Tim O. F. Conrad, Martin Genzel, Nada Cvetkovic. 2017.
7. A Genetic Algorithm for Feature Selection and Granularity Learning in Fuzzy Rule-Based Classification Systems for Highly Imbalanced Data-Sets [O]. Pedro Villar, Francisco Herrera. 2014.
