Conference: 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing

Preprocessing of radicalism dataset to predict radical content in Indonesia



Abstract

Under a procedural definition, radical content is content that invites, provokes, or carries out certain acts, that interprets jihad as suicide bombing, or that interprets jihad in a limited way. In Indonesia, radical content is often associated with issues of tribe, religion, and race. Classifying radical content is a challenging technical problem because the content is large in volume, unstructured, and noisy. The more content there is, the more features it produces; the resulting high dimensionality can degrade the performance of the classification algorithm. One remedy is dimensionality reduction, for example through feature selection. In this study, we propose an approach for selecting features categorized as radical or non-radical using Human Brain and DF-Threshold. Preprocessing is performed first, followed by text mining and then feature selection using Human Brain and DF-Threshold. Testing uses 10-fold cross-validation with k-Nearest Neighbor (k-NN) as the classifier. In these trials, the highest accuracy obtained was 66.37%, with k = 7 for k-NN.
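The pipeline described in the abstract (document-frequency thresholding, then k-NN evaluated with 10-fold cross-validation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the tokenizer, the term weighting, the distance measure, or the threshold value, so the regex tokenizer, raw term counts, cosine similarity, and the `min_df` parameter are all assumptions. The "Human Brain" selection step is not detailed in the abstract and is left out. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (assumed stand-in for the paper's preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def df_threshold(docs_tokens, min_df):
    """Keep terms that occur in at least min_df documents (min_df is an assumed parameter)."""
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return sorted(term for term, n in df.items() if n >= min_df)


def vectorize(tokens, vocab):
    """Raw term-count vector over the selected vocabulary (weighting is an assumption)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training vectors most similar to x."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: cosine(pair[0], x), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(docs, labels, k, min_df, folds=10):
    """k-NN accuracy under simple round-robin fold assignment."""
    tokens = [tokenize(d) for d in docs]
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(docs)) if i % folds == f]
        train_idx = [i for i in range(len(docs)) if i % folds != f]
        # Select features on the training fold only, to avoid leaking test data.
        vocab = df_threshold([tokens[i] for i in train_idx], min_df)
        train_X = [vectorize(tokens[i], vocab) for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_X, train_y, vectorize(tokens[i], vocab), k)
            correct += int(pred == labels[i])
    return correct / len(docs)
```

Calling `cross_val_accuracy(docs, labels, k=7, min_df=2)` on a list of labelled texts mirrors the reported configuration of k = 7; the `min_df` value here is a placeholder, since the abstract does not state the threshold used.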
