Source: International Journal of Innovative Computing Information and Control

A HYBRID FILTER-WRAPPER FEATURE SELECTION APPROACH FOR AUTHORSHIP ATTRIBUTION


Abstract

Many criminals exploit the anonymity of the cyber world to conduct inappropriate or illegal activities. Authorship attribution aims to identify the most likely author among potential suspects, supporting evidence collection and forensic investigation. It is typically achieved by employing classification algorithms that identify the author from various writing-style features. However, not all features are useful (relevant); irrelevant or redundant features may even degrade classification accuracy and increase processing time. Feature selection, an important data processing technique, can solve this problem, but it has not been investigated in authorship attribution. In this paper, we propose a novel hybrid filter-wrapper feature selection approach for authorship attribution tasks, in which a rich set of writing-style features, including syntactic, lexical, and structural features, is extracted in order to include all available useful information. In the proposed approach, a correlation-based filter feature selection method first filters out irrelevant features, and a particle swarm optimization based wrapper method then removes redundant features so that only relevant features are selected. Experiments on real-life Blog and E-mail datasets show that the proposed approach can improve classification performance while selecting only a small subset of features, and that it achieves better classification performance than filter only, wrapper only, and a commonly used wrapper method (linear forward selection).
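
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the correlation measure, the classifier, or the swarm settings, so everything below is an assumption chosen for brevity. The filter stage ranks features by absolute Pearson correlation with the class label, the wrapper stage is a basic binary particle swarm optimizer, and the fitness is leave-one-out accuracy of a 1-nearest-neighbour classifier minus a small subset-size penalty. The names `make_toy_data`, `correlation_filter`, `loo_1nn_accuracy`, and `pso_wrapper`, as well as the synthetic data, are hypothetical.

```python
import math
import random

def make_toy_data(n=40, seed=1):
    """Synthetic stand-in for writing-style features (assumption, not real data):
    feature 0 is informative, feature 1 nearly duplicates it, features 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # two hypothetical authors
        f0 = 2.0 * label + rng.gauss(0, 0.3)
        f1 = f0 + rng.gauss(0, 0.05)
        X.append([f0, f1] + [rng.gauss(0, 1) for _ in range(4)])
        y.append(label)
    return X, y

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def correlation_filter(X, y, keep):
    """Filter stage: keep the `keep` features most correlated with the label."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    ranked = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def loo_1nn_accuracy(X, y, feats):
    """Wrapper fitness: leave-one-out accuracy of 1-NN on the chosen features."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in feats))
        hits += int(y[nearest] == y[i])
    return hits / len(X)

def pso_wrapper(X, y, candidates, particles=10, iters=20, seed=0, size_penalty=0.01):
    """Wrapper stage: binary PSO over the features that survived the filter."""
    rng = random.Random(seed)
    d = len(candidates)

    def fitness(bits):
        feats = [c for c, b in zip(candidates, bits) if b]
        return loo_1nn_accuracy(X, y, feats) - size_penalty * len(feats)

    pos = [[int(rng.random() < 0.5) for _ in range(d)] for _ in range(particles)]
    vel = [[0.0] * d for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for k in range(d):
                v = (0.7 * vel[i][k]
                     + 1.5 * rng.random() * (pbest[i][k] - pos[i][k])
                     + 1.5 * rng.random() * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-6.0, min(6.0, v))  # clamp velocity
                # Sigmoid transfer: velocity sets the probability of selecting the bit.
                pos[i][k] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][k])))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [c for c, b in zip(candidates, gbest) if b]

X, y = make_toy_data()
candidates = correlation_filter(X, y, keep=3)   # filter drops weakly correlated features
selected = pso_wrapper(X, y, candidates)        # wrapper searches subsets of the survivors
print("after filter:", candidates, "after wrapper:", selected)
```

The filter is cheap and shrinks the search space, but it scores each feature on its own, so it cannot tell that features 0 and 1 carry the same information. The wrapper evaluates whole subsets with a classifier, and the size penalty gives it a reason to drop one of a redundant pair.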
