Journal of Data and Information Science

Effective Opinion Spam Detection: A Study on Review Metadata Versus Content


Abstract

Purpose: This paper aims to analyze the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.

Design/methodology/approach: Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection), and product-centric (spam-targeted product detection). To guard against classifier bias, we employ four classifiers, which gives a more balanced view of the results. We also propose a new set of features and compare them against features from well-known related works. Experiments on two real-world datasets show the effectiveness of different features in opinion spam detection.

Findings: Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, which further establishes behavioral features as the dominant indicators of opinion spam. The features used in this work improve on existing features used in related works. Furthermore, the computation-time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual ones.

Research limitations: The analyses in this paper are limited to two well-known datasets from Yelp.com, YelpZip and YelpNYC.

Practical implications: The results obtained in this paper can be used to improve opinion spam detection; researchers may focus feature engineering and feature selection techniques more on metadata information.

Originality/value: To the best of our knowledge, this study is the first to consider three perspectives (review-, reviewer-, and product-centric) and four classifiers in analyzing the effectiveness of two major types of features for opinion spam detection. It also introduces some novel features that help improve the performance of opinion spam detection methods.
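The abstract does not list the individual features, so the following is only a minimal sketch of how the three settings and the two feature families might relate. Every record, field name, and feature here (`rating_deviation`, `word_count`, `unique_word_ratio`) is a made-up stand-in chosen for illustration, not a feature taken from the paper. The sketch builds one feature row per review (review-centric), then averages those rows per reviewer (reviewer-centric) and per product (product-centric).

```python
from collections import defaultdict
from statistics import mean

# Toy records: all names and values are hypothetical, for illustration only.
REVIEWS = [
    {"reviewer": "u1", "product": "p1", "rating": 5,
     "text": "Best place ever, amazing, amazing food"},
    {"reviewer": "u1", "product": "p2", "rating": 5,
     "text": "Amazing service, best in town"},
    {"reviewer": "u2", "product": "p1", "rating": 2,
     "text": "Slow service and the soup was cold"},
    {"reviewer": "u3", "product": "p1", "rating": 3,
     "text": "Decent food, a bit pricey"},
]


def behavioral_features(review, product_mean):
    """Metadata-based stand-ins: computed without reading the review text."""
    return {
        "rating": review["rating"],
        "rating_deviation": abs(review["rating"] - product_mean),
    }


def textual_features(review):
    """Content-based stand-ins: computed from the review text only."""
    words = review["text"].lower().replace(",", " ").split()
    return {
        "word_count": len(words),
        "unique_word_ratio": len(set(words)) / len(words) if words else 0.0,
    }


def average_rows(rows):
    """Average each named feature over a list of feature rows."""
    return {name: mean(row[name] for row in rows) for name in rows[0]}


def build_feature_views(reviews):
    """Return (review-centric, reviewer-centric, product-centric) features."""
    ratings = defaultdict(list)
    for r in reviews:
        ratings[r["product"]].append(r["rating"])
    product_mean = {p: mean(v) for p, v in ratings.items()}

    # Review-centric: one hybrid (behavioral + textual) row per review.
    review_rows = [
        {**behavioral_features(r, product_mean[r["product"]]),
         **textual_features(r)}
        for r in reviews
    ]

    # Reviewer- and product-centric: average the review rows per group.
    by_reviewer, by_product = defaultdict(list), defaultdict(list)
    for r, row in zip(reviews, review_rows):
        by_reviewer[r["reviewer"]].append(row)
        by_product[r["product"]].append(row)

    reviewer_rows = {k: average_rows(v) for k, v in by_reviewer.items()}
    product_rows = {k: average_rows(v) for k, v in by_product.items()}
    return review_rows, reviewer_rows, product_rows


if __name__ == "__main__":
    review_rows, reviewer_rows, product_rows = build_feature_views(REVIEWS)
    print(len(review_rows), sorted(reviewer_rows), sorted(product_rows))
```

Under these assumptions, comparing the two feature families would amount to training the same classifier on only the behavioral columns, only the textual columns, or both, at each of the three granularities. Averaging is just one simple way to aggregate; the paper may aggregate differently.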
