Journal: International Journal of Research in Marketing

Comparing automated text classification methods


Abstract

Online social media drive the growth of unstructured text data. Many marketing applications require structuring this data at scales inaccessible to human coding, e.g., to detect communication shifts in sentiment or other researcher-defined content categories. Several methods have been proposed to automatically classify unstructured text. This paper compares the performance of ten such approaches (five lexicon-based, five machine learning algorithms) across 41 social media datasets covering major social media platforms, various sample sizes, and languages. So far, marketing research relies predominantly on support vector machines (SVM) and Linguistic Inquiry and Word Count (LIWC). Across all tasks we study, either random forest (RF) or naive Bayes (NB) performs best in terms of correctly uncovering human intuition. In particular, RF exhibits consistently high performance for three-class sentiment, and NB for small sample sizes. SVM never outperforms the remaining methods. All lexicon-based approaches, LIWC in particular, perform poorly compared with machine learning. In some applications, accuracies only slightly exceed chance. Since additional considerations of text classification choice are also in favor of NB and RF, our results suggest that marketing research can benefit from considering these alternatives. (C) 2018 Elsevier B.V. All rights reserved.
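To make the distinction between the two families of methods concrete, the following is a minimal sketch, not the paper's implementation. It contrasts a lexicon-based classifier (counting hits against a fixed word list) with a small multinomial naive Bayes classifier trained on labeled examples. The word lists, the example posts, and all function and class names are hypothetical and chosen for illustration only. The toy data is constructed so that the lexicon misses a cue it does not contain, and the resulting accuracies say nothing about performance on real datasets.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon (NOT LIWC or any lexicon evaluated in the paper).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "hate", "awful"}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def lexicon_classify(text):
    """Label by lexicon hit counts; ties default to 'pos' (arbitrary choice)."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n / n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][tok]
                    logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def accuracy(predict, texts, labels):
    """Share of texts whose predicted label matches the human label."""
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Made-up posts with made-up human labels, for illustration only.
train_texts = ["love this phone", "great camera", "shipping was fast",
               "hate this phone", "awful camera", "shipping was slow"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_texts = ["fast shipping", "slow shipping", "great phone", "awful phone"]
test_labels = ["pos", "neg", "pos", "neg"]

nb = NaiveBayes().fit(train_texts, train_labels)
lexicon_acc = accuracy(lexicon_classify, test_texts, test_labels)
nb_acc = accuracy(nb.predict, test_texts, test_labels)
print(f"lexicon accuracy: {lexicon_acc:.2f}")
print(f"naive Bayes accuracy: {nb_acc:.2f}")
```

In this sketch the lexicon cannot label "slow shipping" as negative because "slow" is not in its word list, whereas naive Bayes picks up that cue from the labeled training posts. This illustrates the mechanism only; it is not evidence for the paper's empirical results.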
