Journal: Expert Systems with Applications

Bag of meta-words: A novel method to represent document for the sentiment classification

Abstract

Representing the semantic information of a document is crucial in sentiment classification. Various semantic representation models have been proposed, but existing approaches have notable weaknesses: (1) traditional vector space model (VSM) methods completely ignore semantic information; (2) methods that average word embeddings cannot capture the overall semantic meaning of a document; (3) neural network methods require complex structures and are notoriously difficult to train. To overcome these limitations, we introduce a simple but novel method that we call bag of meta-words (BoMW). In our method, the semantic information of a document is represented by a meta-word vector in which each meta-word element denotes a particular piece of semantic information. Specifically, these meta-words are extracted from pre-trained word embeddings through two different but complementary models: naive interval meta-words (NIM) and feature combination meta-words (FCM). Overall, BoMW is as simple as the traditional VSM model, yet it can capture the overall semantic meaning of a document. Extensive experiments on two benchmarks (the IMDB dataset and Pang's dataset) verify the effectiveness of the proposed method, and the results show that it can outperform both traditional VSM methods and methods using pre-trained word embeddings. (C) 2018 Elsevier Ltd. All rights reserved.
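The abstract does not spell out how NIM or FCM build meta-words, so the following is only a rough sketch of the general idea, not the authors' algorithm. It contrasts the averaged-embedding baseline with a count vector over hypothetical meta-words, where a meta-word is assumed here to be a pair (embedding dimension, value interval). The toy embeddings, the bin count, the value range, and every function name are assumptions made for illustration.

```python
from collections import Counter

# Toy 2-dimensional "pre-trained" embeddings (made-up values, illustration only).
EMBEDDINGS = {
    "good": [0.8, 0.1],
    "great": [0.9, 0.2],
    "bad": [-0.7, 0.3],
    "movie": [0.0, -0.6],
}


def average_embedding(tokens, embeddings):
    """Baseline: mean of the word vectors of the tokens found in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def interval_meta_words(vector, n_bins=4, low=-1.0, high=1.0):
    """Map one word vector to hypothetical meta-words of the form (dimension, bin).

    Assumption: each dimension's range [low, high] is cut into n_bins equal
    intervals. The paper's actual NIM construction may differ.
    """
    width = (high - low) / n_bins
    meta = []
    for d, value in enumerate(vector):
        clipped = min(max(value, low), high)
        bin_index = min(int((clipped - low) / width), n_bins - 1)
        meta.append((d, bin_index))
    return meta


def bag_of_meta_words(tokens, embeddings, n_bins=4):
    """Count meta-word occurrences over a document, like a bag of words."""
    counts = Counter()
    for token in tokens:
        if token in embeddings:
            counts.update(interval_meta_words(embeddings[token], n_bins))
    return counts


doc = ["good", "bad", "movie"]
avg = average_embedding(doc, EMBEDDINGS)
bag = bag_of_meta_words(doc, EMBEDDINGS)
```

In this toy document the opposite polarities of "good" and "bad" nearly cancel in the first dimension of the average, while the meta-word counts keep them in separate bins. That is the kind of information loss the abstract attributes to averaging methods.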