Bag of meta-words: A novel method to represent document for the sentiment classification

Fu Mingsheng; Qu Hong; Huang Li; Lu Li

Abstract

Representing the semantic information of a document is crucial in sentiment classification. Various models for representing semantic information have been proposed, but existing approaches have drawbacks: (1) traditional vector space model (VSM) methods ignore semantic information entirely; (2) methods that average word embeddings cannot capture the overall semantic meaning of a document; (3) neural network methods require complex structures and are notoriously difficult to train. To overcome these limitations, we introduce a simple but novel method that we call bag of meta-words (BoMW). In our method, the semantic information of a document is represented by a vector of meta-words, in which each meta-word element denotes a particular piece of semantic information. Specifically, these meta-words are extracted from pre-trained word embeddings by two different but complementary models: naive interval meta-words (NIM) and feature combination meta-words (FCM). Overall, BoMW is as simple as the traditional VSM model, yet it can capture the overall semantic meaning of a document. Extensive experiments on two benchmarks (the IMDB dataset and Pang's dataset) verify the effectiveness of the proposed method, and the results show that it can outperform both traditional VSM methods and methods that use pre-trained word embeddings. (C) 2018 Elsevier Ltd. All rights reserved.
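
The abstract does not say how NIM or FCM derive meta-words, so the following is only a minimal sketch of one plausible reading of the "interval" idea, not the authors' algorithm. It assumes that each dimension of a pre-trained embedding is split into equal-width intervals, that every (dimension, interval) pair acts as one meta-word, and that a document is represented by the counts of the meta-words its tokens fall into. The embedding values, the function names (`interval_index`, `bag_of_meta_words`, `average_embedding`), and the parameters (`low`, `high`, `n_intervals`) are all hypothetical. FCM is not sketched here. An averaging baseline is included for contrast with weakness (2).

```python
from collections import Counter

# Toy 2-dimensional embeddings with made-up values (illustration only;
# a real setup would load pre-trained vectors instead).
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.3],
    "bad": [-0.7, 0.2],
    "movie": [0.0, -0.6],
}


def interval_index(value, low, high, n_intervals):
    """Index of the equal-width interval of [low, high] that contains value."""
    if value <= low:
        return 0
    if value >= high:
        return n_intervals - 1
    width = (high - low) / n_intervals
    return min(int((value - low) / width), n_intervals - 1)


def bag_of_meta_words(tokens, embeddings, low=-1.0, high=1.0, n_intervals=4):
    """Count hypothetical (dimension, interval) meta-words over a document.

    Assumption: this interval scheme is our reading of the abstract,
    not a description taken from the paper.
    """
    dim = len(next(iter(embeddings.values())))
    counts = Counter()
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # skip out-of-vocabulary tokens
        for d, value in enumerate(vector):
            counts[(d, interval_index(value, low, high, n_intervals))] += 1
    # Flatten to a fixed-length vector: dim * n_intervals entries.
    return [counts[(d, i)] for d in range(dim) for i in range(n_intervals)]


def average_embedding(tokens, embeddings):
    """Baseline for contrast: the mean of the document's word vectors."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


doc = ["good", "bad", "movie"]
print(bag_of_meta_words(doc, EMBEDDINGS))  # [1, 0, 1, 1, 1, 0, 2, 0]
print(average_embedding(doc, EMBEDDINGS))
```

In this toy example the count vector keeps "good" and "bad" in separate bins of the first dimension, whereas the average blends them into a single value. The resulting fixed-length vector could be passed to any standard classifier in the same way as a VSM term-count vector.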

Bibliographic details

Source
Expert Systems with Application | 2018, Issue 12 | pp. 33-43 | 11 pages
Authors
Fu Mingsheng; Qu Hong; Huang Li; Lu Li
Original format: PDF
Language: English

Similar literature

1. EnSWF: effective features extraction and selection in conjunction with ensemble learning methods for document sentiment classification [J]. Khan Jawad, Alam Aftab, Hussain Jamil. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2019, Issue 8
2. Micro-Blog Sentiment Classification Method Based on the Personality and Bagging Algorithm [J]. Wenzhong Yang, Tingting Yuan, Liejun Wang. Future Internet. 2020, Issue 4
3. Unsupervised Sentiment-Bearing Feature Selection for Document-Level Sentiment Classification [J]. Yan LI, Zhen QIN, Weiran XU. IEICE Transactions on Information and Systems. 2013, Issue 12
4. Bag-of-Words, Bag-of-Topics and Word-to-Vec Based Subject Classification of Text Documents in Polish - A Comparative Study [C]. Tomasz Walkowiak, Szymon Datko, Henryk Maciejewski. International Conference on Dependability and Complex Systems. 2018
5. Bagged Projection Methods for Supervised Classification in Big Data [D]. da Silva Cousillas, Natalia. 2017
6. Comparison of Bagging and Boosting Ensemble Machine Learning Methods for Automated EMG Signal Classification [O]. Emine Yaman, Abdulhamit Subasi. 2019
7. An Enhanced Hybrid Feature Selection Technique Using Term Frequency-Inverse Document Frequency and Support Vector Machine-Recursive Feature Elimination for Sentiment Classification [O]. Nur Syafiqah Mohd Nafis, Suryanti Awang. 2021
