Conference on Industrial and Information Systems

Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation

Abstract

Sentiment Analysis (SA) is an application of Natural Language Processing (NLP) that extracts the sentiments expressed in text. In this paper, we experimented with five approaches to SA: a lexicon-based approach, a supervised machine learning approach, a hybrid approach, K-means with Bag of Words (BoW), and K-modes with BoW. We evaluated these approaches on five corpora with different feature representation techniques to identify the best approach for SA in Tamil texts. In addition to traditional features such as BoW and Term Frequency-Inverse Document Frequency (TF-IDF), we included basic features such as word count and punctuation count to check their influence on the prediction. We compared the approaches, the features, and the corpora. In the evaluation, the highest accuracy, 79%, was obtained on the UJ_Corpus_Opinions_Nouns corpus with fastText under the supervised machine learning approach.
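
The feature representations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three toy sentences are hypothetical and not taken from the paper's corpora, the whitespace tokenizer and the ASCII-only punctuation set are simplifying assumptions, and the unsmoothed TF-IDF formula (term frequency times log(N / document frequency)) is just one common variant, since the abstract does not say which one was used.

```python
import math
import string
from collections import Counter

# Toy documents (hypothetical, not from the paper's corpora).
docs = [
    "நல்ல படம் !",
    "மோசமான படம் ...",
    "நல்ல நல்ல கதை",
]

def tokenize(text):
    """Whitespace tokenizer that drops punctuation-only tokens (an assumption)."""
    return [t for t in text.split()
            if not all(ch in string.punctuation for ch in t)]

def basic_features(text):
    """Return (word count, punctuation count) for one document."""
    punct = sum(ch in string.punctuation for ch in text)
    return len(tokenize(text)), punct

def bag_of_words(documents):
    """Return the sorted vocabulary and one raw count vector per document."""
    vocab = sorted({tok for d in documents for tok in tokenize(d)})
    vectors = []
    for d in documents:
        counts = Counter(tokenize(d))
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def tf_idf(documents):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    vocab, counts = bag_of_words(documents)
    n = len(documents)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n / d) for d in df]
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([(c / total) * idf[j] if total else 0.0
                        for j, c in enumerate(row)])
    return vocab, weights

vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
for d, row in zip(docs, bow):
    print(basic_features(d), row)
```

The basic features could be appended to each BoW or TF-IDF vector before the combined vector is passed to a classifier or a clustering algorithm; whether the paper combines them in this way is not stated in the abstract.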
