Classification of Chinese-to-English translated social network timelines using naive Bayes

Abstract

This study proposes a method for classifying comments from a Chinese social network (Weibo) as positive or negative, using a naive Bayes classifier trained on a corpus from an English-language social network (Twitter). The classifier is trained only on the English Twitter corpus and is then applied to Chinese text. In previous research, Chinese sentences are processed with a Chinese word segmentation algorithm before any machine learning algorithm is applied. Segmentation splits a Chinese sentence into a series of words; it is needed because, unlike English, written Chinese does not delimit words, and a single word may consist of several characters. The quality of the segmentation algorithm therefore directly influences the accuracy of Chinese text categorization. In our research, we eliminate the Chinese word segmentation stage (a traditional preprocessing stage in Chinese text classification) so that the result does not depend on the quality of the segmentation algorithm. Instead of segmenting, we translate the Chinese text into English with Google Translate. From the Twitter corpus, we directly build a text classifier with the multinomial naive Bayes algorithm. Finally, the classifier labels each new Chinese text (a Weibo post that has been translated into English by Google Translate at the preprocessing stage). We conduct an experiment that compares the accuracy of multinomial naive Bayes with that of C4.5.
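The pipeline described in the abstract (translate, then classify with a model trained on English text) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the training sentences are invented toy data rather than the Twitter corpus, `translate_zh_to_en` is a hypothetical stand-in that looks up two hand-written translations instead of calling a translation service, and the tokenizer and add-one smoothing are choices made here for the sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase English text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            n_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / (n_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# ASSUMPTION: hand-written translations standing in for machine translation.
TOY_TRANSLATIONS = {
    "我很喜欢这部电影": "I really like this movie",
    "今天的服务太差了": "The service today was terrible",
}


def translate_zh_to_en(text):
    """Hypothetical placeholder for the translation preprocessing stage."""
    return TOY_TRANSLATIONS[text]


# ASSUMPTION: invented English training data, not the Twitter corpus.
train_texts = [
    "I love this great movie",
    "what a wonderful happy day",
    "I really like the food",
    "this is terrible and sad",
    "I hate the awful service",
    "worst day ever so bad",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

classifier = MultinomialNB().fit(train_texts, train_labels)

# Classify Chinese posts without any Chinese word segmentation step.
predictions = {
    zh: classifier.predict(translate_zh_to_en(zh)) for zh in TOY_TRANSLATIONS
}
for zh, label in predictions.items():
    print(zh, "->", label)
```

No Chinese tokenization appears anywhere in the sketch: the only tokenizer is the English one, which is the point of the approach. The trade-off is that classification quality now depends on the translation step instead of on segmentation.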
