Journal: 《情报学报》 (Journal of the China Society for Scientific and Technical Information)

Bilingual Latent Semantic Correspondence Analysis and Its Application to Cross-Lingual Text Categorization

Abstract

Bilingual text correspondence analysis helps overcome the language barrier when processing multilingual text data. Corpus-based cross-lingual latent semantic indexing, however, does not fully account for the semantic correlation between the two languages. This paper proposes a method that uses partial least squares to model the semantic correlation of bilingual parallel documents. Parallel documents are treated as two expressions, in different languages, of the same semantic content; on that basis a latent semantic space is built for each language, and cross-lingual text categorization is carried out in these two spaces. The evaluation uses a Chinese-English document-aligned dataset collected from the Hong Kong government news website. Experiments on monolingual and cross-lingual classification show that, in the bilingual latent semantic spaces built by the method, both tasks perform close to or better than monolingual classification in the original feature space, and that the method is robust.
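The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which partial least squares variant is used, so the sketch assumes the simple SVD-based form (singular vectors of the cross-covariance between the two term spaces). The synthetic "parallel corpus", the nearest-centroid classifier, and all function and variable names (`fit_pls_svd`, `project`, `X_en`, `Y_zh`, and so on) are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_pls_svd(X, Y, k):
    """Fit paired projections from parallel documents (assumed PLS-SVD variant)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    # Unnormalised cross-covariance between the two term spaces.
    C = (X - mx).T @ (Y - my)
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    Wx, Wy = U[:, :k], Vt[:k].T          # paired directions, one set per language
    # Per-dimension scale so both latent spaces are comparable.
    sx = ((X - mx) @ Wx).std(axis=0)
    sy = ((Y - my) @ Wy).std(axis=0)
    return {"en": (mx, Wx, sx), "zh": (my, Wy, sy)}

def project(model, docs, lang):
    """Map documents of one language into that language's latent space."""
    mean, W, scale = model[lang]
    return ((docs - mean) @ W) / scale

def fit_centroids(T, labels):
    classes = np.unique(labels)
    return classes, np.stack([T[labels == c].mean(axis=0) for c in classes])

def predict(clf, T):
    classes, centroids = clf
    dist = ((T[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

# Synthetic parallel corpus (assumption): 3 topics, each with its own block of
# terms in each language; row i of X_en and row i of Y_zh are the same document.
rng = np.random.default_rng(0)
n_docs, n_topics = 300, 3
labels = np.arange(n_docs) % n_topics
Z = np.eye(n_topics)[labels]                          # one-hot topic per document
A = np.kron(np.eye(n_topics), np.ones((1, 10)))       # 3 x 30 "English" vocabulary
B = np.kron(np.eye(n_topics), np.ones((1, 8)))        # 3 x 24 "Chinese" vocabulary
X_en = Z @ A + 0.3 * rng.standard_normal((n_docs, 30))
Y_zh = Z @ B + 0.3 * rng.standard_normal((n_docs, 24))

train, test = slice(0, 200), slice(200, 300)
model = fit_pls_svd(X_en[train], Y_zh[train], k=2)

# Train the classifier on English documents only.
clf = fit_centroids(project(model, X_en[train], "en"), labels[train])

# Cross-lingual: classify held-out Chinese documents with the English-trained model.
acc_cross = (predict(clf, project(model, Y_zh[test], "zh")) == labels[test]).mean()
# Monolingual: classify held-out English documents in the same latent space.
acc_mono = (predict(clf, project(model, X_en[test], "en")) == labels[test]).mean()
print("cross-lingual accuracy:", acc_cross)
print("monolingual accuracy:", acc_mono)
```

The key design point is that the left and right singular vectors come in pairs with non-negative covariance between their scores, so a classifier trained on one language's scores can be applied to the other's. On real text the rows of `X_en` and `Y_zh` would be term-weight vectors of aligned documents, and a stronger classifier would normally replace the nearest-centroid stand-in.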
