Conference: Advances in Information Retrieval

Clustering and Visualization in a Multi-lingual Multi-document Summarization System

Abstract

Measuring the similarity of words, sentences, and documents is one of the major issues in multilingual multi-document summarization. This paper presents five strategies for computing multilingual sentence similarity. The experimental results show that sentence alignment that ignores word position and word order within a sentence achieves the best performance. In addition, two strategies are proposed for multilingual document clustering. The two-phase strategy (translation after clustering) outperforms the one-phase strategy (translation before clustering). Deferring translation until sentence clustering, which reduces the propagation of translation errors, is the most promising approach. Moreover, three strategies are proposed for sentence clustering. Complete link within a cluster performs best; however, subsumption-based clustering achieves similar performance at a lower computational complexity. Finally, two visualization models (focusing and browsing) that take the user's language preference into account are proposed.
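The two ideas the abstract singles out, an order-insensitive sentence similarity and complete-link sentence clustering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes sentences have already been tokenized and mapped into a shared vocabulary (e.g., by translation), and it uses a Dice coefficient over word sets as a stand-in similarity measure. The function names `bag_similarity` and `complete_link_clusters` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def bag_similarity(s1: list[str], s2: list[str]) -> float:
    """Dice coefficient over word sets.

    Hypothetical stand-in measure: word position and order are ignored,
    so "taiwan earthquake hits" and "earthquake hits taiwan" score 1.0.
    """
    a, b = set(s1), set(s2)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def complete_link_clusters(sentences: list[list[str]], threshold: float) -> list[list[int]]:
    """Agglomerative complete-link clustering of sentence indices.

    The similarity of two clusters is the MINIMUM pairwise similarity
    between their members; the most similar pair of clusters is merged
    repeatedly until no pair reaches `threshold` (a hypothetical cutoff).
    """
    clusters = [[i] for i in range(len(sentences))]

    def link(c1: list[int], c2: list[int]) -> float:
        # Complete link: every cross-cluster pair must be similar.
        return min(bag_similarity(sentences[i], sentences[j]) for i in c1 for j in c2)

    while True:
        best, pair = threshold, None
        for x, y in combinations(range(len(clusters)), 2):
            s = link(clusters[x], clusters[y])
            if s >= best:
                best, pair = s, (x, y)
        if pair is None:
            break
        x, y = pair  # y > x, so index x stays valid after deleting y
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


sents = [
    ["earthquake", "hits", "taiwan"],
    ["taiwan", "earthquake", "hits"],  # same words, different order
    ["stock", "market", "falls"],
]
print(complete_link_clusters(sents, 0.5))  # [[0, 1], [2]]
```

Complete link is the strictest of the common linkage criteria, which is consistent with the abstract's note that it is also the most expensive: each merge step re-examines all cross-cluster sentence pairs.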