Conference: Multilingual information access in South Asian languages

Cross Lingual Text Reuse Detection Based on Keyphrase Extraction and Similarity Measures

Abstract

Information on the web in various languages is growing fast, but a large amount of content still exists in English. Several cases of English text reuse (cross-language plagiarism) have been observed in non-English languages. Detecting text reuse in non-English languages is a challenging task because of the complexity of the languages involved, and the difficulty increases further for less-resourced languages such as Arabic and the Indian languages. In this paper, we address the problem posed in the FIRE CL!TR 2011 task: detecting plagiarized Hindi documents whose text was reused from English source documents. We propose three approaches based on classification and key-phrase retrieval techniques. Our winning approach attained an F-measure of 0.792.
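For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure is typically computed from detection counts. It assumes the balanced form (beta = 1), which the abstract does not specify; the function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from raw detection counts.

    Assumption: beta = 1.0 (harmonic mean of precision and recall);
    the abstract does not state which variant was used.
    """
    if true_positives == 0:
        # With no correct detections, precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for illustration only (not results from the paper):
# 8 reused documents correctly flagged, 2 false alarms, 2 missed.
score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
print(round(score, 3))
```

With these made-up counts, precision and recall are both 0.8, so the balanced F-measure is also 0.8.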
