Conference: 2011 International Conference on Business Management and Electronic Information

On international business intelligence Out-Of-Vocabulary processing based on sentence-aligned web corpus

Abstract

International business intelligence processing is an important cross-disciplinary research problem in artificial intelligence. Recognizing out-of-vocabulary (OOV) words in international commercial activities, together with the OOV phrases derived from them, poses a challenge to widely used machine translation technology. Electronic dictionaries with a fixed lexicon cannot keep up with the rapid growth of OOV phrases in international commerce. In this paper, we present a technique for recognizing and translating OOV phrases in international business intelligence based on a sentence-aligned web corpus. We first collect the latest and most relevant textual resources from the Internet and build a sentence-aligned corpus. We then use a Markov model to calculate the association between adjacent word strings, obtain the maximum-likelihood segmentation, and identify OOV words and OOV phrases in the business context. Next, we remove redundant entries and calculate the probabilities and weights of co-occurring word pairs, which yields OOV word pairs and translations of OOV phrases in business intelligence. Experiments show good results in the international business domain, with timely updates.
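The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It simplifies the Markov-model step to maximum-likelihood segmentation under a unigram word model (a zero-order Markov model, solved by dynamic programming), treats any segment missing from a fixed dictionary as an OOV candidate, and scores candidate translations with the Dice coefficient over sentence-aligned pairs in place of the paper's unspecified probability and weight formulas. The redundancy-removal step is not modeled. The function names `segment`, `ngrams`, and `best_translation`, the frequency table, the dictionary, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def segment(text, counts, max_len=4, unk_logp=-20.0):
    """Maximum-likelihood segmentation of `text` under a unigram word model.

    `counts` maps candidate strings to frequencies (assumed, hypothetically,
    to be mined from the web corpus). An unseen single character gets the
    fallback log probability `unk_logp`, so every position stays reachable.
    """
    total = sum(counts.values())
    n = len(text)
    # best[i] = (log-prob of the best segmentation of text[:i], start of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in counts:
                logp = math.log(counts[word] / total)
            elif end - start == 1:
                logp = unk_logp
            else:
                continue  # unseen multi-character strings are not candidates
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the string.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


def ngrams(tokens, max_n=2):
    """Distinct word n-grams of length 1..max_n."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def best_translation(term, aligned_pairs, max_n=2):
    """Return the target n-gram with the highest Dice coefficient for `term`.

    Counts are per aligned sentence pair: c_s pairs whose source side
    contains `term`, c_t[t] pairs whose target side contains t, and
    c_st[t] pairs containing both. Dice = 2 * c_st / (c_s + c_t).
    """
    c_s, c_t, c_st = 0, Counter(), Counter()
    for src, tgt in aligned_pairs:
        grams = ngrams(tgt.lower().split(), max_n)
        c_t.update(grams)
        if term in src:
            c_s += 1
            c_st.update(grams)
    if c_s == 0:
        return None, 0.0
    scores = {t: 2 * c_st[t] / (c_s + c_t[t]) for t in c_st}
    # Break ties in favor of longer n-grams, then alphabetically, for determinism.
    best = max(scores, key=lambda t: (scores[t], len(t.split()), t))
    return best, scores[best]


# Hypothetical toy data. Glosses: 公司 company, 提供 provide, 云 cloud,
# 计算 computing, 云计算 cloud computing, 服务 service, 的 (particle).
counts = {"公司": 9, "提供": 7, "云": 5, "计算": 6, "云计算": 8, "服务": 10, "的": 20}
dictionary = {"公司", "提供", "云", "计算", "服务", "的"}
pairs = [
    ("公司提供云计算服务", "The company provides cloud computing services"),
    ("云计算的服务", "cloud computing services"),
    ("云计算降低成本", "cloud computing lowers cost"),
    ("天上的云", "the cloud in the sky"),
    ("计算能力很强", "strong computing power"),
]

words = segment("公司提供云计算服务", counts)
oov = [w for w in words if w not in dictionary]
print(words)  # ['公司', '提供', '云计算', '服务']
print(oov)    # ['云计算']
print(best_translation(oov[0], pairs))  # ('cloud computing', 1.0)
```

In this toy data the phrase "cloud computing" wins outright because "cloud" and "computing" also occur in sentences that do not contain 云计算, which lowers their Dice scores. When scores do tie, the sketch prefers the longer n-gram, a design choice that favors whole phrases over their component words.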
