Multilingual named entity extraction and translation from text and speech.

Abstract

Named entities (NEs), the nouns or noun phrases referring to persons, locations and organizations, are among the most information-bearing linguistic structures. Extracting and translating named entities benefits many natural language processing problems such as cross-lingual information retrieval, cross-lingual question answering and machine translation.

In this thesis we propose an efficient and effective framework to extract and translate NEs from text and speech. We adopt the hidden Markov model (HMM) as a baseline NE extraction system, and investigate its performance in multiple language pairs with varying amounts of training data. We expand the baseline text NE tagger with a context-based NE extraction model, which aims to detect and correct NE recognition errors in automatic speech recognition hypotheses. We also adapt the NE tagger trained on broadcast news to meeting transcripts.

We develop several language-independent features to capture phonetic and semantic similarity measures between source and target NE pairs. We incorporate these features to solve various NE translation problems presented in different language pairs (Chinese to English, Arabic to English and Hindi to English), with varying resources (parallel and non-parallel corpora as well as the World Wide Web) and different input data streams (text and speech).

We also propose a cluster-specific name transliteration framework. By grouping names from similar origins into one cluster and training cluster-specific transliteration and language models, we dramatically reduce name transliteration error rates.

Bibliographic record

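  • Author: Huang, Fei
  • Author affiliation: Carnegie Mellon University
  • Degree-granting institution: Carnegie Mellon University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pagination: 149 p.
  • Total pages: 149
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology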