Introduction

Abstract

The goal of this workshop is to expand the current area of cross-lingual learning to include more NLP problems, to encourage approaches that explore low-resource scenarios, and to improve upon existing approaches to multilinguality. State-of-the-art NLP tools for text parsing, speech recognition and synthesis, text and speech translation, and semantic analysis and inference rely on the availability of language-specific data resources that exist only for a few resource-rich languages. To make NLP tools available in more languages, techniques have been developed for projecting such resources from resource-rich languages, using parallel (translated) data as a bridge for cross-lingual NLP applications. The limiting reagent in these methods is parallel data or bilingual lexicons. While small parallel corpora do exist for many languages, suitably large parallel corpora are expensive to build, and they typically exist only for English and a few other geopolitically or economically important languages. Given this state of affairs, there is an urgent need for new cross-lingual methods, language-independent multilingual methods, and methods for establishing lexical links across languages that do not necessarily rely on large-scale parallel corpora. Without new strategies, most of the 7,000+ languages in the world, many with millions of speakers, will remain resource-poor from the standpoint of NLP.
