Annual Meeting of the Association for Computational Linguistics

Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation

Abstract

Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works well across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.
