COVID-19 Named Entity Recognition for Vietnamese

Abstract

The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic. However, most of these corpora are exclusively for English. As the pandemic is a global problem, it is worth creating COVID-19 related datasets for languages other than English. In this paper, we present the first manually annotated COVID-19 domain-specific dataset for Vietnamese. In particular, our dataset is annotated for the named entity recognition (NER) task with newly defined entity types that can be used in other future epidemics. Our dataset also contains the largest number of entities compared to existing Vietnamese NER datasets. We conduct experiments using strong baselines on our dataset and find that (i) automatic Vietnamese word segmentation helps improve the NER results, and (ii) the highest performance is obtained by fine-tuning pre-trained language models, where the monolingual model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) produces higher results than the multilingual model XLM-R (Conneau et al., 2020).
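
To make the role of word segmentation in the abstract concrete, here is a minimal sketch of how BIO-style NER labels might look at the syllable level versus the word level, and how they decode into entity spans. The sentence, the label names `PATIENT_ID` and `LOCATION`, and the helper `bio_to_spans` are illustrative assumptions, not taken from the dataset or the authors' code. The underscore joining syllables of a segmented word follows a common convention for Vietnamese word segmenters.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens.

    Hypothetical helper for illustration only.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then open a new one.
            # An I- tag without a matching open entity is treated as a start.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Made-up example sentence ("Patient 523 lives in Da Nang"), syllable level:
syll_tokens = ["Bệnh", "nhân", "523", "sống", "ở", "Đà", "Nẵng"]
syll_tags = ["O", "O", "B-PATIENT_ID", "O", "O", "B-LOCATION", "I-LOCATION"]

# The same sentence after word segmentation: multi-syllable words are
# single tokens, so the location needs only one label.
word_tokens = ["Bệnh_nhân", "523", "sống", "ở", "Đà_Nẵng"]
word_tags = ["O", "B-PATIENT_ID", "O", "O", "B-LOCATION"]

print(bio_to_spans(syll_tokens, syll_tags))
print(bio_to_spans(word_tokens, word_tags))
```

With word-level input the tagger labels fewer, larger units, which is one plausible reason segmentation could help. The abstract itself reports only the empirical finding.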

Bibliographic details

Source
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies | 2021 | pp. 2146-2153 | 8 pages
Authors
Thinh Hung Truong; Mai Hoang Dao; Dat Quoc Nguyen
Original format
PDF

Similar literature

1. Named Entity Recognition in Vietnamese documents based on CRF [J]. Vo Trung Hung. American Journal of Engineering Research. 2020, No. 5
2. Improving Named Entity Recognition in Vietnamese Texts by a Character-Level Deep Lifelong Learning Model [J]. Ngoc-Vu Nguyen, Thi-Lan Nguyen, Cam-Van Nguyen Thi. Vietnam Journal of Computer Science. 2019, No. 4
3. Text normalization for named entity recognition in Vietnamese tweets [J]. Vu H. Nguyen, Hien T. Nguyen, Vaclav Snasel. Computational Social Networks. 2016, No. 1
4. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing [C]. Linh The Nguyen, Dat Quoc Nguyen. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. Text normalization for named entity recognition in Vietnamese tweets [O]. Vu H. Nguyen, Hien T. Nguyen, Vaclav Snasel
7. Improving Named Entity Recognition in Vietnamese Texts by a Character-Level Deep Lifelong Learning Model [O]. Ngoc-Vu Nguyen, Thi-Lan Nguyen, Cam-Van Nguyen Thi. 2019
