Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields

K.P. Pallavi; L. Sobha; M.M. Ramya

Abstract

Named Entities (NEs) in sentences are essential for building Natural Language Processing (NLP) applications that perform Information Extraction (IE) from large corpora. However, building a large corpus is challenging for resource-poor languages such as Kannada, and no annotated Kannada corpus is available online. Annotating NEs with predefined classes is difficult for two reasons: NEs are often morphologically joined to other words, and spelling variations are frequent in Kannada. Sentence structure also varies with the morphology, parts of speech (POS) and chunking of a language, and these properties differ from one language to another. To address these challenges, a novel system is proposed to identify NEs in Kannada using a large corpus of 73,676 tokens. The Named Entity Recognition (NER) system consists of a robust POS tagger and a Noun Phrase (NP) chunker developed for generic data. Five gazetteer lists were created from many orthographic patterns of each word. Context information, namely the previous two words, the next two words, word morphology and the gazetteer lists, was added to the feature lists. A unigram-bigram template was designed and incorporated into Conditional Random Fields (CRFs) to generate conditional feature functions. The proposed system achieved F-measures of 86.85% on gold test data and 71.01% on newspaper data.
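
The per-token feature set described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gazetteer classes and entries, the POS and chunk tag names, the use of POS and chunk tags as features, the three-character suffix standing in for word morphology, and the reading of "unigram-bigram template" as single words in a window of two words on each side plus adjacent word pairs are all hypothetical. Each token becomes a flat dictionary of string features, the per-token input shape used by CRF wrappers such as sklearn-crfsuite; training the CRF itself is not shown.

```python
from typing import Dict, List, Set

# Hypothetical gazetteers: the paper builds five lists, but their classes and
# entries are not given in the abstract, so these are transliterated placeholders.
GAZETTEERS: Dict[str, Set[str]] = {
    "PERSON": {"ramu", "sita"},
    "LOCATION": {"bengaluru", "mysuru"},
    "ORGANIZATION": {"isro"},
}

PAD = "<PAD>"  # value used for context positions outside the sentence


def word_at(tokens: List[str], j: int) -> str:
    """Return tokens[j], or PAD when j falls outside the sentence."""
    return tokens[j] if 0 <= j < len(tokens) else PAD


def token_features(
    tokens: List[str], pos_tags: List[str], chunks: List[str], i: int
) -> Dict[str, str]:
    """Build the observation features for token i as a flat string dictionary."""
    feats = {
        "pos[0]": pos_tags[i],      # tag from the POS tagger (assumed feature)
        "chunk[0]": chunks[i],      # tag from the NP chunker (assumed feature)
        "suffix3": tokens[i][-3:],  # last three code points: crude stand-in for morphology
    }
    # Unigram features: the current word, the previous two and the next two words.
    for off in (-2, -1, 0, 1, 2):
        feats[f"w[{off}]"] = word_at(tokens, i + off)
    # Bigram features: the word pairs to the left and right of the current word.
    feats["w[-1]|w[0]"] = word_at(tokens, i - 1) + "|" + tokens[i]
    feats["w[0]|w[1]"] = tokens[i] + "|" + word_at(tokens, i + 1)
    # One membership flag per gazetteer list (lookup on the lower-cased token).
    for name, entries in GAZETTEERS.items():
        feats[f"gaz_{name}"] = "1" if tokens[i].lower() in entries else "0"
    return feats


def sentence_features(
    tokens: List[str], pos_tags: List[str], chunks: List[str]
) -> List[Dict[str, str]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, pos_tags, chunks, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Toy transliterated tokens with hypothetical POS and chunk tags.
    tokens = ["Ramu", "Bengaluru", "hodanu"]
    pos_tags = ["NNP", "NNP", "VM"]
    chunks = ["B-NP", "B-NP", "B-VGF"]
    for feats in sentence_features(tokens, pos_tags, chunks):
        print(feats["w[0]"], feats["gaz_PERSON"], feats["gaz_LOCATION"], feats["w[-1]|w[0]"])
```

In CRF++-style toolkits the same choice is written as a template file, where `U` lines define observation features and a `B` line adds label-bigram (transition) features; the abstract does not say which of these its "bigram" template refers to.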

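The reported scores are F-measures, the harmonic mean of precision P and recall R: F = 2PR / (P + R). A minimal sketch, assuming the counts are true positives, false positives and false negatives over predicted entities (the abstract does not say whether scoring is per token or per entity); the counts in the usage line are made up for illustration and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (F1) computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no correct predictions: define F as 0 rather than divide by zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 6 correct entities, 2 spurious, 6 missed.
    print(f_measure(tp=6, fp=2, fn=6))  # P = 0.75, R = 0.5
```

Because it is a harmonic mean, F stays closer to the weaker of P and R than an arithmetic mean would.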

Bibliographic details

Source
Journal of computer sciences | 2018, No. 5 | pp. 645-653 | 9 pages
Authors
K.P. Pallavi; L. Sobha; M.M. Ramya
Affiliations

Department of Computing Sciences, Hindustan Institute of Technology and Science, Chennai, India;

AUKBC, MIT campus, Chennai, India;

Center of Automation and Robotics, Hindustan Institute of Technology and Science, Chennai, India;

Record information
Original format: PDF
Text language: English
Keywords
Named Entities; Natural Language Processing; Noun Phrase Chunker; Conditional Random Fields;


Similar literature

1. Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields [J]. Pallavi K. P., Sobha L., Ramya M. M. Journal of computer sciences. 2018, No. 5
2. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition [J]. Wangjin Lee, Jinwook Choi. BMC Medical Informatics and Decision Making. 2019, No. 1
3. Cybersecurity named entity recognition using bidirectional long short-term memory with conditional random fields [J]. Pingchuan Ma, Bo Jiang, Zhigang Lu. Tsinghua Science and Technology. 2021, No. 3
4. Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers [C]. Wu Xixin, Wu Zhiyong, Jia Jia. 2012 8th International Symposium on Chinese Spoken Language Processing. 2012
5. A study on the use of conditional random fields for automatic speech recognition [D]. Morris, Jeremy J. 2010
6. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition [O]. Wangjin Lee, Jinwook Choi. 2019
7. Named Entity Recognition for Kannada using Gazetteers list with Conditional Random Fields [O]. K. P. Pallavi, L. Sobha, M. M. Ramya. 2018
