
Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory


Abstract

Information Extraction is widely used to extract information from text. Named Entity Recognition (NER) is one of the primary tasks of Information Extraction and identifies entities such as persons, locations, and organizations. Extraction from text collections is essential for obtaining information from unstructured text. Moreover, Named Entity Recognition is part of ontology building, which is the main objective of this research. An ontology can be built from a collection of concepts and the relations between those concepts. Concepts in an ontology usually consist of a group of entities and are obtained using Noun Phrase Extraction or Named Entity Recognition. Our main focus in this research is to extract concepts for ontology building automatically using Named Entity Recognition. We chose Named Entity Recognition because previous Noun Phrase Extraction work gave unsatisfactory results: not all of the extracted nouns are entities. Our proposed methodology for Named Entity Recognition applies an end-to-end model based on Bidirectional Long Short Term Memory (BiLSTM). A BiLSTM can perform a sequence classification task by taking the context of the input into account. Named Entity Recognition approaches in previous studies use Part-of-Speech (POS) Tagging in the preprocessing phase, relying on other tools or models. The Part-of-Speech tag is also used as a feature to improve the performance of Named Entity Recognition. Our proposed methodology provides an end-to-end system that can be used for both POS Tagging and Named Entity Recognition. With our proposed end-to-end model, no additional tool is needed for Part-of-Speech Tagging, which is the advantage of our model over other models. Experiments were conducted on news documents labeled with four entity classes and 35 part-of-speech types. The target entities extracted in this study are person, location, organization, and miscellaneous. We evaluated the performance of our model using the F1-Score. The best F1-Scores achieved were 91.79% for Part-of-Speech Tagging and 83.18% for Named Entity Recognition.
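
The F1-Score used for evaluation is the harmonic mean of precision and recall. The abstract does not state whether scores are computed per token or per entity span, nor how the classes are averaged, so the following is only a minimal sketch that assumes token-level scoring with macro averaging over the entity classes. The function name `f1_per_label`, the tag names, and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_per_label(gold, pred, ignore=("O",)):
    """Token-level (precision, recall, F1) for each label.

    Assumption: one label per token, with "O" marking non-entity tokens.
    The paper's actual tag scheme and scoring unit are not given in the abstract.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        if label in ignore:
            continue
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy example (made-up labels, not data from the paper).
gold = ["PER", "PER", "O", "LOC", "ORG", "O"]
pred = ["PER", "O", "O", "LOC", "LOC", "O"]

scores = f1_per_label(gold, pred)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1:.4f}")
```

In this toy case PER and LOC each reach an F1 of about 0.67 and ORG scores 0, so the macro average is about 0.44. Entity-level scoring, where a multi-token entity counts only if its full span and type match, would generally give different numbers.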

Bibliographic details

  • Source
    Expert systems with applications | 2021, Issue 8 | 114856.1-114856.11 | 11 pages
  • Author affiliations

    Inst Sains & Teknol Terpadu Surabaya Dept Informat Technol Surabaya East Java Indonesia|Inst Sains & Teknol Terpadu Surabaya Dept Informat Surabaya East Java Indonesia;

    Inst Sains & Teknol Terpadu Surabaya Dept Informat Technol Surabaya East Java Indonesia|Inst Sains & Teknol Terpadu Surabaya Dept Informat Surabaya East Java Indonesia;

    Inst Sains & Teknol Terpadu Surabaya Dept Informat Surabaya East Java Indonesia;

    Inst Teknol Sepuluh Nopember Dept Elect Engn Surabaya East Java Indonesia|Inst Teknol Sepuluh Nopember Dept Comp Engn Surabaya East Java Indonesia;

    Inst Teknol Sepuluh Nopember Dept Elect Engn Surabaya East Java Indonesia|Inst Teknol Sepuluh Nopember Dept Comp Engn Surabaya East Java Indonesia;

    Inst Teknol Sepuluh Nopember Dept Elect Engn Surabaya East Java Indonesia|Inst Teknol Sepuluh Nopember Dept Comp Engn Surabaya East Java Indonesia|Sci & Technol Ctr Artificial Intelligence Healthc Surabaya Indonesia;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Named entity recognition; Ontology building; Concept extraction; Indonesian language; Information extraction; End-to-end model;

