MACHINE LEARNING PROGRAM, MACHINE LEARNING METHOD AND NAMED ENTITY RECOGNITION DEVICE

Abstract

The present invention addresses the problem of improving the accuracy of named entity recognition for unknown words not listed in a dictionary. A character string included in text data is divided into a plurality of tokens. A matching process is executed between a token array, which indicates a predetermined number of successive tokens among the plurality of tokens, and dictionary information including a plurality of named entities, in order to search the plurality of named entities for a similar named entity whose similarity to the token array is greater than or equal to a threshold. Matching information indicating the result of the matching process between the token array and the similar named entity is converted into first vector data. Input data is generated using the first vector data and a plurality of pieces of vector data converted from the plurality of tokens, and a named entity recognition model for detecting named entities is generated through machine learning using the input data.
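
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the tokenizer, the similarity measure, the encoding of the matching information, or the model, so whitespace tokenization, `difflib.SequenceMatcher`, a two-value matching vector, and a toy token embedding are all assumptions made here, and every function name is hypothetical. Training of the recognition model itself is not shown.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: plain whitespace tokenization.
    return text.split()


def similarity(a, b):
    # Assumption: character-level ratio in [0, 1] as the similarity measure.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_token_arrays(tokens, dictionary, max_len=3, threshold=0.9):
    """Find token arrays (runs of 1..max_len successive tokens) whose
    similarity to some dictionary entity is >= threshold."""
    matches = []
    for n in range(1, max_len + 1):
        for start in range(len(tokens) - n + 1):
            span = " ".join(tokens[start:start + n])
            for entity in dictionary:
                score = similarity(span, entity)
                if score >= threshold:
                    matches.append((start, start + n, entity, score))
    return matches


def matching_vectors(tokens, matches):
    """Encode matching information per token as
    [inside a matched token array, best similarity score]."""
    feats = [[0.0, 0.0] for _ in tokens]
    for start, end, _entity, score in matches:
        for i in range(start, end):
            feats[i][0] = 1.0
            feats[i][1] = max(feats[i][1], score)
    return feats


def token_vector(token, dim=4):
    # Toy deterministic stand-in for a learned token embedding.
    total = sum(ord(c) for c in token)
    return [((total * (k + 1)) % 97) / 97 for k in range(dim)]


def build_input(tokens, dictionary, max_len=3, threshold=0.9, dim=4):
    """Concatenate each token vector with its matching vector."""
    matches = match_token_arrays(tokens, dictionary, max_len, threshold)
    feats = matching_vectors(tokens, matches)
    return [token_vector(t, dim) + f for t, f in zip(tokens, feats)]


# Usage: a misspelled drug name that is absent from the dictionary.
dictionary = ["acetylsalicylic acid", "ibuprofen"]
tokens = tokenize("Patients received acetylsalicylc acid daily")
for token, row in zip(tokens, build_input(tokens, dictionary)):
    print(token, row)
```

In this sketch the misspelled span "acetylsalicylc acid" is not an exact dictionary entry, but its similarity to "acetylsalicylic acid" exceeds the threshold, so its tokens receive a non-zero matching vector that a downstream sequence-labeling model could exploit.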

Bibliographic data

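  • Publication number: WO2021214941A1

    Patent type

  • Publication date: 2021-10-28

    Original format: PDF

  • Applicant/Patentee: FUJITSU LIMITED

    Application number: WO2020JP17488

  • Inventors: NGUYEN LE AN; MORITA HAJIME

    Filing date: 2020-04-23

  • Classification: G06F40/216; G06F40/295

  • Country: JP

  • Database entry time: 2022-08-24 21:59:49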
