DATA CLEANSING SYSTEM, DATA CLEANSING METHOD, AND DATA CLEANSING PROGRAM

Abstract

PROBLEM TO BE SOLVED: To enable accurate data cleansing even when place names with identical notation, abbreviated notations, building names, personal names, and the like coexist.

SOLUTION: A data cleansing system comprises:
  • a candidate adding unit 102a that adds basic tokens, which are candidates for the elements constituting an address, with reference to dictionary data 101b defining addresses; if no corresponding word for a divided character string is detected in the dictionary data 101b, the unit analyzes the character type of that string and adds it as an analysis token corresponding to the analyzed character type;
  • a tree construction unit 102b that branches and connects the basic tokens and analysis tokens added by the candidate adding unit 102a to construct a tree structure;
  • a cost calculation unit 102c that calculates, for each branch pattern included in the tree structure, a cost to which a weight assigned according to the priority of each token is added; and
  • a candidate selection unit 102e that selects a predetermined branch pattern as the address candidate according to the calculated cost.

SELECTED DRAWING: Figure 2
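
The SOLUTION can be read as a lattice-style tokenization of an address string followed by a cost-based choice among the resulting paths. The Python sketch below illustrates that reading under explicit assumptions. The following are all hypothetical and are not taken from the patent:
  • the dictionary entries and the weights;
  • the character-type ranges;
  • the rule that a lower cost wins;
  • every function name (add_candidates, branch_patterns, select_address).

The abstract does not disclose how the string is divided or how the weights are set. Here, every dictionary word that starts at each position is treated as a candidate. Where no word starts, a run of same-type characters becomes an analysis token.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# HYPOTHETICAL stand-in for dictionary data 101b:
# surface string -> (address element type, priority weight; lower = preferred).
DICTIONARY: Dict[str, Tuple[str, float]] = {
    "東京都": ("prefecture", 1.0),
    "東京": ("place", 2.0),
    "都": ("suffix", 3.0),
    "港区": ("ward", 1.0),
    "港": ("place", 3.0),
    "海岸": ("district", 1.0),
}
# HYPOTHETICAL weights for analysis tokens, keyed by character type.
ANALYSIS_WEIGHTS: Dict[str, float] = {"digit": 2.0}
DEFAULT_ANALYSIS_WEIGHT = 5.0


@dataclass(frozen=True)
class Token:
    text: str
    kind: str      # address element type, or "analysis:<character type>"
    weight: float  # priority weight added to the branch-pattern cost


def char_type(ch: str) -> str:
    """Rough character-type classification (illustrative Unicode ranges)."""
    if ch.isdigit():
        return "digit"
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "symbol"


def dictionary_tokens(s: str, i: int) -> List[Token]:
    """Basic tokens: every dictionary word that starts at position i."""
    return [Token(s[i:j], *DICTIONARY[s[i:j]])
            for j in range(i + 1, len(s) + 1) if s[i:j] in DICTIONARY]


def add_candidates(s: str) -> Dict[int, List[Token]]:
    """Candidate adding: basic tokens, or one analysis token as a fallback."""
    candidates: Dict[int, List[Token]] = {}
    for i in range(len(s)):
        tokens = dictionary_tokens(s, i)
        if not tokens:
            # No dictionary word starts here: group the run of characters of
            # the same type, stopping where a dictionary word could start.
            ctype = char_type(s[i])
            j = i + 1
            while (j < len(s) and char_type(s[j]) == ctype
                   and not dictionary_tokens(s, j)):
                j += 1
            weight = ANALYSIS_WEIGHTS.get(ctype, DEFAULT_ANALYSIS_WEIGHT)
            tokens = [Token(s[i:j], "analysis:" + ctype, weight)]
        candidates[i] = tokens
    return candidates


def branch_patterns(s: str, candidates: Dict[int, List[Token]],
                    start: int = 0) -> List[List[Token]]:
    """Expand the token tree from `start`; each root-to-leaf path is a pattern."""
    if start == len(s):
        return [[]]
    patterns: List[List[Token]] = []
    for tok in candidates[start]:
        for rest in branch_patterns(s, candidates, start + len(tok.text)):
            patterns.append([tok] + rest)
    return patterns


def cost(pattern: List[Token]) -> float:
    """Cost of one branch pattern: the sum of its tokens' priority weights."""
    return sum(tok.weight for tok in pattern)


def select_address(s: str) -> List[Token]:
    """Pick the lowest-cost branch pattern as the address candidate."""
    return min(branch_patterns(s, add_candidates(s)), key=cost)


best = select_address("東京都港区海岸1丁目")
print([t.text for t in best], cost(best))
```

With these made-up weights, 東京都 (1.0) is preferred over the split 東京 + 都 (2.0 + 3.0). The unknown strings 1 and 丁目 fall back to analysis tokens. Enumerating every root-to-leaf path mirrors the abstract's wording ("each branch pattern"), but the number of paths grows exponentially with ambiguity. A practical system would more likely run a dynamic-programming (Viterbi-style) minimum-cost search over the same lattice; the abstract does not say which approach is used.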

Bibliographic data

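  • Publication number: JP2018101244A
  • Publication date: 2018-06-28
  • Original format: PDF
  • Applicant/Patentee: SOFTBANK CORP
  • Application number: JP20160246327
  • Inventor: NISHIDA MASAICHI
  • Filing date: 2016-12-20
  • Classification: G06F17/27; G06F17/30
  • Country: JP
  • Database entry time: 2022-08-21 13:12:11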
