METHODS AND SYSTEMS FOR DETECTING DUPLICATE DOCUMENT USING DOCUMENT SIMILARITY MEASURING MODEL BASED ON DEEP LEARNING

Abstract

Disclosed are a method and a system for detecting duplicate documents. The method includes: extracting a similar document pair set and a dissimilar document pair set from a document database, where the similar set contains document pairs that share a common attribute and the dissimilar set contains randomly extracted document pairs; calculating, with a mathematical measure, a mathematical similarity for each similar pair (the first mathematical similarities) and for each dissimilar pair (the second mathematical similarities); calculating a semantic similarity for each similar pair (the first semantic similarities) and for each dissimilar pair (the second semantic similarities), where the first semantic similarities are higher than the first mathematical similarities and the second semantic similarities are lower than the second mathematical similarities; training a similarity model on the similar and dissimilar document pairs together with the first and second semantic similarities to obtain a trained similarity model; and detecting a duplicate document using the trained similarity model.
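The target-construction steps in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than something the abstract states: the Jaccard index stands in for the unspecified "mathematical measure", a fixed `margin` shift stands in for however the semantic similarities are actually derived, the `author` field is a hypothetical common attribute, and random dissimilar pairs are drawn only from pairs that do not share that attribute. Model training and duplicate detection are left out.

```python
import random
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Assumed mathematical measure: Jaccard index over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def semantic_target(math_sim: float, is_similar: bool, margin: float = 0.2) -> float:
    """Shift the mathematical score up for similar pairs and down for
    dissimilar pairs, then clamp to [0, 1].

    The fixed margin is an illustrative assumption. Because of the clamp,
    a score already at 1.0 (or 0.0) cannot move any higher (or lower).
    """
    shifted = math_sim + margin if is_similar else math_sim - margin
    return min(1.0, max(0.0, shifted))


def build_training_pairs(docs, attr="author", n_dissimilar=2, seed=0):
    """Return (text_a, text_b, target) triples for training a similarity model.

    docs: list of dicts with a "text" key and a hypothetical attribute key.
    """
    rng = random.Random(seed)
    examples = []

    # Similar pairs: documents that share the common attribute.
    for d1, d2 in combinations(docs, 2):
        if d1[attr] == d2[attr]:
            m = jaccard(d1["text"], d2["text"])
            examples.append((d1["text"], d2["text"], semantic_target(m, True)))

    # Dissimilar pairs: a random sample of pairs with different attributes.
    candidates = [
        (d1, d2) for d1, d2 in combinations(docs, 2) if d1[attr] != d2[attr]
    ]
    for d1, d2 in rng.sample(candidates, min(n_dissimilar, len(candidates))):
        m = jaccard(d1["text"], d2["text"])
        examples.append((d1["text"], d2["text"], semantic_target(m, False)))

    return examples
```

The resulting triples would then serve as regression targets for whatever similarity model is trained; the abstract does not specify the model architecture, so none is shown here.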

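Bibliographic details

  • Publication number: US2021182551A1

  • Publication date: 2021-06-17

  • Original format: PDF

  • Applicant/Assignee: NAVER CORPORATION

  • Application number: US202017119028

  • Inventors: SUNG MIN KIM; BYEONGHOON HAN

  • Filing date: 2020-12-11

  • Classification: G06K9; G06K9/62; G06F16/93

  • Country: US

  • Database entry time: 2022-08-24 19:23:37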
