METHODS AND SYSTEMS FOR DETECTING DUPLICATE DOCUMENT USING DOCUMENT SIMILARITY MEASURING MODEL BASED ON DEEP LEARNING
Abstract
Disclosed are a method and a system for detecting duplicate documents. The method includes extracting a similar document pair set and a dissimilar document pair set from a document database, where the similar document pair set consists of document pairs that share a common attribute and the dissimilar document pair set consists of document pairs extracted at random. A mathematical similarity is calculated for each pair using a mathematical measure, yielding first mathematical similarities for the similar pairs and second mathematical similarities for the dissimilar pairs. A semantic similarity is likewise calculated for each pair, yielding first semantic similarities for the similar pairs and second semantic similarities for the dissimilar pairs; the first semantic similarities are higher than the first mathematical similarities, and the second semantic similarities are lower than the second mathematical similarities. A similarity model is then trained on the similar and dissimilar document pairs together with the first and second semantic similarities, and the trained similarity model is used to detect a duplicate document.
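The data-preparation steps in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `group` field stands in for the "common attribute", Jaccard token overlap stands in for the unspecified "mathematical measure", and the `raise_score` / `lower_score` rules with a `margin` parameter are one hypothetical way to obtain semantic targets above or below the mathematical score. The deep learning model itself is not shown; the sketch only builds the (pair, target) rows such a model might be trained on.

```python
import random
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Assumed mathematical measure: Jaccard overlap of lowercase token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def similar_pairs(docs):
    """All pairs of documents sharing the hypothetical 'group' attribute."""
    return [(a, b) for a, b in combinations(docs, 2) if a["group"] == b["group"]]


def dissimilar_pairs(docs, n, rng):
    """n pairs of distinct documents drawn at random (treated as dissimilar)."""
    return [tuple(rng.sample(docs, 2)) for _ in range(n)]


def raise_score(s: float, margin: float = 0.3) -> float:
    """Hypothetical rule: move s toward 1. Strictly higher whenever s < 1."""
    return s + margin * (1.0 - s)


def lower_score(s: float, margin: float = 0.3) -> float:
    """Hypothetical rule: move s toward 0. Strictly lower whenever s > 0."""
    return s * (1.0 - margin)


def build_training_targets(docs, n_dissimilar, margin=0.3, seed=0):
    """Return rows of (text_a, text_b, mathematical_sim, semantic_target)."""
    rng = random.Random(seed)
    rows = []
    # Similar pairs: semantic target is pushed above the mathematical score.
    for a, b in similar_pairs(docs):
        m = jaccard(a["text"], b["text"])
        rows.append((a["text"], b["text"], m, raise_score(m, margin)))
    # Dissimilar pairs: semantic target is pushed below the mathematical score.
    for a, b in dissimilar_pairs(docs, n_dissimilar, rng):
        m = jaccard(a["text"], b["text"])
        rows.append((a["text"], b["text"], m, lower_score(m, margin)))
    return rows
```

Under these assumptions, a regression model trained on the fourth column would learn to score same-attribute pairs higher, and random pairs lower, than plain token overlap does; a duplicate could then be flagged when the model's predicted similarity exceeds a chosen threshold.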