GENERATING A CONSISTENTLY LABELED TRAINING DATASET BY AUTOMATICALLY GENERATING AND DISPLAYING A SET OF MOST SIMILAR PREVIOUSLY-LABELED TEXTS AND THEIR PREVIOUSLY ASSIGNED LABELS FOR EACH TEXT THAT IS BEING LABELED FOR THE TRAINING DATASET

Abstract

Technology for generating a consistently labeled training dataset. For each of multiple previously labeled texts, a distance between that previously labeled text and a current text to be labeled is generated in two steps. First, the list of tokens for the previously labeled text is compared with the list of tokens for the current text to determine an overlap value equal to the number of tokens that match between the two lists. Second, the overlap value is used to calculate a distance between the previously labeled text and the current text that is inversely correlated with the overlap value. The previously labeled texts most similar to the current text are identified as those with the shortest distances to the current text, and they are displayed together with their previously assigned labels in a label selection user interface.
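The similarity step described in the abstract can be sketched in Python under stated assumptions. The abstract does not specify the tokenizer, whether repeated tokens count more than once, or the exact distance formula, so the following are hypothetical choices for illustration only: a lowercase whitespace tokenizer, an overlap value counted over distinct shared tokens, and a distance of 1 / (1 + overlap), which is one strictly decreasing function that satisfies the "inversely correlated" requirement. All names (`LabeledText`, `tokenize`, `overlap_value`, `distance`, `most_similar`) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LabeledText:
    # Hypothetical record for a previously labeled text and its assigned label.
    text: str
    label: str


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase the text and split on whitespace.
    return text.lower().split()


def overlap_value(tokens_a: List[str], tokens_b: List[str]) -> int:
    # Assumption: count distinct tokens that appear in both token lists.
    return len(set(tokens_a) & set(tokens_b))


def distance(overlap: int) -> float:
    # Assumed formula: strictly decreasing in overlap (inverse correlation),
    # and finite when there is no overlap at all.
    return 1.0 / (1.0 + overlap)


def most_similar(
    current: str, labeled: List[LabeledText], k: int = 3
) -> List[Tuple[LabeledText, float]]:
    # Score every previously labeled text against the current text.
    current_tokens = tokenize(current)
    scored = [
        (item, distance(overlap_value(tokenize(item.text), current_tokens)))
        for item in labeled
    ]
    # Shortest distance = most similar; Python's sort is stable, so ties
    # keep their original order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    history = [
        LabeledText("reset my account password", "account_access"),
        LabeledText("cancel my subscription today", "billing"),
        LabeledText("I forgot my password", "account_access"),
    ]
    # Print the closest texts with their previous labels, as a labeling UI might.
    for item, d in most_similar("how do I reset my password", history, k=2):
        print(f"{d:.3f}  [{item.label}]  {item.text}")
```

In this example both password-related texts share three distinct tokens with the query and tie at distance 0.25, so an annotator would see two earlier `account_access` labels before choosing one for the new text. Counting repeated tokens (for example with `collections.Counter` intersection) or normalizing by text length are equally plausible readings of the abstract.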