
System and method for rapid development of natural language understanding using active learning


Abstract

Disclosed are a method, computer program product, and data processing system for training a statistical parser that use active learning techniques to reduce the size of the corpus of human-annotated training samples (e.g., sentences) needed. According to a preferred embodiment of the present invention, the statistical parser under training is used to compare the grammatical structure of the samples according to the parser's current level of training. The samples are then divided into clusters, each cluster containing samples that the statistical parser judges to have a similar structure. Uncertainty metrics are applied to the clustered samples to select, from each cluster, samples that reflect uncertainty in the statistical parser's grammatical model. These selected samples may then be annotated by a human trainer and used to train the statistical parser.
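The cluster-then-select loop described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The `Sample` fields, the Jaccard similarity over parse labels, the greedy clustering, and the entropy-based uncertainty metric are all assumptions chosen for illustration, since the abstract does not specify which structure comparison or uncertainty metric is used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    structure: frozenset  # hypothetical: labels taken from the parser's best parse
    parse_probs: tuple    # hypothetical: normalized n-best parse probabilities


def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed stand-in for the parser-based comparison."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed sample is similar enough."""
    clusters = []
    for s in samples:
        for c in clusters:
            if jaccard(s.structure, c[0].structure) >= threshold:
                c.append(s)
                break
        else:
            # No existing cluster matched, so this sample seeds a new one.
            clusters.append([s])
    return clusters


def entropy(probs):
    """Shannon entropy of the parse distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(samples, per_cluster=1, threshold=0.5):
    """Pick the most uncertain samples from each structural cluster."""
    chosen = []
    for c in cluster(samples, threshold):
        ranked = sorted(c, key=lambda s: entropy(s.parse_probs), reverse=True)
        chosen.extend(ranked[:per_cluster])
    return chosen


# Toy pool of unannotated sentences with made-up parser outputs.
pool = [
    Sample("book a flight to Boston", frozenset({"VP", "NP", "PP"}), (0.9, 0.1)),
    Sample("reserve a seat to Denver", frozenset({"VP", "NP", "PP"}), (0.5, 0.5)),
    Sample("what time is it", frozenset({"WHNP", "SQ"}), (0.6, 0.4)),
]
picked = [s.text for s in select_for_annotation(pool)]
```

In this toy pool the first two sentences share a structure and fall into one cluster, so only the more uncertain of the two is sent to the human annotator, along with the single sentence from the second cluster. Selecting per cluster rather than globally keeps the annotated set structurally diverse.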
