首页> 美国政府科技报告 >Transductive Pattern Learning for Information Extraction
【24h】

Transductive Pattern Learning for Information Extraction

机译:信息抽取的转导模式学习

获取原文

摘要

The requirement for large labelled training corpora is widely recognized as a key bottleneck in the use of learning algorithms for information extraction. We present TPLEX, a semi-supervised learning algorithm for information extraction that can acquire extraction patterns from a small amount of labelled text in conjunction with a large amount of unlabelled text. Compared to previous work, TPLEX has two novel features. First, the algorithm does not require redundancy in the fragments to be extracted, but only redundancy of the extraction patterns themselves. Second, most bootstrapping methods identify the highest quality fragments in the unlabelled data and then assume that they are as reliable as manually labelled data in subsequent iterations. In contrast, TPLEX's scoring mechanism prevents errors from snowballing by recording the reliability of fragments extracted from unlabelled data. Our experiments with several benchmarks demonstrate that TPLEX is usually competitive with various fully-supervised algorithms when very little labelled training data is available.

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号