IEEE Transactions on Knowledge and Data Engineering

AML: Efficient Approximate Membership Localization within a Web-Based Join Framework

Abstract

In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document. However, its many redundant matches make the AME process inefficient and degrade the performance of real-world applications that use the extracted substrings. The AML problem aims to locate non-overlapping substrings that better approximate the true matched substrings without generating overlapping redundancies. To perform AML efficiently, we propose an optimized algorithm, P-Prune, that prunes a large fraction of the overlapping redundant matched substrings before generating them. Our study on several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also apply AML to a proposed web-based join framework. This framework is a search-based approach that joins two tables using dictionary-based entity recognition over web documents. The results not only show the advantage of AML over AME, but also demonstrate the effectiveness of our search-based approach.
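The following minimal Python sketch makes the AME/AML distinction concrete. It rests on several assumptions that are not taken from the paper: token-level Jaccard similarity, a similarity threshold `tau`, a maximum window length `max_len`, and a greedy selection that takes the highest-similarity match first. All function names (`jaccard`, `ame`, `aml_greedy`) are hypothetical. This is a brute-force "extract everything, then filter" baseline. It is not P-Prune, which prunes redundant matches before generating them.

```python
from typing import List, Tuple

Match = Tuple[int, int, float, int]  # (start, end_exclusive, similarity, entity_index)


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def ame(doc: List[str], dictionary: List[List[str]], tau: float, max_len: int) -> List[Match]:
    """Naive AME: report every token window (up to max_len tokens) whose best
    dictionary similarity is >= tau. Overlapping windows are all kept."""
    matches: List[Match] = []
    for i in range(len(doc)):
        for j in range(i + 1, min(len(doc), i + max_len) + 1):
            window = doc[i:j]
            best_k, best_s = -1, 0.0
            for k, entity in enumerate(dictionary):
                s = jaccard(window, entity)
                if s > best_s:
                    best_k, best_s = k, s
            if best_s >= tau:
                matches.append((i, j, best_s, best_k))
    return matches


def aml_greedy(matches: List[Match]) -> List[Match]:
    """Hypothetical AML step: keep a non-overlapping subset of matches,
    preferring higher similarity, then longer spans, then earlier starts."""
    chosen: List[Match] = []
    occupied = set()
    for i, j, s, k in sorted(matches, key=lambda m: (-m[2], -(m[1] - m[0]), m[0])):
        if all(p not in occupied for p in range(i, j)):
            chosen.append((i, j, s, k))
            occupied.update(range(i, j))
    return sorted(chosen)


if __name__ == "__main__":
    doc = "the new york times reported".split()
    dictionary = [["new", "york", "times"]]
    found = ame(doc, dictionary, tau=0.6, max_len=4)
    print(len(found))          # → 5 overlapping AME matches
    print(aml_greedy(found))   # → [(1, 4, 1.0, 0)]
```

In this toy run, AME reports five overlapping windows around "new york times". The non-overlapping selection keeps only the exact span. Running AME in full and then filtering is exactly the redundant work the abstract says P-Prune is designed to avoid.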