International Journal of Web Services Research

XWRAPComposer: A Multi-Page Data Extraction Service

Abstract

We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including a methodology for designing an effective wrapper program construction facility and a concrete implementation of it, called XWRAPComposer. Our wrapper generation framework has two distinctive design goals. First, we explicitly separate the wrapper-building tasks that are specific to a given Web service from the tasks that are repeated for every service, so that the code for the repetitive tasks can be generated as a wrapper library component and reused automatically by the wrapper generator system. Second, we use inductive learning algorithms that derive information flow and data extraction patterns by reasoning about sample pages or sample specifications. More importantly, we design a declarative, rule-based script language for multi-page information extraction, which encourages a clean separation of the information extraction semantics from the information flow control and execution logic of wrapper programs. We implement these design principles in the XWRAPComposer toolkit, which can semi-automatically generate WSDL-enabled wrapper programs. We illustrate the problems and challenges of multi-page data extraction in the context of bioinformatics applications, and we evaluate the design and development of XWRAPComposer through our experience integrating various BLAST services.
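The abstract does not show XWRAPComposer's script syntax, so what follows is only a minimal Python sketch, under our own assumptions, of the separation it describes: extraction rules are plain data (which page type they apply to, which fields to pull out, which links to follow), while a generic, service-independent engine owns the multi-page control flow. `Rule`, `run_wrapper`, the regex-based rule format, and the in-memory sample pages are all hypothetical and are not the toolkit's API; real wrappers would use a proper HTML parser rather than regular expressions.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    """A hypothetical declarative rule: pure data, no control flow."""
    page_type: str                   # kind of page this rule applies to
    fields: Dict[str, str]           # field name -> regex with one capture group
    follow: Optional[str] = None     # regex capturing links to further pages
    next_type: Optional[str] = None  # page type assigned to followed links


def run_wrapper(
    rules: List[Rule],
    fetch: Callable[[str], str],
    start_url: str,
    start_type: str,
) -> List[Tuple[str, Dict[str, str]]]:
    """Generic, service-independent engine that owns the multi-page flow."""
    by_type = {rule.page_type: rule for rule in rules}
    queue = deque([(start_url, start_type)])
    seen = set()
    records = []
    while queue:
        url, page_type = queue.popleft()
        if url in seen:  # skip pages already visited via another link
            continue
        seen.add(url)
        page = fetch(url)
        rule = by_type[page_type]
        # Extraction semantics: apply each field pattern to the current page.
        record = {}
        for name, pattern in rule.fields.items():
            match = re.search(pattern, page)
            if match:
                record[name] = match.group(1)
        if record:
            records.append((page_type, record))
        # Information flow: enqueue linked pages with the page type the rule names.
        if rule.follow and rule.next_type:
            for link in re.findall(rule.follow, page):
                queue.append((link, rule.next_type))
    return records


# Hypothetical in-memory "service": a summary page linking to two detail pages.
PAGES = {
    "/results": 'Query: seqA <a href="/hit/1">1</a> <a href="/hit/2">2</a>',
    "/hit/1": "<h1>Hit 1</h1> score=42",
    "/hit/2": "<h1>Hit 2</h1> score=17",
}

RULES = [
    Rule("summary", {"query": r"Query: (\w+)"},
         follow=r'href="([^"]+)"', next_type="detail"),
    Rule("detail", {"title": r"<h1>([^<]+)</h1>", "score": r"score=(\d+)"}),
]

if __name__ == "__main__":
    for page_type, record in run_wrapper(RULES, PAGES.__getitem__, "/results", "summary"):
        print(page_type, record)
```

In this sketch, wrapping a different service means writing new `Rule` entries and a new `fetch` function while reusing `run_wrapper` unchanged, which loosely mirrors the split between service-specific and repetitive wrapper tasks described in the abstract.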