
On the Midpoint of a Set of XML Documents


Abstract

The WWW contains a huge number of documents. Some of them share a subject but are generated by different people or even different organizations. To guarantee the interchange of such documents, we can use XML, which allows sharing documents that do not have the same structure. However, this makes it difficult to understand the core of such heterogeneous documents (in general, a schema is not available). In this paper, we offer a characterization and an algorithm to obtain the midpoint (in terms of a resemblance function) of a set of semi-structured, heterogeneous documents without optional elements. The trivial case of a midpoint would be the elements common to all documents. Nevertheless, in cases with several heterogeneous documents this may result in an empty set. Thus, we consider that those elements present in a given number of documents belong to the midpoint. An exact schema could always be found by generating optional elements. However, the exact schema of the whole set may result in overspecialization (lots of optional elements), which would make it useless.
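The thresholding idea in the abstract (keep the elements that appear in a given number of documents) can be illustrated with a minimal sketch. This is not the paper's algorithm and it does not implement its resemblance function; it only counts root-to-element tag paths and keeps the frequent ones. The names `element_paths`, `frequent_paths`, and `min_count`, as well as the sample documents, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of root-to-element tag paths found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(documents, min_count):
    """Keep the paths that occur in at least `min_count` documents.

    Hypothetical stand-in for the paper's midpoint criterion: each
    document contributes a path at most once, however often it repeats.
    """
    counts = Counter()
    for doc in documents:
        counts.update(element_paths(doc))
    return {path for path, seen in counts.items() if seen >= min_count}


# Illustrative documents (not taken from the paper).
docs = [
    "<book><title/><author/><isbn/></book>",
    "<book><title/><author/></book>",
    "<book><title/><year/></book>",
]

common_to_all = frequent_paths(docs, min_count=len(docs))
in_most = frequent_paths(docs, min_count=2)
```

With `min_count` equal to the number of documents, the result is the plain intersection, which is the trivial case the abstract mentions. Lowering `min_count` admits elements such as `book/author` that are missing from some documents.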
