
Strategy and methodology for enterprise data warehouse development. Integrating data mining and social networking techniques for identifying different communities within the data warehouse.


Abstract

Data warehouse technology has been successfully integrated into the information infrastructure of major organizations as a potential solution for eliminating redundancy and providing for comprehensive data integration. Realizing the importance of a data warehouse as the main data repository within an organization, this dissertation addresses different aspects related to the data warehouse architecture and performance issues.

Many data warehouse architectures have been presented by industry analysts and research organizations. These architectures vary from the independent and physical business-unit-centric data marts to the centralised two-tier hub-and-spoke data warehouse. The operational data store is a third tier which was offered later to address the business requirements for inter-day data loading. While the industry-available architectures are all valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity).

In this dissertation, I am advocating a new architecture (The Hybrid Architecture) which encompasses the industry-advocated architecture. The hybrid architecture demands the acquisition, loading and consolidation of enterprise atomic and detailed data into a single integrated enterprise data store (The Enterprise Data Warehouse) where business-unit-centric Data Marts and Operational Data Stores (ODS) are built in the same instance of the Enterprise Data Warehouse.

For the purpose of highlighting the role of data warehouses for different applications, we describe an effort to develop a data warehouse for a geographical information system (GIS). We further study the importance of data practices, quality and governance for financial institutions by commenting on the RBC Financial Group case.

The development and deployment of the Enterprise Data Warehouse based on the Hybrid Architecture spawned its own issues and challenges. Organic data growth and business requirements to load additional new data will significantly increase the amount of stored data. Consequently, the number of users will increase significantly. Enterprise data warehouse obesity, performance degradation and navigation difficulties are chief amongst the issues and challenges.

Association rules mining and social networks have been adopted in this thesis to address the above-mentioned issues and challenges. We describe an approach that uses frequent pattern mining and social network techniques to discover different communities within the data warehouse. These communities include sets of tables frequently accessed together, sets of tables retrieved together most of the time and sets of attributes that mostly appear together in the queries. We concentrate on tables in the discussion; however, the model is general enough to discover other communities. We first build a frequent pattern mining model by considering each query as a transaction and the tables as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables that are mostly accessed together and hence should be treated as one unit in storage and retrieval for better overall performance. We utilize social network construction and analysis to find maximum-sized sets of related tables; this is a more robust approach as opposed to a union of overlapping itemsets. We derive the Jaccard distance between the closed itemsets and construct the social network of tables by adding links that represent distance above a given threshold. The constructed network is analyzed to discover communities of tables that are mostly accessed together. The reported test results are promising and demonstrate the applicability and effectiveness of the developed approach.
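
The pipeline outlined in the abstract (queries as transactions, closed frequent itemsets of tables, Jaccard distance, network, communities) can be illustrated with a small Python sketch. This is not the thesis implementation. The table names, the query log, the function names, `min_support` and `max_distance` are all hypothetical. Two simplifying assumptions are made: the network nodes here are closed itemsets linked when their Jaccard distance is at most `max_distance` (one possible reading of the thresholding step), and communities are taken to be plain connected components rather than whatever community analysis the thesis actually applies.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemsets: frozenset of tables -> support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                result[candidate] = count
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return result


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in freq.items()
            if not any(s < t and c == freq[t] for t in freq)}


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|."""
    return 1.0 - len(a & b) / len(a | b)


def table_communities(closed, max_distance):
    """Link closed itemsets whose Jaccard distance is <= max_distance
    (assumption), then merge each connected component into one table set."""
    nodes = list(closed)
    adjacency = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if jaccard_distance(a, b) <= max_distance:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, communities = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, tables = [start], set()
        while stack:  # depth-first walk over one component
            current = stack.pop()
            tables |= current
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(frozenset(tables))
    return communities


# Hypothetical query log: each query is the set of tables it touches.
queries = [
    {"orders", "customers"},
    {"orders", "customers", "products"},
    {"orders", "customers"},
    {"employees", "departments"},
    {"employees", "departments", "salaries"},
    {"employees", "departments"},
    {"orders", "customers", "products"},
]

closed = closed_itemsets(frequent_itemsets(queries, min_support=2))
for community in table_communities(closed, max_distance=0.5):
    print(sorted(community))
```

The brute-force enumeration is only suitable for a toy log; on a real query log a dedicated frequent pattern miner would be needed, and the choice of threshold and community analysis would have to follow the thesis itself.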

Record details

  • Author

    Rifaie Mohammad;

  • Author affiliation
  • Year 2010
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
