International Conference on Enterprise Information Systems
MINING THE RELATIONSHIPS IN THE FORM OF THE PREDISPOSING FACTORS AND CO-INCIDENT FACTORS AMONG NUMERICAL DYNAMIC ATTRIBUTES IN TIME SERIES DATA SET BY USING THE COMBINATION OF SOME EXISTING TECHNIQUES

Abstract

Temporal mining is a natural extension of data mining, with the added capabilities of discovering interesting patterns and inferring relationships of contextual and temporal proximity, which may in turn lead to possible cause-effect associations. Temporal mining covers a wide range of paradigms for knowledge modeling and discovery. A common practice is to discover frequent sequences and patterns of a single variable. In this paper we present a new algorithm that combines several existing ideas: the reference event proposed in (Bettini, Wang et al. 1998), the event detection technique proposed in (Guralnik and Srivastava 1999), the large fraction proposed in (Mannila, Toivonen et al. 1997), and the causal inference proposed in (Blum 1982). We use all of these ideas to build our new algorithm for discovering multivariable sequences in the form of the predisposing factors and co-incident factors of a reference event of interest. We define an event as a change in the data, in either the positive or the negative direction, that exceeds a threshold value. From these patterns we infer predisposing and co-incident factors with respect to a reference variable. For this purpose we study Open Source Software data collected from the SourceForge website. Out of 240+ attributes we consider only thirteen time-dependent attributes: Page-views, Download, Bugs0, Bugs1, Support0, Support1, Patches0, Patches1, Tracker0, Tracker1, Tasks0, Tasks1 and CVS. These attributes indicate the degree and patterns of activity of projects over the course of their progress. The number of downloads is a good indication of the progress of a project, so we use Download as the reference attribute. We also test our algorithm on four synthetic data sets containing up to 50% noise. The results show that our algorithm works well and tolerates noisy data.
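The event definition in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm. The function names, the use of the absolute difference between consecutive samples, the fixed one-step lag for "predisposing", and the sample numbers are all assumptions made for illustration. The sketch labels each step as a positive event, a negative event, or no event, then counts how often an event in another attribute occurs one step before (predisposing) or at the same step as (co-incident) an event in the reference attribute.

```python
from collections import Counter


def detect_events(series, threshold):
    """Label each step between consecutive samples.

    +1: increase larger than threshold, -1: decrease larger than
    threshold, 0: no event. Using the absolute difference is an
    assumption; the abstract only says "above a threshold value".
    """
    events = []
    for prev, curr in zip(series, series[1:]):
        change = curr - prev
        if change > threshold:
            events.append(1)
        elif change < -threshold:
            events.append(-1)
        else:
            events.append(0)
    return events


def factor_counts(reference, other, threshold, lag=1):
    """Count (other_event, reference_event) pairs around reference events.

    Hypothetical helper: "predisposing" is taken to mean an event in
    `other` exactly `lag` steps before a reference event, and
    "co-incident" an event in `other` at the same step.
    Both series are assumed to have the same length.
    """
    ref_events = detect_events(reference, threshold)
    oth_events = detect_events(other, threshold)
    predisposing = Counter()
    co_incident = Counter()
    for t, ref in enumerate(ref_events):
        if ref == 0:
            continue  # only steps with a reference event are of interest
        if oth_events[t] != 0:
            co_incident[(oth_events[t], ref)] += 1
        if t - lag >= 0 and oth_events[t - lag] != 0:
            predisposing[(oth_events[t - lag], ref)] += 1
    return predisposing, co_incident


# Made-up numbers, with Download as the reference attribute.
downloads = [10, 10, 50, 50, 10, 10]
page_views = [5, 40, 80, 80, 40, 40]
pre, co = factor_counts(downloads, page_views, threshold=20)
print("predisposing:", dict(pre))
print("co-incident:", dict(co))
```

Dividing such counts by the number of reference events would give a fraction that could be compared against a minimum frequency, in the spirit of the "large fraction" idea cited above; the exact criterion used in the paper is not stated in the abstract.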