A framework for mining evolving trends in Web data streams using dynamic learning and retrospective validation

Olfa Nasraoui; Carlos Rojas; Cesar Cardona


Abstract

The expanding and dynamic nature of the Web poses enormous challenges to most data mining techniques that try to extract patterns from Web data, such as Web usage and Web content. While scalable data mining methods are expected to cope with the size challenge, coping with evolving trends in noisy data in a continuous fashion, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. The harsh restrictions imposed by the "you only get to see it once" constraint on stream data call for different computational models, which may furthermore bring some interesting surprises in the behavior of some well-known similarity measures during clustering, and even validation. In this paper, we study the effect of similarity measures on the mining process and on the interpretation of the mined patterns under the harsh single-pass requirement. We propose a simple similarity measure that has the advantage of explicitly coupling the precision and coverage criteria to the early learning stages. Even though the cosine similarity and its close relatives, such as the Jaccard measure, have been prevalent in the majority of Web data clustering approaches, they may fail to explicitly seek profiles that achieve high coverage and high precision simultaneously. We also formulate a validation strategy and adapt several metrics rooted in information retrieval to the challenging task of validating a learned stream synopsis in dynamic environments.
Our experiments confirm that the MinPC similarity generally performs better than the cosine similarity, and that this advantage can be expected to be more pronounced for data sets that are more challenging in terms of the amount of noise and/or overlap, and in terms of the level of change in the underlying profiles/topics (known sub-categories of the input data) as the input stream unfolds. In our simulations, we study the task of mining and tracking trends and profiles in evolving text and Web usage data streams in a single pass, and under different trend sequencing scenarios.
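
To make the contrast between the two measures concrete, here is a minimal sketch. The abstract does not define MinPC, so the code assumes, from the name and from the stated coupling of precision and coverage, that it is the minimum of the two. It also assumes that sessions and profiles are plain sets of items, that precision is the share of the profile matched by the session, and that coverage is the share of the session matched by the profile. The function names are illustrative and are not taken from the paper.

```python
import math


def cosine(session: set, profile: set) -> float:
    """Cosine similarity between two binary item sets."""
    if not session or not profile:
        return 0.0
    return len(session & profile) / math.sqrt(len(session) * len(profile))


def precision(session: set, profile: set) -> float:
    """Assumed definition: fraction of the profile's items found in the session."""
    return len(session & profile) / len(profile) if profile else 0.0


def coverage(session: set, profile: set) -> float:
    """Assumed definition: fraction of the session's items found in the profile."""
    return len(session & profile) / len(session) if session else 0.0


def min_pc(session: set, profile: set) -> float:
    """Assumed MinPC: the smaller of precision and coverage."""
    return min(precision(session, profile), coverage(session, profile))


# A short session fully contained in a much broader profile.
session = {"a", "b"}
profile = {"a", "b", "c", "d", "e", "f", "g", "h"}

print(cosine(session, profile))   # 2 / sqrt(2 * 8) = 0.5
print(min_pc(session, profile))   # min(2/8, 2/2) = 0.25
```

Under these assumptions the cosine similarity rewards the broad profile for fully covering the session, while the minimum is held down by the low precision. A match scores well only when both criteria are high at once, which is the behavior the abstract attributes to the proposed measure.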


Bibliographic details

Source: Computer Networks, 2006, No. 10, pp. 1488-1512 (25 pages)
Authors: Olfa Nasraoui; Carlos Rojas; Cesar Cardona
Affiliation: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, United States
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords: mining evolving data streams; web clickstreams; web mining; text mining; user profiles


