Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering


Abstract

This paper addresses the main aspects of developing a high-quality system for dynamic content filtering: collecting meaningful training data and selecting features. Because the Web changes rapidly, the classifier needs to be re-trained regularly. Training data collection is treated as a special case of focused crawling, and a simple, easy-to-tune technique is proposed, implemented, and tested. The proposed feature selection technique aims to minimize the feature set size without loss of accuracy while taking the interlinked nature of the Web into account. This is essential for making a content filtering solution fast and unobtrusive for end users, especially when filtering runs on resource-constrained hardware. An evaluation and comparison of various classifiers and techniques is provided.
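Information gain is one of the feature selection criteria named in this record's keywords. The following is a minimal sketch of how such a criterion can shrink a feature set. It is a generic baseline, not the technique proposed in the paper, and it assumes each document is a set of tokens with a single class label. The function names (`entropy`, `information_gain`, `select_features`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = Counter()
    without_term = Counter()
    for tokens, label in zip(docs, labels):
        (with_term if term in tokens else without_term)[label] += 1

    n = len(docs)
    prior = entropy(list(Counter(labels).values()))
    conditional = 0.0
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            conditional += (size / n) * entropy(list(part.values()))
    return prior - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Toy data (invented for this example): two pages to block, two to allow.
docs = [
    {"casino", "bonus", "win"},
    {"casino", "poker", "free"},
    {"school", "math", "free"},
    {"school", "lesson", "win"},
]
labels = ["blocked", "blocked", "allowed", "allowed"]
top = select_features(docs, labels, 2)
```

On this toy data, terms such as "free" and "win" occur equally often in both classes and carry no information about the label, so a ranking of this kind would discard them first. A smaller feature set is what keeps classification cheap on constrained hardware.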


Bibliographic details

Source: International Conference on Artificial Intelligence: Methodology, Systems, and Applications, 2014, pp. 129-138 (10 pages)
Authors: Roman Suvorov; Ilya Sochenkov; Ilya Tikhomirov
Original format: PDF
Keywords: Dynamic content filtering; text classification; automatic topic identification; active content recognition; feature selection; TF-IDF; thematic importance characteristic; information gain; focused crawling


Similar documents

1. An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features [J]. Wan Cen, Freitas Alex A. Artificial Intelligence Review: An International Science and Engineering Journal. 2018, No. 2
2. Heuristic filter feature selection methods for medical datasets [J]. Genomics. 2020, No. 2
3. Heuristic filter feature selection methods for medical datasets [J]. Mehdi Alirezanejad, Rasul Enayatifar, Homayun Motameni. Genomics. 2020, No. 2
4. Training Datasets Collection and Evaluation of Feature Selection Methods for Web Content Filtering [C]. Roman Suvorov, Ilya Sochenkov, Ilya Tikhomirov. International Conference on Artificial Intelligence: Methodology, Systems, and Applications. 2014
5. Effect of metasite selection on the quality of World Wide Web information: A collection development approach to the evaluation of Web-based consumer health information on the treatment of hypercholesterolemia [D]. Hogan, Linda. 2001
6. Application of feature selection methods for automated clustering analysis: a review on synthetic datasets [O]. Aliyu Usman Ahmad, Andrew Starkey
7. An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features [O]. Wan, Cen, Freitas, Alex A. 2017
