Conference: IEEE International Conference on Parallel and Distributed Systems

A Heuristic Approach for Website Classification with Mixed Feature Extractors

Abstract

We propose an intelligent website classification scheme based on deep neural networks that uses mixed feature extractors. The model is trained iteratively under supervised learning, using gradient descent. It has four components: a Website Encoder, a Text CNN Feature Extractor, a Bidirectional GRU Feature Extractor, and a Fully Connected Classifier. Together these extract multiple features of a website at different granularities. From the concatenated features produced by the mixed feature extractors, the model selects a suitable website class. We conduct extensive experiments on a real-world website dataset, collected using domains extracted from the DNS records of a telecom operator. Compared with multiple widely used machine learning models, the proposed classification scheme performs better on precision, recall, F1, and accuracy. These results can benefit various web applications, such as malicious website detection and online advertising.
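
The abstract reports precision, recall, F1, and accuracy but does not say how the per-class scores are averaged. As a minimal sketch, assuming macro averaging (an unweighted mean over classes, which is an assumption and not something the abstract states), the four metrics could be computed as follows. The function name `classification_metrics` and the class labels used with it are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch; the abstract
    does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Calling `classification_metrics(["news", "shop"], ["news", "news"])` returns a dictionary with the keys `accuracy`, `precision`, `recall`, and `f1`. Macro averaging weights every website class equally, which may matter when some classes are rare in the dataset; a micro or weighted average would be a drop-in alternative.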