Enhanced N-Gram Extraction Using Relevance Feature Discovery


Abstract

Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular term-based feature selection methods suffer from noisy feature extraction: many of the extracted features are irrelevant to the user's needs. One popular method is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. It then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms in the n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
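
The two steps named in the abstract (filter n-grams using a set of selected terms, then weight each surviving n-gram by its distribution in the documents and by its terms) can be illustrated with a rough sketch. This is not the paper's extended random set weighting, whose formula is not given here. The function names, the rule that every term of an n-gram must be a selected term, and the score (document support multiplied by the sum of term weights) are all assumptions made for illustration.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weight_ngrams(docs, term_weights, n=2):
    """Score n-grams found in a list of tokenized relevant documents.

    term_weights is a hypothetical dict {term: weight} of terms that an
    earlier term-selection step kept as specific features.
    """
    # Count in how many documents each n-gram occurs (document frequency).
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(extract_ngrams(tokens, n)))

    scores = {}
    for gram, df in doc_freq.items():
        # Assumed noise filter: drop n-grams containing an unselected term.
        if not all(term in term_weights for term in gram):
            continue
        # Assumed weighting: document support times summed term weights.
        support = df / len(docs)
        scores[gram] = support * sum(term_weights[term] for term in gram)
    return scores


if __name__ == "__main__":
    docs = [
        ["stock", "market", "crash"],
        ["stock", "market", "rally"],
        ["the", "market", "rally"],
    ]
    term_weights = {"stock": 0.6, "market": 1.0, "rally": 0.4}
    ranked = sorted(weight_ngrams(docs, term_weights).items(),
                    key=lambda item: item[1], reverse=True)
    for gram, score in ranked:
        print(" ".join(gram), round(score, 3))
```

In this toy input, bigrams that contain "crash" or "the" are discarded because those terms are not in the selected set, which mirrors the idea of using specific terms to cut down the number of extracted n-grams.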

Bibliographic record

Source: Australasian Joint Conference on Artificial Intelligence, 2013, pp. 453-465 (13 pages)
Authors: Mubarak Albathan; Yuefeng Li; Abdulmohsen Algarni
Original format: PDF
Keywords: feature selection; relevance feedback; terms weight; n-gram extraction

Similar literature

1. Apriori and N-gram Based Chinese Text Feature Extraction Method [J]. 王晔, 黄上腾. Journal of Shanghai Jiaotong University (English Edition), 2004, Issue 4.
2. Convolutional neural networks for relevance feedback in content based image retrieval: A Content based image retrieval system that exploits convolutional neural networks both for feature extraction and for relevance feedback [J]. Lorenzo Putzu, Luca Piras, Giorgio Giacinto. Multimedia Tools and Applications, 2020, Issue 37-38.
3. Combining wavelet-based feature extractions with relevance vector machines for stock index forecasting [J]. Shian-Chang Huang, Tung-Kuang Wu. Expert Systems, 2008, Issue 2.
4. Enhanced N-Gram Extraction Using Relevance Feature Discovery [C]. Mubarak Albathan, Yuefeng Li, Abdulmohsen Algarni. Australian Joint Conference on Artificial Intelligence, 2013.
5. Kaizen Programming with Enhanced Feature Discovery: An Automated Approach to Feature Selection and Feature Discovery for Prediction Models [D]. Stelmack, John. 2020.
6. Fault Feature Extraction and Diagnosis of Rolling Bearings Based on Enhanced Complementary Empirical Mode Decomposition with Adaptive Noise and Statistical Time-Domain Features [O]. Liwei Zhan, Fang Ma, Jingjing Zhang. 2019.
7. MALGRA: Machine Learning and N-Gram Malware Feature Extraction and Detection System [O]. Muhammad Ali, Stavros Shiaeles, Gueltoum Bendiab. 2020.
8. Mining Specific and General Features in Both Positive and Negative Relevance Feedback. QUT E-Discovery Lab at the TREC'09 Relevance Feedback Track [R]. Li, Y., Tao, X., Algarni, A. 2009.
