Chinese Document Keyword Extraction Algorithm Based on FP-growth

Abstract

Existing keyword extraction algorithms suffer from heavy computation and complex calculation procedures. To address these problems, this paper proposes an FP-Growth-based algorithm for extracting keywords from Chinese documents. The FP-Growth algorithm mines word co-occurrence information, which excludes the interference of noise words. A semantic similarity computation based on lexical chains eliminates the influence of synonyms. A feature fusion method that considers the frequency, part of speech, and position of each word combines TF-IDF with a "double comparing method" to calculate the weights of these characteristic factors, and a word weight function then computes the final weight of each candidate word. Experimental results show that the proposed method improves the accuracy rate and recall rate by about 10% compared with traditional TF-IDF.
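The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and this record gives none of the paper's formulas, so several parts are assumptions: the miner uses the pattern-growth recursion of FP-Growth over conditional pattern bases but stores them as plain lists rather than a compressed FP-tree; the lexical-chain synonym step and the part-of-speech feature are omitted; and `TITLE_BONUS`, `title_words`, `background_docs`, and the multiplicative fusion are hypothetical stand-ins for the paper's weight function and "double comparing method". ASCII tokens stand in for already-segmented Chinese words.

```python
import math
from collections import Counter


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(words): support count} for all frequent itemsets."""
    results = {}

    def grow(db, suffix):
        # db is a list of (words, count) pairs already conditioned on `suffix`.
        counts = Counter()
        for words, n in db:
            for w in set(words):
                counts[w] += n
        frequent = sorted((w for w in counts if counts[w] >= min_support),
                          key=lambda w: (-counts[w], w))
        rank = {w: i for i, w in enumerate(frequent)}
        # Drop infrequent words and order the rest by descending support.
        ordered = [(sorted((w for w in set(words) if w in rank), key=rank.get), n)
                   for words, n in db]
        for w in reversed(frequent):
            pattern = suffix | {w}
            results[pattern] = counts[w]
            # Conditional pattern base: the words ranked before `w`.
            cond = [(ws[:ws.index(w)], n) for ws, n in ordered if w in ws]
            cond = [(ws, n) for ws, n in cond if ws]
            if cond:
                grow(cond, pattern)

    grow([(t, 1) for t in transactions], frozenset())
    return results


def tf_idf(word, doc_words, background_docs):
    """Smoothed TF-IDF of `word` in one document against a background corpus."""
    tf = doc_words.count(word) / len(doc_words)
    df = sum(word in doc for doc in background_docs)
    return tf * math.log((1 + len(background_docs)) / (1 + df))


# Toy document: one token list per sentence (each sentence is a transaction).
sentences = [
    ["keyword", "extraction", "algorithm"],
    ["keyword", "extraction", "weight"],
    ["keyword", "algorithm", "noise"],
    ["extraction", "keyword", "corpus"],
]
# Hypothetical background corpus used only for document frequencies.
background_docs = [{"keyword", "corpus"}, {"weight", "noise"}, {"corpus", "noise"}]

itemsets = frequent_itemsets(sentences, min_support=2)
# Candidates: words that co-occur frequently with at least one other word.
candidates = {w for s in itemsets if len(s) >= 2 for w in s}

doc_words = [w for s in sentences for w in s]
title_words = {"keyword", "extraction"}  # hypothetical position feature
TITLE_BONUS = 0.5                        # hypothetical factor weight
scores = {
    w: tf_idf(w, doc_words, background_docs) * (1 + TITLE_BONUS * (w in title_words))
    for w in candidates
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Words that never reach the support threshold together with another word (here `weight`, `noise`, and `corpus`) drop out before scoring, which is the noise-filtering role the abstract assigns to the co-occurrence step.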

Bibliographic Information

Source
International Conference on Smart City and Systems Engineering | 2016 | 1 v. | 4 pages
Authors
Meng Zhao; Wanjun Yu; Wenjing Lu; Quan Liu; Jinxiao Li
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
Speech; Itemsets; Semantics; Feature extraction; Vocabulary; Tagging

Similar Documents

1. Keyword Extraction Based on tf/idf for Chinese News Document [J]. LI Juanzi, FAN Qina, ZHANG Kuo. Wuhan University Journal of Natural Sciences. 2007, No. 5
2. Automatic keyword extraction from documents based on multiple content-based measures [J]. Kun Yue, Wei-Yi Liu, Li-Ping Zhou. International Journal of Computer Systems Science & Engineering. 2011, No. 2
3. A visual attention-based keyword extraction for document classification [J]. Wu Xing, Du Zhikang, Guo Yike. Multimedia Tools and Applications. 2018, No. 19
4. Chinese Document Keyword Extraction Algorithm Based on FP-growth [C]. Meng Zhao, Wanjun Yu, Wenjing Lu, et al. International Conference on Smart City and Systems Engineering. 2016
5. Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing. [D]. Csomai, Andras. 2008
6. Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification [O]. Jie Hu, Shaobo Li, Yong Yao, 2018
7. Algorithm of Keywords Extraction about Power Documents Based on Hadoop [O]. Tong Wang, Yongzhi Wang, Liang Jin, 2016
