RESEARCH ON THEMATIC WORD EXTRACTION BASED ON HIGH QUALITY DATA SOURCES ON THE WEB



Abstract

Data source selection is one of the most important steps in domain thematic word extraction. Most previous work has focused on how to extract keywords from an existing corpus with good algorithms, while very little research has addressed how to find good data sources for collecting the text corpus. This paper studies how to use online web tools to identify high-quality data sources. Then, considering the characteristics of subject keywords, we propose an improved TF-IDF weight calculation formula for keyword ranking, and extract domain keywords from the documents by recalculating the weights of candidate words with the improved method. Finally, taking the Chinese herbal medicine field as an example, our results show that the proposed method achieves much higher accuracy and recall at much lower cost.
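The abstract does not give the improved weighting formula, so it cannot be reproduced here. As a point of reference only, the following is a minimal sketch of the standard TF-IDF baseline that such a method would modify: candidate words are scored per document and the scores are summed to rank them. The function name `tfidf_rank`, the `top_k` parameter, and the toy herbal-medicine tokens are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_rank(docs, top_k=5):
    """Rank words by summed standard TF-IDF over tokenized documents.

    This is the textbook baseline, NOT the paper's improved formula.
    `docs` is a list of token lists; returns (word, score) pairs.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = Counter()
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        for word, count in counts.items():
            tf = count / total                 # relative term frequency
            idf = math.log(n_docs / df[word])  # 0 if the word is in every doc
            scores[word] += tf * idf
    return scores.most_common(top_k)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["ginseng", "root", "tonic"],
    ["ginseng", "leaf"],
    ["ginseng", "root", "root"],
]
for word, score in tfidf_rank(docs, top_k=3):
    print(f"{word}\t{score:.3f}")
```

In this baseline, a word that occurs in every document (here "ginseng") gets an IDF of zero and drops to the bottom of the ranking, which is one reason domain-specific reweighting schemes are of interest for subject keywords.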


Bibliographic record

Source
International Conference on Computer Technology and Development | 2012 | pp. 1364-1370 | 7 pages
Authors
DONGHUA PAN; JUN SUN
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
High quality data source identification; Subject terms extraction; An improved TF-IDF algorithm


Similar literature


1. Web News Data Extraction Technology Based on Text Keywords [J]. Kun Zhang. Complexity. 2021, No. a
2. Communicating Thematic Data Quality with Web Map Services [J]. Charles J. Roberts, Daniel Díaz. ISPRS International Journal of Geo-Information. 2015, No. 4
3. Web Data as Academic and Business Quality Estimates: A Comparison of Three Data Sources [J]. Liwen Vaughan, Rongbin Yang. Journal of the American Society for Information Science and Technology. 2012, No. 10
4. RESEARCH ON THEMATIC WORD EXTRACTION BASED ON HIGH QUALITY DATA SOURCES ON THE WEB [C]. DONGHUA PAN, JUN SUN. International Conference on Computer Technology and Development. 2012
5. Query-based selection and integration of semantic web data sources [D]. Qasem, Abir. 2009
6. A Web-Based Knowledge Translation Resource for Families and Service Providers (The F-Words in Childhood Disability Knowledge Hub): Developmental and Pilot Evaluation Study [O]. Andrea Cross, Peter Rosenbaum, Danijela Grahovac. 2018
7. Structure based Data Extraction from Hidden Web Sources: A Review [O]. A. K. Sharma. 2013
