A text mining approach for automatic construction of hypertexts

Hsin-Chang Yang; Chung-Hong Lee


Abstract

Research on automatic hypertext construction has emerged rapidly in the last decade because there is an urgent need to translate the gigantic amount of legacy documents into web pages. Unlike traditional 'flat' texts, a hypertext contains a number of navigational hyperlinks that point to related hypertexts or to locations within the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the gigantic amount of documents produced each day makes such manual construction impractical. Thus an automatic hypertext construction method is necessary for content providers to efficiently produce adequate information that can be used by web surfers. Although most web pages contain non-textual data such as images, sounds, and video clips, text data still contribute the major part of the information about the pages. Therefore, it is not surprising that most automatic hypertext construction methods inherit from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to translate them into hypertext form. These translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we only use Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms.
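
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the binary term encoding, the 2x2 map size, and the function names (`train_som`, `best_matching_unit`, `to_term_vector`) are all assumptions made for illustration. It only shows how documents reduced to index terms might be clustered with a generic self-organizing map; it does not reproduce the two maps or the hyperlink identification described above.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Return the grid position whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small self-organizing map on equal-length numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map neuron, keyed by its (row, col) grid position.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        order = list(vectors)
        rng.shuffle(order)
        for vec in order:
            bmu = best_matching_unit(weights, vec)
            for pos, w in weights.items():
                grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius * radius))
                # Pull each neuron toward the input, weighted by grid distance to the BMU.
                for i in range(dim):
                    w[i] += lr * influence * (vec[i] - w[i])
    return weights


def to_term_vector(doc_terms, vocabulary):
    """Encode a document as a binary vector over the index-term vocabulary."""
    present = set(doc_terms)
    return [1.0 if term in present else 0.0 for term in vocabulary]


# Hypothetical toy corpus: each document is already reduced to index terms.
docs = {
    "d1": ["market", "stock", "bank"],
    "d2": ["stock", "bank", "trade"],
    "d3": ["typhoon", "rain", "flood"],
    "d4": ["rain", "flood", "storm"],
}
vocabulary = sorted({t for terms in docs.values() for t in terms})
vectors = {name: to_term_vector(terms, vocabulary) for name, terms in docs.items()}

som = train_som(list(vectors.values()), rows=2, cols=2)
# Label each document with the map neuron it is closest to.
clusters = {name: best_matching_unit(som, vec) for name, vec in vectors.items()}
print(clusters)
```

Documents that share many index terms tend to land on the same or neighboring neurons, which is the property a clustering-based link construction method would rely on. The binary encoding is chosen only for simplicity; any term weighting that yields fixed-length vectors would fit the same sketch.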


Bibliographic information

Source: Expert Systems with Applications, 2005, Issue 4, pp. 723-734 (12 pages)
Authors: Hsin-Chang Yang; Chung-Hong Lee
Affiliation: Department of Information Management, Chang Jung University, Tainan 711, Taiwan, ROC
Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory
Keywords: automatic hypertext construction; self-organizing maps; text mining


