Conference: International Conference on Tools with Artificial Intelligence

Expanding Science and Technology Thesauri from Bibliographic Datasets using Word Embedding


Abstract

Thesauri and taxonomies for science and technology information have been attracting attention in scientometrics. However, constructing and maintaining thesauri manually is expensive and time-consuming; thus, methods for semi-automatic construction and maintenance are being actively studied. We propose a method to expand an existing thesaurus using the abstracts of articles from state-of-the-art technological domains with limited structured information. Specifically, we consider a method for properly allocating new terms to the hierarchical structure of an existing thesaurus using word embedding, a rapidly evolving technique. In an experiment, 500-dimensional word vectors are constructed from 567,000 biomedical articles and are clustered after dimension reduction using principal component analysis. Then, semantic relations are estimated based on the spatial relations between the new term and any of the terms in the thesaurus. We then compared the results with those obtained from three experts. In the future, we will develop a recommendation system that suggests new terms related to existing terms to support semi-automatic thesaurus maintenance.
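The pipeline described in the abstract (word vectors, dimension reduction with principal component analysis, clustering, then relating a new term to thesaurus terms by their positions in the vector space) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy terms, the randomly generated 500-dimensional vectors standing in for trained embeddings, the use of k-means as the clustering step, the choice of two components and two clusters, and cosine similarity as the "spatial relation".

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def assign(X, centers):
    """Index of the nearest center (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means; an assumed stand-in for the paper's clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return assign(X, centers)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy data: two well-separated groups of 500-dimensional vectors.
rng = np.random.default_rng(42)
center_a, center_b = rng.normal(size=500), rng.normal(size=500)
terms = ["protein", "enzyme", "kinase", "neoplasm", "carcinoma", "sarcoma"]
group_centers = [center_a] * 3 + [center_b] * 3
vectors = np.array([c + 0.1 * rng.normal(size=500) for c in group_centers])

thesaurus_terms = {"protein", "enzyme", "neoplasm", "carcinoma"}  # existing entries
new_terms = ["kinase", "sarcoma"]                                  # terms to place

reduced = pca_reduce(vectors, n_components=2)
labels = kmeans(reduced, k=2)
index = {term: i for i, term in enumerate(terms)}

def suggest_related(new_term):
    """Nearest thesaurus term (cosine) within the new term's cluster, or None."""
    i = index[new_term]
    candidates = [t for t in thesaurus_terms if labels[index[t]] == labels[i]]
    if not candidates:
        return None
    return max(candidates, key=lambda t: cosine(reduced[i], reduced[index[t]]))

suggestions = {term: suggest_related(term) for term in new_terms}
print(suggestions)
```

In this sketch each new term is attached to the closest existing term in its own cluster. A real system would use embeddings trained on the article abstracts and would still need a rule for deciding which semantic relation (for example broader, narrower, or related) the placement implies, which the sketch does not attempt.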