Conference: MEDINFO

'Hybrid Topics' - Facilitating the Interpretation of Topics Through the Addition of MeSH Descriptors to Bags of Words



Abstract

Extracting and understanding information, themes and relationships from large collections of documents is an important task for biomedical researchers. Latent Dirichlet Allocation is an unsupervised topic modeling technique using the bag-of-words assumption that has been applied extensively to unveil hidden thematic information within large sets of documents. In this paper, we added MeSH descriptors to the bag-of-words assumption to generate 'hybrid topics', which are mixed vectors of words and descriptors. We evaluated this approach on the quality and interpretability of topics in both a general corpus and a specialized corpus. Our results demonstrated that the coherence of 'hybrid topics' is higher than that of regular bag-of-words topics in the specialized corpus. We also found that the proportion of topics that are not associated with MeSH descriptors is higher in the specialized corpus than in the general corpus.
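The abstract does not say how words and MeSH descriptors were combined in practice, so the following is only a minimal sketch of one plausible way to build a hybrid bag for a single document. The function name `hybrid_bag`, the `MESH:` prefix, and the toy tokens are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def hybrid_bag(words, mesh_descriptors, prefix="MESH:"):
    """Combine word tokens and MeSH descriptors into one bag (token -> count).

    Hypothetical helper: the prefix scheme is an assumption, chosen so that a
    descriptor such as "Insulin" stays distinct from the word "insulin".
    """
    # Count lower-cased word tokens, as in an ordinary bag of words.
    bag = Counter(w.lower() for w in words)
    # Add each descriptor as a single prefixed token; spaces become
    # underscores so multi-word descriptors remain one vocabulary entry.
    bag.update(prefix + d.replace(" ", "_") for d in mesh_descriptors)
    return bag


# Toy document: tokens and descriptors are made up for illustration.
doc_words = ["insulin", "therapy", "in", "patients", "insulin"]
doc_mesh = ["Diabetes Mellitus", "Insulin"]
bag = hybrid_bag(doc_words, doc_mesh)
```

Bags built this way could be passed to any standard LDA implementation, so that each resulting topic is a distribution over both words and prefixed descriptors.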
