Journal of Computer-Aided Molecular Design

Examining unsupervised ensemble learning using spectroscopy data of organic compounds


Abstract

One solution to the challenge of choosing an appropriate clustering algorithm is to combine different clusterings into a single consensus clustering result, known as a cluster ensemble (CE). This ensemble learning strategy can provide more robust and stable solutions across different domains and datasets. Unfortunately, not all clusterings in the ensemble contribute to the final data partition. Cluster ensemble selection (CES) aims at selecting a subset from a large library of clustering solutions to form a smaller cluster ensemble that performs as well as or better than the set of all available clustering solutions. In this paper, we investigate four CES methods for the categorization of structurally distinct organic compounds using high-dimensional IR and Raman spectroscopy data. The single quality selection (SQI) method forms a subset of the ensemble by selecting the highest-quality ensemble members according to one of various quality indices. The Bagging method, usually applied in supervised learning, ranks ensemble members by calculating the normalized mutual information (NMI) between ensemble members and consensus solutions generated from a randomly sampled subset of the full ensemble. The hierarchical cluster and select method (HCAS-SQI) uses the diversity matrix of ensemble members to select a diverse set of the highest-quality ensemble members. Furthermore, a combining strategy (HCAS-MQI) can be used to combine subsets selected using multiple quality indices, refining the clustering solutions in the ensemble. The IR + Raman hybrid ensemble library is created by merging two complementary "views" of the organic compounds. This inherently more diverse library gives the best full-ensemble consensus results. Overall, the Bagging method is recommended because it provides the most robust results, which are better than or comparable to the full-ensemble consensus solutions.
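
The Bagging-style ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which consensus function or NMI normalization is used, so the co-association consensus with a 0.5 threshold, the arithmetic-mean NMI normalization, and the names `nmi`, `consensus`, and `bagging_rank` are all assumptions made here for illustration.

```python
import math
import random
from collections import Counter


def nmi(a, b):
    """NMI between two labelings (assumed arithmetic-mean normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    denom = (h_a + h_b) / 2
    # Convention: two single-cluster labelings are treated as identical.
    return mi / denom if denom > 0 else 1.0


def consensus(members, threshold=0.5):
    """Assumed consensus function: link two objects when more than
    `threshold` of the members co-cluster them, then take the
    connected components (union-find) as consensus clusters."""
    n = len(members[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(m[i] == m[j] for m in members) / len(members)
            if agree > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def bagging_rank(ensemble, rounds=20, sample_size=3, seed=0):
    """Rank members by mean NMI against consensus solutions built
    from randomly sampled subsets of the full ensemble."""
    rng = random.Random(seed)
    scores = [0.0] * len(ensemble)
    for _ in range(rounds):
        reference = consensus(rng.sample(ensemble, sample_size))
        for k, member in enumerate(ensemble):
            scores[k] += nmi(member, reference) / rounds
    order = sorted(range(len(ensemble)), key=lambda k: scores[k], reverse=True)
    return order, scores


# Toy library: three members agree on the partition (label names differ),
# while the last member is mostly noise.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
order, scores = bagging_rank(ensemble)
print(order, [round(s, 3) for s in scores])
```

In this toy library the noisy fourth member receives the lowest score, so a selection step that keeps only the top-ranked members would drop it. The number of rounds, the subset size, and the cut-off for selection are free parameters in this sketch.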
