Journal: The Journal of Physical Chemistry A: Molecules, Spectroscopy, Kinetics, Environment, & General Theory

Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment


Abstract

Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be advantageously combined in an autoencoder framework with potential general application to data-scarce chemical learning tasks.
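
The joint-training idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the single-layer tanh encoder, the linear decoder and property heads, the NaN convention for missing labels, and every name below (`joint_loss`, `W_enc`, `W_dec`, `head_exp`, `head_dft`, `lam_exp`, `lam_dft`) are assumptions made for illustration only. It shows how molecules labeled in just one data source can still contribute to a shared latent space through a masked loss.

```python
import numpy as np

def masked_mse(pred, target):
    """Mean squared error over entries whose label is present (not NaN)."""
    mask = ~np.isnan(target)
    if not mask.any():
        return 0.0  # no labels from this source in the batch
    return float(np.mean((pred[mask] - target[mask]) ** 2))

def joint_loss(x, y_exp, y_dft, params, lam_exp=1.0, lam_dft=1.0):
    """Reconstruction loss plus two masked property losses on one latent space.

    x      : (n, d) molecular feature vectors (hypothetical featurization)
    y_exp  : (n,) experimental pKa labels, NaN where unavailable
    y_dft  : (n,) DFT (de)protonation free energies, NaN where unavailable
    params : dict of weight arrays (all names are illustrative)
    """
    z = np.tanh(x @ params["W_enc"])       # encoder: features -> latent vectors
    x_hat = z @ params["W_dec"]            # decoder: latent -> reconstruction
    recon = float(np.mean((x_hat - x) ** 2))
    exp_pred = z @ params["head_exp"]      # property head for experimental pKa
    dft_pred = z @ params["head_dft"]      # property head for DFT free energy
    return (recon
            + lam_exp * masked_mse(exp_pred, y_exp)
            + lam_dft * masked_mse(dft_pred, y_dft))

# Usage with synthetic data: six molecules, scarce experimental labels,
# more plentiful computational labels.
rng = np.random.default_rng(0)
d, k = 4, 2
params = {
    "W_enc": rng.normal(size=(d, k)),
    "W_dec": rng.normal(size=(k, d)),
    "head_exp": rng.normal(size=k),
    "head_dft": rng.normal(size=k),
}
x = rng.normal(size=(6, d))
y_exp = np.array([4.8, np.nan, np.nan, np.nan, np.nan, 9.2])
y_dft = np.array([270.1, 265.4, np.nan, 281.0, 277.3, 290.6])
print(joint_loss(x, y_exp, y_dft, params))
```

In a real training loop this scalar would be minimized with a gradient-based optimizer; the weights `lam_exp` and `lam_dft` are hypothetical knobs for balancing the two data sources.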
