Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment

Iovanac Nicolae C.; Savoie Brett M.


Abstract

Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be advantageously combined in an autoencoder framework with potential general application to data-scarce chemical learning tasks.
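The joint training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`masked_mse`, `joint_loss`), the loss weights, and the numbers are all hypothetical, and the autoencoder itself is omitted. The sketch only assumes that each molecule contributes a reconstruction error plus a property error for whichever labels it has, so that plentiful computed free energies and scarce experimental pKa values feed one shared objective.

```python
from typing import Optional, Sequence


def masked_mse(pred: Sequence[float], target: Sequence[Optional[float]]) -> float:
    """Mean squared error over entries whose target is available (not None)."""
    pairs = [(p, t) for p, t in zip(pred, target) if t is not None]
    if not pairs:
        return 0.0  # no labels of this kind in the batch
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def joint_loss(
    recon_err: Sequence[float],
    pka_pred: Sequence[float],
    pka_true: Sequence[Optional[float]],
    dg_pred: Sequence[float],
    dg_true: Sequence[Optional[float]],
    w_pka: float = 1.0,  # hypothetical weight on the experimental task
    w_dg: float = 1.0,   # hypothetical weight on the computational task
) -> float:
    """Reconstruction term plus weighted property terms over available labels."""
    recon = sum(recon_err) / len(recon_err)
    return (
        recon
        + w_pka * masked_mse(pka_pred, pka_true)
        + w_dg * masked_mse(dg_pred, dg_true)
    )


# Three hypothetical molecules: all have a computed free energy,
# but only the first has an experimental pKa.
loss = joint_loss(
    recon_err=[0.5, 0.25, 0.75],
    pka_pred=[4.0, 9.0, 7.0],
    pka_true=[5.0, None, None],
    dg_pred=[10.0, 12.0, 14.0],
    dg_true=[11.0, 12.0, 16.0],
)
print(loss)
```

Masking missing labels rather than dropping molecules is one simple way to let every molecule shape the shared latent space even when it lacks an experimental measurement. The paper may handle this differently.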


Bibliographic details

Source
The journal of physical chemistry, A. Molecules, spectroscopy, kinetics, environment, & general theory | 2019, Issue 19 | 8 pages
Authors
Iovanac Nicolae C.; Savoie Brett M.
Author affiliations

Purdue Univ, Charles D Davidson Sch Chem Engn, 480 Stadium Mall Dr, W Lafayette, IN 47906, USA

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Physical chemistry (theoretical chemistry); chemical physics

