VLA-SMILES: Variable-Length-Array SMILES Descriptors in Neural Network-Based QSAR Modeling

Nazarova Antonina L.; Nakano Aiichiro


Abstract

Machine learning represents a milestone in data-driven research, including material informatics, robotics, and computer-aided drug discovery. With the continuously growing virtual and synthetically available chemical space, efficient and robust quantitative structure-activity relationship (QSAR) methods are required to uncover molecules with desired properties. Herein, we propose variable-length-array SMILES-based (VLA-SMILES) structural descriptors that expand conventional SMILES descriptors widely used in machine learning. This structural representation extends the family of numerically coded SMILES, particularly binary SMILES, to expedite the discovery of new deep learning QSAR models with high predictive ability. VLA-SMILES descriptors were shown to speed up the training of QSAR models based on multilayer perceptron (MLP) with optimized backpropagation (ATransformedBP), resilient propagation (iRPROP⁻), and Adam optimization learning algorithms featuring rational train-test splitting, while improving the predictive ability toward the more compute-intensive binary SMILES representation format. All the tested MLPs under the same length-array-based SMILES descriptors showed similar predictive ability and convergence rate of training in combination with the considered learning procedures. For the entire set of VLA-SMILES featured QSAR models, validation with Kennard-Stone train-test splitting, which is based on structural descriptor similarity metrics, was found to be more effective than partitioning by activity ranking, which is based on biological activity values. Robustness and the predictive ability of MLP models based on VLA-SMILES were assessed via the method of QSAR parametric model validation.
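The Kennard-Stone splitting named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes each molecule is already represented by an equal-length numeric vector (for example, padded descriptors) and it uses Euclidean distance as the similarity metric, which the abstract does not specify. The function name `kennard_stone_split` and its parameters are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kennard_stone_split(X, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone rule.

    X is a list of equal-length numeric vectors (an assumption of this sketch).
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist(X[p[0]], X[p[1]]),
    )
    train = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist(X[k], X[i0]), dist(X[k], X[j0])) for k in remaining}

    while len(train) < n_train:
        # Max-min rule: take the sample farthest from the current training set.
        nxt = max(remaining, key=lambda k: min_d[k])
        train.append(nxt)
        remaining.remove(nxt)
        del min_d[nxt]
        for k in remaining:
            min_d[k] = min(min_d[k], dist(X[k], X[nxt]))

    return train, remaining


# Toy one-dimensional "descriptors", for illustration only.
X_demo = [[0.0], [1.0], [2.0], [10.0], [5.0]]
train_idx, test_idx = kennard_stone_split(X_demo, n_train=3)
print(train_idx, test_idx)
```

Because the selection depends only on descriptor distances, the training set spans the descriptor space and the held-out molecules fall inside it. An activity-ranking split orders molecules by the measured activity instead.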
In addition, statistical testing of the null hypothesis $H_0$ for the linear regression between real and observed activities, based on the $F_{2,n-2}$ criterion, was used to estimate predictability among VLA-SMILES featured QSAR-MLPs (with n being the size of the test set). The two approaches, QSAR parametric model validation and statistical hypothesis testing, were found to correlate when used for the quantitative evaluation of the predictabilities of the designed QSAR models with VLA-SMILES descriptors.
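One common form of such a test can be sketched as follows. The abstract does not state the null hypothesis explicitly, so this sketch assumes the usual joint hypothesis that the regression of observed on predicted activities has intercept 0 and slope 1, which yields an F statistic with (2, n-2) degrees of freedom. The function name `joint_f_test` and the choice of which variable is regressed on which are assumptions, not details from the paper.

```python
def joint_f_test(observed, predicted):
    """F statistic and p-value for H0: intercept = 0 and slope = 1.

    Fits observed = a + b * predicted by least squares and compares the
    residual sum of squares with that of the null model observed = predicted.
    """
    n = len(observed)
    if n != len(predicted) or n < 3:
        raise ValueError("need two equal-length sequences with n >= 3")

    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual sums of squares for the fitted line and for the null line y = x.
    sse_fit = sum((y - intercept - slope * x) ** 2
                  for x, y in zip(predicted, observed))
    sse_null = sum((y - x) ** 2 for x, y in zip(predicted, observed))

    # Two restrictions in the numerator, n - 2 residual degrees of freedom.
    f_stat = ((sse_null - sse_fit) / 2) / (sse_fit / (n - 2))

    # Closed-form upper tail of the F(2, m) distribution: (1 + 2f/m)^(-m/2).
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return f_stat, p_value


# Made-up numbers, for illustration only.
pred = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = [1.1, 1.9, 3.2, 3.8, 5.0]
f_stat, p_value = joint_f_test(obs, pred)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

A large p-value means the data give no reason to reject the hypothesis that predictions track observations along the identity line. A small p-value indicates systematic bias in slope or intercept.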

Bibliographic details

Source
Machine Learning and Knowledge Extraction, 2022, Issue 3, pp. 715-737 (23 pages)
Authors
Nazarova Antonina L.; Nakano Aiichiro
Affiliations
Univ Southern Calif
Original format: PDF
Language: English
Keywords
machine learning; deep learning; neural networks; SMILES; descriptors; QSAR; DRUG DISCOVERY; RATIONAL SELECTION; TEST SETS; PREDICTION; BEWARE; CHEMBL

