This is an archive of papers published by the staff and faculty of Fox Chase Cancer Center. For questions about content, please contact the Talbot Research Library.
Raddi RM, Voelz VA
Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge
J Comput Aided Mol Des. 2021 Sep;35(9):953-961
PMID: 34363562 URL: https://www.ncbi.nlm.nih.gov/pubmed/34363562
Abstract: Accurate predictions of acid dissociation constants are essential to rational molecular design in the pharmaceutical industry and elsewhere. There has been much interest in developing new machine learning methods that can produce fast and accurate pKa predictions for arbitrary species, as well as estimates of prediction uncertainty. Previously, as part of the SAMPL6 community-wide blind challenge, Bannan et al. approached the problem of predicting pKas by using Gaussian process regression to predict microscopic pKas, from which macroscopic pKa values can be analytically computed (Bannan et al. in J Comput-Aided Mol Des 32:1165-1177). While this method can make reasonably quick and accurate predictions using a small training set, accuracy was limited by the lack of a sufficiently broad range of chemical space in the training set (e.g., the inclusion of polyprotic acids). Here, to address this issue, we construct a deep Gaussian process (GP) model that can include more features without invoking the curse of dimensionality. We trained both a standard GP and a deep GP model using a database of approximately 3500 small molecules curated from public sources, filtered by similarity to targets. We tested the model on both the SAMPL6 and the more recent SAMPL7 challenge, which introduced a similar lack of ionizable sites and/or environments found between the test set and the previous training set. The results show that while the deep GP model made only minor improvements over the standard GP model for SAMPL6 predictions, it made significant improvements over the standard GP model in SAMPL7 macroscopic predictions, achieving a mean absolute error (MAE) of 1.5 pKa units.
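The abstract notes that macroscopic pKa values can be computed analytically from microscopic ones. As a minimal sketch of that idea, and not the authors' implementation, the Python below applies the standard equilibrium relations for the simplest cases: one protonated microstate that deprotonates to several tautomers (the equilibrium constants add), and several protonated tautomers that deprotonate to one microstate (the reciprocal constants add). The function names are hypothetical and chosen only for illustration.

```python
import math


def macro_pka_from_deprotonation_branches(micro_pkas):
    """Macroscopic pKa when ONE protonated microstate can lose a proton
    to several deprotonated tautomers: K_macro = sum_i K_i."""
    # Convert each microscopic pKa to Ka = 10**(-pKa), add, convert back.
    return -math.log10(sum(10.0 ** (-p) for p in micro_pkas))


def macro_pka_from_protonated_tautomers(micro_pkas):
    """Macroscopic pKa when SEVERAL protonated tautomers lose a proton
    to ONE deprotonated microstate: 1/K_macro = sum_i 1/K_i."""
    # 1/Ka = 10**(+pKa); summing the reciprocals gives 1/K_macro.
    return math.log10(sum(10.0 ** p for p in micro_pkas))


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    print(macro_pka_from_deprotonation_branches([4.0, 4.0]))
    print(macro_pka_from_protonated_tautomers([4.0, 4.0]))
```

Two equivalent sites with the same microscopic pKa shift the macroscopic value by log10(2), downward in the first case and upward in the second. General polyprotic systems with many microstates need the full set of microstate populations, which this sketch does not cover.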
Notes: 1573-4951. Raddi, Robert M; Voelz, Vincent A (ORCID: 0000-0002-1054-2124). R01GM123296/GM/NIGMS NIH HHS/United States; R01GM124270/GM/NIGMS NIH HHS/United States. Journal Article. Netherlands. J Comput Aided Mol Des. 2021 Aug 7. doi: 10.1007/s10822-021-00411-8.