Introduction
The pursuit of accurate prediction of solubility of drugs from molecular structure is still evolving and continues to be challenging [1-7]. It had been proposed that shortfalls have been due to the lack of high-quality solubility data from the chemical space of drugs. But there has been some pushback to that view [6]. Since 2011, we have been actively collating and harmonizing published values of pH-dependent aqueous solubility of drug-like and drug-relevant molecules of importance in the discovery-to-development stages of pharmaceutical research, aggregated in the Wiki-pS0™ database (in-ADME Research). A book tentatively entitled “Predicting Solubility of New Drugs - Handbook of Critically Curated Data for Pharmaceutical Research” is under review by a publisher [7]. It tabulates solubility data up to 2022. The collection comprises 3695 different substances, with 7619 entries of intrinsic solubility (uncharged form). Considerable effort has been put into deciding that the data are of high quality, based on carefully selected published sources, guided in part by the best-practices ‘white papers’ recently published [8-12]. Just about all entries in the database are referenced to primary sources.
The reproducibility of statistical methods to predict solubility at best has hovered around the root mean square error (RMSE) of 0.6 log unit but is typically RMSE > 1 in many studies [1-5]. On the other hand, it has been firmly estimated that the average interlaboratory reproducibility can be as low as 0.18 log unit in carefully curated databases, which includes correcting reported solubility for ionization (i.e. deriving intrinsic solubility, S0) and by normalizing for temperature (by transforming measurements performed in the range 10-50 to 25 °C) [7,13-20]. Consistent solubility unit conversions, methods of phase separation, and procedures for measuring pH also play critical underpinnings in data quality [8-12].
This review discusses a recent series of five interrelated publications [16-20], where the use of several computational methods to predict intrinsic solubility were explored: (a) Yalkowsky General Solubility Equation (GSE) [21-26], (b) Abraham solvation equation (ABSOLV) [27-29], (c) Avdeef-Kansy Flexible-Acceptor General Solubility Equation (a.k.a., GSE(Φ,B)) [18], (d) Consensus of ABSOLV and GSE(Φ,B) [19], and (e) Breiman Random Forest regression (RFR) statistical machine learning (ML) method [30-33]. The above data-driven methods were trained with the Wiki-pS0 database. The traditional GSE is often considered pre-trained. It is popular for its simplicity and ease of use. New prediction methods are often benchmarked against the GSE.
This review concludes with the introduction of a new variant method induced from the above five studies, called the Exclusive Or (‘XOR’) Decision Tree model, drawing on ABSOLV and/or GSE(Φ,B) models, depending on the value of Φ. The method may be useful when a large diverse database of intrinsic solubility values of drug-like or drug-relevant molecules is available. It mirrors the Consensus model [19].
Analytic continuity of methods to predict solubility of drug-like molecules
The solubility (intrinsic, log molarity units) calculated by GSE depends on the value of the octanol-water partition coefficient (measured log P or calculated clog P) and the measured (or calculated) melting point (mp / °C) of the molecule. No further training is required for this thermodynamically well-founded legacy equation.
The Abraham and Le [28] ABSOLV equation to predict solubility takes the form:
In the multiple linear regression (MLR) equation, the log S0 is the dependent variable (measured log intrinsic molar solubility) and the independent variables are the five solute solvation descriptors accounting for the energy of transfer of solute from solid to the solution phase: A is the sum of H-bond acidity (donor potential), B is the sum of H-bond basicity (acceptor potential), Sπ is the dipolarity/polarizability (subscripted here, so as not to be confused with solubility), E is an excess molar refractivity in units of (cm3·mol-1)/10, and V is the McGowan characteristic molar volume in units of (cm3·mol-1)/100. The a0 - a6 constants in Eq. (2) are determined by regression based on the training database of intrinsic solubility values. The five solvation descriptors may be calculated from 2D structure (provided in SMILES notation or as coordinates in a ‘mol’ file) using the program ABSOLV [29] (cf.www.acdlabs.com).
As detailed elsewhere [18-20], the Flexible-Acceptor model, GSE(Φ,B), was developed as a critical expansion of the legacy GSE, following an exhaustive search of descriptors in a principal components analysis. For the original three constants (0.5, -1.0, -0.01) in Eq. (1), the first two were fitted to an exponential function of the sum of two descriptors, Φ (Kier’s [34] molecular flexibility index) and B (Abraham’s [27-29] H-bond acceptor potential). The third crystal-lattice contribution term was best characterized as a linear function of Φ+B.
The molecular flexibility descriptor, Φ, has been defined [34] in terms of structural attributes (chains, rings, branches, atom counts) as Φ = 1κ· 2κ /NHA, where 1κ and 2κ are the first- and second-order topological shape indices, and NHA is the heavy atom count in the molecule (RDKit descriptors fromhttp://www.rdkit.org). In the Wiki-pS0 database, Φ values ranged from 0.4 to 43. For molecules with molecular weight > 500 Da, the 630 Φ values range from 4.9 to 43, averaging 12. B (cf. Eq.2) for the large molecules ranges from 0.4 to 12.9, averaging 3.6. In the development of the SGE(Φ,B) model, several combinations of Φ and single Abraham descriptors were examined. The sum of Φ and B improved performance over just Φ in Eq. (3) [18-20].
The eight b0-b7 constants in the GSE(Φ,B) model were determined in two steps. (i) The log S0 values in the training set were sorted on Φ+B. The sorted data were then divided into 20 bins, each of near-equal values of Φ+B. Seven of the lowest-(Φ+B) bins contained an average of nearly 1000 log S0 values per bin. The data in each bin were then analyzed to find the best-fit c0, c1 and c2 constants in the first line of Eq. (3), using PLS regression. (ii) The resultant 20 c0, c1, and c2 constants from the bins were then displayed in three plots against (Φ,B) values, to suggest possible nonlinear fitting equations. From this, the analytical expressions for the c-coefficients were determined by standard nonlinear least-squares methods, as detailed elsewhere [18-20].
Of the new ML statistical approaches, the RFR method is thought to be among the premier performers in prediction accuracy, although deep-learning neural network methods may be as good [3,4]. The RFR software is freely downloadable and is easy to use (cf. ’random forest’ library for the R statistical software [31-33]). The method works by constructing an ensemble of hundreds of decision trees based on a random selection of a portion of the training set of solubility measurements, using hundreds of randomly-grouped descriptors provided by the user. It is not possible to state a simple explicit equation, like Eqs. (1)-(3), for the RFR method.
In this study, all metrics expressed as the coefficient of determinations (r2) and root-mean square errors (RMSE) are of the ‘validation type’ [35], unless otherwise indicated (see Abbreviations and definitions). Also, all logarithm functions are with reference to base 10.
Random forest regression trained with the Wiki-pS0 Database
In the first of our recent solubility prediction studies [16], four test sets were examined:
Test Set 1 – Yalkowsky-Banerjee [22] set of 21 organic nonelectrolytes (6 solid and 3 liquid poorly-soluble pesticides, 11 older drugs, and a dye/laxative molecule), with the test set mean log S0 = -3.85 (log molarity). This set has been widely tested by other investigators, to assess the effectiveness of prediction models.
Test Set 2 – Hopfinger et al. [2] set of 28 well-studied drugs (all ionizable), with mean log S0 = -4.03. This test was part of the first solubility challenge [1,2].
Test Set 3 – Llinas-Avdeef [3] high-consensus ‘tight’ set of 100 drugs, with the mean log S0 = -4.03. This test was part of the second solubility challenge [3,4].
Test Set 4 – Llinas-Avdeef [3] low-consensus ‘loose set’ set 32 drugs, with the mean log S0 = -5.24. This test was also used in the second solubility challenge [3,4].
It was found that all the predictions generated negative bias (-0.08 to -0.65), in many cases suggesting that solubility of low-soluble compounds was overestimated. RFR outperformed the GSE and ABSOLV methods on all common metrics [35], averaged over the four test sets: r2avg = (0.69, 0.47, 0.42), RMSEavg = (0.92, 1.17, 1.17), MPPavg (measure of prediction performance [2]) = (51, 38 and 32 %), for RFR, ABSOLV and GSE, respectively. It is interesting that the RFR method predicted Yalkowsky-Banerjee Set 1 only marginally well: r2 = 0.82, RMSE = 0.83, MPP = 57 %, bias = -0.23. When 115 practically-insoluble agrochemicals were added to the Wiki-pS0 database (which had been devoid of such pesticides), the performance improved: r2 = 0.89, RMSE = 0.63, MPP = 71 %, bias = +0.02. This confirmed the importance of matching the chemical space of test molecules with those in the training set. However, the ability to predict the solubility of drugs to the level of the quality of measured data remained out of reach (RMSE 0.6 vs. 0.2). Improvements in the methods (e.g. more effective descriptors) [6] and better coverage of the clustered [36] chemical space of drugs were called for.
The principal component analysis (PCA) based on the 30-most important descriptors identified in RFR yielded a scores plot of the first two principal components for the training set solubility, showing a very dense symmetrical distribution about the origin for molecules with molecular weights (MW<500 Da). Large molecules (MW>800 Da) were sparsely (but near diagonally) represented in the bottom-right quadrant, giving the overall distribution a ‘comet-like’ head-tail appearance [16]. Small molecules in the Lipinski [36] Rule of 5 (Ro5) chemical space, populating the ‘head,’ thus appeared to have very different distributions than those beyond the Ro5 (bRo5) space in the ‘tail.’
Small Ro5 molecules can predict the solubility of large bRo5 molecules, using the RFR method
In our second study [17], the above ‘comet-like’ PCA distribution enticed us to explore whether small molecules (Ro5 space) in a training set could be used to predict the solubility of large molecules (bRo5 space), although at first, we were not optimistic of a good outcome. The molecules with MW>800 Da were selected as the test set, with the rest of the database molecules serving as the training set.
Test Set 5 – Avdeef-Kansy [17] - test set of 31 ‘big’ drug-like/drug molecules with MW 802-1882 Da (e.g. cyclosporine A, gramicidin A, leuprolide, nafarelin, oxytocin, vancomycin), with mean log S0 = -4.52.
It was found the RFR method spectacularly distinguished itself from the simpler GSE and ABSOLV models: r2 = (0.37, -5.24, -3.82), RMSE = (1.07, 3.36, 2.95), MPP = (42, 10 and 16 %), bias = (+0.30, +2.64, +0.16), for RFR, ABSOLV and GSE, respectively. In probing this further, the first two normally-fixed parameters in the GSE(classic) model (+0.5 and -1.0) were ‘re-trained’, to yield log S0GSESMALL = -0.28-0.83 clogP -0.01·(mp - 25). The modified versions of the Yalkowsky’s equation used as the training set produced metrics comparable to those of the RFR method: r2 = 0.33, RMSE = 1.10, MPP = 30 %, and bias = +0.04. A similar treatment for the large molecules yielded log S0GSEBIG = -1.77-0.40 clogP -0.01·(mp - 25). These modifications showed that c0 and c1 could be variable coefficients, rather than constants. It was this observation that pointed the way to the development of the Flexible-Acceptor model, GSE(Φ,B), which turned out to be quite an improvement over the classic equation, providing realistic coverage of the chemical space of drugs up to MW of about 2000 Da.
Flexible-acceptor GSE(Φ,B) for predicting the solubility of large molecules
When it became evident that the GSE might be beneficially re-trained to predict a broader class of compounds, especially large drug-like molecules, a search was made for analytical functions which could represent the three traditional constants in terms of meaningful property descriptors, guided by the PCA distribution mentioned above [16]. The novel Flexible-Acceptor model (cf. Eq. (3)), GSE(Φ,B), was the outcome of the effort [18].
As the last section indicated, it had been a challenge to predict solubility of large molecules using simple but easily transparent and interpretable models such as GSE and ABSOLV (as indicated by r2 < 0 and RMSE ≥ 3). RFR showed promise, but this ML method can be opaque, given that over 200 calculated descriptors are often used in the random mixing of portions of the learning set with randomly-selected subsets of descriptors. Nevertheless, there are good reasons to attempt the prediction of the solubility of large molecules. Many drugs (mostly derived from natural products) in immunosuppression, oncology, and for the treatment of infectious/viral diseases are large, lipophilic, and possess many H-bond acceptors. Literature discussions highlight the promise of therapeutic opportunities for ‘beyond the Rule of 5’ (bRo5) molecules [37-44]. Caron and colleagues [42-44] have paved the way for recognizing the importance of Kier’s molecular flexibility index, Φ, in predicting the physical properties of bRo5 molecules.
Flexible molecules with the potential to form intramolecular H-bonds may possess enhanced solubility in polar media (i.e. water), by adopting hydrophilic ‘extended’ conformations, as well as enhanced permeability across apolar cell membranes, by adopting hydrophobic ‘folded’ conformations [37-44]. Given that large molecules may pose pharmacokinetic (PK) risks due to low solubility, the need for caution is especially important. So, reliable, and actionable in silico models to predict solubility before such molecules are prioritized for synthesis could be a valuable contribution in PK risk assessment.
Test Set 6 – Avdeef-Kansy [18] – One additional molecule was added to Set 5 to comprise Test Set 6 of 32 drugs with MW from 802 to 1882 Da, MWavg = 1037 Da. Average log S0 = -4.61 (range -1.2 to -7.6); clogPavg = 3.3 (-3.6 to +17.9); Bavg = 5.8 (1.9-11.6); Φavg = 20 (11-41).
Using all the database molecules with MW < 800 Da as the training set, the GSE(Φ,B) prediction of larger molecules yielded promising statistics: r2 = 0.40, RMSE = 1.10, MPP = 41 %, bias = -0.08. By contrast, the traditional GSE generated r2 < 0, RMSE = 3.0, and MPP = 16 %. RFR, the ‘gold standard’ of accuracy in the minds of some computational chemists, generated r2 = 0.37, RMSE = 1.07, MPP = 38 %, bias = +0.30. Overall, RFR and GSE(Φ,B) performances were about the same. This encouraging result became the segway to our study of a more diverse class of test molecules: recently FDA-approved drugs (2016-2022).
GSE(Φ,B) works well for both big and small molecules, but the Consensus model, based on the average of GSE(Φ,B) and ABSOLV(GRP) can be even slightly better than the RFR Model
The pharma R&D productivity trended downwards from a high point in 1996 to leveling off by 2010, judging by the count of new molecular entities (NMEs) approved each year [45]. From 2011 to 2020, an upward recovery trend, albeit bumpy, can be discerned. More recently, a trend reversal may be taking place. Of the drugs approved in 2020 and 2021, 72 % are considered ‘small molecule’ NMEs. But even these are getting larger, less soluble, more lipophilic, and possessing more H-bond acceptors, when compared to older drugs. Large molecules may be burdened with PK risks, as noted above.
In our fourth [19] and fifth [20] studies, we directed our efforts to predict the solubility of newly-approved drugs, covering the period of 2016-2021. A few newly-approved drugs were added from 2022 [7]. It was of particular interest to see how the Flexible-Acceptor model, GSE(Φ,B), would perform, since many of the new drugs are large and draw on a diverse chemical space. In addition to predictions, the trends in physicochemical properties of these new drugs were quantitated [20] – property inflation was evident. To reduce method bias in ABSOLV prediction, the training set data were divided into six groups. The new operational variant [20], called ABSOLV(GRP), is described below.
Test Set 7 – The intrinsic solubility of 105 newly FDA-approved drugs (2016 to 2022) were added to the Wiki-pS0 database [7]. Average log S0 = -4.64 (ranging -8.5 to +0.6); MWavg = 465 Da (174-1215), clogPavg = 3.3 (-5.8 to +8.7); Bavg = 2.2 (0.8-7.7); Φavg = 6.7 (1.9-32.0).
The GSE(Φ,B) was originally developed to predict the solubility of large flexible drug-like molecules. It was shown to predict the solubility of drugs beyond Lipinski’s ‘Rule of 5’ chemical space (bRo5) to a precision matching that of the Random Forest regression (RFR) machine learning method [18]. Surprisingly, the GSE(Φ,B) appeared to work well also for Ro5 drugs [19]. As before, to add context to the GSE(Φ,B) model, GSE(classic), ABSOLV(GRP), and RFR models were also applied to predict log S0 of the newly-approve ‘small molecule’ NMEs, for which useable reported solubility values could be accessed (the majority from FDA New Drug Application published reports). The prediction models were retrained with an enlarged version of the Wiki-pS0 database. GSE(classic) was applied in its traditional form.
RFR and GSE(Φ,B) outperformed the GSE(classic) and ABSOLV(GRP) models in most of the metrics [35]: r2 = (0.64, 0.59, 0.47, 0.41), RMSE = (1.09, 1.15, 1.33, 1.40), MPP = (37, 32, 29 and 31 %), and bias = (-0.11, -0.34, +0.01, -0.33) for RFR, GSE(Φ,B), ABSOLV(GRP), and GSE(classic), respectively, for the 105 new drugs. The Consensus model [19] (average of GSE(Φ,B) and ABSOLV(GRP)), performed just about as well as the RFR model, with the metrics: r2 = 0.63, RMSE = 1.10, MPP = 34 %, and bias = -0.16.
The near zero bias of the ABSOLV(GRP) model in the most recent studies was largely achieved by dividing the training set into six sub-classes. Big molecules (MW > 800 Da) and quaternary ammonium compounds were first removed from the training set and were each analyzed separately, with the a0-a7 coefficients in Eq. (2) determined by PLS regression. The remaining training set molecules were divided into four classes, based on their net charge at pH 7.4 : acids (-), bases (+), zwitterions (±), and neutrals (0).Table 1 summarizes the results of the sub-class training.
The GSE(Φ,B) model was further re-trained, as more data had been added to the Wiki-pS0 database. For this step, the solubility data in the training set were sorted on Φ+B and grouped into twenty bins of increasing values [19]. In each Φ+B bin of about 700 solubility entries, the three GSE coefficients, c0-c2 in Eq. (3) were each determined by linear PLS regression. The resultant three groups of c-coefficients showed recognizable forms as a function of Φ+B. The trend in c0 was characteristic of a decreasing exponential function in Φ+B, suggesting that the solubility of a liquid solute in model lipid (octanol) [16-18] decreases as Φ+B increases. The trend in the c1 coefficients (lipophilicity factors) was that of an increasing exponential function, indicating a decreasing influence due to lipophilicity as Φ+B increases. The slightly increasing c2 coefficients (crystal lattice effect) could be fit to an ascending linear form as a function of Φ+B. Apparently, crystal lattice contributions are not appreciably altered by molecular flexibility and H-bond acceptor character, and trend near the traditional value (-0.01) in Eq. (1). Evidently, solubility dependence on flexibility and H-bond acceptor strength are mediated by solution-phase interactions [46].Table 2 summarizes the most-recently trained GSE(Φ,B) parameters.
The Consensus prediction equation is simple in form and can be easily incorporated into spreadsheet calculations (using the parameters inTables 1 and2), which is not the case for the RFR model.
Aside from opaqueness, there are other limitations to RFR. Since the prediction is the average log S0 value of several training set molecules with descriptors most like those of the test compound, the RFR model, as it is currently implemented, cannot extrapolate beyond its training space. So, for molecules much less soluble than those in the training set, the prediction always overestimates the solubility. Consequently, if a test molecule is also inadvertently included in the training set, RFR will very likely present the experimental value as the prediction.
Consensus vs. Exclusive Or (XOR) models
In prediction of the solubility of newly-approved drugs [19,20], the RFR and GSE(Φ,B) models outperformed the GSE(classic) and ABSOLV(GRP) models in most of the metrics, as noted above. As a bonus, the Consensus model based on the average of GSE(Φ,B) and ABSOLV(GRP), slightly outperformed the RFR method in one study [19]. In this last section, we put the RFR aside, after having used it as a valuable benchmark. Instead, we focus on developing a decision tree to identify simple mechanistically transparent and easy-to-understand models based mainly on GSE(Φ,B) and ABSOLV(GRP) [5].
As we checked specific molecules over the entire database ad hoc (or in the cases of outliers [5], discussed below), we found that the Consensus model was not always the best predictor. For example, we found that in cases of molecules with Φ < 1.66 (182 very rigid molecules), the Yalkowsky GSE(classic) was the best performing model, but just slightly so (cf.Table 3, andFig. 2). Also, it is uncommon to find an example of a better simple predictor than the GSE(Φ,B) model for the space beyond Φ > 10.83 (342 very flexible molecules) (cf.Table 3, andFig. 2). The span between Φ 1.66 and 10.83 (comprising 7144 entries– most of the database) revealed flip-flop in performance between GSE(Φ,B) and ABSOLV(GRP). Either the latter or the former was the best performer, i.e. an ‘exclusive or’ (XOR) behavior. How could one justify choosing the individual models over the Consensus model?
Insights to a possible answer to the above question may be revealed by further scrutinizing the performances of GSE(classic), ABSOLV(GRP) and GSE(Φ,B) models in the entire Wiki-pS0 database. To start, all the entries in the database were sorted on Φ into 19 bins of near similar values of Φ. In each bin for each of the models, the RMSE value was calculated.Table 3 shows the bin errors distributions.Figure 1 illustrates the distribution of Φavg counts in the bins.
Figure 2 compares the RMSE values distribution of the three models: GSE(classic) as red curves with triangle symbols, ABSOLV(GRP) as blue dotted curves with square symbols, and GSE(Φ,B) as green curves with circle symbols. A decision tree constructed based on the crossings inFigure 2 is shown inFigure 3.
Since the XOR Model inFigure 3 is dependent on the types of molecules within a particular bin of near constant flexibility, and since different databases may have molecules with differing properties in each range of Φ values, it may be prudent to select the Consensus model as the ‘best’ prediction. In our case of model training, the cross-over points may be sufficiently grounded to advance the XOR Model as the ‘best’ for the middle ‘Consensus Domain’ (cf. zoom view inFig. 2). This is especially so, since this middle domain is comprised of many values of solubility, averaging over 1100 entries/bin for bins 2-7 (Table 3).
Testing the decision tree model: Consensus vs. Exclusive Or (XOR) variants
The above Decision Tree considers the prediction potential from the view of the entire database. However, it is the performance of the predictions of test set molecules that really counts. So, we selected a slightly-increased set of 108 newly-approved FDA drugs to examine this last point.
Test Set 8 – Three additional log S0 values were added to Test Set 7 of newly FDA-approved drugs as more approvals were announced in 2022.
Table 4 summarizes the prediction metrics of the three simple models. Based on the properties of the newly-approved drugs in the test set, the ranking of the various methods is suggested. For of r2 and RMSE, the ranking of best performers follows: Consensus > GSE(Φ,B) > XOR Decision Tree > ABSOLV(GRP) > GSE(classic).
The model with the least bias is ABSOLV(GRP) and the model with the best MPP (more molecules within ±0.5 log of the correlation identity line) is the XOR Decision Tree (cf.Fig. 3).Figure 4 illustrates the differences between the XOR Decision Tree and Consensus models for the new drug predictions. The general scatter in the Consensus model is evidently less than that in the XOR model. Other minor differences are evident.
The summary inTable 4 may seem to be complicated, and different trends may be encountered for other test sets. In new test cases, the discussed simple models can be easily incorporated into an Excel spreadsheet (using the refined parameters inTables 1 and2) and compared for new cases.
Another test example in model selection based on outliers
Oja et al. [5] responded to the Second solubility challenge [3,4] with three data-driven MLR models for predicting intrinsic aqueous solubility, which were mechanistically transparent and easily understandable. They discussed the challenges posed by outlier molecules.
Table 5 shows a different pattern of performance ranking compared to that inTable 4. ABSOLV(GRP) performed consistently in the first position for the four most flexible molecules. For the least flexible molecules, folic acid and cisapride, the XOR and GSE(Φ,B) looked promising. Our Consensus model lagged the others in this outliers example.
Conclusion
Large, well-curated aqueous intrinsic solubility databases are available, with average interlaboratory reproducibility of less than 0.2 log unit. However, the distribution of drugs in the chemical space might not be uniform but may be sparsely populated in clusters [47]. Even a massive database might miss some clusters. Test compounds from such an under-represented portion of space may be poorly predicted, as the outliers inTable 5 could suggest. Also, the 32 ‘difficult-to-predict’ drugs in the Second Solubility Challenge [3,4] may be good examples of underpopulated cluster space, for which better representation is needed. It should also be noted that the 32 molecules are also ‘difficult-to-measure’ drugs.
The increasing use of ML/AI methods can lead to accurate predictions, as we have seen. However, these results may not readily suggest the steps to take to improve the properties of tested compounds. Our five studies in solubility prediction have attempted to match the performance of the Random Forest regression method, using relatively simple, mechanistically transparent, and easily applied models [5].
Increasingly, current trends in in-silico predictions of solubility use calculated input descriptors, which may be an advantage to explore properties of molecules yet to be synthesized. The risk may be that overall prediction approaches might be based on accumulated uncertainty, something that is often not emphasized [47]. The knowledge gained and predictive power applied to novel classes of test molecules can still be limited by the calculated descriptors.
Based on our latest findings, we recommend that both ABSOLV(GRP) and GSE(Φ,B) be calculated (e.g., by taking advantage of the refined parameters inTables 1 and2). For molecules with Φ < 11, the prudent choice is to pick the Consensus Model, the average of ABSOLV(GRP) and GSE(Φ,B). For more flexible molecules, GSE(Φ,B) is recommended.