Automorphism Groups of Alkane Graphs

The complete set of 618047 isomers of the alkanes with 4 to 20 carbon atoms has been created. For all isomers the automorphism groups have been calculated and evaluated in terms of size, number of asymmetric carbon atoms, and numeric properties of atom and bond orbits. The presence of a symmetric bond is related to the number of atom and bond orbits. Molecular descriptors based on automorphism data have been studied, including the known symmetry index, an entropy measure and the root of an orbit polynomial. These descriptors are closely related to the presence of symmetric substructures. The prediction performance of QSPR models that use binary substructure descriptors for three molecular properties of alkanes is improved by adding descriptors based on automorphism data.


INTRODUCTION
The concept of constitutionally equivalent atoms and constitutionally equivalent bonds is an essential subject in chemoinformatics [1] and in mathematics in chemistry, in particular for applications of graph theory to chemical structures. [2,3] Constitutionally (topologically) equivalent atoms (bonds) have exactly the same neighborhood in terms of connectivity as described by atom types (elements) and bond types, considering the whole molecular structure. Constitutionally equivalent atoms, e.g., give approximately the same shifts of NMR signals. Other applications of this concept are in synthesis design, canonical numbering of atoms, isomer generation, determination of maximum common substructures, or characterization of the molecular symmetry.
The complete information about constitutionally equivalent atoms and bonds is contained in the data of the automorphism group of a graph representing the molecular structure. [4][5][6] An automorphism is a mapping of a graph onto itself that preserves the connectivity (no bonds are cut). Asymmetric structures have only a single mapping, the trivial identity mapping, while for highly symmetric structures several (sometimes many) mappings onto itself exist. The size of the automorphism group is the number of possible mappings of a graph onto itself; it is a symmetry measure for the graph. For general graphs, the group size is unknown.
As commonly used, we represent chemical structures by colored graphs [3] with the atoms as vertices and the bonds as edges. In this work, however, only alkane structures are considered, corresponding to uncolored, undirected graphs of the type tree. We use hydrogen-depleted graphs. The vertex degree, which is the number of bonds (edges) per atom (vertex), is between one and four in alkane graphs.
The complete sets of isomeric alkane structures for 4 to 20 carbon atoms, comprising in total 618047 structures, have been created by the isomer generator program MOLGEN [7][8][9] with output in the Molfile format (SDF-files). [1] The complete automorphism group of each isomer has been determined by the software SubMat [10,11], applying a substructure search with the molecular structure itself as substructure and determining all positions of the substructure. For the evaluation of automorphism data a set of functions in the programming environment R [12] has been developed. Other software products are available for the application of graph theory concepts, such as nauty and Traces [13] or igraph. [14] An extension of this work towards colored graphs (molecular structures with hetero atoms and various bond types), together with the definition of topological descriptors based on automorphism data, is in progress.

THEORY AND METHODS
The chemical structure of 2,2,3-trimethyl-butane (Figure 1) serves to demonstrate the automorphism data and the graph properties derived from them as used in this work. The seven vertices (carbon atoms) are arbitrarily denoted by 1 to 7, the six edges (bonds) by a to f. Table 1 shows in the upper part the automorphism mappings of this graph. The first row is the identity mapping with the atom identifiers 1 to 7 (left part of the table, 'Atom mappings') and bond identifiers a to f (right part, 'Bond mappings'). In row i = 2 a nontrivial mapping is defined with the atoms 6 and 7 exchanged, and consequently the bonds e and f exchanged. This mapping can be considered as the result of a substructure search with the molecular structure used as the substructure. In total 12 such mappings are possible (including the identity mapping); they define the complete automorphism group with size α = 12, also denoted as |Aut(G)|, the order of the automorphism group of a graph G. For this simple graph α can be easily calculated from the numbers of permutations at the vertices: 3! for the three methyl groups at atom 2 and 2! for the two methyl groups at atom 5, giving α = 3! × 2! = 12. This structure is one of the nine isomers of C7H16, and has the maximum α among these isomers.
The size of the automorphism group is 1 for asymmetric graphs and is an even number for other graphs. The parity property is based on the fact that in the automorphism mappings the atom positions are permuted and the number of permutations is a factorial always containing the factor 2. Actually, for alkane graphs with α > 1, α is always a product of the factors 2!, and/or 3! and/or 4!. Possible values for α > 1 are therefore 2, 4, 6, 8, 12, 16, 24, 32, 36, 48, 64, 72, 96, 128, 144, 192, 216, 256, 288, 384, 432, 512, 576, 768, 864, 1024, and so on. The limited number of different values of α restricts the use of molecular descriptors that are based only on α. Theoretical aspects of the size of automorphism groups of simple graphs are discussed by Krasikov. [15]
An atom (vertex) orbit is the set of constitutionally equivalent atoms. The column for atom 1 in Table 1 shows that atom 1 can be replaced by atoms 3 or 4; consequently atoms 1, 3 and 4 form an atom orbit (given in the lower part of Table 1, denoted by "Sets"). The same result appears in the columns for atoms 3 and 4. The orbit containing atoms 1, 3, and 4 is (arbitrarily) denoted by A (last row in Table 1). Furthermore, atoms 6 and 7 form an atom orbit B; orbits C and D consist of only one atom (2 and 5, respectively). In summary this graph has four atom orbits (A to D) with sizes 3, 2, 1, and 1, respectively.
In the same way bond (edge) orbits, consisting of constitutionally equivalent bonds, are defined. In this example three bond orbits exist: bond orbit X (bonds a, b, c; size 3), Y (bonds e, f; size 2), and Z (bond d; size 1).
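The automorphism data of the example structure can be reproduced with a few lines of code. The following is a minimal brute-force sketch in Python, not the SubMat substructure search used in this work; it tests all 7! atom permutations, which is only feasible for very small graphs. The edge list is an assumption chosen to agree with the orbits stated above (atoms 1, 3, 4 bonded to atom 2; atoms 6, 7 bonded to atom 5), and the function names are hypothetical.

```python
from itertools import permutations

# Hydrogen-depleted graph of 2,2,3-trimethyl-butane. The numbering is an
# assumption consistent with the orbits given in the text, not taken from
# the drawing in Figure 1.
EDGES = [(1, 2), (2, 3), (2, 4), (2, 5), (5, 6), (5, 7)]
N_ATOMS = 7


def automorphisms(n_atoms, edges):
    """Return every atom permutation that maps each bond onto a bond."""
    edge_set = {frozenset(e) for e in edges}
    atoms = list(range(1, n_atoms + 1))
    found = []
    for image in permutations(atoms):
        mapping = dict(zip(atoms, image))
        if all(frozenset((mapping[u], mapping[v])) in edge_set for u, v in edges):
            found.append(mapping)
    return found


def atom_orbits(n_atoms, autos):
    """Collect, for each atom, the set of atoms it can be mapped onto."""
    return {frozenset(m[a] for m in autos) for a in range(1, n_atoms + 1)}


def bond_orbits(edges, autos):
    """Collect, for each bond, the set of bonds it can be mapped onto."""
    return {
        frozenset(frozenset((m[u], m[v])) for m in autos) for u, v in edges
    }


autos = automorphisms(N_ATOMS, EDGES)
alpha = len(autos)
atom_sizes = sorted((len(o) for o in atom_orbits(N_ATOMS, autos)), reverse=True)
bond_sizes = sorted((len(o) for o in bond_orbits(EDGES, autos)), reverse=True)
print(alpha, atom_sizes, bond_sizes)
```

For the assumed edge list this gives a group size of 12 with atom orbit sizes 3, 2, 1, 1 and bond orbit sizes 3, 2, 1, in agreement with the values discussed for Table 1.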
Figure 1. Graph of the C7-alkane isomer with highest symmetry (size of automorphism group is 12, the maximum in this isomer set), representing the chemical structure of 2,2,3-trimethyl-butane, C7H16. Atoms (vertices) are denoted by 1 to 7, bonds (edges) by a to f (with arbitrary assignments).
Table 1. Automorphism data for the structure (graph) of 2,2,3-trimethyl-butane shown in Figure 1.
Molecular descriptors based on automorphism data can be derived from the size of the automorphism group and from the distributions of the atom and bond orbits. These graph invariants are candidates for symmetry criteria of chemical structures, to be used for molecular descriptors in models for QSP(A)R (quantitative structure property (activity) relationships) as well as for structure similarity searches or cluster analyses of structures. A selection of these descriptors is described here and values are given in Table 2 for the structures shown in Figures 1 to 3. Applications to QSPR models are given in section 3.4. The size of the automorphism group, α, may be normalized by the number of atoms, nA, and/or number of bonds, nB, giving for the example structure in Figure 1 the descriptors αA = α/nA = 12/7 = 1.71; αB = α/nB = 12/6 = 2; and αAB = α/(nA + nB) = 12/(7+6) = 0.92. Because α spans a wide range of values, αLOG = log10(α) = 1.08 may be an appropriate descriptor.
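The arithmetic of the normalized group-size descriptors is simple enough to check directly. The sketch below is only an illustration in Python; the function name and the dictionary keys are hypothetical labels for αA, αB, αAB and αLOG.

```python
import math


def size_descriptors(alpha, n_atoms, n_bonds):
    """Normalized variants of the automorphism group size alpha."""
    return {
        "alpha_A": alpha / n_atoms,               # per atom
        "alpha_B": alpha / n_bonds,               # per bond
        "alpha_AB": alpha / (n_atoms + n_bonds),  # per atom and bond
        "alpha_LOG": math.log10(alpha),           # decadic logarithm
    }


# Example structure of Figure 1: alpha = 12, 7 atoms, 6 bonds.
d = size_descriptors(12, 7, 6)
print({name: round(value, 2) for name, value in d.items()})
```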
Asymmetric carbon atoms can be recognized from the automorphism data as follows: the number of (single) carbon-carbon bonds must be three (the fourth bond being a carbon-hydrogen bond) or four (quaternary carbon atom), and all these bonds must belong to different bond orbits. Suggested descriptors are the absolute number of asymmetric carbon atoms, nASYM, and the fraction fASYM = nASYM / nA.
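The recognition rule can be sketched as follows. This is a brute-force Python illustration under stated assumptions, not the R functions developed for this work: automorphisms are found by testing all atom permutations (feasible only for small graphs), the two edge lists are hand-written skeletons with arbitrary numbering, and all function names are hypothetical. The first skeleton is 3-methylhexane (the same graph as a pentane chain carrying an ethyl group at position 2), the second is 2,2,3-trimethyl-butane.

```python
from itertools import permutations


def automorphisms(n_atoms, edges):
    """Return every atom permutation that maps each bond onto a bond."""
    edge_set = {frozenset(e) for e in edges}
    atoms = list(range(1, n_atoms + 1))
    found = []
    for image in permutations(atoms):
        mapping = dict(zip(atoms, image))
        if all(frozenset((mapping[u], mapping[v])) in edge_set for u, v in edges):
            found.append(mapping)
    return found


def bond_orbit(edge, autos):
    """Set of bonds onto which the given bond is mapped."""
    u, v = edge
    return frozenset(frozenset((m[u], m[v])) for m in autos)


def asymmetric_carbons(n_atoms, edges):
    """Atoms with 3 or 4 C-C bonds that all lie in different bond orbits."""
    autos = automorphisms(n_atoms, edges)
    hits = []
    for atom in range(1, n_atoms + 1):
        incident = [e for e in edges if atom in e]
        if len(incident) not in (3, 4):
            continue
        if len({bond_orbit(e, autos) for e in incident}) == len(incident):
            hits.append(atom)
    return hits


# Chain 1-2-3-4-5-6 with a methyl group (atom 7) on atom 3.
METHYLHEXANE_3 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (3, 7)]
# Atoms 1, 3, 4 on atom 2; atoms 6, 7 on atom 5.
TRIMETHYLBUTANE_223 = [(1, 2), (2, 3), (2, 4), (2, 5), (5, 6), (5, 7)]

n_asym_a = len(asymmetric_carbons(7, METHYLHEXANE_3))
n_asym_b = len(asymmetric_carbons(7, TRIMETHYLBUTANE_223))
print(n_asym_a, n_asym_a / 7, n_asym_b)
```

In the first skeleton atom 3 carries a methyl, an ethyl and a propyl branch in three different bond orbits, so nASYM = 1; in the second skeleton both branched atoms carry equivalent methyl groups, so nASYM = 0.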
A number of measures have been defined for characterizing the distribution of the sizes of atom and bond orbits. Let ai be the number of atoms in atom orbit i (i = 1 ... kA), with kA for the number of atom orbits. For the structure in Figure 1 we have kA = 4 (orbits A to D; i = 1 to 4) with a1 = 3, a2 = 2, a3 = 1 and a4 = 1. Analogously, we have for the kB = 3 bond orbits (X, Y, Z) the number of bonds per bond orbit b1 = 3, b2 = 2, and b3 = 1.
The number of atom orbits, kA (bond orbits, kB) of any graph with nA atoms and nB bonds is between one and nA (nB). The maximum number of orbits is reached for asymmetric graphs with each atom (bond) in a separate orbit. The smallest asymmetric alkane is 2-ethyl-pentane, C7H16, shown in Figure 2. Each of the seven atoms is in a separate atom orbit; each of the six bonds is in a separate bond orbit; the size of the automorphism group is one. In summary we obtain for asymmetric trees kA = nA; all ai = 1; α = 1. The alkane isomers C4-20 contain 28597 (4.63 %) asymmetric graphs.
The high symmetry extreme with all atoms in one atom orbit is not possible for alkane trees; however, graphs for rings, a tetrahedron, a cube, and others have all vertices in a single orbit. Also complete graphs (each vertex is connected to all other vertices; not relevant for chemical structures) have only one vertex orbit; in these graphs α = n!, the maximum for graphs with n vertices. In the 618047 alkane isomers considered in this work the structure shown in Figure 3 has the largest automorphism group with α = 31104 mappings. It is the highly symmetric structure tetra-tert-butyl-methane, C17H36.
EA, EB, entropy measure for atom and bond orbits, resp.; SA, SB, symmetry index for atom and bond orbits, resp.; δA, δB, positive real root of orbit polynomial for atom and bond orbits, resp ...
An entropy measure (structural information content) for atom orbits has been defined by Dehmer et al. [16] as

$$E_A = -\sum_{i=1}^{k_A} \frac{a_i}{n_A} \log_2 \frac{a_i}{n_A} \tag{1}$$

The so-called symmetry index for atom orbits has been defined by Mowshowitz et al. [17] as

$$S_A = \frac{1}{n_A} \sum_{i=1}^{k_A} a_i \log_2 a_i + \log_2 \alpha \tag{2}$$

Analogous measures can be defined for the bond orbits as EB and SB. Table 2 contains the values of E and S for the structures shown in Figures 1 to 3. Note that for asymmetric graphs (all ai = 1, α = 1) EA = log2(nA), and SA = 0.
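The two measures can be evaluated from the orbit sizes alone. The sketch below assumes the forms commonly given in the cited literature, namely E = -Σ (ai/n) log2(ai/n) for the entropy and S = (1/n) Σ ai log2(ai) + log2(α) for the symmetry index; the Python function names are hypothetical, and the values it prints should be compared with Table 2 rather than taken as the published ones.

```python
import math


def orbit_entropy(orbit_sizes):
    """Entropy of the orbit size distribution (assumed form, see lead-in)."""
    n = sum(orbit_sizes)
    return -sum((a / n) * math.log2(a / n) for a in orbit_sizes)


def symmetry_index(orbit_sizes, alpha):
    """Symmetry index from orbit sizes and group size (assumed form)."""
    n = sum(orbit_sizes)
    return sum(a * math.log2(a) for a in orbit_sizes) / n + math.log2(alpha)


# Atom orbits of the structure in Figure 1: sizes 3, 2, 1, 1 and alpha = 12.
e_a = orbit_entropy([3, 2, 1, 1])
s_a = symmetry_index([3, 2, 1, 1], 12)

# Asymmetric graph with seven atoms: every atom in its own orbit, alpha = 1.
e_asym = orbit_entropy([1] * 7)
s_asym = symmetry_index([1] * 7, 1)
print(e_a, s_a, e_asym, s_asym)
```

With these forms the entropy and the first term of the symmetry index always add up to log2(nA), which is one way to see why the two descriptors are negatively correlated.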
The orbit polynomial has been defined by using the frequencies of the different orbit sizes as coefficients, and the root of this polynomial has been suggested as a molecular descriptor characterizing the structural symmetry. [18] Let us denote the different orbit sizes of a structure by gj with j = 1 ... ng (ng is the number of different orbit sizes), and the frequencies of the different orbit sizes by hj (j = 1 ... ng). We obtain the orbit polynomial by

$$O(z) = \sum_{j=1}^{n_g} h_j z^{g_j} \tag{3}$$

We have to solve the equation $1 - \sum_{j} h_j z^{g_j} = 0$ to obtain the real and positive root δ. Investigation of the mathematical properties of δ proved that δ is less than or equal to one. [18] The structure in Figure 1 has four atom orbits with sizes 3, 2, 1, and 1; the number of different orbit sizes is ng = 3; for the orbit sizes (gj) 1, 2, 3 we have the frequencies (hj) 2, 1, 1. The corresponding orbit polynomial, and the resulting equation, is therefore

$$O(z) = z^3 + z^2 + 2z, \qquad 1 - z^3 - z^2 - 2z = 0 \tag{4}$$

We use the notation δA for atom orbit data and δB for bond orbit data. For the structure in Figure 1, δA = 0.393, the root of equation (4); for the bond orbit data we obtain δB = 0.544. The roots of polynomials have been calculated by the function polyroot provided in the programming environment R. [12]
The minimum of δA for a structure with nA atoms appears for asymmetric graphs (atoms in separate orbits; ng = 1; g1 = 1, h1 = nA); the equation is $n_A z^1 = 1$, and the root is δA = 1/nA. An example is the asymmetric structure in Figure 2 with nA = 7, δA = 1/7 = 0.143. Bounds of δA for distinct classes of graphs have been reported. [19] The maximum of δA appears if all atoms are in a single orbit with ng = 1; g1 = nA; h1 = 1; the equation is $z^{n_A} = 1$, and the root is δA = 1. Such structures are not among the alkanes as discussed above. In the 618047 alkane isomers C4-20 the maximum of δA is 0.826, appearing for tetramethylbutane, C8H18, a highly symmetric and compact structure.
The maximum value 1 for δB (all C-C bonds topologically equal) is reached, for example, in 2-methylpropane (isobutane, C4H10).
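The positive root can also be found without a general polynomial solver. The sketch below is a Python illustration that uses bisection instead of the R function polyroot named in the text; it relies on the fact that the sum of hj z^gj increases monotonically from 0 to at least 1 on the interval [0, 1]. The function name is hypothetical.

```python
from collections import Counter


def orbit_polynomial_root(orbit_sizes, tol=1e-12):
    """Positive real root of 1 - sum_j h_j * z**g_j = 0, found by bisection."""
    freq = Counter(orbit_sizes)  # orbit size g_j -> frequency h_j

    def f(z):
        return sum(h * z ** g for g, h in freq.items()) - 1.0

    lo, hi = 0.0, 1.0  # f(0) = -1 and f(1) >= 0, so the root lies in [0, 1]
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


delta_a = orbit_polynomial_root([3, 2, 1, 1])  # atom orbits, Figure 1
delta_b = orbit_polynomial_root([3, 2, 1])     # bond orbits, Figure 1
delta_asym = orbit_polynomial_root([1] * 7)    # asymmetric graph, 7 atoms
delta_tmb = orbit_polynomial_root([6, 2])      # tetramethylbutane atom orbits
print(round(delta_a, 3), round(delta_b, 3), round(delta_asym, 3), round(delta_tmb, 3))
```

The orbit sizes 6 and 2 for tetramethylbutane (six methyl carbons, two quaternary carbons) are derived here from the structure and are not quoted from the text.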

Size of Automorphism Groups
An overview of the alkane isomers used and the size of their automorphism groups is given in Table 3. The numbers of isomers, nISO, are identical to already published data, [3] reaching 366319 for C20H42. For each number of carbon atoms nC ≥ 7, at least one isomer has an asymmetric structure; the smallest is 2-ethyl-pentane, shown in Figure 2. For example, among the isomers of C20H42, n(αMIN) = 15641 are asymmetric, which is 4.3 % of the isomers in this set. For nC 4 to 20, the maximum size of the automorphism groups, αMAX, is between 6 and 31104; the largest value of α is found for tetra-tert-butyl-methane, C17H36, shown in Figure 3. In general, αMAX appears only for one structure in an isomer set; the exception is the set of C10H22 isomers, with two structures possessing αMAX.
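The statement in the theory section that α > 1 is always a product of the factors 2!, 3! and 4! defines a set of candidate values, which can be enumerated up to any limit. The following Python sketch does this; the function name is hypothetical. It lists candidates only: not every candidate needs to occur among the alkane isomers.

```python
def candidate_group_sizes(limit):
    """All products of the factors 2! = 2, 3! = 6 and 4! = 24 up to limit."""
    values = set()
    frontier = {1}
    while frontier:
        new_values = set()
        for v in frontier:
            for factor in (2, 6, 24):
                w = v * factor
                if w <= limit and w not in values:
                    values.add(w)
                    new_values.add(w)
        frontier = new_values
    return sorted(values)


small = candidate_group_sizes(1024)
print(small)

# The largest group size reported for the C4-20 alkanes, 31104, is the
# product of four factors 3! and one factor 4!.
assert 31104 == 6 ** 4 * 24
assert 31104 in candidate_group_sizes(31104)
```

The decomposition 31104 = (3!)^4 × 4! matches the structure of tetra-tert-butyl-methane: three equivalent methyl groups in each of four equivalent tert-butyl groups.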

Descriptors Based on the Distribution of Orbit Sizes
The correlation coefficients (Pearson) between the automorphism based descriptors entropy (EA, EB), symmetry index (SA, SB), root of orbit polynomial (δA, δB) and size of automorphism group as log10(α) are given in Table 4. The descriptors from atom orbit data are very highly correlated with the corresponding descriptors from bond orbit data; consequently, we omit EB, SB and δB from the further discussion. The absolute correlation coefficients between EA, SA, δA and log(α) are between 0.566 for SA vs. δA and 0.998 for SA vs. log10(α), indicating that different aspects of symmetry are characterized by these descriptors. The values in Table 4 have been calculated from all isomers and are dominated by the large sets with 19 and 20 carbon atoms. In Figure 5 selected correlation coefficients are shown separately for the isomer sets with 8 to 20 carbon atoms. We see only a weak dependence of the Pearson correlation coefficients on the size of the alkane molecules, thus indicating a stable relation between the descriptors for varying structure size.
Croat. Chem. Acta 2021, 94(1), 47-58
Remarkably high correlation coefficients between SA and log10(α) for the considered tree graphs can be explained by small contributions of the first term in equation (2) for SA, compared to the second term log2(α).
The distributions of the automorphism based descriptors are characterized by boxplots in Figure 6. The values for the entropy (EA) increase with an increasing number of carbon atoms, while the symmetry index (SA), the root of the orbit polynomial (δA), and the logarithm of the size of the automorphism group (log10(α)) show only a small or no dependence on the number of carbon atoms. All distributions exhibit a tailing, EA on the low value side, the others on the high value side.
For the alkane isomer sets C4 to C20 Table 5
The number of atom orbits, kA, and the number of bond orbits, kB, are closely related. The complete set of 618047 C4-20 alkane isomers contains 878 structures with kA = kB; all these structures have an even number of carbon atoms (nC) and a highly symmetric shape. For all other alkanes kA = kB + 1. These relations are discussed together with the concept of a symmetric bond (also named symmetric edge or exceptional line) by Lygeros et al. [21] based on previous work by Harary [22,23] and Read. [24] A symmetric bond is present if the removal of the bond cuts the graph into two isomorphic subgraphs. An equivalent definition is "the atoms (vertices) forming a symmetric bond (edge) must be topologically equivalent", hence must belong to the same atom (vertex) orbit. Within the alkane graphs a symmetric bond can only be present if nC is even, and it has been claimed that then kA = kB. [21] The present numerical investigations show that alkane structures may contain a symmetric bond even if kA is not equal to kB. In Figure 7 two of the 18 isomers of C8H18 are shown having a symmetric bond; for structure V kA = kB = 3, while for structure W kA = 5 and kB = 4. Actually, six isomers of C8H18 have a symmetric bond, but in only four of them is kA = kB. Table 6 contains the results for the alkane isomers with 4, 6, ... , 20 carbon atoms. All considered structures with kA = kB have a symmetric bond; however, additional structures also possess a symmetric bond. For instance, of the 366319 C20-alkane isomers 2115 (0.58 %) have a symmetric bond, with 507 of them exhibiting kA = kB.
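The search for a symmetric bond, using the definition by removal of the bond, can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used for Table 6: the two halves are compared by a canonical string code for unrooted trees, orbit counts come from a brute-force test of all atom permutations (slow beyond about nine atoms), and the function names are hypothetical. The two C8H18 skeletons, 2,5-dimethylhexane and 2,2,4-trimethylpentane, are chosen here for illustration; the text does not name structures V and W.

```python
from itertools import permutations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def component(adj, start, removed):
    """Atoms reachable from start without crossing the removed bond."""
    blocked = frozenset(removed)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and frozenset((u, v)) != blocked:
                seen.add(v)
                stack.append(v)
    return seen


def rooted_code(adj, nodes, root, parent=None):
    """Canonical parenthesis code of the subtree below root."""
    kids = [c for c in adj[root] if c != parent and c in nodes]
    return "(" + "".join(sorted(rooted_code(adj, nodes, c, root) for c in kids)) + ")"


def tree_code(adj, nodes):
    """Canonical code of an unrooted tree: smallest code over all roots."""
    return min(rooted_code(adj, nodes, r) for r in nodes)


def symmetric_bonds(edges):
    """Bonds whose removal leaves two isomorphic trees."""
    adj = adjacency(edges)
    hits = []
    for u, v in edges:
        part_u, part_v = component(adj, u, (u, v)), component(adj, v, (u, v))
        if len(part_u) == len(part_v) and tree_code(adj, part_u) == tree_code(adj, part_v):
            hits.append((u, v))
    return hits


def orbit_counts(edges):
    """Numbers of atom orbits and bond orbits (kA, kB) by brute force."""
    atoms = sorted({a for e in edges for a in e})
    edge_set = {frozenset(e) for e in edges}
    autos = [m for m in (dict(zip(atoms, p)) for p in permutations(atoms))
             if all(frozenset((m[u], m[v])) in edge_set for u, v in edges)]
    k_a = len({frozenset(m[a] for m in autos) for a in atoms})
    k_b = len({frozenset(frozenset((m[u], m[v])) for m in autos) for u, v in edges})
    return k_a, k_b


# Assumed numbering: methyl atoms 1, 2 on atom 3; chain 3-4-5-6; methyls 7, 8 on 6.
DIMETHYLHEXANE_25 = [(1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (6, 8)]
# Assumed numbering: methyl atoms 1, 2, 3 on atom 4; chain 4-5-6; methyls 7, 8 on 6.
TRIMETHYLPENTANE_224 = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7), (6, 8)]

for skeleton in (DIMETHYLHEXANE_25, TRIMETHYLPENTANE_224):
    print(symmetric_bonds(skeleton), orbit_counts(skeleton))
```

In the second skeleton the removed bond separates a tert-butyl and an isobutyl fragment. Both are the same four-atom star graph, but they are attached through different atoms, so the two atoms of the bond are not in the same orbit; this is a case of a symmetric bond with kA = kB + 1.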

Relation With the Presence of Substructures
Relations between the values of automorphism based descriptors (section 3.2) and the presence of certain substructures are discussed for the 4347 isomers of the C15-alkanes. The substructures considered are the trees with 3 to 8 carbon atoms (equivalent to the 38 isomers of the corresponding alkanes). Regarding a particular substructure, the 4347 molecular structures can be divided into a class 1 if the substructure is present, and a class 0 if absent. If the distributions of a descriptor are different for class 1 and 0, the descriptor is characteristic for the presence/absence of the substructure. In Figure 8 two substructures, each with eight carbon atoms, are considered: substructure S1 (2,2,3,3-tetramethylbutane) is symmetric and compact, and is present in 12.4 % of the C15-alkanes; substructure S2 (n-octane) is present in 87.6 %. The distributions for three descriptors in class 0 and 1 are presented as boxplots: entropy (EA), symmetry index (SA), and root of the orbit polynomial (δA). In general the distributions of class 1 and 0 are well separated, with p-values of Mann-Whitney U tests < 0.001 in all cases, although some outliers appear. For the compact substructure S1 the values for SA and δA are significantly higher in class 1 (substructure present) than in class 0. In contrast, EA is smaller in class 1 than in class 0; this corresponds to the negative correlation coefficients between EA and SA or δA with values of -0.774 and -0.863, respectively (considering all 618047 structures, see Table 4). The chain substructure S2 shows the opposite behavior, with lower values for SA and δA in class 1, and higher values for EA.
Figure 7. Two isomers of the alkanes C8H18 containing a symmetric bond (marked by an arrow). In structure V the numbers of atom orbits (kA) and bond orbits (kB) are both three; in structure W, kA = 5 and kB = 4.
The significant discrimination of class 0 and 1 by the descriptors EA, SA and δA appears in the alkane isomer sets C12-20 with p-values < 0.001 in U tests. These results demonstrate a relationship between the values of the three descriptors and the presence/absence of substructures S1 and S2, which mainly differ in their compactness.
Among the 38 substructures considered, 12 contain a quaternary carbon atom; the presence/absence of these substructures has a marked influence on the values of the descriptors EA, SA and δA. For example, the smallest of the 12 substructures is tetramethyl-methane, S3. For all isomer sets C12-20 we obtain EA being significantly smaller in class 1 (S3 present) than in class 0. In contrast, SA and δA are significantly larger in class 1 than in class 0. The p-values of U tests applied for EA, SA and δA using the 12 substructures are < 0.001 in about 95 % of the 108 cases (12 substructures times 9 isomer sets). This result again reflects the close relationship between the described molecular descriptors, based on automorphism data, and the symmetry of chemical structures.

Application in QSPR Models
Linear, multivariate models for QSPR (quantitative structure-property relationships) have been made by using two groups of descriptors (x-variables): group A containing eight descriptors based on automorphism data (as described in section 2), and group B containing 38 binary substructure descriptors defined by the alkane isomers with 3 to 8 carbon atoms (as used in section 3.3). Group A consists of the following x-variables: kA, number of atom orbits; nASYM and fASYM, number and fraction of asymmetric carbon atoms; α and log(α), size of the automorphism group and its decadic logarithm; SA, symmetry index; EA, entropy; and δA, root of the orbit polynomial; the last three descriptors for atom orbits. We compare the variable sets A, B and A together with B. The four chemical structure sets used are random samples with 1000 alkane isomers from each of the sets with 14 to 17 carbon atoms.
The three molecular properties modeled by the x-variables are: y1, the approximate surface area of the molecule (Å², code ASA); y2, the solubility in water (logarithm of mol/L, code log S); and y3, the octanol/water partition coefficient (logarithm of the concentration ratio, code log P). These properties are estimated by specific methods from chemoinformatics as implemented in the software CORINA Symphony. [25] The calculation of ASA is based on the geometry of partially overlapping van der Waals surfaces of the atoms of a molecule. [26] For log S the approximated 3D molecular structures and a set of eight physicochemical descriptors are used with multiple linear regression and neural networks. [27,28] The property log P has been derived by summing appropriate contributions of the atoms together with suitable correction factors. [29]
The strategy used here for creating QSPR models is based on standard chemometrics as follows: [30][31][32][33]
(1) The applied variable selection consists of two procedures: First, variables are deleted that are constant or almost constant (meaning the same value in all but a maximum of ten objects). Second, a stepwise selection in forward and backward direction is applied using the Bayes information criterion (BIC) as performance measure. [32,34]
(2) Partial least-squares (PLS) regression with repeated double cross validation (rdCV) is applied to the autoscaled matrix X (variables are mean-centered and scaled by the standard deviations). The rdCV approach [35][36][37] estimates the optimum model complexity (given by the number of PLS components, AOPT) separately from estimating the prediction performance for new objects. [38] Furthermore, the variability of the performance criteria is characterized by repeated random splits into calibration and test sets. The essential parameters used for rdCV are as follows: the numbers of segments in outer and inner loop are 3 and 7, respectively; the number of repetitions is 50, resulting in 50 test-set predictions ŷ for each object, and 150 estimations of the optimum number of PLS components, with the most frequent value taken as the final AOPT.
(3) Estimations of the prediction performance are derived from the prediction errors (residuals) ei = yi − ŷi from test-set predictions during the rdCV. Because these residuals are approximately normally distributed with a mean (bias) near zero, the standard deviation of ei (often called standard error of prediction, SEP) is a useful measure, with ŷi ± 2 SEP defining a 95 % tolerance interval for predictions. For the comparison of models with different numbers of variables and different magnitudes of y, the adjusted squared correlation coefficient between y and ŷ is appropriate [32]

$$\mathrm{ADJR}^2 = 1 - \frac{(n-1)(1-R^2)}{n-m-1}$$

with n and m for the number of objects and variables, respectively, and R² for the squared Pearson correlation coefficient between y and ŷ. The measure ADJR² is independent of the units of y and penalizes models with large m.
First, results of modeling the property ASA with data from the C15-alkane isomers are discussed (A), and then a summary is given of the properties and performances of the models made for all three properties and all four isomer sets defined in this section (B).
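The two performance measures of step (3) can be sketched in a few lines of Python. The sketch assumes the standard form of the adjusted squared correlation coefficient, ADJR² = 1 − (n − 1)(1 − R²)/(n − m − 1), and uses the sample standard deviation for SEP; the function names are hypothetical and the four y-values are invented solely to make the example runnable.

```python
import statistics


def sep(y, y_hat):
    """Standard error of prediction: standard deviation of the residuals."""
    residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]
    return statistics.stdev(residuals)


def squared_pearson(y, y_hat):
    """Squared Pearson correlation coefficient between y and y_hat."""
    mean_y, mean_h = statistics.fmean(y), statistics.fmean(y_hat)
    s_yh = sum((a - mean_y) * (b - mean_h) for a, b in zip(y, y_hat))
    s_yy = sum((a - mean_y) ** 2 for a in y)
    s_hh = sum((b - mean_h) ** 2 for b in y_hat)
    return s_yh * s_yh / (s_yy * s_hh)


def adjusted_r2(r2, n, m):
    """Adjusted R2 for n objects and m variables (assumed standard form)."""
    return 1.0 - (n - 1) * (1.0 - r2) / (n - m - 1)


# Invented values, for illustration only (not data from this work).
y = [431.0, 435.2, 438.9, 442.3]
y_hat = [432.1, 434.0, 439.5, 441.0]
print(sep(y, y_hat), squared_pearson(y, y_hat))

# Effect of the penalty for n = 1000 objects and m = 14 variables.
print(adjusted_r2(0.90, 1000, 14))
```

For n much larger than m, as in the samples of 1000 structures used here, the penalty is small and ADJR² stays close to R².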
(A) A random sample with n = 1000 structures from the 4347 isomers of the C15-alkanes is selected. The y-values ASA of this set are between 431.0 and 442.3, with a standard deviation of 6.07; note that only 20 different values for y (rounded to 3 decimals) appear.
Using the variable set AB, containing 8 automorphism-based descriptors and 38 binary substructure descriptors, we have m0 = 46 variables and a matrix X0 (1000 × 46). The first step of the variable selection eliminates 10 variables that are constant or almost constant. In the following stepwise variable selection m = 14 variables are retained for PLS modeling. Ten of the selected variables are binary substructure descriptors (alkane structures with 7 or 8 carbon atoms), and four are based on automorphism data (nASYM, log(α), SA, δA).
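The first selection step, the removal of constant or almost constant variables, can be sketched as follows. This is a Python illustration of the stated rule (same value in all but a maximum of ten objects), not the R code used in this work; the function name, the column names and the toy columns are hypothetical.

```python
from collections import Counter


def drop_near_constant(columns, max_deviating=10):
    """Keep only variables with more than max_deviating non-modal values.

    columns maps a variable name to its list of values (one per object).
    """
    kept = {}
    for name, values in columns.items():
        modal_count = Counter(values).most_common(1)[0][1]
        if len(values) - modal_count > max_deviating:
            kept[name] = values
    return kept


# Invented binary descriptors for 1000 objects, for illustration only.
toy = {
    "constant": [0] * 1000,                 # removed
    "ten_ones": [1] * 10 + [0] * 990,       # removed (exactly ten deviate)
    "eleven_ones": [1] * 11 + [0] * 989,    # kept
    "balanced": [0, 1] * 500,               # kept
}
print(sorted(drop_near_constant(toy)))
```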
The matrix X (1000 × 14) is autoscaled and the strategy PLS-rdCV gives an estimated optimum number of PLS components of five, and a SEP of 1.88 (equal to 31 % of the standard deviation of y). For a characterization of the prediction performance we consider ADJR² = 0.903 between the given y and the medians of 50 test-set predicted ŷ values (right-hand side plot in Figure 9). The performance of QSPR models using only automorphism based descriptors (A) is poor with ADJR² = 0.616 (left-hand side plot); using only binary substructure descriptors (B, mid plot) it is better with ADJR² = 0.748; however, it is clearly enhanced to 0.903 by combining A and B.
(B) Table 7 summarizes the results obtained for the three considered properties and the four alkane isomer sets. The measure ADJR² is always higher for the descriptor set B than for the descriptor set A; however, combining A and B is always better than B. The prediction performance decreases from using C14 alkane isomers to C17 isomers; for instance for descriptor set AB and property ASA, the values for ADJR² range from 0.921 (C14) to 0.888 (C17). The number of variables (descriptors), m, after variable selection is between 12 and 18; the number of optimum components in PLS regression, AOPT, is between 2 and 7. Modeling the property ASA works fairly well; for log P and log S, however, only semi-quantitative PLS models are possible with the variables used.
Finally, we discuss which descriptors are mostly selected from the set AB in the 12 jobs (3 properties times 4 groups of alkane isomers). Two binary substructure descriptors are selected in all 12 jobs; both are highly symmetric: 2,2,3,3-tetramethyl-butane (substructure S1 in section 3.3), and 2,3,4-trimethyl-pentane. The importance of binary variables j is connected to their information entropy Hj = −pj log2(pj) − (1 − pj) log2(1 − pj), with pj for the probability of a descriptor value '1'. Actually, the nine descriptors selected in > 60 % of the 12 jobs have an entropy between 0.599 and 0.999. On the other hand, a high entropy is not always connected with a frequent selection.
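The information entropy of a binary descriptor is easy to evaluate. The Python sketch below implements Hj as defined above; the function name is hypothetical. As an example it uses p = 0.124, the fraction of all C15-alkanes containing substructure S1 quoted in section 3.3; the random samples used for modeling may have a slightly different fraction.

```python
import math


def binary_entropy(p):
    """Information entropy (in bits) of a binary variable with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a constant descriptor carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


print(binary_entropy(0.5))    # maximum, reached for a balanced descriptor
print(binary_entropy(0.124))  # descriptor present in 12.4 % of the objects
```

The entropy depends only on how balanced a descriptor is, not on which structures it marks, which is consistent with the observation that a high entropy does not guarantee frequent selection.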
From the eight descriptors based on automorphism data, the number of asymmetric carbon atoms, nASYM, is always selected; the decadic logarithm of the size of the automorphism group, log(α), and the root of the orbit polynomial, δA, are selected in 83 % of the jobs; and the symmetry index, SA, in 50 %. The selection of automorphism based descriptors in the presence of binary substructure descriptors demonstrates their potential utility in QSPR models.

SUMMARY
For alkanes with 4 to 20 carbon atoms all isomers are created, ranging from 2 isomers for C4H10 to 366319 isomers for C20H42, in total nALL = 618047 chemical structures. The atom-bond connectivity of these structures is represented in terms of graph theory by uncolored trees with vertex degrees between 1 and 4. For all isomers the complete automorphism groups are computed and evaluated.
The size of the automorphism group (α, number of mappings of the graph onto itself) is one or an even number; the maximum 31104 is present for the highly symmetric structure of tetra-tert-butyl-methane, C17H36. The uniqueness of α is low with only 37 different values in the nALL isomers. The number of asymmetric graphs (α = 1) is 28597 (4.63 % of nALL). Most of the alkane structures contain at least one asymmetric carbon atom (97.6 %); the maximum per structure is eight, present in one isomer of C20H42.
The relation between the number of atom orbits, kA, and the number of bond orbits, kB, is kA = kB + 1 in 99.86 % of the nALL alkanes, and kA = kB for the remaining 878 structures. All 878 structures with kA = kB contain a symmetric bond; however, an additional 2509 structures with kA = kB + 1 also have a symmetric bond.
The logarithm of α is highly correlated with the symmetry index SA and with the polynomial root δA, exhibiting Pearson correlation coefficients (R) for all nALL alkanes of 0.998 and 0.917, respectively. Entropy EA and symmetry measure SA are negatively correlated (R = -0.774). The correlation coefficients between the symmetry descriptors have only a weak dependence on the size of the alkane molecules. The uniqueness of the descriptors EA, SA and δA is low with the numbers of different values (rounded to five decimals) in the nALL alkanes of 530, 1247, and 625 (0.086, 0.202, and 0.101 % of nALL), respectively.
The descriptors EA, SA and δA discriminate well between the presence and absence of certain substructures in alkanes. For instance, the presence of substructure (CH3)3C- gives low values for EA, and high values for SA and δA, compared to alkanes not containing this substructure.
The descriptors nASYM, log(α), SA and δA were successfully applied in QSPR models -together with binary substructure descriptors -for the prediction of molecular properties of alkanes by linear PLS regression.
For the studied set of alkanes, we conclude that descriptors based on complete automorphism data are useful complements to other descriptors, for instance for QSPR models or structure similarity searches. An extension of this concept to general chemical structures, represented by fully colored graphs, is in progress.