Predicting Corrosion Inhibition Effectiveness by Molecular Descriptors of Weighted Chemical Graphs

Corrosion inhibitors are chemical substances used to mitigate corrosion. The efficiency of a corrosion inhibitor is quantified by its corrosion inhibition effectiveness (IE), which can be calculated from experimentally obtained measurements. The aim of this paper is to predict corrosion inhibition effectiveness from molecular descriptors (topological indices). Since corrosion inhibitors are heteroatomic molecules, we use weighted molecular graphs to model them. Various distance-, degree-, and eigenvalue-based topological indices of weighted molecular graphs are calculated, and their correlations with corrosion inhibition effectiveness are examined. Finally, the topological indices that are the best predictors of corrosion inhibition effectiveness are used to obtain linear regression models.


INTRODUCTION
Corrosion cannot be completely prevented, but there are methods to mitigate it. One of these methods is the use of corrosion inhibitors. A corrosion inhibitor is a chemical compound dissolved in a corrosive medium that binds to the surface of the metallic material and mitigates corrosion. A chemical compound is a possible candidate for a corrosion inhibitor if its structure contains binding centres such as N, O, and S atoms (and, rarely, P) and/or π-electrons. In particular, azoles are a class of such compounds that can potentially be used as corrosion inhibitors for various metallic materials; some plant extracts can also act as corrosion inhibitors. [1,2] Many effective corrosion inhibitors form organometallic complexes on the surface or in the structure of the polymer. The estimated annual direct cost of corrosion in the United States is USD 276 billion, or about 3.1 % of the U.S. gross domestic product. [3] Various organic compounds can act as corrosion inhibitors, including acetylene alcohols, aromatic aldehydes, alkenylphenones, amines, amides, nitrogen-containing heterocycles (e.g. imidazoline-based), nitriles, iminium salts, triazoles, pyridine and its derivatives or salts, quinoline derivatives, thiourea derivatives, thiosemicarbazides, thiocyanates, quaternary salts, and condensation products of carbonyls and amines. Nitrogen-containing molecules and acetylene alcohols have been reported to form a film on the metal surface that mitigates both the dissolution of the metal (anodic reaction) and the evolution of hydrogen (cathodic reaction). Propargyl alcohol, for example, is soluble in acids, whereas the solubility of other acetylene alcohols decreases with increasing carbon chain length. The solubility of such acetylene alcohols can, however, be increased by combining them with quaternary ammonium surfactants.
Acetylene alcohols are widely used because they are commercially available and cost-effective. Propargyl alcohol is commonly used as a standard corrosion inhibitor for acidification and often has a significant synergistic effect with other compounds. [4] In most cases, whether and how corrosion inhibitors work is determined empirically by trial-and-error experiments. The research community is therefore focused on finding more convenient solutions based on computer simulation techniques. In most reported cases, however, these simulations do not include solvent molecules, and corrosive ions are frequently omitted as well. Without solvent molecules and corrosive ions, the calculations are effectively performed under vacuum conditions, in which metals naturally do not corrode. This limitation can be avoided with the approach presented here, which considers the corrosion inhibitor structure in a given medium together with its corrosion inhibition effectiveness (IE). This work presents an interdisciplinary study that applies a chemical graph theory approach to data obtained from corrosion tests used to determine corrosion inhibition performance. More precisely, the corrosion inhibitors are modelled by weighted molecular graphs. Since the weights of vertices (atoms) can be defined in many different ways, we consider four possibilities, three of which use the atomic number. We compute various molecular descriptors for these weighted graphs using the degrees of vertices, the distances between vertices, and the eigenvalues of the molecular graphs.
Our focus is on weighted variants of some well-known degree- and distance-based molecular descriptors, as well as on eigenvalue-based indices. The two oldest vertex-degree-based descriptors are the first and the second Zagreb indices, introduced in Ref. [5], where they were associated with terms in a power-series expansion of the total π-electron energy. The Randić index and the atom-bond connectivity (ABC) index are also well-known molecular descriptors; the former measures the extent of branching of a molecule, and the latter is related to the heat of formation. [6,7] For a comparative study of degree-based indices, see Ref. [8]; some more recent results can be found in Ref. [9]. Among distance-based molecular descriptors, the Wiener index is the most famous, since it is considered the first molecular descriptor of all. [10] Other distance-based topological indices are the Schultz index and its modification, known as the Gutman index. [11,12] Another molecular descriptor motivated by the Wiener index is the Szeged index. [13,14] All these indices are either bond- or atom-pair-additive, and some of their mathematical aspects are discussed in Ref. [15].
We performed a correlation analysis between the obtained molecular descriptors and the corrosion inhibition effectiveness of the considered inhibitors under different conditions. As a result, we identified the topological indices that are the most suitable predictors of corrosion inhibition effectiveness under certain conditions.
In the next section, we present the experimental data considered. Then, in Section 3, we define some distance- and degree-based topological indices as well as indices based on eigenvalues. Finally, in Section 4, the methodology is presented and the results are discussed.

EXPERIMENTAL DATA
In this paper, IE was determined by corrosion immersion tests in which steel samples were immersed in the test solution for 24 hours. The change in mass (Δm) between before and after immersion is used to estimate the corrosion rate: the greater the mass loss, the higher the corrosion rate. Samples of C15 steel were purchased from Rocholl (Aglasterhausen, Germany) as rectangular coupons measuring 50 mm × 20 mm × 1 mm. According to the supplier, the C15 steel contained 0.140 wt. % C, 0.200 wt. % Si, 0.470 wt. % Mn, 0.003 wt. % S and 0.006 wt. % P. The samples were cleaned for 5 minutes in an ultrasonic bath containing 50 vol. % ultrapure water and 50 vol. % pure ethanol. Ultrapure water with a resistivity of 18.2 MΩ cm was obtained from a Milli-Q system (Millipore Corporation, Billerica, MA, USA), and pure ethanol (for analysis, ACS quality) was purchased from Carlo Erba Reagents (Milan, Italy). After the immersion test, the samples were thoroughly rinsed with ultrapure water, brushed with a bristle brush to remove corrosion products, rinsed again with ultrapure water, dried with compressed air, and weighed. At least three replicate measurements were made for each system, and the data were checked for possible outliers using Grubbs' statistical test. [16] If an outlier was present, further immersion tests were performed until at least three measurements contained no outliers. The average mass loss was then calculated and used to determine IE. Tests were conducted at 25 °C and 70 °C to investigate the effect of temperature (T). The tested corrosion inhibitors were dissolved at concentrations of 1 mM and 10 mM in a 3 wt. % NaCl solution with and without 0.5 wt. % or 2.0 wt. % KI. Here, KI acts as a corrosion inhibitor intensifier, i.e. a compound that increases the performance of the corrosion inhibitor. [4]
The corrosion inhibition effectiveness is calculated from the average mass losses as

$$\mathrm{IE} = \left(1 - \frac{\Delta m_{\mathrm{inh}}}{\Delta m_{0}}\right) \times 100\,\% \qquad (1)$$

where $\Delta m_{\mathrm{inh}}$ and $\Delta m_{0}$ are the average mass losses in the solution with and without the corrosion inhibitor, respectively. The values $\mathrm{IE}_i$, $i \in \{1, \ldots, 8\}$, are calculated from the data measured under 8 different combinations of conditions, see Table 1.
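The workflow described above (replicate mass losses, an outlier check, averaging, and conversion to IE) can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes the conventional mass-loss definition IE = (1 − Δm_inh/Δm_0) × 100 %, and the function names, the example masses, and the critical value `g_crit` are hypothetical. Only the Grubbs statistic G = max|x_i − x̄|/s is computed; the critical value has to be taken from a Grubbs table for the chosen significance level and number of replicates.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, index): Grubbs' statistic G = max|x_i - mean| / s and the
    position of the most extreme observation (s is the sample std. dev.)."""
    m = mean(values)
    s = stdev(values)
    deviations = [abs(x - m) for x in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return deviations[idx] / s, idx


def has_outlier(values, g_crit):
    """True if the most extreme value exceeds the critical value g_crit.

    g_crit is a hypothetical input: take it from a Grubbs table for the
    number of replicates and the chosen significance level."""
    g, _ = grubbs_statistic(values)
    return g > g_crit


def inhibition_effectiveness(dm_inhibited, dm_blank):
    """IE in percent, assuming IE = (1 - mean(dm_inh) / mean(dm_0)) * 100."""
    return (1.0 - mean(dm_inhibited) / mean(dm_blank)) * 100.0


# Hypothetical replicate mass losses in mg (illustrative numbers, not data).
blank = [12.0, 12.4, 11.8]      # without inhibitor
inhibited = [3.0, 3.2, 2.8]     # with inhibitor
print(round(inhibition_effectiveness(inhibited, blank), 1))  # prints 75.1
```

If `has_outlier` returns True, the procedure in the text calls for additional immersion tests rather than simply discarding the value.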

TOPOLOGICAL INDICES
In this section, we introduce basic concepts from graph theory and define topological indices which will be considered later.

Graph Theory Preliminaries
The edge e = {u,v} between vertices u and v will also be denoted by e = uv. All the basic concepts from graph theory can be found in Ref. [17].
For a vertex u, the open neighbourhood $N_u$ is the set of vertices adjacent to u. Moreover, for u, v ∈ V(G), the distance between u and v, denoted by d(u,v), is the length of a shortest path between u and v.

Figure 1. Corrosion inhibitors.

[…] an edge-weight of G, and (G,w') is the edge-weighted graph.
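The open neighbourhood and the distance can be computed directly from an edge list. The Python sketch below uses breadth-first search, which finds shortest paths in graphs without edge lengths; the function names and the example graph (the heavy-atom skeleton of propargyl alcohol, HC≡C–CH2–OH, which is a path on four vertices) are illustrative choices.

```python
from collections import deque


def neighbourhoods(n, edges):
    """Open neighbourhoods N_u for vertices 0..n-1 of an undirected graph."""
    nbrs = {u: set() for u in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def distance_matrix(n, edges):
    """All-pairs shortest-path distances d(u, v) by breadth-first search.

    Unreachable pairs keep float('inf'); molecular graphs are connected,
    so this does not occur for them."""
    nbrs = neighbourhoods(n, edges)
    dist = [[float("inf")] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if dist[source][v] == float("inf"):
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hydrogen-suppressed propargyl alcohol: C0-C1-C2-O3.
edges = [(0, 1), (1, 2), (2, 3)]
print(neighbourhoods(4, edges)[1])   # {0, 2}
print(distance_matrix(4, edges)[0])  # [0, 1, 2, 3]
```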

Degree-and Distance-based Topological Indices
Let $(G,w)$ be a vertex-weighted graph. First, we define some degree-based topological indices. The degree of a vertex $u$ in $(G,w)$ is defined as […]. The first Zagreb index $M_1(G,w)$, the second Zagreb index $M_2(G,w)$, the Randić index (or connectivity index) $R(G,w)$, and the atom-bond connectivity index $ABC(G,w)$ are defined as […]. If all vertex weights are equal to 1, the degree of $u$ is the number of vertices adjacent to $u$, and these definitions give the standard degree-based topological indices, denoted by $M_1(G)$ and $M_2(G)$ for the first and the second Zagreb index, [5] $R(G)$ for the Randić index, [18] and $ABC(G)$ for the atom-bond connectivity index. [19]

Next, we define three distance-based topological indices. The Wiener index was introduced by H. Wiener to predict the boiling points of alkanes. [10] Its weighted version, the Wiener index of $(G,w)$, was presented in Ref. [20] and is defined as […]. The Szeged index was defined by I. Gutman [13] for any connected graph $G$. To define its weighted version, we introduce the following notation for any edge $e = uv$ of a vertex-weighted graph $(G,w)$: […]. In addition, we consider the Harary index, [21,22] which for a graph $G$ is defined as […].

Finally, we mention two distance- and degree-based topological indices of a vertex-weighted graph $(G,w)$. The first one is the degree distance (or the Schultz index). It was formally introduced in Ref. [23], but it had already been known a few years earlier. [11] For a vertex-weighted graph it can be defined as […]. The second one is the Gutman index, [12] which for a vertex-weighted graph $(G,w)$ is defined as […].
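For reference, the standard (unweighted) versions of these indices can be computed as in the Python sketch below, using M1 = Σ_u deg(u)², M2 = Σ_{uv∈E} deg(u)deg(v), R = Σ_{uv∈E} (deg(u)deg(v))^(−1/2), ABC = Σ_{uv∈E} √((deg(u) + deg(v) − 2)/(deg(u)deg(v))), and, over unordered vertex pairs, W = Σ d(u,v), H = Σ 1/d(u,v), DD = Σ (deg(u) + deg(v)) d(u,v) and Gut = Σ deg(u)deg(v) d(u,v), together with Sz = Σ_{e=uv} n_u(e) n_v(e). The paper's weighted formulas are not reproduced in the text, so the optional vertex weights follow a hypothetical convention: each pair term is multiplied by w(u)w(v), and n_u(e) sums the weights of vertices closer to u than to v. The degree-based indices use the ordinary degree.

```python
import math
from collections import deque


def distances(n, adj):
    """All-pairs shortest-path distances by BFS (adj: list of neighbour sets)."""
    dist = []
    for s in range(n):
        d = [math.inf] * n
        d[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] == math.inf:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def topological_indices(n, edges, w=None):
    """Degree- and distance-based indices of a connected graph.

    w: optional vertex weights under a hypothetical convention (pair terms
    times w(u)*w(v), n_u(e) as a sum of weights). With w=None every weight
    is 1 and the classical M1, M2, R, ABC, W, H, DD, Gut and Sz result."""
    w = w or [1.0] * n
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = [len(a) for a in adj]
    d = distances(n, adj)
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]

    idx = {
        "M1": sum(k * k for k in deg),
        "M2": sum(deg[u] * deg[v] for u, v in edges),
        "R": sum(1 / math.sqrt(deg[u] * deg[v]) for u, v in edges),
        "ABC": sum(math.sqrt((deg[u] + deg[v] - 2) / (deg[u] * deg[v]))
                   for u, v in edges),
        "W": sum(w[u] * w[v] * d[u][v] for u, v in pairs),
        "H": sum(w[u] * w[v] / d[u][v] for u, v in pairs),
        "DD": sum(w[u] * w[v] * (deg[u] + deg[v]) * d[u][v] for u, v in pairs),
        "Gut": sum(w[u] * w[v] * deg[u] * deg[v] * d[u][v] for u, v in pairs),
    }
    # Szeged: for each edge uv, (weight closer to u) * (weight closer to v).
    sz = 0.0
    for u, v in edges:
        n_u = sum(w[x] for x in range(n) if d[x][u] < d[x][v])
        n_v = sum(w[x] for x in range(n) if d[x][v] < d[x][u])
        sz += n_u * n_v
    idx["Sz"] = sz
    return idx


# Path on four vertices (e.g. the heavy-atom skeleton of propargyl alcohol).
print(topological_indices(4, [(0, 1), (1, 2), (2, 3)]))
```

For a tree such as this path, the output can be checked against known identities, e.g. Sz = W and DD = 4W − n(n − 1).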

Eigenvalue-based Topological Indices
The eigenvalue-based indices are calculated from the adjacency matrix of a graph, more precisely, from its eigenvalues.
Since in this paper we consider vertex-weighted graphs, we need to define the adjacency matrix of such graphs.
The adjacency matrix $A(G,w)$ of a vertex-weighted graph $(G,w)$ with vertices $v_1, \ldots, v_n$ is a square matrix of order $n$ whose $(i,j)$-element is defined as

$$A(G,w)_{ij} = \begin{cases} w(v_i), & \text{if } i = j,\\ 1, & \text{if } i \neq j \text{ and the vertices } v_i \text{ and } v_j \text{ are adjacent},\\ 0, & \text{if } i \neq j \text{ and the vertices } v_i \text{ and } v_j \text{ are not adjacent}. \end{cases}$$

Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A(G,w)$. Then the graph energy [24] of a vertex-weighted graph is defined as

$$E(G,w) = \sum_{i=1}^{n} |\lambda_i|.$$

The graph energy has had a substantial impact and motivated the introduction of many other topological indices. [25] Another well-known eigenvalue-based index is the Estrada index, [26] which for our graphs is

$$EE(G,w) = \sum_{i=1}^{n} e^{\lambda_i}.$$

The graph energy and the Estrada index have been studied extensively, and there are numerous papers on these indices (see, e.g., Refs. [27][28][29][30][31]). Recently, a modification of EE, the Gaussian Estrada index, was introduced. [32] In our case, it is defined as

$$GEE(G,w) = \sum_{i=1}^{n} e^{-\lambda_i^2}.$$
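The three spectral indices can be sketched without external libraries by diagonalizing the symmetric matrix A(G,w) with the cyclic Jacobi eigenvalue method. The sketch assumes that the diagonal entries are the vertex weights w(v_i) and that off-diagonal entries are 1 for adjacent vertices and 0 otherwise; the function names are illustrative. With all weights set to 0, A(G,w) reduces to the ordinary adjacency matrix, and for a path on four vertices the energy is then 2√5 ≈ 4.472.

```python
import math


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def weighted_adjacency(n, edges, w):
    """A(G, w): w(v_i) on the diagonal (assumed), 1 for adjacent pairs, else 0."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = float(w[i])
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return a


def spectral_indices(n, edges, w):
    lam = jacobi_eigenvalues(weighted_adjacency(n, edges, w))
    return {
        "E": sum(abs(x) for x in lam),              # graph energy
        "EE": sum(math.exp(x) for x in lam),        # Estrada index
        "GEE": sum(math.exp(-x * x) for x in lam),  # Gaussian Estrada index
    }


print(spectral_indices(4, [(0, 1), (1, 2), (2, 3)], [0, 0, 0, 0]))
```

In practice a library routine for symmetric matrices, such as NumPy's `numpy.linalg.eigvalsh`, would normally replace the hand-written Jacobi loop; it is spelled out here only to keep the sketch dependency-free.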

METHODOLOGY AND RESULTS
In our model, every corrosion inhibitor is represented by a molecular graph in which vertices represent atoms and two vertices are adjacent whenever the corresponding atoms are bonded. As usual, all hydrogen atoms are omitted. Since the considered molecules are heteroatomic, containing carbon, oxygen, sulfur, and nitrogen atoms, we assign weights to the vertices. A weight can be assigned to a vertex in several different ways; therefore, we consider four different weight models, three of which use atomic numbers, see Table 3.
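A hydrogen-suppressed, vertex-weighted molecular graph can be built from an atom list and a bond list, as in the Python sketch below. The weight models here are hypothetical illustrations (unit weights, the atomic number Z, and Z divided by 6, the atomic number of carbon); they are not the four models of Table 3. Bond orders are ignored because the graph records adjacency only.

```python
# Atomic numbers of carbon and of the heteroatoms mentioned in the text.
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8, "S": 16}

# Hypothetical weight models (not the models of Table 3).
WEIGHT_MODELS = {
    "unit": lambda z: 1.0,
    "Z": lambda z: float(z),
    "Z/6": lambda z: z / 6.0,
}


def molecular_graph(atoms, bonds, model):
    """Return (n, edges, weights) of the hydrogen-suppressed molecular graph.

    atoms: element symbols (hydrogens allowed; they are dropped).
    bonds: pairs of indices into atoms; bond order is ignored."""
    heavy = [i for i, el in enumerate(atoms) if el != "H"]
    index = {old: new for new, old in enumerate(heavy)}
    edges = [(index[u], index[v]) for u, v in bonds
             if u in index and v in index]
    weights = [WEIGHT_MODELS[model](ATOMIC_NUMBER[atoms[i]]) for i in heavy]
    return len(heavy), edges, weights


# Propargyl alcohol, HC≡C-CH2-OH, with explicit hydrogens 4..7.
atoms = ["C", "C", "C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (2, 3), (0, 4), (2, 5), (2, 6), (3, 7)]
print(molecular_graph(atoms, bonds, "Z"))
```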
We calculated the topological indices of the considered molecules for each weight model. A topological index TI in the $i$-th model is denoted by $\mathrm{TI}_i$ for $i \in \{1,2,3,4\}$. Note that $H_1$ is just the Harary index. The correlation coefficients between the topological indices and corrosion inhibition effectiveness are listed in Table 4. Since the molecules contain different numbers of atoms, we normalized the values of the topological indices by dividing them by the number of vertices $n$.
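The normalization by the number of vertices and the correlation analysis can be sketched as follows. This is a minimal Python sketch with hypothetical function names; `pearson_r` computes the Pearson correlation coefficient, which we assume is the coefficient R reported in Table 4 (the text does not name it explicitly).

```python
import math


def normalized(index_values, n_vertices):
    """Divide each molecule's index value by its number of vertices n."""
    return [ti / n for ti, n in zip(index_values, n_vertices)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A typical call would be `pearson_r(normalized(ti_values, n_values), ie_values)` for one index, one weight model, and one set of conditions.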
The best results for degree- and distance-based topological indices are obtained between $\mathrm{IE}_4$ and $(M_2)_2/n$ ($R = -0.81$), between $\mathrm{IE}_4$ and $\mathrm{ABC}_2/n$ ($R = 0.71$), and between $\mathrm{IE}_2$ and $(M_2)_2/n$ ($R = -0.72$), where $(M_2)_2$ denotes the second Zagreb index in model 2. The predicted values $\widehat{\mathrm{IE}}_4$ are obtained from the linear regression of $\mathrm{IE}_4$ on $(M_2)_2/n$ (see Figure 2), where $n$ is the number of vertices. The coefficient of determination is $R^2 = 0.66$. In addition, we tested the model by leave-one-out analysis: we excluded one molecule and calculated the coefficient of determination (denoted by $Q^2$) for the remaining 14 molecules. Repeating this for every molecule gives 15 coefficients $Q_i^2$, $i \in \{1, \ldots, 15\}$. The average $Q^2$ is 0.65, which is very close to $R^2$, indicating that the model is stable.
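The single-descriptor regression and the leave-one-out check described above can be sketched in Python as follows. The function names are hypothetical; `leave_one_out_mean_r2` follows the procedure as stated in the text (refit on the remaining molecules and average the resulting coefficients of determination). A predictive Q², computed from the residuals of the left-out molecules, is a common and stricter alternative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def leave_one_out_mean_r2(x, y):
    """Drop each molecule in turn, refit on the rest, and average the R^2
    values of the refitted lines (the procedure described in the text)."""
    scores = []
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        a, b = fit_line(xs, ys)
        scores.append(r_squared(xs, ys, a, b))
    return sum(scores) / len(scores)
```

For simple linear regression with an intercept, R² equals the square of the Pearson coefficient, which is why R = −0.81 corresponds to R² ≈ 0.66.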
Finally, we use another measure to evaluate the model, the root-mean-square error, [33,34] defined as

$$S = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$

where $N$ is the size of the data set, $y_i$ is the experimental value, and $\hat{y}_i$ is the predicted value. For the above model, the root-mean-square error is $S = 14.00$.
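The root-mean-square error can be computed directly from the experimental and predicted values; S has the same units as IE. A minimal Python sketch, with a hypothetical function name:

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error S = sqrt(sum((y_i - yhat_i)^2) / N)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(y_true)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n)
```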
For the eigenvalue-based topological indices, the highest correlations are observed between $\mathrm{IE}_4$ and $\mathrm{GEE}_2/n$ ($R = 0.78$) and between $\mathrm{IE}_2$ and $E_2/n$ ($R = -0.74$). The predicted values $\widehat{\mathrm{IE}}_4$ are obtained from the linear regression of $\mathrm{IE}_4$ on $\mathrm{GEE}_2/n$ (see Figure 3), where $n$ is the number of vertices. This simple model based on $\mathrm{GEE}_2$ explains more than 60 % of the variation in the experimental data ($R^2 = 0.61$). This is a satisfactory result considering the high diversity of the investigated inhibitors and the fact that only one descriptor is used. With an average $Q^2$ of 0.61 in the leave-one-out validation, the model is also stable. Its root-mean-square error is $S = 14.83$.

CONCLUSION
We have calculated correlation coefficients between various distance-, degree-, and eigenvalue-based topological indices of weighted molecular graphs and the corrosion inhibition effectiveness of the modelled inhibitors. Based on these, we have developed linear models for predicting corrosion inhibition effectiveness. Under certain conditions, single-descriptor models based on the second Zagreb index and the Gaussian Estrada index, both of which can be computed quite easily, explain 66 % and 61 % of the variation in corrosion inhibition effectiveness, respectively.