A Comparison of Artificial Neural Networks and Ordinary Kriging depth maps of the Lower and Upper Pannonian stage border in the Bjelovar Subdepression, Northern Croatia

Subsurface strata can be mapped by computer with a wide range of methods, including geostatistical interpolation, stochastic simulation and geomathematical methods. Geomathematical methods include, for example, the use of statistics in geology and artificial neural networks. Artificial neural networks are primarily used when the data are flawed or non-linearly related. The hypothesis that depth data can be mapped successfully with an original artificial neural network algorithm is confirmed by statistical analysis and by comparison with a geostatistical interpolation method. The algorithm is written in "R", an open-source statistical computing environment, and is applied to mapping the depth of the e-log marker "Rs5" in the Bjelovar Subdepression, Northern Croatia. This marker is the border between the Lower and Upper Pannonian stages in the Croatian part of the Pannonian Basin System. The neural network architecture that produced the best responses has two hidden layers, with 10 and 6 neurons, respectively. A backpropagation algorithm is used. The two methods were compared by cross-validation: the neural network produced a mean squared error of 16294.5 and Ordinary Kriging produced 14638.35.


Introduction
The main objective of this paper is to show that artificial neural networks (ANN) can be used for mapping any geological variable (in this case depth) as successfully as geostatistical interpolation methods. The data are processed and the map is made using original "R" source code. Because the data can be flawed or non-linearly related, an ANN can unite them into one complex dataset. The characteristic of this method is that it simulates the human learning process by training and optimizing parameters over a number of repetitions. The geostatistical interpolation technique used was Ordinary Kriging. The two methods were applied to depth mapping of the e-log marker "Rs5". The e-log marker "Rs5" represents the border between the Moslavačka Gora Formation (Lower and Middle Miocene and Lower Pannonian sediments) and the Ivanić-Grad Formation (Upper Pannonian sediments), deposited inside the Bjelovar Subdepression, i.e. in the southwestern part of the Drava Depression (see Figure 1). This border has previously been mapped regionally by several authors (e.g. Malvić, 2003; Malvić, 2011; Špelić et al., 2014). The e-log marker "Rs5" as mapped in Malvić (2011) is presented in Figure 2.
The Mining-Geology-Petroleum Engineering Bulletin, 2016, pp. 75-86 © The Author(s), DOI: 10.17794/rgn.2016.3.6

Basic geography and geology of the mapped area
The Croatian part of the Pannonian Basin System is located in the southwest of the geological macrounit called the Pannonian Basin System. The existing macrounits, i.e. depressions, in the Croatian part of the Pannonian Basin are the Mura, Drava, Sava and Slavonija-Srijem Depressions.
The Drava Depression is located in the northeastern part of Croatia and also extends into Hungary (see Figure 1). Its total area is approximately 12000 km², of which 9100 km² is in Croatia (e.g., Malvić & Cvetković, 2013). The Bjelovar Subdepression, as a part of the Drava Depression, covers about 2900 km² (Malvić, 2003) in the southwest. Geographically, it is surrounded by Kalnik Mountain in the northwest, Bilogora Mountain in the northeast, Papuk Mountain, Ravna Gora Mountain and Psunj Mountain in the east and Moslavačka Gora Mountain in the south (see Figure 1).
The lithology of the Drava Depression consists mostly of Neogene and Quaternary rocks and deposits. The total thickness of the deposits can exceed 7000 m in the middle part of the depression (Velić, 2007). Besides sedimentary rocks, volcanic Middle Miocene rocks can be found, along with Lower Miocene fluvial and lacustrine sediments. Exploratory drilling in the Bjelovar Subdepression has revealed rocks that have been systematized into two lithological and chronostratigraphical groups. The first, younger group consists of Neogene-Quaternary rocks, and the second, older group consists of Mesozoic and Palaeozoic magmatites, metamorphites and carbonates (e.g., Malvić, 2003).
Following the lithostratigraphic division (Šimon, 1980), the Lower and Middle Miocene rocks belong to the Moslavačka Gora Formation, which is further divided into the Mosti Member (of Badenian, Lower Miocene and Sarmatian age) and the Križevci Member (of Lower Pannonian age). Its bottom border with the Palaeozoic and Mesozoic rocks is defined by the e-log border "Tg", and its top border with the Ivanić-Grad Formation by the e-log marker "Rs5" (see Figure 2). The next, younger lithostratigraphic unit of formation rank in the Drava Depression is the Ivanić-Grad Formation. It borders the Moslavačka Gora Formation at the bottom, at the e-log marker "Rs5" (analysed here), and the Kloštar Ivanić Formation at the top, where the e-log marker "Z'" defines the border. The Ivanić-Grad Formation is further divided into the Lipovac Marl Member and the Zagreb Member or the laterally equivalent Okoli Sandstones (e.g., Šimon, 1980; Malvić, 2003, see Figure 2). Lower Pontian sediments comprise the Kloštar Ivanić Formation. It borders the Ivanić-Grad Formation at the bottom and the Bilogora Formation at the top, at the "Δ" e-log marker. Its units of member rank are the Lepsić Marl, followed by the Poljana Sandstones, the Graberje Marl, the Pepelana Sandstones and the Cabuna Marl (e.g., Šimon, 1980; Malvić, 2003, see Figure 2). The Bilogora Formation is of Upper Pontian age and, unlike the older formations, is not further divided into lithostratigraphic units of lower rank. It borders the Kloštar Ivanić Formation at the bottom and the Lonja Formation at the top, at the e-log marker "D'". The youngest deposits are defined as the Lonja Formation. Its age is approximately Pliocene (Dacian and Romanian) and Quaternary. Its bottom border is with the Bilogora Formation and its top border is the present terrain.

Mapping of the e-log marker "Rs5" using Artificial Neural Networks
The e-log marker "Rs5" is of great significance in the interpretation of the Neogene depositional environment in the Croatian part of the Pannonian Basin System. Its regional character makes it easy to recognise and map across almost the entire area of Northern Croatia. Also, as a chronostratigraphical border (Lower and Upper Pannonian), it can be considered an approximate border between the brackish and fresh-water lake environments that were remnants of the Central Paratethys. For this reason mapping methods have frequently been tested on this regional marker bed, and the newest such evaluation of approach and mapping results was done with an artificial neural network (ANN) algorithm.
Since the ANN algorithm needs a large dataset for the learning process, the e-log marker "Rs5" depth data were obtained from Malvić (2003) and Malvić (2011). Such a dataset can be obtained by collecting new values for the ANN on a regular grid (e.g., using the methodology described in Špelić et al., 2014); each grid point consists of x and y coordinates and a depth value. Figure 3 depicts the map with the position of every point that has an associated depth value. Figure 4 shows the analysis flow chart of the ANN algorithm.
The first step of successful mapping using the ANN algorithm is to gather and pre-process the dataset. To get the best output, i.e. the lowest possible error, the Gauss-Krüger coordinates were converted to relative coordinates, which did not change the spatial relationships but made the coordinate values easier to handle. The depth values were transformed with a base-10 logarithm, which reduced their magnitude while preserving the relationship between them. After optimization, the network calculates outputs. The new dataset with the minimal total error is used as the final output and mapped. The new dataset is an artificial one, with the x and y coordinates of the existing dataset increased by 1 (i.e., x_i + 1, y_i + 1) to get a larger number of points on the map and a higher resolution. The resulting map therefore includes twice as many data points as the map shown in Figure 3. Once the depths are predicted from the coordinates, the outputs for the old and the new datasets are combined and the map is made using the "ggplot2" package (see Figure 5). The final map consists of 1024 cells.
The best network is chosen as the one with the lowest total error. The chosen network has two hidden layers with 10 and 6 neurons, respectively. Table 1 shows randomly chosen network outputs and a comparison with the original data. The most overestimated value is located at the coordinates x=6431993 and y=5059986: instead of 900 m, the algorithm predicted 2089.34 m. The most underestimated value is located at the coordinates x=6429993 and y=5081986: instead of 3060 m, the algorithm predicted 2086.28 m.
Table 2 shows the results of the reliability and correlation analysis. Based on the reliability analysis, i.e. the comparison of the output with the input data, the ANN algorithm has an extremely high reliability coefficient of 0.99, with a 99% confidence interval from 0.988 to 0.991 (p<0.001). The correlation analysis shows a statistically significant (p<0.001) positive correlation, with a correlation coefficient of 0.91 and a 99% confidence interval from 0.895 to 0.92. The diagram in Figure 6 presents the relation between the input and output data. Clear consistency can be seen in the lower values, but there is some dispersion in the higher values. Some outliers are visible, but thorough data analysis showed that they are a consequence of extreme changes in depth values over small distances, to which the ANN algorithm could not adapt. The high reliability and correlation indicate the quality of this algorithm and the potential for applying neural networks to strata mapping in the analysed area.
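The two coefficients named in the Table 2 footnote (Cronbach alpha and Kendall tau) can be illustrated with a minimal Python sketch. This is not the original "R" code. It assumes that the observed and predicted depths are treated as the two "items" of the reliability analysis and that the simple tau-a variant (no tie correction) is adequate; the confidence intervals and p-values reported in the text are not reproduced, and the numbers in the example are illustrative only.

```python
from statistics import variance

def cronbach_alpha(observed, predicted):
    """Cronbach's alpha with observed and predicted depths as two items (assumed setup)."""
    k = 2
    totals = [o + p for o, p in zip(observed, predicted)]
    item_variance = variance(observed) + variance(predicted)
    return k / (k - 1) * (1 - item_variance / variance(totals))

def kendall_tau_a(a, b):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (a[i] - a[j]) * (b[i] - b[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Illustrative values only, not the paper's data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
alpha = cronbach_alpha(observed, predicted)
tau = kendall_tau_a(observed, predicted)
```

Both coefficients approach 1 as the predicted depths reproduce the observed ones, which is the sense in which the reported values of 0.99 and 0.91 are read above.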

Variogram analysis
The variogram used in the subsequent interpolation was made in the "Variowin" program (Pannatier, 1996). The first step was to define the parameters with respect to the input dataset, after which the experimental variogram was calculated. The experimental variogram is presented in Figure 7.
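The calculation behind an experimental variogram can be sketched in a few lines of Python. This is only an illustration of the standard omnidirectional estimator, gamma(h) = sum of squared differences / (2 N(h)), and not the procedure implemented in Variowin; the function name, the lag width and the sample points are hypothetical.

```python
import math

def experimental_variogram(points, lag_width, n_lags):
    """Omnidirectional semivariance per lag class for (x, y, z) points."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            k = int(h // lag_width)          # index of the lag class
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    # Report the class centre and the semivariance; skip empty classes.
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]))
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical points on a line with a linear trend in z.
demo_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
demo_variogram = experimental_variogram(demo_points, lag_width=1.0, n_lags=4)
```

Plotting the returned pairs and joining them with a line gives a diagram of the kind shown in Figure 7.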

Interpolation using Ordinary Kriging
The map was made in Surfer 9, and the input variables were the coordinates of the e-log marker "Rs5" (x and y) and the depth values. The variogram value is the most important input to the Ordinary Kriging equations. The two methods (Kriging and ANN) were compared using cross-validation (e.g., Davis, 1987). Their mean squared errors (MSE) were compared and the most underestimated and overestimated data are presented.
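To make the role of the variogram in the Ordinary Kriging equations concrete, the following Python sketch solves the kriging system in its variogram form for one target location and then computes a leave-one-out cross-validation MSE. It is not the Surfer 9 implementation. The spherical model follows the caption of Figure 8, but the nugget, partial sill and range values, the four sample points and the leave-one-out scheme are assumptions made for illustration.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram; 'sill' is the partial sill (assumed parameters)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + sill
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, target, gamma):
    """Estimate z at target from (x, y, z) points; weights sum to 1."""
    n = len(points)
    a = [[gamma(math.dist(points[i][:2], points[j][:2])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])              # unbiasedness constraint
    b = [gamma(math.dist(p[:2], target)) for p in points] + [1.0]
    weights = solve(a, b)                    # last entry is the Lagrange multiplier
    return sum(weights[i] * points[i][2] for i in range(n))

def loo_mse(points, gamma):
    """Leave-one-out cross-validation mean squared error."""
    errors = []
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append((ordinary_kriging(rest, (x, y), gamma) - z) ** 2)
    return sum(errors) / len(errors)

# Hypothetical data and variogram parameters.
square = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]
gamma = lambda h: spherical(h, 0.0, 1.0, 3.0)
centre_estimate = ordinary_kriging(square, (0.5, 0.5), gamma)
```

With a zero nugget the estimator honours the data exactly at sampled locations, which is why cross-validation removes each point before estimating it.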

Discussion and conclusion
In this paper, I confirmed the hypothesis of successful mapping using ANN. The mapping is based on depth mapping of the e-log marker "Rs5" in the Bjelovar Subdepression, as part of the Drava Depression in the Croatian part of the Pannonian Basin System. The best results were obtained with an ANN architecture of two hidden layers with 10 and 6 neurons, respectively. A multilayer perceptron was used, with every layer fully connected to the next one. A backpropagation algorithm was used.
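The architecture described above (two inputs, hidden layers of 10 and 6 neurons, one output, trained by backpropagation) can be sketched in plain Python as follows. This is not the original "R" algorithm. The tanh hidden activation, the linear output, the squared-error loss, the learning rate, the weight initialization and the toy training target are all assumptions, since the text does not specify them.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(net, x):
    """Return the activations of every layer, starting with the input."""
    acts = [list(x)]
    for index, layer in enumerate(net):
        prev = acts[-1] + [1.0]
        z = [sum(w * a for w, a in zip(neuron, prev)) for neuron in layer]
        is_output = index == len(net) - 1
        acts.append(z if is_output else [math.tanh(v) for v in z])
    return acts

def train_step(net, x, target, lr):
    """One backpropagation update; returns the squared-error loss before the update."""
    acts = forward(net, x)
    delta = [a - t for a, t in zip(acts[-1], target)]   # linear output layer
    for li in range(len(net) - 1, -1, -1):
        prev = acts[li] + [1.0]
        if li > 0:
            # Propagate the error through the old weights and the tanh derivative.
            below = [sum(delta[k] * net[li][k][j] for k in range(len(delta)))
                     * (1 - acts[li][j] ** 2) for j in range(len(acts[li]))]
        for k, neuron in enumerate(net[li]):
            for j in range(len(neuron)):
                neuron[j] -= lr * delta[k] * prev[j]
        if li > 0:
            delta = below
    return 0.5 * sum((a - t) ** 2 for a, t in zip(acts[-1], target))

def fit(net, data, epochs, lr):
    """Train for a number of epochs; return the mean loss of each epoch."""
    return [sum(train_step(net, x, t, lr) for x, t in data) / len(data)
            for _ in range(epochs)]

rng = random.Random(0)
sizes = [2, 10, 6, 1]                         # x, y -> 10 -> 6 -> depth
net = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(3)]
grid = [i / 3 for i in range(4)]
# Hypothetical smooth target on scaled coordinates, standing in for log-depth.
data = [((x, y), [0.5 * x + 0.25 * y]) for x in grid for y in grid]
history = fit(net, data, epochs=300, lr=0.05)
```

In practice the inputs would be the relative coordinates and the target the logarithmized depth, with the number of epochs and the learning rate tuned during the optimization step.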
The input dataset consisted of coordinates converted from the Gauss-Krüger coordinate system to relative coordinates, so that a better output is gained and the error is smaller. The depth values range from 260 to 3140 m and were reduced by a logarithmic transformation for the same purpose, i.e. to diminish the error. For datasets characterized by a linear relationship, neural networks should be compared with other, often successful, linear mapping methods. The best known of them is Kriging. Although ANNs are intended for the analysis of non-linear relationships, here the ANN gives good results in comparison with Kriging, as supported by the reliability and correlation analysis.
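The pre-processing steps can be sketched in Python as follows. The text does not state how the relative coordinates were derived, so subtracting the minimum x and y is an assumption, as are the function names; the two sample points are the extreme cases quoted in the text. The last helper shows the shifted grid (x+1, y+1) used to enlarge the dataset.

```python
import math

def to_relative(points):
    """Shift coordinates so the smallest x and y become 0 (assumed convention)."""
    x0 = min(x for x, _, _ in points)
    y0 = min(y for _, y, _ in points)
    return [(x - x0, y - y0, d) for x, y, d in points]

def log_depths(points):
    """Replace each depth in metres by its base-10 logarithm."""
    return [(x, y, math.log10(d)) for x, y, d in points]

def to_metres(log_depth):
    """Back-transform a network output to depth in metres."""
    return 10.0 ** log_depth

def shifted_grid(points):
    """Artificial prediction locations: every coordinate increased by 1."""
    return [(x + 1, y + 1) for x, y, _ in points]

# The two extreme points quoted in the text (Gauss-Krüger x, y and depth in m).
samples = [(6431993, 5059986, 900.0), (6429993, 5081986, 3060.0)]
prepared = log_depths(to_relative(samples))
new_locations = shifted_grid(samples)
```

The logarithm compresses depths of 260 to 3140 m into a range of roughly 2.4 to 3.5, and predictions have to be back-transformed with `to_metres` before errors in metres are computed.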
The ANN algorithm applied here could probably give valid output in mapping any other geological parameter (porosity, permeability, saturation, depth) in the area of the Bjelovar Subdepression. Of course, it needs to be trained on valid datasets, in which case the error could be low. The largest problems in successful network building are (1) dataset editing and (2) network parameter optimization. Both can be time-consuming. Here, the largest error in network training resulted from strongly differing depth values over relatively small horizontal distances (a few dozen metres), which mostly came from locations on opposite fault walls (footwall and hanging wall).
The ANN method used 511 input values of the original dataset and another 511 data from the new dataset (x+1, y+1). The Ordinary Kriging method used the original dataset with 511 data. The sum of the absolute errors of the most overestimated and the most underestimated value is 2163.06 in the ANN method (1189.34 + 973.72) and 2050.11 in Ordinary Kriging. The ANN contains a relatively small amount of data in comparison with Kriging, where the additional data are obtained by interpolation. Because of that, the ANN has a somewhat larger MSE (16294.5, compared with 14638.35 in Ordinary Kriging). Owing to the different way of mapping (cell estimation vs. interpolation of stratoisohypses), the neural map is grainy. Both problems can probably be solved by increasing the number of data cells in the neural network. In that case, there would be more data over a smaller distance, which would decrease the difference between adjacent values, and a "transitional" trend could be observed more easily by the ANN. Such a network would "learn" and, in spite of an increase in the overall MSE, its error would be smaller than that of Ordinary Kriging.
Here, I show that the application of ANN to mapping the depth of Neogene geological strata in the Bjelovar Subdepression can be a reasonable approach. It is recommended as probably the best approach if the data (e.g., geological variable and depth) are not strongly linearly related, i.e. if the linear correlation is not high and/or significant.
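The error measures used in this comparison can be reproduced with a short Python sketch. The function name is hypothetical, and the input is restricted to the two extreme ANN predictions quoted in the text, so only the sum of the extreme absolute errors corresponds to a reported figure; the MSE of the full cross-validation requires the complete dataset.

```python
def residual_summary(observed, predicted):
    """MSE plus the largest over- and underestimation (predicted - observed)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    largest_over = max(errors)
    largest_under = min(errors)
    return {
        "mse": sum(e * e for e in errors) / len(errors),
        "largest_over": largest_over,
        "largest_under": largest_under,
        "extreme_sum": abs(largest_over) + abs(largest_under),
    }

# The two extreme ANN cases from the text: observed and predicted depth in m.
summary = residual_summary([900.0, 3060.0], [2089.34, 2086.28])
```

The overestimation of 1189.34 m and the underestimation of 973.72 m add up to the 2163.06 reported for the ANN method.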

Figure 1: Geographical location of the Bjelovar Subdepression in the Drava Depression

Figure 3: Location map of input values across the regular grid used on the existing regional map of the e-log marker "Rs5" in the Bjelovar Subdepression

Figure 4: Flow chart of the ANN algorithm

Figure 5: E-log border "Rs5" depth map made using the ANN algorithm

Figure 6: Input and output relation diagram

Figure 7: Experimental variogram with the line connecting calculated values at variogram classes (lags)

Figure 8: Approximation of the variogram with a spherical theoretical model

Table 1: Randomly chosen ANN input and output data

Table 2: Reliability and correlation analysis results (*Cronbach alpha, †Kendall tau)