Predictive analytics as a tool of controlling in decision making process in the marina industry

This paper deals with predictive modeling based on predictive analytics using a computer application system, and with the use of the prediction results in decision-making processes. Prediction is usually based on the experience of decision makers; the aim of this study is to explain and demonstrate that predictive analytics based on machine learning achieves higher predictive efficiency and supports more accurate future-oriented business decisions. The marina industry in Croatia was chosen for this research because of its complexity and the need to predict, with reliable accuracy, future events that influence company success. The information for decision-making was obtained from a customer database recorded manually over the past 30 years, with data as of December 2020. Optimized prediction by the support vector machine and statistical theory based on Bayes' theorem are used to support more accurate prediction. The quantitative research was carried out using the SAP Predictive Analytics (SAP PA) computer application. The results of the prediction models provide a sound basis for making future-oriented strategic and tactical decisions. The research shows that, with the knowledge obtained from the results of prediction models, it is possible to improve the identification of the target group among applicants and customers who contribute to company success. The research provides a theoretical and an empirical contribution to the use of predictive analytics in the marina industry in Croatia.


Introduction
The development of information technology leads to the extensive digitalization of data and the use of digital data technologies in the administration, management and controlling of enterprises all over the world. This applies not only to routine work, but also to tactical and strategic planning and the resulting management decisions supported by controlling. The tasks of controlling have changed significantly, from the historical collection and evaluation of data to a decision support system. The possibility of using big data, the Internet of Things, machine learning and other tools of data science gives controllers the role of predictors [37], [54], [19]. However, this also requires controllers to ensure the consistency and comparability of diverse data and analysis models within the company [48,8-19], [49,160], [50]. Artificial intelligence systems need experts who supply them with decision-relevant data and parameterize them correctly based on their own experience. The interpretation of predicted results by specialists using their knowledge leads to the development of effective and efficient tactical and strategic plans and decision proposals [15,59], [22,94-95]. Modern controlling must therefore break new ground. Experience and expertise must be supplemented by the use of expert systems [23,362-366], [44,43]. This study follows this path. Pattern recognition in the data and the related development of forecast models are the basis for a system of quick and sustainable decisions.
The figure above gives an overview of the progress of reporting in controlling. Traditionally, controlling management received standard reports. The flood of numbers in these reports was summarized using graphics to visualize the core of the statements. With further development of the reporting system, it became possible to select ad hoc figures and calculate key figures. Additional information systems were created for better visualization and faster recording of the key figures relevant to decision-making. This area also includes the Balanced Scorecard developed by Norton and Kaplan [29]. With the development of systems based on a business warehouse, a data pool became available into which data can be extracted from different systems. This enables interdisciplinary reporting. The entire know-how of a company was incorporated in a single reporting system, called Business Intelligence (BI). Given that only the characteristics and key figures are extracted from the source systems, the name Online Analytical Processing (OLAP) was defined. The usability of the reporting systems was developed to such an extent that experts were no longer necessary for reporting. Management could serve itself. User-friendly data exploration and agile visualization with dynamic dashboards support the fast and up-to-date evaluation of business developments [25,83]. However, in all these cases, the reporting concerns historical data. A look into the future is essential if a company wants to remain competitive. This raises the question of why predictive analytics is better. The availability of big data makes it possible to discover relationships between the data and to prove them [31], [46], [47,863]. Another aspect is the availability of data from social media. With these data-as-a-service offers, companies can make their marketing campaigns more intelligent. Interrelationships between the company and market processes can be made visible and useful for business decisions.
In this way, suitable target groups for the company can be better identified and marketing budgets can be used in a target group-oriented manner.
The subject of this research is the marina industry in Croatia. The marina industry is dynamic and complex, and it is a very important competitive product of Croatia that tends towards sustainability [21], [26]. Given the length of the coast, nautical tourism is a very competitive sector in Croatia, with high turnover in the past few years [21], [26,59-66], [28,175-186], [33], [41], [45]. However, it is necessary to improve the management of marinas in the direction of more affordable pricing of berths, raising the level of service in the marinas and making other improvements necessary for a better positioning in the Mediterranean. Several marina companies are competing. Therefore, offers must be made available to the interested parties in a short time. Long decision-making processes about the allocation of berths lead to applicants turning to competitors. The decision as to which prospective customer should be offered a berth depends on how reliable the prospective customer is and what agreement duration and agreement volume can be expected. Two tasks had to be mastered to achieve this goal. The first task was to understand the data available. With an understanding of the data, the parameters for generating prediction models in the computer application could be set in such a way that the explanatory variables with the highest significance could be found. The second task was to create the prediction models and evaluate the results. The evaluation of the prediction models referred to performance indicators measuring the prediction confidence and predictive power of a model. These indicators are explained in the following chapter.

Theoretical background
The explanations of the theoretical background first refer to the statistical foundations of machine learning, developed as statistical learning theory by Vladimir N. Vapnik. In the second step, statistical learning theory was implemented in a programmed algorithm, the machine learning algorithm of the support vector machine. The third aspect is the transfer of statistical learning theory to this machine learning algorithm. Machine learning is therefore used for pattern recognition in big data. Pattern recognition is an important theoretical subject for recognizing hidden information in the data. It combines operational research and machine learning. Operational research simulates and optimizes business processes using algorithms and quantitative methods. Machine learning uses so-called learning algorithms to generalize recognized patterns in the data. This generalization means that the recognized patterns can also be transferred to future data. Therefore, reliable statements can be made about future events when future data occur [1,22]. The relationships between the observed characteristics contained in the data can be very complex. This high complexity reduces the informative value of identified relationships, since the associated boundary conditions are too specific. The complexity reduction required in statistical theory was developed by Vladimir N. Vapnik through his statistical learning theory. The basic idea is to replace the true risk in forecasting with the empirical risk. If reality is described by a model that is too complex, the model cannot be applied to new future data because it does not generalize enough. This lack of transferability of complex models was addressed by Vapnik, who found a way to reduce the non-robustness and non-reliability of prediction models with the invention of statistical learning theory [55], [56], [57].
It leads to a compromise between data quality and robustness. The next step was the implementation of statistical learning theory in programmable learning algorithms. The algorithm used in the computer application in this research is the support vector machine. In the simplest of terms, pattern recognition is the separation of marked patterns into two classes. The support vector machine (SVM) is an algorithm with linear programming doing just that [10], [11,2-10], [17], [24], [30,323], [32,22]. When classification is used for pattern recognition, a set Q consisting of several classes q_i is used. The classification approximates the assignment of input variables x_i to discrete output variables y_i, which define the class q_i. The variable x is then assigned to the class with the highest probability. The assignment of characteristics to a class is done by a classifier, which is an algorithm in the form of a mathematical-statistical function. The results of the prediction models are used to investigate decision-making behavior. Following Bayes' theorem, a decision based only on previous knowledge (a priori) has lower correctness than a decision that is based on additional decision-relevant information and is therefore defined as an a posteriori decision [3], [4], [5,80-83], [9], [12,260-274], [16,376], [51], [52,117-122], [53].
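The difference between an a priori and an a posteriori decision can be illustrated with a minimal Python sketch of Bayes' theorem. All numbers are hypothetical and chosen only for illustration: the prior probability of an agreement extension (0.40) and the two conditional probabilities of observing a sailing boat are assumptions, not values from the research data, and the function name posterior is likewise an illustrative choice.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    # Total probability of the evidence E over both hypotheses.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical a priori probability that a customer extends the agreement.
prior = 0.40
# Hypothetical probabilities of observing "vessel type = sailing boat"
# among customers who extend and among customers who do not extend.
p_sail_given_extension = 0.70
p_sail_given_no_extension = 0.30

# A posteriori probability of an extension once the vessel type is known.
post = posterior(prior, p_sail_given_extension, p_sail_given_no_extension)
print(f"a priori: {prior:.2f}, a posteriori: {post:.2f}")
```

Under these assumed numbers, the additional information about the vessel type raises the probability of an extension from 0.40 to about 0.61, which is the sense in which the a posteriori decision is better informed.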
The subject of this research is the marina industry in Croatia. The scenario concerns management decisions, and the controllers' support for these decisions, regarding the conclusion of berth agreements with applicants. The objective of this research is the creation of predictive models of the behavior of applicants and customers of the marina industry in Croatia, providing a future-oriented data basis. The expected customer behavior relates to the conclusion of long-term berth rental agreements and to the parties' credit rating. A distinction is made between corporate customers and private customers. These research objectives are also the target variables of the generated prediction models.
The database comprises the customer data of a marina company in Croatia. These data essentially represent the explanatory variables for a prediction model, such as the age and citizenship of the applicants, but also the length and age of their vessels. When the model is generated, the program runs the learning algorithm, which is the support vector machine. When the machine learning process is initiated, methods of the computer application such as clustering or classification are set. Characteristic values that are in a similar correlation to the target variable are grouped into clusters or classes. Classification is what is known as supervised learning, since the classes are predefined. In the case of clustering, on the other hand, the algorithm itself forms the clusters. This is called unsupervised learning. If the target variable is a continuous variable (e.g. sales volume), the computer application uses regression. Classification is used if the target variable is a binary variable (e.g. extension of the contract YES or NO).
The research aims to calculate the probability that, under certain conditions, applicants and later customers of the marina company will extend their rental agreement for a berth. In generating prediction models, agreement extension is the central target variable of the model. The explanatory variables result from customer data, which were kindly made available by the marina company in Croatia. For the prediction of agreement extension, the main explanatory variables turned out to be the vessel length, vessel age, vessel type and certain marinas preferred by the customers.
The applied quantitative research was performed using the computer application SAP Predictive Analytics (SAP PA). The support vector machine algorithm was implemented by the company KXEN as the KXEN algorithm. Using SAP PA, the prediction models were created by setting the parameters in such a way that prediction results of the best possible quality could be generated. The parameterization consisted of setting the target variable and excluding irrelevant variables that could contaminate the result. This was done through the successive evaluation of the key performance indicators KI and KR, which measure the quality of the prediction model. The performance of the predictive models can be measured by two key performance indicators [6], [7], [8]:
1. Predictive power (KI) characterizes the accuracy of the model, which indicates the influence of the explanatory variables on the target variable. However, not all available explanatory variables of the database should be used. Only the essential variables that contribute to the target variable should be considered. These are the variables with the highest correlation to the target variable.
2. Prediction confidence (KR), or reliability of the prognosis, characterizes the statistical robustness of the model. KR has a range of 0.0 to 1.0. Only a value above 0.95 indicates a robust model [8,124]. A model is robust when the same quality of prediction can be expected using new data with an unknown result [1,31].
The abbreviations KI and KR come from the American company KXEN Inc., which was founded in 1998. KI stands for "KXEN Information Contribution" and KR stands for "KXEN Robustness". The results of the forecasts are visualized in graphs of the model. The estimation curve results from the model calculation. The random curve is the result of randomly observed values. This means that, forecast-wise, it is coincidental whether an observed value was found or not. The probability of a "hit" is therefore 50% for the random curve. The definition of a forecast model should increase the probability of finding a "hit", if possible, to more than 95%. The robustness of the forecast model, that is, its applicability to statistical data whose result is unknown, is measured by the KR calculation.
The performance indicators KI and KR are explained using the numerical example in the figure above. A forecasting model contains the parameters with which the target customers, i.e. those who are interested in the company's products, should be found. The model parameters include personal characteristics such as age, education, occupation and marital status. The forecast model calculates the influence of these parameters on the interest in the company's products. Once the key parameters have been identified, product marketing can be targeted. In the following example, the total population is 18 million people. The target group in the population is about 6 million people, which is 33%. If the selection of members of the population to find the target group was random, only 50% of the target group would be found. If the prediction results were used to select members of the population, the hit rate would be significantly higher. If a perfect prediction model were generated, the whole target group of 33% of the total population would be found with the selection of 33% of the total population. From a business perspective, target customers ... The areas A + B + C shown in the previous figure represent the population. The ratio of the area of the forecasting model C and the perfect model B to the population A + B + C is the measure of the predictive accuracy of the model and thus the result of model training (estimation). The ratio of the area of the forecasting model C to the population A + B + C is a measure for the verification of the forecasting accuracy of the model (validation). As part of the validation process, the predictions calculated by the model are checked against the known statistical results to see whether they can be confirmed. The robustness of the forecast model, that is, its applicability to statistical data whose result is not known, is determined via the calculation of KR, which results as:

$$\text{Prediction Confidence (KR)} \approx 1 - \frac{B}{A + B + C}$$
The areas are calculated mathematically using the definite integral. The computer application uses the so-called rectangle method as an approximation. For this purpose, the area under the function is divided into a large number of rectangles whose areas are summed up. This yields one sum of rectangles that lie above the function and one sum of rectangles that lie below the function. The mean value is formed from both sums. The curve is named the ROC curve (Receiver Operating Characteristic) [34].
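The rectangle method and the KR formula can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in SAP PA: it assumes a monotonically increasing curve, so that left-endpoint rectangles lie below the function and right-endpoint rectangles lie above it, and the function names as well as the example areas A, B and C are hypothetical.

```python
from typing import Callable, Tuple


def rectangle_sums(f: Callable[[float], float], a: float, b: float,
                   n: int = 1000) -> Tuple[float, float]:
    """Lower and upper rectangle sums of an increasing function f on [a, b]."""
    width = (b - a) / n
    # Left endpoints give rectangles below an increasing curve.
    lower = sum(f(a + i * width) for i in range(n)) * width
    # Right endpoints give rectangles above an increasing curve.
    upper = sum(f(a + (i + 1) * width) for i in range(n)) * width
    return lower, upper


def rectangle_area(f: Callable[[float], float], a: float, b: float,
                   n: int = 1000) -> float:
    """Area estimate as the mean of the lower and the upper sum."""
    lower, upper = rectangle_sums(f, a, b, n)
    return (lower + upper) / 2.0


def prediction_confidence(area_a: float, area_b: float, area_c: float) -> float:
    """KR is approximately 1 - B / (A + B + C)."""
    return 1.0 - area_b / (area_a + area_b + area_c)


# Area under the diagonal y = x on [0, 1], used here as a simple check.
area_diagonal = rectangle_area(lambda x: x, 0.0, 1.0)

# Hypothetical areas, chosen only to illustrate the formula.
kr = prediction_confidence(area_a=0.10, area_b=0.01, area_c=0.22)
print(f"area under diagonal: {area_diagonal:.3f}, KR: {kr:.3f}")
```

With these assumed areas, KR is about 0.97 and therefore above the robustness threshold of 0.95 mentioned earlier; a larger area B would lower KR.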

Methods
For the competitive Croatian marina industry, it is a decisive advantage to be able to make reliable predictions about future customer behavior, since these allow a reliable forecast of sales development. The results of this research show that reliable statements about customer behavior can be made by using predictive analytics. They also show which of the large number of customer characteristics have a significant influence on the customers' decisions. It is important to filter out less important features, because no decision-making constellations would be possible with an unmanageable number of influencing factors.
Partitioning method of clustering. The partitioning method of clustering divides a quantity of data into k clusters, whereby each observed characteristic belongs to exactly one cluster. The features observed are points in an n-dimensional Euclidean vector space. A cluster is represented by a centroid. The goal of the partitioning is to divide the data set into k partitions in such a way that the sum of the squared distances from the cluster centroids μ_i is minimal. Mathematically, this corresponds to the minimization of the function [14,51]:

$$SSE = \sum_{i=1}^{k} \sum_{x_j \in C_i} \operatorname{dist}(x_j, \mu_i)^2$$

SSE: sum of squared Euclidean distances to the cluster centroids.
C_i: cluster i.
x_j: observed value of the explanatory variables.
μ_i: centroid of cluster i.
dist(x_j, μ_i)^2: squared distance of the observed value x_j to the cluster centroid μ_i.

The smaller the sum of the distances between the observed values and the centroid of a cluster, the more compact the cluster is. The variables used in this research are: x_1 = Vessel Type; x_2 = Vessel Age; x_3 = Vessel Length; x_4 = Marina. No Filter. Using the learning machine SAP Predictive Analytics, the contract extension was set as the target variable.
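The SSE criterion can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the procedure of the computer application: only the two numeric variables vessel length and vessel age are used, the four data points and their assignment to two clusters are hypothetical, and the helper names centroid and sse are illustrative.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of the points of one cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sse(clusters: Sequence[Sequence[Point]]) -> float:
    """Sum of squared Euclidean distances of all points to their centroid."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(dist(x, mu) ** 2 for x in points)
    return total


# Hypothetical observations: (vessel length in meters, vessel age in years).
cluster_1: List[Point] = [(10.0, 12.0), (12.0, 14.0)]
cluster_2: List[Point] = [(20.0, 3.0), (22.0, 5.0)]

compact = sse([cluster_1, cluster_2])
# Putting all points into one cluster gives a much larger SSE.
single = sse([cluster_1 + cluster_2])
print(f"SSE with two clusters: {compact:.1f}, with one cluster: {single:.1f}")
```

The comparison of the two partitions shows why a smaller SSE indicates more compact clusters: the same four points are far closer to two separate centroids than to one common centroid.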
Method of Machine Learning. The process of generating the models in this research is divided into a learning phase (ESTIMATION) and a validation phase (VALIDATION). The system divides the data set randomly into two subsets:
- Estimation subset. In the first step, the variables which have not been excluded manually are tested for their significance to the target variable. If the significance is zero or very low, the algorithm excludes the variables with the lowest significance as explanatory variables. A parameter can be used to set the degree of correlation at which an explanatory variable should be excluded. In the next iteration, a new model is created using the remaining explanatory variables.
- Validation subset. In the next step, the created models are evaluated using the validation data subset. The best model, with the highest values for predictive power KI and prediction confidence KR, is selected for the third step.
The cutting strategy can be set manually. Chabert et al. [8,163] propose using the default cutting strategy. Generally, 75% of the data records are assigned to the estimation data subset and 25% to the validation data subset. A model with minimum complexity h is created first. The errors of the predicted values of the target variable relative to the actual values known in the training data set are calculated. After the first model, a second model with complexity h + 1 is generated by including additional variables. The second model is validated. If the error rate decreases, a third model is generated with complexity h + 2. As soon as the error rate increases with another generated model, the previous model is determined by the algorithm as the best predictive model. In the third step, the performance indicators are calculated for the best model using the test data set [30], [31], [32].
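The two mechanisms described above, the random 75/25 split and the stopping rule for increasing model complexity, can be sketched in Python as follows. This is a simplified illustration and not the KXEN algorithm itself: the function names, the fixed random seed and the list of validation error rates are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_estimation_validation(records: Sequence[T],
                                estimation_share: float = 0.75,
                                seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly assign records to an estimation and a validation subset."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * estimation_share)
    return shuffled[:cut], shuffled[cut:]


def select_complexity(validation_errors: Sequence[float]) -> int:
    """Index of the model kept when complexity grows step by step.

    validation_errors[i] is the error rate of the model with complexity h + i.
    As soon as the error rate increases, the previous model is returned.
    """
    best = 0
    for i in range(1, len(validation_errors)):
        if validation_errors[i] > validation_errors[i - 1]:
            break
        best = i
    return best


# 100 hypothetical record identifiers split 75% / 25%.
estimation, validation = split_estimation_validation(range(100))

# Hypothetical error rates for models of complexity h, h + 1, h + 2, h + 3.
errors = [0.30, 0.22, 0.18, 0.21]
best_index = select_complexity(errors)
print(len(estimation), len(validation), best_index)
```

In this example the error rate rises at the fourth model, so the third model (complexity h + 2) is kept. Fixing the seed only makes the sketch reproducible; it is not implied by the description above.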
Gaining knowledge from predictive models. In times of a global pandemic, which has been spreading worldwide since 2020, it has become more difficult to acquire new customers. In this context, and primarily with the goal of maintaining long-term contractual relationships with customers, it is interesting to analyze the characteristics of private applicants and customers who tend to extend existing agreements. With knowledge of the customer characteristics, marketing can be designed for this specific target group. Management gets a better basis for calculating the expected future sales volume if it knows which of the existing agreements will be extended [35]. If, in the current difficult times of a pandemic, the focus is on extending the existing agreements, it is important to identify the customers who are most likely to extend their agreements. It is hypothesized that applicants and customers have similar characteristics with which the target group, customers with the intention of long-term contracts, can be identified. The investigation focuses on the explanatory variables for vessels. The reason is that the system merged the customer groups PRIVATE and FIRM into a single group, so this feature has no influence on the target variable.

Research results
The significance of explanatory variables for the target variable "agreement extension". The sailing boat vessel type is predominant in terms of long-term customer relationships. This could be explained by the fact that not every marina is suitable for sailing boats. Once customers have found a marina with good wind conditions, they are more likely to hold onto the berth. The catamaran and motorboat vessel types have a negative impact on the extension of existing agreements. Catamarans play a subordinate role regarding the proportion of vessel types. Motorboats are less dependent on wind conditions than sailing boats. Regarding the size of the vessels, the result is the same as in previous investigations. Medium-sized vessels in the range of 9 to 15 meters are clear favorites when it comes to extending existing agreements. Boats of this length represent 77% of the total. This could also be an input for berth dimensioning. Owners of sailing boats, who prefer marinas suitable for sailing boats, could be shown to be interested in free berths with a compatible berth length. Another aspect is the inclination towards charter boats. A study of the average length of chartered boats would be helpful here. In addition to owners of medium-length boats, owners of middle-aged boats are also among the strongest groups of clients who tend to extend existing agreements. Owners of newer boats are evidently more inclined to explore new areas and call at other ports accordingly.
It is interesting to examine the ports with an advantage in agreement extension. The preferred ports for agreement extensions are, for example, marinas with good wind conditions. It can be concluded that good wind conditions are of great importance for sailboats. A marketing campaign aiming to extend the agreements should be addressed to clients who use sailing boats between 9 and 15 meters long and between 11 and 34 years old. The preferred marinas should be advertised. Note: The names of the marinas are not given here in order to maintain confidentiality. However, the properties shown should not discourage continuous monitoring and updating of the results as new data sets become available, with corrections made if necessary. Decisions and tendencies on the allocation of berths, as well as the concept of marketing campaigns, can be made according to the target group. Finally, controlling can provide a better basis for company planning of the expected sales volume.

Discussion
Predictive analytics is an innovative tool that helps controlling make future-oriented decisions. Reliable knowledge of customers' decisions regarding the conclusion of agreements and agreement extensions provides a better basis for calculating future sales. A secured sales forecast and sales planning, in turn, provide a very good basis for calculating the development of financial opportunities and the associated investment opportunities. A company that can calculate its business development more precisely using innovative methods has a clear competitive advantage.
Outlook: The following projects could further develop these research results:
- A guideline-based questionnaire could test whether a posteriori decisions made with the knowledge of the prediction model are better [3], [4], [5], [9], [28].
- Continued analysis of further decision-making processes, for example in the planning of marketing campaigns, the search for suitable locations for new marinas or the question of a combination of individual products such as a charter boat, berth, skipper and maintenance service for the ships.
- Surveying other marina companies, national and international. Such survey coverage could further increase the applicability of stochastic decision models.
- Use of modern forecast information systems, for example SAP ANALYTICS CLOUD.
However, it should be emphasized that the concept of a decision logic in the form of stochastic decision models enables the models to be programmed in an information application system and thus used as a decision support system for controlling and management. This would make a further contribution to digitalization in controlling.

Funding:
The research presented in the manuscript did not receive any external funding.