INVOLVING ENVIRONMENTAL INFORMATICS IN CROATIAN TECHNICAL STUDIES

Original scientific paper. The paper presents research on degree students' knowledge of Environmental Informatics. The research covered a sample of 31 students who anonymously responded to 10 survey questions about environmental informatics. The survey results were statistically analysed to draw conclusions about the students' knowledge in the technical field of Environmental Informatics. The results were not satisfactory, so the research was expanded by surveying the students' university teachers to investigate the reason for the given results. The research shows insufficient use of methods and algorithms for data management, such as data structures, data mining and storage, and of special statistical methods and algorithms for business rules modelling for the collection and manipulation of data. The role of big data in this area is important for managing time series of data and for a comparative analysis of the observed systems. Post hoc analysis of data mining is not available, especially when it comes to recognising the behaviour of systems and their components.


Introduction
The main research problem of this article is the involvement of Environmental Informatics in Croatian technical studies.
Environmental informatics emerged in the early 1990s in Central Europe as a set of initiatives to effectively manage, share, and reuse environmental and ecological data [13]. Examples of such initiatives are the National Science Foundation DataNet projects, DataOne and Data Conservancy [19].
Information science provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences for processing and storing data and information for integration of knowledge about the environment and the environmental impact assessment [6].
The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information" [18].
Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs" [10]. Karatzas explains that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information technology and software methods and tools for serving environmental engineering needs" [19].
Many scientific institutions deal with environmental informatics. In Croatia, however, Environmental Informatics is rarely applied, including the conceptualisation and parameterisation needed for modelling environmental objects and systems. Currently, most Croatian scientific journals and environmental institutions deal with environmental informatics only sporadically.
The authors' contribution is to promote Environmental Informatics in technical science and in the professional community in order to establish environmentally friendly models. The results of this research point to the need to increase knowledge of Environmental Informatics and to involve it in technical science.

Applying environmental informatics
According to this methodology, the research outlined below was planned and carried out. The study was planned for a sample of 31 students. Because of the sample size, statistics based on Student's t-distribution are used.
A number of computer-based modelling techniques have been developed for studying environmental management systems, providing decision support based on the conceptualisation, parameterisation and optimisation of complex systems. Numerous comprehensive decision support systems (DSS) describe real-world complex engineering systems and problems. Such computer-based systems have interactive graphics to capture the dynamic characteristics of the systems under study, directly addressing issues of system conceptualisation and parameterisation in problem-solving processes [7].
Each process begins with data collection. Information technologies, such as knowledge acquisition, data mining, uncertainty analysis, and expert system technologies, help to increase data integrity and the reliability of decision-making. Environmental management systems generally have multi-objective, interactive, dynamic and uncertain features. Complexities exist in the determination of system parameters, the reflection of interactive relationships, the formulation of modelling approaches, the interpretation of research outputs, and the implementation of recommended policies.
There are many challenges in collecting data for environmental management. Most environmental models suffer from problems of data quality and availability. The collection of statistical data from the environment is fraught with difficulties, due to the wide range of environmental phenomena, data sources and participating agencies, as well as the complexity of their temporal and spatial characteristics [1].
The insufficiency of environmental data and of records of its quality, as well as the lack of information, often hinders the development of environmental management strategies. Online environmental database management systems and decision support systems have an impact on policy decision-making and on a friendly system environment. In Croatia, such online systems include NATURA 2000, the Environmental Protection Agency (Croatian: Agencija za zaštitu okoliša, AZO) and the Meteorological and Hydrological Service (Croatian: Državni hidrometeorološki zavod, DHMZ). All of these systems operate at the government level.

Collecting data and data processing
Fundamental methods of data collection use appropriate databases and data available from the relevant literature or from surveys. For the purposes of this study, a survey was used as the method of data collection.
Databases are often filled and updated manually, which causes errors, including missing data [3]. The development of information technologies, with a special emphasis on research methods for gathering and analysing data and for storing and accessing it, has significantly enhanced laboratory methods and their reports. Together with computer-aided laboratory analyses, it allows more sophisticated and accurate analysis and easier data management. All of this affects the quality of the data and of the research itself, and provides a stable base for the treatment and replacement of missing data (incorrect handling of missing data and their implausible generation can lead to erroneous conclusions). Therefore, it is important to use adequate methods to handle missing data [3]. In statistical terms, missing data occur when a value of an observed variable is not present. In a survey, for example, refusing to answer a question results in missing data. During data collection by laboratory analysis or measurement, data may be lost for a certain period of time. This can happen because the examiner dropped some measurement data, because the data were not collected during a period of time, or because the measurement was not conducted at all (e.g. on holidays or Sundays). Missing data reduce the representativeness of the sample and can therefore distort the conclusions derived from it. It is thus necessary to take further action to prevent actual values from going missing from the collected data [3].
G. B. Durrant classified missing data according to the reasons why the data is missing [5]:
- missing completely at random (MCAR),
- missing at random (MAR) and
- not missing at random (NMAR).
Knowing and understanding why data is missing may help the analysis of the remaining data. If values are missing at random, the sample is still representative. However, if values are systematically missing, the results will probably be incorrect. It is therefore important to determine the type of missing data from the cause of the missingness. Values are missing completely at random (MCAR) if the missingness occurs completely independently, that is, it depends neither on the observed variables nor on the parameters of interest [14].
The missing at random (MAR) category is an alternative, and applies when the missing data is related to a particular variable, for example, an accidentally skipped answer in the questionnaire [12]. Not missing at random (NMAR) is the category describing data that is missing for a reason, such as intentionally skipped questions in a questionnaire related to income, health, etc. Moreover, missing data can be univariate, which means that it occurs in only one response variable, or multivariate, if it appears in more than one response variable [5].
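The three mechanisms can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes a hypothetical survey with an always-observed variable `age` and a possibly missing variable `income`, follows the common textbook reading of MCAR, MAR and NMAR, and uses illustrative probabilities throughout.

```python
import random


def simulate_missingness(n=5000, seed=0):
    """Build hypothetical survey rows and one missingness mask per mechanism.

    masks[name][i] is True when the income of row i would be missing.
    All variables and probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = [
        {"age": rng.randint(18, 70), "income": rng.gauss(1000, 300)}
        for _ in range(n)
    ]
    masks = {
        # MCAR: every value has the same chance of being missing.
        "MCAR": [rng.random() < 0.2 for _ in rows],
        # MAR: missingness depends only on an observed variable (age).
        "MAR": [rng.random() < (0.4 if r["age"] < 30 else 0.1) for r in rows],
        # NMAR: missingness depends on the unobserved value itself.
        "NMAR": [rng.random() < (0.5 if r["income"] > 1200 else 0.05) for r in rows],
    }
    return rows, masks


def observed_mean(rows, mask):
    """Mean income over the rows whose income is not missing."""
    values = [r["income"] for r, missing in zip(rows, mask) if not missing]
    return sum(values) / len(values)


rows, masks = simulate_missingness()
true_mean = sum(r["income"] for r in rows) / len(rows)
for name, mask in masks.items():
    print(name, round(observed_mean(rows, mask) - true_mean, 1))
```

In this setup, the NMAR mask removes high incomes more often, so the mean of the remaining values is pulled below the true mean, which is the kind of distortion the text warns about.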
Gotal Dmitrović, L. et al. [3] report research results on MAR environmental data, using goodness-of-fit tests to compare the Listwise Deletion Method and six Single Imputation Methods: Last Observation Carried Forward (LOCF), Hot-Deck Imputation, Group Mean Imputation, Estimated Mean Value Imputation (Regression), Mode Imputation Method and Median Imputation Method:
1) When a small number of values are missing, all methods show good agreement of the probability distributions according to descriptive statistics. However, when 25 % of all values are missing, imputation by the mean, median or mode shows a large deviation. Thus, the probability distribution of the data obtained does not fit the empirical distribution.
2) Interestingly, the Listwise Deletion Method, which is the simplest method, gives a very good match with the probability distributions, followed by the Hot-Deck Imputation Method and the Last Observation Carried Forward method. The Regression method strongly flattens the probability distribution by decreasing the maximum probability and increasing the minimum probability values of the distributions. Last Observation Carried Forward (LOCF), like Listwise Deletion, closely follows the observed distributions, with the exception of a deviation from the actual value at the peak of the distribution [3].
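A few of the methods named above can be sketched in a handful of lines. The example below is a minimal illustration on a univariate series, not the procedure used in [3]; the `readings` values are hypothetical, missing values are represented as `None`, and Hot-Deck and regression imputation are left out because they need donor records or a fitted model.

```python
from statistics import mean, median, mode


def listwise_deletion(series):
    """Drop every record whose value is missing."""
    return [x for x in series if x is not None]


def locf(series):
    """Last Observation Carried Forward: reuse the most recent observed value.

    Leading missing values stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def impute_with(series, statistic):
    """Replace each missing value by a statistic (mean, median or mode)
    computed from the observed values."""
    fill = statistic(listwise_deletion(series))
    return [fill if x is None else x for x in series]


# Hypothetical measurements with two gaps.
readings = [7.1, None, 7.4, 7.4, None, 8.0]

print(listwise_deletion(readings))
print(locf(readings))
print(impute_with(readings, mean))
print(impute_with(readings, median))
print(impute_with(readings, mode))
```

Mean, median and mode imputation all insert one repeated value, which concentrates probability mass at a single point; this is consistent with the large deviations reported above when a quarter of the values are missing.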

Conceptualization and parameterization - theory of complex systems
To achieve the authenticity of the model, it is necessary to incorporate the actual system behaviour, using values determined from the theoretical probability distribution of the frequency of every particular event. Using the same applications, the characteristic theoretical distribution of the real data, as well as the basic descriptive statistics, were obtained at each checkpoint [4].
After processing the data (determined from the theoretical distribution), a conceptual model was developed using the activity cycle diagram (ACD, also written as diagram cycle activity, DCA) as well as the Ishikawa diagram.
There are many known modelling paradigms for describing the dynamics of a system from the process-oriented, event-based and activity-based viewpoints. In activity-based modelling, the dynamics of a system is represented as an ACD (activity cycle diagram), which is a network model of the logical and temporal relationships among the activities [15]. The activity cycle diagram (ACD) is a method for describing the interactions of objects in a system. It uses a common graphical modelling notation to explain series of activities in diverse real-life circumstances.
Ishikawa diagrams (also called fishbone diagrams, herringbone diagrams, cause-and-effect diagrams, or Fishikawa) are commonly used diagrams created by Kaoru Ishikawa (1968) that show the causes of a specific event [8]. Causes are usually grouped into major categories to identify sources of variation. Cause and Effect Analysis gives a useful way of considering all possible causes of a problem, rather than just the ones that are most obvious [9]. Although it was originally developed as a tool for quality control, the technique can be used for other purposes. For instance, an Ishikawa diagram can be used to: 1) discover the root cause of a problem; 2) uncover bottlenecks in processes.
A mathematical model of a real system is usually based on a system of differential or difference equations. The Runge-Kutta method or a higher-order numerical method is commonly used to solve the system of differential equations [4]. A mathematical modelling framework may even be applied to develop a system of model components that uses differential and difference equations simultaneously.
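As an illustration of the numerical step, the sketch below implements the classical fourth-order Runge-Kutta scheme for a single ordinary differential equation. The test equation, a first-order decay dy/dt = -k*y with a hypothetical rate constant, is an assumption chosen because its exact solution is known; it is not a model from the study.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y by one step of size h with the classical 4th-order Runge-Kutta scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def solve(f, t0, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from t0 to t_end using a fixed number of steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Hypothetical example: first-order decay of a concentration, dy/dt = -k * y.
k = 0.3
approx = solve(lambda t, y: -k * y, t0=0.0, y0=10.0, t_end=5.0, steps=50)
exact = 10.0 * math.exp(-k * 5.0)
print(abs(approx - exact))
```

For a system of equations the same step applies component-wise, with `y` and the `k` terms replaced by vectors.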
System dynamics is an approach to understanding the nonlinear behaviour of environmental systems over time using stocks and flows, internal feedback loops and time delays [17]. John Sterman, in his book "Business Dynamics: Systems Thinking and Modeling for a Complex World", wrote: "System dynamics is a perspective and set of conceptual tools that enable us to understand the structure and dynamics of complex systems. System dynamics is also a rigorous modeling method that enables us to build formal computer simulations of complex systems and use them to design more effective policies and organizations. Together, these tools allow us to create management flight simulators - microworlds where space and time can be compressed and slowed so we can experience the long-term side effects of decisions, speed learning, develop our understanding of complex systems, and design structures and strategies for greater success" [16].
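A single stock with one inflow and one feedback-controlled outflow is about the smallest system dynamics model. The sketch below assumes a hypothetical pollutant stock with a constant inflow and an outflow proportional to the stock, integrated with simple Euler steps; the parameter values are illustrative and time delays are not modelled.

```python
def simulate_stock(initial, inflow, residence_time, dt, steps):
    """Simulate one stock with a constant inflow and a proportional outflow.

    The outflow grows with the stock (a balancing feedback loop), so the stock
    settles at the equilibrium inflow * residence_time.
    """
    stock = initial
    history = [stock]
    for _ in range(steps):
        outflow = stock / residence_time  # feedback: depends on the stock itself
        stock += (inflow - outflow) * dt  # Euler integration of the net flow
        history.append(stock)
    return history


# Hypothetical values: 5 units per time step of inflow, residence time of 4.
history = simulate_stock(initial=0.0, inflow=5.0, residence_time=4.0, dt=0.1, steps=1000)
print(history[-1])
```

The equilibrium level is set by the flows, not by the initial stock, which is the kind of structural insight that stock-and-flow models are meant to expose.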
The developed model must be verified by means of statistical tests to determine whether it describes the real system well. The tests involve the calculation of a statistic whose distribution is known under the null hypothesis. After the goodness of the model was confirmed, the actual system was designed to experiment with parameters or data. The environmental data was transferred to the adaptive model under various conditions, by means of visualisation and manipulation of the basic system parameters, which are portable and adaptable to possible crisis situations [4]. Advances in performance computing have shown great potential to improve prediction accuracy in the practice of environmental systems modelling. The design and implementation of effective environmental policies need to be informed by a holistic understanding of the system processes (biophysical, social and economic), their complex interactions, and how they respond to various changes. Models that integrate different system processes in a unified framework are seen as useful tools to help analyse alternatives with stakeholders, assess their outcomes, and communicate results in a transparent way. The approaches considered are: system dynamics, Bayesian networks, coupled component models, agent-based models and knowledge-based models (also referred to as expert systems).
When choosing the type of modelling approach, it is important to consider three main questions: What is the purpose of the model? What types of data are available to develop and specify the model? And who are the model users, and what requirements are there on the scales and formats of model outputs? Kelly et al. [13] point out that the most appropriate modelling approach depends on: model purpose, types of data available, system conceptualisation (treatment of space, time and entities or structure) and treatment of uncertainty. The guiding framework for selecting the most appropriate modelling approach is represented in the form of a decision tree depicted in Fig. 1 [11].
Information technologies are important in the sustainability-based decision-making process. A typical computer-based technology that has been widely used in assisting environmental systems analysis is the geographic (or geographical) information system (GIS). GIS is effective in handling the complicated spatial information that is essential for many environmental studies. Many GIS-aided environmental modelling and decision-support systems have been developed, such as ECOLECON, TERRA-Vision, SYLVATICA, ALBE GIS and AVS, RELMdss and UVIEW.
Technical Gazette 24, 6(2017), 1869-1875
Figure 1 Decision tree for selecting the most appropriate integrated modelling approach under standard application [11]
Figure 2 Attributes and indicators in questions for students
Remote sensing (RS) is another important computer-based technology for supporting environmental systems modelling and systems analysis. Most RS projects produce large volumes of spatial information, while GIS is an effective tool for storing, manipulating and analysing it. Consequently, a number of integrated environmental modelling and RS-GIS studies have been reported. Recent advances in the technical integration of GIS and RS with the global positioning system (GPS) and database management systems (DBMS) successfully streamline the information flows among stakeholders, such as Client, BAPIS, Python for Operations Research etc. This research is preliminary and concerns decision-making under uncertainty. Because the system processes are understood, the knowledge-based modelling approach is used for model development.

Data collection
The preliminary research is based on a sample of 31 students. The measuring instrument (survey) was checked for validity (consistency) by experts.
Five experts assessed all the indicators involved. For each indicator, every expert gave an opinion on its importance on a scale of 1-4. The ratings used a Likert scale: 1 - obligatory, 2 - desirable, 3 - unnecessary, 4 - I cannot estimate. All modified indicators were accepted against a threshold rating of 2.5.
Final-year students were given a survey form with 10 questions, 9 of which were closed-ended questions (yes / no) and 1 of which was a scale question. The survey was anonymous, with unlimited time to complete it. The sample was large (31 students).
The questions are (Fig. 2):
1) Have you heard of the term Environmental Informatics? (YES/NO)
2) Have you ever thought of dealing with Environmental Informatics? (YES/NO)
3) Is modelling (creation and use of models) an integral part of Environmental Informatics? (YES/NO)
4) Does GIS belong to Environmental Informatics? (YES/NO)
5) Have you attended and passed the course: GIS? (YES/NO)
6) What grade did you get in the GIS course? A grade of 2 is below average, and 5 is excellent. (2/3/4/5)
7) Does the statistical analysis of data belong to Environmental Informatics? (YES/NO)
8) Have you encountered, while browsing the internet, terms related to Environmental Informatics? (YES/NO)
9) Do you use an application that you would place in Environmental Informatics? (YES/NO)
10) Have you used differential equations in practice during your education? (YES/NO)
The expected and obtained answers are presented in Tab. 1. For question 6, the mean course grade is 3.33, the median is 3 and the mode is 3.

Research results
All students in the survey were required to attend the GIS course (a compulsory course in the 2nd year of study), and only one student had not passed it (answer 6). The mean of the students' grades in the GIS course is 3.33 (average).
All students learn the basics of differential equations in the first year of study and apply them within the next two years, yet only 35.48 % of students are aware of this (answer 10).
It is surprising that only a small number of students (12.90 %, answer 9) classified as Environmental Informatics an application that uses a model to calculate the load on a building due to wind or rain (the model uses differential equations), even though they had used this application three weeks before the survey.
A confusing fact is that 75 % of the students who received an excellent grade in the GIS course are not interested in using it in the future in any part of Environmental Informatics. The same percentage applies to students with a good grade in GIS, while students with the lowest grades show no desire at all (0 % positive responses) to work in environmental information technology.
Following analysis of student results, similar research was conducted with university teaching staff.
After evaluation of the preliminary research, the investigation continued by surveying the students' teachers. The sample comprised 11 university teachers (Fig. 3). Both studies were examined separately by experts. Since the samples were analysed by the same methodology, a comparative analysis of the research results (χ² test) was performed. The same percentage of university teachers considers that statistical analysis of data is not part of Environmental Informatics (answer 16).
Nearly 3/4 of university teachers do not recognise that the applications they use belong to Environmental Informatics (answer 18). The χ² test was used to examine whether there is a statistically significant difference between the responses of students and teachers. The comparison used the responses to the questions that were the same in both surveys (1 to 11, 2 to 12, 3 to 13, 4 to 14, 7 to 16, 8 to 17 and 9 to 18). The results are shown in Tab. 3 and in Fig. 5. According to Fig. 5, there is a statistically significant difference between students' and teachers' answers to the questions "Have you encountered, while browsing the internet, the terms related to Environmental Informatics?" and "Do you use an application that you would set in Environmental Informatics?".
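The comparison of two groups on a YES/NO question amounts to a χ² test on a 2x2 contingency table. The sketch below computes the Pearson statistic and its p-value for one degree of freedom using only the standard library. The counts in the example are hypothetical and are not the survey data; whether the authors applied a continuity correction is not stated, so none is applied here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    a, b: YES and NO counts in the first group (e.g. students)
    c, d: YES and NO counts in the second group (e.g. teachers)
    Returns (chi2, p_value) for 1 degree of freedom.
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 10 of 31 students and 8 of 11 teachers answer YES.
chi2, p = chi_square_2x2(10, 21, 8, 3)
print(round(chi2, 3), round(p, 4))
```

With a group as small as 11 respondents, some expected cell counts can fall below 5; in that case an exact test is often preferred to the χ² approximation.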

Conclusions
In the 20th century, developed countries faced rapid industrialisation and urbanisation. As the population grows over time, the public has progressively made greater demands on environmental resources. Information technologies are becoming more and more important for environmental management.
The scientific community [2, 7] has recently highlighted the need for the conceptualisation and parameterisation of the system model before the construction of the system itself. Environmental informatics enables scientists, engineers and managers to plan insightfully and to formulate environmental policy so that effective decision-making schemes can be identified.
It turns out that in Croatia it is important to inform and educate students about the benefits of using information and communication technologies (ICT) in the technical fields of science. However, all indicators suggest that the results are not satisfactory. To investigate the cause and source of these results, we expanded our research to university teachers. Their results also strongly suggest that we should start using environmental informatics more intensively for sustainable development and advancement, with all the challenges and barriers that occur in practice.
Environmental protection is not only the protection of "water and air"; it is also the protection of buildings and of cultural and historical heritage. True and full environmental protection is not possible without information and communication technology (ICT).
The research shows insufficient use of methods of operations research and simulation and of organisation engineering, including the conceptualisation and parameterisation of complex models, followed by appropriate algorithms and procedures for the collection and manipulation of data. Students and their teachers are not engaged in environmental protection, so they do not see the value of organising data collection for model construction and analysis. The role of big data in this area is crucial for time series of data and for a comparative analysis of the observed system. Post hoc analysis of data mining is not available, especially when it comes to recognising the behaviour of systems and their components.

Figure 3 Attributes and indicators in questions for teachers

Figure 5 The results of the χ² test

Table 1 The expected and obtained answers of students

Table 2 Expected and real answers of university teachers. All expected answers were taken as YES. It is therefore understandable that the notion is recognized by only 29 % of surveyed students (answer 1).

Table 3 The results of the χ² test