ONTOLOGY-BASED REASONING FOR ENTITY-RELATIONSHIP DATA MODEL SEMANTIC EVALUATION

Original scientific paper

Conceptual modeling is one of the most important activities in the modeling phase of information systems development, and it is most commonly presented by the entity-relationship data model. This paper presents a system for entity-relationship data model semantic evaluation that is based on comparing an ontology with data model elements. The approach is based on formalizing the domain ontology and the data model in a predicate calculus form that is suitable for reasoning. A set of reasoning rules for ontology to data model mapping was defined. The whole process is empirically verified. For this purpose, a software tool was developed that transforms the ontology and the data model to predicate logic form and then to a set of Prolog-like clauses. After integration of these sets of clauses and rules, a Prolog system was used for reasoning in order to quantitatively express the quality of the data model with an appropriate metric.


Introduction
Building new or improving existing information systems is always a complex project, with many design decisions [12] and a set of models at different levels of abstraction to manage. The most commonly used models in information system development include analysis phase models (business process models, conceptual data models) and design phase models (data models: entity-relationship, conceptual and relational models, class diagrams; functionality models: use case, sequence and activity diagrams; implementation models: component and deployment diagrams) [9]. Results from analysis are mapped to elements of design. It is very important to focus on the quality of the results of early development phases, since the cost of removing a defect that could have been detected in an early phase increases significantly if it is detected in later phases of development [5].
One of the most important phases in information system development is conceptual data modeling (CDM), since it is a basis for other development activities and for project management activities such as software size estimation [4]. Conceptual data models are most commonly presented by entity-relationship (ER) modeling [7]. This is the most widely accepted way of describing data requirements at the conceptual level [15].
In the field of information systems models, Van Belle [25] introduces a general metrics framework related to the syntactic, semantic and pragmatic aspects of model quality evaluation. Data quality research [2] is related to the development of methodologies, frameworks and tools for measurement and improvement of data models and of data in databases. Results in this field propose frameworks that define a set of quality characteristics, the metrics that measure the level to which these quality characteristics are achieved in a particular case, and the set of activities needed to perform the measurement and to evaluate the model.
In recent years, research in the field of data model evaluation has resulted in various published methods, metrics and frameworks, but only a small number of them are empirically tested [16,18,19,20,21]. None of these tested methods and metrics is oriented to the semantic validation of model quality, which was the main reason for starting this project and research.

Data models evaluation
Methodologies and frameworks for data model quality evaluation can generally be classified as [2]: data-driven vs. process-driven methodologies, measurement vs. improvement methodologies, and general vs. specific (related to particular model types or notations) methodologies. Batra and Antony [3] present conceptual modeling errors as human errors at three performance levels: skill-based, rule-based and knowledge-based.
Research [18] analyzes the proposed solutions for evaluation of conceptual data models. More than 50 proposals for conceptual data modeling evaluation have been published, but less than 20 % of them are empirically validated. None of the proposed solutions is accepted in practice, outside the research environment. These solutions are at different levels of generality (the research ones are more general and difficult to implement in practice, while the practically motivated ones are more focused on a particular modeling notation). The proposed solutions show a lack of agreement on terminology, a lack of consistency with related fields and standards, a lack of measurement metrics and evaluation procedures, a lack of guidelines for improvement (proposed solutions are mostly focused on error detection), a lack of attention to process quality (i.e. the process of creating conceptual data models and preventing errors), being oriented instead to product quality detection (and, for some of them, correction), and a lack of empirical studies from practice (i.e. studies on how conceptual data model evaluation is performed in practice). Other empirical validation included action research in collaboration between researchers and practitioners in the field, on practical projects and issues in conceptual data modeling evaluation.
Technical Gazette 24, Suppl. 1 (2017), 39-47
Metrics in [14] are defined with the aim of enabling comparison of equivalent models, so as to direct the designer toward a better design.
Recent research in the field of automating conceptual data model evaluation considers the conceptual data model as a "product". Certain software tools have been developed as prototypes that enable:
• Analysis of conceptual data model element quality, based on domain ontology [24],
• Comparison of a created conceptual data model with other models [19],
• Automated reasoning on the quality of conceptual data models [8].
By combining action research with practitioners and laboratory research with both experts and novices in conceptual data modeling, progress has been made toward generality and applicability of the proposed conceptual data model evaluation framework in practice [19]. Still, empirical verification of the proposed framework is subjective in the ranking of quality criteria metrics, i.e. the ranking of created conceptual data models is performed by qualified persons and is not automated. Recent research results are related to automation in the evaluation of conceptual data models [15,16,18].
Other prototypes consider the process of conceptual data model creation and improve it by enabling assistance or complete automation in:
• Consulting support to novice designers related to conceptual data model element quality [3],
• Automated creation of conceptual data model design [6].

The reasoning system for data model semantic evaluation based on ontology
Motivated by the previously presented problems, we started a project related to ER data model semantic evaluation. The main idea was to integrate an automated reasoning system, an ontology, a data model and reasoning rules with the aim of evaluating the semantic quality of the ER data model. Ontology has proved to be an adequate technique for dealing with the semantics of data [17]. The approach is formulated in the context of data model quality measurement and the formal theories mentioned in [13,14,16,18,26].

System features and architecture
Our research goal was to develop and empirically verify an automated reasoning system with the following features:
• It is a rule-based system.
• It enables automated reasoning on ER data model quality.
• It provides answers related to a particular element of a created conceptual data model, as well as an overall data model quality evaluation.
• It enables evaluation of the semantic aspect of the created ER model, and is therefore based on comparison with "semantically rich" models (such as ontology models [25]) that can present semantic variations.
• It is scalable, i.e. applicable to a conceptual model of any size.
• It uses Prolog as a core reasoning system that computes answers to queries.

Ontology and data model formalization
A data model is a formal abstraction through which the real world is mapped into the database [25]. It enables representation of real-world concepts and elements through a set of data entities and their connections. Data models can be represented in various ways: graphically with schemas, with a data dictionary, and with formal languages such as predicate logic calculus.
The formal presentation of the ER data model is an extension of the formalization presented in [11], where a data model is represented as S = (E, A, R, C, P) (the sets E, A, R, C and P are written with the predicates ent, atr, rel, res and p, respectively; see Fig. 2).
Ontology is often used to capture and share knowledge in a specific domain of interest [24]. An ontology describes the concepts in the domain and also the relationships that hold between those concepts [25]. The basic characteristic of an ontology is a hierarchy of concepts/objects, which is established by using different semantic links [13]. Ontology elements like type, class, subclass, property, sub-property, domain and range can be mapped to predicate logic form according to [1]. The predicate logic form of an ontology can be written in Prolog-like form, in the same way as the ER model elements. The structure of an ontology is a collection of OWL/RDF elements that are transformed into an RDF expression as a collection of triplets, each consisting of a subject, a predicate and an object [27]. Facts that are described with RDF triplets represent a relation between the things denoted by the subject and the object of the triplet, or their properties: RDF(Subject, Predicate, Object). The list of these predicates for the abstract ontology shown in Fig. 3 is given below:
rdf(class1, type, class).
rdf(class2, type, class).
rdf(relation1, type, object property).
rdf(relation2, type, object property).
rdf(object1, type, named individual).
rdf(object2, type, named individual).
rdf(object1, relation1, object2).
rdf(object2, relation2, object1).
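To make the triplet form concrete, the listing above can be mirrored in a few lines of Python. This is only an illustrative sketch, not the Prolog representation used by the system: the names RDF_FACTS and query are hypothetical, multi-word terms such as "object property" are kept as plain strings, and None plays the role of an unbound Prolog variable.

```python
# Facts copied from the abstract ontology listing; each triple is
# (subject, predicate, object). RDF_FACTS and query are hypothetical names.
RDF_FACTS = [
    ("class1", "type", "class"),
    ("class2", "type", "class"),
    ("relation1", "type", "object property"),
    ("relation2", "type", "object property"),
    ("object1", "type", "named individual"),
    ("object2", "type", "named individual"),
    ("object1", "relation1", "object2"),
    ("object2", "relation2", "object1"),
]


def query(facts, subject=None, predicate=None, obj=None):
    """Return the facts matching a pattern; None matches anything."""
    pattern = (subject, predicate, obj)
    return [
        fact for fact in facts
        if all(p is None or p == value for p, value in zip(pattern, fact))
    ]


# Roughly the analogue of the Prolog goal rdf(X, type, class).
classes = [s for s, _, _ in query(RDF_FACTS, predicate="type", obj="class")]
print(classes)  # ['class1', 'class2']
```

Leaving an argument unset corresponds to asking Prolog for all bindings of that position, which is the kind of query the reasoning rules rely on.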

Reasoning rules
Model evaluation in this system is performed by applying a set of reasoning rules to the formalized representations of the ER data model and the ontology, with the aim of comparing them. Mapping of ontology elements to data model elements is based on research [10].
Rule 5 - Data properties and data property ranges from the ontology that are covered by attributes with defined data types in the conceptual data model. For each attribute in the set of attributes of the data model there is a restriction with a data type name [10].
D) Mapping classes and subclasses from the ontology to the IS_A hierarchy relationship in the ER data model:
Rule 10 - Ontology classes and subclasses that are covered by IS_A hierarchy entities in the conceptual data model. According to [10], for each class from the ontology a named entity super-class type must be defined in the data model, and each ontology subclass is presented with an entity subtype, with the restriction that subtypes in the data model must be different objects.
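As an illustration of the kind of check Rule 5 describes, the following Python sketch tests whether each ontology data property and its range is matched by an attribute with the same name and data type. It is a sketch under stated assumptions, not the rule as formalized in the paper: the triple shapes ("type", "data property") and ("range", ...), the (name, data type) attribute pairs, the exact-name matching and all example values are hypothetical.

```python
def covered_data_properties(rdf_facts, attributes):
    """Return the data properties whose name and range match an ER attribute.

    rdf_facts  -- (subject, predicate, object) triples (assumed shape)
    attributes -- (attribute_name, data_type) pairs from the ER model (assumed shape)
    """
    data_props = {s for s, p, o in rdf_facts if p == "type" and o == "data property"}
    ranges = {s: o for s, p, o in rdf_facts if p == "range"}
    attr_types = dict(attributes)
    return sorted(
        prop for prop in data_props
        if prop in attr_types and attr_types[prop] == ranges.get(prop)
    )


# Hypothetical example data, not taken from the empirical study.
facts = [
    ("title", "type", "data property"),
    ("title", "range", "string"),
    ("fee", "type", "data property"),
    ("fee", "range", "decimal"),
]
attributes = [("title", "string"), ("fee", "integer")]
print(covered_data_properties(facts, attributes))  # ['title']
```

Here fee is not counted as covered because the attribute exists but its data type differs from the ontology range.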

Ontology mark calculation for a data model
For each ER data model, the final rank evaluation from the aspect of ontology mapping (OM) is quantitatively represented as a sum of ontology mapping evaluation points for each element of the data model. These particular marks for elements are measured by handling the Prolog answers to goals. Each data element is given a "weight factor" K_T, where T represents an ER element type. The weight factor, according to [16], represents the quantitatively expressed significance of an element in the analysis of the whole conceptual data model.
An ontology point is calculated separately for entities, for attributes and for relationships, and the total ontology mark for the entire ER data model is calculated from these particular marks.
Explanation of the elements of Eqs. (12) to (16):
− OM is the ontology points for each data model,
− OM_E is the ontology points for entities,
− OM_A is the ontology points for attributes,
− OM_R is the ontology points for relationships,
− OM_SC is the ontology points for super-class entities and sub-class entities.
The minimum value of the particular marks OM, OM_E, OM_A, OM_R and OM_SC is 0, while the maximum value is 100, both for the particular marks and for the total ontology mark of the whole data model.
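The following Python sketch shows one way marks bounded between 0 and 100 could be computed. It does not reproduce Eqs. (12) to (16): it assumes that each particular mark is the weighted percentage of ontology elements covered by the data model and that the total mark is the plain average of the four particular marks. The function names and all counts are hypothetical.

```python
def particular_mark(covered, total, weight=1.0):
    """Assumed form: weighted percentage of covered ontology elements."""
    if total == 0:
        return 0.0
    return weight * 100.0 * covered / total


def total_mark(marks):
    """Assumed form: arithmetic mean of the particular marks."""
    return sum(marks.values()) / len(marks)


# Hypothetical counts of covered / available ontology elements, with K_T = 1.
marks = {
    "OM_E": particular_mark(9, 10),   # classes covered by entities
    "OM_A": particular_mark(6, 12),   # data properties covered by attributes
    "OM_R": particular_mark(2, 5),    # object properties covered by relationships
    "OM_SC": particular_mark(1, 4),   # classes/subclasses covered by IS_A entities
}
print(marks["OM_E"], total_mark(marks))  # 90.0 51.25
```

With all weight factors equal to 1, every mark in this sketch stays within the range 0 to 100 stated above; other weights would need a normalization that is not specified here.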

Process of using the system
The proposed system is implemented by using the following software tools: • Ontology editor Protégé developed at Stanford University for creating ontology.
• CASE tool Sybase PowerDesigner for designing the ER/conceptual data model.
• Amzi! Prolog as a reasoning system that computes answers to queries.
For the purpose of transforming the files and integrating them into the Prolog program needed by Amzi! Prolog, a special Data Model Valuator (DMV) tool was created by using the Microsoft Visual Studio .NET development environment. The process of using this tool starts with creating an ontology in an ontology editor. The ER model is created in a CASE tool. The DMV tool can then be started. The user first selects the option for loading the ER model and the option for formalization of the data model, which parses the elements of the data model into a set of Prolog-like clauses and presents them in the user interface. The second step is loading the ontology for its transformation into a set of Prolog-like clauses, which are also presented. The third step is loading the set of defined reasoning rules. An example of using the DMV tool up to this point is presented in Fig. 4, with the ontology (Fig. 6) and the ER data model (Fig. 5) created for the empirical validation of the usability of this system.
After all clauses are created and ready in an integrated list (i.e. the conceptual model's clauses, the ontology's clauses and the reasoning rules), we used Prolog as a core reasoning system for computing answers to queries related to the particular data model and ontology. Answers from the reasoning system must be included in the previously defined metrics (12), (13), (14), (15) and (16) for ER data model semantic evaluation.
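The integration step can be pictured as concatenating the three sets of clauses into a single program text. The Python sketch below is an assumption about how such a file could be assembled, not the DMV tool's actual implementation; the function integrate, the section comments and the example rule covered_class/1 are hypothetical.

```python
def integrate(model_clauses, ontology_clauses, rule_clauses):
    """Join three clause lists into one Prolog-like program text."""
    sections = [
        ("ER data model", model_clauses),
        ("ontology", ontology_clauses),
        ("reasoning rules", rule_clauses),
    ]
    lines = []
    for title, clauses in sections:
        lines.append(f"% {title}")  # % starts a comment in Prolog
        lines.extend(clauses)
    return "\n".join(lines) + "\n"


program = integrate(
    ["ent(e1)."],
    ["rdf(class1, type, class)."],
    # Hypothetical rule, shown only to illustrate the clause format.
    ["covered_class(X) :- rdf(X, type, class), ent(X)."],
)
print(program)
```

Keeping the rules in their own section reflects the separation of reasoning rules from the model and ontology facts: the same rule set can be reused with a different data model.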
In this way, ontology marks must be calculated for all elements of the ER model by (12), (13), (14) and (15), and then the final ontology mark for the entire ER model by (16). The initial empirical testing of the system was made with a case study in which the initial set of reasoning rules was applied to a single ER data model. The empirical research was conducted as a laboratory experiment with students' data models collected from a practical exam. Participants in this research are students from the University of Novi Sad, Technical Faculty "Mihajlo Pupin" in Zrenjanin, Serbia. They are all second-year students of undergraduate (bachelor) studies of information technology engineering.
These 132 participants were given the same exam, i.e. a textual specification of a case study for organizing international conferences (shown in Fig. 5).A single ontology is created to represent the specified case study and domain of problem area (shown in Fig. 6).
Each of the students' data models was loaded into the DMV tool to be integrated with the ontology and the set of reasoning rules presented from Eq. (1) to Eq. (11). The integrated programs were individually loaded into the Amzi! Prolog listener environment for executing queries according to rules (1) to (11). The Prolog listener has shown the results of each query answer computation, as we presented for rules (1) and (2). Mapping the ontology of the empirical study into Prolog-like clauses with the DMV tool created over 330 facts in RDF triplets. The students' data models resulted in at least 160 and up to more than 250 facts in Prolog sentences. The integrated program for reasoning with rules has from 500 to almost 600 clauses, which were all individually loaded into Amzi! Prolog to be processed.
Statistics are computed upon all result data used for the overall evaluation of each ER data model by using equation (14) and K_T = 1 (which means that each "weight factor" is 1 for every evaluated model, i.e. all considered elements are equally significant).

Empirical results
Overall statistics related to the accomplishment of each reasoning rule in all models are presented in Tab. 1. Analysis of the statistics on the empirical results shows that ontology classes are covered by entities in the ER data model with more than 92 %, ontology data properties are covered with 54 % of appropriate attributes, while object properties are covered by relationships in the ER model with 41 %. Ontology classes are covered by only 30 % of appropriate super-class type entities. Finally, it can be seen that ontology sub-classes are covered by 30 % of subtype entities. Ontology data properties and data property ranges are covered by 41 % of attributes and data types in the data model. The result of computing each model's ontology mapping evaluation mark is presented in Tab. 2. From the sample of 132 tested models, the table shows the five data models that have the greatest semantic completeness and suitability to the ontology, and the five worst data models.
Analysis of the empirical results for each ER data model's ontology mapping evaluation shows that the best models do not score above 89 % of evaluation points, while the worst models are at approximately 35 %. The average result of all tested and evaluated data models is almost 64 % of semantic correctness, i.e. completeness and suitability to the domain ontology.

Conclusion
Since the introduction of ER modeling as a conceptual data modeling methodology, many research efforts have been focused on creating methodologies and frameworks for conceptual data model evaluation, especially in the last decade. Still, there is no consensus on a unique or integrated framework or standard in this field. Most of the proposed frameworks are still in the domain of theory, and less than 20 % of them are empirically evaluated.
This paper shows the results of a project of developing a reasoning system for ER data model evaluation based on domain ontology. The system integrates the results of using a CASE tool for data model creation, an ontology editor for ontology creation, and reasoning rules for data model evaluation based on mapping with the ontology, within an automated reasoning system that computes the answers needed for the metric. An overview of the system is presented, with a theoretical contribution that is reflected in the formalization of the data model and its mapping with the ontology in the form of clauses. Results of empirical testing and verification of the developed system are given.
The presented research makes several contributions. It has been shown that it is possible to evaluate the semantic aspect of an ER data model. The proposed solution is based on mapping the data model with an ontology. This approach is applicable in situations where an ontology is created as a basis for evaluating a group of data models related to the same semantics. The system is scalable and flexible, with the ability to separate reasoning rules from reasoning logic. Within the DMV tool, an extended formal representation of the ER data model is implemented.
Future work could include adapting the system to other types of data models and extending the reasoning rules to enable both syntactic and semantic verification, with the aim of enabling more complete data model verification. The system must also be empirically tested with large data models. One further step could be the development of a consultation expert module that would present conceptual data modeling errors and suggestions for improvement.

Figure 1 Proposed system for ER data model evaluation

The developed reasoning system consists of several modules, i.e. software tools integrated into a complex system. These modules are:
• Ontology editor/tool for creating ontology,

Figure 2 ER data model schema

Formalization of an ER model includes creating sets of elements that are written as Prolog-like clauses. Predicate names for the elements of the set S are: ent for the E set, atr for the A set, rel for the R set, res for the C set and p for the P set. The set of formalized elements of the conceptual data model schema from Fig. 2 is listed below:
E = {ent(e1), ent(e2)}
A = {atr(a1), atr(a2), atr(a3), atr(a4), atr(a5)}
R = {rel(r1)}
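The transformation from the sets of the schema to Prolog-like clauses can be sketched in Python as follows. This is a minimal illustration that covers only the unary forms listed above (ent, atr, rel); the clause forms for the C and P sets are not shown in the listing, so they are left out. The function formalize and the dictionary layout of the schema are hypothetical.

```python
# Predicate names taken from the text; only the unary forms are handled.
PREDICATES = {"E": "ent", "A": "atr", "R": "rel"}


def formalize(schema):
    """Turn the E, A and R sets of a schema into Prolog-like clause strings."""
    clauses = []
    for set_name, predicate in PREDICATES.items():
        for element in schema.get(set_name, []):
            clauses.append(f"{predicate}({element}).")
    return clauses


# Elements of the schema from Fig. 2, in an assumed dictionary layout.
schema = {
    "E": ["e1", "e2"],
    "A": ["a1", "a2", "a3", "a4", "a5"],
    "R": ["r1"],
}
clauses = formalize(schema)
print(clauses[0], clauses[-1])  # ent(e1). rel(r1).
```

Each generated string ends with a full stop, so the list can be written line by line into a Prolog source file.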

Figure 4 Data Model Validator software tool

Figure 5 ER data model schema that was used in empirical study

• Ontology class is mapped to entity type,
• Ontology data property is mapped to attribute,
• Ontology object property is mapped to relationship [10].
Rule 1 - Ontology classes that are covered by entities in the ER model. For each class from the ontology, a named entity set must be defined in the data model [10].
Rule 6 - Object properties from the ontology that are covered by relationships in the conceptual data model. For each object property from the ontology, a named relationship must be declared in the ER data model.
Explanation of symbols used in reasoning rules: − x, x1, x2, xc, xc1, xc2, xe1, xe2, y, yop, yr, z, zcd1, and zcd2
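The coverage idea behind Rule 1 can be sketched in Python as a comparison of class names with entity names. This is an illustration under assumptions, not the rule as formalized in the paper: it assumes that a class is covered when an entity has the same name, ignoring letter case, and the function name and example data are hypothetical.

```python
def rule1_coverage(rdf_facts, entities):
    """Split ontology classes into those covered by an entity and those missing.

    Assumption: a class is covered when an entity has the same name,
    compared case-insensitively.
    """
    classes = sorted(s for s, p, o in rdf_facts if p == "type" and o == "class")
    entity_names = {name.lower() for name in entities}
    covered = [c for c in classes if c.lower() in entity_names]
    missing = [c for c in classes if c.lower() not in entity_names]
    return covered, missing


# Hypothetical example data, not taken from the empirical study.
facts = [
    ("conference", "type", "class"),
    ("paper", "type", "class"),
    ("author", "type", "class"),
    ("submits", "type", "object property"),
]
covered, missing = rule1_coverage(facts, ["Conference", "Paper"])
print(covered, missing)  # ['conference', 'paper'] ['author']
```

The same pattern applies to Rule 6 by selecting object properties instead of classes and comparing them with relationship names.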

Table 1 Empirical results for data model elements semantic evaluation

Table 2 Empirical results for ER data model semantic evaluation (the best five and the worst five models score)