Nouns in the Conceptual Framework "Node of Knowledge"

The "Node of Knowledge" method is one of the elements of the conceptual framework "Node of Knowledge (NOK)". It enables knowledge representation in graphical and formalized (textual) form and makes it possible to store formalized records of natural language sentences in a relational database. To enable correct transformation of all words from natural language sentences into formalized records, it is necessary to design a metamodel of a language, i.e. to analyse all word types of each particular natural language and define rules for transforming sentences in natural language into formalized records. This paper analyses nouns in the Croatian and English languages. It presents rules for transforming nouns and noun phrase structures into formalized records, with examples in both languages. The system was preliminarily tested with a small set of sentences (used as input knowledge) and questions. Testing results are presented and discussed.


INTRODUCTION AND RELATED WORK
Development of knowledge representation began within the field of artificial intelligence in the 1970s. It played a major role in the development of artificial intelligence and has remained one of its strongest subfields [1]. Knowledge representation seeks methods for the formal description of information and knowledge, which implies representation in an unambiguous language or notation with well-defined syntax and semantics.
The conceptual framework "Node of Knowledge (NOK)" is a set of methods, rules, corresponding analysis tools and the representation of semantics contained in natural language sentences. The conceptual framework NOK includes the NOK method, a formalism for graphic representation (Diagram Node of Knowledge, DNOK), a formalism for displaying knowledge in textual form (Formalized Node of Knowledge, FNOK) and a formalism for representing questions in textual form (Question Formalized Node of Knowledge, QFNOK) [11].
Initial research [12][13][14] showed that it is possible to model sentences in natural language using the conceptual framework NOK. Additional research showed that NOK is applicable to different languages without adapting the algorithms of the question answering (QA) system, provided that the rules are well defined [15].
To design a metamodel of a language and to define rules for transforming sentences into FNOK records, it is necessary to analyse all word types of a natural language. In previous work, adjectives [16] and verbs [17] were already analysed within the NOK method.
This paper focuses on nouns in Croatian and English. It defines rules and solutions for transforming nouns into nodes in an FNOK record. The rules treat nouns according to the relationship between the noun and the verb (noun serving as subject or object), predicate nouns, appositives, and noun cases. Additionally, the articles (a, an, the) are analysed, and the paper provides a solution for transforming an article into an FNOK record.
A word in a sentence can be analysed according to its word category or word class (part of speech) [18,19].
Hindle [20] describes a method of determining the similarity of nouns based on a metric derived from the distribution of subject, verb and object in a large text corpus. There are different approaches to word analysis in English, such as a word-to-word similarity function [21] and adjective-noun (AN) composition for corpus-based distributional semantics [22], but only a few for inflective languages with almost free word order. Croatian, which is a subject of this research, belongs to the group of inflective languages. Nouns and verbs are the focus of some other research [23][24][25], but the question of semantics still remains.
Research presented in [26] investigates the "attribute" relation between the constituents of adjective-noun phrases. Within this research some attributes (such as SIZE, WEIGHT or COLOR) are added to the nouns. They use CBOW word embeddings to represent word meaning and learn a compositionality function that combines the individual constituents into a phrase representation.
In the NOK method, the semantics between two nodes is contained in the role. The role name is a piece of data that belongs to a node and expresses, as a question, how this node is linked to another node [10]. A role is the touch point of a link and a node. Most often, a role is a wh-question.
This research covers two languages: Croatian, as the authors' mother tongue, and English, as a widely spoken and leading world language. Two languages were selected for the initial implementation to ensure that the QA system has a basis for adapting to the requirements of different languages. To achieve this goal, the language model should include the characteristics of all word types in different languages. Similar research covers some other languages in addition to English (e.g. Punjabi [27]), but for Croatian we have not found previous research by other authors.
The paper is organized as follows. After the introduction and presentation of related work in Section 1, Section 2 gives the research motivation. Section 3 describes the research methodology. In Section 4 the authors analyse nouns in Croatian and English. Examples and rules for transforming natural language sentences into formalized records in Croatian and English are presented. Results of preliminary testing of a QA system with a focus on nouns and noun phrase structures are also briefly presented. Section 5 provides conclusions and plans for future research.

RESEARCH MOTIVATION
Over time, people have incorporated knowledge (e.g. about the world) into language, so today "human" (natural) language contains all the knowledge that describes the world as well as its relationships with its elements. In our research, we try to extract this knowledge from language, incorporate it into knowledge base models and build a knowledge base that can answer questions about the world. However, many questions and problems arise: how to acquire this knowledge and how to incorporate existing knowledge into the knowledge base.
Our basic research motivation is to create a knowledge network (knowledge base) that will allow us to extract knowledge from the model. To do so, it is necessary to consider and analyse all word types and design a model for each word type. Having used this method to analyse verbs, which are the bearers of statements, and adjectives, in this paper we deal with nouns.
The goal of our broader research is to develop an intelligent information system that can accept textual knowledge (knowledge expressed in sentences of natural language), store this knowledge in a certain form and generate an answer to a given question. To achieve this goal, it is necessary to design a metamodel of the language and enable insertion of natural language sentences into a database. To store natural language sentences in a database, it is necessary to transform them into formalized records (FNOK records) [28,29]. Creating a metamodel requires analysing each word type and defining rules for transforming sentences and words into formalized records in a form suitable for insertion into a relational database.
Verbs are used to express actions, states and events. They are very common in natural language sentences, and in the NOK method they are usually process nodes. After verbs, nouns are also very common in natural language sentences. They most commonly appear in a sentence as subjects or objects, and in NOK they are ordinary nodes. Rules for transforming sentences into FNOK records are important for metamodel design. In this paper, nouns in English and Croatian are analysed.
The advantage of this approach and method is that it enables access to knowledge saved in its verbalized form. Knowledge written in textual or verbalized form can be stored in a knowledge base that is semantically rich enough to give us an answer to a question asked in natural language.
A limitation of this approach is the lack of precise semantics in verbalized knowledge. Since verbalized knowledge contains a lot of contextual and hidden knowledge, additional text enrichment is needed. An intelligent system is needed that will transform knowledge from natural language into a knowledge base. Additional limitations are related to the difficulty of constructing a question answering system given the complexity of the language.

METHODOLOGY
The basic concepts of the NOK method are the node (drawn as a rectangle), the process node (drawn as an ellipse) and the link (drawn as a straight line) with a single role. A link connects two ordinary nodes, two process nodes, or one ordinary and one process node. Each link between nodes has exactly one role. A link between two nodes is established by asking a question about the first node; the answer is the other node (on the other side of the link). DNOK is created by applying these NOK concepts.
According to the rules defined in the NOK method [10], each word in a natural language sentence corresponds to a node in the FNOK record. A verb in a natural language sentence corresponds to a process node in FNOK. Transforming a sentence into its FNOK record is preceded by analysis of that sentence. The first step is to find its verbs, because verbs are the most common process nodes and belong to the highest level of the hierarchy. Other nodes belong to lower levels. An opening bracket marks a descent to a lower level of the hierarchy, while a closing bracket marks a return to the previous level. A question or question word stands in front of each node in the hierarchy and indicates the role that links a node on a lower level with a node on a higher level, i.e. the node on the lower level is the answer to the question asked. A question (role) is enclosed in quotation marks and ends with a question mark ("?"). Some of the questions that can be asked are: "who?", "what?", "when?", "how?", "where?", "how much?", "how many?" etc., depending on the word type. The questions used as roles are usually wh-questions.
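The bracket notation described above can be read back into a tree with a small recursive parser. The following is a minimal sketch, not the parser of the NOK prototype: the class name Node, the function parse_fnok and the sample record are assumptions made for illustration (the role "art?" is the one used later for articles), and the input is assumed to be well formed.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    word: str
    role: Optional[str] = None               # question linking this node to its parent
    children: List["Node"] = field(default_factory=list)

# A token is a quoted role, a bracket, a comma, or a bare word.
TOKEN = re.compile(r'"[^"]*"|[(),]|[^\s(),"]+')

def parse_fnok(text: str) -> Node:
    """Parse a well-formed FNOK string into a tree (hypothetical helper)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node(role: Optional[str]) -> Node:
        nonlocal pos
        words = []
        # Collect the (possibly multi-word) node label.
        while (pos < len(tokens)
               and tokens[pos] not in {"(", ")", ","}
               and not tokens[pos].startswith('"')):
            words.append(tokens[pos])
            pos += 1
        node = Node(" ".join(words), role)
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1                          # opening bracket: descend one level
            while True:
                if not tokens[pos].startswith('"'):
                    raise ValueError("expected a quoted role before each node")
                child_role = tokens[pos].strip('"')
                pos += 1
                node.children.append(parse_node(child_role))
                if tokens[pos] == ",":
                    pos += 1                  # next node on the same level
                elif tokens[pos] == ")":
                    pos += 1                  # closing bracket: back to previous level
                    break
                else:
                    raise ValueError("expected ',' or ')'")
        return node

    root = parse_node(None)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return root

tree = parse_fnok('reads ("who?" Peter, "what?" book ("art?" a))')
```

Here tree.word is the process node, and each child carries the question that links it to the node one level above.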
Let X and Y be two words in a sentence with a syntactic and semantic relation between them (according to grammar rules, these words should be in a certain order, which reveals additional semantics). If the link between words X and Y can be identified using a wh-question Z?, then the formal record of their relation is X("Z?" Y). In other words, a wh-question can be asked about word X (node X in FNOK) such that word Y (node Y) is the answer. The general FNOK representation is: process node ("role 1?" node 1, "role 2?" node 2 ("role 3?" node 3 ("role 4?" node 4, …), …), "role n?" node n), where role 1? to role n? are wh-questions, node 1 to node n are nodes, and process node is the process node.
Each node can have its own hierarchy.
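The general representation can be illustrated with a short sketch that serializes a node hierarchy into an FNOK string. The class Node and its method to_fnok are hypothetical names chosen for this illustration and are not part of the NOK framework itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    word: str
    # Each link is a (role, child) pair; the role is a wh-question.
    links: List[Tuple[str, "Node"]] = field(default_factory=list)

    def to_fnok(self) -> str:
        """Render this node and its own hierarchy in FNOK bracket notation."""
        if not self.links:
            return self.word
        parts = [f'"{role}" {child.to_fnok()}' for role, child in self.links]
        return f'{self.word} ({", ".join(parts)})'

# The pattern X("Z?" Y): node Y answers question Z? asked about node X.
pair = Node("X", [("Z?", Node("Y"))])

# A nested record following the general representation.
general = Node("process node", [
    ("role 1?", Node("node 1")),
    ("role 2?", Node("node 2", [("role 3?", Node("node 3"))])),
])
```

Because every node holds its own list of links, the recursion directly mirrors the statement that each node can have its own hierarchy.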

RESEARCH RESULTS
This section presents rules for transforming nouns into nodes in an FNOK record.
In English, nouns also have gender (masculine, feminine, neuter, common), number (singular, plural), cases and articles (a, an, the) [30]. In modern English, there are three cases: subjective (he), objective (him) and possessive (his), or in their Old English form nominative, accusative and genitive [31]. A noun is typically used in a sentence as the subject or object of a verb, or as the object of a preposition [32].

The Relation Between a Noun and a Verb
Since verbs are always at the beginning of FNOK records (marked as process nodes), the question arises of how nouns relate to verbs in an FNOK record. For nouns hierarchically dependent on a verb, the following rule is applied:
Rule 1. Nouns are below verbs in the hierarchy (subordinate to verbs). They are listed after the opening bracket and the question to which the noun is the answer (depending on the verb).
A noun hierarchically dependent on a verb (i.e. process node) can be briefly represented as V, N → V (N), where V stands for verb and N stands for noun. Or more precisely, V, N → V ("role?" N), where V stands for verb, N stands for noun, role? stands for a wh-question, and the symbol "→" marks the mapping of the sentence into an FNOK record.
For example, the sentence Peter reads a book. (cro. Petar čita knjigu.) has one verb, reads (cro. čita). Reads (cro. čita) is a process node in an FNOK record and it is placed at the beginning of the record. After discovering a process node, we search for other words connected to the verb. In the starting sentence we notice the nouns Peter (cro. Petar) and a book (cro. knjigu). The noun Peter (cro. Petar) is the subject and a book (cro. knjigu) is the object. This sentence can be divided into two parts: subject + predicate and predicate + object. In the first part the subject answers the question "who?" (1.a, 1.b), and in the second part the object answers the question "what?" (2.a, 2.b).
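To make the role of the questions concrete, the sketch below stores the example sentence as a nested structure and answers a question by following the link whose role matches. The tuple layout and the helpers find and answer are assumptions made for illustration; they are not the QA algorithm of the prototype, and the article is left out for brevity.

```python
from typing import List, Optional, Tuple

# A node is (word, links); each link is (role, child node).
Node = Tuple[str, list]

# Assumed record for "Peter reads a book." following Rule 1.
RECORD: Node = ("reads", [
    ("who?", ("Peter", [])),
    ("what?", ("book", [])),
])

def find(node: Node, word: str) -> Optional[Node]:
    """Depth-first search for the node labelled with the given word."""
    if node[0] == word:
        return node
    for _role, child in node[1]:
        hit = find(child, word)
        if hit is not None:
            return hit
    return None

def answer(root: Node, word: str, role: str) -> List[str]:
    """Return the words linked to `word` through the question `role`."""
    node = find(root, word)
    if node is None:
        return []
    return [child[0] for link_role, child in node[1] if link_role == role]
```

Asking "who?" of the process node reads yields the subject, and asking "what?" yields the object, which matches the two parts into which the sentence was divided.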

Predicate Noun
A predicate noun is a predicate that consists of the auxiliary verb "to be" and a noun [18]. For predicate nouns the following rule is defined: Rule 3. A noun that is part of a predicate noun is hierarchically dependent on the auxiliary verb (i.e. process node) with which it forms the predicate noun. In the hierarchy it is positioned below the auxiliary verb, but on the same level as the words that form the subject of the sentence. Briefly, it can be represented as: AuxV, Np → AuxV (Np), where AuxV stands for auxiliary verb and Np stands for the noun that is part of the predicate noun.
More precisely, AuxV, Np → AuxV ("role?" Np), where AuxV stands for auxiliary verb, Np stands for the noun that is part of the predicate noun and role? stands for a wh-question.
If the previous record is expanded with a noun, pronoun, or similar word that acts as the subject, the rule is: AuxV, S, Np → AuxV (S, Np), where AuxV stands for auxiliary verb, S stands for subject, and Np stands for the noun that is part of the predicate noun.
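The statement that the predicate noun sits below the auxiliary verb but on the same level as the subject can be checked with a small sketch that computes the hierarchy level of every node. The sentence "Peter is a student", the roles "who?" and "what?", and the helper levels are assumptions made for illustration only; the article is omitted.

```python
from typing import Dict, Optional, Tuple

# A node is (word, links); each link is (role, child node).
Node = Tuple[str, list]

# Assumed record for the hypothetical sentence "Peter is a student."
record: Node = ("is", [
    ("who?", ("Peter", [])),      # subject S
    ("what?", ("student", [])),   # noun that is part of the predicate noun, Np
])

def levels(node: Node, depth: int = 0,
           out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map every word to its level; the process node is level 0."""
    if out is None:
        out = {}
    word, links = node
    out[word] = depth
    for _role, child in links:
        levels(child, depth + 1, out)
    return out

node_levels = levels(record)
```

In node_levels the auxiliary verb is on level 0, while the subject and the predicate noun share level 1.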

Appositive
An appositive indicates a relationship between two or more words or phrases in which the units are grammatically parallel and have the same referent. If a sentence contains two or more nouns, one of which is hierarchically dependent on a verb while the others refer to that noun (they serve as a complement to the noun (appositive) and usually agree with it in gender, number and case), then the nouns are organized hierarchically as well. First, the noun on the highest level of the hierarchy is stated (the one related to the verb), followed by the other nouns related to it (and other words, i.e. nodes). For each noun it is necessary to define a question (role) that refers to the noun above it in the hierarchy. The relation between nouns is defined by the following rule: Rule 4. An appositive, or an appositive phrase formed by one or more appositives and their attributes, is transformed to FNOK as a noun (marked N_ap) hierarchically dependent on another noun (i.e. node N): N, N_ap → N (N_ap). More precisely: N, N_ap → N ("role?" N_ap), where N is a noun, N_ap is the noun complement (appositive), and role? is a wh-question.
In this example prijatelj (engl. friend) is an appositive, and prijatelj Tom (engl. friend Tom) is an appositive phrase.

Cases
The nominative indicates the subject of a finite verb (Who? or What?) and corresponds to English subject pronouns. The instrumental corresponds to English prepositions by, with, via, and to words such as using, by use of and through.
In English, nouns do not change form to mark case. Cases are expressed with the help of prepositions in front of the noun (of, to, about, with). The rule for prepositions used with nouns to express cases is defined as follows.
Rule 5. Prepositions used with nouns to express cases are transformed to FNOK in a hierarchical relation in which they depend on the noun below which they are placed. The preposition is not included in the question that requires the noun as an answer; instead, the noun is connected to the preposition (the preposition is on a lower hierarchy level than the noun). To define the role, the same question is used, and it will offer the noun as the answer. This can be represented as the following mapping: Prep, N → N (Prep), where N stands for noun and Prep stands for preposition.
More precisely, X, Prep, N → X ("role?" N ("role?" Prep)), where X stands for any other word type, N stands for noun, Prep stands for preposition, and role? stands for a wh-question.

Article: a, an, the
There are two types of articles in English: indefinite (a, an) and definite (the). The indefinite article a is used before singular nouns or with certain expressions (a few, a little, a lot, such a, quite a, half a). If the noun starts with a vowel sound, the form an is used instead of a. The definite article the is used before a noun to indicate that the identity of the noun is known to the reader, for example as a known entity or when the noun is mentioned for the second time in a certain context [30,33,34]. The rule for articles is defined as follows. Rule 6. In FNOK an article is transformed as hierarchically dependent on a noun. To define the role, the question uses the additional mark "art?". This can be represented as follows: Article, N → N (Article), where N stands for noun and Article stands for article.
Examples for articles are presented in DNOK in Fig. 1 to Fig. 4.
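As a textual counterpart to the diagrams, the sketch below applies Rule 1 and Rule 6 to a sentence whose words have already been tagged. It is a simplified illustration under stated assumptions: the tag names (NOUN, VERB, ART), the function sentence_to_fnok, and the heuristic that a noun before the verb answers "who?" and a noun after it answers "what?" are all hypothetical and only fit simple subject-verb-object sentences.

```python
from typing import List, Tuple

def sentence_to_fnok(tagged: List[Tuple[str, str]]) -> str:
    """Build an FNOK string from (word, tag) pairs of a simple SVO sentence."""
    # The verb becomes the process node at the top of the hierarchy.
    verb_idx = next(i for i, (_w, tag) in enumerate(tagged) if tag == "VERB")
    verb = tagged[verb_idx][0]

    parts = []
    pending_article = None
    for i, (word, tag) in enumerate(tagged):
        if tag == "ART":
            pending_article = word            # attach to the next noun (Rule 6)
        elif tag == "NOUN":
            # Assumed heuristic: subject before the verb, object after it.
            role = "who?" if i < verb_idx else "what?"
            noun = word
            if pending_article is not None:
                noun += f' ("art?" {pending_article})'
                pending_article = None
            parts.append(f'"{role}" {noun}')  # noun below the verb (Rule 1)
    return f'{verb} ({", ".join(parts)})' if parts else verb

example = sentence_to_fnok(
    [("Peter", "NOUN"), ("reads", "VERB"), ("a", "ART"), ("book", "NOUN")]
)
```

The article ends up one level below its noun and is reached through the role "art?", while the noun itself stays directly below the verb.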
The model presented in this paper was used in developing the first system prototype. Preliminary testing included 30 sentences and 55 questions in English, and 30 sentences and 54 questions in Croatian. These sentences contained nouns and noun phrase structures described by the rules defined in this paper. For the given questions, the system gave 93% correct answers in English and 94% correct answers in Croatian.
Based on the rules presented in this paper and using the entity-relationship method, an ER diagram for nouns was built (Fig. 5). This model will become part of the metamodel of the language, which will represent all word types and their relations. According to the rules presented in this paper, after a sentence is transformed into a formalized NOK record, a noun can be related to the following word types: verbs, adjectives, pronouns and prepositions. It is also directly related to articles. The focus of this paper is on nouns; other word types can be mutually related too.
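How such records might be held in a relational database can be sketched with two tables, one for nodes and one for links with their roles. This schema is an assumption made for illustration and is not the ER model of Fig. 5; the table names node and link, the column names, and the helper functions are hypothetical.

```python
import sqlite3
from typing import List

def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding one assumed FNOK record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            id        INTEGER PRIMARY KEY,
            word      TEXT NOT NULL,
            word_type TEXT NOT NULL
        );
        CREATE TABLE link (
            parent_id INTEGER NOT NULL REFERENCES node(id),
            child_id  INTEGER NOT NULL REFERENCES node(id),
            role      TEXT NOT NULL
        );
    """)
    # reads ("who?" Peter, "what?" book ("art?" a))
    conn.executemany(
        "INSERT INTO node (id, word, word_type) VALUES (?, ?, ?)",
        [(1, "reads", "verb"), (2, "Peter", "noun"),
         (3, "book", "noun"), (4, "a", "article")],
    )
    conn.executemany(
        "INSERT INTO link (parent_id, child_id, role) VALUES (?, ?, ?)",
        [(1, 2, "who?"), (1, 3, "what?"), (3, 4, "art?")],
    )
    return conn

def answer(conn: sqlite3.Connection, word: str, role: str) -> List[str]:
    """Return the words linked to `word` through the question `role`."""
    rows = conn.execute(
        """
        SELECT c.word
        FROM link AS l
        JOIN node AS p ON p.id = l.parent_id
        JOIN node AS c ON c.id = l.child_id
        WHERE p.word = ? AND l.role = ?
        ORDER BY c.id
        """,
        (word, role),
    ).fetchall()
    return [row[0] for row in rows]

conn = build_db()
```

Keeping the role on the link row reflects the rule that each link has exactly one role, and a question then reduces to a join filtered by the parent word and the role.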

CONCLUSION AND FUTURE WORK
Previous research by the authors has covered an analysis of the transformation of adjectives and verbs in the NOK method. This paper presents an analysis of nouns and noun phrase structures in Croatian and English. Based on the analysis results, rules were defined for transforming these sentence elements into FNOK records. FNOK records are suitable for insertion into a relational database. The paper defines 6 basic rules for nouns (used as subject, object, appositive, predicate noun, etc.), articles (used in English as definite and indefinite) and prepositions before nouns (used to express cases). Nouns are closely related to articles and prepositions according to the rules of the language, and therefore they are all analysed in this paper. Each rule is followed by several examples and one DNOK in both Croatian and English. The presented rules are part of the metamodel of the language that is necessary for the development of a QA system.
We are aware of some of the limitations of this research. The system prototype was tested with a small set of sentences and questions; future research will therefore include a larger set. Problems encountered during this testing will be the focus of further research. The metamodel needs improvement and further development for other word types: pronouns, conjunctions, adverbs, etc. Future work on the second prototype will focus on improving the algorithms and implementing additional rules to achieve a higher percentage of correct answers. The algorithms of the QA prototype need improvement and upgrading as well. Further work will include analysis of more complex sentences and general improvements of the conceptual framework "Node of Knowledge (NOK)". One of the next steps is to expand the research to other languages.