Construction of Event Knowledge Graph based on Semantic Analysis

Abstract: At present, research on and applications of enterprise credit event information mainly treat credit event data as one dimension of enterprise credit evaluation, and lack in-depth analysis and mining of the content of specific events. After clarifying the connotation of enterprise credit events, the article first proposes a knowledge graph model of enterprise credit events that captures their evolutionary features, network-structure features, and the unstructured features of text data. Then, in the form of a case study, the events in enterprise credit events are extracted: the named entities and dependency relationships in the text statements are analyzed, and events are extracted mainly in the form of subject-predicate-object relationships. Next, the causal relationships in the statements are analyzed. Finally, the extracted events and relationships are matched to form a knowledge graph of corporate credit events. The study applies the knowledge graph method to the field of corporate credit event research and uses the knowledge graph to analyze the evolution process of corporate credit events.


INTRODUCTION
A corporate credit crisis can be regarded as a social event closely related to the public interest. It arises in response to a specific event and spreads rapidly within a certain period of time; amplified by the multiplier effect, it may cause strong social repercussions online. If not resolved in time, a credit crisis may quickly turn into a collective emergency marked by sharply conflicting views [1]. With the rapid development of Internet technology, and especially the popularity of social platforms, the new media environment has become more open, convenient and diverse, which both promotes and distorts the wide dissemination of public opinion. A healthy public opinion environment can provide an ideological guarantee for the sustainable, high-quality development of enterprises [2]. Although the credit event crisis is a new challenge that enterprises face in the network environment, effective measures can turn the crisis into an opportunity, not only repairing a damaged corporate image but also reshaping a better one. In this way, a credit event crisis can become a driving force for corporate image building and a catalyst for improving corporate visibility.
Analyzing the evolution of many corporate credit event cases and summarizing the rules by which crises evolve during business operations can provide decision support for enterprise management. Meanwhile, a monitoring system for corporate credit events can help society track changes in enterprise operations in time, safeguard the health and safety of the network environment, and prevent deteriorating enterprise operations from causing adverse social impact [3,4]. In the network era, it is necessary to establish a rapid response mechanism for credit events and to improve corporate credit monitoring methods. Enterprises should learn to use data to process financial disclosure information and build a corresponding corporate credit risk response system. When facing a network crisis, an enterprise needs to establish friendly network relationships; when facing a consumer trust crisis, it needs an effective strategy for repairing trust [5][6][7]. Once events and the relationships between events are represented in an organized system, an event knowledge base can be constructed and an event knowledge graph can be formed. Therefore, this article sorts out and visualizes the corporate crisis events and event relationships behind specific corporate credit events on the Internet [8,9]: it uses natural language processing technology to semantically analyze corporate credit event texts, structures the cause-effect relationships among corporate credit events, and visualizes the evolution process of corporate credit, with the aim of summarizing the evolution law of corporate credit.
The rest of the paper is structured as follows: a research framework for the corporate credit event knowledge graph is constructed based on basic theory; events are extracted from the financial text data based on semantic analysis; causal relationships between events are extracted using cause-effect rules; the corporate credit event knowledge graph is formed; and finally, the research is summarized.

ENTERPRISE CREDIT EVENT METHOD DESIGN

2.1 Knowledge Graph Analysis Framework
A corporate credit event describes the beginning, evolution and outcome of a corporate crisis, and consists of one or more corporate crisis events. If a corporate crisis event continues to develop, it will not only damage the social reputation and credibility of the enterprise, but may also seriously endanger its survival. A knowledge graph describes and displays a specific event of an enterprise. The knowledge graph of enterprise credit events takes the news of enterprise credit events occurring within a period of time, extracts the events and relationships from it, mines the development thread of credit-related events, and displays the data.
The construction of the corporate credit event knowledge graph mainly includes two parts: event extraction and event relationship extraction. Event extraction mainly refers to extracting the behavior of enterprises in corporate credit events. Enterprise credit event information comes from different channels as text data, so mining techniques are needed to turn the unstructured data into structured data. Therefore, this paper adopts dependency parsing and named entity recognition to extract topics and events with subject-predicate relationships, respectively. Relationship extraction mainly refers to extracting cause-effect relationships in corporate credit events by using cue words and different cause-effect patterns.
Because a causal relationship may involve several events, the extracted events need to be matched with the relationships to form a knowledge graph based on public opinion. The graph is clarified in terms of its connotation and extension, as well as its interaction with other public opinion, including online and offline public opinion, print media and broadcast media (Fig. 1).

Figure 1
The construction process of enterprise network public opinion event graph

Data Preparation
This study collects negative credit event information on enterprises that have experienced significant credit events, such as financial operating difficulties, defaults, breaches of contract and other major breaches of trust in daily operations. The corporate credit event data are collected using Python. The sources include government websites, news websites, industry websites and other formal news media websites; sources such as WeChat and Weibo, which are mainly personal and self-published, are excluded.
The word segmentation and part-of-speech tagging stage produces output of the following form:
[/w, ofo/nx, why/ryv, repeatedly/d, stuck into/vi, capital chain/nz, crisis/n, rumors that/n, ?/w, Dai Wei/nr, still/d, can/v, "/w, arbitrary/a, "/w, how long/ryt, ?/w, ]/w, since/p, ofo/nx, refuse/v, DiDi/q, of/ude1, acquisition/v, offer/v, then/f, ，/w, about/vn, ofo/nx, of/ude1, negative/b, news/n, has/d, time to …
Most of the words obtained in the word segmentation stage above carry little meaning and have little impact on the analysis. Therefore, this paper defines them as stop words, including orientation words, prepositions, quantifiers, auxiliary words, punctuation, non-semantic words and tone words, such as "of, ah, this, if, that, so". Relying on the existing stop-word list of Harbin Institute of Technology, the stop-word list of the Sichuan University Machine Learning Intelligence laboratory, the stop-word list of Baidu, and other stop-word lists, this paper integrates them into a more comprehensive list of 1598 stop words.
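The stop-word step can be illustrated with a minimal sketch. It assumes that each source list is available as an iterable of words and that the segmenter returns (word, tag) pairs as in the output above. The function names and the set of dropped tag prefixes are illustrative choices, not part of the original study.

```python
from typing import Iterable, List, Set, Tuple

# Assumed tag prefixes for non-content tokens: punctuation (w),
# prepositions (p) and auxiliary words (u...). Adjust to the tag set in use.
DROP_TAG_PREFIXES = ("w", "p", "u")


def merge_stopword_lists(*lists: Iterable[str]) -> Set[str]:
    """Merge several stop-word lists into one de-duplicated set."""
    merged: Set[str] = set()
    for words in lists:
        merged.update(w.strip() for w in words if w.strip())
    return merged


def remove_stopwords(
    tokens: List[Tuple[str, str]], stopwords: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only tokens that are neither stop words nor non-content tags."""
    kept = []
    for word, tag in tokens:
        if word in stopwords or tag.startswith(DROP_TAG_PREFIXES):
            continue
        kept.append((word, tag))
    return kept


# Toy usage with two tiny hypothetical lists standing in for the real ones.
stopwords = merge_stopword_lists(["of", "ah"], ["this", " of "])
tokens = [("ofo", "nx"), ("of", "ude1"), ("negative", "b"),
          ("news", "n"), ("?", "w")]
print(remove_stopwords(tokens, stopwords))
```

Storing the merged list as a set makes overlapping entries from different sources collapse automatically and keeps each lookup constant-time.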

Text De-Duplication
Since media platforms often reproduce the same articles, the collected financial text contains many duplicates, which reduce the accuracy of text analysis and can cause large deviations in the results, for example by falsely raising the frequency of core words. Therefore, de-duplication is required before text analysis. The process of text de-duplication (Fig. 2) includes:
- Generating a list of segmented words;
- Building a dictionary based on the text set and obtaining the number of features;
- Building a corpus based on the dictionary and generating sparse vectors of search terms;
- Training a TF-IDF model with the corpus;
- Calculating the similarity between texts and removing the texts with high similarity.
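The de-duplication steps above can be sketched with the Python standard library alone. The source does not name a library, a TF-IDF weighting variant, or a similarity threshold, so the smoothed IDF formula, the cosine measure, the threshold of 0.9, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def build_dictionary(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to an id; len() gives the feature count."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def tfidf_vectors(docs: List[List[str]], vocab: Dict[str, int]) -> List[Dict[int, float]]:
    """Turn each tokenized text into a sparse {feature id: weight} vector."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            vocab[tok]: tf * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, tf in counts.items()
        })
    return vectors


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(docs: List[List[str]], threshold: float = 0.9) -> List[int]:
    """Return indices of texts kept after dropping near-duplicates."""
    vectors = tfidf_vectors(docs, build_dictionary(docs))
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept


docs = [["ofo", "deposit", "refund"],
        ["ofo", "deposit", "refund"],      # reposted copy of the first text
        ["acquisition", "offer", "rejected"]]
print(deduplicate(docs))
```

Each text is compared only with the texts already kept, so the first copy of a reposted article survives and later copies are dropped.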
Taking the financial text data of ofo as an example: "Miss that old forest [original microblogging June 11 news, recently a media report said ofo currently owes 1.5 billion yuan, deposit. The balance is only about 3.5 billion yuan, with less than 500 million yuan of cash available on the books. It is understood that ofo founder and CEO Dai Wei has publicly stated that ofo already has 200 million users. ofo started with a deposit of 99 yuan, and in June 2017 increased the deposit for new users to 199 yuan. If we follow ofo's official claim of 200 million users, each user deposit at 99 yuan before the price increase and the cumulative free deposit of nearly 3 billion." By generating segmented words, building dictionaries, forming sparse vectors and calculating the similarity of texts, the number of texts remaining after de-duplication can be obtained (Tab. 1).

EVENT EXTRACTION BASED ON SEMANTIC ANALYSIS

3.1 Text Dependency Analysis
This paper adopts dependency syntactic analysis to select Chinese candidate terms. The main steps are: first, constructing a dependency tree by syntactically analyzing the collected text sets of corporate credit events; second, pruning the dependency tree to remove the nonconforming dependencies and generate a dependency subtree; finally, selecting the subject-verb and verb-object relations with the named entity as the core, and obtaining the subject-verb and verb-object pairs.
Based on this theory of syntactic parsing, this paper represents the dependency tree as T = (V, A, R). Here V denotes the set of nodes, specifically the words in the sentence; A is the set of directed arcs, which represent the dependency relations between words, where the starting point of an arc is the dominant word of the relation and the end point is the dominated word; R is the root node of the dependency tree, which is the core verb of the sentence. T meets the following three conditions: (1) the in-degree of the node R is 0; (2) the in-degree of every node other than R is 1; (3) there is a directed path from R to any other node. Performing dependency analysis on the example sentence gives the dependency relations shown in Fig. 3.
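The three conditions on T = (V, A, R) can be checked mechanically. The following is a minimal sketch, assuming that nodes are hashable word identifiers and that each arc is a (dominant word, dominated word) pair; the function name is illustrative.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, Tuple


def is_dependency_tree(
    nodes: Iterable[Hashable],
    arcs: Iterable[Tuple[Hashable, Hashable]],
    root: Hashable,
) -> bool:
    """Check the three conditions that T = (V, A, R) must satisfy."""
    node_set = set(nodes)
    in_degree = {v: 0 for v in node_set}
    children = defaultdict(list)
    for head, dependent in arcs:
        if head not in node_set or dependent not in node_set:
            return False
        in_degree[dependent] += 1
        children[head].append(dependent)

    # (1) The in-degree of the root is 0.
    if root not in node_set or in_degree[root] != 0:
        return False
    # (2) The in-degree of every other node is 1.
    if any(in_degree[v] != 1 for v in node_set if v != root):
        return False
    # (3) Every node is reachable from the root (breadth-first search).
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == node_set


print(is_dependency_tree(
    {"refuse", "ofo", "offer"},
    [("refuse", "ofo"), ("refuse", "offer")],
    "refuse",
))
```

Condition (3) is checked separately because a cycle among non-root words can satisfy the two in-degree conditions while remaining disconnected from the root.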

Triadic Event Extraction
This study focuses on identifying the names of people and organizations. Named entity recognition was applied to the example sentence from the previous section, giving the results shown below. From this sentence, the organization name "ofo" and the person name "Zhang Yanqi" can be identified. In this paper, the word segmentation is also adjusted during syntactic annotation: the established entity dictionary is used so that each entity is annotated as a whole. This approach not only simplifies the syntactic tree structure, but also allows the dependency triads obtained in the next step to carry more complete semantic information.
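Annotating entities as a whole can be illustrated as a longest-match merge over the segmented tokens. This is only a sketch: the dictionary format (token tuples mapped to tags), the tags, and the function name are assumptions, and English tokens joined by a space stand in for the Chinese text.

```python
from typing import Dict, List, Optional, Tuple


def merge_entities(
    tokens: List[str],
    entity_dict: Dict[Tuple[str, ...], str],
    joiner: str = " ",
) -> List[Tuple[str, Optional[str]]]:
    """Merge token runs found in the entity dictionary into single units.

    Returns (text, tag) pairs; tag is None for tokens that are not entities.
    """
    max_len = max((len(key) for key in entity_dict), default=1)
    merged: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so longer entities win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_dict:
                merged.append((joiner.join(span), entity_dict[span]))
                i += n
                break
        else:
            merged.append((tokens[i], None))
            i += 1
    return merged


# Hypothetical entity dictionary using the tags seen in the segmenter output.
entities = {("ofo",): "nx", ("Dai", "Wei"): "nr"}
tokens = ["ofo", "founder", "Dai", "Wei", "rejected", "the", "offer"]
print(merge_entities(tokens, entities))
```

Treating a multi-token name as one node removes the arcs inside the name from the dependency tree, which is the simplification the paragraph above describes.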

Named entity recognition results
Let $E$ denote the set of all edges in the dependency graph of a sentence. This paper describes the features of a sentence using structured relational triads, each formed from the two words connected by an edge and the relation on that edge. That is, if an edge in the graph is $e_{ij}$, the words at its two ends are $w_i$ and $w_j$, and the dependency relation it indicates is $r_{ij}$, where $w_i$ is the dominant word and $w_j$ is the dependent word, then the triad can be written as $(w_i, w_j, r_{ij})$. According to this definition, a total of 15 triads can be constructed from the dependency analysis of the example sentence. From these, the better dependency triads are selected using the following principles: 1) select triads with strong dependency relations; 2) give priority to dependency triads containing entity words. This paper focuses only on the relationships between verbs and nouns in sentences, and uses these dependency pairs to mine user intentions. The subject-verb-object relationship events found in the financial text data are extracted, and some of the results are shown in Tab. 2.
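The triad definition and the two selection principles can be sketched as follows. The relation labels SBV (subject-verb) and VOB (verb-object) follow the LTP-style tag set and are an assumption, since the source does not list its labels; treating exactly these two relations as "strong" is likewise an illustrative choice.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Triad(NamedTuple):
    head: str        # dominant word w_i
    dependent: str   # dependent word w_j
    relation: str    # dependency relation r_ij


# Assumed "strong" relations: subject-verb and verb-object.
STRONG_RELATIONS = {"SBV", "VOB"}


def select_triads(triads: List[Triad], entities: Set[str]) -> List[Triad]:
    """Keep strong relations and list entity-bearing triads first."""
    strong = [t for t in triads if t.relation in STRONG_RELATIONS]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(strong, key=lambda t: not ({t.head, t.dependent} & entities))


def to_svo_events(triads: List[Triad]) -> List[Tuple[str, str, Optional[str]]]:
    """Join SBV and VOB triads that share a verb into (subject, verb, object)."""
    subjects: Dict[str, str] = {}
    objects: Dict[str, str] = {}
    for t in triads:
        if t.relation == "SBV":
            subjects.setdefault(t.head, t.dependent)
        elif t.relation == "VOB":
            objects.setdefault(t.head, t.dependent)
    # A verb with a subject but no object yields a subject-verb event.
    return [(subj, verb, objects.get(verb)) for verb, subj in subjects.items()]


triads = [
    Triad("refuse", "offer", "VOB"),
    Triad("offer", "acquisition", "ATT"),
    Triad("refuse", "ofo", "SBV"),
]
selected = select_triads(triads, entities={"ofo"})
print(selected)
print(to_svo_events(selected))
```

This sketch covers only events anchored on a subject; other event forms would need additional relation labels.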

RULE-BASED CAUSALITY EXTRACTION
The extraction of cause-effect relationships consists of two tasks: 1) identifying the cause-effect clauses and 2) extracting the event tuples, and hence the causal relations, from those clauses. The cause-effect clauses are extracted according to syntactic relationships and matching rules, and the two types of financial text data are stored in a structured manner. Based on lexical and grammatical features, this paper uses causal cue words, including conjunctions, verbs, prepositions and adverbs, and summarizes the forms of causal relations that correspond to each type of cue word.
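The first task, identifying candidate causal sentences from cue words, can be sketched as a lexicon lookup. The source works on Chinese text and does not list its cue words, so the English cue words, their grouping, and the function names below are stand-ins chosen for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical English stand-ins, grouped by the four cue-word types.
CUE_WORDS: Dict[str, List[str]] = {
    "conjunction": ["because", "therefore"],
    "verb": ["caused", "led to", "resulted in"],
    "preposition": ["due to"],
    "adverb": ["consequently", "thus"],
}


def find_cue_words(sentence: str) -> List[Tuple[str, str]]:
    """Return (cue type, cue word) for every cue word found in the sentence."""
    lowered = sentence.lower()
    hits = []
    for cue_type, cues in CUE_WORDS.items():
        for cue in cues:
            # Word boundaries avoid matching inside longer words.
            if re.search(r"\b" + re.escape(cue) + r"\b", lowered):
                hits.append((cue_type, cue))
    return hits


def is_causal_sentence(sentence: str) -> bool:
    """A sentence is a causal candidate if it contains any cue word."""
    return bool(find_cue_words(sentence))


print(find_cue_words("Ofo ran short of funds due to the price war."))
print(is_causal_sentence("Dai Wei is the founder of ofo."))
```

Recording the cue type together with the cue word matters because each type corresponds to a different causal pattern in the later extraction rules.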
Let $W$ be a collection of words. A sentence $S$ can then be represented as the collection $S = \{w_1, w_2, \ldots, w_n\}$, and after part-of-speech annotation the collection $S' = \{w_1/d_1, w_2/d_2, \ldots, w_i/d_i, \ldots, w_n/d_n\}$ is obtained, where $d_i$ is the part of speech of $w_i$, $w_i \in W$, $i = 1, 2, \ldots, n$. According to the causal syntactic model, the rules for judging causal sentences are clarified (Tab. 3). In this section, four steps are designed to extract event tuples:
(1) Lexical filtering. The word of verbal nature closest to the position of the causal cue word is taken as the trigger word.
(2) Dependency syntactic analysis. The main task is to find the subject and the object corresponding to the trigger word.
(3) Component filtering. The related components of the trigger word, subject and object are extracted respectively.
(4) Event tuple representation. The format {subject and its related components, trigger word and its related components, object and its related components} is used.
The causal extraction rules are transformed into regular expressions, and pattern matching with these expressions ultimately yields the results in Tab. 5.

Cause: problems with ofo's operations Tag: Causes Effect: It needs to give up some of the things, ofo has chosen to give up overseas markets, in July has withdrawn from overseas small cities, Australia, Germany and other cities will be difficult to see the small yellow car shadow for a while

RULE 3
Cause: failed to get its investment in OFO as hoped for with limited funds Tag: Because Effect: Ali's stake was intended to gain more voting rights but failed to do so

RULE 4
Cause: There is a limited amount of bicycle sharing in each city. Tag: because -so Effect: After Ali harvests the remaining assets of ofo, then these shares will be Ali's as well.

RULE 5
Cause: DiDi has not launched a formal takeover bid Tag: Why -Because Effect: Rejected DiDi's offer
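Turning causal rules into regular expressions, as used to produce the rule results above, can be sketched as follows. The source does not give its actual expressions, which operate on Chinese text, so the four English patterns here are illustrative stand-ins named after the tags in the table, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Illustrative patterns; more specific rules are listed first so that they
# are tried before the generic "because" rule.
RULES = [
    ("because-so", re.compile(
        r"^because\s+(?P<cause>.+?),?\s+so\s+(?P<effect>.+)$", re.I)),
    ("why-because", re.compile(
        r"^why\s+(?P<effect>.+?)\?\s*because\s+(?P<cause>.+)$", re.I)),
    ("because", re.compile(
        r"^(?P<effect>.+?),?\s+because\s+(?P<cause>.+)$", re.I)),
    ("causes", re.compile(
        r"^(?P<cause>.+?)\s+caus(?:es|ed)\s+(?P<effect>.+)$", re.I)),
]


def extract_causality(sentence: str) -> Optional[Dict[str, str]]:
    """Return {cause, tag, effect} for the first matching rule, else None."""
    text = sentence.strip().rstrip(".")
    for tag, pattern in RULES:
        match = pattern.match(text)
        if match:
            return {
                "cause": match.group("cause").strip(),
                "tag": tag,
                "effect": match.group("effect").strip(),
            }
    return None


for s in [
    "Because funds were limited, so ofo withdrew from overseas markets.",
    "Why did ofo reject the offer? Because DiDi made no formal bid.",
    "The price war caused a shortage of funds.",
]:
    print(extract_causality(s))
```

The named groups place the cause and effect clauses directly into the Cause, Tag, Effect layout of the table, whichever side of the cue word each clause appears on.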

ENTERPRISE CREDIT EVENT KNOWLEDGE GRAPH GENERATION
An event knowledge graph focuses on dynamic events and the cascading, temporal and causal relationships among them, and represents them in the form of structured graphs for more efficient management of massive data; in particular, it emphasizes the mining of dynamic event information and logical event relationships. Let the corporate credit event group be $\{e_1, e_2, \ldots, e_i, \ldots, e_n\}$, where $e_i$ represents the $i$-th event extracted from the financial text of the enterprise credit event. Let the causal event group of the enterprise credit event be $\{[C_1, E_1], \ldots, [C_i, E_i], \ldots\}$, where $[C_i, E_i]$ represents a causal relation in the enterprise credit event, $C_i$ represents the cause events and $E_i$ represents the effect events. Determine whether $e_i$ is in $C_i$ or $E_i$: if $e_i \in C_i$, then $e_i$ is a cause event; if $e_i \in E_i$, then $e_i$ is an effect event; if $e_i \in C_i$ and $e_i \in E_i$, then $e_i$ is a parallel event. Causal event pairs are constructed by judging the category to which each event belongs. For example, if $e_{i=a} \in C_i$ is a cause event and $e_{i=b} \in E_i$ is an effect event, then $e_{i=a}$ and $e_{i=b}$ form a causal event pair, recorded as $(e_{i=a}, e_{i=b})$. This example is a one-cause-one-effect event pair. There are also cases of one cause and many effects, many causes and one effect, and many causes and many effects; the one-cause-many-effects case, for example, is expressed as $(e_{i=a}, (e_{i=b}, e_{i=c}))$.
The events in the case were first extracted (Tab. 6). A total of 16 events were extracted, of which 13 were subject-verb-object events, 2 were subject-verb events, and 1 was a stative-verb-object event. The cause-effect relations of the case were then extracted to obtain its cause-effect events (Tab. 7).
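The pairing of events with one causal relation can be sketched as follows. It assumes that events are plain identifiers and that each relation is given as a set of cause events and a set of effect events; the function names are illustrative. A single event on one side is kept as is, and several events on one side are grouped into a tuple, mirroring the nested notation used for one cause with many effects.

```python
from typing import List, Optional, Sequence, Set, Tuple, Union

Event = str
Side = Union[Event, Tuple[Event, ...]]


def _collapse(events: List[Event]) -> Side:
    """Return the single event itself, or a tuple when there are several."""
    return events[0] if len(events) == 1 else tuple(events)


def build_causal_pair(
    events: Sequence[Event],
    cause_group: Set[Event],
    effect_group: Set[Event],
) -> Optional[Tuple[Side, Side]]:
    """Match extracted events against one causal relation [C_i, E_i]."""
    causes = [e for e in events if e in cause_group]
    effects = [e for e in events if e in effect_group]
    if not causes or not effects:
        return None  # the relation cannot be matched to extracted events
    return (_collapse(causes), _collapse(effects))


events = ["e_a", "e_b", "e_c", "e_d"]
print(build_causal_pair(events, {"e_a"}, {"e_b"}))         # one cause, one effect
print(build_causal_pair(events, {"e_a"}, {"e_b", "e_c"}))  # one cause, many effects
```

Iterating over the extracted event list rather than over the sets keeps the events on each side in extraction order.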
The events were matched with the causes and effects in the causal relationships to obtain the causal event pairs for the disclosed financial data, as shown in Tab. 8.

Cause: although it did drag down many other bikes, but also caused ofo and Mobike's own internal conflict, although in the market to take the initiative, but the later operation is still the key to burn money Tag: therefore Effect: has led to many capital businessmen choosing to withdraw their shares.

5 Cause: All sorts of negative news follows, ofo touches bottom line with deposit Tag: The effect of the failure to refund the deposit has caused a lot of criticism from users.

By extracting events and matching causality for the ofo credit events, a total of 22 valid causal event pairs were obtained, and the causal pairs were put in temporal order to generate the knowledge graph shown in Tab. 9. Corporate credit event evolution analysis builds a corporate credit event knowledge graph that shows the whole development of a corporate crisis according to the chronology of events and the cause-and-effect relationships between them. Taking ofo as an example, the knowledge graph of its corporate credit events shows that ofo's crisis evolved through the following three stages.
(1) From March 2017 to June 2017, due to the fierce competition in the bike-sharing market, ofo and Mobike, the industry giants, began a price game to compete for users. To expand its market scope, ofo increased the number of bicycles by controlling the cost of bicycles, resulting in a poor riding experience for users and a decline in word-of-mouth. At the same time, limited management capacity led to corrupt practices among employees within the company.
(2) From July 2017 to April 2018, the earlier price game and market expansion left ofo short of funds, and large-scale financing was conducted to replenish them. DiDi became an investor in ofo, and a fight for control broke out with Dai Wei, the founder of ofo. On the one hand, Dai Wei rejected the acquisition by DiDi; on the other hand, DiDi used its veto to block other financing for ofo, further contributing to ofo's funding shortage.
(3) From May 2018 to March 2019, ofo's capital chain broke, leading to a series of problems, including the misappropriation of user deposits to repay suppliers, layoffs, senior management departures and business contraction. The founder, Dai Wei, was also subject to complaints over unpaid debts and was eventually listed as a "rogue".
It can also be seen from Tab. 8 that some of the causal events are not in chronological order. This is because some crisis events are hidden and not immediately revealed: their effects appear first in the credit event reports, and the hidden events themselves surface only in subsequent reports through media digging.
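Ordering the causal pairs in time, assembling them into a graph, and flagging pairs whose cause was reported later than its effect can be sketched as follows. The event identifiers and dates are hypothetical placeholders, not data from the case, and using the first report date as the ordering key is an assumption.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (cause event, effect event)


def order_pairs(pairs: List[Pair], first_reported: Dict[str, date]) -> List[Pair]:
    """Sort causal pairs by the report date of the cause, then of the effect."""
    return sorted(pairs, key=lambda p: (first_reported[p[0]], first_reported[p[1]]))


def build_event_graph(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Build a directed graph as an adjacency list: cause -> [effects]."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for cause, effect in pairs:
        if effect not in graph[cause]:
            graph[cause].append(effect)
        graph.setdefault(effect, [])  # keep effect-only events as nodes
    return dict(graph)


def out_of_order_pairs(pairs: List[Pair], first_reported: Dict[str, date]) -> List[Pair]:
    """Pairs whose cause was reported later than its effect (hidden causes)."""
    return [(c, e) for c, e in pairs if first_reported[c] > first_reported[e]]


# Hypothetical placeholder events and report dates.
first_reported = {
    "e1": date(2017, 3, 1), "e2": date(2017, 7, 1),
    "e3": date(2018, 5, 1), "e4": date(2018, 11, 1),
}
pairs = [("e2", "e3"), ("e1", "e2"), ("e4", "e1")]

ordered = order_pairs(pairs, first_reported)
print(build_event_graph(ordered))
print(out_of_order_pairs(pairs, first_reported))
```

Here the pair ("e4", "e1") is flagged because its cause was reported after its effect, which corresponds to a hidden event uncovered later by the media.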

CONCLUSIONS
Corporate crises involve a certain degree of unpredictability and uncertainty. As the economic environment keeps changing, enterprises will continue to face new risks and challenges, and new samples of corporate crises keep emerging. In the context of the Internet, negative credit events have become an essential source of information for identifying abnormal business conditions and play an important role in the early warning of corporate crises. However, China's corporate credit management is not yet perfect. To evade policy restrictions and achieve purposes such as illegal operation, capital expansion and stock price manipulation, some enterprises cover up or give misleading accounts of relevant negative news, so that the management authorities and the public cannot learn the real business situation of the enterprise.
The article collects enterprise credit event news data through crawlers and performs text pre-processing, such as word segmentation and de-duplication, on the disclosed financial texts to form a corpus for text analysis. It then performs text dependency analysis and named entity recognition through semantic analysis techniques to extract the event triads in the disclosed financial text data, and constructs event causality rules to extract the causal relationships in the financial texts. Finally, it matches the events with the causal relationships to generate a knowledge graph of corporate credit events, through which the evolution process of corporate credit events is analyzed and displayed.
A considerable amount of information is not spread on the open network. In the future, we will collect more complete data, such as data from WeChat and Weibo, to make the model and results more reliable.