NODE IMPORTANCE MEASURE FOR SCIENTIFIC RESEARCH COLLABORATION FROM HYPERNETWORK PERSPECTIVE

Original scientific paper

Collaboration has become the mainstream trend in interdisciplinary fields. In research organizations, evaluating the contributions of researchers and identifying core researchers is important for performance appraisal and for managing the risk of brain drain. The scientific research collaboration network is a basic model for investigating this question, but as collaborative behaviour grows more complex, its limitations in semantic representation become apparent. In this paper, we introduce the hypernetwork, a more powerful modelling tool than the traditional network, and take scientific paper co-authorship as the object to construct a scientific research collaboration hypernetwork (SRCH). From a hypernetwork perspective, we measure the importance of researchers in two aspects: collaborative relationship structure and collaborative achievement value. An additive weighting method with adjustable parameters integrates the evaluation indicators of the two aspects to give a comprehensive importance evaluation of researchers. Analysis of a data instance indicates that the proposed node importance measure for scientific research collaboration is reasonable and effective.


Introduction
Supported by information technology, the rapid development of interdisciplinary fields makes deep collaboration both necessary and possible, and collaboration has become the mainstream mode of scientific research. In scientific research organizations, core researchers are those who possess key competences and are indispensable for the organization to implement its innovation strategy or to build scientific research alliances. Evaluating and identifying these talents is crucial for performance appraisal and the management of knowledge employees, and it is also key to addressing the crisis of brain drain.
Recently, as the theories, methodologies and technologies of network analysis have matured, investigating big data through networks has become a hotspot in many disciplines [2]. The collaboration relations between scientific research organizations or researchers form a complex network composed of knowledge, researchers, achievements, carriers and the organizations themselves. Consequently, in the complex network of scientific research collaboration, we can use network analysis methods to describe paper citation relations, researchers' co-authorships, and the influence of researchers on each other [5].
In the context of complex collaboration behaviour, this paper introduces the hypernetwork, a tool with more powerful modelling capability than the ordinary network, and evaluates the importance of researchers in scientific research organizations in two aspects, collaborative relationship structure and collaborative achievement value, from a hypernetwork perspective. Concretely, taking scientific paper co-authorship as the object to construct a scientific research collaboration hypernetwork, we measure node importance using the weight information of nodes and hyperedges. That is, we quantitatively and jointly analyse and evaluate the academic standing, academic contribution, collaborative betweenness, and output competence of all authors, which provides evidence for identifying and evaluating core talents in research organizations.

Related works and review

2.1 Scientific research collaboration network
A scientific research collaboration network is a kind of social network that describes the collaborative relationships of research participants (researchers or research organizations). In such a network, a research participant is usually represented as a node, and if a collaborative relation exists between two different participants, a line is drawn connecting the two corresponding nodes. Thus, a network with complex structure and dynamic evolution characteristics is formed.
The literature on scientific research collaboration networks covers different aggregation levels: there are studies of collaboration in different fields at the personal level [14], and studies at the organizational level, which investigate collaboration inside an organization [15] or across organizations [9]. In addition, according to research focus, studies address either the overall network or the individual network. Overall network research mainly concerns the whole topological structure of the knowledge or social network formed by researchers, and the impact of its evolution on the forming, spreading, and recombination of knowledge both inside and across organizations. For example, Yan [13] used data from 18 core source library and information science (LIS) journals in China covering 6 years to identify the collaboration pattern and network structure of the co-authorship network of LIS in China. Individual network research, based primarily on the fact that "people tend to discover information, attain knowledge and methods of solving problems relying on their own social network", investigates the centrality and tendency of individuals in the network. For example, Singh [9] examined how collaboration across national, organizational or institutional boundaries contributed to knowledge creation, and found that external collaboration significantly improved the future publication productivity of the collaborating scientists.
Currently, more and more theories and techniques are being applied to the modelling and analysis of scientific research collaboration networks, e.g. complex networks [1], cluster analysis [12] and Web mining.

Node importance measure in network
Evaluating the importance of nodes in a network and identifying the key ones has long attracted research in network analysis and system sciences [17]. In recent years, the growth of empirical network studies, especially the discovery of the "small world" and "scale-free" properties, has made it reasonable to evaluate the importance of network nodes on the basis of their dissimilarity. The methods for measuring node importance that have arisen in different application backgrounds can be classified into the following two types.

(1) Social network analysis (SNA) methods
The major SNA-based methods for measuring node importance assume that the importance of a node is equivalent to its notability when connecting with other nodes (Knoke and Burt, 1983). These methods generally study the metrics while maintaining the integrity of the network, i.e. they do not break its connectivity. The basic idea is to find useful attributes (such as the quantity of information contained in degree or shortest path) that highlight the diversity between nodes, in other words, to display the positional traits of nodes in a network adequately, and to "magnify" their significance in order to define importance. The metrics proposed on the basis of SNA fall into two categories, "centrality" and "prestige", and the indicators include degree, closeness, betweenness, eigenvector, cumulated nomination, etc. Many representative works have used SNA-based methods (Bonacich, 1987; Altmann, 1993; Poulin, Boily, and Masse, 2000).

(2) Node deleting methods
Another approach evaluates the importance of a node by measuring the damage to network performance when the node is removed, that is, "destructiveness equals importance". The more severe the damage caused by removing a node, the more important the node is, because the maintenance of network connectivity or system function depends on its existence. For example, Xi [16] took the inverse of the distance between a node pair as a weight, and then calculated the weighted sum over all disconnected node pairs to measure the damage to network connectivity. This work made a quantitative contribution to the "removal based" approach. In general, node deleting methods must deal with the topological changes, and even the segmentation of the whole network, caused by the removed nodes.
From the above we see that a single measuring indicator depends heavily on one particular feature of the network and thus ignores the significance of other indicators. Therefore, in practical applications, multiple methods and indicators are adopted jointly to measure the importance of nodes in one network.
(3) Analysis of node importance measure in SRCH
As a typical kind of network, SRCH and its node importance measuring have attracted much attention from researchers. Newman [6] studied a variety of nonlocal statistics for these networks, such as typical distances between scientists through the network and measures of centrality such as closeness and betweenness, and found that typical distances between pairs of authors through the networks were small; the networks form a "small world" in the sense discussed by Milgram. Tang [11] developed a data mining tool for scientific literature called ArnetMiner, which provides a function for discovering important authors. ArnetMiner mainly takes into account the number of collaborations between authors.
Many results have emerged on SRCH and its node importance measuring; however, some problems still need to be resolved.
a) The network representation is based on classical graph theory, which cannot express multi-element, multi-layer or multi-granularity characteristics. Moreover, most studies consider connected networks, and many measuring methods cannot effectively handle unconnected networks.
b) Each single indicator focuses on one particular feature of the network, e.g. the position of a node in the network structure, the influence and control capability of a node over information spreading, or the contribution of a node to a substructure. Specifically, degree only emphasizes the number of edges connected with adjacent nodes and ignores the indirect effects from them. Betweenness describes a node's capability to control the information flowing between other nodes, but it cannot capture the node's local contribution. Removal based methods have difficulty measuring node importance when the influence of removing a node on network performance is too high or too low.
c) Measuring node importance from an overall perspective requires traversing the whole network. For instance, calculating betweenness is feasible in a small network, but many real networks are large and structurally complex, which makes measuring node importance from an overall perspective impractical.

Hypergraph and hypernetwork
As analysed above, the traditional network based on classical graph theory increasingly shows its modelling limitations as network complexity grows. The emergence of the hypergraph and the hypernetwork (supernetwork) provides a new research view for different kinds of networks and their relationships. The concept of hypernetwork was first proposed by Sheffi [8] in 1985, and is defined as "a network beyond existing network" and "network in network" (Sheffi [8], Nagurney [4]). If a network is represented by a hypergraph, it can be deemed a hypernetwork. Hypergraphs and hypernetworks can represent nesting, multi-layer, multi-level, and multi-attribute characteristics that are beyond the capability of ordinary graphs and networks. Taking the SRCH studied here as an example, an author is described as a node, and a paper written by two or more authors is represented as one hyperedge connecting those nodes. A hyperedge with many nodes can be transformed into an ordinary network in which all of its nodes are connected with each other, but in that case it is difficult to distinguish pairwise co-authorship from multi-author collaboration. Here the ordinary network shows its limitation in semantic representation.
As shown in Fig. 1, the case of five authors writing one paper collaboratively is represented as the hypernetwork in Fig. 1b. Fig. 1a shows the case in which every pair of the five authors collaborated, so that 10 papers are written in total. In Fig. 1c, every pair of the five authors collaborated, the five authors wrote one paper together, and authors 1, 2, and 3 wrote another paper together, giving 12 papers in total. If we analyse by node degree, the degree of every node is 4 in both Fig. 1a and Fig. 1b, but the implications are completely different. Therefore, metric indicators from classical graph theory cannot be transplanted directly to the hypernetwork, which as a modelling method is more concise and precise in representation than the classical graph.
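The contrast between Fig. 1a and Fig. 1b can be checked with a small Python sketch. It assumes that each paper is stored as a set of author labels, and the helper names `clique_expansion_degree` and `hyperdegree` are illustrative only, not taken from the paper. The first helper computes the ordinary graph degree after every hyperedge is flattened into pairwise links; the second counts the papers each author belongs to.

```python
from itertools import combinations


def clique_expansion_degree(hyperedges):
    """Degree of each node in the ordinary graph obtained by linking
    every pair of nodes that share at least one hyperedge."""
    neighbours = {}
    for edge in hyperedges:
        for u, v in combinations(sorted(edge), 2):
            neighbours.setdefault(u, set()).add(v)
            neighbours.setdefault(v, set()).add(u)
    return {node: len(adj) for node, adj in neighbours.items()}


def hyperdegree(hyperedges):
    """Number of hyperedges (papers) each node (author) belongs to."""
    counts = {}
    for edge in hyperedges:
        for node in edge:
            counts[node] = counts.get(node, 0) + 1
    return counts


authors = [1, 2, 3, 4, 5]
fig_1a = [frozenset(pair) for pair in combinations(authors, 2)]  # 10 two-author papers
fig_1b = [frozenset(authors)]                                    # 1 five-author paper
fig_1c = fig_1a + fig_1b + [frozenset({1, 2, 3})]                # 12 papers in total

for name, edges in [("1a", fig_1a), ("1b", fig_1b), ("1c", fig_1c)]:
    print(name, len(edges), clique_expansion_degree(edges), hyperdegree(edges))
```

In this sketch the flattened degree is 4 for every author in both Fig. 1a and Fig. 1b, while the number of papers per author differs (4 versus 1), which is the information the ordinary graph loses.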
In a hypernetwork representation, a hyperedge does not have to describe co-authorship; by setting different attributes it can also depict emotional or trust relations, so that "homogeneous nodes and heterogeneous hyperedges" can be achieved. As a research tool, the hypernetwork model can express the interactions and influences between networks. So far, research on hypernetworks has mainly concentrated on the construction of supply-chain hypernetworks, finance hypernetworks, etc.

Measuring method from hypernetwork perspective

3.1 Symbol definition
To facilitate the description of our method, we first give the definitions used in the measuring model, as below.

(1) Definition of hypernetwork
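We refer to Berge's basic definition of hypergraph (Berge, 1973) to define hypernetwork as follows.
Definition 1: Let $V = \{v_1, v_2, \ldots, v_n\}$ be a finite set and $E = \{e_1, e_2, \ldots, e_m\}$ a family of non-empty subsets of $V$ whose union is $V$. Then $HN = (V, E)$ is a hypernetwork with node set $V$ and hyperedge set $E$, where $n$ and $m$ are the numbers of nodes and hyperedges of HN respectively.
Besides the set form, a hypernetwork has equivalent representations as closed curves, matrices and bipartite graphs. Among them, the matrix form is concise, systematic and well suited to computation, so a hypernetwork is usually handled as a matrix in scientific or engineering calculations.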
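Adjacency Matrix: nodes $v_i$ and $v_j$ are adjacent in HN if some hyperedge contains both of them. For a node pair $(v_i, v_j)$, if there is $E' \subseteq E$ such that $\{v_i, v_j\} \subseteq e_l$ for all $e_l \in E'$, and $\{v_i, v_j\} \not\subseteq e_l$ for all $e_l \in E - E'$, then the adjacency degree of $v_i$ and $v_j$ is $|E'|$, the number of hyperedges that contain both nodes.

Distance Matrix (DM): each element of DM is the length of the shortest hyperpath from node $v_i$ to node $v_j$.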
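The three matrix forms can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the incidence matrix entry is taken to be 1 when a node belongs to a hyperedge and 0 otherwise, the length of a hyperpath is taken to be the number of hyperedges it traverses, and the function names and the sample data are hypothetical.

```python
from collections import deque


def incidence_matrix(nodes, hyperedges):
    """m x n matrix: one row per hyperedge, one column per node."""
    return [[1 if v in e else 0 for v in nodes] for e in hyperedges]


def adjacency_matrix(im):
    """am[i][j] = number of hyperedges containing both node i and node j."""
    n = len(im[0]) if im else 0
    am = [[0] * n for _ in range(n)]
    for row in im:
        members = [k for k, x in enumerate(row) if x]
        for i in members:
            for j in members:
                if i != j:
                    am[i][j] += 1
    return am


def distance_matrix(am):
    """dm[i][j] = hyperedges on a shortest hyperpath (assumed length
    measure), found by breadth-first search; inf if unreachable."""
    n = len(am)
    inf = float("inf")
    dm = [[inf] * n for _ in range(n)]
    for s in range(n):
        dm[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in range(n):
                if am[u][w] and dm[s][w] == inf:
                    dm[s][w] = dm[s][u] + 1
                    queue.append(w)
    return dm


# Hypothetical data: four papers over five authors; author 5 only
# publishes alone, so that node is disconnected from the others.
nodes = [1, 2, 3, 4, 5]
hyperedges = [{1, 2, 3}, {3, 4}, {1, 2}, {5}]

im = incidence_matrix(nodes, hyperedges)
am = adjacency_matrix(im)
dm = distance_matrix(am)
print(am[0][1], dm[0][3], dm[0][4])
```

Here authors 1 and 2 share two papers, so their adjacency degree is 2, and author 1 reaches author 4 only through author 3, giving a distance of 2.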

Basic ideas for measuring method
Although numerous indicators and methods have been proposed under different standards to decide which nodes are more important than others, in SRCH the criterion should be scientific research capability, which is usually measured by the quantity and quality of outputs such as co-authored papers, books or patents for inventions. Among these, co-authorship relations of scientific papers are easy to acquire and highly explanatory (Melin [3]), so social network analysis of co-authorship relations is a powerful tool for investigating scientific research collaboration (Newman [5]).
In many studies of scientific research collaboration networks, node importance depends on the degree of the node. In this way the author with the most collaborators is obtained, but this is inadequate for judging his or her comprehensive influence. Two typical viewpoints are as follows. First, node importance in a scientific research collaboration network should take the weight information of both nodes and edges into account, where the weight of a node reflects the contribution of an author and the weight of an edge reflects the value of a paper. Second, a core author in a discipline should meet two conditions: he or she has published many papers, and has collaborated broadly with others.
Therefore, when studying the importance of researchers to an organization, we should consider both the breadth of collaboration with others, i.e. the collaborative relationship structure, and the collaborative achievement value.

Measuring methodology

(1) Measure in collaborative relationship structure
In the research of Xiao [17], we found that measuring methods based on the classical graph are inapplicable to the hypernetwork. Besides, centrality loses its efficacy in an unconnected network, so the problem of measuring collaborative relations needs to be reconsidered under the hypernetwork model. Xi and Tang [16] proposed the concept of the multiplex multi-kernel network, and adopted the idea that "destructiveness is equivalent to importance" in measuring the importance of nodes. The formal description of a multi-kernel network is a hypernetwork, so we can introduce the destructiveness idea into measuring node importance in a hypernetwork.
For an undirected connected node-weighted network, if a node $v_i$ is deleted, the connectivity of the network may be broken in two ways. Firstly, the nodes originally connected with $v_i$ are disconnected from it, and their weight cannot spread to other nodes, which results in a Direct Loss (DL). Secondly, paths between some of the remaining nodes may be interrupted because the deleted node $v_i$ no longer acts as a bridge, so these nodes cannot exchange or share weights with each other, which results in an Indirect Loss (IL).
Therefore, the importance of node $v_i$ in hypernetwork HN is the Total Loss (TL) to HN caused by deleting it, which combines the direct loss and the indirect loss described above. In this calculation, the decay coefficient of the collaborative behaviour of $v_i$ and $v_j$ is defined to be inversely proportional to their distance, because in a hypernetwork the effect of farther nodes is smaller; β is the information quantity between $v_i$ and $v_j$. In this paper, we take β as the number of collaborations (i.e. the adjacency degree) between them.
With the Adjacency Matrix and Distance Matrix representations, the collaborative relationship structure importance of $v_i$, denoted $TL(v_i)$, can be calculated from the entries of these two matrices.
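The node-deleting idea can be illustrated with the following Python sketch. It is not the paper's formula. It rests on three labelled assumptions: deleting a node only drops it from each hyperedge, so the remaining co-authors of a paper stay linked; the direct loss is the sum of the adjacency degrees between the deleted node and its neighbours; and the indirect loss adds the inverse of the original distance for every remaining node pair that becomes disconnected, in the spirit of the inverse-distance weighting attributed to Xi [16]. All function names and the sample data are hypothetical.

```python
from collections import deque
from itertools import combinations


def adjacency(hyperedges, removed=None):
    """Map node -> {neighbour: number of shared hyperedges}.

    Assumption: a removed node is dropped from every hyperedge, and the
    remaining members of that hyperedge stay linked to each other.
    """
    adj = {}
    for edge in hyperedges:
        members = sorted(v for v in edge if v != removed)
        for v in members:
            adj.setdefault(v, {})
        for u, w in combinations(members, 2):
            adj[u][w] = adj[u].get(w, 0) + 1
            adj[w][u] = adj[w].get(u, 0) + 1
    return adj


def hop_distances(adj, source):
    """Breadth-first distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def total_loss(hyperedges, v):
    """Assumed TL(v) = direct loss + indirect loss of deleting node v."""
    before = adjacency(hyperedges)
    after = adjacency(hyperedges, removed=v)
    # Direct loss (assumed form): collaborations between v and its neighbours.
    direct = float(sum(before[v].values()))
    # Indirect loss (assumed form): 1/d for each remaining pair that v bridged.
    indirect = 0.0
    for u in after:
        d_before = hop_distances(before, u)
        d_after = hop_distances(after, u)
        for w, d in d_before.items():
            if w != v and w not in d_after:
                indirect += 0.5 / d  # each unordered pair is visited twice
    return direct + indirect


# Hypothetical chain of three papers: A-B, B-C, and C-D-E.
papers = [{"A", "B"}, {"B", "C"}, {"C", "D", "E"}]
for author in "ABCDE":
    print(author, round(total_loss(papers, author), 4))
```

Under these assumptions the bridging authors B and C receive the largest losses, while D and E, whose removal disconnects nobody else, are credited only with their direct collaborations.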

(2) Measure in collaborative achievement value
The collaborative achievement value dimension reflects the quantity and quality of a researcher's outputs. After measuring the academic value of each paper and the contribution of each collaborator to it, we accumulate the contributions over all papers, grouped by author, to obtain the collaborative achievement value of every author.
Generally, the value of a scientific research paper can be evaluated by factors such as index source, number of citations, journal impact factor and fund project level. Considering these factors together, we define the value of a scientific paper p as a function of them, in which γ is the number of times the paper has been cited and y is the number of years since the paper was published. As the number of citations increases with the years since publication, $\gamma / y$ is used as the average number of citations per year. This definition reflects the impact of index source, citations, journal impact factor and fund project level on the collaborative achievement value, and the expansion effect of citations.
In a scientific paper, the authors' contributions differ. To describe the contribution proportions in papers with different numbers of authors, we define an author contribution proportion matrix denoted by A. A is a lower triangular matrix, in which each row represents the situation for a given number of authors. Element $a_{ij}$ is the contribution of author j in a paper with i authors in total. For each situation i, the proportions satisfy $\sum_{j=1}^{i} a_{ij} = 1$. If paper p has $n_p$ authors in total, then the contribution proportion of author j in the paper is $a_{n_p j}$. If author j has published s papers altogether, then his or her collaborative achievement value is obtained by accumulating, over these s papers, the value of each paper weighted by the author's contribution proportion in it.
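The accumulation step can be sketched in Python as below. The sketch assumes that the value of each paper has already been computed and is supplied as a number, since the paper's value formula is not reproduced here. The entries of the contribution matrix `A`, the author names and the paper values are hypothetical and chosen only so that each row sums to 1.

```python
# Hypothetical contribution proportion matrix A (lower triangular):
# row i-1 holds the shares of authors 1..i in an i-author paper.
A = [
    [1.0],
    [0.6, 0.4],
    [0.5, 0.3, 0.2],
]


def rows_sum_to_one(matrix, tol=1e-9):
    """Check the constraint that every row of A sums to 1."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


def achievement_values(papers, matrix):
    """Accumulate share * paper value per author.

    papers: list of (paper_value, authors in byline order).
    """
    totals = {}
    for value, authors in papers:
        shares = matrix[len(authors) - 1]  # row for an n_p-author paper
        for position, author in enumerate(authors):
            totals[author] = totals.get(author, 0.0) + shares[position] * value
    return totals


# Hypothetical papers with assumed, precomputed values.
papers = [
    (10.0, ["Li"]),
    (8.0, ["Li", "Wang"]),
    (6.0, ["Zhao", "Wang", "Li"]),
]

assert rows_sum_to_one(A)
values = achievement_values(papers, A)
print({author: round(v, 2) for author, v in values.items()})
```

Storing one row per author count keeps the lookup of an author's share a simple index by paper size and byline position, which mirrors the lower triangular layout described above.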

(3) Comprehensive evaluation for multiple indicators
The analysis of the merits and demerits of existing evaluation methods shows that a single indicator is inadequate to reflect the influence of a node on the whole network, so multiple indicators are adopted in our research. This raises another problem, namely how to integrate these indicators into an effective evaluation. A multiplication operator can integrate two indicators, but it tends to amplify the influence of a single one and make the evaluation results unreasonable. We therefore adopt the addition operator and the "additive weighting" method to obtain a comprehensive importance evaluation, which overcomes the drawbacks of single indicators and of the multiplication operator in many kinds of applications. On this basis, a measuring method of node importance in SRCH with adjustable parameters is proposed. Let $C(v_i)$ denote the importance of node $v_i$ in SRCH, calculated as $C(v_i) = \alpha \cdot TL(v_i) + \beta \cdot Val(v_i)$, where α and β are adjustable parameters with α + β = 1, $TL(v_i)$ is the collaborative relationship structure importance of node $v_i$, and $Val(v_i)$ is the collaborative achievement value importance of $v_i$. The greatest advantage of this method is that the parameters in the formula can be adjusted whenever necessary according to the actual network application background.
To facilitate comparison between different networks and eliminate the influence of network scale on the indicators, $C(v_i)$ needs to be normalized so that the values fall in the interval [0, 1], by dividing each value by the maximum over all nodes, $C(v_i) / \max_j C(v_j)$.
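A minimal Python sketch of the additive weighting step follows. It assumes that the two indicators are supplied as dictionaries keyed by node, that β is set to 1 − α, and that normalization divides by the maximum value; the function name and the sample scores are hypothetical.

```python
def comprehensive_importance(tl, val, alpha=0.5):
    """Additive weighting C(v) = alpha * TL(v) + beta * Val(v), beta = 1 - alpha,
    followed by division by the maximum (assumed normalization)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    raw = {v: alpha * tl[v] + beta * val[v] for v in tl}
    peak = max(raw.values())
    if peak == 0:
        return {v: 0.0 for v in raw}
    return {v: c / peak for v, c in raw.items()}


# Hypothetical indicator values for three researchers.
tl = {"A": 1.0, "B": 4.0, "C": 2.0}
val = {"A": 16.0, "B": 5.0, "C": 3.0}

balanced = comprehensive_importance(tl, val, alpha=0.5)
structure_only = comprehensive_importance(tl, val, alpha=1.0)
print(balanced)
print(structure_only)
```

With equal weights researcher A ranks first because of a high achievement value, whereas with α = 1 only the structural indicator counts and researcher B ranks first, which shows how the parameters shift the ranking.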

Instance and analysis

4.1 Data acquisitions
Using the measuring method above, we can construct the SRCH of a specified organization, subject area, or academic journal (group), and then determine the importance of the elements in the data set. In the following, we take our own institution, Jiangxi University of Finance and Economics (JXUFE), as an instance to illustrate the application process, results and analysis of the proposed measuring method. We wrote a web crawler in Java to collect information on papers from the last five complete years (2009 to 2013) whose author affiliation is JXUFE, from scientific research databases including Wanfang Data, Engineering Village, and Science Direct. Altogether we acquired information on 10211 papers written by 5261 authors.

Network characteristic
Our analysis shows that the SRCH we investigated is non-connected and consists of independent sub-networks (see Tab. 1).
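Splitting a hypernetwork into its independent sub-networks can be sketched with a union-find structure, as below. This is only an illustration of the idea, assuming that each paper is given as a non-empty set of author labels; the function name and the sample papers are hypothetical and unrelated to the JXUFE data.

```python
def sub_networks(hyperedges):
    """Group nodes into connected sub-networks with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in hyperedges:
        members = list(edge)
        if not members:
            continue
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root  # merge co-authors into one group

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical papers: two single-author papers and two small circles.
papers = [{"A"}, {"B", "C"}, {"C", "D"}, {"E", "F", "G"}, {"H"}]
components = sub_networks(papers)
sizes = [len(c) for c in components]
print(sizes)
```

Counting how many sub-networks have each size gives a distribution of the kind summarized in Tab. 1, with single-node sub-networks corresponding to authors who never collaborated.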
In Tab. 1, 2177 of the 5261 authors never collaborated with others within the five years, and 3167 authors are in relatively small research circles (NoSn. < 4), which may be caused by their research subjects. Meanwhile, two large networks include 415 ... measuring method has good explanatory and fault-tolerant capability.
d) From a macro point of view, there are 1.51 authors per paper on average, i.e. most papers are written by one or two authors. The average node degree is 1.62, which shows that each author collaborates with 1 to 2 partners on average. Among all papers, 70.8 % have only one author, 20.8 % have two authors, and only 8.4 % have three or more authors, which indicates that the level of collaboration is low, and the scientific research management department should formulate policies to encourage collaboration.
Contrastive experiments with different node importance measuring methods show that the method with adjustable parameters for SRCH proposed in this work highlights the dissimilarity between nodes in fine detail, and indicates the influence of nodes on the whole network objectively. By adjusting the parameters, the method can be applied in a wider range of settings and offers more guidance in different network application backgrounds.

Conclusions
Identifying key nodes in a scientific research collaboration network is important work for scientific research organizations. In view of the limitations of the traditional modelling method for scientific research collaboration networks, we present a hypernetwork modelling method, and on this basis investigate the problem of node importance measuring in SRCH from a hypernetwork perspective. This measuring method with adjustable parameters has the following advantages.
(1) It is applicable to the evaluation of collaboration within one organization, or among authors of different organizations in the same subject area.
(2) The parameters in the additive weighting method can be adjusted according to different network application scopes, making it flexible and practical. For example, in the early period of developing and accumulating scientific research, the quantity and quality of outputs should carry more weight; after some achievements have been made, the structural relations may receive more attention.
Although we believe our method is correct and effective, some limitations remain to be studied and improved in future work:
(1) We take scientific papers as the carriers of scientific research collaborative behaviour. In fact, scientific research collaboration includes but is not limited to scientific papers; project cooperation and seminars are also common forms. Additionally, scientific research databases besides Wanfang Data, Engineering Village, and Science Direct can be included as data sources.
(2) Different parameter values have different influences on the measuring result, so how to balance the parameters in accordance with the kind of network, to get the maximum effectiveness of the method, remains to be resolved.
(3) In principle and methodology, what we consider is an undirected connected network. Whether this measuring method can be generalized to weighted, directed or unconnected networks needs to be investigated in the future.

Figure 1 Hypernetwork representation of scientific paper co-authorships

the number of nodes and hyperedges of HN respectively. The hypernetwork representation of the three graphs in Figs. 1a to 1c is:

(2) Matrices of hypernetwork
a) Incidence Matrix (IM)
Definition 2: The IM of hypernetwork HN is an m × n matrix, where the m rows of IM correspond to the m hyperedges and the n columns correspond to the n nodes.

We can assign different parameter values for different nodes in an SRCH, to express the importance of nodes under different combinations of parameter values. This method considers several typical indicators together, avoiding the one-sidedness of measures based on a single indicator.

Table 1 Sub-networks distribution (D: Degree; NoN: Number of Nodes)