A GRAPH-BASED ALGORITHM FOR INTERPERSONAL TIES CLUSTERING IN SIGNED NETWORKS

Social ties are formed as a result of the interactions and individual preferences of the people in a social network. There are two opposite types of ties, which are interpreted as friendship vs. enmity or trust vs. distrust between people. Such a social network structure can be represented by a signed graph, where people are the graph’s vertices and their interactions are the graph’s edges, each carrying a positive or a negative sign. To determine trustworthiness, this paper considers the problem of partitioning a signed graph into clusters of balanced size while minimizing the sum of the negative edge weights between clusters. An efficient algorithm to solve this problem is proposed. The experimental results show that the proposed algorithm runs much faster than the MINOS solver, while its accuracy stays within the given bounds.


INTRODUCTION
Recently, the social internet of things (SIoT) has been proposed and is the subject of a rapidly increasing research effort [1], [2], [3]. The structure of SIoT has been formulated in [4] based on the notion of social relationships among the IoT objects. These relationships can be represented by a signed graph where the vertices are the IoT objects and the edges are their interactions. The problem of service selection based on social influence was discussed in [5]. Given the set of IoT nodes providing the needed service, every node, after interacting with another node, gives that node a rating based on the service it provides. All nodes interact among themselves based on a social network structure. Each object has to be matched with the best, most trusted and reliable node.
In a social network, social ties are formed as a result of the interactions and individual preferences of its members. A person may switch from one contact to another because of behavior related to reliability. The level of trustworthiness in a social network can be determined by PageRank [6][7][8]; that is, it is derived from the PageRank values of the incoming links. In real-world scenarios, there are two opposite types of relationship, i.e., trust versus distrust. Thus, the edges should carry positive or negative signs. Fig. 1 shows an example of a signed graph that represents the social relationships among IoT objects in a real-world scenario. In this figure, trust is represented by a solid line, while distrust is represented by a dotted line. The level of trustworthiness is represented by the edge's weight. If the edge's weight is small, the level of trustworthiness is low.
On the other hand, if the edge's weight is large, the level of trustworthiness is high. The number of IoT devices and services is steadily increasing, and a centralized system is not efficient at maintaining the connectivity and reliability of a large signed graph; a distributed system is a viable choice. In [5], a large-scale service selection problem was considered. For such a problem, the large signed graph has to be partitioned into smaller clusters to reduce execution times. The set of clusters is called a partition of the signed graph, and each input graph has a number of possible partitions. This paper considers the optimization problem of finding clusters of approximately equal size such that the sum of the weights of the negative external-edges is minimized while the sum of the weights of the positive external-edges is less than a given constant. A graph-based algorithm for interpersonal ties clustering in signed networks is proposed to solve this problem. An example of an optimal partition is shown in Fig. 1. Let k = 3 and ω = 5 be integers and G = (V, E) be a signed graph. The signed graph is partitioned into its minimal partition $P_c = \{G_1, G_2, G_3\}$, where the sum of the negative external-edge weights is −21, the minimum value, and the sum of the positive external-edge weights is 0 (less than the given constant ω).

PROBLEM DEFINITION
In this section, the notations and problem statement are defined as follows.
Definition 1 (Signed Graph): Let G = (V, E) be a signed graph which consists of n vertices. The vertex set is $V = \{v_1, v_2, v_3, v_4, \ldots, v_n\}$, and the edges' [...]
Definition 3 (Partition): Given G = (V, E), a set of its clusters $P_c = \{G_1, G_2, G_3, G_4, \ldots, G_c\}$ is called a partition of G, where c is the number of clusters in the partition.
Definition 4 (Internal Edge): Given G = (V, E), let $P_c$ be its partition. For each cluster $G_i$ in $P_c$, a triple $(v_i, v_j, w_{ij}) \in E_c$ is called an internal-edge. There are two types of internal-edge, i.e., a negative internal-edge and a positive internal-edge. A set of positive internal-edges is [...] is called an external-edge.
Definition 6 (Cut): Given G = (V, E), let $P_c$ be its partition. For each cluster $G_i$ in $P_c$, the set of its external-edges is called a cut C, which consists of a set of positive external-edges [...]
This paper considers the optimization problem of finding k clusters of roughly equal size such that the sum of the negative external-edge weights is minimized, while the sum of the positive external-edge weights is less than a given constant. A problem instance consists of G = (V, E) and positive integers k and ω. The solution is a partition $P_c$ consisting of k clusters of roughly equal size such that the sum of the edge weights $w_{ij}$ in $C^-$ is minimized and the sum of the edge weights $w_{ij}$ in $C^+$ is less than ω. This problem falls into the NP-hard category, by reduction to the min-cut max-flow problem.
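As a minimal sketch of how a candidate partition could be scored against this problem statement, the Python fragment below sums the negative and positive external-edge weights and checks the constraints. The edge-list representation, the function names `cut_weights` and `is_feasible`, the `slack` parameter used to interpret "roughly equal size", and the toy data are all assumptions made for illustration; they do not come from the paper.

```python
from typing import Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (v_i, v_j, w_ij), undirected


def cut_weights(edges: List[Edge], partition: List[Set[Hashable]]) -> Tuple[float, float]:
    """Return (sum of negative external weights, sum of positive external weights)."""
    cluster_of = {v: idx for idx, cluster in enumerate(partition) for v in cluster}
    neg_sum, pos_sum = 0.0, 0.0
    for u, v, w in edges:
        if cluster_of[u] != cluster_of[v]:  # the edge crosses two clusters
            if w < 0:
                neg_sum += w
            else:
                pos_sum += w
    return neg_sum, pos_sum


def is_feasible(edges: List[Edge], partition: List[Set[Hashable]],
                k: int, omega: float, slack: int = 1) -> bool:
    """Check: k clusters, sizes within `slack` of each other, positive cut below omega."""
    sizes = [len(cluster) for cluster in partition]
    _, pos_sum = cut_weights(edges, partition)
    return len(partition) == k and max(sizes) - min(sizes) <= slack and pos_sum < omega


# Hypothetical toy instance (not the graph of Fig. 1).
edges = [("a", "b", 3), ("b", "c", -4), ("c", "d", 2), ("a", "d", -5), ("a", "c", 1)]
partition = [{"a", "b"}, {"c", "d"}]
neg_sum, pos_sum = cut_weights(edges, partition)
print(neg_sum, pos_sum, is_feasible(edges, partition, k=2, omega=5))
```

Among the feasible partitions, the one with the smallest (most negative) first value would be preferred under the objective stated above.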

THE ALGORITHM
In [9] and [10], an algorithm for partitioning a graph into balanced clusters with a minimal cut size was proposed. Based on that algorithm, this section presents a heuristic algorithm to find balanced clusters for which the sum of the negative external-edge weights is minimized, while the sum of the positive external-edge weights is less than a given constant.
Algorithm 1 separates a signed graph G = (V, E) into its partition $P_c$, subject to the condition checked at line 1: the sum of all edge weights in $C^+$ is less than ω. The input graph is recursively separated into smaller clusters $G_c$ until the size of $P_c$ is k (line 9 to line 13).
In line 9, the vertex set of the input graph is separated into two subsets of equal size, X ⊂ V and Y ⊂ V, by using Algorithm 2. Line 10 and line 11 create clusters from $x_i$ and $y_i$. Thus, both clusters have the same number of vertices and together form a partition of the input graph. This partition is called an exact bisection of the input graph. Since both clusters are in turn halved at line 12 and line 13, Algorithm 1 runs in O(|V| log |V|) time.
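The recursive halving described above can be sketched as follows. This is only an illustration under stated assumptions: the bisection step is passed in as a function, the stand-in `halve` simply cuts a list in the middle rather than running Algorithm 2, the largest cluster is always split first, and the ω condition on positive external-edges is left out. The names `recursive_bisection` and `halve` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Bisect = Callable[[List[int]], Tuple[List[int], List[int]]]


def halve(vertices: List[int]) -> Tuple[List[int], List[int]]:
    """Stand-in for Algorithm 2 (assumption): split the list in the middle."""
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:]


def recursive_bisection(vertices: Iterable[int], k: int, bisect: Bisect) -> List[List[int]]:
    """Bisect clusters repeatedly until the partition holds k clusters."""
    partition = [list(vertices)]
    if k < 1 or k > len(partition[0]):
        raise ValueError("k must be between 1 and the number of vertices")
    while len(partition) < k:
        # Always split the currently largest cluster (an assumption of this sketch).
        partition.sort(key=len, reverse=True)
        largest = partition.pop(0)
        x, y = bisect(largest)
        partition.extend([x, y])
    return partition


clusters = recursive_bisection(range(8), 4, halve)
print(clusters)
```

When k is a power of two and |V| is divisible by k, every cluster ends up with |V|/k vertices; for other values of k the clusters are only roughly balanced.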

Algorithm 1 Find the Cluster
Require: A signed graph G = (V, E) and constants k, ω.
[...]
return $P_c$
8: else
9: find the cut-set [...]
Algorithm 2 is formulated based on the well-known Kernighan-Lin heuristic algorithm [11]. The vertex set V is separated into two disjoint subsets X and Y of equal or nearly equal size so as to minimize the cost of the crossing edges (the sum of their weights). However, the Kernighan-Lin algorithm is limited in handling the negative edges of a signed graph.
To determine the cost, let $C_I(x)$ be the internal cost of x ∈ X, which is the sum of the edge weights between x and the other nodes in X, i.e.,
$$C_I(x) = \sum_{x' \in X,\, x' \neq x} w_{x,x'}.$$
Let $C_E(x)$ be the external cost of x ∈ X, which is the sum of the edge weights $w_{x,y}$ between the node x ∈ X and the nodes y ∈ Y, i.e.,
$$C_E(x) = \sum_{y \in Y} w_{x,y}.$$
Therefore, the difference of the internal cost and the external cost is formulated as follows:
$$D_x = C_E(x) - C_I(x).$$
Moreover, the cost for an interchange between x and y is
$$g_{x,y} = D_x + D_y - 2C_{x,y}.$$
The term $C_{x,y}$ is the cost of the possible edge between x and y.
The algorithm tries to find an optimal series of interchange operations between the elements of X and Y that minimizes the sum of the external-edge weights between x and y, i.e., maximizes $g_{x,y}$, and an optimal bisection of the input graph is produced when the algorithm finishes. Algorithm 2 runs in O(|V| log |V|) time.
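To make the interchange cost concrete, the sketch below computes it for every candidate pair on a small signed graph. It assumes the standard Kernighan-Lin convention, in which the D value of a vertex is its external cost minus its internal cost and the gain of swapping x and y is the two D values minus twice the weight of the edge between them. The dictionary-of-frozensets edge representation, the function names, and the toy weights are illustrative assumptions, not the paper's implementation.

```python
from itertools import product


def weight(w, u, v):
    """Weight of the undirected edge {u, v}; 0 if the edge is absent."""
    return w.get(frozenset((u, v)), 0)


def d_value(v, own, other, w):
    """External cost minus internal cost of vertex v (Kernighan-Lin convention)."""
    internal = sum(weight(w, v, u) for u in own if u != v)
    external = sum(weight(w, v, u) for u in other)
    return external - internal


def gain(x, y, X, Y, w):
    """Reduction of the cut weight obtained by interchanging x in X with y in Y."""
    return d_value(x, X, Y, w) + d_value(y, Y, X, w) - 2 * weight(w, x, y)


def cut_weight(X, Y, w):
    """Sum of the weights of all edges crossing between X and Y."""
    return sum(weight(w, x, y) for x, y in product(X, Y))


# Hypothetical signed graph: one negative internal edge (a, b).
w = {
    frozenset(("a", "b")): -2,
    frozenset(("a", "c")): 3,
    frozenset(("a", "d")): 2,
    frozenset(("b", "d")): 1,
    frozenset(("c", "d")): 1,
}
X, Y = {"a", "b"}, {"c", "d"}
for x, y in sorted(product(X, Y)):
    print(x, y, gain(x, y, X, Y, w))
```

Each printed gain equals the cut weight before the swap minus the cut weight after it, which is why choosing the pair with the largest gain reduces the external-edge weight the most in a single step.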
Algorithm 2 Find the max cut
Require: Given G = (V, E).
Ensure: The max-cut of G.
1: Randomly half-separate the set V into sets X and Y
2: while $g_{max}$ ≤ 0 do
3: Compute D values for all $x_i$ ∈ X and $y_i$ ∈ Y
4: Let g, $X^*_i$ and $Y^*_i$ be empty lists
5: Let k ← |V|/2
6: for i ← 1 to k do
7: Find cost for the interchange between $x_i$ and $y_i$ by Eq. [...]
Fig. 2 shows an example of the clusters of the input graph obtained by using Algorithm 2. Let ω = 5; the input graph is separated into a partition which consists of the clusters $G_i$ and $G_j$ of equal size. Based on the heuristic approach, there are three positive external edges whose weights sum to four. This partition is acceptable because the sum of all edge weights in $C^+$ is less than five.

EXPERIMENT EVALUATION
This section evaluates the performance of the proposed algorithm, i.e., the execution time of the partitioning and its accuracy. The experimental results compare the proposed algorithm with the MINOS solver by measuring the execution times and the accuracy of the partition. Moreover, the performance of the proposed algorithm on a real-world large-scale signed social network is evaluated.
The experiments use the datasets from Slashdot [12], a signed social network that contains friend/foe links between Slashdot users. This dataset was obtained in November [...]
Tab. 1 shows the execution times of the proposed algorithm and the MINOS solver. As the graph size increases, the execution time of the MINOS solver increases rapidly. On the other hand, the execution times of the proposed algorithm stay below one minute. For example, at a graph size of 200 vertices, the execution time of the MINOS solver is 245.43 minutes, while the execution time of the proposed algorithm is only 0.69 minutes.
Tab. 2 shows the sizes of the cuts from the partitions produced by the proposed algorithm and the MINOS solver. The cuts produced by the proposed algorithm are always larger. However, these solutions are within an acceptable region for social network analysis applications.
Tab. 1 and Tab. 2 present the trade-off between the execution times and the accuracy. Since the MINOS optimizer is limited in solving very large-scale problems, the proposed algorithm is a viable choice. However, the shorter execution time of the proposed algorithm comes at the cost of accuracy.
To measure the ranking accuracy, the Spearman's footrule distance [13] is used to determine the degree of the ranking error. It is defined as $F(\sigma_1, \sigma_2) = \sum_i |\sigma_1(i) - \sigma_2(i)|$, where $\sigma_1(i)$ and $\sigma_2(i)$ are the ranks of the vertex i in the centralized vertex ranking and the distributed vertex ranking, respectively. This value indicates the error of the distributed vertex ranking: a higher Spearman's footrule distance indicates a higher degree of ranking error.
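The distance F defined above is simple to compute once both rankings are available. The following sketch assumes each ranking is stored as a dictionary from vertex to rank position; the function name `footrule` and the sample rankings are hypothetical and only illustrate the formula.

```python
from typing import Dict, Hashable


def footrule(rank1: Dict[Hashable, int], rank2: Dict[Hashable, int]) -> int:
    """Sum of absolute rank differences over all vertices."""
    if rank1.keys() != rank2.keys():
        raise ValueError("both rankings must cover the same vertices")
    return sum(abs(rank1[v] - rank2[v]) for v in rank1)


# Hypothetical rankings of three vertices.
centralized = {"a": 1, "b": 2, "c": 3}
distributed = {"a": 3, "b": 2, "c": 1}
print(footrule(centralized, distributed))  # 4
```

A distance of 0 means the two rankings agree on every vertex; larger values mean the distributed ranking deviates further from the centralized one.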

Figure 3
The ranking accuracy of the proposed algorithm compared with the MINOS solver
Fig. 3 shows the ranking accuracy of the proposed algorithm compared with the MINOS solver. The X-axis is the graph size, ranging from 40 to 200, while the Y-axis is the Spearman's footrule distance. The proposed algorithm performs faster than the MINOS solver, while the accuracy is still acceptable for vertex ranking on social networks.
[...] In this figure, as the average degree increases, the size of the cuts increases rapidly until the average degree reaches 30. However, the size of the cuts increases only gradually when the average degree is over 30. Likewise, Fig. 5 shows the impact of the average degree on execution times. It can clearly be seen that the execution times of the proposed algorithm grow logarithmically with the average degree of the graph.
Based on the experimental results, the proposed algorithm is very efficient for partitioning a large-scale signed social network.

CONCLUSIONS
This paper considers the optimization problem of finding clusters of roughly equal size such that the sum of the negative external-edge weights is minimized, while the sum of the positive external-edge weights is less than a given constant. A graph-based algorithm for interpersonal ties clustering in signed networks is proposed to solve this problem. The experimental results show that the proposed algorithm performs faster than the MINOS solver, while the accuracy is still acceptable for vertex ranking on social networks. Moreover, the experimental results show that the number of cuts has an upper bound, which will be considered as an approximation bound in future work.

Remark
The article was orally presented at the 23rd International Computer Science and Engineering Conference (ICSEC2019).