SELF-ORGANIZING MAPS WITH SLIDING WINDOW (SOM+SW)

Original scientific paper

SOM is a popular artificial neural network algorithm for clustering many different kinds of data sets. A disadvantage of the SOM is that it can run only on a predefined, complete data set. Because time-stream data sets are generated over time, various problems are encountered when they are clustered with the standard SOM. In this study, a Sliding Window feature is added to the standard SOM for clustering time-stream data sets. The combination of SOM and Sliding Window (SOM+SW) gives more accurate results when clustering time-stream data sets. To demonstrate this, a set of internet usage data from a mobile operator in Turkey is used as a test case. The data set is first clustered with the classical SOM, and the future data usage of subscribers is then estimated. The same data set is then applied to SOM+SW to perform the same simulations.


Introduction
In the last decade, academic and industrial information has been growing at exceptional rates. Extracting new information from gigantic databases is challenging, expensive and time consuming if done routinely. The key objective is to find regularities and relations in the data, thus gaining access to hidden and potentially useful information. The Self-Organizing Map (SOM) is a well-known neural network and certainly one of the most popular unsupervised learning algorithms. Since its invention by the Finnish professor Teuvo Kohonen in the early 1980s, more than 4000 research articles have been published on the algorithm, its design and its uses [1,2]. The SOM mapping is topology preserving: the more similar two data samples are in the input space, the closer together they appear on the final map. This allows the user to identify clusters, such as large sets of a specific type of input pattern.
There are many studies that improve the SOM algorithm to solve a specific problem. In one of these studies, Chaudhary et al. (2014) modified the classical SOM so that, in addition to the farthest and nearest neurons from the winner neuron, the winning frequency of each neuron was taken into account when updating the weights [3]. In another study, Ghaseminezhad and Karami (2011) presented a novel SOM-based algorithm for clustering discrete groups of data and indicated that the classic SOM algorithm could not cluster discrete data correctly [4]. Other studies combined the SOM algorithm with other methods, such as recurrent prediction [5] for time series, a genetic algorithm [6] for data visualization, a Markov Model [7] for biological sequence analysis, and a support vector machine [8] for classification of enzymes controlling cell division. In addition, the Sliding Window has many uses with neural networks. In one of these studies, Steven F. B. (1979) uses a sliding window to calculate windowed speech data for suppression of acoustic noise. In this article, the weight of the input data in the sliding window is used to calculate the weight of clusters dynamically [9]. Neural networks have also been widely used as time series forecasters. In one of these studies, Frank, R. J., Neil, D., and Stephen, P. H. (2001) attempted to answer the question "can the performance of sliding window feed-forward neural network predictors be optimized using theoretically motivated heuristics". They use ATM network traffic data and calculate the relationship between datasets and network performance [10]. In this study, the Sliding Window feature is applied over an hourly period and used when calculating the neighborhood of clusters. The SOM combined with the Sliding Window is called the 'SOM with Sliding Window' (SOM+SW) approach, and it provides dynamically generated clusters for time-stream data.
In this paper, Section 2 introduces the SOM basics and how it works. Section 3 discusses the additional features gained by using the SOM+SW technique. Section 4 gives a brief evaluation of the system. Section 5 discusses a case study applying SOM and SOM+SW to a time-based Dynamic Quota Calculation System (DQCS). Section 6 presents the simulation results for different constant ranges, while Section 7 is dedicated to the conclusion.

Classical Self-Organizing Maps (SOM)
The basic self-organizing system is a one- or two-dimensional array of neurons in the form of neighboring units. The first simulation study related to the SOM ordering process was performed by Kohonen [11]. SOM is used in many practical applications as a clustering method [12,15]. The basic algorithm of the SOM neural network is as follows [13,14]:
1) Each node's weights are initialized randomly.
2) A vector is chosen at random from the training data and presented to the lattice.
3) Every node is examined to determine which one has weights closest to the input vector. The winning node is called the Best Matching Unit (BMU).
4) The radius of the neighborhood of the BMU is calculated. Nodes within this radius are considered to be inside the BMU's neighborhood.
5) The weights of each neighboring node (the nodes found in step 4) are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered.
6) Repeat from step 2 for N iterations.
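The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the 1-D lattice, the exponential decay of the radius and learning rate, the Gaussian neighborhood influence, and all names and default values (`train_som`, `n_nodes`, `lr0`, and so on) are assumptions chosen for brevity.

```python
import math
import random


def train_som(data, n_nodes=10, n_iters=1000, lr0=0.5, seed=0):
    """Train a 1-D SOM lattice on a list of equal-length numeric vectors.

    Assumed (hypothetical) choices: exponential decay schedules and a
    Gaussian neighbourhood; n_nodes must be greater than 2.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: every node starts with random weights.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2.0
    time_const = n_iters / math.log(radius0)

    for t in range(n_iters):
        # Step 2: choose a training vector at random.
        x = rng.choice(data)
        # Step 3: the BMU is the node whose weights are closest to x.
        bmu = min(range(n_nodes), key=lambda i: math.dist(weights[i], x))
        # Step 4: the neighbourhood radius shrinks as training proceeds.
        radius = radius0 * math.exp(-t / time_const)
        lr = lr0 * math.exp(-t / n_iters)
        # Step 5: pull neighbours towards x, more strongly near the BMU.
        for i in range(n_nodes):
            d = abs(i - bmu)
            if d <= radius:
                influence = math.exp(-(d * d) / (2 * radius * radius))
                for k in range(dim):
                    weights[i][k] += lr * influence * (x[k] - weights[i][k])
        # Step 6: the loop repeats steps 2 to 5 for n_iters iterations.
    return weights
```

Calling `train_som([[0.1, 0.1], [0.9, 0.9]], n_nodes=5, n_iters=200)` returns five reference vectors; with a fixed `seed` the result is reproducible.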
A practical SOM is a two-layer feed-forward Artificial Neural Network (ANN). The input layer consists of the neurons representing the attributes used for clustering. The output layer represents the clusters, usually arranged in a hexagonal or rectangular grid [13]. Each cluster neuron has a reference vector holding the weights between the input neurons and that cluster neuron. The SOM algorithm consists of two parts: Training and Mapping. In the training part, an unsupervised learning algorithm combined with a neighborhood function is used to determine the reference vectors. In the mapping part, the input rows are applied to the SOM to construct the cluster map. In this study, the reference vector is the weight of the cluster. After preparation, the dataset is fed into the system and changes the average weights of the clusters. The following steps describe the working mechanism of SOM:
1) Start SOM.
2) Clusters are prepared with starting conditions. For these conditions, the weights of the clusters are assigned randomly. The number of clusters must also be predefined. In this study, the maximum number of clusters is 100 (this value is taken from one of the leading mobile companies in Turkey).
3) The training data is taken into account.
4) The closest cluster is determined.
5) The input is added to that cluster and the cluster's average weight is updated.
6) This process is repeated until the clusters show no significant change.
7) If there is no significant change, the clusters are ready for the real dataset.
8) The dataset is taken into account.
9) The closest cluster is determined.
10) The input is added to that cluster and the cluster's average weight is updated.
11) This process is repeated until the clusters show no significant change.
In Step 7, the data set used is not a time-stream data set, since SOM is not suitable for time-stream data sets. The results and the dynamism feature added to the SOM are discussed in detail in the next sections.

The proposed Self-Organizing Maps with Sliding Window (SOM+SW)
As mentioned, the disadvantage of the SOM is that the system loses its dynamism over time: if former inputs are stored in the same cluster forever, the average weight of the cluster is barely affected when one more input is added. For example, as shown in Fig. 1(a) to 1(c), a continuous long-term arrival of incoming data packages causes the clusters of the classic SOM to lose their dynamism (the responsiveness of the cluster's average weight). Let the average weight of the cluster shown in Fig. 1(a) be approximately 200 KB. In Fig. 1(b), a 2nd input of 100 KB is added to the same cluster and accounts for 50% of the cluster's weight, so the average becomes 150 KB. But as seen in Fig. 1(c), there is practically no effect on the average weight of the same cluster when the 10^6-th incoming input, with a weight of 200 KB, is added. The cluster still has essentially the same average weight of 150 KB and has lost its dynamism, since its average weight stays practically fixed. To solve this problem, the Sliding Window concept is added to the classical SOM neural network (SOM+SW).
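The arithmetic behind the Fig. 1 example can be checked with an incremental mean. The sketch below is illustrative only; the helper name `running_mean_update` is hypothetical, and it assumes that the inputs between the 2nd and the 10^6-th one left the cluster average at 150 KB.

```python
def running_mean_update(mean, count, value):
    """Return the new (mean, count) after one more value joins a cluster."""
    count += 1
    mean += (value - mean) / count
    return mean, count


# Fig. 1(a): the cluster holds one input of 200 KB.
mean, count = 200.0, 1

# Fig. 1(b): the 2nd input (100 KB) moves the average to 150 KB.
mean, count = running_mean_update(mean, count, 100.0)
mean_after_second = mean

# Fig. 1(c): assume 999,999 inputs are already stored with average 150 KB.
# The 10^6-th input (200 KB) shifts the average by only 50 / 10^6 KB.
mean, count = running_mean_update(150.0, 10**6 - 1, 200.0)
shift_after_millionth = mean - 150.0
```

The second input changes the average by 50 KB, while the 10^6-th input changes it by about 0.00005 KB, which is the loss of dynamism described above.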
In the extended SOM with Sliding Window logic, the latest incoming data is likewise included in a cluster. However, it is excluded from that cluster after a specific time period (for instance, after 1 hour). A flow diagram of the proposed SOM+SW mechanism is presented in Fig. 2. The flow starts when a new input enters the system. The closeness of the input is determined with respect to the average weight values of the clusters created by the SOM+SW mechanism, and the cluster with the closest average weight value is found. If the distance to the closest cluster's average weight is lower than a range value, the new data is added to that cluster and the average weight value of that cluster is updated. If the distance to the closest cluster's average weight value is higher than the range value, a new cluster is created from that usage data. After 1 hour, the weight of this input is excluded from its cluster and the average weight of the cluster is updated again. The steps are:
a) The weight of the incoming input is taken into account (Step 1 of Fig. 2).
b) The cluster whose average weight is closest to the input is found (Step 2 of Fig. 2).
c) If the average weight of the closest cluster is closer to the input than a predefined threshold value (assumed to be a constant range, e.g. 500), the input is included in that cluster (Step 3 of Fig. 2).
d) If the average weight of the closest cluster is further from the input than the threshold value, a new cluster is created. The number of clusters is therefore determined dynamically (Step 4 of Fig. 2).
e) Finally, the Sliding Window approach is used to keep the average weights of the generated clusters dynamic. To do this, an input that has been in a cluster for more than an hour is removed from that cluster. In this way, the generated clusters always stay dynamic, since their average weights change according to the weights of the inputs within 1-hour time slots (Steps 5, 6 and 7 of Fig. 2).
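Steps a) to e) can be sketched for one-dimensional weights as follows. This is a minimal sketch under stated assumptions rather than the system used in the study: the class names `Cluster` and `SomSw` are hypothetical, timestamps are assumed to be in seconds and non-decreasing, expired inputs are removed lazily whenever a new input arrives, clusters that become empty are discarded, and an input exactly at the threshold distance is treated as belonging to the cluster.

```python
from collections import deque


class Cluster:
    """A cluster that remembers its inputs so old ones can be removed."""

    def __init__(self):
        self.items = deque()  # (timestamp, weight) pairs, oldest first
        self.total = 0.0

    @property
    def mean(self):
        return self.total / len(self.items)

    def add(self, ts, weight):
        self.items.append((ts, weight))
        self.total += weight

    def expire(self, now, window):
        # Step e): drop inputs that have been stored for `window` or longer.
        while self.items and now - self.items[0][0] >= window:
            _, old_weight = self.items.popleft()
            self.total -= old_weight


class SomSw:
    def __init__(self, threshold=500.0, window=3600.0):
        self.threshold = threshold  # constant range from step c)
        self.window = window        # sliding window length (1 hour)
        self.clusters = []

    def add(self, ts, weight):
        # Slide the window first; empty clusters are discarded (assumption).
        for cluster in self.clusters:
            cluster.expire(ts, self.window)
        self.clusters = [c for c in self.clusters if c.items]
        # Step b): find the cluster with the closest average weight.
        closest = min(self.clusters,
                      key=lambda c: abs(c.mean - weight), default=None)
        # Step d): too far from every cluster, so create a new one.
        if closest is None or abs(closest.mean - weight) > self.threshold:
            closest = Cluster()
            self.clusters.append(closest)
        # Step c): include the input and thereby update the average weight.
        closest.add(ts, weight)
        return closest
```

Because every input carries its timestamp, the average of a cluster always reflects only the inputs of the last window, at the cost of storing those inputs instead of a single running mean.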
The next section presents the better results obtained with SOM+SW when the above clustering algorithm is applied to different time-stream data sets, using the internet usage of a mobile operator's subscribers as a case study.

A case study for Mobile Data Communication Systems (MDCS) through SOM and SOM+SW
In this section, the classical SOM and SOM+SW approaches are compared by simulating the dynamic quota allocation and charging problem for internet users in Mobile Data Communication Systems (MDCS). A case study is set up and various simulations are performed on a real data set of the internet usage of mobile subscribers. The real data set belongs to one of the leading mobile companies in Turkey. In this case study, the age, gender, home city, client profile (CRM segment) and tariff of subscribers, and their instant internet usage (in terms of weight, KB), are considered as parameters. Currently, in MDCS, a constant quota size is assigned to mobile customers for internet usage regardless of whether the subscriber has high or low data usage. In general, a quota size of 750 KB is assigned to subscribers statically by the charging system of the MDCS. Setting the instant quota size only with respect to subscribers with low data usage causes various performance problems, such as heavy control signaling. On the other hand, setting the quota size only with respect to subscribers with high data usage leads to unnecessary quota allocation. For these reasons, a dynamic quota allocation method is required to increase the performance of MDCSs. Therefore, the SOM and SOM+SW approaches are simulated to estimate the future total data usage of the subscribers in order to perform dynamic quota allocation.
First, the classical SOM is evaluated in terms of the amount of past internet usage, age, gender, home city, client profile (CRM segment) and tariff. Then, the mechanism of the classical SOM is combined with Sliding Window logic to form SOM+SW. After clustering, the subscriber characteristics used to estimate future data usage are: Age, Tariff, Gender, City, and CRM_SEGMENT (Tab. 1).

Procedure A. The simulation of SOM yields the difference between the Real Data Usage (Total_Data_Usage) and the Estimated Data Usage (namely, Difference for SOM):
• Apply SOM to generate clusters according to the Total Data Usage of the dataset (Table 1). Here, the clusters are generated according to the Real Data Usage (Total Data Usage) of all users in the dataset.
• Each user in the data set is evaluated for similarity in terms of the Age, City, CRM Segment, and Tariff parameters, in order to find the cluster that contains the maximum number of users similar to that user.
• Then, the average weight of the cluster found is assigned as the Estimated Data Usage of that user.
• Finally, the absolute Difference of Accuracy on Data Usage (Difference for SOM) for each user in the dataset is calculated as the absolute value of the difference between that user's Real Data Usage (Total_Data_Usage) and Estimated Data Usage.
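The estimation and difference steps of Procedure A can be illustrated with the short sketch below. The text does not define when two users count as "similar", so the sketch assumes, purely for illustration, that a cluster member is similar when all four attributes match exactly. The function names, dictionary keys and sample values are hypothetical.

```python
def estimate_usage(user, clusters,
                   keys=("age", "city", "crm_segment", "tariff")):
    """Return the average usage of the cluster with the most similar users.

    `clusters` is a list of non-empty lists of user dictionaries.
    Similarity rule (an assumption): all attributes in `keys` are equal.
    """
    def is_similar(member):
        return all(member[k] == user[k] for k in keys)

    # Pick the cluster containing the maximum number of similar users.
    best = max(clusters, key=lambda members: sum(is_similar(m) for m in members))
    # The cluster's average weight becomes the Estimated Data Usage.
    return sum(m["usage"] for m in best) / len(best)


def difference_of_accuracy(real_usage, estimated_usage):
    """Absolute difference between real and estimated data usage."""
    return abs(real_usage - estimated_usage)


# Made-up sample data, in KB.
clusters = [
    [
        {"age": 30, "city": "Ankara", "crm_segment": "A", "tariff": "T1", "usage": 100.0},
        {"age": 30, "city": "Ankara", "crm_segment": "A", "tariff": "T1", "usage": 300.0},
    ],
    [
        {"age": 45, "city": "Izmir", "crm_segment": "B", "tariff": "T2", "usage": 5000.0},
    ],
]
user = {"age": 30, "city": "Ankara", "crm_segment": "A", "tariff": "T1", "usage": 250.0}
estimate = estimate_usage(user, clusters)
diff = difference_of_accuracy(user["usage"], estimate)
```

In this made-up example the first cluster holds two users similar to the new one, so its average of 200 KB is the estimate and the difference is 50 KB. The same two functions would apply to Procedure B, with the clusters restricted to users from the last hour.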

Procedure B. The simulation of SOM+SW yields the difference between the Real Data Usage (Total_Data_Usage) and the Estimated Data Usage (namely, Difference for SOM+SW):
• Apply SOM+SW to generate the first cluster for the first user arriving at the system (Tab. 1). Here, the first cluster is generated according to the Real Data Usage (Total_Data_Usage) of that first user. By the time n users have arrived, the system has generated m clusters. According to the SOM+SW algorithm in Fig. 2, the generated clusters keep only the users who arrived at the system in the last hour. At any moment, the clusters are ready to be used for estimating the data usage of a new incoming user.
• When a new user arrives at the system, the system evaluates similarity in terms of the Age, City, CRM Segment, and Tariff parameters to find the cluster that contains the maximum number of users similar to the newly arrived user.
• Then, the average weight of the cluster found is assigned as the Estimated Data Usage of the newly arrived user (Tab. 1).
The results of SOM+SW on the same datasets and for the same problem are better than the SOM results, as shown by the result graphs in the next section.

Evaluations
The complete data set of subscriber data usage for the mobile company between 01/06/2012 and 30/06/2012 is considered in the simulations (a small portion of the entire dataset is shown in Table 1). The complete dataset is separated into four equal datasets. The estimated results of "SOM (Procedure A)" and "SOM+SW (Procedure B)" for these four datasets are presented in the second and third columns of Tab. 2 (for Dataset 1), Tab. 3 (for Dataset 2), Tab. 4 (for Dataset 3) and Tab. 5 (for Dataset 4), respectively. The calculated data are aggregated in 6-hour periods, given in the "Time" column of these tables.
The first column gives the date in the format YYYYMMDDHH (e.g. 2012060100 means 01/06/2012, 00:00). The second column presents the sum of the differences between the Total Data Usage and the Estimated Data Usage of each user according to SOM. The third column presents the sum of the differences between the Total Data Usage and the Estimated Data Usage of each user according to SOM+SW.
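The grouping of per-user differences into the 6-hour rows of these tables can be sketched as follows. The sketch assumes that the periods start at 00, 06, 12 and 18 hours, which the example value 2012060100 suggests but the text does not state; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def six_hour_bucket(stamp):
    """Map a YYYYMMDDHH string to the start of its 6-hour period."""
    t = datetime.strptime(stamp, "%Y%m%d%H")
    start = t.replace(hour=t.hour - t.hour % 6)
    return start.strftime("%Y%m%d%H")


def sum_by_period(rows):
    """Sum (stamp, difference) pairs per 6-hour period."""
    totals = defaultdict(float)
    for stamp, diff in rows:
        totals[six_hour_bucket(stamp)] += diff
    return dict(totals)
```

For instance, `six_hour_bucket("2012060107")` yields `"2012060106"`, so a difference recorded at 07:00 on 01/06/2012 is added to the row that starts at 06:00.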
At the bottom of these tables, the sum of the differences is presented. It can be seen that the difference between the Total Data Usage and the Estimated Data Usage is lower with SOM+SW than with SOM. The accuracy is increased by 57% for Dataset 2. The graph in Fig. 4 is obtained from Dataset 2.

Simulation results on Dataset 3
It can be seen that the accuracy is increased by 3% for Dataset 3 when the SOM+SW approach is used to estimate the future data usage of subscribers. The graph in Fig. 5 is obtained from Dataset 3.
Technical Gazette 24, 6(2017), 1729-1737

Simulation results on Dataset 4
It can be seen that the accuracy is increased by 17% for Dataset 4 when the SOM+SW approach is used to estimate the future data usage of subscribers. The graph in Fig. 6 is obtained from Dataset 4.

Table 5
This table depicts the retrieved results of "Difference for SOM (Procedure A)" and "Difference for SOM+SW (Procedure B)" of the fourth week (Dataset 4).

As a result of the simulations, the overall results of the "Difference for SOM (Procedure A)" and "Difference for SOM+SW (Procedure B)" are given in Tab. 6. The results are 26,454,300,902 bytes for the "Difference for SOM" and 19,989,808,275 bytes for the "Difference for SOM+SW". They show that the SOM+SW result is about 24.4% better than the SOM result.
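The overall improvement follows from the two totals as a relative reduction of the summed difference. The sketch below shows the calculation; the function name `relative_improvement` is hypothetical, and the two totals are the values quoted from Tab. 6.

```python
def relative_improvement(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


diff_som = 26_454_300_902     # bytes, Difference for SOM
diff_som_sw = 19_989_808_275  # bytes, Difference for SOM+SW
improvement = relative_improvement(diff_som, diff_som_sw)
```

With these totals the value comes out a little above 24%, in line with the rounded 24% reported for the individual threshold values.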

Evaluation according to different constant ranges
The number of clusters is determined dynamically in SOM+SW. In addition, different threshold values are considered during the simulations to understand the effect of the constant range on the number of clusters. Besides 500 kB, the threshold values considered are 125 kB, 250 kB, 1000 kB and 2000 kB. The threshold values are applied to each of the four datasets above, and the following quantities are calculated:
• Difference of SOM = |amount of Real Data Usage - amount of Estimated Data Usage| for whole Datasets.
Fig. 7 shows the results obtained after applying the different threshold values to the whole Datasets. The X axis in Fig. 7 refers to "Time", while the Y axis represents the "Difference for SOM (Procedure A)" and the "Difference for SOM+SW (Procedure B)" calculated with these ranges: 125 kB, 250 kB, 1000 kB and 2000 kB. In Fig. 7, SOM+SW gives better results than SOM for the Datasets:
• When the threshold value is 500 kB, correctness is 24% better for the Datasets.
• When the threshold value is 1000 kB, correctness is 24% better for the Datasets.
• When the threshold value is 2000 kB, correctness is 24% better for the Datasets.

Observations and comparative studies
As seen in Tab. 7, different threshold values can be used for the calculations, and these values can be varied. Smaller thresholds give better results than larger thresholds.
There are several other algorithms that are similar to SOM. For example, k-means is a clustering algorithm that aims to partition n data points into k clusters, in which each data point belongs to the cluster with the nearest mean [16]. In order to change the number of clusters dynamically, the X-means clustering algorithm was developed as an extension of k-means [17]. It is possible to use a sliding window in the X-means algorithm as in SOM, so that any data in any cluster is removed at the end of the window's time period and the related cluster weight is updated dynamically. Unlike in SOM, in X-means an old cluster must be divided into two parts in order to create a new cluster. In this case, the two newly created clusters are close to each other. This can be a disadvantage for X-means, because when very different data that does not belong to any cluster arrives, it may have little relation to the cluster it ends up in. In SOM+SW, a diameter is used for creating new clusters. When new data that does not belong to any cluster arrives and its distance to the closest cluster is larger than this diameter, SOM+SW creates a new cluster.

Conclusion
In this article, an extended SOM algorithm with the Sliding Window (SOM+SW) approach is proposed and compared with the classical SOM through various simulations based on a real data set of the internet usage of mobile subscribers. The data set is taken from one of the leading mobile companies in Turkey. In this study, the Sliding Window feature is added to the classical SOM by recalculating the average weight of each cluster over a specific time period. To show that SOM+SW gives more accurate results when clustering time-stream data sets, a set of internet usage data from the mobile operator in Turkey is used as a case study. Using the past data usage of subscribers in this dataset, the clusters to which the subscribers belong are determined for SOM and SOM+SW. After clustering, the subscriber characteristics Age, Tariff, Gender, City, and CRM_SEGMENT are used to estimate their future data usage. During the SOM+SW simulations, however, only the last hour of data is kept in the generated clusters because of the Sliding Window feature. In addition, SOM+SW is simulated for different threshold values: 125 kB, 250 kB, 500 kB, 1000 kB and 2000 kB. In conclusion, SOM+SW outperforms SOM in terms of the difference between real and estimated data usage for all range values tested, and it is more accurate for small range values because of better cluster assignments for subscribers.

Figure 1
Data flow for SOM clustering: (a) first data into classical SOM cluster, (b) second data into classical SOM cluster, (c) after the 10^6-th data into classical SOM cluster.

• Call Start Date: When a user starts to use mobile data,
• Total Data Usage: Total data usage of a subscriber after completing his/her internet usage,
• Birth Date: Birthdate of a subscriber,
• Gender: Gender of a subscriber,
• City: Location of a subscriber,
• Current CRM Segment: Classification of a subscriber that is assigned by the company,
• Tariff: Tariff type bought by a subscriber.
A 3-step general procedure is followed when the SOM and SOM+SW approaches are compared in the case study:
• Calculate the amount of Estimated Data Usage (discussed below for SOM and SOM+SW respectively).
• Get the amount of the Real Data Usage (Total Data Usage) from the dataset (Tab. 1).
• Calculate the Difference of Accuracy on Data Usage = |amount of Real Data Usage - amount of Estimated Data Usage|.

Figure 4
Graph depicts the comparison of the SOM and SOM+SW for Dataset 2.

Figure 5
Graph depicts the comparison of the SOM and SOM+SW for Dataset 3.

Figure 6
Graph depicts the comparison of the SOM and SOM+SW for Dataset 4.

• Difference of SOM+SW (125) = |amount of Real Data Usage - amount of Estimated Data Usage| with 125 kB constant range for whole Datasets.
• Difference of SOM+SW (250) = |amount of Real Data Usage - amount of Estimated Data Usage| with 250 kB constant range for whole Datasets.
• Difference of SOM+SW (1000) = |amount of Real Data Usage - amount of Estimated Data Usage| with 1000 kB constant range for whole Datasets.
• Difference of SOM+SW (2000) = |amount of Real Data Usage - amount of Estimated Data Usage| with 2000 kB constant range for whole Datasets.

Table 1
A portion of the internet data usage records between 01/06/2012 and 30/06/2012 for an MDCS, as used in the simulations.
• Finally, the absolute Difference of Accuracy on Data Usage (Difference for SOM+SW) for the newly arrived user is calculated as the absolute value of the difference between that user's Real Data Usage (Total_Data_Usage) and Estimated Data Usage.
• After the Difference of Accuracy on Data Usage (Difference for SOM+SW) has been calculated, the Real Data Usage (Total_Data_Usage) of the newly arrived user (Tab. 1) is used to update the clusters, so that the clusters keep the users who have arrived in the last hour.

Table 2
This table depicts the retrieved results of "Difference for SOM (Procedure A)" and "Difference for SOM+SW (Procedure B)" of the first week (Dataset 1). The data in Dataset 1 was grouped into 6-hour periods, which are highlighted in bold in the "Time" column.

Table 4
This table depicts the retrieved results of "Difference for SOM (Procedure A)" and "Difference for SOM+SW (Procedure B)" of the third week (Dataset 3).

Table 6
The table presents the results of four different datasets.

Table 7
The table presents simulation results of whole datasets with five different constant values (125 kB, 250 kB, 500 kB, 1000 kB, and 2000 kB).
* The fractional parts of the given values above are discarded.