Week Ahead Electricity Price Forecasting Using Artificial Bee Colony Optimized Extreme Learning Machine with Wavelet Decomposition

Electricity price forecasting is one of the more complex processes, due to its non-linearity and highly varying nature. However, in today's deregulated market and smart grid environment, the forecasted price is one of the important data sources used by producers in the bidding process. It also helps the consumer know the hourly price in order to manage the monthly electricity price. In this paper, a novel electricity price forecasting method is presented, based on the Artificial Bee Colony optimized Extreme Learning Machine (ABC-ELM) with wavelet decomposition technique. This has been attempted with two different input data formats. Each data format is decomposed using wavelet decomposition, Daubechies Db4 at level 6; all the decomposed data are forecasted using the proposed method and aggregate is formed for the final prediction. This prediction has been attempted in three different electricity markets, in Finland, Switzerland and India. The forecasted values of the three different countries, using the proposed method are compared with various other methods, using graph plots and error metrics and the proposed method is found to provide better accuracy.


INTRODUCTION
In a deregulated competitive electricity market, the electricity price forecasted a week ahead is one of the most important concerns of market participants. During bidding, market participants can adjust their bids based on the forecasted price to gain more profit. The consumer can also manage electricity usage based on the hourly electricity price, by diverting the load to non-peak hours or by utilizing power from other sources during peak hours.
Electricity price forecasting has attracted researchers across the globe over the past decade due to its importance and its complexity. Various network models have been proposed from different perspectives. In [1,2] a simple Artificial Neural Network (ANN) is used for short term electricity price forecasting. Due to the complexity of the problem, some authors combined two different tools in the forecasting model, such as the wavelet transform with a neural network or with other time series predictors, in which the wavelet transform is used to decompose the time series data into different frequencies so that the network is easier to train [3,5]. Similarly, in [4,6,7] fuzzy logic is integrated with the neural network and other time series predictors to improve training and testing.
Different algorithms for training neural networks have been proposed since the emergence of the machine learning field. In [8] the short-term electricity price is forecasted for the electricity markets of mainland Spain and California, using a neural network trained with the Levenberg-Marquardt algorithm. Likewise, in [9] a two stage hybrid network comprising a self-organized map and a support vector machine is proposed; in the first stage the input data are clustered, and in the second stage the network is trained.
Subsequently, to improve neural network training efficiency, different neural network functions have been used. In [10] an adaptive neural fuzzy inference system combined with a radial basis function neural network is proposed to forecast the system load and the electricity price. This system integration improves the forecasting accuracy. Similarly, in [11] an adaptive wavelet neural network is proposed in which the Mexican hat wavelet is used as the activation function of the hidden layer neurons in a feedforward neural network, and it is used to forecast the market clearing price of the Spanish market and the locational marginal price of the PJM electricity market. Likewise, in [12] an advanced self-adaptive radial basis function neural network, trained by fuzzy c-means and auto-configured by differential evolution, is presented and used to forecast the electricity market price of Queensland, Australia.
In recent years, various swarm-based optimization algorithms have also been used to optimize the parameters and structures of the neural networks used to forecast the electricity price. In [13] a feature selection technique is combined with a neuro-evolutionary algorithm for electricity price forecasting in the PJM and Spanish electricity markets. Here, an evolutionary algorithm is employed as an iterative search algorithm to optimize the parameters of the neural network. In [14] a modified relief algorithm for feature selection is combined with a hybrid neural network for prediction and tested in the Ontario, New England and Italian electricity markets. In [16] the wavelet transform, the firefly algorithm and fuzzy ARTMAP are combined into a price forecasting model. The robustness of this model is measured by a statistical index and the electricity price of the PJM market is forecasted. Similarly, in [17] an ANN combined with a clustering algorithm is used for day-ahead electricity price forecasting. In this method, the training data are clustered into homogeneous groups and different ANN topologies are tested within a data set that consists of a typical price pattern.
Probabilistic forecasting has also been applied to electricity price forecasting. In [15] an extreme learning machine with bootstrapping is proposed, in which ELM is applied to train the ANN and bootstrapping is incorporated to improve the forecasted price intervals. Likewise, in [18] the probabilistic electricity price is forecasted using a multilayer perceptron as a feedforward neural network, without any data processing techniques. In [19] a hybrid model is presented, comprising an Elman recurrent neural network and a refined VMD-based framework optimized by group search optimization, for multistep electricity price forecasting.
In [20] a hybrid model combining a kernel ELM, based on self-adapting particle swarm optimization, and an auto regressive moving average (ARMA) model is proposed. First, the input time signal is decomposed by the wavelet decomposition technique; the stationary series is forecasted by ELM and the non-stationary series is forecasted by the ARMA model. Performance is tested using the electricity price data of the PJM, Australian and Spanish electricity markets. In [21] the short-term electricity price is forecasted using a stacked denoising autoencoder model, which is tested in different hubs in the United States, including Nebraska, Arkansas, Louisiana, Texas and Indiana. In [22] a new method is proposed using an improved wavelet neural network trained by an extreme learning machine. Here the uncertainty of the predicted model is considered, and the bootstrapping technique is used to implement it.
The extreme learning machine is currently widely used to train the single hidden layer feedforward neural network because of its quick learning speed and accuracy. Many variants of ELM have also emerged recently. Due to its excellent performance, we use one of the variants of ELM in this paper. In this research, a novel hybrid technique consisting of the ABC-ELM algorithm and wavelet decomposition is used to conduct electricity price forecasting. Here wavelet decomposition is used to decompose the input price series into various decomposed signals, and each decomposed signal is forecasted using ABC-ELM. The forecasted signals are then reconstructed to form the predicted price signal. Wavelet-ABC-ELM is then used to forecast the Finnish electricity market price, obtained from www.nordpoolgroup.com [33], the Swiss electricity market price, obtained from www.mercatoelettrico.org [34], and the Indian electricity price, obtained from www.iexindia.com [35].
This paper is organized as follows: section 2 explains Extreme learning machine, section 3 contains the detailed explanation of Proposed optimized extreme learning machine, section 4 deals with Electricity Price Forecasting, section 5 contains Results and Discussion and section 6 concludes the paper.

EXTREME LEARNING MACHINE (ELM)
The extreme learning machine was originally proposed in [23], where ELM trains the Single Hidden Layer Feedforward Neural Network (SLFN) by randomly choosing the weights between the input and hidden layers and the hidden neuron biases, and then analytically determining the output weights. The results show that training is many times faster than with conventional training algorithms. Later, ELM was used for various applications and variants of ELM were proposed by researchers. In [24] the Evolutionary Extreme Learning Machine (E-ELM) is proposed, in which the input weights are optimized by a differential evolution algorithm. In [25] the Improved Extreme Learning Machine (I-ELM) is proposed, where the hidden neuron activation function is selected based on the Fourier series expansion.
In [26] the Error Minimized Extreme Learning Machine (EM-ELM) is presented, in which random hidden nodes are added one by one or group by group. In [27] the Sparse Extreme Learning Machine (S-ELM) is proposed as an alternative method for classification problems, in which the computational complexity of the matrix inverse in the unified ELM is reduced. In [28] the authors proposed a new training algorithm for the SLFN called the Optimized Extreme Learning Machine (O-ELM), in which the number of hidden neurons, the number of inputs and the hidden layer activation function are optimized using optimization algorithms, and it is tested on several benchmark datasets. Likewise, [29] proposed Multiple Kernel Extreme Learning Machines (MK-ELM), a new algorithm to train the SLFN in which the kernels and the structure of the network are optimally selected. The neural network used in this paper is a single hidden layer feedforward neural network (SLFN), which is shown in Fig. 1 and can be mathematically represented as follows:
$$y = g\left(b_0 + \sum_{j=1}^{h} w_j v_j\right) \tag{1}$$
$$v_j = f_j\left(b_j + \sum_{i=1}^{n} w_{ij} x_i\right) \tag{2}$$
where $n$ is the number of input variables, $h$ is the number of hidden neurons, $x_i$ is the input variable for $i = 1, 2, \ldots, n$, $w_{ij}$ is the weight of the connection between the input layer and the hidden layer for $j = 1, 2, \ldots, h$, $b_j$ is the bias of the hidden layer neuron, $f_j(\cdot)$ is the activation function of the hidden layer neuron, $v_j$ is the output of the hidden layer neuron, $w_j$ is the weight of the connection between the hidden layer and the output layer, $b_0$ is the bias of the output neuron, $g(\cdot)$ is the activation function of the output layer neuron and $y$ is the final output of the network.
In the Extreme Learning Machine (ELM) for the SLFN [23], the weights between the input layer and the hidden layer are randomly chosen and the weights between the hidden layer and the output layer are computed by the Moore-Penrose generalized inverse method. This network can be utilized as a universal approximator for continuous piecewise nonlinear functions with any bounded values.
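Considering the number of samples as $N$, the output bias as zero and a linear activation function for the output neuron, Eq. (1) can be written as:
$$y = V w_0 \tag{3}$$
where $y = [y(1), \ldots, y(N)]^T$ is the vector of network outputs, $w_0 = [w_1, \ldots, w_h]^T$ is the vector of the weights of the connections between the hidden layer and the output layer and $V$ is the matrix of the hidden layer outputs:
$$V = \begin{bmatrix} v_1(1) & \cdots & v_h(1) \\ \vdots & \ddots & \vdots \\ v_1(N) & \cdots & v_h(N) \end{bmatrix} \tag{4}$$
The randomly assigned input weights and biases are collected in the matrix $W$, which is given by:
$$W = \begin{bmatrix} b_1 & \cdots & b_h \\ w_{11} & \cdots & w_{1h} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nh} \end{bmatrix} \tag{5}$$
The output weights vector $w_0$ is estimated as
$$w_0 = V^{\dagger} y_d \tag{6}$$
where $V^{\dagger}$ is the Moore-Penrose generalized inverse of the output matrix of the hidden layer and $y_d$ is the desired output, given by:
$$y_d = [y_d(1), \ldots, y_d(N)]^T \tag{7}$$
When $V$ has full column rank, the generalized inverse is
$$V^{\dagger} = (V^T V)^{-1} V^T \tag{8}$$
By substituting Eq. (8) in Eq. (6), $w_0$ is obtained as the least-squares solution:
$$w_0 = (V^T V)^{-1} V^T y_d \tag{9}$$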
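The least-squares form of the output weights can be checked with a short derivation. The following is a sketch under the assumption that $V$ has full column rank, so that $V^T V$ is invertible; the label $J$ for the squared error is introduced here only for illustration.

```latex
\begin{aligned}
J(w_0) &= \lVert V w_0 - y_d \rVert_2^2 \\
\nabla_{w_0} J &= 2 V^{T} \left( V w_0 - y_d \right) = 0 \\
V^{T} V \, w_0 &= V^{T} y_d \\
w_0 &= \left( V^{T} V \right)^{-1} V^{T} y_d = V^{\dagger} y_d
\end{aligned}
```

Because $J$ is convex in $w_0$, the stationary point is the minimizer, which is why the output weights need no iterative training.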

PROPOSED EXTREME LEARNING MACHINE (ELM)
Here, optimized Extreme Learning Machine denotes that the ELM for the SLFN is optimized with the help of the Artificial Bee Colony (ABC) algorithm. The parameters, namely the weights of the connections between the input layer and the hidden layer, the biases of the hidden layer neurons, the activation functions of the hidden layer neurons and the regularization parameter, are optimized, and the best weights for the connections between the hidden layer and the output layer are obtained by solving the following problem:
$$w_0 = \arg\min_{w} \lVert V w - y_d \rVert_2 \tag{10}$$
where $\lVert \cdot \rVert_2$ is the Euclidean norm; the minimum norm solution to this problem is the least-squares solution given earlier.
The problem here is a two stage problem; the first stage is the least-squares solution mentioned in Eq. (9) and the second is the minimization problem mentioned in Eq. (10). This two stage problem can be transformed into a single stage minimization problem with the help of Tikhonov's regularization [30] and it is given by:
$$\min_{w_0} \left( \lVert V w_0 - y_d \rVert_2^2 + \alpha \lVert w_0 \rVert_2^2 \right) \tag{11}$$
where $\alpha$ is the regularization parameter and $\alpha > 0$. The solution to the problem in Eq. (11) is explained in [30] and it is given by:
$$w_0 = (V^T V + \alpha I)^{-1} V^T y_d \tag{12}$$
where $I$ is the identity matrix. In ELM, the number of hidden nodes required is larger than in conventional algorithms such as back propagation, Support Vector Machines (SVM), Logistic Regression (LR), etc. In order to overcome this, a number of parameters of the SLFN are optimized by a suitable optimization algorithm, as mentioned earlier. The objective for the optimization of the SLFN is the minimization given below:
$$\min_{W, s, \alpha} E \tag{13}$$
where $E$ is the objective function:
$$E = E_{rmse}(y, y_d) \tag{14}$$
where $E_{rmse}(y, y_d)$ is the Root Mean Square Error (RMSE) between the real output ($y_d$) and the predicted output ($y$). To improve performance, the training and testing data sets are collected in such a way that there is no overlap between them.
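The regularized training step can be illustrated with a minimal sketch in plain Python. It assumes a sigmoid activation for every hidden neuron, a toy sine target in place of a price series and illustrative sizes (h = 8, alpha = 1e-3); none of these values come from the paper. The helper names (hidden_outputs, output_weights, solve) are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_outputs(X, W, b):
    """Return V (N x h) with V[k][j] = sigmoid(sum_i W[i][j] * X[k][i] + b[j])."""
    h = len(b)
    return [[sigmoid(sum(W[i][j] * x[i] for i in range(len(x))) + b[j])
             for j in range(h)] for x in X]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def output_weights(V, y_d, alpha):
    """Regularized least squares: solve (V^T V + alpha I) w = V^T y_d."""
    h = len(V[0])
    A = [[sum(v[a] * v[c] for v in V) + (alpha if a == c else 0.0)
          for c in range(h)] for a in range(h)]
    rhs = [sum(v[a] * t for v, t in zip(V, y_d)) for a in range(h)]
    return solve(A, rhs)


def rmse(y, y_d):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y, y_d)) / len(y_d))


rng = random.Random(0)             # fixed seed so the sketch is repeatable
n, h, alpha = 1, 8, 1e-3           # illustrative sizes, not the paper's settings
X = [[-3.0 + 6.0 * k / 49] for k in range(50)]
y_d = [math.sin(x[0]) for x in X]  # toy target standing in for a price series
W = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(n)]  # random input weights
b = [rng.uniform(-1, 1) for _ in range(h)]                      # random hidden biases

V = hidden_outputs(X, W, b)
w0 = output_weights(V, y_d, alpha)
y = [sum(wj * vj for wj, vj in zip(w0, row)) for row in V]
train_rmse = rmse(y, y_d)          # the quantity an optimizer would minimize
```

Only the output weights are solved for; the input weights and biases stay at their random values unless an outer optimizer, such as the ABC algorithm, adjusts them.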
In the optimization process, each individual is constituted as follows:
$$p_k = [w_{11}, \ldots, w_{nh}, b_1, \ldots, b_h, s_1, \ldots, s_h, \alpha] \tag{15}$$
where $k = 1, \ldots, m$, $m$ is the population size (the number of individuals in the optimization process), $s_j$ is an integer variable that defines the activation function $f_j$ of each hidden layer neuron separately and $s_j \in \{0, 1, 2, 3, 4\}$. The activation function $f_j$ is given as follows:
$$f_j(a) = \begin{cases} 0, & s_j = 0 \\ \text{sigmoid function of } a, & s_j = 1 \\ \text{tangent function of } a, & s_j = 2 \\ \text{hyperbolic function of } a, & s_j = 3 \\ a \ \text{(linear function)}, & s_j = 4 \end{cases} \tag{16}$$
An adjustable hidden layer is possible through the parameter $s_j$: when $s_j = 0$, the output of that hidden layer neuron is zero and the neuron is not considered, and the activation function of each remaining neuron is selected individually according to the value of $s_j$, as given above. The different activation functions used are the sigmoid function for $s_j = 1$, the tangent function for $s_j = 2$, the hyperbolic function for $s_j = 3$ and the linear function for $s_j = 4$.
In this optimization problem the decision variables are a mix of real and integer values, so in order to handle them with a single optimization algorithm, all variables are mapped into the interval (0, 1) as real variables. All variables are converted back to their original values before the objective function value is computed for each individual. If the l-th variable (l = 1, 2, …, v) of individual k is real-valued, its original value is recovered using Eq. (17); if it is integer-valued, it is recovered using Eq. (18), where floor(.) is a MATLAB function which rounds a value to the greatest integer less than or equal to it.
The variables denoting the weights w ij of the connections between the input and hidden layers and the hidden layer biases b j are converted using Eq. (17), with a lower bound of -1 and an upper bound of 1. The variables s j , j = 1, …, h, are integers and are converted using Eq. (18), with a lower bound of 0 and an upper bound of 4. The variable denoting the regularization parameter α is also converted using Eq. (17), with a lower bound of 0 and an upper bound of 100. In this work the ABC optimization algorithm is used.
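The conversion from the (0, 1) interval back to the original variables can be sketched as follows. The linear map for real variables and the floor-based map for integers are assumed forms, not the paper's exact equations; the bounds are the ones stated above, and the function names are hypothetical.

```python
import math


def to_real(p, lower, upper):
    """Map p in (0, 1) linearly onto the real interval [lower, upper]."""
    return lower + (upper - lower) * p


def to_integer(p, lower, upper):
    """Map p in (0, 1) onto the integers lower..upper (assumed form)."""
    return min(upper, math.floor(lower + (upper - lower + 1) * p))


def decode(individual, n, h):
    """Split one individual into input weights, biases, selectors and alpha."""
    assert len(individual) == n * h + 2 * h + 1
    genes = iter(individual)
    W = [[to_real(next(genes), -1.0, 1.0) for _ in range(h)] for _ in range(n)]
    b = [to_real(next(genes), -1.0, 1.0) for _ in range(h)]
    s = [to_integer(next(genes), 0, 4) for _ in range(h)]
    alpha = to_real(next(genes), 0.0, 100.0)
    return W, b, s, alpha


# Two inputs and two hidden neurons give 2*2 + 2 + 2 + 1 = 9 variables.
W, b, s, alpha = decode([0.5, 0.75, 0.25, 0.125, 0.25, 0.5, 0.1, 0.9, 0.25], n=2, h=2)
```

Keeping every variable in (0, 1) lets one optimizer treat weights, activation selectors and the regularization parameter uniformly; decoding happens only when the objective function is evaluated.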

Artificial Bee Colony Algorithm
The artificial bee colony algorithm is a population-based optimization algorithm inspired by the foraging behaviour of honey bees, originally proposed by Dervis Karaboga in 2005 [31]. It consists of three groups of bees, namely employed bees, onlooker bees and scout bees. The colony is divided equally into employed bees and onlooker bees; in other words, the number of employed bees is equal to the number of onlooker bees. Employed bees search for food sources in the search area around the hive and the nectar amount is calculated for each employed bee. Employed bees pass the location and nectar amount of their food sources to the onlooker bees. Onlooker bees search around these positions for new food sources with a higher nectar amount. If, after a number of iterations, onlooker bees cannot improve their nectar amount, they abandon the food source, become scout bees and search for a new food source. This process continues until a food source with the highest amount of nectar is found.

Implementation of ABC to Optimize ELM (ABC-ELM)
In this optimization problem each food source denotes a possible solution. A population P equal to the colony size is randomly generated in the range of 0 to 1 using the equation given below:
$$P_{kl} = \mathrm{rand}(0, 1) \tag{20}$$
where $P_{kl}$ is the l-th variable of the k-th individual, $l = 1, 2, \ldots, q$ and $k = 1, 2, \ldots, m$; $m$ is the population size and $q$ is the number of variables to be tuned, which is given by:
$$q = n h + 2 h + 1$$
Once the initial population is created as mentioned in Eq. (20), the objective function mentioned in Eq. (14) is evaluated for each individual. Each employed bee then creates a new food source in the neighbourhood of its current food source using the following equation:
$$P'_{kl} = P_{kl} + \phi_{kl} \left( P_{kl} - P_{jl} \right) \tag{21}$$
where $\phi_{kl}$ is a random number in $[-1, 1]$ and $j \neq k$ is a randomly chosen food source. Then each onlooker bee evaluates the nectar amount reported by all employed bees and selects a food source based on the probability $P_k$ given by:
$$P_k = \frac{fit_k}{\sum_{i=1}^{m} fit_i}$$
where $fit_k$ is the nectar amount (fitness) of food source k. The onlooker bees update their food sources iteratively. If an onlooker cannot update its food source after a certain number of iterations, that onlooker bee becomes a scout bee and searches for a new food source using Eq. (20). Again, the employed bees update the food sources according to Eq. (21), and this goes on until a food source with the highest nectar value is found (the best result for the objective function). The detailed flowchart of the proposed ABC-ELM is shown in Fig. 2.
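The employed, onlooker and scout phases can be sketched as a basic ABC loop over the unit hypercube. This is a generic illustration, not the paper's implementation: the fitness form 1 / (1 + cost), the colony settings and the toy sphere cost are assumptions, and in ABC-ELM the cost would instead be the RMSE of the decoded network.

```python
import random


def abc_minimize(cost, q, m=20, cycles=100, limit=10, seed=0):
    """Minimise cost over [0, 1]^q with a basic artificial bee colony."""
    rng = random.Random(seed)

    def new_source():
        return [rng.random() for _ in range(q)]

    foods = [new_source() for _ in range(m)]
    costs = [cost(f) for f in foods]
    trials = [0] * m
    best_cost = min(costs)
    best = foods[costs.index(best_cost)][:]
    initial_best = best_cost

    def try_neighbour(k):
        """Perturb one variable of source k toward or away from a random peer."""
        nonlocal best, best_cost
        j = rng.choice([i for i in range(m) if i != k])
        l = rng.randrange(q)
        cand = foods[k][:]
        phi = rng.uniform(-1.0, 1.0)
        cand[l] = min(1.0, max(0.0, cand[l] + phi * (cand[l] - foods[j][l])))
        c = cost(cand)
        if c < costs[k]:                 # greedy selection
            foods[k], costs[k], trials[k] = cand, c, 0
            if c < best_cost:
                best, best_cost = cand[:], c
        else:
            trials[k] += 1

    for _ in range(cycles):
        for k in range(m):               # employed bee phase
            try_neighbour(k)
        fitness = [1.0 / (1.0 + c) for c in costs]  # assumes cost >= 0
        for k in rng.choices(range(m), weights=fitness, k=m):  # onlooker phase
            try_neighbour(k)
        for k in range(m):               # scout phase: abandon stale sources
            if trials[k] > limit:
                foods[k] = new_source()
                costs[k] = cost(foods[k])
                trials[k] = 0
                if costs[k] < best_cost:
                    best, best_cost = foods[k][:], costs[k]
    return best, best_cost, initial_best


def sphere(x):
    """Toy cost with its minimum at the centre of the unit cube."""
    return sum((v - 0.5) ** 2 for v in x)


best, best_cost, initial_best = abc_minimize(sphere, q=3)
```

The best solution is tracked separately from the food sources, so a scout replacing an abandoned source never discards the best result found so far.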

ELECTRICITY PRICE FORECASTING
Here, week-ahead electricity price forecasting is carried out for three different sets of electricity price data, namely the Finnish, Swiss and Indian electricity market prices. Since electricity price data vary considerably over time, wavelet decomposition is used to preprocess the data to reduce the complexity of training. The detailed flow chart of the proposed network for electricity price forecasting is shown in Fig. 3. The three data sets are taken as follows: the Finnish electricity price from January to March 2017, the Swiss electricity price from May to July 2017 and the Indian electricity price from September to November 2017, shown in Fig. 4, Fig. 5 and Fig. 6 respectively. These three data sets are used for training the proposed network.

Input Data Formats
Two different input data formats are taken and used in the proposed method and in other methods. The two different input formats are: 1. Conventional Data Format (CDF) and 2. Modified Data Format (MDF).

Conventional Data Format
Input data are taken in such a way that the electricity price for the first six hours (H1 to H6) is taken as a first input set and the electricity price for the seventh hour (H7) is taken as an output of the first set input. The electricity price of the second six hours (H2 to H7) is taken as a second input set and the electricity price of the eighth hour (H8) is taken as an output of the second set input. This procedure is continued until the last hourly price of the three-month data becomes the output. Here only one hourly price is taken as an output so as to conduct one step forecasting. Similarly, the electricity price of the first week of the fourth month is taken in the same format for forecasting purposes. This is detailed in Fig. 7. The electricity price of Finland during the period of January to March, 2017 is taken for training and first week of April, 2017 is taken for forecasting. The Swiss electricity price during the period of May to July, 2017 is taken for training and the second week of August, 2017 is taken for forecasting. The Indian electricity price during the period from September to November 2017 is taken for training and the electricity price in the fourth week of December, 2017 is taken for forecasting. These training and forecasting data are used in a similar format to that shown in Fig. 7.
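The sliding-window construction of the conventional data format can be sketched as below. The function name and the placeholder series (the integers 1 to 10 standing in for hours H1 to H10) are illustrative assumptions.

```python
def conventional_format(prices, window=6):
    """Build (inputs, targets): each target is the hour after `window` past hours."""
    inputs, targets = [], []
    for t in range(window, len(prices)):
        inputs.append(prices[t - window:t])  # e.g. H1..H6 for the first sample
        targets.append(prices[t])            # e.g. H7 for the first sample
    return inputs, targets


hourly = list(range(1, 11))  # placeholder values standing in for H1..H10
inputs, targets = conventional_format(hourly)
```

With ten hourly values and a six-hour window, four samples result; the first pairs H1 to H6 with H7 and the second pairs H2 to H7 with H8.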

Modified Data Format
To improve the efficiency of the prediction from the input perspective, a slightly different pattern is followed. Instead of simply taking the previous six hours' price data as inputs, the previous four hours are taken as inputs and, in addition, the prices at the output hour on the previous day, the day before that, the previous week and the week before that are also taken as inputs. If hp is the specific hourly price taken as the output, then the hourly prices hp-24, hp-48, hp-168 and hp-336 are taken as inputs, in addition to the previous four hourly prices (hp-1, hp-2, hp-3 and hp-4).
The input and output arrangement of the modified data format for a single input and output sample is shown in Fig. 8. Similarly, all the three-month data are arranged so as to form a complete input and output data set. Likewise, the three different electricity price data sets, namely the Finnish, Swiss and Indian electricity prices, are arranged accordingly for the purposes of training as well as forecasting.
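The modified data format, with the previous four hours plus the same hour one day, two days, one week and two weeks earlier, can be sketched as below. The function name and the placeholder series, in which each value equals its own index, are illustrative assumptions.

```python
LAGS = (1, 2, 3, 4, 24, 48, 168, 336)  # hours before the output hour hp


def modified_format(prices, lags=LAGS):
    """Build (inputs, targets) using lagged prices hp-1..hp-4, hp-24, hp-48, hp-168, hp-336."""
    start = max(lags)  # the first hour for which every lag is available
    inputs = [[prices[t - lag] for lag in lags] for t in range(start, len(prices))]
    targets = list(prices[start:])
    return inputs, targets


series = list(range(400))  # placeholder: the value at hour t is t itself
inputs, targets = modified_format(series)
```

The first 336 hours of a series cannot be used as outputs because their two-week lag is not available, which slightly shortens the training set compared with the conventional format.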

Wavelet Decomposition
Wavelet Decomposition (WD) is used to preprocess the electricity price data used for training the proposed network. Here the MATLAB wavelet toolbox is used for the decomposition, with the Daubechies Db4 wavelet at level 6. The Daubechies Db4 wavelet decomposition is explained in detail in [32]. Fig. 9 shows the detail and approximation waveforms of the Finnish electricity price data after decomposition using Db4 at level 6. The figure consists of six details, D1 to D6, and one approximation, A6. Similarly, Fig. 10 and Fig. 11 show the detail and approximation waveforms of the Swiss and Indian electricity price data.
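A level-6 Db4 decomposition into six details and one approximation can be sketched in plain Python as follows. This is not the MATLAB toolbox routine: it assumes periodic boundary handling and a signal length divisible by 64, so its component values may differ from the toolbox's near the edges. The synthetic price series and the function names are illustrative.

```python
import math

# Daubechies Db4 low-pass filter (8 taps, coefficients sum to sqrt(2)).
H = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
     -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
     0.032883011666982945, -0.010597401784997278]
# Matching high-pass filter from the quadrature-mirror relation.
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]


def analyse(x):
    """One periodic DWT step: signal -> (approximation, detail), each half length."""
    N = len(x)
    a = [sum(H[n] * x[(2 * k + n) % N] for n in range(len(H))) for k in range(N // 2)]
    d = [sum(G[n] * x[(2 * k + n) % N] for n in range(len(G))) for k in range(N // 2)]
    return a, d


def synthesise(a, d):
    """Inverse of analyse (the transpose of the orthogonal transform)."""
    N = 2 * len(a)
    x = [0.0] * N
    for k in range(len(a)):
        for n in range(len(H)):
            x[(2 * k + n) % N] += H[n] * a[k] + G[n] * d[k]
    return x


def components(x, level=6):
    """Return [D1, ..., D_level, A_level], each reconstructed to the length of x."""
    details, a = [], list(x)
    for _ in range(level):
        a, d = analyse(a)
        details.append(d)
    parts = []
    for i, d in enumerate(details):
        sig = synthesise([0.0] * len(d), d)      # keep only this detail band
        for _ in range(i):
            sig = synthesise(sig, [0.0] * len(sig))
        parts.append(sig)
    for _ in range(level):                       # keep only the approximation
        a = synthesise(a, [0.0] * len(a))
    parts.append(a)
    return parts


# Synthetic hourly series: a daily cycle plus a slow trend (not market data).
prices = [40 + 10 * math.sin(2 * math.pi * t / 24) + 0.05 * t for t in range(128)]
parts = components(prices)
rebuilt = [sum(p[t] for p in parts) for t in range(len(prices))]
```

Because the transform is linear, the seven components sum back to the original series. The same additivity is what allows separate forecasts of D1 to D6 and A6 to be aggregated into a final price forecast.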

RESULTS AND DISCUSSION
Three different electricity price data sets are forecasted in the following manner: 1. the electricity price in Finland during the first week of April, 2017, 2. the electricity price in Switzerland during the second week of August, 2017, 3. the electricity price in India during the fourth week of December, 2017, using two different input formats. In the error metrics used for evaluation, F j is the actual value, f j is the forecasted value and n is the total number of data points used for the calculation.
Technical Gazette 28, 2(2021), 556-567
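Typical error metrics built from actual values F j and forecasted values f j can be sketched as below. The choice of MAE, RMSE and MAPE is an assumption for illustration; the exact metrics in set 1 and set 2 are not recoverable from the text here. The two-point data set is made up.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(F - f) for F, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((F - f) ** 2 for F, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((F - f) / F) for F, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0]    # made-up actual prices F_j
forecast = [110.0, 190.0]  # made-up forecasted prices f_j
```

MAPE is scale-free, which makes it convenient when comparing markets with different price levels, but it is undefined whenever an actual price is zero.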

Results of Conventional Data Format
Each comparison plot consists of six different electricity price curves, namely 1. the actual value of the electricity price being forecasted, 2. the forecasted result using the Back Propagation (BP) neural network, 3. the forecasted result using the Extreme Learning Machine, 4. the forecasted result using PSO-ELM, 5. the forecasted result using ABC-ELM and 6. the forecasted result using the proposed method. The plot in Fig. 12 compares the forecasted results for Finland's electricity market for the first week of April, 2017 obtained with the different methods, such as BP, ELM, PSO-ELM and ABC-ELM, using the conventional data format for inputs. From the figure it can be observed that the electricity price forecasted by the proposed method is very close to the actual electricity price, and that the proposed method performs better than the other methods. Similarly, the plots in Fig. 13 and Fig. 14 show the forecasted hourly electricity price in Switzerland during the second week of August, 2017 and the forecasted hourly electricity price in India during the fourth week of December, 2017, respectively, using the conventional data format for inputs. From the plots it can be inferred that the proposed method gives better results than the other methods with which it is compared; furthermore, it can be seen that the proposed method works efficiently during peak variations and sudden changes.
Tab. 1 shows the values of the set 1 metrics of the electricity price forecast, using different methods, for Finland's electricity price during the first week of April, 2017. The values clearly indicate that the proposed method performed considerably better than the other methods.

Results of Modified Data Format
The modified data format specified earlier is employed and the proposed method is used for forecasting the electricity price. The same approach as in the previous section is followed, the only difference being that this section uses the modified data format instead of the conventional data format for inputs. Similarly, Fig. 16 and Fig. 17 show the forecasted electricity price in Switzerland during the second week of August, 2017 and the Indian electricity price during the fourth week of December, 2017. From these two graphs it can be inferred that the curve of the proposed method is very close to the actual electricity price curve, which indicates that the method performs well.
Tab. 7 and Tab. 8 show the set 1 metric values for the forecasted electricity price of Finland during the first week of April, 2017 and of Switzerland during the second week of August, 2017 respectively, using the modified data format. The metric values obtained with the proposed method are considerably lower than those of the other methods. Likewise, Tab. 9 shows the set 1 metrics of the different methods for the Indian electricity price during the fourth week of December 2017, using the modified data format. From the table it can be inferred that the proposed method performs well when compared with the other methods. Tab. 10 contains the set 2 metric values for the Finnish electricity price during the first week of April, 2017 using the modified data format. The values show that the proposed method outperforms the other methods. Similarly, Tab. 11 contains the set 2 metric values for the Swiss electricity price during the second week of August, 2017 using the modified data format. From the table it can be inferred that the proposed method gives good results when compared to the other methods. Likewise, Tab. 12 contains the set 2 metric values for the Indian electricity price during the fourth week of December 2017 using the modified data format, and the values show that the proposed method performs better than the other methods.
To further demonstrate the superiority of the proposed method, it is compared with other methods reported in the literature [16,36]. For this purpose, two real-time markets, namely the European Energy Exchange (WPEXSPOT) and the Ontario Electricity Market (OEM), are considered. To ensure uniformity, the code for these methods is adopted from the repositories suggested by their authors. Accordingly, this paper adopts the parameter settings of the method in [36]: the number of hidden layers is 1, the number of hidden neurons is 100 and the population size is 600. For [16], the adopted values are a WD decomposition level of 3, a population size of 200, 1 hidden layer and 30 hidden neurons. Tab. 13 compares the proposed method with [16] on the forecasted OEM price. The results suggest that the proposed method gives more reliable and stable results than the other methods. Similarly, Tab. 14 compares the proposed method with [36] on the forecasted ANEM price. The metric values in the table clearly indicate that the proposed method outperforms the other methods.

CONCLUSION
Integrating two or more techniques by appropriately identifying their meritorious features can yield promising results. This has been re-established in this paper by suitably integrating three powerful techniques, namely ABC, ELM and wavelet decomposition. The proposed hybridization is a forecasting framework that integrates ABC with ELM such that the ELM prediction accuracy is maximized by suitably optimizing the ELM parameters using ABC. Several experiments have been conducted using the proposed method to forecast the Finnish, Swiss and Indian electricity market prices, with diversified studies based on various climatic seasons. The statistical analysis of the proposed method, along with other well-established forecasting methods, shows the superior accuracy provided by the proposed ABC-ELM with the wavelet decomposition technique, which reduces forecasting errors considerably. Thus, the ABC optimized ELM produced more reliable and precise forecasts. The proposed input data format further reduces the error when compared with existing input formats and establishes the capability and robustness of the proposed hybrid method. The use of the wavelet decomposition technique reduces the errors during peak prices; however, this method is a time-consuming process. Further studies on de-noising data and correcting outliers, which are widespread in real-time electricity price data, shall also be considered in future research.