Prediction of Energy Consumption in Buildings Using Support Vector Machine

The energy consumption of buildings directly affects the building users' budget and their satisfaction with the investment in the property. In turn, building energy consumption has social implications for the building users. Additionally, building energy consumption is connected with the building's influence on the environment through CO2 emissions. Thus, having a model for energy usage prediction is of crucial importance. Data for sixty real, built buildings were collected. Using a support vector machine, a model was developed for the prediction of energy consumption. The mean absolute percentage error of the model is 2,44% and the coefficient of determination R2, which expresses the global fit of the model, is 94,72%. The model is useful for all participants in the design of buildings, particularly in the early phases. It can serve as a decision support model during the selection of the optimal building design.


INTRODUCTION
It is commonly accepted that buildings contribute substantially to the yearly per-capita emission of CO2 and can be regarded as large consumers of energy. They account for around 42% of the total annual world energy consumption [1] (about 20-40%, with an upward trend, in developed countries [2]).
This situation is supported by the records of the Macedonian State Statistical Office: the statistical data indicate that the energy demand of buildings increases each year. The energy demand of households was 36,5% of the gross domestic energy demand in 2012. The largest share of the energy demand in this sector, 71%, goes to cooling and heating, 17% to heating of domestic water, etc. According to the statistical report of the International Energy Agency [3], the final consumption of electricity in the residential sector represented 47,4% of the total final consumption of electricity in the Republic of North Macedonia in 2015. Regarding greenhouse gas emissions, 3.48 tons of CO2 per capita were calculated for 2015, according to the same source. Consequently, there is an imperative demand for low-cost, energy-efficient housing in the Republic of N. Macedonia.
In line with the above, one of the EU goals for sustainable development is efficient energy consumption [4] and its reduction, particularly in non-productive sectors such as the residential sector [1,5]. In that regard, new and existing buildings are identified as very attractive targets because of their energy usage and CO2 emissions [5][6][7][8]. Accordingly, the Strategy for energy efficiency in the R. of N. Macedonia [9] anticipates that energy savings should be made in all sectors, particularly in the residential/housing sector (including existing buildings as big consumers of energy).
As stated in [1], the first step in decreasing a building's energy consumption is effective design of the building. Thus, the energy consumption of a building and the associated costs, as well as the index of specific consumption of a building [10], are among the elements of crucial importance in the selection of the optimal building design.
Various measures can be used to reduce the energy consumption of buildings [6]. Some of them are: installation of ground source heat pump systems [11], combined systems for heating, cooling and power [12], and use of renewable energy sources such as solar, wind, hydro and biomass energy. Also, the construction of self-sustainable buildings is a solution for meeting the growing energy demand [13]. Furthermore, previous experience with energy consumption for a particular building design can contribute to design improvement [14]. For that reason, a number of studies investigate building design and building energy usage [15,16].
Improvement of the energy efficiency of existing and new buildings can generally be classified into two levels. The first level is improvement of the building envelope characteristics: quality insulation, windows and doors with improved insulation and reflective characteristics, use of naturally based and recycled building materials, etc. The second level is improvement of the HVAC systems: substitution of the energy resources, improvement of the systems, heat pumps, solar systems for domestic hot water, installation of photovoltaics [17], use of wind energy whenever possible, etc. However, even the best solutions for heating, ventilation and air conditioning cannot be effective if the thermal characteristics of the building are poor.
Aim of the paper: Buildings, as one of the biggest energy consumers, play an important role in environmental protection and CO2 emissions, and have a social and economic impact on building users. Therefore, estimation of the energy consumption of a building during its exploitation phase is a responsible and complex process of a multidisciplinary nature.
Energy consumption is influenced by numerous factors. Practice shows that the geometry of the building, the ratio between the building envelope area and the total volume of the building, as well as the heated area and volume, strongly affect the final energy consumption of the building. Therefore, having a model for energy prediction is important for all design participants. This paper emphasizes the necessity of a model for building energy prediction and its crucial role in the process of optimal design selection.
Directions for future investigations in the field of energy and buildings are suggested in [18]. Several sets of directions are noted, such as: contextual factors, links between practitioners and policy makers, energy strategies (national/regional), designing/using tools and models, etc. In that regard, the aim of this paper is to develop a model for prediction of building energy consumption using an intelligent forecasting technique, the support vector machine. Furthermore, the main goal of this model is to observe the consequences of changes in physical parameters and building envelope components, an important issue in the design process of new or existing buildings.
Methods used for prediction of building energy consumption can be generally classified into statistical, engineering and artificial intelligence methods [20]. Prediction of energy use in buildings, as an approach for energy conservation and reduction of greenhouse gas emissions, has received a remarkable amount of attention from researchers [21]. Previous investigations for energy consumption of buildings are based on a variety of techniques and methods, such as: neural networks, linear regression, multi-linear regression, etc.
The demand for residential electricity in the countries that belong to the Gulf Cooperation Council is investigated in [22]. The results show that improving the efficiency of appliances and increasing consumers' awareness of energy use are among the measures for cutting electricity consumption in this very hot region.
For existing houses in the UK, Jones et al. developed a model for prediction of CO2 emissions and estimation of the energy performance of houses [7]. The impact of energy conservation measures was also assessed. Their assessment of the energy performance of the buildings was based on the SAP method, approved by the UK government. The study's main finding is that CO2 emissions can be reduced by around 10-30% by using existing funding opportunities for installing 'shallow' elemental measures.
In the debate on comfort and its future in relation to the indoor environment, environmental sustainability and energy consumption, it has been stated that the quantity of energy used for cooling and heating has a high rating/value ratio [23]. In that relation, the investigation in [5] stressed that there was a discrepancy between the measured and calculated energy performance ratings for heating homes in Germany. The energy consumed by the occupants was around 30% less than the calculated energy rating.
Using the EnergyPlus software, the authors in [11] presented a feasibility analysis of the installation of ground source heat pump systems. The results of this analysis indicate that, for Cyprus, the installation of a ground source heat pump system is useful both for a multi-family building and for a single-family building. In both types of buildings, the system achieves a substantial reduction in the consumption of primary energy. Regarding carbon emissions, it was concluded that in a single-family house the emissions of the conventional system were lower than the emissions of the geothermal system.
A multi-objective decision model for energy savings in buildings is presented in [19]. The model allows examination of alternative measures, which are evaluated using a set of criteria such as the initial investment cost, the building's annual consumption of primary energy, carbon dioxide emissions, etc. Additionally, the energy savings achieved in buildings by installing an energy management system are investigated in [24], while [1] proposes reducing building construction costs by using local materials with low embodied energy. Furthermore, as stated in [1], efficient building design should be integrated with the use of renewable energy systems in the buildings' design and construction.
The lifestyle and energy usage habits of residents are affected by urban construction and general community development. In that relation, the analytic hierarchy process (AHP), the Delphi method and fuzzy logic are used in [25] to develop a model that is a symbiosis of the model for construction of a sustainable community and the concept of low-carbon emissions.
Research on energy consumption in buildings shows that traditional statistical and numerical methods (for example, linear regression) are widely used. However, in many cases these methods do not give accurate results, because the assessment of energy consumption is nonlinear and is represented by many nonlinear equations with a great number of variables. Furthermore, the influence of specific parameters on the energy used for heating and cooling in buildings is difficult to analyse experimentally, since it is almost impossible to vary the selected parameters while keeping the others constant. Influential parameters can be climate variables, building structure, thermal properties of the building materials and occupants' activities [20]. Thus, the new methods for assessment of energy consumption are mainly intelligent methods for prognostic modelling, based on the principles of soft computing (such as artificial neural networks (ANNs), SVMs and fuzzy logic systems). Their advantage is evident in finding solutions to nonlinear engineering problems that lack complete data and cannot be solved using traditional modelling methods. As stated in [26], genetic algorithms and ANNs can be used for modelling and performance prediction of buildings' energy systems. They can also be used as a design tool in buildings' energy applications. Later analyses of energy consumption and energy efficiency indicated that predictions made with ANNs are more accurate than empirical relations [27][28][29]. Therefore, ANNs are used for investigation and prediction of energy consumption on different levels. In [30], the authors built an ANN model for predicting the cost premium of LEED-certified green buildings based on LEED categories. Multiple regression analysis was used as a benchmarking model to validate it.
NNs were also used in [31], where two approximated thermal comfort models for HVAC systems were considered: an artificial neural network and polynomial expansions. Both achieved similar advantages in modelling thermal comfort for HVAC systems. Other previous investigations are also worthy of attention. The suitability of Elman neural networks and feed-forward artificial intelligence techniques for prediction of energy production is explored in [32]. Ismail et al. [33] used ANNs to analyse energy distribution systems. Other authors used ANNs for prediction of buildings' energy consumption; indoor air temperature prediction; winter/summer energy requirements [29]; short-term hourly consumption in buildings; modelling, forecasting and controlling the whole energy system of buildings [31], etc.

METHOD
A survey was conducted to collect data relevant for prediction of building energy consumption Q (kWh/m2/year). The survey covered 60 residential buildings that were built or reconstructed in the R. of N. Macedonia during the last six years. Of these, 34 were individual houses: 10 with a heated area of not more than 100 m2, 13 with 100 to 200 m2, and 11 with 200 to 300 m2. The remaining 26 were buildings for collective living, 11 of which had a heated area of up to 1000 m2 and 15 of which had more than 1000 m2.
For each building, data were collected during in-situ visits on the thermal conductivities of the walls, roofs, floors and windows, as well as their corresponding geometries and areas, which served as input parameters for the prognostic model. Particular attention was given to the variables that depend on the building geometry, the construction materials used during construction, etc., such as:
- Af: area of the floor (m2),
- Ar: area of the roof (m2),
- Aw: area of the window (m2),
- Ae: area of the total building envelope,
- Uf: thermal transmittance of the floor (W/m2K),
- Uw: thermal transmittance of the window (W/m2K),
- f0: shape factor,
- Ah: total heated area,
- Htr: transmission losses,
- Hh: heating hours per day.
These 10 variables were selected as the most representative predictors for model building, out of a total of 19 available variables.
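As a rough illustration of how two of the geometry-dependent predictors could be derived from envelope data, the sketch below computes a shape factor and a transmission loss value. It assumes the common simplified definitions f0 = Ae / V and Htr = sum of U_i times A_i (thermal bridges ignored); the paper does not state the exact formulas used in the survey, and the class name, function names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnvelopeElement:
    name: str
    area_m2: float   # element area A (m2)
    u_value: float   # thermal transmittance U (W/m2K)


def shape_factor(envelope_area_m2: float, heated_volume_m3: float) -> float:
    """Shape factor f0 = Ae / V in 1/m (assumed definition)."""
    if heated_volume_m3 <= 0:
        raise ValueError("heated volume must be positive")
    return envelope_area_m2 / heated_volume_m3


def transmission_loss(elements) -> float:
    """Simplified Htr = sum(U_i * A_i) in W/K; thermal bridges are ignored."""
    return sum(e.area_m2 * e.u_value for e in elements)


# Hypothetical single-family house (illustrative numbers, not survey data).
elements = [
    EnvelopeElement("floor", 100.0, 0.40),
    EnvelopeElement("roof", 100.0, 0.25),
    EnvelopeElement("walls", 180.0, 0.35),
    EnvelopeElement("windows", 20.0, 1.50),
]
envelope_area = sum(e.area_m2 for e in elements)   # Ae
f0 = shape_factor(envelope_area, heated_volume_m3=300.0)
htr = transmission_loss(elements)
```

A compact building has a lower f0, so for the same U-values it exposes less envelope area per unit of heated volume.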

Support Vector Machine
Learning from experimental data (observations, examples, measurements and patterns) belongs to the field of soft computing. Mathematical models such as support vector machines (SVMs) and neural networks have been used for this purpose over the last several decades, solving this basic problem in engineering and modern science [34]. Fuzzy logic also belongs to soft computing; it embeds available structured human knowledge into mathematical algorithms through fuzzy logic models. Soft computing techniques are of great interest for modelling unknown or partially known systems or processes that are highly nonlinear; they are also universal approximators of any multivariate function, which makes them important tools for many practical contemporary problems [34].

Support Vector Machine Algorithm
The development of SVMs started from a theoretically sound approach based on statistical learning theory. They can be applied very successfully to both classification and regression tasks.
After successful training, the SVM finds the dependency f(x, w) between the input x and the output y. Using the available data, it finds an approximating function in a regression task, or a function that separates the data in a classification task. This function approximates the true dependency between input x and output y in regression, or the separation function in classification. It should also minimize a risk function R(w), also called the error function or loss function.
Solving the general regression problem with an SVM starts with providing the machine with l training data from the training data set D (Eq. 1) [34]:

$$D = \{(\mathbf{x}_i, y_i) \in \mathbb{R}^n \times \mathbb{R},\ i = 1, \dots, l\} \tag{1}$$

where x_i are n-dimensional vectors, x_i ∈ R^n, and the system responses y_i ∈ R are continuous values.
The approximating function f(x, w) is a nonlinear function of the weights w, which are the subject of learning. A significant contribution to the theoretical and practical development of support vector machines was delivered in [35]. The implementation of the SVM algorithm in the DTREG software is based on that project.
There are several kinds of error (loss) functions that estimate the approximation error. The most popular is Vapnik's linear loss function with ε-insensitivity, defined as:

$$|y - f(\mathbf{x}, \mathbf{w})|_{\varepsilon} = \begin{cases} 0, & \text{if } |y - f(\mathbf{x}, \mathbf{w})| \le \varepsilon \\ |y - f(\mathbf{x}, \mathbf{w})| - \varepsilon, & \text{otherwise} \end{cases} \tag{2}$$

This error defines a tube with radius ε, as presented in Fig. 1.

Figure 1 ε-tube [35]

For a predicted value inside the tube, the error is 0. If the predicted value is outside the tube, the error is equal to ξ or ξ* if the predicted value is above or below the ε-tube, respectively.
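The ε-insensitive loss described above can be written in a few lines. This is a minimal sketch; the function name and the numeric values (including the tube radius of 0.5) are chosen only for illustration.

```python
def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Vapnik's linear loss: zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical values with an illustrative tube radius eps = 0.5.
inside = eps_insensitive_loss(10.0, 10.3, eps=0.5)    # deviation 0.3 is tolerated
outside = eps_insensitive_loss(10.0, 11.25, eps=0.5)  # deviation 1.25 exceeds eps
```

Only the part of the deviation that exceeds ε is penalized, which is why points inside the tube do not influence the fitted function.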
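For simplicity, the necessary concepts of SV regression are first explained for linear regression, where the approximation function is linear:

$$f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \mathbf{x} + b \tag{3}$$

For performing SVM regression, a new empirical risk is defined:

$$R_{emp}^{\varepsilon}(\mathbf{w}, b) = \frac{1}{l} \sum_{i=1}^{l} |y_i - \mathbf{w}^T \mathbf{x}_i - b|_{\varepsilon} \tag{4}$$

The task of solving linear regression using SVM is in fact obtaining the function f(x, w) defined by Eq. (3) such that it approximates all pairs (x_i, y_i) with precision ε. This situation is presented in Fig. 2. The width of the tube depends on ‖w‖, and in order to obtain minimal deviation of the pairs (x_i, y_i), ‖w‖ (the norm of w) should be minimized together with the empirical risk:

$$R = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^{l} |y_i - f(\mathbf{x}_i, \mathbf{w})|_{\varepsilon} \tag{5}$$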
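Considering that:

$$|y - f(\mathbf{x}, \mathbf{w})| - \varepsilon = \xi \ \text{(data above the tube)}, \qquad |y - f(\mathbf{x}, \mathbf{w})| - \varepsilon = \xi^{*} \ \text{(data below the tube)} \tag{6}$$

Eq. (5) becomes:

$$R_{\mathbf{w}, \xi, \xi^{*}} = \frac{1}{2} \|\mathbf{w}\|^2 + C \left( \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \xi_i^{*} \right) \tag{7}$$

After the optimization process, the optimal vector w0 and the optimal b0 are found, and the obtained optimal regression approximation function is:

$$f(\mathbf{x}) = \mathbf{w}_0^T \mathbf{x} + b_0 \tag{8}$$

Most contemporary problems are not linear in nature. The basic idea for obtaining a nonlinear regression function with the SVM algorithm is to map the vectors x from the input space to multi-dimensional vectors z in a new multi-dimensional feature space F using a mapping function Ф, and to solve the linear regression problem described above in the new space. The mapping is handled through a kernel function, K(x_i, x_j) = Ф(x_i)^T Ф(x_j), which returns the inner product of the mapped vectors.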
After finding the optimal linear approximation function in the feature space, the corresponding approximation function in the input space is easy to find. The architecture of the SVM and the process of mapping are shown in Fig. 3 [34].
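For reference, in the standard ε-SVR formulation (a general textbook result, not restated from this paper), the approximation function in the input space takes the form given here. The Lagrange multipliers α_i and α_i* of the dual problem and the kernel symbol K are introduced for illustration only.

```latex
f(\mathbf{x}) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*})\, K(\mathbf{x}_i, \mathbf{x}) + b_0,
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \Phi(\mathbf{x}_i)^{T} \Phi(\mathbf{x})
```

Only training points with a nonzero difference α_i - α_i* (the support vectors) contribute to the sum, so the mapping Φ never has to be evaluated explicitly.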
The most important parameters in the process of training and modelling with SVM are ε and C (defined in Eq. (2), Eq. (5), Eq. (6) and Eq. (7)), which should be chosen by the user depending on the specific data [36,37].

RESULTS AND DISCUSSION
The SVM model of the predictive modelling software DTREG was used for prediction of the target variable, building energy consumption [36,37]. The most accurate model was built with the 11 most representative variables, out of a total of 19 variables available for building the model. The building energy consumption Q was used as the target variable and the remaining 10 (areas and thermal transmittances of the building envelope elements, heated area and transmission losses of the building) were used as predictors (listed in the Method section).
Before being submitted to the DTREG software, the variables were normalized to the interval [1, 2]. DTREG offers two methods for validating and testing the model on unseen data: 1) random percent and 2) v-fold cross-validation.
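The normalization to [1, 2] can be illustrated with plain min-max scaling. This is a sketch under the assumption that a linear min-max transform was used (the paper does not specify the formula); the function name and sample values are hypothetical.

```python
def normalize(values, low=1.0, high=2.0):
    """Min-max scale a sequence of numbers to the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant variable carries no information; map it to the lower bound.
        return [low for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


heated_area = [80.0, 150.0, 290.0, 1200.0]   # hypothetical values in m2
scaled = normalize(heated_area)              # smallest -> 1, largest -> 2
```

When a model is later applied to new buildings, the minimum and maximum taken from the training data would normally be reused so that all inputs share the same scale.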
The most commonly used accuracy indicators of the model for the validation data, obtained with the random percent (15%) validation method, are presented in Tab. 1. Fifteen percent (15%) of the data were held out for testing the model on unseen data, and the remaining 85% were used for training. The standard estimators of the model, MAPE (mean absolute percentage error) and R2 (the coefficient of determination, which reflects the global fit of the model), for the validation data are MAPE = 2,44% and R2 = 94,72%, respectively. The correlation coefficient between the predicted and the actual target variable is 0,974.
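The two accuracy indicators and the random 15% holdout can be sketched as follows. The formulas are the usual textbook definitions and are assumed, not confirmed, to match what DTREG reports; the function names and the four sample values are hypothetical.

```python
import random


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def holdout_split(n_samples, holdout_fraction=0.15, seed=0):
    """Randomly reserve a fraction of the sample indices for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_samples * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


# With 60 buildings, 15% corresponds to 9 validation and 51 training samples.
train_idx, valid_idx = holdout_split(60)

actual = [100.0, 120.0, 80.0, 150.0]      # hypothetical Q values
predicted = [98.0, 123.0, 80.0, 147.0]
error_percent = mape(actual, predicted)
fit = r_squared(actual, predicted)
```

With only nine validation buildings, a single poorly predicted building can shift both indicators noticeably, which is one reason to also look at cross-validation results.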
The model was also tested with the 8-fold cross-validation method, and the MAPE and R2 results for both methods are given in Tab. 2. The relative significance of the predictors for the model was obtained using sensitivity analysis (DTREG, Tab. 3).
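The fold construction behind 8-fold cross-validation can be sketched in plain Python. How DTREG assigns samples to folds is not described in the paper, so the shuffling and the striding scheme used here are assumptions, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=8, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k folds of nearly equal size
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(60, k=8))
fold_sizes = [len(validation) for _, validation in splits]
```

Each building is used for validation exactly once and for training in the other seven rounds, so the averaged error uses all 60 samples.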
The SVM model was also compared with four other predictive models using the random 15% validation method: linear regression (LR) and three types of neural networks (radial basis function (RBF NN), multilayer perceptron (MLP) and general regression (GRNN)). The SVM model gave the highest accuracy (Tab. 4).
The DTREG software computes the minimum, maximum, mean value and standard deviation of every numerical variable (Tab. 5).
RBF (radial basis function) was used as the kernel function in our model. DTREG offers several types of SVM models: two types for regression and two types for classification tasks. The Epsilon-SVM type was used for our model. For every chosen SVM model, four kernel functions are offered: RBF, sigmoid, polynomial and linear. In most cases RBF gives the best results [36].
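The four kernel types named above can be written out as small functions. The forms follow the conventions commonly used in LIBSVM-style libraries; it is an assumption that DTREG uses exactly these forms, and the parameter names gamma, coef0 and degree as well as the sample vectors are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    # Depends only on the squared distance between the two points.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [2.0, 0.0]              # hypothetical normalized inputs
similarity = rbf_kernel(u, v, gamma=0.1)   # squared distance is 5
```

The RBF kernel equals 1 for identical points and decays towards 0 with distance, with gamma controlling how quickly the similarity falls off.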
The model parameter Epsilon is a tolerance factor which controls the stopping criterion for the optimization process of the SVM algorithm in DTREG; the user can reduce it to obtain a more accurate model or increase it to reduce computational time. In our model the Epsilon value was set to 0,001.
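For readers who want to experiment with a comparable setup, the sketch below configures an epsilon-SVR with an RBF kernel in scikit-learn, which is not the software used in this paper. It assumes that DTREG's Epsilon (stopping tolerance) plays the role of scikit-learn's `tol`, and that DTREG's P plays the role of the tube radius `epsilon`; the data are synthetic and the values of C, gamma and epsilon are illustrative, not the tuned values of the model.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 samples, 10 predictors scaled to [1, 2].
X = rng.uniform(1.0, 2.0, size=(60, 10))
y = 1.0 + 0.5 * np.sin(3.0 * X[:, 0]) + 0.3 * X[:, 1] ** 2

model = SVR(
    kernel="rbf",   # radial basis function kernel
    C=10.0,         # penalty for points outside the tube (illustrative)
    gamma=0.5,      # RBF width parameter (illustrative)
    epsilon=0.01,   # tube radius; assumed counterpart of DTREG's P
    tol=0.001,      # stopping tolerance; assumed counterpart of DTREG's Epsilon
)
model.fit(X, y)
predictions = model.predict(X)
```

Keeping the two "epsilon" notions apart matters: the stopping tolerance only affects how precisely the optimization problem is solved, while the tube radius changes the problem itself.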
The selection of the SVM parameters has the greatest impact on the accuracy of the model. DTREG offers two methods for searching for the optimal parameter values: a 'pattern search' and a 'grid search'. The pattern search starts at the center of the interval selected by the user and takes trial steps in each direction for every parameter. If the accuracy of the model improves, the process is repeated from the new center point; if there is no improvement, the search starts again with a reduced step size. This process is repeated until the search step size reaches the specified value (tolerance) [36].
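The pattern search described above can be sketched generically. This is not DTREG's implementation: the initial step size (a quarter of each interval), the shrink factor of 0.5 and the toy objective are assumptions, and in practice the objective would be the validation error of the SVM for a given parameter vector.

```python
def pattern_search(objective, lower, upper, tolerance=1e-3, shrink=0.5):
    """Minimize objective over a box by trial steps along each parameter."""
    center = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    steps = [(hi - lo) / 4.0 for lo, hi in zip(lower, upper)]
    best = objective(center)
    while max(steps) > tolerance:
        trials = []
        for i in range(len(center)):
            for direction in (1.0, -1.0):
                trial = list(center)
                # Step in one direction and stay inside the user interval.
                moved = trial[i] + direction * steps[i]
                trial[i] = min(upper[i], max(lower[i], moved))
                trials.append((objective(trial), trial))
        value, trial = min(trials, key=lambda t: t[0])
        if value < best:
            center, best = trial, value               # move to the better point
        else:
            steps = [s * shrink for s in steps]       # no gain: reduce the step
    return center, best


def toy_error(params):
    """Hypothetical stand-in for a model error with its minimum at (3, -1)."""
    c, g = params
    return (c - 3.0) ** 2 + (g + 1.0) ** 2


best_point, best_error = pattern_search(toy_error, [0.0, -5.0], [10.0, 5.0])
```

Because every move needs only a handful of evaluations, this strategy is cheap, but it can settle in a local minimum if the error surface has several.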
The grid search uses geometric steps to find the optimal parameter values over the intervals specified by the user. The model is evaluated at many points of the grid for each parameter, so this search method is computationally more expensive. The search uses three parameters, C, Gamma and P, and for each of them the user selects the lower and upper bound of the interval. While stepping geometrically through the parameter values, DTREG uses cross-validation to estimate how well the model matches the data. The performance of SVM models depends mostly on the parameter intervals specified by the user, and the author of the software recommends the grid search. For our model the grid search was selected. For this type of search the user sets two values. The first is the number of values tried between the lower and upper bound of the interval for each parameter; for our model it was set to 30. The second is the 'refinement iteration value'; for our model it was set to 2. With the default 'refinement iteration value' of 1 the software makes only one grid search, while with a value of 2 it performs one finer level of search after the initial grid search [36]. The intervals chosen for the parameters C, Gamma and P for the grid search, and their optimal values obtained by the DTREG software, are given in Tab. 6. Other authors have developed predictive models for energy consumption using neural networks.
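The grid search with geometric steps and one refinement level, as described in this section, can be sketched as follows. This is an illustration rather than DTREG's algorithm: the way the interval is narrowed around the best point, the parameter bounds and the toy objective are assumptions, and in practice the objective would be a cross-validated model error.

```python
import itertools
import math


def geometric_grid(lower, upper, n_values):
    """Return n_values points from lower to upper with a constant ratio."""
    if n_values == 1:
        return [lower]
    ratio = (upper / lower) ** (1.0 / (n_values - 1))
    return [lower * ratio ** k for k in range(n_values)]


def grid_search(objective, bounds, n_values=30, refinements=2):
    """Minimize objective on geometric grids; bounds need lower > 0."""
    best_params, best_value = None, float("inf")
    for _ in range(refinements):
        names = list(bounds)
        grids = [geometric_grid(*bounds[name], n_values) for name in names]
        for combo in itertools.product(*grids):
            params = dict(zip(names, combo))
            value = objective(params)
            if value < best_value:
                best_params, best_value = params, value
        # Narrow each interval to one grid step on either side of the best value.
        narrowed = {}
        for name in names:
            lower, upper = bounds[name]
            ratio = (upper / lower) ** (1.0 / (n_values - 1)) if n_values > 1 else 1.0
            narrowed[name] = (max(lower, best_params[name] / ratio),
                              min(upper, best_params[name] * ratio))
        bounds = narrowed
    return best_params, best_value


def toy_error(params):
    """Hypothetical error surface with its minimum at C=10, Gamma=0.1, P=0.01."""
    return ((math.log10(params["C"]) - 1.0) ** 2
            + (math.log10(params["Gamma"]) + 1.0) ** 2
            + (math.log10(params["P"]) + 2.0) ** 2)


bounds = {"C": (0.1, 1000.0), "Gamma": (0.001, 10.0), "P": (0.0001, 1.0)}
best_params, best_error = grid_search(toy_error, bounds, n_values=30, refinements=2)
```

With 30 values for each of three parameters, a single pass already requires 27000 model evaluations, which illustrates why this method is the more expensive of the two.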
Neural network models for predicting electric power demand at a bioclimatic building were developed in [38] using a multi-objective genetic algorithm (MOGA). These models were compared with models designed with statistical and analytical methods, and the results showed comparable accuracy (although the MOGA models used a training data set of only 2592 samples, which is 0,8% of the 318340 data samples used by the other comparable models). The MAPE of the MOGA models ranged from 4,21% to 12,39%, depending on the time of measurement (winter, summer or month).
The authors in [39] used NNs to develop a model for predicting solar radiation. They used the following predictor variables: mean wind speed, location, mean pressure, month, mean temperature, mean relative humidity and mean duration of sunshine. They obtained a model which predicts solar radiation with an accuracy of 93% and a MAPE of 7,3%.
The authors in [40] used an SVM model with an RBF kernel to predict energy consumption in a tropical region, using four commercial buildings in Singapore. They used weather data as input variables (relative humidity, global solar radiation and monthly mean outdoor dry-bulb temperature) and also collected mean monthly landlord utility bills for developing the model. The error of the model was around 4%.
In the last few years, hybrid modelling has shown very promising results in predictive modelling. Muhammad Fayas and DoHyeun Kim [41] proposed a predictive model for energy consumption in residential buildings by developing a hybrid model, DELM (deep extreme learning machine), composed of an extreme learning machine and a deep learning machine. The authors compared the predictive results with an ANN model and with ANFIS (adaptive neuro-fuzzy inference system), and the results confirmed that the DELM model was more accurate than the other two models. They obtained MAPE = 5,7% for one-week prediction and MAPE = 6,5% for one-month prediction with their DELM model. Deep learning and extreme learning are two relatively new modelling methods with some very useful characteristics: for example, extreme learning learns very quickly and has a very high generalisation ability [42], while deep learning enables working with large data sets and can give very accurate predictions. Caliskan and Cevik [42] obtained very good results using extreme learning to determine noisy pixels while protecting critical structural information that can be used for disease diagnosis (and also in civil engineering, in the analysis of flaws in building materials).

CONCLUSIONS
Prediction of the energy consumed in a building is important not only for investors, but also for other participants in such a project (particularly in the early phases of the building project). It is equally important for the building users, because the energy consumption and the related cost could have social and economic implications for the users' needs.
A model for predicting the energy consumption of buildings using a support vector machine is proposed in this paper, built with the predictive modelling software DTREG. The mean absolute percentage error (MAPE) of the model is 2,44%. The coefficient of determination R2, which expresses the global fit of the model, is 94,72%.
The limitations of the model are parameters which are influenced by the local climate characteristics, such as: value of heating degree-day, heating hours per day, building orientation, shadings of the building, etc., that can vary for each building independently and individually.
Nevertheless, the proposed model is a useful tool for all participants in building design. It can be used for prediction of energy consumption in the building and its related cost, especially in early design phases. The model can be also used as a support in the process of optimal building design selection. Furthermore, it can serve as a base for developing a model for predicting building cost for energy demand.
Future research can focus on developing models for predicting energy consumption for different types of buildings, different ranges of project budget and different areas, and also on hybrid modelling (a combination of two or more models), which has proven very promising in recent years.