Building and Calibration Transport Demand Model in Anamorava Region

The main objective of this research is to develop a model and to calibrate it in order to apply it to transport forecasting for Anamorava region. The synthetic model has been developed, and it is composed of a transport network model and a demand model, which is enabled using PTV Visum software as well as using the following variables as input: number of residents, number of people employed, working places available as well as the volume of vehicles entering and leaving certain locations surrounding the Anamorava region at "peak hour". Required coefficients are used for converting the traffic volumes from 12 hours to 24 hours expressed, such as AADT. As a criterion, for initial verification R2, RMSPE, Percentage Deviation and Regression parameters are used. Then, a calibration of the demand model is conducted using the TFlowFuzzy algorithm by employed GEH test, comparing it with the observed data of traffic volumes accomplished simultaneously at some locations inside and in the vicinity of this region for time intervals of 12 hours on two different days in one week in May 2016. In order to fulfil all the criterions, it has been found out that the final model may be used for transport demand forecasting in the future in this region.


INTRODUCTION
Nowadays, transport development is directly connected to the economic and the social development of a given country or region. The rapid advancement of IT in the last years has produced possible orientation of engineering studies as precondition of transport development and perspectives towards the progress of transport demand models [1].
Transport demand models are classified as: macroscopic, mesoscopic and microscopic [2]. They can also be classified as aggregate and disaggregate levels. Based on the modelled period, there are models which represent the period of 24 h and models for every other significant period of a day or so-called "peak hour models" [3].
This research relates to 24-hour, macro type of models because it involves regional as well as aggregate level due to the collection of data according to variables at the zone level. These types of models are used for strategic studies and trips, and they are expressed as Annual Average Daily Traffic (AADT) volumes in the road transport network [4].
This study represents a possibility to develop a four step (synthetic) model in order to estimate transport demand using data on macro variables as an input and its outcome is the generation of initial origin destination (O-D) trip matrix for traffic volumes. The next step is to compare this matrix to OD matrix trips based on observed trips through traffic counting. An initial matrix OD depends on economic and demographic macro variables, the design of network traffic as well as on traffic volumes in peak hour at incoming/ outgoing of the region is used as input. The observed OD matrix of trips directly depends on the accuracy of counting and interviews accomplished with road side traffic participants and their processing. The adjustment of the initial OD matrix to an acceptable level is done using various statistical indicators and the further correction procedure is continued using GEH test and TFlow Fuzzy technique which is a special tool in PTV Visum software.
The key purpose of this research is to develop and to calibrate a road transport demand macro model through modelling techniques by PTV Visum software at the Anamorava region. At the same time, the focus is oriented towards the application of the methodology of data input to be used as well as steps to follow when development and calibrating a synthetic model and its relevance to transport demand forecasting. Another purpose concerns sensible transport planners and decision makers undertaking studies of this nature.

LITERATURE REVIEW
Pursuant to the literature referred, the assessment of initial OD matrix is done based on the following two approaches [5]: a) Directly estimated are usually known as "origindestination surveys" which require large surveys in field studies such as: roadside, household and terminal surveys; b) Indirectly estimated where OD trips are correlated to a number of known explanatory variables to the study area. This includes: gravity approach, network equilibrium and entropy maximisation.
The first method is expensive, time consuming and with the tendency to loss validation in the short term, while the second one is less expensive because it requires lowcost data, less time and less man power [6].
Gravity approach assessment of OD matrix is used by some researchers such Robillard (1975), Hogberg (1976), Tamin and Willumsen (1989) and Willumsen (2003). These models work relying on linear and non-linear regression [7]. Its disadvantages can be that, since it requires zonal data for calibration of parameters on the transport demand, and also it cannot be accurate to handle external-external trips [8].
Network equilibrium approach is used in cases where the number of alternative roads between two points is lower and the congestion does not play an important role in selecting the road to follow. This is more suitable for interurban roads. Selection of static estimators for O-D matrix by researchers, done according to three ways [9]. Maher (1983), Bell (1983), Casceta and Nguyen (1988) have done it relying on Maximum Likelyhood (ML), while Casceta et al. (1984) according to Generalized Least Squares (GLS) and Maher (1983) by Bayesian Approach (BA). These models have conceptual differences compared to gravity one because it uses impedance of travel time, and it stratifies social economic variables in order to determine the probability of trip completion. These models according to this approach bring these variables close to utility or impedance function and require more information for assessment and computational time [10].
Entropy maximization approach consists of minimized information and maximal entropy [11]. In order to define OD matrix of trips there is a need for getting the lowest numbers possible of information; when receiving this information it is not possible from counting of traffic. The entropy is an expression from physics and it relies on the supposition that many similar matrices of trips OD have more "microstates" relation. Models set up according to this approach are affected by the total number of trips and in most of the cases they do not take into account variables such as population and number of vehicle ownership so that makes them not so suitable for long term planning applications [12].
This approach has been used for the first time in transport field by Wilson (1970). Wilson and MacGill (1977) have proven that this method is partially useful to the system with bigger number of components with complex and disorganized appearance. Their theory was confirmed also by Hogberg (1975) [13]. This approach has been used and upgraded later by Von Zylen and Willumsen (1977) in order to assess OD trip matrix based on traffic counting [14].
Estimation OD matrix according to the above approaches have a serious weakness because traffic counting used as input have an inaccuracy and it is subject to sampling errors. In this regard, Xu and Chan (1993) were pioneers in assessing OD matrix by applying fuzzy weights [15]. A year later Rosinowski's (1994) developed an algorithm named TFlow Fuzzy (TFF) which traffic counting in the model considers imprecise value based on fuzzy sets theory functioning according to route choice proportion for each link calculating only one time and keeping constant throughout the process of calculation [16]. Reddy and Chakraborty (1998) developed a fuzzy bilevel inference applying maximum entropy in the upper level using assignment algorithm [17]. Shafahi and Faturechi (2009) proposed a new method for estimation of fuzzy OD matrix which takes into account link data as a fuzzy value which varies to the certain bandwidth [18]. Yousefkia et al. (2013) proposed a modified TFlow Fuzzy (TFF) for the estimation of OD matrix taking into account counting like not precious based on fuzzy sets theory. This approach has been used by Mamdohi et al. (2013) in intercity roads of Teheran province as an uncongested network [19]. This method is considered very practical and very realistic because it works according to route choice proportion for each link within the road network in which proportion is updated successively in each iteration.

METHODOLOGY OF RESEARCH
For the development and the calibration model for transport demand forecasting, study area is determined as well as research techniques and procedures, development synthetic model, establishment OD matrix of actual trips, preliminary verification of results, calibration of model and conclusion.

Study Area
The development of the model has been done for Anamorava region which is situated in south-east part of Kosovo, which is composed of six municipalities (Gjilan, Partesh, Vitia, Kllokot, Kamenica and Ranillug) with 166 settlements and a total surface of 1331 km 2 . It is surrounded by other regions of Serbia in east and by Macedonia in the south [20].

Research Technique and Procedures
Modelling is a process in a number of iterative steps, consisting of: design, verification, calibration and validation [21].
This research begins with the design of an initial macro model at the aggregate level of the region which includes the preparation of graphical substrate, geometric elements of the observed segments on the network, establishment OD matrix of trips through variables identified with impact in generating it etc.
Verification is a visual check of the model's operation and a confirmation of the logic of the modelling results. The initial verification of the model is a very important check whether the model settings are accurate and realistic as well as whether the results are within in range of expectations.
Apart from visual, verification can be done also by statistical indicators such as determination coefficient (R 2 ), Relative Mean Square Percentage Error (RMSPE), Percentage Deviation (PcrDev) and regression parameters. This makes it possible to detect incompliance of model at early phase of establishment and in this way bigger errors, which could potentially appear in the latter phase, may be avoided.
The next step is known as calibration process. According to the Highway Capacity Manual (HCM 2010), calibration is the process of comparing model parameters with actual data obtained by counting and observed in the road network [22].
The aim is to reduce the difference between the output results of the simulation model and the data obtained by counting. It is important to identify the key input parameters that the model uses in the simulation and their impact on the output modelling results.
The procedure methodology of calibration is not finally approved but it is thought that calibration of model is the most significant requirement for the success and application of results in practice in line with new circumstances. The use of models with default values as input or by minimal requirements of calibration will bring unreal results of the model and it will contradict it.
The last step is the validation of the model which has to do with assessment of calibration process comparing model values calibrated with real data of traffic. Being aware of the impossibility to get new data in the lack of Automatic Traffic Counters (ATC), this step is ignored.
The model development according to this approach for transport demand in Anamorava region is given in Fig. 1.

Synthetic Model Development
In order to develop a synthetic model to estimate, transport demand relies on a demand and offer model [14]. For a demand, data for macro-variables and traffic volumes are required while for offer side data for current facility of road network are needed. Data for variables such as resident population, number of employed persons, number of working places have been collected through competent institutions for year 2016 [23], while traffic data have been obtained through counting. In order to develop a model, a special "meta model" is used in PTV Visum software, which is dedicated for modelling in the transport field [24], using as input data presented in the following Tab. 1. Furthermore, there is a need to conduct other actions such as zoning and additional attributes related to types of trips, placing road network according to road category and their connection to centroids through connectors etc. Zoning is split into six inner zones and seven outer zones. In the inner zone there are given municipalities of the region which include Gjilan, Vitia, Kamenica, Kllokot, Partesh and Ranillug and the outer part includes 101 -Novoberdo, 102 -Llabjan, 103 -Lipjan, 104 -Ramjan, 105 -Kacanik, 106 -Preshevo, and 107 -Bujanovc, which are presented in Fig. 2.  The main road network in which the traffic volume is dominant is taken into consideration. Categorisation of the road network as well as the capacity on lanes per hour relies on literature [25]. In this regard, based on traffic volume expressed, such as AADT, this road network is composed by main, regional and local roads, as presented in Fig. 3. Each road network is expressed through nodes and links which (when connected) create a transport network. Each link is associated thereafter with defined parameters, such as: capacity per lane, number of lanes, length, speed restrictions, allowed types of transport, etc. In this regard, the road network is also composed by connectors related to so-called centroids.
One of the most important actions in the zones is inputting data attributes. Attributes are a key component in creating the model [26]. For the normal functioning of traffic attributes of networks, elements must be defined for all activities and trips for each traffic zone in the region, which comprise 10 attributes inserted in PTV Visum software, as presented in Fig. 4.  In the computer presentation of the models, zones are shown as a single point at which attributes are concentrated and which are called zone centroids. Centroids are joined with networks through connectors of centroids, which include time, distance and cost connecting the transport system with trips from origins to destinations for each zone. The position of centroids is done automatically in the centre of gravity of the zone. Centroids and their connectors play an important role in maintaining the quality of public and private transport in the road network trips from origins to destinations. Connectors connect zones to the link network. They represent the distance to be covered between a zone's centre of gravity and the connector nodes. The number of connectors depends on the load of traffic in the area, and the routes represent the traffic volumes. Despite entities' connectors for all 13 zones (six municipalities and seven outer zones), which direct traffic volume incoming and outgoing the region and connect it to the remaining Kosovo municipalities and territories of other neighbouring countries should be taken into consideration. The connectors are presented in Fig. 5 and are shown in yellow colour.
With the intention of collecting data on traffic volumes incoming (orange) and outgoing (green) this region, roadside counting traffic volume for peak hour 16:00-17:00 was made on 15 th January 2016, assisted by students of the Traffic and Transport Department at the University of Prishtina. Counting was done manually, recording each vehicle in line with the type of vehicle going through a road section as well as registration plates in each direction. The summary data are presented in Fig. 6. In this regard, it has been necessary to define outer zones, which play an important role. Moreover, for the proper functioning of the model it has been required to create a matrix for transiting this territory, particularly for passenger and good vehicles. Values on the intensity of transit traffic between outer zones are shown in Tab. 2 and Tab. 3. It is known fact that various categories of vehicles have a different influence on traffic volume. Therefore, it is done equivalent of vehicles per unit in order to have a uniform traffic influence. Obtained values are multiplied with an average coefficient K e = 1,4 in order to produce the Passenger Car Space Equivalent (PCSE). This coefficient is used as recommended by literature [27]. Technical Gazette 26, 6(2019), 1784-1793 Since counting was conducted on 15 th January 2016 during peak hour, it was necessary to find out AADT as well as nonlinear seasonal and weekly coefficients. Based on the results obtained by ATC administered by the Road Directorate of Kosovo which were fixed in this region, the results indicate that the average value of this coefficient is K s = 1,34. Using the same data, it was found also that the weekly nonlinear coefficient for Friday was K w = 1,01. In this regard, taking into account counted traffic volumes that were converted for 24 hours, multiplying them with the above coefficients, the value for traffic volume incoming and outgoing this region [14] was found. These values can be obtained by using Eq. (1) where AADTis the annual average daily traffic, VOLis the traffic volume, K e -is the passenger car space equivalent, K s -is the seasonal coefficient and K w -is the weekly nonlinear coefficient. When generating trips, six groups of activities that are embedded in PTV Visum software, presented in Tab. 5 are taken into account [28].
Balancing of production and attraction of trips is done by gravity model for a single constrained which is taken into account in calculating distribution of trips. The parameters assessment of gravity model is done by the realization of iterations using Tanner deterrence function of travel time as input in processing distribution as expressed in equation Eq. (2) [21].
where t ij -is the value of travel time from zone i to zone j; U ij -is the travelling utility which is Factors of generation for production and attraction for each one of the six purposes of trip regarding the parameters a, b and c are determined as presented in Tab.5. Applying iterations for every step, the matrix is established which approximate is given by distribution based on trip distance function. Initial assignment of trips takes place afterwards in the road network in line with equilibrium assignment technique according to Wardrop's first principle. As the result of completion of all previous activities and modelling of trips by PTV Visum software according to four step (generation, distribution and assignment) traffic volume is generated and classified according to road categories expressed like AADT, as presented in Fig 7. Modal split here is ignored because vehicles are converted into unit vehicle.

Establishment O-D Matrix Based on Observed of Trips
It can be easily understood that trips generated according to the synthetic model do not describe actual trips from every zone to any other zone, respectively traffic volumes to the main road segments due to the lack of accuracy of data, the lack of appropriate road designs in software or other attributes. For this purpose, research has been undertaken in order to establish an OD matrix of trips based on observed actual of trips using counts and interviews. The establishment of an OD matrix is the most important part of this research, because results gained serve for verification and calibration of model developed according to synthetic model at AADT level. The additional reason to undertake this research is also the lack of the study which should have been completed in the past in this field for this region or else called "prior matrix" which could be used as starting point for this analysis. This is made possible by completing manual counting and interviews in roadside in some selected locations in the main road network with higher load of traffic volume. Forms and special questionnaires have been prepared in advance in order to complete counting and interviews, which contain the date and the time, the location of the interview, the occupation transport mean used for the trip, the direction of trip (OD), the purpose of the trip, the trips frequency, the number of passengers in vehicle, etc.
Depending on the origin and the destination of trips, internal-external, external-internal, internal-internal and transit trips have been taken into consideration. The trips within one zone have not been a subject in this research. Counting and interviews were completed simultaneously on 18.05.2016 and 21.05.2016 and continuously for 12 hours from 07:00 a.m. until 19:00 p.m. As a result, there were 11 523 interviews completed, comprising 19,43% of total traffic volumes, and also showing 59 317 vehicles counted manually. Interviews were completed according to the "face-to-face" method, stopping at the maximum number of vehicles and reflecting the questionnaire answers received from traffic participants. Even here, counting and interviews were conducted assisted by approximately 100 students of the Transport and Traffic Department-University of Prishtina who were trained in advance and supported by Kosovo Police. The exact positions of locations used for manual counting and interviews are given in Fig. 3. After the research has been finalized, the collection of counting and interviews has been completed, using MS Excel. Then it has been possible to establish the OD matrix of trips for type of passenger vehicle as unit. It was not possible to continue further without conducting additional calculation coefficients in relation to K e , K int and K convert(12 h/24 h) in order to obtain a final matrix of daily trips by unit vehicle. This is the reason why vehicles have been converted into units using the coefficient K e = 1,4 as reflected in Tab. 4. Also, it is not possible to interview every vehicle driver and passenger. In order to find the number of trips realized by a vehicle within the period of 12 hours the coefficient of interview K int was used . This coefficient is obtained by dividing the traffic volume and the interviews conducted within the given period of time for each location separately, as an average of two days as given by Eq. (3).

int 12
where K int -is the coefficient of interview; VOL 12 -is the number of vehicles counting in 12 hours interval and I 12 -is the number of interviews in 12 hours interval. Summary results regarding this proportion and interview coefficient for each location are given in Tab. 6. By using this coefficient, it has been possible to establish OD matrix of trips for the actual total number of vehicles for a time interval of 12 hours, by applying Eq. (4): matrix(12 h) 12 int (vehicle/12 h) I K = ⋅ OD (4) where OD matrix(12 h) -is the origin-destination matrix of trips realized by vehicle in time interval 12 hour. After using data for traffic volumes by ATC for an interval of 24 hours, which are fixed in four locations in this region, it has been possible to obtain a converting coefficient of traffic K con from 12 h to 24 h by Eq. (5). 24 con 12 where K con -is the converting coefficient of traffic from 12 h to 24 h, VOL 24 -is the traffic volume by ATC counting in time interval 24 h and VOL 12 -is the traffic volume by number of vehicles counting in time interval 12 hour. As using data for four locations with ATC, the average value of this coefficient is K con = 1,5.
With regard to the application values of traffic volumes and value of coefficients gained K e , K int and K con , the final OD matrix of trips is obtained for vehicles for the period of time 24 h, by Eq. (6). matrix(24 h) 12 e int con (vehicle/24 h) (6) where OD matrix(24 h) -is the origin-destination matrix of trips realized by vehicle in time interval 24 hour. The OD matrix has been established with dimensions 13×13, like an average of two days when counting and interviews are done with an intention not to require application of a weekly nonlinear coefficient of trips.
This OD matrix includes all municipalities of Kosovo as well as the municipalities of Bujanoc and Preshevo due to their vicinity to this region and there has been a high impact on generating trips. The obtained results according to zones which are incorporated in PTV Visum software are summarized in Tab. 7. After the above methodology, taking into consideration all values gained this OD matrix is imported directly into to PTV Visum software. Modelling procedure here is simple because it does not require a complicated procedure of entering data compared to the synthetic model [29]. Modelling of traffic volumes conducted for unit of vehicle category in the current road network of this region is shown in Fig. 8.

COMPARSION RESULTS
Once the synthetic model is developed and the OD matrix is established, the comparison of traffic volumes modelled and the ones observed is done for every location in the main road network, as presented in Tab. 9. The comparison of traffic volumes in magistral roads is done by the following indicators: R 2 > 0.85, RMSPE < 20%, PrcDev < 20% and regression line parameters [30].
Given indicators are useful to measure the scale performance in order to approach results and to assure an indication of the error of a model. This way is considered as quick correction. Initially there have been significant differences of traffic volumes in certain parts of the road network but after applying the process of balancing based on production and assign by equilibrium method to the traffic network it has been made possible that results are approximate to the certain level of acceptability according to iteration approach. Furthermore, based on the value gained by indicators respectively on the less relative error in percentage the synthetic model is selected with best performance and it describes closely the current situation of trips in the road network as presented in Tab.8. This can be justified also by the value gained on the coefficient of determination R 2 = 0,8642 > 0,85, which means that in the regression equation y = 0.9788x + 217,82 there is a good spread of data which are displayed on the Fig. 9 and they indicate that trips by model best fit with OD matrix which are based on the observed counts.
Similarly, referring results presented in Tab. 8, also by two other indicators RMSE = 17,46% and PcrDev = 15,87 it is clear that traffic volumes generated by synthetic model compared to the ones according to OD matrix have much less deviation and get an acceptable level less than <20%. Results gained above are a clear indicator that there is no need to go back and adjust input data of the model or also its parameters of functioning as presented by algorithm in Fig. 1. With the intention to approximate and further correct results of the model, there is a requirement to calibrate it using GEH test.

Figure 9
Demand model according to counting and interviews

CALIBRATION MODEL
The calculation of values of the GEH test is similar to statistical method χ 2 (chi-square), which takes into account the difference value obtained according to the model vs. observed. GEH test can be calculated by Eq. (7), as follows: where O i -is the value obtained by observation, M i -is the value obtained from the model. The GEH test can be applied for each segment as well as for all segments together. Based on [31], the criterion for model calibration of transport model is fulfilled if GEH < 5 on 85% of all location taken into account.
Firstly, the OD trip matrix has been built, and regarding it assignment to the links in traffic network, the traffic load of links fij has been obtained. Transport demand is usually expressed through the OD matrix, but due to the exclusion of all non-zero OD trips within the most convenient areas, the vector form should be used, as shown below in Eq. (8) where v ris the value of traffic volumes on links expressed in vector form. The trips of any OD matrix pair provide a certain share to each traffic count location. Therefore, the observed volumes correspond to the sum of all OD matrix trips which are carried out on this link. Thus, by taking it into account, the general form is given by Eq. (10): where Ais the pair of OD matrix share on a certain link, fis the OD trip matrix, vis the traffic volume of links. For OD traffic observed values, A is especially constant. As a result, very large number of combinations in the OD matrix values is obtained according to the directions f ij that should match best with the counting performed for the traffic volumes. In order to select the best matrix, out of all possible matrices according to [33], by using the method of entropy maximisation, the objective function q(f) given by Eq. (11) and Eq. (12) is applied: So that A f v ⋅ = (12) where ij f  -is the traffic demand on one OD pair of the initial OD matrix, f ij -is the traffic demand on one OD pair of the corrected OD matrix, pis the number of nonnegative elements of the OD matrix.

Figure 10
Fuzzy logic set [32] With regard to the description of the above procedure used to find the function of maximum q(f), so as to find the best fit, there are also shortcomings because the observed traffic figures expressed with symbol v and it can be treated as fuzzy sets, as they can express non-linearity from the area of origin up to 20% from day to day [32]. Considering the case of the correction matrices of OD, observed traffic values should be replaced by Fuzz Sets s v  with varying bandwidths, as given by Eq. (13) and Eq. (14) and presented in Fig. 10.
Comparing it to simple intervals representing observed as Fuzzy Sets, allows favouring observed values close to the mean value. Values which are closer to the bandwidth border are also accepted if the decrease in the fuzzy membership function is more than the offset by the increase in the entropy function. Unfortunately, in general, fuzzy membership functions do not have the analytical properties (continuous, differentiable) that are required for an efficient solution algorithm. The counted values of the traffic load are substituted by fuzzy set q  with the bottom value of deviation s and the upper value of deviation s [32]. Now, the problem of maximisation takes the form: where: , v vare the maximum and the minimum of Fuzzy set, , s sare the deviations from the observed traffic count, ŝ v v = − -is the upper bandwidth for observed traffic count, ŝ v v = − -is the lower bandwidth for observed traffic count.
Incorporating slack variables s , s into the weighted entropy maximisation gives preference to matrices which achieve A f v ⋅ = within the bandwidth "as good as possible". The formulation that produced the problem with the help of the fuzzy set is structurally identical to the original one of the problems which had the exact limitations. The table below shows that the GEH index for all locations, except for segment No. 28 is lower than 5 (GEH < 5). The total of seven locations gives 14 directions corresponding to this criterion. Thus, it may be concluded that the traffic model is suitable. Due to incompatibility, TFlowFuzzy is used for calibration in which correction is carried out in these seven links, as presented in a schematic way through Fig. 11 [34].
Tabs. 9 and 10 show general assessment, e.g. GEH test values, with absolute difference of traffic flow modelled and that which was counted, expressed as AADT levels.