Comparison of Cluster-based MNL and Duration Models in Departure Time Choice Modelling

One of the important decisions travellers make for each trip is the time at which they depart from their origin. The aggregation of all travellers' departure times forms the temporal distribution of trips over a day and determines its peak periods. In this paper, we apply the multinomial logit (MNL) model to a choice set derived from cluster analysis, and we also use duration models to estimate departure time for home-based work trips. For the duration models, the Kaplan-Meier and Cox proportional hazards methods are used, and the results are compared with the cluster-based MNL model. The study is conducted in the city of Mashhad, which has a population of about 2.7 million. The results show that the traveller's job and chosen mode have a significant effect on departure time choice. A comparison of the predictive power of the two modelling approaches indicates that the cluster-based MNL model is preferable in this case study.


INTRODUCTION
One of the important decisions a traveller makes for each trip is the time of departure from the origin. Each person chooses the departure time independently, and this choice depends on experienced travel time, travel costs, marital status, family obligations, income, occupation and work flexibility, travel mode, and the importance of the activity [1]. The aggregation of all travellers' departure times forms the temporal distribution of trips over a day and determines its peak periods. In particular, commuters' choice of departure time from home is a crucial factor in determining how congestion changes through the morning peak period [2].
In almost all urban areas, the temporal trip pattern shows that the main portion of demand starts within a short period. During this period, the capacity of the public and private networks is inadequate, so congestion and delays are inevitable. Additionally, the analysis of various scenarios at different times of day is essential for the economic and environmental evaluation of decisions in transportation planning. Preferred arrival times are commonly concentrated in a short period, which forms the peak hours. Demand management policies try to spread the peak over a longer period, so that travellers trade off arriving early or late at the destination and spending less time in congestion against arriving on time but spending more time in congestion [3]. Performing this analysis requires an estimate of demand in different time intervals.
Thus, modelling departure time choice is an important part of the travel demand modelling process. The major applications of this model are the analysis and evaluation of different demand management policies (modification of the trip pattern) [4][5][6], dynamic assignment models, the response of travellers to advanced traveller information systems (ATIS), air quality modelling and activity-based models [7]. This paper is organized as follows. Section 2 reviews the literature on departure time choice, section 3 provides an overview of the duration model and the cluster-based MNL model used in the analysis, and section 4 presents the empirical results. Finally, section 5 summarizes the findings of this study.

LITERATURE REVIEW
Different models have been used for modelling departure time choice (DTC), including the multinomial logit (MNL) model, the nested logit model, the cross-nested logit model, the mixed logit model, the continuous time model, and the ordered generalized extreme value (OGEV) model. Two key challenges are involved in DTC modelling. Small and Cascetta et al. [8,9] employed multinomial logit models to select a departure time. In 1987, Small [10] proposed an ordered generalized extreme value model, in which several departure time periods were considered in a nested design. Hendrickson et al. [11] used data collected in Pittsburgh to examine the flexibility of departure time choices for work trips. In these applications of the MNL model, a day is divided into a number of periods, and the MNL model is used to select one alternative from the set of periods. In the MNL model, systematic utility functions are defined as a function of socioeconomic attributes and variables related to trip purpose. Although the MNL model overlooks the similarities or correlations among adjacent periods, which conflict with its IID assumption on the error terms, it is still popular due to its closed form and extensive use [12].
Bhat [13] pointed to the importance of both mode and departure time choices, noting that researchers modelling travel demand had prioritized mode choice and paid little attention to DTC. He introduced a nested structure with mode choice at the higher level and DTC at the lower level. The MNL model was used for mode choice, while the ordered generalized extreme value (OGEV) model was used to account for the ordering of alternatives in the selection of departure time. In another study, Bhat & Steed [4] used the "hazard-based duration model" to develop a continuous model of DTC for urban shopping trips. Habib et al. [7] also used the "hazard-based duration model" to select the departure time. Applying this model to DTC, however, does not necessarily represent travellers' behaviour.
Ettema et al. [14] developed a DTC model based on activity plans. They used the utility of participation in activities to model the DTC. Lim et al. [15] investigated the logit-based combined departure time and dynamic stochastic user equilibrium assignment (DDSUE) problem. Ben-Elia et al. [16] proposed a DTC model based on the latent preferential arrival time notion. Using this model, they developed a modelling framework for the calculation of the DTC by assuming a latent class of the preferential arrival time. Sasic & Habib [17] proposed a discrete choice model with a latent choice set. This model is one type of generalized extreme value (GEV) model.
To conclude, various modelling approaches have been used in DTC models, as shown in Fig. 1. The two major approaches are discrete choice models and continuous models. Furthermore, some researchers have applied joint models, activity-scheduling models and changing departure time models for different problem definitions. Each of the two major approaches has strengths and weaknesses. Because discrete choice modelling is based on random utility maximization (a rational decision maker), it can reflect travellers' behaviour, and these models can therefore predict the effect of changes in major variables over any time horizon. As a weakness, however, choice set generation is a challenge because time is continuous. Discretizing time yields intervals that form the choice set, from which travellers choose one. Departure times near the boundary of two adjacent intervals are highly correlated, which is inconsistent with the independence from irrelevant alternatives (IIA) assumption of discrete choice models.

Cluster-based Multinomial Logit Modelling
In discrete choice models, each alternative has a certain utility for a person, and the alternative with the highest utility is selected. The utility of person n for alternative i in the choice set $C_n$ is given by the following equation:

$$U_{in} = V_{in} + \varepsilon_{in}$$

where $V_{in}$ is the deterministic term of the utility and $\varepsilon_{in}$ is the random term, which captures the uncertainty arising from the analyst's limited information. The deterministic term is specified during modelling, and through it the effect of different attributes on the choice is identified.
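Under the standard MNL assumption that the random terms are independently and identically Gumbel distributed, the probability that person n chooses alternative i is exp(V_in) divided by the sum of exp(V_jn) over all alternatives j in the choice set. The following Python sketch illustrates this calculation; the function name, interval labels and utility values are hypothetical and are not taken from the estimated model.

```python
import math

def mnl_probabilities(utilities):
    """Return MNL choice probabilities for one traveller.

    utilities: dict mapping an alternative label to its deterministic
               utility V_in (illustrative values, not estimated ones).
    """
    # Subtracting the maximum utility improves numerical stability and does
    # not change the result, because only utility differences matter.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

# Hypothetical utilities for three departure intervals.
example = mnl_probabilities(
    {"before 6:10": -0.5, "6:10-7:15": 0.8, "7:15-8:18": 0.3}
)
```

The alternative with the highest deterministic utility receives the largest probability, but every alternative keeps a non-zero share because of the random term.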
In other research, we evaluated various clustering methods for choice set generation in MNL modelling of departure time [18]. The clustering methods assessed in that study were K-means, K-medoids, EM clustering, and hierarchical Ward clustering. That study showed that models estimated on choice sets from K-means clustering give more accurate predictions, so the K-means clustering method is applied in this research.
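As an illustration of how clustering can generate a discrete choice set, the sketch below runs a plain one-dimensional K-means (Lloyd's algorithm) on departure times and takes the midpoints between adjacent centroids as interval boundaries. It assumes that clustering is done on departure time alone, expressed in minutes after midnight; the function names, the initialization rule and the sample data are hypothetical and do not reproduce the procedure of [18].

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on one-dimensional data; returns sorted centroids."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: spread the initial centroids over the sorted sample.
    centroids = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def interval_boundaries(centroids):
    """Midpoints between adjacent centroids act as interval cut points."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

# Hypothetical departure times in minutes after midnight.
sample = [360, 365, 370, 450, 455, 460, 600, 610]
centres = kmeans_1d(sample, k=3)
cuts = interval_boundaries(centres)
```

Each traveller's observed departure time then falls into exactly one interval, and that interval becomes the chosen alternative in the MNL model.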

Duration Models
Duration models (sometimes called "hazard-based" models or "survival" models) have three main characteristics related to data. First, the dependent variable is the waiting time until the occurrence of a well-defined event. Second, observations are censored, in the sense that for some units the event of interest has not occurred at the time the data are analysed. Third, there are explanatory variables whose effect on the waiting time we wish to assess or control. These characteristics match the nature of departure time when we treat departing from the origin as the event. In this approach, departure time (T) is defined as a continuous random variable with probability density function f(t) and cumulative distribution function F(t), which gives the probability that the departure event has occurred by duration t. It is often convenient to work with the complement of the c.d.f., the survival function:

$$S(t) = \Pr(T > t) = 1 - F(t)$$

An alternative characterization of the distribution of T is given by the hazard function, or instantaneous rate of occurrence of the event, defined as

$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr(t \le T < t + dt \mid T \ge t)}{dt}$$

A parametric model based on the exponential distribution may be presented as follows:

$$\log \lambda_i(t) = \alpha + x_i'\beta$$

In this case, the constant $\alpha$ represents the log-baseline hazard, since $\log \lambda_i(t) = \alpha$ when all the x's are zero. A large family of models introduced by Cox (1972) focuses directly on the hazard function. The Cox proportional hazards model is a semi-parametric model in which the log-baseline hazard $\alpha(t) = \log \lambda_0(t)$ is allowed to vary with time:

$$\lambda_i(t) = \lambda_0(t) \exp(x_i'\beta)$$

If all of the x's are zero, the second factor of the above equation equals 1 and $\lambda_i(t) = \lambda_0(t)$, so the term $\lambda_0(t)$ is called the baseline hazard function. With the Cox proportional hazards model, the outcome is described in terms of the hazard ratio [19]. In this study, we applied the Kaplan-Meier method (a non-parametric method) and the Cox proportional hazards method (a semi-parametric method) for modelling and analysis of departure time.
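The survival function can be estimated without any distributional assumption using the Kaplan-Meier product-limit estimator, which multiplies, at each observed event time, the fraction of units still at risk that do not experience the event. The sketch below is a minimal pure-Python version under that definition; the function name and the toy data are hypothetical, and it is not the software used in this study.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] evaluated at each distinct event time.

    durations: time until departure (or censoring) for each traveller.
    observed:  True if the departure was observed, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Travellers who have neither departed nor been censored before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Departures observed exactly at t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Toy data: five travellers, the third one censored at t = 2.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, True])
```

Censored observations never count as events, but they remain in the risk set until their censoring time, which is what distinguishes this estimator from a simple empirical distribution.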
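Under proportional hazards, covariates scale the baseline hazard by the factor exp(x'β), and the survival function of an individual follows from the baseline survival function as S_i(t) = S_0(t) raised to the power exp(x'β). The sketch below shows both relations; the function names and coefficient values are hypothetical and are not the estimates reported in this paper.

```python
import math

def hazard_ratio(beta, x):
    """exp(x'beta): hazard relative to the baseline (all covariates zero)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

def cox_survival(baseline_survival, beta, x):
    """S_i(t) = S_0(t) ** exp(x'beta) under proportional hazards."""
    return baseline_survival ** hazard_ratio(beta, x)

# Hypothetical coefficient: a dummy variable that doubles the hazard.
beta_example = [math.log(2.0)]
ratio = hazard_ratio(beta_example, [1])
surv = cox_survival(0.5, beta_example, [1])
```

A hazard ratio above 1 means the event tends to occur sooner, so in the departure time setting a positive coefficient corresponds to earlier departure.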

Data Preparation and Explanatory Analysis
The data used for this analysis come from the 2008 survey conducted in the city of Mashhad, which has a population of about 2.7 million. It is the second most populous metropolitan city in Iran [20,21]. The sample covered about 3% of the population, and data on households, household members, vehicles, trips and activities were gathered. About 62 percent of Mashhad's residents are students and workers, who account for 51 percent of total daily trips and 84 percent of morning peak period trips. This group therefore plays a major role in forming the morning peak period. In this research, we considered trips whose purposes were work, education, medical care and visits to offices. In addition, this study assumes that parents choose the departure time for children under 10 years old, so we excluded trips made by travellers under 10 years old [22]. The departure times used for this analysis were between midnight and 13:00. The final sample contains 12467 individuals, 70 percent of which was used for model calibration and the rest for evaluating forecasting performance. Appendix 1 shows descriptive information on work departure time, based on the Kaplan-Meier method.

Cluster-based MNL Model
In this study, we used the results of earlier research by the authors in which various clustering methods were evaluated for choice set generation [18]. That study showed that models estimated on choice sets from K-means clustering give more accurate predictions. In the next step, we used different combinations of variables in modelling and compared their predictive power. Finally, the best model is presented in Tab. 1. As expected, the estimated coefficients show that the traveller's job has a significant effect on departure time choice. For example, salespersons tend to depart between 7:15 and 10:45, and professors, physicians and teachers tend to start their trips after 7:15. This is to be expected because start times and activity durations differ between jobs. Moreover, the effect of the chosen mode on departure time choice is unsurprising. Public transport users are more likely to depart between 7:15 and 8:18. Pedestrians and cyclists are less inclined to depart before 6:10 than between 6:10 and 10:45, and they are more inclined to leave home between 10:45 and 13:00. Long travel times lead travellers to choose the period before 6:10 for morning work shifts and the period from 9:30 to 10:45 for afternoon shifts, as shown by the positive coefficients in Tab. 1. In this research, the multinomial logit models were estimated in Nlogit 5.0 (Econometric Software, Inc., NY, USA) [23].

Semi Parametric Model: Cox Proportional Hazards Model
In this section, we briefly introduce the steps of Cox proportional hazards modelling (a semi-parametric survival model). These steps are the selection of covariates, fitting a multivariate model, considering interaction variables and, finally, checking the assumptions of the Cox proportional hazards model.
For the selection of major covariates, a thorough univariate analysis of the association between departure time and all important covariates was conducted. This analysis revealed that gender and private car use had no significant effect on departure time, whereas the other variables did. The major covariates for departure time are the use of public transport, walking or cycling, travel time, the age of the traveller, being a parent, and four groups of travellers' jobs.
Next, various multivariate models were fitted and the best one was selected. This model contains the walking or cycling dummy variable, the travel time variable, the age of the traveller, the parent dummy variable, and two groups of travellers' jobs. In the next step, the effect of various interaction variables on the model was investigated, and the product of travel time with two job groups was entered into the model. The final model is presented in Tab. 2. Once a suitable set of covariates had been identified, we checked each covariate to ensure that the proportional hazards assumption is valid. As in all regression analyses, some measure analogous to $R^2$ is needed. Schemper and Stare show that there is no single, simple measure for assessing the goodness of fit of a proportional hazards regression model [24]. Hosmer & Lemeshow recommend the following as a summary statistic for goodness of fit [19]:

$$R^2_M = 1 - \exp\left[\frac{2}{n}\left(L_0 - L_M\right)\right]$$

where $L_0$ is the log partial likelihood for the intercept-only model, $L_M$ is the log partial likelihood for the fitted model, and n is the number of cases included in our analysis.
The positive coefficient of the "parent" variable shows that parents depart earlier in order to accompany their children to school or day-care. Likewise, the positive coefficient of "travel time" shows that a longer travel time increases the likelihood of an earlier departure, owing to the importance of on-time arrival for work trips. Applying the Cox method to the 30% sample gave results that approximately matched those from the 70% sample. The model predictions differed from the observed data, whereas applying the Kaplan-Meier method to the data gave a more accurate prediction. All variables used in the MNL and duration models are described in Tab. 3.

Tab. 3. Variables used in the MNL and duration models
(variable name missing in source): 1 if the trip uses public transport, 0 otherwise.
AC_MODE: 1 if the traveller walks or cycles on the trip, 0 otherwise.
AGE18P: 1 if the traveller is older than 18, 0 otherwise.
JOB_G1: 1 if the traveller is a workman or in the army, 0 otherwise.
JOB_G2: 1 if the traveller is a teacher, professor, physician or nurse, 0 otherwise.
JOB_G3: 1 if the traveller is a salesperson, 0 otherwise.
Parent: 1 if the traveller is a parent, 0 otherwise.
TRAVEL_T: the travel time of the trip.
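The goodness-of-fit summary attributed above to Hosmer & Lemeshow can be computed directly from the two log partial likelihoods. The sketch below assumes the statistic takes the form 1 − exp[(2/n)(L_0 − L_M)]; the function name and the numerical inputs are hypothetical and are not values from Tab. 2.

```python
import math

def r_squared_ph(loglik_null, loglik_model, n):
    """Summary fit statistic for a proportional hazards model.

    loglik_null:  log partial likelihood of the model without covariates (L_0).
    loglik_model: log partial likelihood of the fitted model (L_M).
    n:            number of cases in the analysis.
    """
    return 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))

# Hypothetical log partial likelihoods for a sample of 50 cases.
fit = r_squared_ph(loglik_null=-100.0, loglik_model=-90.0, n=50)
```

The statistic is 0 when the covariates do not improve the log partial likelihood and grows towards 1 as the improvement per case increases.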

Comparison of Cluster-based MNL Model and Duration Model
In order to find the appropriate model, we compared the predictive power of the cluster-based MNL model and the duration model. For this purpose, we used the 30% of the data that had not been used in model calibration. The cluster-based MNL model predicts the probability of departure in each of 6 intervals, whereas the duration model forecasts departure time continuously. To make the two models comparable, we integrated the duration model results over the 6 intervals predefined in the cluster-based MNL model. For the MNL model, the probability of each interval is predicted for each person from the estimated model, and the selected interval is then reproduced using Monte Carlo simulation. In the next step, the share of travellers in each interval was determined. For the estimated Cox proportional hazards model, the probability of departure in each interval was likewise estimated. The non-parametric Kaplan-Meier analysis was also carried out for the predefined intervals. The results are shown in Fig. 2, which presents the probability of departure in each interval. The blue column represents the observed data (the 30% of the data not used in calibration), which the models have to reproduce. It can be seen that the Cox model underestimates the intervals between 6:15 and 9:30 and overestimates the other intervals. Fig. 3 shows the prediction errors of the models in a radar diagram. As can be seen, the Cox model deviates more from the observed values than the other models.
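The two comparison steps described above can be sketched as follows: a continuous survival function is converted into interval probabilities by differencing it at the interval edges, and individual interval probabilities are converted into aggregate shares by drawing one interval per traveller. The function names, the linear survival function and the seed are hypothetical choices for illustration and do not reproduce the authors' implementation.

```python
import random

def interval_probabilities(survival, cut_points):
    """Probability of departing in each interval from a survival function.

    survival:   callable S(t) = Pr(T > t), non-increasing in t.
    cut_points: ascending interval edges [t_0, t_1, ..., t_K].
    """
    probs = [survival(a) - survival(b)
             for a, b in zip(cut_points, cut_points[1:])]
    total = sum(probs)
    # Renormalise so the K intervals are exhaustive within the study window.
    return [p / total for p in probs]

def simulate_shares(prob_rows, seed=0):
    """Draw one interval per traveller and return the simulated shares."""
    rng = random.Random(seed)
    k = len(prob_rows[0])
    counts = [0] * k
    for probs in prob_rows:
        choice = rng.choices(range(k), weights=probs)[0]
        counts[choice] += 1
    return [c / len(prob_rows) for c in counts]

# Hypothetical survival function that falls linearly from 1 to 0 over 10 hours.
interval_probs = interval_probabilities(lambda t: 1.0 - t / 10.0, [0, 2, 5, 10])
shares = simulate_shares([interval_probs] * 1000)
```

With a large sample, the simulated shares approach the average of the individual probabilities, so the simulation mainly matters when individual choices are needed, for example as input to an assignment model.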

CONCLUSION
The results of this study show that, in the case study of Mashhad, the cluster-based MNL approach predicts departure time for work trips more accurately than the duration model. In addition to its predictive power, the sensitivity of this model to the socioeconomic characteristics of travellers makes it more reliable. Additionally, our use of six time periods is more disaggregate than a simple peak versus off-peak breakdown and therefore better serves the purposes of evaluating congestion management policies and air quality modelling.
The continuous modelling of departure time removes the need for discretization, which is unavoidable in MNL, and it also provides fine temporal resolution. Nevertheless, this investigation shows that if time is partitioned properly, the predictive power of the MNL model can exceed that of the duration model.
We also note one limitation: travellers' rounding of departure times to 5- or 15-minute marks in the survey partially violates the continuous time assumption of duration models.
As expected, the estimated coefficients indicate that the traveller's job has a significant effect on departure time choice. This is to be expected because start times and activity durations differ between jobs. Moreover, the effect of the chosen mode on departure time choice is unsurprising. Public transport users are more likely to depart between 7:15 and 8:18. Pedestrians and cyclists are less inclined to depart before 6:10 than between 6:10 and 10:45, and they are more inclined to leave home between 10:45 and 13:00. Long travel times lead travellers to choose the period before 6:10 for morning work shifts and the period from 9:30 to 10:45 for afternoon shifts, as shown by the positive coefficients in Tab. 1. The positive coefficient of the "parent" variable shows that parents depart earlier in order to accompany their children to school or day-care. Likewise, the positive coefficient of "travel time" shows that a longer travel time increases the likelihood of an earlier departure, owing to the importance of on-time arrival for work trips. The Cox model underestimates the intervals between 6:15 and 9:30 and overestimates the other intervals.