CORRECTING AND COMPLEMENTING FREEWAY TRAFFIC ACCIDENT DATA USING MAHALANOBIS DISTANCE BASED OUTLIER DETECTION

Subject review

A huge amount of traffic data is archived and could be used in data mining, especially supervised learning. However, it is not fully used because accurate accident information (labels) is lacking. In this study, we improve a Mahalanobis distance based algorithm so that it can handle differential data, estimate flow fluctuations and detect accidents, and we use it to support correcting and complementing accident information. The outlier detection algorithm provides accurate suggestions for accident occurring time, duration and direction. We also develop a system with an interactive user interface to realize this procedure. There are three contributions for data handling. Firstly, we propose to use multi-metric traffic data instead of a single metric for traffic outlier detection. Secondly, we present a practical method to organise traffic data and to evaluate the organisation for Mahalanobis distance. Thirdly, we describe a general method to modify Mahalanobis distance algorithms to be updatable.


Introduction
Nowadays, increasing road traffic is causing more accidents and is gaining more attention from authorities. Therefore, a vast number of traffic monitoring devices have been installed to collect traffic data. As a consequence, a huge amount of traffic data has been archived, sometimes together with related information such as accident records [1]. However, patterns and relationships between the traffic data and accident records are invisible. To discover hidden information, the stored data can be investigated using data mining techniques [2,3], especially supervised learning [4,5], for analysis and prediction. To do this, we need to know which traffic data is related to a given accident record, i.e. labelled data is required. Nevertheless, accident records are neither accurate nor complete. In many authorities' databases, the accident occurring time is estimated by witnesses or drivers, and the accident duration is often missing. What is worse, the direction of the road where the accident happened is also missing. Those problems make it impossible to know exactly which archived traffic data is related. Messy accident information is hard to use directly to label traffic data in supervised learning [6]; even semi-supervised learning needs some initial labels [7]. Thus, it is necessary to correct and complement accident records.
Manual correction and complementation of accident records requires repeated actions and consumes a lot of time [8]. Besides, it is hard to make decisions regarding accident occurring time, duration and direction from millions of raw traffic values without any external help. To regain the accident related information, in this work we developed a system to help correct and complement accident records. The system calculates accident occurring time, duration and direction using the accident detection technique that we propose.
Accident detection techniques can be separated into two categories [9]. The first category provides "recognition" of accidents if the monitored traffic situation is similar to previous accident situations. The second category discovers observations that are significantly different from typical values; this procedure is called outlier detection [10].
The first category includes conventional methods such as McMaster [11] as well as novel machine learning based classification methods [12]. The conventional methods usually require physical location characteristics, like the shape of the roads, or multi-device data as part of the input, and machine learning based classification requires labelled data.
The second category, outlier detection, compares one observation with the normal situation or with a prediction. There are two types of outliers, global outliers and local outliers [6]. Global outliers are considered as outliers regardless of the concept, whereas local outliers are concept-related. For example, a temperature of 40 °C is normal in India, but an outlier value in Sweden. Much research has provided methods to detect outliers while assuming that outliers are global for transportation [13]. However, efforts made to adapt local outlier detection for transportation are insufficient.
Further, existing research prefers to use a single metric and its threshold to detect outliers. For example, [9,14] compare the monitored flow rate with its threshold. In [15] researchers use speed for comparison and detection, and [16-18] use density (occupancy). As there are fundamental differences among the characteristics of different roads, procedures considering only a single metric are unsuitable. For example, in [19] flow speed is influenced more than flow rate during breakdown events. In [20] events change flow rate suddenly but maintain flow speed. Some data also shows that during certain events, both flow speed and rate can change. However, we cannot find existing research which considers this kind of change in data. Though some work has been done with regard to the transit fundamental diagram [20], none considered a differential time-varied fundamental diagram. Gonzalez [15] uses the term "level of change" to describe one kind of change, but this change detection is based only on speed. As for differential calculation, the most popular algorithms are ARIMA-like algorithms. Nevertheless, they can handle only a single variable and the variable needs to be stationary [21,22]. Moreover, their vectorised versions require more assumptions, which are not practical [23].
In this paper, we propose to use Mahalanobis distance [24] (M-distance) based analysis to detect accidents. M-distance is a general distance used in multivariate analysis and has been widely used for detecting outliers [25-27]. However, few works have used it for detecting traffic accidents. One possible reason might be that few researchers consider more than one metric together, so M-distance is not necessary. Even if multivariate data is considered, another problem is that M-distance is only suitable for multivariate data that clusters like filled ellipses, i.e., data that is normally distributed [24,28,29]. However, traffic data in the traditional speed-flow fundamental diagram [30] cannot hold this assumption, as shown in Fig. 1.
The outline of the paper is as follows. In the second section we propose the method to organise traffic data so that it is suitable for M-distance. The third section describes the methodology that detects accidents and complements accident information based on M-distance analysis. The experimental results using real world data are presented in the fourth section. In the fifth section we present the implemented system, followed by conclusions in the last section.

Data pre-processing
In this section we propose a general method that can pre-process traffic data to be suitable for the main methodology in the third section.

Metrics selection
Flow rate, speed and density are three fundamental metrics in traffic engineering [30]. It is possible to calculate one metric given the other two, so two metrics should be considered at the same time. Previous research prefers to consider only one of them within the time domain, usually flow rate or speed. We use both flow rate and speed within the time domain.

Time-separated data organisation
As shown in the density plot of the speed-flow fundamental diagram (Fig. 1), data instances (points) cluster mainly as day time and night time parts with transits between them. It is unsuitable to use M-distance directly in this conventional diagram. One solution is to find a time period and organise data according to time of day. The results are time-separated datasets. The time period should separate different data points into several datasets, and the data in each dataset should cluster as a filled ellipse. Additionally, the separation should keep the differences between neighbouring datasets ignorable (insignificant) to avoid breaking continuous time series data; otherwise the time period should be changed to a smaller value.
Below is a mathematical description of how hypothesis testing [31] can be used to evaluate time-separated traffic data.
$H_0$ (null hypothesis): there is no difference among all datasets. Consequently, its competing hypothesis $H_1$ (alternative hypothesis) is: there are differences among all or some datasets.
In this hypothesis testing, we consider two-dimensional data with flow rate and speed. Both metrics are assumed to be normally distributed within each dataset. We now pick two datasets (two groups of data according to different times of day) and name them Dataset 1 ($D_1$) and Dataset 2 ($D_2$). Thus, we get two distributions for datasets $D_1$ and $D_2$ as:

$$D_1 \sim N_q(\mu_1, \Sigma_1) \tag{1}$$

$$D_2 \sim N_q(\mu_2, \Sigma_2) \tag{2}$$

where $q = 2$ (2 dimensions), $\mu$ is the dataset's centroid and $\Sigma$ is the covariance matrix.

According to the properties of operations on independent multivariate normal distributions, a linear combination of multivariate normally distributed variables is still normally distributed:

$$a D_1 + b D_2 \sim N_q(a\mu_1 + b\mu_2,\; a^2\Sigma_1 + b^2\Sigma_2) \tag{3}$$

Therefore, the difference between two neighbouring datasets:

$$D_{1-2} = D_1 - D_2 \sim N_q(\mu_1 - \mu_2,\; \Sigma_1 + \Sigma_2) \tag{4}$$

is a multivariate normal distribution. Now, the null hypothesis can be written as:

$$H_0: \mu_1 - \mu_2 = 0 \tag{5}$$

i.e.

$$H_0: \mu_{1-2} = 0 \tag{6}$$

which means no difference between the two centroids. Meanwhile, the alternative hypothesis is equivalent to:

$$H_1: \mu_{1-2} \neq 0 \tag{7}$$

For that pair of hypotheses, we can calculate p-values by using hypothesis testing and compare the testing results with the significance level to know if there are significant differences among datasets. When the separation is done, we can proceed to analyse the separated data using the methodology proposed in the next section.
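The text does not name the specific test used for this pair of hypotheses. As a minimal sketch, assuming a large-sample chi-square test on the difference of the two centroids (reasonable for datasets of several thousand instances), the p-value for two datasets of (flow rate, speed) pairs could be computed as follows. The function names are hypothetical, and for two dimensions the chi-square survival function reduces to exp(-x/2).

```python
import math

def centroid_and_cov(points):
    """Return the centroid and 2x2 sample covariance of (flow, speed) pairs."""
    n = len(points)
    mf = sum(p[0] for p in points) / n
    ms = sum(p[1] for p in points) / n
    sff = sum((p[0] - mf) ** 2 for p in points) / (n - 1)
    sss = sum((p[1] - ms) ** 2 for p in points) / (n - 1)
    sfs = sum((p[0] - mf) * (p[1] - ms) for p in points) / (n - 1)
    return (mf, ms), ((sff, sfs), (sfs, sss))

def centroid_difference_p_value(d1, d2):
    """Large-sample p-value for H0: both datasets share one centroid (q = 2).

    Assumption: a chi-square approximation, not necessarily the authors' test.
    """
    (m1, c1), (m2, c2) = centroid_and_cov(d1), centroid_and_cov(d2)
    n1, n2 = len(d1), len(d2)
    # Covariance matrix of the difference between the two sample centroids.
    a = c1[0][0] / n1 + c2[0][0] / n2
    b = c1[0][1] / n1 + c2[0][1] / n2
    d = c1[1][1] / n1 + c2[1][1] / n2
    det = a * d - b * b
    df, ds = m1[0] - m2[0], m1[1] - m2[1]
    # Quadratic form diff^T S^-1 diff, with the 2x2 inverse written out.
    t2 = (d * df * df - 2 * b * df * ds + a * ds * ds) / det
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return math.exp(-t2 / 2)

# Illustrative data only: the second dataset is a shifted copy of the first.
hour_a = [(100 + i % 7, 80 + i % 5) for i in range(70)]
hour_b = [(f + 10, s - 10) for f, s in hour_a]
print(centroid_difference_p_value(hour_a, hour_b) < 0.05)  # True
```

A p-value below the chosen significance level rejects $H_0$ for that pair, which is the per-pair comparison the separation procedure relies on.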

New accident information from outlier detection
In this section we firstly introduce existing works in the first two subsections and then propose our modifications and improvements so that we can correct and complement traffic accident information.

Mahalanobis distance
Euclidean distance [32] is a widely used, traditional distance for outlier detection [33,34]. It is easy to understand and implement, and fast to calculate [35]. However, it cannot represent a concept-related distance, as it does not consider the shape of the distribution (scatter) [36]. To avoid this weakness, M-distance [24] based analysis is used in this work to detect multivariate outliers.
Here is a brief description of M-distance. Suppose $i$ ($1 \le i \le n$) is the instance index and $j$ ($1 \le j \le k$) is the variable index in dataset $X = [x_{ij}]$ that contains $n$ observations of $k$ variables. The covariance between variable $j$ and variable $l$ is:

$$\sigma_{jl} = E[(x_j - \mu_j)(x_l - \mu_l)] \tag{8}$$

where $\mu_j$ and $\mu_l$ are the variables' expected values. Thus the covariance matrix of dataset $X$ can be expressed as a $k \times k$ matrix $\Sigma = [\sigma_{jl}]$.

Finally, M-distance to centroid is the distance between instance $x_i$ and centroid $\mu$:

$$MD(x_i) = \sqrt{(x_i - \mu)^T \Sigma^{-1} (x_i - \mu)}$$

Although M-distance can also be used to measure the distance between any two points, in our work it always stands for "M-distance to centroid".
Compared with conventional thresholds, Fig. 2 shows that M-distance is suitable for multivariate outlier detection and can provide better thresholds by considering the shape of the distribution [37].
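To make the role of the distribution shape concrete, here is a minimal sketch of the M-distance to centroid for two variables (flow rate and speed). The function name and the numbers are illustrative assumptions. The two test points have the same Euclidean distance to the centroid, but the one lying along the direction of larger variance gets the smaller M-distance.

```python
import math

def mahalanobis_to_centroid(x, centroid, cov):
    """M-distance of a 2-D point x to the centroid, given a 2x2 covariance."""
    df, ds = x[0] - centroid[0], x[1] - centroid[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    # (x - mu)^T Sigma^-1 (x - mu), with the 2x2 inverse written out.
    return math.sqrt((d * df * df - 2 * b * df * ds + a * ds * ds) / det)

# Illustrative scatter: variance 4 along the first axis, 1 along the second.
cov = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_to_centroid((2.0, 0.0), (0.0, 0.0), cov))  # 1.0
print(mahalanobis_to_centroid((0.0, 2.0), (0.0, 0.0), cov))  # 2.0
```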

Adaptive threshold
Though M-distance considers the shape of the distribution, it comes with a shortcoming. Because it relies on the covariance, M-distance is sensitive to outliers and extremes [29,38]. To reduce the sensitivity, adaptive reweighted location and scatter estimation [37] (ARW) is used to determine an adaptive threshold $c_n$. The distribution function of $\chi_k^2$ is noted as $G(u)$ and the empirical distribution function as $G_n(u)$, where $n$ is the number of observations. ARW derives the threshold from $p_n$, which describes the difference between the theoretical distribution and the empirical distribution.
A data instance is detected as an outlier if its M-distance is bigger than the adaptive threshold. As estimating the centroid and scatter consumes a vast amount of computing resources, the minimum covariance determinant [39] (MCD) is used to calculate the centroid and scatter.
Outliers can now be detected by comparing the M-distance of time-separated data with the adaptive threshold; those outliers are time-separated outliers.
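The adaptive threshold can be sketched as below for the two-variable case. This is a sketch under stated assumptions, not the implementation used in this work: it follows the commonly described form of ARW, the constant in `p_crit` is the value usually quoted for a small number of variables and should be checked against [37], and the squared M-distances are assumed to come from robust (e.g. MCD) estimates of centroid and scatter. For two variables the chi-square distribution function is 1 - exp(-u/2), so no statistics library is needed.

```python
import math

def arw_threshold(d2_values, alpha=0.025):
    """Adaptive cut-off on squared M-distances for k = 2 variables.

    Returns math.inf when the tail agrees with the chi-square distribution,
    i.e. when no instance should be flagged as an outlier.
    """
    k = 2
    n = len(d2_values)
    delta = -2.0 * math.log(alpha)  # chi-square(2) quantile at 1 - alpha
    p_crit = (0.24 - 0.003 * k) / math.sqrt(n)  # assumed critical value
    ordered = sorted(d2_values)
    # p_n: largest positive gap G(u) - G_n(u) in the tail u >= delta.
    p_n = 0.0
    for rank, u in enumerate(ordered):
        if u >= delta:
            gap = (1.0 - math.exp(-u / 2.0)) - (rank + 0.5) / n
            p_n = max(p_n, gap)
    if p_n < p_crit:
        return math.inf
    # Cut at the empirical (1 - p_n) quantile, but never below delta.
    index = max(n - math.ceil(n * p_n) - 1, 0)
    return max(ordered[index], delta)

# Illustrative input: exact chi-square(2) quantiles plus 20 extreme values.
clean = [-2.0 * math.log(1.0 - (i + 0.5) / 200) for i in range(200)]
print(arw_threshold(clean))                      # inf
print(arw_threshold(clean + [50.0] * 20) < 50)   # True
```

The returned value applies to squared distances; its square root is the threshold for the M-distance itself.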

Differential outlier
Though flow rate and speed are considered together within the time domain, the detection does not yet use a natural property of the time domain, namely its differential characteristic. Thus, we also propose to use differential data to detect outliers. The differential data is obtained from two consecutive instances. That is, the differential data is the result of subtracting one data instance from its neighbour in a consecutive time series.
If we note each instance as $x_i = [f_i, s_i]$, where $f_i$ is the flow rate and $s_i$ is the speed, we can get the differential data as:

$$\Delta x_i = x_i - x_{i-1} = [f_i - f_{i-1},\; s_i - s_{i-1}]$$

For example, if there are originally 100 instances, we have 99 differential instances. Then the differential data is divided by time of day according to the time stamp of instance $x_i$. Each differential dataset has almost five thousand instances. The remaining analysis procedure is the same as for normal outlier detection. Each differential dataset is analysed using M-distance to detect outliers. The detected outliers are time-separated differential outliers.
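A minimal sketch of the differential step and the subsequent split by time of day is given below. The tuple layout `(timestamp, flow, speed)`, the function names and the choice of stamping each difference with the later instance's timestamp are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

def differential(series):
    """Difference consecutive (timestamp, flow, speed) instances.

    Each difference is stamped with the later instance's timestamp (assumed).
    """
    return [
        (t1, f1 - f0, s1 - s0)
        for (_, f0, s0), (t1, f1, s1) in zip(series, series[1:])
    ]

def split_by_hour(instances):
    """Group (timestamp, flow, speed) instances by hour of day."""
    groups = defaultdict(list)
    for ts, flow, speed in instances:
        groups[ts.hour].append((flow, speed))
    return dict(groups)

# Illustrative 5-minute records around an hour boundary.
series = [
    (datetime(2014, 1, 1, 9, 50), 120, 95.0),
    (datetime(2014, 1, 1, 9, 55), 130, 93.0),
    (datetime(2014, 1, 1, 10, 0), 90, 60.0),
]
print(split_by_hour(differential(series)))
# {9: [(10, -2.0)], 10: [(-40, -33.0)]}
```

As in the text, n instances yield n - 1 differential instances, and each hourly differential dataset is then analysed on its own.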

Monthly updatable algorithm
The aforementioned outliers and differential outliers are detected by comparing data instances with thresholds calculated from all archived data. Therefore, the model that instances are compared with does not change over time. This might cause two time-related problems. Firstly, if some weeks or months are special, the algorithm may behave unexpectedly. Secondly, the algorithm cannot reflect the fact that more and more cars are gradually coming onto the road. To solve those problems, one solution is to weight instances in ARW and give more weight to recent instances. This solution requires the whole archived data and increases the calculation cost. Another solution is to use an updatable method.
Updatable algorithms [40] can produce a new result from the latest old result and new data, without the old data, which can dramatically reduce calculation time. Thus, we improved the original algorithm to be updatable, so that it considers recent data instances more than older instances when deciding if a new instance is an outlier. In addition, we weighted the influence of old data by limiting the number of old instances, so the new result is adjusted dynamically according to the new data.
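Below is a description of the improved algorithm. Consider $X_1 = [f_1, s_1]$ containing $n_1$ old instances and $X_2 = [f_2, s_2]$ containing $n_2$ newly arrived instances, where $f$ and $s$ denote the flow rate and speed columns. We can use the existing covariance matrix of $X_1$ and the instances in $X_2$ to detect updatable outliers. If the old covariance matrix is noted as:

$$\Sigma_1 = \begin{bmatrix} \sigma^{(1)}_{ff} & \sigma^{(1)}_{fs} \\ \sigma^{(1)}_{sf} & \sigma^{(1)}_{ss} \end{bmatrix}$$

and the covariance matrix for new data is:

$$\Sigma_2 = \begin{bmatrix} \sigma^{(2)}_{ff} & \sigma^{(2)}_{fs} \\ \sigma^{(2)}_{sf} & \sigma^{(2)}_{ss} \end{bmatrix}$$

then the overall updatable covariance matrix is:

$$\Sigma_0 = \begin{bmatrix} \sigma^{(0)}_{ff} & \sigma^{(0)}_{fs} \\ \sigma^{(0)}_{sf} & \sigma^{(0)}_{ss} \end{bmatrix}$$

In accordance with Eq. (8), with the expected values estimated as averages over the instances, we derived:

$$\sigma^{(0)}_{fs} = \frac{n_1 \sigma^{(1)}_{fs} + n_2 \sigma^{(2)}_{fs}}{n_0} + \frac{n_1 n_2}{n_0^2} \left(\mu^{(1)}_f - \mu^{(2)}_f\right)\left(\mu^{(1)}_s - \mu^{(2)}_s\right)$$

where $n_0 = n_1 + n_2$. The other three items in $\Sigma_0$ can also be calculated using similar equations. $(\mu^{(1)}_f, \mu^{(1)}_s)$ and $(\mu^{(2)}_f, \mu^{(2)}_s)$ are the centroids of old and new data instances respectively. Therefore, the new updatable centroid is calculated as:

$$\left(\mu^{(0)}_f, \mu^{(0)}_s\right) = \left(\frac{n_1 \mu^{(1)}_f + n_2 \mu^{(2)}_f}{n_0},\; \frac{n_1 \mu^{(1)}_s + n_2 \mu^{(2)}_s}{n_0}\right)$$

To take advantage of this new algorithm, time series data can be grouped by a time gap, say a month, and then each group's centroid, scatter and threshold can be calculated separately. Hence, we get an updatable M-distance based algorithm that leads to Updatable Time-Separated Outliers and Updatable Time-Separated Differential Outliers.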
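The update can be sketched as follows, assuming covariances normalised by the number of instances so that merging two summaries reproduces the statistics of the combined data exactly. The function names and the summary layout `(n, centroid, covariance terms)` are hypothetical. The optional cap on the old instance count mirrors the idea of limiting the weight of old data; the default of 360 is only an example value.

```python
def summarise(points):
    """Return (n, centroid, (c_ff, c_fs, c_ss)) for (flow, speed) pairs."""
    n = len(points)
    mf = sum(f for f, _ in points) / n
    ms = sum(s for _, s in points) / n
    cff = sum((f - mf) ** 2 for f, _ in points) / n
    cfs = sum((f - mf) * (s - ms) for f, s in points) / n
    css = sum((s - ms) ** 2 for _, s in points) / n
    return n, (mf, ms), (cff, cfs, css)

def merge(old, new, max_old=360):
    """Combine an old summary with a new one without the old raw data."""
    n1, m1, c1 = old
    n2, m2, c2 = new
    n1 = min(n1, max_old)  # limit the weight given to the old result
    n0 = n1 + n2
    m0 = tuple((n1 * a + n2 * b) / n0 for a, b in zip(m1, m2))
    df, ds = m1[0] - m2[0], m1[1] - m2[1]
    w = n1 * n2 / n0  # weight of the between-centroid correction term
    c0 = (
        (n1 * c1[0] + n2 * c2[0] + w * df * df) / n0,
        (n1 * c1[1] + n2 * c2[1] + w * df * ds) / n0,
        (n1 * c1[2] + n2 * c2[2] + w * ds * ds) / n0,
    )
    return n0, m0, c0

# Illustrative check: merging two summaries matches a full recomputation.
old_points = [(1.0, 2.0), (3.0, 5.0), (4.0, 4.0)]
new_points = [(10.0, 1.0), (12.0, 3.0)]
merged = merge(summarise(old_points), summarise(new_points))
direct = summarise(old_points + new_points)
print(all(abs(a - b) < 1e-9 for a, b in zip(merged[2], direct[2])))  # True
```

When the old count exceeds the cap, the merged result deliberately deviates from the full recomputation and leans towards the new data.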

Accident occurring time, duration and direction
Through the previous calculation, we already have four different types of outliers, and we need to select suitable ones to use according to real world data. For one data instance, the system calculates its M-distance and then divides it by the adaptive threshold. If the quotient is bigger than 1, i.e. the M-distance exceeds the threshold, and its timestamp is near the selected accident record, it gives the accident occurring time. Otherwise, the timestamp of the biggest quotient is used.
The outlier following the occurring time gives the accident ending time, which means the traffic starts to recover from the accident. Sometimes the traffic situation recovers from an accident gradually and the second outlier is not obvious; then the timestamp of the largest quotient within three hours after the accident occurring time is considered as the road clearing time. This is due to the fact that most accident durations are less than three hours [41].
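The selection rule for occurring time and ending time can be sketched as below. The function name, the one-hour search window around the recorded time and the choice of the outlier closest to the recorded time are assumptions for illustration; the three-hour limit follows the text.

```python
from datetime import datetime, timedelta

def suggest_accident_span(quotients, recorded_time,
                          window=timedelta(hours=1),
                          max_duration=timedelta(hours=3)):
    """quotients: time-ordered (timestamp, M-distance / threshold) pairs."""
    near = [(t, q) for t, q in quotients if abs(t - recorded_time) <= window]
    if not near:
        return None, None
    outliers = [(t, q) for t, q in near if q > 1.0]
    if outliers:
        # Outlier closest to the recorded accident time.
        start = min(outliers, key=lambda tq: abs(tq[0] - recorded_time))[0]
    else:
        # No clear outlier: fall back to the biggest quotient.
        start = max(near, key=lambda tq: tq[1])[0]
    after = [(t, q) for t, q in quotients if start < t <= start + max_duration]
    if not after:
        return start, None
    later = [t for t, q in after if q > 1.0]
    # Next outlier, or the largest quotient within the duration limit.
    end = later[0] if later else max(after, key=lambda tq: tq[1])[0]
    return start, end

# Illustrative quotients at 5-minute steps with two clear outliers.
base = datetime(2014, 1, 1, 11, 0)
quotients = [(base + timedelta(minutes=5 * i), 0.3) for i in range(24)]
quotients[4] = (quotients[4][0], 2.5)    # 11:20
quotients[11] = (quotients[11][0], 1.8)  # 11:55
start, end = suggest_accident_span(quotients, datetime(2014, 1, 1, 11, 30))
print(start.time(), end.time())  # 11:20:00 11:55:00
```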
Accident direction is given by the accident indicator, which measures the deviation of accident traffic from non-accident traffic (Fig. 3). The indicator is obtained as follows. The centroids of traffic during accident time, $\mu_{\text{accident}}$, and non-accident time, $\mu_{\text{non.accident}}$, are calculated respectively. The difference between those two centroids, i.e., the differential centroid, is analysed using differential outlier detection. Its M-distance is compared with the adaptive threshold to get the accident indicator. The biggest indicator gives the detection device as well as the direction of the accident.

Proposed steps
The aforementioned procedure is robust to outliers and adaptive to both the degrees of freedom and the number of instances.
Our work uses the procedure to analyse traffic data as shown in Fig. 4. Firstly, the system queries the data from the database. The original time series is processed according to the left side steps, while the differentiated time series is processed according to the right side steps. Secondly, the time series is separated into several datasets according to the data's timestamps. The results can be visualized as two types of diagrams. Thirdly, the time-separated data is analysed. Thereafter, both updatable and non-updatable algorithms are used to detect outliers. Thus, non-differential and differential data lead to four types of outliers. Finally, the system considers those outliers together with the inaccurate or incomplete accident records to produce accurate and complete accident information as suggestions to fix the record.

Experiments and results
In this section, the proposed steps are used to process real world data. Our data comes from devices monitoring a freeway named Kunshi in Southwest China (Fig. 5). The freeway is 70 kilometres long and thirty devices are collecting traffic data. The devices report traffic statistics every 5 minutes. Each reported record contains the total number of cars in 5 minutes and the time averaged speed during the same period. The data is from April 2013 to May 2014, and more than six million records are stored in the database.

Cleaning data
The raw data needs to be cleaned before data pre-processing. Each monitoring device can have one or more cameras, and each camera reports the statistics of one lane independently. This brings two problems. Firstly, one monitoring device generates multiple records according to lanes, and those records should be aggregated. Secondly, the arriving timestamps of reported records from different cameras might differ by several seconds, which makes it harder to find the right records to aggregate. After analysing the raw data, we found that the records from one device have the same timestamp down to the minute level but differ at the second level. We aggregate the records according to timestamps truncated to the minute level to get data instances. Comparing the aggregated results with the raw data shows that this method works correctly.
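The aggregation can be sketched as below. The record layout `(device_id, timestamp, vehicle_count, mean_speed)` and the function name are hypothetical, and combining lane speeds as a count-weighted mean is an assumption, since the text does not state how speeds of different lanes are merged.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_lanes(records):
    """Merge per-lane records of one device that share the same minute.

    records: iterable of (device_id, timestamp, vehicle_count, mean_speed).
    Returns {(device_id, minute): (total_count, count_weighted_speed)}.
    """
    sums = defaultdict(lambda: [0, 0.0])
    for device, ts, count, speed in records:
        # Truncate to the minute so records seconds apart fall together.
        key = (device, ts.replace(second=0, microsecond=0))
        sums[key][0] += count
        sums[key][1] += count * speed
    return {
        key: (count, weighted / count if count else 0.0)
        for key, (count, weighted) in sums.items()
    }

# Illustrative records: two lanes of one device, reported 5 seconds apart.
records = [
    ("dev-01", datetime(2014, 1, 1, 10, 5, 2), 10, 80.0),
    ("dev-01", datetime(2014, 1, 1, 10, 5, 7), 30, 100.0),
]
print(aggregate_lanes(records)[("dev-01", datetime(2014, 1, 1, 10, 5))])
# (40, 95.0)
```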

Hourly data as time-separated data
Among the collected traffic metrics, flow rate and speed are used in our analysis, as proposed in the second section. Cleaned data is plotted in Fig. 6. The road usually carries under-saturated flow, which corresponds to part of the conventional speed-flow fundamental diagram in Fig. 1. The data is then ready for organisation using the method in section 2.2. We found that dividing data into datasets by "hour of day" is suitable for the M-distance based algorithm. Each dataset of one device has about five thousand instances. Fig. 7 shows an example density plot in which the data points from a device during one hour cluster as an ellipse and are suitable for M-distance.
By using inferential statistics and hypothesis testing, we get p-values of differences among all hourly datasets. As shown in Fig. 8, there are significant differences among 30% of those hourly dataset pairs. Thus, the null hypothesis $H_0$ is rejected and $H_1$ is accepted. Therefore, the data needs to be divided according to its hour of day before analysis.
On the other hand, no pairs of neighbouring hourly datasets are significantly different (all dark cells are connected), which means the transits between neighbouring hourly datasets are smooth. Therefore, choosing one hour as the time gap not only separates different data but also keeps the transits smooth.

Different outliers in speed-flow diagram
After data organisation, the M-distance analysis introduced in the third section is used to detect outliers. Below is an example of the analysis result from a dataset that contains data from 10 AM to 11 AM. Fig. 9 is the dot plot of Fig. 7.
After applying adaptive M-distance outlier detection, we calculated outliers in hourly speed-flow fundamental diagrams. Non-hourly outliers and hourly outliers are plotted respectively with their thresholds.
When compared to non-hourly outliers and threshold (in Fig. 10), hourly outliers (in Fig. 11) are more reasonable.

Hourly differential outlier in speed-flow diagram
Using differential calculation, we can get differential datasets. The differential data points of the whole day cluster as a cross shape instead of an ellipse, so it is not possible to use M-distance directly. However, if we plot hourly differential datasets, we can see the expected ellipse (Fig. 12). Applying hourly adaptive M-distance analysis, we finally calculated hourly differential outliers. Compared to non-hourly differential outliers (Fig. 13), hourly differential outliers in the differential speed-flow diagram (Fig. 14) show an improvement in that the threshold ellipse is more reasonable.

Different outliers in time series
We can see outliers in speed-flow diagrams, but these are hard for analysts to use. A time-series plot is necessary for further analysis. Fig. 15 displays the traffic situation during several hours on a holiday. The hourly outliers spread over this period. In contrast, hourly differential detection is more stable and the accident is detected correctly. In our system, a length of one month is used in the updatable algorithm to make sure there is enough data to update centroids and thresholds. One important thing to notice is that $n_1$ is limited to a maximum of 360, which is the product of 12 instances per hour and 30 days per month, i.e., the old result has less weight than the new one.
Performance of the updatable algorithm is usually similar to that of the original one, but it performs better when there is a change in monthly traffic data. For example, when the biggest annual festival came earlier (the festival is based on the lunar calendar), the updatable detection algorithm adjusts to this change and stays below the threshold, while the original hourly outlier detection algorithm cannot adapt to this situation and gives many outliers (Fig. 16). To ease visualisation, the displayed outlier threshold is enlarged from 1 to 100, and the quotient of the data's M-distance and threshold is enlarged by the same factor. For the same reason, displayed maximum quotient values are limited to 150. Besides, the updatable differential hourly outlier performs well even during abnormal fluctuation in holidays (Fig. 17).

Interactive user interface
On one hand, R [42] provides a commonly used multiplatform environment [43,44] to perform state-of-the-art data analysis [45] (the R environment is used throughout this paper). On the other hand, the speed-flow diagrams (Fig. 9 to Fig. 14) are hard to understand and use, so it is necessary to have a rich-media user interface to provide illustrative information for analysts. Web-based systems are widely used for data monitoring and visualization due to their excellent display effect and user-friendly interface [46-48]. Shiny [49] is R's web framework, and both of them are available as open source software under the GNU General Public License. Based on the proposed algorithms, we developed an interactive system for analysts using R and Shiny.
As shown in Fig. 18, the main panel has four areas. The first area is used to select which panel to display.
The source area contains three subareas (Fig. 19). In the metric selection subarea, the analyst selects the metrics that should be displayed. Flow rate, speed and the two updatable types of outliers are selected by default; the other two, non-updatable types of outliers are optional. Then the analyst selects one accident that needs to be investigated from the accident selection subarea. According to the selected accident, the system finds the nearest four devices for both directions. In addition, two buttons named "previous accident" and "next accident" can be used to navigate among accidents when it is necessary to go through the accidents one by one. Besides using accidents, the analyst can also manually fill in a time point and a place in the manual input subarea for the system to pick related devices. On the top left of the Metric Selection Subarea, there are several metrics that are ready to be displayed in the plot. As mentioned in the previous chapter, different lanes carry different levels of flow, so an enlarge factor is used to ease the view of the displayed outlier threshold. All the selections are echoed from the server to make sure the actions are right. The figure below shows the Manual Input Subarea. For the "user manual input source" mode, the system requires a rough milestone position and a rough time.

Figure 21 Manual Input Subarea for Data Source Area. The left side is the switch of source "user manual input" or "selected accident". The right side is the manual input of milestone and time.
For the "selected accident source" mode, Fig. 22 shows the accidents to be selected. The accident information includes the type of accident (archived in different tables), the raw recorded accident time, the milestone position and a brief accident description (which is blurred for privacy). Given the time and four devices from the previous step, related data is analysed by the system and the results are displayed in the plot area (Fig. 23). The data from two devices in the upstream direction is shown in the top subarea; one device is ahead of the accident point and the other behind. The data from downstream devices is displayed in the bottom subarea. The selected accident is displayed in the middle to ease the analyst's comparison with the plots. Taking one plot as an example (Fig. 24), the system suggests a new occurring time and duration for the selected accident, shown as the shaded area. The analyst can check metric values by moving and hovering the mouse. Then the analyst can tune the system suggested information and confirm it. Hence, there is an accurate accident occurring time. Two new attributes are accident duration and direction. Based on the four accident indicators, the analyst can choose the biggest one, which means the most obvious detection. Accident information is updated when the analyst confirms the analysis results. When the system prompts the confirmation window, those attributes are stored in addition to the existing ones.
The plot title indicates only device distance and device ID; more detailed information, such as the number of monitored lanes, device type, installation place and surrounding environment, is shown in the related devices area below the plot area. During the above procedure, if the analyst wants to analyse the data more deeply, the "details" panel is useful. There are four areas in this panel (Fig. 25). The top area provides extra metrics such as monitored lane, occupancy, average space and flow rate of different types of vehicles. The analyst can select a specific device and time in the second area. The plot area then visualises a time series which is similar to the plot area in the main panel but with more details, using extra metrics as well as the main panel's basic metrics. The bottom area shows raw traffic statistic records before aggregation, which can be checked when a device malfunction is suspected.

Conclusion and future work
In this research, we propose to use multi-metric data instead of the commonly used single metric data for traffic analysis. We also introduce a general method to pre-process traffic data to be suitable for the multivariate M-distance based algorithm. In this process, we show the importance of differential distance. Then we modify the algorithm to be updatable and describe the methodology to detect accidents and to correct and complement accident data. Finally, based on the proposed algorithm, we develop a system with an illustrative and interactive user interface to visualize different outliers in the time domain and help to fix accident information efficiently.
One issue to be aware of during data organisation is that the hourly datasets may cluster as ovals instead of ellipses. In that situation, evaluation of the ellipse shape and an appropriate transformation, such as a logarithmic or exponential transformation, are recommended. In addition, due to the lack of accurate flow-accident data, we are not able to statistically compare fluctuation estimation and accident detection performance with other distance types and algorithms. The usages and issues of the proposed methodology and procedure will be investigated in future work, for instance traffic analysis and prediction using supervised learning.

Figure 2 Example of different thresholds. A: the rectangle shows the threshold considering each metric separately. B: the circle shows the threshold considering two metrics together. C: the ellipse shows the M-distance threshold, which considers two metrics together as well as the shape of the distribution.

Figure 3 The indicator of traffic direction is calculated from the difference between accident traffic and non-accident traffic.

Figure 4 Main steps of the proposed method. The data is analysed device by device. In-process outcomes (grey coloured) are two types of diagrams and four types of outliers. The final outcome is accurate and complete accident information.

Figure 5 There are 30 monitoring devices in total, circled on the map. Multiple circle icons may be displayed as one when close to each other. The total length of the monitored freeway is 70 km.

Figure 6 Density of speed-flow fundamental diagram for one device's data during the whole day. The distribution cannot hold M-distance assumptions.

Figure 7 Density of hourly speed-flow diagram for one device's data (10 AM to 11 AM). The distribution can hold M-distance assumptions.

Figure 8 30% of pairs (lightest grey cells) are under or equal to 0.02 (values are rounded to 1 or 0), so data should be separated by hour of day before analysis. 35% are under or equal to 0.05. 41% are under or equal to 0.1 (darkest cells), which means transits of hourly data are smooth and suitable.

Figure 9 Before adaptive M-distance outlier detection, one hourly dataset in speed-flow diagram (10 AM to 11 AM).

Figure 10 After using all data (non-hourly data) in adaptive M-distance outlier detection. Non-hourly outliers are marked as triangles in the speed-flow diagram; the quality is unacceptably poor (10 AM to 11 AM).

Figure 11 After using hourly data in adaptive M-distance outlier detection. Hourly outliers are marked as triangles in the speed-flow diagram. Quality is better than for non-hourly outliers (10 AM to 11 AM).

Figure 12 Before adaptive M-distance outlier detection, one hourly differential dataset in differential speed-flow diagram (10 AM to 11 AM).

Figure 13 After using all data (non-hourly data) in adaptive M-distance outlier detection. Non-hourly differential outliers are marked as triangles in differential speed-flow diagrams (10 AM to 11 AM).

Figure 14 After using hourly data in adaptive M-distance outlier detection. Hourly differential outliers are marked as triangles in differential speed-flow diagrams. Quality is slightly improved from non-hourly differential outliers (10 AM to 11 AM).

Figure 15 The hourly differential outlier can detect the accident correctly. It is robust to extremely high traffic during holidays. Detected accident duration is shaded. Outlier threshold is set to 100.

Figure 16 There is no accident in the plotted duration and direction. The updatable hourly outlier adapts to the traffic situation change and is more accurate than the normal hourly outlier. Outlier threshold is set to 100.

Figure 17 Updatable differential hourly outliers can detect the accident, and stay stable after the accident compared with non-updatable ones. Detected accident duration is shaded. Outlier threshold is set to 100.

Figure 18 Main panel. It contains four areas. Details are explained in the following zoomed-in figures.

Figure 19 The Source Area structure. It contains three subareas and the details are shown in the following three figures.

Figure 20 Metric Selection Subarea for Data Source Area.

Figure 22 Accident Selection Subarea for Data Source Area. It is the most important subarea as the selection leads to four related devices. The information shown here includes the accident's type, raw time, position and description, which helps users to identify the correct accident.

Figure 23 Plot Area. It contains three subareas. The selected accident is displayed in the middle, surrounded by related traffic data from four related devices. The Selected Accident Display Subarea shows the same information as the Accident Selection Subarea from the source as a confirmation, and the plot is shown in detail in the next figure.

Figure 24 One device's analysis result plot, part of the plot area. Detected accident duration is shaded (11:20 to 11:55). Top-right values follow the mouse hovering position. Zooming in or out can be done using the zooming bar or direct mouse-drag selection. The four plots are synchronized for the zooming rate, which means zooming one plot will also let the other three plots zoom to the same rate.

Figure 25 Details panel. It contains four areas, including a detailed plot for the selected device and un-aggregated raw traffic statistic records. The plot is simplified from the main panel by removing the accident time span, but with more metrics like lane, occupancy, average space and types of vehicles that are selected from the Extra Metric Selection Area on the top. The Traffic Data Records Area is zoomed in and shown in detail in the next figure.

Figure 26 Traffic Data Records Area. One record contains several fields like timestamp, ID (including numbering of site, device, direction and lane), flow rate, speed and occupancy. More fields, such as statistics for different types of vehicles, can be shown instead of IDs, considering display space.