Preliminary Analysis of Quality of Contour Lines Using Smoothing Algorithms

In this paper, several well-known filtering techniques are compared for the purpose of automatic line generalization. The line simplification methods used are a digital first-order low-pass filter, the Savitzky-Golay (SG) filter and the Whittaker filter. Two versions of the line generalization algorithm were tested for generalization from a source scale of 1:25 000 to target scales of 1:100 000 and 1:50 000. GPS data filtering for the target scale of 1:50 000 was also tested. The first version of the algorithm assumes that no control data are available, and the filtering parameter is dictated by the desired accuracy for the target scale. The second version uses control data at the target scale: the optimal value of the filtering parameter is the value for which the difference between the filtered input data and the control data is smallest. The analysis showed that the SG filter generally yielded the best results. The proposed filters can be considered a new solution for automated cartographic line simplification.


INTRODUCTION
Generalization of geodata is a broad term that can be divided into model (database) generalization and cartographic generalization. Model generalization is the derivation of a reduced database from a given one. Cartographic generalization is the process of deriving a graphical product or visualization, either from a database or from another map at a larger scale [1].
A generalization operator defines a particular generalization process at the conceptual level (e.g. simplification) and is implemented by generalization algorithms. Several generalization algorithms may exist for a single generalization operator. Simplification is defined as the act or process of making data simpler or reducing them to their basic elements [2].
The amount of data in machine readable form is increasing rapidly. Thus, cartographers must pay close attention to the simplification manipulations that can be applied to machine readable files. At the same time, they must remember that human subjectivity and hardware limitations have determined the contents of those files.
The topic of this paper is the generalization of topographic data by simplification. The objective is to generalize contour lines using filters and to compare several line filtering techniques for contour line generalization, in order to analyze the effects of scale and of topographic data quality. The generalization method proposed in this paper automates the simplification of line objects, while its adjustable parameters still give the cartographer a degree of control.

THEORETICAL PERCEPTIONS AND EXAMPLES OF SIMPLIFICATION USAGE
Automated line simplification has been examined in numerous studies. Many of them are based on the well-known Douglas-Peucker (DP) algorithm [3], sometimes called the Ramer-Douglas-Peucker algorithm [3][4]. The simplified polyline produced by the DP algorithm may contain self-intersections [5], which can lead to topological problems; hence, some researchers have based their work on upgrading and modifying the DP algorithm [6][7][8].
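For comparison with the filtering approach used later (DP is not part of the proposed method), the following is a minimal Python sketch of the recursive DP procedure. The function names are illustrative, and the sketch measures each vertex's distance to the chord segment rather than to the infinite line through its end points, which is a common variant.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment ab (perpendicular if the foot lies on ab)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def douglas_peucker(points, tolerance):
    """Keep the vertex farthest from the chord if it exceeds tolerance, then recurse."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the split vertex duplicated in both halves
```

Unlike the filters discussed in this paper, DP removes vertices and never moves the ones it keeps.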
The authors of [6] proposed a hybrid line simplification method that consists of two parts: quantitative characterization, and segmentation followed by simplification. Briefly, segmentation is used to detect homogeneous regions, and each segment is then simplified by the algorithm assigned to it. The line simplification algorithms used in [6] were DP, sleeve-fitting and the turning function. For rectangular shapes, the hybrid method preserved shape better than any of the individual algorithms. The hybrid method also produced the lowest positional errors compared with conventional line simplification algorithms.
Pallero [7] presented a variation of DP for individual line simplification. This robust variation of the DP algorithm produces polylines that do not contain any self-intersections. However, the robust DP algorithm, which checks for intersections, is time-consuming (for a tolerance value of 0,2 km). This cost can be avoided by setting a limit of 2000 on the number of intersections to be checked. In general, this method produces fewer vertices than the original DP algorithm.
Ling [8] introduced an improved, topologically consistent strategy for identifying detection points. The improved simplification algorithm adds an inconsistency check and a correction step to DP. The simplified lines produced by the improved DP contained more vertices than those produced by the original DP.
The hexagonal quantization algorithm for vertex clustering in [9] is a scale-specific automated line simplification algorithm. It groups the input vertices by the hexagonal cells they fall into and produces a new set of points. Two variants were implemented: one uses the spatial mean of the vertices in a cell, and the other uses the midpoint of the segment formed by the first and last vertices in the cell. The results imply that the method is suitable for natural linear features such as rivers and coastlines. The spatial mean variant also produced simplified lines more similar to the input line than the midpoint variant.
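The idea can be illustrated with a rough Python sketch. It assumes pointy-top hexagons of circumradius `size` and assumes that each run of consecutive vertices falling in the same cell is replaced by one point (spatial mean or midpoint of the run's first and last vertices). The cell-size rule of [9] is not reproduced, and all names here are hypothetical.

```python
import math

def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon of circumradius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Cube rounding: recompute the component with the largest rounding error.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def summarize(run, method):
    """Collapse a run of vertices in one cell into a single output point."""
    if method == "mean":
        n = len(run)
        return (sum(p[0] for p in run) / n, sum(p[1] for p in run) / n)
    if method == "midpoint":
        (x0, y0), (x1, y1) = run[0], run[-1]
        return ((x0 + x1) / 2, (y0 + y1) / 2)
    raise ValueError("method must be 'mean' or 'midpoint'")

def hex_quantize(points, size, method="mean"):
    """Walk the line and emit one point per run of consecutive vertices in the same cell."""
    out, run = [], [points[0]]
    cell = hex_cell(*points[0], size)
    for p in points[1:]:
        c = hex_cell(*p, size)
        if c == cell:
            run.append(p)
        else:
            out.append(summarize(run, method))
            run, cell = [p], c
    out.append(summarize(run, method))
    return out
```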
The Grid-Gen (GG) [10] algorithm uses a set of control points to simplify the polylines of a map. It builds a uniform indexing grid to accelerate simplification.
Grid-Gen 2 (GG2) [11], an extension of Grid-Gen, is based on the Visvalingam-Whyatt algorithm [12]. In GG2, vertices are ranked by effective area as proposed in [12]: vertices with larger effective areas are more important than vertices with smaller effective areas, so the latter have higher priority for removal. GG2 preserved shape better than GG, but it was about twice as slow as GG in almost all experiments. Both GG algorithms use a set of control points in their simplification process.
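The effective-area ranking borrowed from [12] can be sketched as a threshold-based Visvalingam-Whyatt simplification. This Python sketch is an illustration, not the GG2 implementation: it recomputes all triangle areas on each pass for clarity (a priority queue is the usual optimization), and `min_area` is a hypothetical parameter name.

```python
def triangle_area(a, b, c):
    """Area of the triangle abc (the 'effective area' of vertex b)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(points, min_area):
    """Repeatedly remove the interior vertex with the smallest effective area."""
    pts = list(points)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= min_area:
            break  # every remaining vertex is important enough to keep
        del pts[smallest + 1]  # areas[k] belongs to vertex k + 1
    return pts
```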
Ai [13] proposed an algorithm specifically for coastline simplification that uses Delaunay triangulation. This algorithm preserves the geomorphological characteristics of coastlines.
Researchers have also studied line simplification by smoothing. Examples include smoothing based on snakes [14][15]. Lawford [16] used the Fast Fourier Transform to simplify coastlines.
Lawford also studied the relationship between simplification (level of detail) and several map scales. Mansouryar and Hedayati [17] applied an iterative line-smoothing process classified as an averaging method.

ASSUMPTION AND GENERAL METHODOLOGY
The degree of generalization depends on four basic factors: map purpose and conditions of use, map scale, quality and quantity of data, and graphic limits. In particular, the level of generalization of a map mainly depends on the scale, the complexity of the symbols and the spatial context. Moreover, depending on the particular problem of representing a contour line, specific operations are performed (curvatures, displacement, bend removal, etc.). Typically, the input data are contour lines vectorized from a larger-scale map or derived from field data (collected using GPS, generated from LiDAR data or obtained by some other survey method). The output data are contour lines at the desired scale, and the scale is the main criterion for determining the degree of generalization (Fig. 1).

Figure 1 The concept of the level of cartographic generalization
When contour lines of the same elevation at different scales are compared, it is clear that, because of generalization, the line at the larger scale is much more accurate than the line at the smaller scale. This observation guided our proposed solution for automated generalization. Since the accuracy at a given scale is fixed by convention (i.e. state regulations), the input contour line can be simplified iteratively in small steps until the accuracy for the smaller scale is satisfied. Here, accuracy is interpreted as the difference between the input and simplified lines, because the input line from the larger scale is more accurate than the line at the target, smaller scale. In other words, the contour line is simplified in small steps, and the generalization algorithm stops when the error reaches the value for the target scale. However, the input line is not perfectly accurate; its error is only much smaller than the accuracy at the target scale. There are three options. First, if the source scale is much larger than the target scale, errors in the input data can be ignored. Second, control data can be used, for instance the same line manually generalized for the target scale; comparing the simplified line with the control data determines the degree of generalization. Third, the allowed error at the target scale can be reduced by the accuracy of the source scale (e.g. 20 m - 5 m = 15 m in the example below).
The simplification method used in this algorithm is filtering, namely a digital first-order low-pass (FOLP) filter. It is a well-known method for removing noise from recorded data, and it has several characteristics that make it suitable for generalization. Computationally, it is very simple and does not require much processing power. It has only one adjustable parameter, a, which takes values from 0 to 1. If the parameter is set to 0, no filtering is done; if it is set to 1, the data are completely smoothed and every point is the same as the first one. Finding an appropriate value for a single parameter is easier than tuning several parameters. Finally, the FOLP filter does not remove points; it only changes their coordinates (the rate of change is determined by the parameter a), so a certain amount of information can be preserved.
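The paper does not state the filter equation, so the following Python sketch assumes the standard first-order recursion y_i = a * y_(i-1) + (1 - a) * x_i, applied separately to the X and Y coordinates. Under that assumption it reproduces the limiting behaviour described above: a = 0 leaves the line unchanged and a = 1 collapses every point onto the first one. The function name is hypothetical.

```python
def folp_filter(points, a):
    """Assumed first-order low-pass filter on a polyline given as (x, y) tuples."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    out = [tuple(points[0])]  # the first point is kept as the initial state
    for x, y in points[1:]:
        px, py = out[-1]
        # Blend the previous output with the current input point.
        out.append((a * px + (1 - a) * x, a * py + (1 - a) * y))
    return out
```

Note that the number of points is unchanged; only their coordinates move, which is the property the paragraph above relies on.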
The Savitzky-Golay (SG) filter performs least-squares polynomial smoothing over a moving window [18][19][20]. The SG filter is controlled by two input parameters: the frame size (moving window length, n) and the polynomial degree (p). A lower polynomial degree tends to produce smoother results, while a larger frame size can over-smooth the input data and flatten sharp peaks. The SG filter is practically the same as the moving-average filter when the polynomial degree is 0 (constant) [20].
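To show how n and p act, the following NumPy sketch computes SG smoothing weights directly from a least-squares polynomial fit. It assumes an odd n and p < n, and it handles the first and last n//2 samples by evaluating a single polynomial fitted to the first or last window (the same idea as the `interp` mode of SciPy's `savgol_filter`). It would be applied to the X and Y coordinates separately; the function name is illustrative.

```python
import numpy as np

def savgol_smooth(values, n, p):
    """Savitzky-Golay smoothing with odd frame size n and polynomial degree p < n."""
    x = np.asarray(values, dtype=float)
    if n % 2 == 0 or p >= n or len(x) < n:
        raise ValueError("need odd n, p < n and at least n samples")
    m = n // 2
    # Row 0 of the pseudo-inverse gives the fitted polynomial value at the window centre.
    k = np.arange(-m, m + 1)
    h = np.linalg.pinv(np.vander(k, p + 1, increasing=True))[0]
    out = np.empty_like(x)
    for i in range(m, len(x) - m):
        out[i] = h @ x[i - m : i + m + 1]
    # Edges: fit one polynomial to the first / last n samples and evaluate it there.
    V = np.vander(np.arange(n), p + 1, increasing=True)
    head = np.linalg.lstsq(V, x[:n], rcond=None)[0]
    tail = np.linalg.lstsq(V, x[-n:], rcond=None)[0]
    out[:m] = V[:m] @ head
    out[len(x) - m :] = V[n - m :] @ tail
    return out
```

With p = 0 the interior weights are all 1/n, i.e. the moving average mentioned above, and any polynomial of degree at most p passes through unchanged.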
The Whittaker filter [21] is also widely used in various applications, including the filtering of vegetation indices derived from remote sensing data. It is based on penalized least squares and, like the FOLP filter, has one user-adjustable parameter (w) that controls the smoothing: the higher the value of the filtering parameter, the smoother the result.
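A minimal NumPy sketch of the Whittaker smoother follows, assuming the common formulation that minimizes sum((y - z)^2) + lam * sum((second differences of z)^2), whose solution satisfies (I + lam * D^T D) z = y. How the paper's parameter w maps to lam is not stated, so lam is used directly here as an assumption. A dense solve is used for clarity; a sparse banded solver would be preferable for long lines.

```python
import numpy as np

def whittaker_smooth(values, lam, d=2):
    """Penalized least-squares smoother with a d-th order difference penalty."""
    y = np.asarray(values, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)  # (n - d) x n difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

With d = 2, straight-line segments are left untouched for any lam, and larger lam pulls the result toward a straight line.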
The proposed generalization algorithm was implemented in MATLAB and tested in two versions. The first version is applied when no control data are available. If the scale of the input data is much larger than the target scale, errors in the input data can be ignored and the desired accuracy equals the accuracy prescribed for the target scale. Otherwise, the desired accuracy is calculated as described above. In our example, the input line was at scale 1:25 000 with 5 m accuracy, while the target scale was 1:100 000 with a prescribed accuracy of 20 m. The desired accuracy was therefore 15 m, to compensate for errors in the input data. The first version consists of several steps (Fig. 2). First, after the input data (x) are loaded, the target accuracy (delta) is set. Then the iterative procedure for the selected filtering technique starts. For the FOLP filter, a is set to 0,99 and decreased by 0,01 in each iteration. After each filtering pass, the error (err) is calculated; if it is larger than allowed for the target scale, the filtering parameter a is decreased and another iteration is performed.
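The version 1 loop for the FOLP filter can be sketched as follows. This Python sketch is hypothetical in its details: it assumes the standard FOLP recursion and, for brevity, measures the error as the largest distance between corresponding vertices (possible because FOLP keeps every vertex) instead of the point-to-line distance used in the paper. `desired_accuracy` encodes the rule used in the example above.

```python
import math

def folp_filter(points, a):
    """Assumed FOLP recursion, applied to x and y separately."""
    out = [tuple(points[0])]
    for x, y in points[1:]:
        px, py = out[-1]
        out.append((a * px + (1 - a) * x, a * py + (1 - a) * y))
    return out

def vertex_error(line, filtered):
    """Largest distance between corresponding vertices (simplified error measure)."""
    return max(math.dist(p, q) for p, q in zip(line, filtered))

def desired_accuracy(target_acc, source_acc, ignore_source=False):
    """Allowed error: target-scale accuracy, optionally reduced by the source accuracy."""
    return target_acc if ignore_source else target_acc - source_acc

def generalize_v1(line, delta):
    """Start at a = 0.99 and lower a by 0.01 until the error drops below delta."""
    for k in range(99, -1, -1):  # integer steps avoid floating-point drift
        a = k / 100
        filtered = folp_filter(line, a)
        err = vertex_error(line, filtered)
        if err < delta:
            return a, err, filtered
    raise ValueError("delta must be positive")  # a = 0 always gives zero error

delta = desired_accuracy(20.0, 5.0)  # 1:100 000 target, 1:25 000 source
```

Because stronger smoothing (larger a) gives larger errors, the first a that satisfies delta is also the strongest smoothing allowed.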
The SG filtering procedure (Fig. 2) is more complex because it has two adjustable parameters. The SG iterative procedure must satisfy the following conditions: the polynomial degree (p) must be less than the frame size (n), and the frame size must be an odd number. For our study data, the values of p that gave the smallest error were low (0, 1, 2 and 3), while the values of n were 13 or less. The iterative procedure for the Whittaker filter is very similar to that for the FOLP filter; the only difference is in the values of the filtering parameter (w). The error is calculated as the maximum distance between the input line and the simplified line (xfilt). When the error becomes lower than the allowed accuracy, the algorithm stops.
If control data at the target scale are available, the second version of the algorithm can be applied (Fig. 3). It is more complex than the first version. It is assumed that the control data set is smaller than the input data set; the control data are therefore used to determine the optimal values of the filtering parameters, that is, the values that yield the smallest possible distance between the control data and the corresponding filtered input data. The entire input data set can then be generalized using the determined values. In this version, test input data and control data are loaded. The test data are a subset of the input data for which corresponding control data exist. For the FOLP filter, the parameter a is initially set to 0,99 and decreased by 0,01 in each iteration down to 0. The error is calculated and recorded in each iteration, and the value of a with the minimum error (aopt) is then selected. For the SG filter, p and n are initially set to 0 and 1, respectively, to satisfy the filtering conditions mentioned previously. The polynomial degree at which the process ends is set to 5, because values of p higher than 4 produced larger errors; the frame size is limited in the same way, because values of n higher than 20 increased the error.
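The calibration step of version 2 reduces to an exhaustive search over candidate parameter values. In the Python sketch below, `smooth(line, params)` and `error(filtered, control)` are placeholders for any filter and error measure (for example, the maximum or the mean point-to-line distance; which one the paper minimizes is not stated), and the candidate grids follow the limits given in the text. All names are hypothetical.

```python
def folp_candidates():
    """Values of a tried in version 2: 0.99, 0.98, ..., 0.00."""
    return [k / 100 for k in range(99, -1, -1)]

def sg_candidates(max_p=4, max_n=19):
    """Valid SG pairs (p, n): n odd and p < n, with p <= 4 and n < 20 as in the text."""
    return [(p, n) for n in range(1, max_n + 1, 2) for p in range(max_p + 1) if p < n]

def calibrate_v2(test_line, control_line, candidates, smooth, error):
    """Filter the test subset with every candidate and return the one with the smallest error."""
    errors = {}
    for params in candidates:
        filtered = smooth(test_line, params)
        errors[params] = error(filtered, control_line)
    best = min(errors, key=errors.get)  # ties resolve to the first candidate tried
    return best, errors
```

Once `best` is found on the test subset, the whole input data set would be filtered a single time with that value, as the paragraph above describes.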
The filtering parameter for the Whittaker smoother is initially set to 4 and decreased in small steps of 0,01. The results showed that the optimal value for the Whittaker filter was around 2 and that values higher than 4 did not produce good results.
In both versions of the algorithm, lines are represented as ordered sets of points with X and Y coordinates in the state plane coordinate system. The error is calculated for every point of the processed line as the orthogonal distance to the closest straight-line segment of the second line or the distance to the closest point of the second line, whichever is smaller. The largest point error is taken as the error for that line.
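The error measure described above can be sketched in Python as follows. Projecting a point onto a segment and clamping the projection to the segment's end points returns the orthogonal distance when the foot of the perpendicular lies on the segment and the distance to the nearer end point otherwise; taking the minimum over all segments therefore matches the "whichever is smaller" rule. The function names are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Orthogonal distance to segment ab, or distance to its nearer end point."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def point_to_polyline(p, line):
    """Smallest distance from p to any segment (or the single point) of `line`."""
    if len(line) == 1:
        return math.dist(p, line[0])
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def line_error(processed, reference):
    """Largest point error of `processed` measured against `reference`."""
    return max(point_to_polyline(p, reference) for p in processed)
```

The per-point distances computed inside `line_error` are also the natural input for summary statistics such as RMSE or MAE.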
All of the mentioned filtering techniques smooth the line in different ways; the main controlling factors are the filtering parameter(s), the frame size (SG filter) and the allowed error (or the control data, in version 2).

RESULTS AND DISCUSSION
The implemented algorithms were tested on several examples. Both input and control data were obtained by vectorizing contour lines in the Fruška gora mountain area from the topographic maps of Serbia. The first example involved generalizing vectorized contour lines at scale 1:25 000 for the target scale of 1:100 000. The given accuracy of the input data is 5 m, while the accuracy for the target scale is 20 m. Since control data were available for this example, both versions of the algorithm could be tested. The input line consisted of 1986 points. For testing purposes, several graphical representations of the error were implemented in the code. The root mean square error (RMSE), mean absolute error (MAE), standard deviation (SD) and maximum error (MAX) were calculated to evaluate filtering quality and accuracy.
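The four summary statistics can be computed from a list of per-point errors as in this Python sketch. Whether the SD reported here is the population or the sample standard deviation is not stated; the sketch assumes the population form (`statistics.pstdev`), and the function name is hypothetical.

```python
import math
import statistics

def error_stats(errors):
    """RMSE, MAE, SD and MAX of a list of per-point generalization errors."""
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "SD": statistics.pstdev(errors),  # assumption: population standard deviation
        "MAX": max(abs(e) for e in errors),
    }
```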
In the first version, using the FOLP filter, delta was set to 15 m, and 27 iterations were performed before the error became smaller than delta. The filtering parameter a in the last iteration was 0,72, and the maximum generalization error was 14,68 m. As expected, the error decreased with every iteration (Fig. 4).

Figure 5 Input data and generalization results
Generalization results together with input data are presented in Fig. 5.
When delta was set to 20 m, 22 iterations were performed, a was 0,78, and the maximum error was 19,69 m. The optimal values for the SG filter were 0 for the polynomial degree (moving-average filter) and 9 for the frame size; with these parameters, the SG filter error was 14,51 m. For the Whittaker smoother, the filtering parameter value that produced a generalization error of 14,66 m was 1,7. In most cases the filtered contour lines overlap (Fig. 5): the filters produce very similar results, so it is very hard to distinguish their performance by visual analysis.
The control data for the second version correspond to the contour line at scale 1:100 000. The results show that the optimal value of the FOLP filtering parameter was 0,77 (Fig. 6), with a maximum error of 41,49 m and an MAE of 8,98 m. For the SG filter, the optimal value of p was 0 (constant) and the optimal frame size was 13. The optimal filtering parameter w for the Whittaker filter was 2,2. All statistical measures for the optimal values of the filtering parameters are given in Tab. 1.
Generalization results when using version 2 of the proposed algorithm are presented in Fig. 7.
These control data were used to verify the first-version results for the FOLP filter. With a desired accuracy of 15 m, the maximum difference between the resulting and control contour lines was 42,94 m and the MAE was 9,11 m (Tab. 1), while for a desired accuracy of 20 m the maximum difference was 41,44 m and the MAE was 8,98 m.
The second example involves generalizing contour lines at scale 1:25 000 to the target scale of 1:50 000. The accuracy for the target scale is 10 m; to compensate for errors in the input data, the desired accuracy delta is therefore set to 5 m. The input line in this case consisted of 2347 points. As in the previous example, control data at scale 1:50 000 were available, and the filters were tested with both versions.
In the first version, the FOLP filter required 35 iterations before the error became smaller than 5 m. The generalization error in the last iteration was 4,87 m (a was 0,56). For the SG filter, a polynomial degree of 1 (linear) with a frame size of 5 satisfied the delta threshold, and the generalization error was 4,83 m. The optimal filtering parameter for the Whittaker smoother was 0,83, with an error of 4,99 m.
With control data available (version 2), the optimal value for the FOLP filter was 0,46, with an MAE of 6,20 m. The optimal polynomial degree for the SG filter was 1, with a frame size of 3; that is, little filtering was done with these parameter values. The maximum error was 33,09 m, while the MAE was 6,25 m (Tab. 1). The Whittaker filter, with w equal to 0,44, produced an MAE of 6,22 m and a maximum error of 32,38 m (Fig. 8).
Generalized contour lines when using version 2 are presented in Fig. 9.
The proposed solutions were also tested on noisy GPS data to generate a map at scale 1:50 000. GPS point data (hiking trails) were collected with a hand-held device (Trimble-GeoXT) with an accuracy of 3-5 m. The collected data were intended for thematic map production at scale 1:50 000. According to the state regulations for the production of thematic maps at scale 1:50 000, the allowed error is 10 m. However, because conditions during field data collection were inadequate, the collected data were affected by errors. Control data were also available, so both versions of the proposed algorithms could be tested. Generating the control data from the original noisy GPS data was a manual, time-consuming and tedious job, which can now be avoided by using version 1 with the optimal filtering technique, or version 2 with a subset of control data.
Generalization errors and the graphical results of the generalization are presented in Figs. 10 and 11. Even though the maximum errors were significantly higher than the allowed error, the MAE values were much lower and satisfied the accuracy requirements at scale 1:100 000, as can be seen in Tab. 1.
However, for the target scale of 1:50 000 the MAE values were higher than the allowed error, which implies that the proposed methodology is more suitable for generalizing lines for the target scale of 1:100 000. For both versions of the filtering algorithms, the SG filter gave the best results in terms of RMSE, MAE and MAX at scale 1:100 000. For the target scale of 1:50 000, the SG filter performed better than the other two filters in version 1, while in version 2 the FOLP filter performed slightly better than the Whittaker filter. For the GPS data, the Whittaker filter produced the smallest RMSE and MAE values (versions 1 and 2), while the FOLP filter performed poorly and produced a maximum error significantly higher than those of the other two filtering techniques.
In the first version of the algorithms, the cartographic quality of the filtered data strongly depends on the input data. In the second version, the cartographic quality of the filtered contour lines depends on both the subset of control data and the input data.
Finally, Figs. 12, 13, 14, 15 and 16 show the frequency of the errors by error range for each filtering technique and target scale. As can be seen, most errors fall in the 0-10 m range for all versions and filters. In these figures, W denotes the Whittaker filter.

CONCLUSION
Generalization from one scale to another is based on minor changes that reduce the topographic data. This paper analyzed several solutions for automated line generalization. Contour lines at scale 1:25 000, the basic state scale, were generalized for the target scales of 1:100 000 and 1:50 000. Considering all input data and both versions, the Savitzky-Golay filter yielded the smallest errors. However, the proposed solution did not produce satisfactory results for the target scale of 1:50 000. The algorithms are fully automated; the cartographer's job is to set the filtering parameter(s) and the allowed error (version 1). The accuracy of the generalization in the first case (version 1) is determined by the accuracy of the input data, and in the second case (version 2) by the accuracy of the control and input data. The quality of the input and control data is therefore crucial: any errors in these data will be reflected in the generalization outcome. Development of procedures for automatic generalization of contour lines from source scale 1:50 000 to scale 1:100 000 is still in progress. Future work will focus on comparing several well-known filtering techniques for line simplification with point elimination methods at different scales and with different types of input and control data (coastlines, boundaries, roads, etc.).