Absolute Time Series GNSS Point Positioning-Data Cleaning and Noise Characterization

: T ime series data of GNSS point positioning are considerably used for the purpose of geophysical research. The velocity estimates and their uncertainties deriv e from time series data of GNSS point positioning affected by seasonal signals and the stochastic noise, contained in the series. D ata cleaning of GNSS time series is a prerequisite for the noise characterization and analysing. In this article one point positioning of time series was analysed in four different periods during the five year interval. The noise characteristics were estimated for all periods. By applying Lomb - Scargle algorithm the comparable results were also provided. Lomb - Scargle algorithm used to estimate the spectral strength density of unequal sampled data is a typical tool for this kind of analysis. S pectral indices have been estimated before cleaning data and after removing linear, annual and semi - annual signals and outliers. T he spectral indices estimated from time series data of GNSS point positioning were located in the area of fractional Gaussian noises , and stationary stochastic process was described for the whole research time period.


INTRODUCTION
Time series data of GNSS point positioning through the global GNSS network solutions, have been widely used in quantifying and explaining the earth's surface changes. They are particularly important for geophysical research and provide very significant information about spatiotemporal characteristics of the earth surface [1][2][3][4][5]. Time series data of GNSS point positions are contaminated with random and time correlated noises that hinder the accuracy of velocity estimation and its uncertainties [6]. Identifying types of noise and the influence of noise on GNSS point positions is of great importance in analysing this phenomenon. GNSS data noise can be described as a power-law process [7]. Spectral analysis for time series GNSS positions provides a very powerful instrument for understanding the intrinsic mechanism that affects tectonic movements. Determination of the time series key frequencies and explanation of the system characteristics from the GNSS measurements are the main objectives of the spectral analysis [8]. The earlier methods of power spectrum analysis required evenly spaced data. These types of data are uncommon in earth sciences. The Lomb-Scargle method performs the spectral analysis on unevenly sampled data and is a powerful tool for estimating and testing the significance of weak periodic signals [9]. This algorithm evaluates the time series data at the measured times and does not need any interpolation. This paper aims at studying the offset and seasonal variations affecting the point movement velocities and their uncertainties through the noise characteristics of GNSS signal which describe this geophysical phenomenon.

METHODOLOGY
The used methodology has been divided into three parts: finding and removing outliers from the data, decomposition of time series (estimation of a linear trend, seasonality) and noise spectral analysis.
In this research, time series data were used from one EUREF permanent GNSS network station SRVJ located in Bosnia and Herzegovina, in Sarajevo, Fig. 1  The SRVJ station was integrated in 1999 and to this day has availability of large amount of data, collected for about 20 years, that is significant in the field of geophysical research. Considering the fact that duration of data of time series of GNSS is the key parameter for high-precision (sub mm yr -1 ) studies, and that duration of 8 years or more ensures a low-velocity bias in both horizontal and vertical components, while short series with less than 4.5 years duration cannot be used for high-precision studies, (except in the rare cases when one can be certain that they contain no significant offset). For intermediate durations of (4.5 -8.0 yr), only series with no offset can provide a lowvelocity bias. All others are associated with intermediate horizontal biases (0.6 mm yr -1 ) and a high vertical one (1.3 mm yr -1 ) [6]. We analyzed time series data of SRVJ for four time periods, each lasting about five years, to analyze the amount of data collected, their gaps and amount removed data (outliers). The first period was from 1999 to 2005, the initial period of establishment of the station SRVJ, the second was from 2006 to 2010, the third was from 2011 to 2015, and the fourth from 2016 to 2019, period with advanced GNSS technology. Time series data were downloaded from the Nevada Geodetic Laboratory at http://geodesy.unr.edu/NGLStationPages/stations/SRJV.st a.
To process time series used TSAnalyzer, a GNSS Time Series Analysis Software [10]. Software package TSAnalyzer was written in Python and was developed for preprocessing and analyzing continuous GNSS position time series individually. It also provides Lomb-Scargle spectrum analysis. Since it is based on Python, it is a crossplatform.

Outliers in Time Series of GNSS Coordinates
Site velocities estimated from time series data can be biased if outliers are present. Problematic outliers, not representative of the population, are contrary to the objectives of the analysis, and can seriously distort statistical tests [11].
Systematic error in time series, often ignored or hidden in GPS analysis, is the problem of offsets. However, whether known or unknown, the general effect of its impact is to decrease the accuracy and reliability of the estimated velocities [12]. If the offsets were left they could dominate in the velocity estimation quality. Time series of GNSS coordinates are almost inevitably polluted. Outliers and offsets can be broadly categorized into actual crustal movements such as earthquakes or artificial events. Artificial offsets can be environmental, equipment malfunction and change or human error.
The noise characteristics of the time series GNSS can be biased by many factors, which in turn affect the estimates of parameters in the deterministic model. The stochastic part mostly considered as observational noise can be described by noise models [13]. Most of the stochastic parts of GNSS time series are time correlated. Noise analysis in GNSS time series does not reduce the noise but classifying the noise can identify the source of the noise and characterize them, hence help increase accuracy and precision. Based on the value of α, different stochastic proceses can be described. If it is in the range this is fractional Gaussian motion (stationary stochastic process), for 1 α > this is fractional Brownian motion (nonstationary stochastic process). Special cases of spectral indexes are: white noise (α = 0), flicker noise (α = −1) and random-walk noise of Brownian motion (α = −2). Fig. 2, illustrates the noise spectrum and the associated names given to the integer values. The outliers were eliminated using a robust outlier detection algorithm model Eq. (1) based on the median and interquartile range (IQR) statistics [14]: Used 3σ criterion seems to fail in case of spread GNSS time series, due to the fact that the standard deviation is calculated from the whole data set [15]. In continuation, for detecting the offset in the time series of GNSS coordinates the signal segmentation algorithm was used [16]. The algorithm was originally developed for image processing [17], based on the variational principle and implemented in Software package TSAnalyzer.

Model Parameters
The mathematical model Eq. (2) used in analysing the coordinate components of the GNSS daily time series was given, as [18]: where: y-daily solutions of time series of GNSS coordinates, a-position of the station, b i -linear velocity of the station, c i and d i -annual and semi-annual amplitudes of periodical motions (harmonic components included in model-annual, seasonal and higher frequency time dependent).

Estimating Spectral Indices
Using the methods of spectral analysis in the frequency domain the spectral indices of post fit residuals were estimated. With Lomb-Scargle algorithm the periodogram of postfit residuals was also calculated for each position components [19][20][21][22]. The method has the advantage of evaluating data of time series only at measured epochs including the test of significance. The normalized Lomb-Scargle periodogram P of time series y(ti) for i = 1 to N was estimated with Eq. (3) [23]: where: w-angular frequency (w = 2πf > 0 ), σ-root mean square (RMS) ( ) ( ) 10 10 where P v is the power spectrum of the post fit residuals estimated from Eq. (4) and f s is the sampling frequency with the unit of day −1 . Then, the spectral index α of a power-law process can be estimated in the log-log plot using [6]: ( ) 10 10 x

RESULTS OF NUMERICAL RESEARCH
Within this research the GNSS point positions were analysed from five year time series measurements. For each period, the offsets were estimated and outliers removed (section 2.1). Decomposition of time series was performed (section 2.2) and spectral indices were estimated (section 2.3).The same methodology is used for all four periods of our study.

GNSS Time Series Data Analysis for the First Period
Geophysical time series often feature missing data. We used the Lomb-Scargle method to compute the periodogram of unevenly sampled time series and reconstructed the missing values in a time series from the amplitude and phase information of the dominant frequencies. In the first period of data (from 1999 to 2005) 1378 epochs were included with the percentage gap of 42.3%. Fig. 3 presents the least square fit without seasonal variations and contaminated time series, and velocity estimation by linear regression based on a time series. Key factors controlling the velocity estimates and their uncertainties are: duration of the series, clean time series (offsets, outliers and periodic signals) and the statistical properties (stochastic noise) contained in the time series. If these parameters are not detected, there is an error in the velocity rating of up to several mm yr-1.
After eliminating the outliers, using Eq. (1) and cleaning time series of GNSS coordinates, least square fit with seasonal variations and cleaned time series was shown in Fig. 4. The effects of insufficiently modelled seasonal signals will propagate into the stochastic model and falsify the results of the noise analysis, in addition to velocity estimates and their uncertainties [24].
In Tab   We estimated spectral indices of postfit residuals using the methods Eq. (3). The spectral indices show power spectral densities of contaminated time series and detrended daily observation without seasonal variation (Fig. 5). Fig. 5 shows the stochastic character of the power-low spectral index of the examined stations for the first period of the SRVJ. In general, spectral indices estimated for stations vary between −1.14 and −0.05, meaning we deal with different spectral characters of residuals. It is characteristic that for the contaminated time series of the first period, for the north component, the spatial index corresponds to the flicker noise. The shift of spectral indices toward flicker noise, α = −1, is probably due to the large-scale processes of atmospheric or hydrospheric origin with spatial correlation to some extent.
For cleaned position time series and detrended daily observation including seasonal variation, the spectral indices show power spectral densities (Fig. 6). Spectral indices were estimated as −0.18 ±0.04, −0.68 ±0.04 and −0.07 ±0.04 for the east, north and up directions, respectively, and it converts to the state fractional Gaussian noises and described stationary stochastic process.

GNSS Time Series Data Analysis for the Second Period
In the second period of data (from 2006 to 2010), 803 epochs were included. Time series of GNSS coordinates frequently contain missing data because of malfunction of GNSS receivers, power failure, removal of abnormal results, etc. The time-series of GNSS coordinates for this period SRVJ has a gap of 55.9%. This is the period with the largest percentage gap. Fig. 7 and Fig. 8 present the second period of GNSS time series data analysis.   Due to large percentages of gaps, differences in velocity estimates and their uncertainties are also insignificant. The seasonal signals (annual and semiannual) have a low impact on the determination of the longterm velocity.
The spectral indices show power spectral densities range from -0.11 to -0.69, for both cases and do not show significant differences spectral indices ( Fig. 9 and Fig. 10), and described stationary stochastic process. This effect derives directly from noise model definition in which the noise amplitude follows a linear power law dependency on the frequency [7].

GNSS Time Series Data Analysis for the Third Period
The next our study was, third period of data (from 2011 to 2015), 1095 epochs were included with the percentage gap of 38.1%. This period has fewer gaps than the previous two periods. During the process of gross errors identification 5.8% of data were removed for this period. Fig. 11, Fig. 12 and Tab. 3 present the third period of GNSS time series data analysis.    (Fig. 13  and Fig. 14).

GNSS Time Series Data Analysis for the Fourth Period
Considering the continuous and uneven distribution of daily measurements of SRVJ stations this period of data has the least percentage gap of 7.1%. As the input data, we used the results of observations, collected within the fourth period from 2016 to 2019. In the fourth period of data 1113 epochs were included.
We used the same methodology as in the previous time periods. Fig. 15, Fig. 16   Spectral indices for the last time series that correspond to the period of the new era are shown in Fig. 17 and Fig.  18. This time series belong to the purest time series with the smallest amount of gaps. Differences between spectral indices of contaminated and cleaned time series are minimal. In this period no improvement significantly in the spectral indices of the power-law noise was noticed for the East, North and Up components, respectively. Also, we did not observe a significant average reduction in the accuracy of stations. The future of GNSS positioning by continuous measurements provided by permanent stations, will lead to the installation of stations in many new places on Earth. Each inequality of operation time span in relation to spatio-temporal analysis should be considered as missing value (gap).
where E α , N α and UP α are spectral indices of North, East and UP, respectively. The results show that a minimum ratio of r = 0.16, a maximum ratio of r = 1.04 and a median of r = 0.57. Maximum ratio is for the third period r < 1 and we noticed that the vertical magnitudes spectral indices are same than the east component for third periods. For other periods had a ratio r < 1, which means that the spectral indices for Up are closer to zero and so closer to white noise at the same time than for North and East components.

RESULTS AND DISCUSSION
Classification and quantification of noise components are of great importance for the time series GNSS analysis. Understanding noise components helps to increase accuracy and the precision of data. We analyzed the noise behavior of four five-year time periods, using the Lomb Scargle spectral algorithm.
The research shows that the color noise is present in the time series of GNSS coordinates. For all four periods, the spectral indices for each component represent fractional Gauss noise, which is considered to be stationary. As long as the spectral index increases, the amplitudes of oscillations also increase. This arises from the fact that any power-law process with α < 0 brings a correlation between amplitudes of seasonal terms and velocity [19].
Cleaning of time series of GNSS observables is a key processing task in estimating the site velocities and their uncertainties. The velocity estimates and their uncertainties are summarized in Tab. 1, Tab. 2, Tab. 3 and Tab. 4. Periodic signals are important for five year time series, whereas the stochastic noise has a significant role when clearing the time series, including seasonal variation. We have shown that the presented model improved uncertainty by 27% in horizontal and by 46% in vertical components. The magnitude changes are up to 0.6 mm yr-1 in horizontal and 0.6 mm yr-1 in vertical components, relative to initial speeds.
Spectral indices estimated from the post fit residuals of contaminated and detrended, without seasonal variation time series, ranged between −0.96 and −0.31 for the east, −1.14 and −0.29 for the north, and −0.42 and −0.05 for the upward direction and they are consistent with the value of flicker noise ( 1 α = − ). They point to the dominance of white noise and isolated flicker noise in some components.
Spectral indices estimated from the post fit residual of cleaned and detrended, including seasonal variation time series, ranged between −0.55 and −0.22 for the east, 0.79 and −0.29 for the north, and −0.40 and −0.07 for the upward direction.
Their weighted means for all periods are 1.69 mm for horizontal component and 6.59 mm for height component, after fitting a linear drift, offsets and seasonal variations cleaned time series. A large WRMS value (up to 25.0 mm) for the height component can result from too large geophysical anomalies, which affect changes in GNSS station coordinates and cause inaccuracies in deformation modelling [25]. Spectral indices are smaller in contrast to the spectral indices calculated from postide residues of contaminated time series where detrending is without seasonal variation. They point to the dominance of the white noise, with no isolated flicker noise.