Segmentation of a Road Vehicle Vibration Signal Using Multiple Comparison Procedures between Paired Samples

: Segmentation of road vehicle vibration (RVV) signals can occur by the need to analyse or synthesise vibrations obtained in passenger cars or on the stowage of vans, trucks. A general and widely used measure to quantify RVV signals is its description via power spectral density (PSD). From a given PSD a Gaussian signal can be generated in a shaker testing laboratory. However, actual RVVs tend to have a non-Gaussian and nonstationary nature, which can be modelled as a composition of different segments, each with a different length and RMS content. For simulation purposes of nonstationary vibration signals, different approaches have been introduced yet each with its unique signal segmentation approaches. The current paper proposes a signal segmentation method implemented in the time-frequency domain in order to find segments within an RVV signal, where each segment has similarity in-between and is dissimilar to neighbouring segments. For this purpose, multiple comparison tests had been utilised between the Short-time Fourier transforms (STFT) applied on given fractions of an RVV. Different countermeasures had been applied against the Type I. error inflation.


INTRODUCTION
A general and widely used measure to quantify road vehicle vibration (RVV) signals is its description via power spectral density (PSD). These can be utilised for packaging testing purposes according to many different international standards, like the International Safe Transit Association (ISTA) or American Society for Testing and Materials (ASTM) [1][2][3].The test signal obtained from PSD profiles via inverse Fourier transform with a uniformly distributed random phase has a stationary and Gaussian nature over time [4]. However, this approach can often be contradictory to the nature of real RVV signals. Many researchers have addressed the limitations of the established (PSD-based) methods [5]. It has been shown for RVV signals that "non-Gaussianity is caused by the nonstationary nature of RVVs" [6] which can be modelled as the composition of "distinct segments each with an arbitrary duration and RMS level, i.e. the vibrations are nonstationary" [4]. Scholars developed different methods for nonstationary simulations, which also encompassed the decomposition of an RVV into stationary segments, thus the necessity of changepoint detections. The introduction of improved methods with the utilised segmentation approaches follows in the next section.
A hypothesis testing-based RVV segmentation method is introduced in the current paper in order to systematically assess the information contained in the time-frequency domain of a given measurement series. The current paper consists of the interpretation of paired sample t-test and the Wilcoxon signed rank test in order to find spectrally similar segments within the waterfall plot of the Discrete Fourier transforms (DFT) calculated per 2 [sec] of an RVV. After the multiple comparison procedures, two different methods will be introduced against Type I. error inflation -which is accountable in multiple hypothesis testing. Different agglomerative diagnostics of the yielded segments will be briefly compared with an emphasis on the post hoc tests.
Vibration-related studies of different subsystems of the pavement-vehicle-package dynamic system may benefit from the current approach. The article discusses previous endeavours from packaging vibration testing, where signal segmentation is a possible approach to find homogenous segments or in other cases, transient events in RVVs. Although simulations are not discussed here, the presented method might be helpful to understand the nonstationary nature of RVVs, before defining non-standard excitation profiles either in shaker testing laboratories or in numerical simulation models.

REVIEW OF NONSTATIONARY SIMULATIONS
Many established vibration testing standards utilise vibration tests based on averaged PSD characteristics, which can describe the average signal power density as a function of frequency, revealing dominant frequency bands and characterise the overall RMS. The high-level acceleration and short-duration events are absent, which give vehicle vibration itself a non-Gaussian distribution [1,2,6,7]. On the other hand, since PSD or DFT describe the signal in frequency domain, they introduce a time-invariant description of a given signal, simulating the vibration as steady-state signal with a Gaussian distribution in the time domain [6].
The simulation of nonstationary vibrations gained increasing emphasis by researches. The Split-spectra decomposition [8,9] as one of the first initiatives decomposes the signal into low and high-amplitude events. Initially, two PSDs had been calculated based on 70% of the RVV with the lowest acceleration levels and the remaining 30% with the highest acceleration levels [6]. Further studies [10,11] used different splitting ratios or even more classes for grouping events. The approach makes it possible to carry out tests on higher acceleration levels for periods. Therefore, the simulated signal -as a whole-is nonstationary and non-Gaussian. However, these subsequent extended periods per se are still stationary Gaussian vibrations [2].
The Random Gaussian Sequence Decomposition method [4,12] produces a nonstationary and non-Gaussian signal for simulation purposes. From a measured PSD generated random Gaussian function is multiplied by a socalled modulation function, where the latter itself is produced from the combination of the distribution of the segments' duration and the RMS value distribution. For the segmentation of RVVs, a cumulative sum -bootstrap algorithm had been applied on the analytical signal, obtained by Hilbert transform in previous studies [6].
Wavelet-based Gaussian decomposition [1,2] uses continuous wavelet transform (CWT) to decompose vehicle vibration signals into Gaussian components, with an iterative process [6]. First, an equivalent random Gaussian signal is generated from the PSD of a measured RVV. The CWT of both the original RVV and the equivalent Gaussian signal is computed. Segments are considered as Gaussian parts, where the CWT of RVV fits within the maximum and minimum envelopes of the CWT of the equivalent Gaussian signal. These Gaussian parts are extracted from the signal. Residual parts of the first iteration are considered as the new RVV. The iteration begins again with the comparison of CWTs of both the (residual) RVV and its Gaussian equivalent. The number of iterations on the first level has a predetermined upper limit. The last residual segments of the iterations are considered as the non-Gaussian part of the signal, which is approximated by the best-fitting Gaussian process. The second level of iterations is initiated to improve the Gaussian fit of formerly resulted "Gaussian" parts.
Bayesian algorithm [13] has been introduced for detecting changepoints to partition measurement series of international roughness index into homogeneous sections with respect to changes in the level and variance and-or the autocorrelation. The Box-Cox transformation had been applied to the series of measurements, prior to an AMOC algorithm used as a Bayesian detector [6].
Shock extraction method has been introduced in [14], the original signal is decomposed into a series of approximated Gaussian segments and one segment containing shocks. The segmentation is carried out via the moving crest factor (MCF) and one-tenth peak (OTP) value. MCF is often used to index transients, but it is "not always reliable and is often not appropriate for signals that contain strong non-stationarities" [15]. Segments are simulated as stationary Gaussian signals which are concatenated to be the total simulated signal. Fatigue life prediction may also be unrealistic, since suffered damages may vary by the sequence of high-and low-or low-and high-stress fluctuations [16].
Actual transport distributions had been decomposed into the sum of weighted Gaussians with an iterative process in [17], which found that three to four Gaussians were enough for the decomposition of random vibrations. A curve-and peak fitting software based on the Levenberg-Marquardt algorithm had been used for the decomposition, which software enabled the "setting of the height and the full width at half maximum of a Gaussian". Different machine learning classifiers had been studied in [18] designed for shock detection in RVV and assessed via ROC-and PERFO curves. It was found that obtaining the classifier's optimal operation point (OOP) via synthetic dataset was inadequate; thus, the re-calibration of the classifiers had been carried out via measured RVVs. Following the re-calibration, the detection capability of the Decision Tree, 20NN and Bagged Ensemble classifiers were found to be equivalent to random guessing, but the Gaussian SVM obtained an area under the curve (AUC) of 0,94.
The discipline of packaging vibration testing lately turned also towards multi-axial motions, like heave, pitch and roll vibrations as in [19], in which those were studied on different transports "to establish the nature and level of the multi-axial vibrations that exist". It was concluded among others that the relationships between the moving RMS of the above-given degrees of freedom were not strongly correlated but could be characterised statistically as joint distributions.

METHODS
The independency of samples must be ensured before evaluating hypothesis tests in the current method. Therefore the autocorrelation function of the acceleration signal had been examined prior in Fig. 1. The acceleration signal is strongly auto correlated within 2 s, which has the practical consequence that the smallest time resolution of the current segmentation algorithm is limited to 2 s, and shorter segments within STFT cannot be detected. Smaller fluctuations of the ACF-exceeding the confidence intervalare also noticeable in higher lags, which are assumed here non-significant, as if the signal was not auto correlated above lags corresponding to 2 s. The given signal in Fig. 2 represents the waterfall plot of the DFTs of an RVV signal measured on a passenger car's cockpit sampled with 1 kHz during a 600 s long journey in the study of [20]. A window frame equal to 2000 samples used for the calculation of DFTs yielded 300 timeframes.
The time-frequency domain is interpreted on an equidistant grid per axis with 0.5 Hz and 2 s resolution. The following calculations are limited to the [0; 100] Hz bandwidth. The DFT vectors have the same resolution over time; thus, it is interpretable as the paired observations of the same frequency bins over time.
The paired sample t-test and its nonparametric variant, the Wilcoxon signed rank test are utilised here to find the dissimilarities between consecutive DFT distributions. Two vectors of the spectrum lines will be compared on the α 0 = 0.10 preliminary significance level. Afterwards, the resulted series of p-values are compared to differently adjusted significance levels.

Paired Sample t-test
Using a one-sample t-test, statistical inference can be made about the null hypothesis that the data comes from a population with a mean equal to μ. The alternative hypothesis can be formulated as that the population distribution does not have a mean equal to μ. The onesample t-test is a parametric test of the location parameter when the population standard deviation is unknown.
Under the null hypothesis, the test statistic has a Student's t-distribution with n -1 degrees of freedom [21]. The corresponding hypothesis pair for a two-sided test is, as follows: H 0 : data has a mean equal μ, H A : data do not have a mean equal μ.
The observations in each group are paired with another observation from the other group. The distribution of differences did not follow a normal distribution from the total 299 sets based on Anderson-Darling tests on 0.10 significance level. The assumption of normality of the differences is harmed in every case. However, paired t-test is "robust" to the assumption of normality [22]. The paper proceeds with the paired t-test despite the possible effects of the outliers, since the reason for outliers being present remains unknown, on the other hand changes in spectral peaks (on the reason of either frequency -or amplitude modulation) are in the major interest of the current examination. Alongside the nonparametric Wilcoxon signed rank test [23] is also implemented in another MCT, which is more suitable to the testing of non-normal data. The differences between each of the paired observations serve as the input to conduct a one-sample t-test, and the nonparametric Wilcoxon signed rank test. The paired sample t-test assessment of the null hypothesis, that that the data in d = x − y, comes from a normal distribution with mean equal to 0. The paired sample t-test is practically a one-sample t-test of the mean of differences between the paired observations, to test the null hypothesis that the pairwise differences d between data vectors x and y has a mean equal to 0.
H 0 : data in d = x − y have a mean equal to 0, H A : data in d = x -y do not have a mean equal to 0.
The p-value is the probability of the test statistic being at least as extreme as the one observed, given that the null hypothesis is true. The choice of α is somewhat arbitrary, although in practice values of 0.10, 0.05 and 0.01 are commonly used. The significance level will not be considered here as a part of either the acceptance or the rejection areas. It is also examined that no p-value falls on the significance limits in Fig. 9.
After completing STFT with 2 s window lengths resulting in Fig. 2, the current design can be considered as the paired observations of DFT values a i in the 2i-th secundum, in-between 0.5 Hz resolution. Let j denote the j-th difference vector and the j-th hypothesis pair takes the form:

Wilcoxon Signed Rank Test
The nonparametric Wilcoxon signed rank test is utilised here for two populations being paired. The test statistic W is the sum of the ranks of positive differences between the observations in the two samples, j d .The returned   W j p -value corresponds to a paired, two-sided test for the null hypothesis that j d comes from a distribution with a median equal to 0. In case the sample size is large, or the method is approximate, the test function calculates the p-value using the z-statistic [21].

α Adjustment
The preliminary significance limit is set to α 0 = 0.10 in current example. Even though a high number of tests found significant a priori, not all of them may be considered truly significant due to α inflation. This section introduces the Type I. error inflation and the following two sections deal with two different post hoc procedures to control the familywise error rate (FWER).
The Type I. error rate or significance level is the probability of rejecting the null hypothesis, given that it is true [24]. If the null hypothesis is false, then it is impossible to make a Type I. error [25]. The incorrect rejection of the null hypothesis is referred to as a false positive. The probability of correctly "accepting" a true null hypothesis equals S = 1 − α and is called specificity.
The incorrect acceptance of the null hypothesis is called Type II. error [24], which can only occur if the null hypothesis is false, and the probability of committing this error is called β. This second type of error is often called a false negative. The probability of correctly rejecting a false null hypothesis is P = 1 − β and is called power. Types of the possible statistical inferences are followed in Tab. 1. In the simultaneous multiple comparisons of more than two groups, e.g. A, B, C the following j = 1, …, m hypotheses can be formulated, and the set of comparisons is referred to as a family of test. In the current case, m = 299 comparisons per MCT have been made.
Tab. 2 introduces the framework for simultaneous hypotheses testing [26], given at least two hypotheses to be tested.  When more than one hypothesis has been simultaneously tested the probability of committing false statistical inferences increases considerably. Utilising the same Type I. error rate in an increasing number of comparisons will result in an ascending familywise error rate 1 Eq. (2) is also known, as inflation and is visualised in Fig. 3   & ; & A B B C ). Therefore, a consecutive setup conforms best the approach of this paper.

Bonferroni Adjustment
The Bonferroni method is one of the most commonly used methods to control FWER [26]. Based on the Bonferroni inequalities [28], an approach is described for "constructing sets of simultaneous confidence intervals for the means of variables which follow a multivariate normal distribution" [29]. The Bonferroni method is a simple method that allows many comparison statements to be made (or confidence intervals to be constructed) while maintaining an overall confidence coefficient. With an increasing number of hypotheses to be tested the Type I. error inflates, therefore the significance level is divided into the number of hypotheses tests and Type I. error is lowered in this manner. In this single-step adjustment  is the preliminarily chosen significance limit and m denotes the number of comparisons. The more hypotheses to be tested, the more stringent the criterion and the lower the Type I. error per comparison, but the lower the power of the test, as well.

Holm-Bonferroni Adjustment
Based on the Bonferroni method, Holm adjustment  [27]. The procedure similarly to [26] is, as follows.
The comparison stops at the first

RESULTS
Multiple paired t-test and Wilcoxon signed rank test of consecutive DFT vectors i a have been assessed to find similarities in the mean tendencies of neighbouring DFT vectors of the waterfall plot. The i a values have been considered as paired observations of the same frequency bins over time.
The initial segmentation processes showed high fragmentation of the signal, due to the high fluctuation of p-value around α 0 preliminary significance limit, as in Fig.   4     Since the significance limits aim to lower the FWER, i.e. Type I errors, many of a priori t p -and W p -values from different tests were considered non-significant post hoc. Fewer truly significant findings within the same period mean that the ratio of longer segment durations increased.
The segment lengths L and -root mean squares RMS characterise the resulting segments themselves, offering the possibility for the assessment of new aggregated statistics regarding their attributes, presented in Fig. 8. Note that the two different post hoc procedures gave the same solutions within either the t-test or the Wilcoxon signed rank test, which involves that the same hypotheses had been assessed as significant findings by the different α adjustments. However these identical results per post hoc procedures are not necessarily identical, which is further explained in the Discussion and in Fig. 9.
Segments lengths L tend to have still many shorter durations with only a few subseries from more extended periods in both post hoc procedures. The RMS distributions show a not so fragmented picture in total. The testing of the shape of distributions is not investigated here, since the 10 min duration of current measurement is not considered significant here to draw wide-term conclusions about RVVs in general.

DISCUSSION
The identical results of the two different α adjustment processes can be attributed to the same p-values in the hypotheses rejection areas per MCT, which is not a general conclusion, rather than an individual result of the current experiment. Namely, the two adjustments do not have to give necessarily the same solution (per MCT).
The most condensed explanation of this can be comprehended in Fig. 9 (6) where the first subindices t or W denote the paired t-test and Wilcoxon signed rank test respectively; the second subindices B or HB denote the Bonferroni and Holm-Bonferroni post hoc procedures respectively. However, the results of segmentation approaches differentiated by the type of MCT (t-or Wilcoxon-test) do differ, which is explained by the different p-values for the hypothesis between the same neighbouring DFT vectors paired. Note in Fig. 8 how dotted, and solid lines differ at each colour, which is the consequence of the test statistics being different in the t-test and its non-parametric variant the Wilcoxon signed rank test.  Comparison of the ascendingly sorted p-values from the t-tests and the Wilcoxon signed rank tests, denoted by blue and purple lines respectively; displayed around the Bonferroni-and the Holm-Bonferroni adjusted levels, denoted by orange and yellow lines respectively.

CONCLUSIONS
A multiple comparison test (MCT) procedure has been introduced in the current paper in order to find spectrally similar segments within the time-frequency domain of a road vehicle vibration (RVV) signal. The spectral values of neighbouring DFT vectors α i have served as the inputs of paired t-test and its non-parametric variant the Wilcoxon signed rank test. Two different post hoc procedures, namely the Bonferroni and the Holm-Bonferroni adjustments were introduced after both types of test, as countermeasures against the accountable α inflation in case of MCT.
Following studies will investigate the use of different statistical tests for MCT, such as the two-sample t-test and its non-parametric variant the Wilcoxon rank-sum test. However, besides the tests of central tendencies of DFT vectors, their spectral shape can be considered as well. In the latter case, the Kolmogorov-Smirnov type test seems an applicable approach.
The current article introduced and utilised two FWERcontrolling procedures, but the family of FDR controlling methods holds other opportunities in post hoc procedures, as well. Still, the relationship between the number of pvalues and the post hoc procedures must be assessed; and the sensitivity analysis of MCT shall be implemented in case of "small" amplitude and/or frequency modulations within the STFT.
The current article introduced the establishment of a novel segmentation algorithm, whereas the procedures were capable of finding spectrally similar segments within the time-frequency domain of a vibration signal. Similarities of consecutive time-slices of the STFT waterfall plot were assessed based on mean tendencies. The MCT approach to RVV segmentation purposes seems a promising solution, which the author finds worth to study further.