AN ENHANCED METHOD FOR EXTREME LOADS ANALYSIS

The analysis of time records from seakeeping experiments in irregular waves is used to determine the occurrence of extreme events. The common data analysis procedure assumes that the statistics of the record peaks follow a two- or three-parameter Weibull distribution. For particularly severe sea states, however, the peaks may show a multi-modal distribution. In this case, a Weibull distribution, even in its three-parameter form, is not suitable to reproduce the peak population. As a consequence, errors may occur in the estimate of the extreme loads, affecting the vessel/structure design process. To overcome this source of error, it is possible either to use multi-modal distributions or to change the peak extraction technique by adopting a threshold. With this second approach, the data should be fitted with a Generalised Pareto Distribution. Based on this theory, a data analysis procedure including the threshold selection is proposed here and tested on a set of time records from model-scale seakeeping experiments. The results are then compared with the standard Weibull approach.


Introduction
The execution of model-scale tests or numerical simulations is a useful support to the design process of an offshore vessel/structure, as it allows the extreme values of motions and loads in harsh environmental conditions to be estimated and predicted. To properly model the extremes from the sampled data of a time series, it is common practice to analyse only the peaks rather than the entire record. There are basically two ways to extract the peaks from a time series and, depending on the selected method, different distributions apply. Extreme value theory [1] [2] [3] indicates that the Generalised Extreme Value Distribution (GED) must be used when all the peaks are considered, and in particular its Weibull sub-case [4]. Other authors, e.g. [5], suggest that for a severe sea state a general Gamma distribution is also suitable to describe the maxima. If only some of the peaks are considered, selected by means of a threshold, the Generalised Pareto Distribution (GPD) should be used [6].
When severe sea states are analysed or complex structures are tested experimentally, the peaks extracted from the time records of some measurements may show a multi-modal behaviour. In such a case, the standard analysis based on the two- or three-parameter Weibull distribution, as recommended by ITTC [7], cannot fit the population correctly [8]. Here, considering all the maxima, a better approximation of the peak distribution can be obtained with Mixed-Weibull distributions [9]. With this approach, the extreme load values turn out to be much lower than those predicted by the standard analysis.
Since a mixed-distribution approach may prove too complicated for data fitting and extreme value calculation, an alternative is to consider only the peaks above a certain threshold and to perform the extreme value analysis with a GPD. This second option was presented in [10] for a single time series. An enhanced analysis of a set of data records obtained from seakeeping tests in a severe sea state is reported here, together with an overview of the implemented procedure. The final results are then compared with the standard ones [7], showing that the newly implemented procedure estimates substantially lower extreme loads and therefore gives the designer a completely different input from the same records.

Extreme values theory
Extreme value theory is of utmost importance for selecting the best distribution to model the maxima of a variable. For this reason, it must be applied correctly when estimating the extremes in the data analysis of seakeeping tests and numerical simulations. The general theory applies to all fields where data analysis is required, and it indicates which distribution to adopt depending on the approach used to extract the peaks from the time record.
In fact, the main distinction between the distributions available for extreme value determination follows from the alternative ways of extracting the peaks from a time series.

Peaks extraction techniques
The first operation in an extreme value analysis is the extraction of the peaks from the recorded data set. According to standard data analysis procedures, two main kinds of extraction can be made: Block-Maxima and Threshold Value Maxima. The first method takes the maxima over several periods (blocks) within the record or, when the period coincides with the sample time, extracts all the maxima of the record. Alternatively, it is possible to consider all the peaks above a certain threshold value (u); this method is therefore also known as the Peaks Over Threshold (POT) technique.
In common seakeeping and offshore applications, both experimental and numerical, the Block-Maxima method is widely used, while the POT technique is more common in research fields where the tail of a population, rather than the entire population, is of main interest. The differences between the two methods are shown on a time record in Fig. 1.
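As an illustration only, the two extraction techniques can be sketched in Python with NumPy. The sketch assumes that a peak is a sample strictly larger than both of its neighbours, that blocks are non-overlapping and of equal length, and that no declustering of consecutive exceedances is applied in the POT case. The function names are hypothetical and do not describe the processing used for the experiments.

```python
import numpy as np


def local_peak_indices(x):
    """Indices of samples strictly larger than both neighbours (local maxima)."""
    x = np.asarray(x, dtype=float)
    return np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1


def block_maxima(x, block_len):
    """Maximum of each non-overlapping block of block_len samples.

    A trailing incomplete block is discarded. When all maxima of the record
    are used, as described in the text, the local peaks returned by
    local_peak_indices play this role instead.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = x.size // block_len
    return x[: n_blocks * block_len].reshape(n_blocks, block_len).max(axis=1)


def peaks_over_threshold(x, u):
    """Local peaks whose magnitude exceeds the threshold u (no declustering)."""
    x = np.asarray(x, dtype=float)
    peaks = x[local_peak_indices(x)]
    return peaks[peaks > u]


# Usage on a short synthetic record
record = [0.0, 3.0, 1.0, 5.0, 2.0, 4.0, 0.0]
print(local_peak_indices(record))         # [1 3 5]
print(block_maxima(record, 3))            # [3. 5.]
print(peaks_over_threshold(record, 3.5))  # [5. 4.]
```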
The method selected for peak extraction affects the way the entire analysis has to be carried out since, according to extreme value theory, different distributions should be selected depending on the extraction method.

Generalised Extreme Value Distribution (GED)
If the Block-Maxima method is selected for peak extraction, the limit law for the distribution of the maxima is given by the Fisher-Tippett-Gnedenko theorem [1], which states that the GED should be used to describe the phenomenon.
Briefly, the GED can be expressed with the following cumulative distribution function:

$$F(x) = \exp\left[-t(x)\right] \tag{1}$$

where:

$$t(x) = \begin{cases} \left(1 + \beta z\right)^{-1/\beta}, & \beta \neq 0 \\ \exp(-z), & \beta = 0 \end{cases} \tag{2}$$

and:

$$z = \frac{x - \gamma}{\eta} \tag{3}$$

The three real constants appearing in equations (2) and (3) are the shape parameter β, defined in (0,+∞), the scale parameter η, defined in (0,+∞), and the location parameter γ, defined in (-∞,+∞). Depending on the value of the shape parameter, three particular sub-cases of the GED are obtained: the Weibull, the Gumbel and the Fréchet distribution. The Gumbel distribution, with shape parameter equal to zero (β=0), is used when the data are assumed to follow an exponential distribution. The Fréchet distribution, obtained with a positive shape parameter (β>0) but with a reversed formulation of the distribution (changed sign of the x axis), is used for populations with a significant amount of data in the tail (so-called fat-tailed distributions); in fact, this distribution decays more slowly than the others as the variable increases. The Weibull distribution, with a positive shape parameter (β>0), represents all the cases not covered by the previous two and is therefore widely used in several engineering problems such as defect data analysis, weather forecasting and, as already mentioned, the prediction of extreme loads in seakeeping and offshore experiments [7].
Considering equation (2) as the general law for the GED, Fig. 2 compares the three GED sub-cases, showing the Weibull distribution in its standard formulation. Adopting equation (2), the Weibull distribution would result in its reversed form; here, the Weibull distribution is always used in its standard form.
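A direct transcription of equations (1)-(3) may help when checking fitted parameters. The sketch below is a hypothetical helper (ged_cdf) that assumes the standard sign convention of equation (2); outside the support, where 1 + βz ≤ 0, it returns 0 or 1 depending on the sign of β. When comparing with library code, note that scipy.stats.genextreme uses a shape parameter of opposite sign (c = -β).

```python
import numpy as np


def ged_cdf(x, beta, eta, gamma):
    """GED cumulative distribution function, equations (1)-(3).

    Assumes the standard sign convention t = (1 + beta*z)**(-1/beta).
    """
    z = (np.asarray(x, dtype=float) - gamma) / eta    # equation (3)
    if beta == 0.0:
        t = np.exp(-z)                                # Gumbel sub-case of (2)
    else:
        s = 1.0 + beta * z
        # Outside the support (s <= 0): F = 0 below the lower end point when
        # beta > 0, F = 1 above the upper end point when beta < 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(s > 0, s ** (-1.0 / beta), np.inf if beta > 0 else 0.0)
    return np.exp(-t)                                 # equation (1)
```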

Generalised Pareto Distribution (GPD)
By changing the peak extraction technique and applying the POT method, the aim of the analysis becomes the estimation of the distribution function F_u of the values x above a certain threshold u, i.e. F_u(y) = P(X - u ≤ y | X > u). F_u is called the conditional excess distribution function and, by applying the Pickands-Balkema-de Haan theorem [11] [12], a suitable form for it is identified in the GPD.
According to the above theorem, the GPD can then be represented by the following cumulative distribution function:

$$F(x) = \begin{cases} 1 - \left(1 + \beta z\right)^{-1/\beta}, & \beta \neq 0 \\ 1 - \exp(-z), & \beta = 0 \end{cases} \tag{4}$$

The shape, scale and location parameters and z are defined as per equations (2) and (3), but with the following limitations: β is defined in (-∞,+∞), η in (0,+∞) and γ in (-∞,+∞). It should be noted that the location parameter γ in equation (4) can also be identified with the threshold u.
The GPD also typically shows three different behaviours, depending on the value of the shape parameter: an unbounded heavy tail for β > 0, an exponential tail for β = 0 and a tail bounded above for β < 0. An overview of the distribution is given in Fig. 3.
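Equation (4) can be coded in the same way. In this minimal sketch (the function gpd_cdf is hypothetical) the location γ plays the role of the threshold u, the CDF is zero below γ and, for β < 0, it reaches one at the upper end point γ - η/β, which is the bounded case of the shape parameter.

```python
import numpy as np


def gpd_cdf(x, beta, eta, gamma):
    """GPD cumulative distribution function, equation (4).

    Support: x > gamma for beta >= 0, gamma < x < gamma - eta/beta for beta < 0.
    """
    # Clamping z at 0 gives F = 0 for every x at or below the location gamma.
    z = np.maximum((np.asarray(x, dtype=float) - gamma) / eta, 0.0)
    if beta == 0.0:
        return 1.0 - np.exp(-z)                # exponential limit of (4)
    s = 1.0 + beta * z
    # For beta < 0, s <= 0 means x is at or beyond the upper end point, where F = 1.
    with np.errstate(divide="ignore", invalid="ignore"):
        tail = np.where(s > 0, s ** (-1.0 / beta), 0.0)
    return 1.0 - tail
```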

Weibull analysis
In common engineering problems related to defect data analysis, the Weibull distribution is widely used to predict the time to failure of a component. The same distribution is also successfully applied in seakeeping and offshore experiments to predict the extreme values of a variable from the peaks extracted from a sample record. The distribution describes the probability of occurrence of those peaks and, consequently, allows the extreme values of the selected variable to be extrapolated. This procedure is typical of long-term predictions of extreme loads, forces or wave heights [13].
As per ITTC recommendations [7], all the maxima of the selected record should be extracted with the Block-Maxima method, using the sample time as the time interval. Then, in accordance with extreme value theory and the Fisher-Tippett-Gnedenko theorem, the Weibull distribution is suitable to describe the population of the selected peaks.
The Weibull distribution can be expressed as a function of two or three parameters. For standard cases, however, the two-parameter formulation is usually adopted, discarding the location parameter γ. As a particular case of the GED, the formulation of equation (1) can be used; however, it is common practice to change the sign of the x axis.
In such a case, the two possible formulations of the cumulative distribution function of the Weibull law are:

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\eta}\right)^{\beta}\right] \tag{5}$$

$$F(x) = 1 - \exp\left(-z^{\beta}\right) \tag{6}$$

where z is defined as per equation (3). Equations (5) and (6) express the two- and three-parameter Weibull distributions respectively, and differ from equation (1) only in the sign of x.
The two-parameter representation of the distribution is usually selected because it greatly simplifies the data analysis. In fact, the graph of function (5) becomes linear on a particular kind of Q-Q plot called the Weibull plot. Taking the logarithm of equation (5) twice gives ln(-ln(1-F(x))) = β ln(x) - β ln(η), so the linearisation is obtained simply by plotting ln(x) on the x axis and ln(-ln(1-F(x))) on the y axis, see Fig. 4; the slope of the resulting line is β. The cumulative distribution function of a three-parameter Weibull distribution is the same as the two-parameter one, except for a shift in the x direction due to the location parameter γ. Since the two-parameter Weibull distribution is defined for x > 0, the three-parameter distribution is defined for x > γ. In the definition of the GED there is no sign restriction on the location parameter, which means that x can also assume negative values.
The three-parameter distribution can also be drawn as a straight line on a Weibull plot if ln(x-γ) is used instead of ln(x) on the abscissa. Otherwise, the three-parameter distribution appears as a convex or concave curve on the standard Weibull plot. Carrying out the linearisation of equation (6) as for equation (5) gives ln(-ln(1-F(x))) = β ln(x-γ) - β ln(η); plotted against ln(x), this curve is concave for γ > 0 and convex for γ < 0, so the sign of γ determines the shape of the distribution on the Weibull plot.
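The Weibull-plot linearisation can be sketched as follows, assuming the plotting position F_i = i/(n+1) for the i-th smallest peak (one common choice; the text does not specify it). A least-squares line through the transformed points returns β as the slope and η from the intercept -β ln η. Passing a location γ gives the abscissa ln(x - γ) used for the three-parameter form. The helper names are hypothetical, and this simple line fit is not the genetic regression used in the study.

```python
import numpy as np


def weibull_plot_coords(peaks, gamma=0.0):
    """Weibull-plot coordinates of a peak sample.

    Abscissa ln(x - gamma) (gamma = 0 for the two-parameter plot),
    ordinate ln(-ln(1 - F)), with the empirical F taken at the
    assumed plotting position i/(n + 1).
    """
    x = np.sort(np.asarray(peaks, dtype=float))
    F = np.arange(1, x.size + 1) / (x.size + 1)
    return np.log(x - gamma), np.log(-np.log(1.0 - F))


def weibull2_from_plot(peaks):
    """Least-squares straight line on the Weibull plot.

    From equation (5): ln(-ln(1 - F)) = beta*ln(x) - beta*ln(eta),
    so slope = beta and intercept = -beta*ln(eta).
    """
    X, Y = weibull_plot_coords(peaks)
    slope, intercept = np.polyfit(X, Y, 1)
    return slope, np.exp(-intercept / slope)
```

Applied to peaks that follow a two-parameter Weibull law, the points fall on a straight line; a systematic bend, or several line segments, is the visual sign of a location shift or of the multi-modal behaviour discussed in this paper.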

Parameters determination
Depending on whether the two- or three-parameter distribution is selected, an appropriate method to determine the regression coefficients must be chosen. Several methods are available for regression analysis, but the difficulty of parameter estimation increases with the number of coefficients to be determined.
The easiest case is the two-parameter Weibull distribution where, as the name suggests, only two parameters need to be estimated. However, as the number of unknowns increases, the standard estimation methods are no longer adequate to solve the problem [9]. In [10], the most common parameter estimation techniques were compared with enhanced techniques based on genetic algorithms, showing that the genetic algorithm approach is comparable with standard procedures such as Maximum Likelihood Estimation, the Method of Moments and Least Squares fitting.
For this reason, the genetic approach has been adopted in this study for all the distributions, i.e. both in the standard Weibull analysis and in the GPD analysis.
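The genetic algorithm used in the study [17] is not detailed here, so the following is only a generic stand-in: a minimal real-coded genetic algorithm in NumPy (binary tournament selection, blend crossover, Gaussian mutation with a shrinking step, elitist survival) that fits the two-parameter Weibull law of equation (5). The objective, the SSE between the model CDF and the empirical CDF at the plotting positions i/(n+1), is an assumption, as are the population size, the number of generations and the parameter bounds.

```python
import numpy as np


def _tournament(rng, pop, cost_values):
    """Binary tournament: for each slot keep the better of two random individuals."""
    a, b = rng.integers(len(pop), size=(2, len(pop)))
    return np.where((cost_values[a] < cost_values[b])[:, None], pop[a], pop[b])


def ga_minimise(cost, bounds, pop_size=60, generations=200, seed=0):
    """Minimal real-coded genetic algorithm (illustrative, not the authors' GA)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, lo.size))
    cost_values = np.array([cost(p) for p in pop])
    for g in range(generations):
        p1 = _tournament(rng, pop, cost_values)
        p2 = _tournament(rng, pop, cost_values)
        w = rng.uniform(size=(pop_size, 1))
        children = w * p1 + (1.0 - w) * p2            # blend crossover
        step = 0.1 * (hi - lo) * 0.98 ** g             # mutation step shrinks each generation
        children = np.clip(children + step * rng.normal(size=children.shape), lo, hi)
        child_costs = np.array([cost(c) for c in children])
        # Elitist (mu + lambda) survival: keep the best pop_size of parents and children.
        merged = np.vstack([pop, children])
        merged_costs = np.concatenate([cost_values, child_costs])
        keep = np.argsort(merged_costs)[:pop_size]
        pop, cost_values = merged[keep], merged_costs[keep]
    return pop[0], cost_values[0]


def fit_weibull2_ga(peaks, seed=0):
    """Fit beta and eta of equation (5) by least squares on the empirical CDF."""
    x = np.sort(np.asarray(peaks, dtype=float))
    f_emp = np.arange(1, x.size + 1) / (x.size + 1)   # plotting positions i/(n+1)

    def sse(params):
        beta, eta = params
        with np.errstate(over="ignore"):
            f_model = 1.0 - np.exp(-(x / eta) ** beta)
        return np.sum((f_model - f_emp) ** 2)

    bounds = [(0.1, 10.0), (1e-3 * x.max(), 2.0 * x.max())]
    (beta, eta), sse_min = ga_minimise(sse, bounds, seed=seed)
    return beta, eta, sse_min
```

The same ga_minimise routine accepts any cost function, so a three-parameter Weibull or a GPD can be fitted by changing only the model CDF and adding the corresponding bounds, which is in the spirit of using one estimator for all distributions.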

GPD analysis using POT
A different approach to extreme value analysis consists in extracting only the peaks whose magnitude exceeds a pre-determined threshold value. With this procedure (POT), the Pickands-Balkema-de Haan theorem implies that a GPD should be used instead of the standard two- or three-parameter Weibull distribution.
From the cumulative distribution function of the GPD given in (4), it can be observed that the distribution is defined for x > γ when β ≥ 0, and for γ < x < γ - η/β when β < 0. It is not possible to state a priori whether β will be positive or negative. From the shape of the function, it seems reasonable to expect a trend similar to the Weibull one, with a positive β.
As for the Weibull distribution, several parameter estimation methods can be found in the literature, and they are basically the same as those mentioned above for the Weibull distribution: methods such as maximum likelihood, moments or least squares fitting are commonly used also for the GPD. For consistency with the Weibull analysis, the same genetic-algorithm procedure has been used here. An additional issue for the GPD is the selection of the threshold u, which is the starting point of the whole procedure.
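Since the method of moments is listed among the usual GPD estimators, a closed-form moments estimate can serve as a quick cross-check of a genetic-algorithm fit. The sketch below relies on the standard relations mean = η/(1-β) and variance = η²/((1-β)²(1-2β)) for the excesses over u (valid for β < 1/2), which invert to β = (1 - m²/s²)/2 and η = m(1 + m²/s²)/2. The function name is hypothetical and this is not the estimator used in the study.

```python
import numpy as np


def gpd_fit_moments(peaks, u):
    """Method-of-moments GPD fit of the excesses over threshold u (requires beta < 1/2)."""
    x = np.asarray(peaks, dtype=float)
    excess = x[x > u] - u                     # POT: keep only peaks above u
    m, s2 = excess.mean(), excess.var(ddof=1)
    ratio = m * m / s2
    beta = 0.5 * (1.0 - ratio)                # shape
    eta = 0.5 * m * (1.0 + ratio)             # scale; the location gamma equals u
    return beta, eta, excess.size             # excess.size is N_u
```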

Threshold selection
Adopting the GPD to fit the peak distribution of a sample record implies that a suitable threshold value must be found, so that the approximation given by the Pickands-Balkema-de Haan theorem is applicable. The threshold selection must also ensure that a sufficient number of events lies above the selected value, so that the unknown distribution parameters can be estimated with sufficient accuracy.
A suitable method for this is based on the sample mean excess function. Although this is a simple procedure, it is currently considered [14] one of the most appropriate for threshold selection. The sample mean excess function is defined by:

$$e_n(u) = \frac{\sum_{i=1}^{n} \left(X_i - u\right) \mathbf{1}\{X_i > u\}}{\sum_{i=1}^{n} \mathbf{1}\{X_i > u\}} \tag{7}$$

where 1{X_i > u} equals 1 when X_i exceeds u and 0 otherwise. The function (7) is therefore obtained as the sum of the excesses above the threshold, X_i - u, divided by the number of data exceeding the threshold itself.
The sample mean excess function is an empirical estimate of the mean excess function, which is defined as:

$$e(u) = E\left[X - u \mid X > u\right] \tag{8}$$

According to this definition, the mean excess function represents the expected value of the excess over the threshold, given that the threshold is exceeded. For signal analysis, the sample mean excess function of equation (7) is adopted and plotted as a function of the threshold value, as in the example of Fig. 5. Several authors [1] [15] interpret the sample mean excess plot by stating that, where the excess function follows a reasonably straight line, the distribution follows a GPD-type law; in fact, for a GPD with β < 1 the mean excess function is linear in u, e(u) = (η + β(u - γ))/(1 - β). Since the signal comes from measured records, especially when they represent non-linear phenomena, a truly straight line cannot be observed in the plot. For this reason, it is common to take as indicative thresholds the points where the excess function changes slope [16], namely u2 and u3 in the figure. In the example of Fig. 5 more than one change in slope can be observed, which means that the above rule cannot determine a single threshold value. For this reason, the last change in slope of the sample mean excess function has been chosen, and u3 has been assumed as the threshold value u for the present data analysis. In this way, the multi-modality has been excluded.
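The sample mean excess function of equation (7) can be evaluated on a grid of candidate thresholds and plotted against u, as in Fig. 5. The sketch below (hypothetical function name, threshold grid supplied by the caller) only computes the curve; locating the changes in slope, and hence u2 and u3, is left to the analyst, as in the text. Points at high u rely on very few exceedances and are therefore noisy.

```python
import numpy as np


def sample_mean_excess(peaks, thresholds):
    """Sample mean excess function e_n(u) of equation (7) on a grid of thresholds.

    Returns NaN where no peak exceeds u.
    """
    x = np.asarray(peaks, dtype=float)
    e = np.full(len(thresholds), np.nan)
    for k, u in enumerate(thresholds):
        excess = x[x > u] - u                 # X_i - u for the exceeding peaks only
        if excess.size:
            e[k] = excess.mean()              # sum of excesses / number of exceedances
    return e
```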

Extreme Values Determination
For every distribution selected for the extreme value analysis, the extreme values can be determined from the fitted law. To do so, the quantiles (inverse cumulative distribution) at the desired probability of occurrence p must be determined.
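To obtain useful design information from the extreme value analysis, the events with probability p of 3%, 1% and 0.1% have to be extracted. To determine these values, the quantiles, i.e. the values x_p exceeded with probability p, are computed in one of the following forms:

$$x_p = \eta\left(-\ln p\right)^{1/\beta} \tag{9}$$

$$x_p = \gamma + \eta\left(-\ln p\right)^{1/\beta} \tag{10}$$

$$x_p = u + \frac{\eta}{\beta}\left[\left(\frac{n}{N_u}\,p\right)^{-\beta} - 1\right] \tag{11}$$

Equations (9), (10) and (11) refer to the two-parameter Weibull, the three-parameter Weibull and the GPD distribution, respectively. In equation (11), n represents the total number of record samples and N_u the number of samples exceeding the selected threshold value u.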
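Equations (9)-(11) translate directly into code. In this sketch p is taken as the probability that a peak exceeds x_p, consistent with the 3%, 1% and 0.1% design levels; the function names are hypothetical and the parameter values in the usage loop are arbitrary. The guard in gpd_quantile reflects a property of equation (11): the estimate only exists for p < N_u/n, i.e. for values above the threshold, which is why a threshold that is too high prevents estimates at 3% and 1%.

```python
import numpy as np


def weibull2_quantile(p, beta, eta):
    """Equation (9): value exceeded with probability p by the two-parameter Weibull."""
    return eta * (-np.log(p)) ** (1.0 / beta)


def weibull3_quantile(p, beta, eta, gamma):
    """Equation (10): as (9), shifted by the location parameter gamma."""
    return gamma + weibull2_quantile(p, beta, eta)


def gpd_quantile(p, beta, eta, u, n, n_u):
    """Equation (11): POT/GPD value exceeded with probability p.

    n_u / n estimates the probability of exceeding u, so p must be below it.
    """
    if not 0.0 < p < n_u / n:
        raise ValueError("p must lie in (0, n_u/n): the quantile must be above u")
    r = n * p / n_u
    if beta == 0.0:
        return u - eta * np.log(r)            # exponential limit of (11)
    return u + eta / beta * (r ** (-beta) - 1.0)


for p in (0.03, 0.01, 0.001):                 # design probabilities used in the text
    print(p, weibull2_quantile(p, 1.5, 2.0), gpd_quantile(p, 0.2, 1.0, 3.0, 1000, 100))
```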

Test cases
To evaluate the differences between the standard extreme value analysis based on Weibull distributions and the procedure based on the POT peak extraction technique, three records of structural loads measured on an offshore vessel have been considered.
The data refer to a fully developed irregular sea, and the measured forces come from a single test. Three forces acting on on-board components were measured in the laboratory. These data were selected because the loads predicted with the standard Weibull procedure were extremely high, and consequently the final structure turned out to be oversized according to the designer's criteria.
The sample records are presented in Figs 6-8; they come from a model test corresponding to 3 hours of full-scale recording and simulating a hypothetical storm condition. After peak extraction with the Block-Maxima method, a multi-modal behaviour can be seen in all three force records. The multi-modal behaviour can be easily recognised in the plots of the peak distributions, where the points represent the actual peaks resulting from the extraction process. To overcome the multi-modality of the sample, it has therefore been decided to test the POT method, so as to cut off the peak sub-populations lying below a certain threshold.

Analysis of the results
Applying the procedure described above with Block-Maxima extraction, the extreme values of the peak populations have been calculated with the 2- and 3-parameter Weibull distributions, using equations (9) and (10) respectively. The values for p = 3.0%, 1.0% and 0.1% have been extracted as usual.
On the other hand, extreme values for the same p have been obtained from the peak population generated with the POT technique, using the GPD and equation (11). For the GPD, the threshold values u have been selected with the sample mean excess procedure, resulting in a different threshold for each force record. The regression procedure, including the parameter estimation with the genetic algorithm [17], leads to the extreme values reported in Tables 1, 2 and 3 for Force 1, 2 and 3 respectively. The same tables also report the coefficient of determination R² of the regression and the sum of square errors (SSE).
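The coefficient of determination has been calculated as:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \tag{12}$$

where:

$$SS_{res} = \sum_{i}\left(y_i - f_i\right)^2, \qquad SS_{tot} = \sum_{i}\left(y_i - \bar{y}\right)^2$$

and where y_i are the record data points, ȳ is the mean of the record and f_i are the fitted values from the genetic regression algorithm. SS_res is the sum of square errors (SSE).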
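The goodness-of-fit measures reported in the tables can be computed as below. This is a minimal sketch with a hypothetical function name; the coordinates in which y_i and f_i are evaluated (for instance probabilities or Weibull-plot ordinates) are not specified here, so the function simply takes the two arrays as given.

```python
import numpy as np


def goodness_of_fit(y, f):
    """SSE and coefficient of determination R^2 of equation (12).

    y: data points, f: fitted values at the same abscissae.
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    ss_res = np.sum((y - f) ** 2)             # SSE
    ss_tot = np.sum((y - y.mean()) ** 2)
    return ss_res, 1.0 - ss_res / ss_tot
```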
It can be noted that, especially for p = 0.1%, the extreme values predicted for Forces 1 and 3 are much higher with the standard analysis (2- and 3-parameter Weibull distributions) than with the GPD. In these cases, the 3-parameter Weibull distribution is more pessimistic than the 2-parameter one. This is essentially due to the bi-modal nature of the two records. In fact, the distributions in Figs 9 and 11 show that both Weibull regressions are strongly influenced by the lower peaks and therefore fit the higher values poorly. The GPD, thanks to the threshold selection, cuts off the population of low peaks and ensures a good fit of the remaining data.
From this analysis it is evident that the lower peaks are responsible for the poor fit of the Weibull distributions. It might therefore seem straightforward to discard them from the regression analysis and to apply the standard Weibull method again to the reduced population. However, this simplification is incorrect: once a threshold is adopted, the Pickands-Balkema-de Haan theorem indicates that the GPD must be used, and the Weibull distribution is therefore not suitable for the fit.
The case of Force 2 is different: the values obtained from the different regressions are comparable, except for the 3-parameter Weibull. In this case, the lower peaks influence the 3-parameter Weibull distribution as for Forces 1 and 3, but the GPD is not able to follow the higher peaks as it does in the other cases.
In fact, Fig. 10 shows that the GPD fit follows the same slope as the 2-parameter Weibull distribution. A closer look at the distribution shows that the higher peaks change slope above p = 1.0%, meaning that a higher threshold value would probably be more suitable in this case.
However, with a threshold higher than the value corresponding to p = 1.0%, no extreme value estimate could be made for p = 3.0% and 1.0%, which are still of interest to designers. This behaviour suggests that, for Force 2, a multi-modal distribution is present also in the upper part of the peak data, which prevents the GPD from fitting the tail of the entire peak distribution accurately. A possible solution to this problem could be an approach similar to the one proposed in [9] for Weibull distributions, but applied to the GPD, leading to a mixed-GPD distribution.

Conclusions
When data from seakeeping experiments in irregular waves, especially in severe sea states, are analysed to find the extreme values of forces or motions, standard techniques may lead to an incorrect modelling of the phenomenon. This is particularly true when the peak distributions show a multi-modal behaviour. In such cases, different techniques should be used to extract the peaks and analyse the results.
A procedure based on the mathematical definition of the POT method has been established, adopting the GPD as the reference distribution for the extreme value calculation. The newly implemented procedure has been applied alongside, and compared with, the standard methods used in the offshore industry on three test cases with evident multi-modal behaviour. To ensure the same uncertainty in the estimation of the regression parameters, the same genetic-algorithm method has been used for all the function fits in this study.
Overall, the POT-based procedure gives, in most cases, a better fit of the data than the standard methods. However, in some cases, such as Force 2, the proposed method does not match the peak distribution well and gives results comparable with the standard procedures, probably because a multi-modal behaviour is present even above the selected threshold value. In any case, when the regression is performed in a region where one sub-population dominates the others, the proposed method matches the peak distribution better than the standard methods, with a strong impact on the prediction of extreme values. Further research is also required to determine whether the multi-modal behaviour of the measured quantities is associated with the effect of second-order forces.

Fig. 1
Fig. 1 Block-Maxima (top) and POT (bottom) peaks extraction from a time series

Fig. 3
Fig. 3 Shape parameter effect on the GPD distribution.

Fig. 4
Fig. 4 Weibull analysis of a time record.

Fig. 5
Fig. 5 Example of sample mean excess plot with different threshold values.

Table 1
Extreme values of Force 1 according to different distributions.

Table 2
Extreme values of Force 2 according to different distributions.

Table 3
Extreme values of Force 3 according to different distributions.