Analysis of a class of adaptive robustified predictors in the presence of noise uncertainty

Original scientific paper

A new class of adaptive robust predictors is considered in the paper. First, an optimal predictor is developed, based on the minimization of a generalized mean square prediction error criterion. Starting from the obtained result, an adaptive robust predictor is synthesized through minimization of a modified criterion in which a suitably chosen non-linear function of the prediction error is introduced instead of the quadratic one. The unknown parameters of the predictor are estimated at each step by applying a recursive algorithm of the stochastic gradient type. The convergence of the proposed adaptive robustified prediction algorithm is established theoretically using the Martingale theory. It is shown that the proposed adaptive robust prediction algorithm converges to the optimal system output prediction. The feasibility of the proposed approach is demonstrated by solving a practical problem of designing a robust version of an adaptive minimum variance controller.


Introduction
Tasks typically related to modern systems theory are control, signal processing (filter design) and prediction. These tasks have been extensively studied in control theory, communication theory, signal processing and statistics [1 ÷ 6, 14, 22]. A wide variety of techniques has been developed for solving problems involving these tasks. The basic requirement of all such techniques is, on the one hand, maximum use of the available a priori information about the properties of the system. This requirement involves the adoption of an appropriate representation of the system (i.e. its mathematical model). On the other hand, an important practical requirement is robustness of the developed procedures, in the sense of insensitivity to departures from the assumptions under which they are derived (unmodelled dynamics, absence of full knowledge of noise statistics, nonstationarity, etc.). Moreover, in practice it is necessary to be concerned with outliers arising for many reasons, such as meter and communication errors, sensor failures, incomplete measurements, errors in mathematical models, etc.
[7, 8, 9]. These have very detrimental effects on statistical estimation schemes based on Gaussian stochastic disturbance models [10, 11]. Therefore, from the practical point of view, it is very important to analyse the robustness properties of adaptive estimation schemes in the presence of outliers. Robust alternatives abound in the robustness literature [7, 8, 9]. Although there are many meanings of the word "robust", its purely data-oriented version is the word "resistant" [10]. Namely, an estimate is called resistant if changing a small fraction of the data by large amounts results in only a small change to the estimate. This requirement is one of insensitivity to outliers. In addition, one may also insist that small changes in most of the data result in only small changes in the estimates. This requirement is one of insensitivity to rounding, grouping and quantization errors, or patchy outliers. Furthermore, the term robustness also has a probabilistic meaning, and at least three distinct probabilistic notions of robustness can be perceived. The oldest and most accessible is that of efficiency robustness. Namely, an estimator is said to be efficiency robust if it has high efficiency, say greater than 90 %, at a nominal Gaussian model, and high efficiency at a variety of strategically chosen non-Gaussian distributions [10]. Another notion is that of min-max robustness over a family of distributions [7, 11]. Typically, the family of distributions is infinite, and asymptotic variances are used as quantifiable performance costs. Finally, the third form of robustness is qualitative robustness [8]. This is, in fact, a continuity requirement which is the probabilistic embodiment of the notion that small changes in the data should produce only small changes in the estimates, where small changes involve both large changes in a small fraction of the data and small changes in all the data. An important concept of this robustness is that of the influence curve, which measures the
perturbation of the estimate caused by a single additional observation, the so-called contamination [8]. Unfortunately, the highly technical character of min-max robustness and qualitative robustness makes them relatively inaccessible to applied workers. However, one can construct estimation procedures having readily apparent resistance properties, along with desirable efficiency robustness. The robust adaptive prediction algorithm proposed in this paper is based on the latter approach, involving the resistance property together with efficiency robustness.
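The resistance property described above can be made concrete with a small numeric sketch (the data values and the estimators compared here are illustrative assumptions, not taken from the paper): appending one gross outlier shifts the sample mean substantially, while an order-statistic estimate such as the median barely moves.

```python
import statistics

def sensitivity(estimator, data, outlier):
    """Change in the estimate caused by one additional outlying observation."""
    return abs(estimator(data + [outlier]) - estimator(data))

# Hypothetical clean sample (roughly zero-mean).
data = [0.2, -0.5, 0.1, 0.4, -0.3, 0.0, 0.6, -0.1, 0.3, -0.2]

# One gross outlier moves the mean by about outlier/(n+1) ~ 9.1,
# while the median barely moves: the median is "resistant".
shift_mean = sensitivity(statistics.mean, data, 100.0)
shift_median = sensitivity(statistics.median, data, 100.0)
print(shift_mean > 50 * shift_median)  # True
```

The same comparison, applied to prediction-error criteria instead of location estimates, motivates replacing the quadratic loss with a bounded-influence one.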
The paper first develops the optimal structure of the predictor, based on a generalized mean square prediction error [12]. Then, in continuation, a robust adaptive one-step predictor is synthesized, based on the minimization of a corresponding non-linear criterion. This criterion is derived from the generalized mean square criterion through substitution of the quadratic function by a suitably chosen non-linear function. Starting from the efficiency robustness property and the practical importance of achieving insensitivity to outliers contaminating the Gaussian disturbances, this nonlinearity should look like the quadratic function for small values of the argument, whereas it has to grow more slowly than the quadratic function for large values of the argument. Furthermore, the resistance property requires that the criterion function derivative be bounded and continuous, since boundedness provides for no single observation having an arbitrarily large influence, while continuity provides for patchy outliers not having major effects. The unknown parameters of the proposed adaptive robust predictor are estimated at each step by applying a recursive algorithm of the stochastic gradient type [11, 13]. The convergence of the adaptive robustified predictor algorithm is established theoretically using the Martingale theory [2, 15, 16, 17]. Starting from a linear single-input/single-output ARMAX system representation, it has been shown that the proposed adaptive robust one-step ahead prediction converges, in the Cesaro sense, to the optimal prediction of the system output. The derived theoretical results are used to design a robust adaptive minimum variance controller.

Problem statement
The optimized one-step predictor minimizes the criterion based on the generalized mean square prediction error of the form [12]

J(θ) = E{ν²(i)}, (1)

where the prediction error is

ν(i) = y(i) − ŷ(i|i−1), (2)

and ŷ(i|i−1) is the one-step prediction of the output from the system; the weighting filters that generalize the error, with suitable properties, are pre-selected. Specifically, for a dynamic system described by the ARMAX model, the relations between the system output, y(i), the system input, u(i), and the noise or disturbance, e(i), are expressed by [1÷4, 15, 22]

A(q⁻¹) y(i) = q⁻ᵏ B(q⁻¹) u(i) + C(q⁻¹) e(i), (3)
where q⁻¹ is the unit delay operator, q⁻¹ y(i) = y(i−1), and k is the process delay. Here A, B and C are polynomials defined by the expressions

A(q⁻¹) = 1 + a₁ q⁻¹ + … + a_{na} q^{−na},
B(q⁻¹) = b₀ + b₁ q⁻¹ + … + b_{nb} q^{−nb}, (4)
C(q⁻¹) = 1 + c₁ q⁻¹ + … + c_{nc} q^{−nc}.

The noise e(i) is a white discrete random sequence of zero mean value and variance σ².
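For illustration, the ARMAX relation (3) with polynomials (4) can be simulated directly; the following sketch (the helper name and the coefficient values in the example are assumptions of this illustration) generates the output sequence from a given input and a Gaussian noise realization.

```python
import random

def simulate_armax(a, b, c, k, u, sigma, seed=0):
    """Simulate A(q^-1) y(i) = q^-k B(q^-1) u(i) + C(q^-1) e(i), Eq. (3),
    with A = 1 + a1 q^-1 + ..., B = b0 + b1 q^-1 + ..., C = 1 + c1 q^-1 + ...
    The lists a, b, c hold the coefficients [a1, ...], [b0, ...], [c1, ...]."""
    rng = random.Random(seed)
    n = len(u)
    y = [0.0] * n
    e = [rng.gauss(0.0, sigma) for _ in range(n)]
    for i in range(n):
        acc = e[i]
        acc += sum(c[j] * e[i - 1 - j] for j in range(len(c)) if i - 1 - j >= 0)
        acc += sum(b[j] * u[i - k - j] for j in range(len(b)) if i - k - j >= 0)
        acc -= sum(a[j] * y[i - 1 - j] for j in range(len(a)) if i - 1 - j >= 0)
        y[i] = acc
    return y, e

# Example: A = 1 - 0.5 q^-1, B = 1, C = 1 + 0.3 q^-1, k = 1, noise switched off,
# unit step input -- the output follows y(i) = 0.5 y(i-1) + u(i-1).
y, _ = simulate_armax([-0.5], [1.0], [0.3], 1, [1.0] * 5, 0.0)
print(y)  # [0.0, 1.0, 1.5, 1.75, 1.875]
```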
For the system defined by Eq. (3), the one-step optimized predictor ŷ*(i|i−1), which minimizes the performance index (1), is defined by the equations (more details are in Appendix A)

C(q⁻¹) ŷ(i|i−1) = G(q⁻¹) y(i−1) + B(q⁻¹) u(i−k), (5)

where the polynomial G(q⁻¹) satisfies the polynomial identity

C(q⁻¹) = A(q⁻¹) + q⁻¹ G(q⁻¹). (6)

Here the polynomial G(q⁻¹) is of the form

G(q⁻¹) = g₀ + g₁ q⁻¹ + … + g_{ng} q^{−ng}, ng = max(na, nc) − 1. (7)

The generalized prediction error is defined by the relation

ν(i) = y(i) − ŷ*(i|i−1) = e(i), (8)

and the criterion (1) reaches the minimum

J_min = σ². (9)

Relation (5), which defines the optimized predictor, may be written in a more compact form as

ŷ(i|i−1) = C⁻¹(q⁻¹) [G(q⁻¹) y(i−1) + B(q⁻¹) u(i−k)]. (10)

If one denotes the filtered quantities of the input, u_f(i) = C⁻¹(q⁻¹) u(i), and output from the system, y_f(i) = C⁻¹(q⁻¹) y(i), (11) then the relation (10) reduces to

ŷ(i|i−1) = G(q⁻¹) y_f(i−1) + B(q⁻¹) u_f(i−k). (12)

Using (4) and (7), the predictor may be written in the linear regression form as

ŷ(i|i−1) = Zᵀ(i) θ, (13)

where the regression vector is

Z(i) = [−ŷ(i−1|i−2), …, −ŷ(i−nc|i−nc−1), u(i−k), …, u(i−k−nb), y(i−1), …, y(i−1−ng)]ᵀ, (14)

and θ is the predictor parameter vector, which contains the coefficients of the polynomials C(q⁻¹), B(q⁻¹), G(q⁻¹), respectively, that is

θ = [c₁, …, c_{nc}, b₀, …, b_{nb}, g₀, …, g_{ng}]ᵀ.

In practice, the parameters θ of the optimized predictor (13), (14) are generally unknown. The basic requirement is to define a parameter estimation procedure based on the available measured data on the input to and output from the system, arriving at the adaptive form of the predictor.
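As a numeric check of the relations above, the sketch below simulates an illustrative first-order ARMAX system, builds G from the identity (6) (here g₀ = c₁ − a₁), runs the predictor (5), and verifies that the residual coincides with the noise, so that the sample mean square error approaches σ² as in (9). All coefficient values are assumptions of this illustration.

```python
import random

# First-order ARMAX: (1 + a1 q^-1) y = q^-1 b0 u + (1 + c1 q^-1) e,  k = 1.
a1, b0, c1, sigma = -0.7, 1.0, 0.4, 0.5
rng = random.Random(1)
N = 2000
u = [rng.uniform(-1.0, 1.0) for _ in range(N)]
e = [rng.gauss(0.0, sigma) for _ in range(N)]
y = [0.0] * N
for i in range(1, N):
    y[i] = -a1 * y[i - 1] + b0 * u[i - 1] + e[i] + c1 * e[i - 1]

# Identity (6): C(q^-1) = A(q^-1) + q^-1 G(q^-1)  =>  g0 = c1 - a1.
g0 = c1 - a1

# Predictor (5): C(q^-1) yhat(i|i-1) = G(q^-1) y(i-1) + B(q^-1) u(i-1).
yhat = [0.0] * N
for i in range(1, N):
    yhat[i] = -c1 * yhat[i - 1] + g0 * y[i - 1] + b0 * u[i - 1]

# At the optimum, nu(i) = y(i) - yhat(i|i-1) equals e(i) (up to a decaying
# transient), so the sample MSE approaches sigma^2 = 0.25, cf. (8), (9).
nu = [y[i] - yhat[i] for i in range(1, N)]
mse = sum(v * v for v in nu) / len(nu)
print(mse)  # close to 0.25
```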

A new robust adaptive predictor
Since the criterion (1) weights all prediction errors (2) equally, one can expect that it will be susceptible to outliers and hence be non-robust. One simple way to robustify the estimation procedure is to use the non-linear criterion

J_R(θ) = E{Φ(ν(i))} (15)

instead of the quadratic one in (1), where Φ(•) is a robust score or loss function that has to suppress the influence of outliers. Let us now show why criterion (1) can be generalized by relation (15). Namely, the expression for the mean square criterion gradient follows from relation (1), and by equating it to zero we derive the condition for the minimum of the criterion

E{ν(i) ∂ŷ(i|i−1)/∂θ} = 0. (16)

Given that Eq. (8) applies to the optimized solution in (16) for θ = θ*, which corresponds to the criterion minimum, and since e(i) and ∂ŷ(i|i−1)/∂θ are independent random quantities, in the general case it is possible to introduce a non-linear function of the generalized prediction error, so that the optimality condition (16) is still satisfied. In order to achieve this requirement, this function has to be equal to zero for zero-valued arguments and to be even. Moreover, as mentioned before, with regard to the practical importance of achieving insensitivity to outliers contaminating the Gaussian disturbances, the even loss function Φ(•) in (15) should look like the quadratic one for small values of the argument, whereas it has to grow more slowly than the quadratic function for large values of the argument. In addition, the resistance property requires that the loss function derivative ψ(•) = Φ′(•) be bounded and continuous. This corresponds, for example, to the choice of Huber's loss function [7, 9]

Φ(x) = x²/2 for |x| ≤ Δ;  Φ(x) = c₁|x| − c₂ for |x| > Δ, (17)

where c₁ = Δ and c₂ = Δ²/2 are the constants making Φ(•) and its derivative continuous, and Δ is chosen to give the desired efficiency at the nominal zero-mean Gaussian noise sequence {e(i)} in (3), with variance σₙ². Therefore, the tuning parameter Δ in (17) has to be chosen so as to provide the desired efficiency at the nominal Gaussian model. A common choice is Δ = 1,5, known as Huber's 1,5-robust procedure
[7]. This is, in fact, an efficiency robustness requirement. On the other hand, the derivative ψ of the Huber function in (17), the so-called influence function in qualitative robustness [13, 14], is given by

ψ(x) = Φ′(x) = min{Δ, max(x, −Δ)}, (18)

and is bounded and continuous. This is, in turn, a resistance requirement. Thus, the choice of Huber's loss function in (17) provides for resistant robustness, along with the efficiency robustness property. Application of the Robbins-Monro approach [11, 13] results in a stochastic gradient algorithm for estimating the unknown vector of parameters θ of the adaptive robust predictor, where the gradient of the criterion defined by Eq. (15) is given by

∇_θ J_R(θ) = −E{ψ(ν(i)) ∂ŷ(i|i−1)/∂θ}. (19)

In relation (19) the non-linear function ψ(•) is defined by (15), (17). The form of the stochastic gradient algorithm is [11, 13, 17]

θ̂(i) = θ̂(i−1) + γ(i) ψ(ν(i)) ∂ŷ(i|i−1)/∂θ, (20)

where ψ(•) is given by (18), (19).
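A minimal realization of the loss (17) and influence function (18), with the conventional normalization x²/2 on the quadratic segment and the common tuning Δ = 1,5, may look as follows.

```python
def huber_rho(x, delta=1.5):
    """Huber's loss, Eq. (17): quadratic for |x| <= delta, linear beyond,
    with constants chosen so the function and its derivative are continuous."""
    if abs(x) <= delta:
        return 0.5 * x * x
    return delta * abs(x) - 0.5 * delta * delta

def huber_psi(x, delta=1.5):
    """Influence function, Eq. (18): psi(x) = min(delta, max(x, -delta)).
    Bounded (resistance to single outliers) and continuous
    (insensitivity to patchy outliers)."""
    return min(max(x, -delta), delta)

# Small residuals pass through unchanged; gross errors are clipped.
print(huber_psi(0.3), huber_psi(100.0), huber_psi(-100.0))  # 0.3 1.5 -1.5
```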
To accelerate the convergence of algorithm (20) in the neighbourhood of the minimum of functional (15), the gain γ(i) can be multiplied by a positive definite matrix, resulting in a Newton-Raphson type algorithm [13, 16]

θ̂(i) = θ̂(i−1) + γ(i) R⁻¹(i) ψ(ν(i)) ∂ŷ(i|i−1)/∂θ. (21)

The matrix R(i) is the Hessian, determined by the recursive relation [13, 16, 17]

R(i) = R(i−1) + γ(i) [ψ′(ν(i)) Z(i) Zᵀ(i) − R(i−1)], (22)

where ψ′(•) is the first derivative of ψ(•). Eq. (22) follows from the fact that the second derivative of criterion (15), defining the Hessian matrix, is given by

∇²_θ J_R(θ) = E{ψ′(ν(i)) [∂ŷ(i|i−1)/∂θ] [∂ŷ(i|i−1)/∂θ]ᵀ} − E{ψ(ν(i)) ∂²ŷ(i|i−1)/∂θ∂θᵀ}. (23)

Moreover, in the vicinity of the optimal solution θ ≈ θ*, Eq. (23) can be approximated by the relation

∇²_θ J_R(θ) ≈ E{ψ′(e(i)) Z(i) Zᵀ(i)}, (24)

since then E{ψ(ν(i)) ∂²ŷ(i|i−1)/∂θ∂θᵀ} ≈ 0, due to the fact that the mean value of e(i) is zero. Finally, let us approximate the mathematical expectation (24) by the corresponding arithmetic mean,

R̄(i) = (1/i) Σ_{j=1}^{i} ψ′(ν(j)) Z(j) Zᵀ(j).

The last relation represents the Eq. (22) with γ(i) = 1/i. Let us assume further that in the algorithm (21), (22) the scalar factor is γ(i) = 1/i, and let us introduce the matrix R̄(i) so obtained. Let us also note that the algorithm (21), (22) was derived for the general model. Since this research considers the ARMAX model (3), it is possible to determine the prediction derivative ∂ŷ(i|i−1)/∂θ exactly. Moreover, one often resorts to the approximation [11, 13]

∂ŷ(i|i−1)/∂θ ≈ Z(i). (25)

The approximation (25) reflects the fact that the implicit dependence of the vector Z(i) on the predictor parameters θ is disregarded. If one adopts the matrix trace symbol r(i) = tr{R̄(i)}, and replaces the matrix gain factor R(i) in (21) with the scalar gain r(i), the definitive form of the algorithm of stochastic approximation type for adaptive robust predictor parameter estimation is given by

θ̂(i) = θ̂(i−1) + [1/(i r(i))] Z(i) ψ(ν(i)), (26)

r(i) = r(i−1) + (1/i) [ψ′(ν(i)) Zᵀ(i) Z(i) − r(i−1)], (27)

ν(i) = y(i) − Zᵀ(i) θ̂(i−1), (28)

Z(i) = [−ŷ(i−1|i−2), …, −ŷ(i−nc|i−nc−1), u(i−k), …, u(i−k−nb), y(i−1), …, y(i−1−ng)]ᵀ. (29)

The Eq. (27) follows from (22) after introducing γ(i) = 1/i and replacing the matrix R(i) with the previously defined matrix R̄(i), as well as by taking the matrix trace operation on the so obtained relation. The Eqs. (28) and (29) correspond to the previously derived Eqs. (2), (13) and (14), respectively.
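The recursion (26) ÷ (29) can be sketched as follows, under the assumptions γ(i) = 1/i and the Huber nonlinearity (18); the regression vector Z(i) is supplied by the caller, and the gain initialization, the start-index offset and the small lower bound on r(i) are numerical safeguards added for this illustration.

```python
import random

class RobustGradientPredictor:
    """Sketch of the scalar-gain recursion (26)-(29):
    theta(i) = theta(i-1) + psi(nu(i)) Z(i) / (i r(i)),
    r(i) = r(i-1) + (psi'(nu(i)) ||Z(i)||^2 - r(i-1)) / i."""

    def __init__(self, dim, delta=1.5):
        self.theta = [0.0] * dim
        self.delta = delta
        self.r = 1.0        # assumed initialization of the averaged gain
        self.i = 1          # start-index offset: a numerical safeguard

    def step(self, z, y):
        self.i += 1
        yhat = sum(t * zj for t, zj in zip(self.theta, z))   # Eq. (13)/(29)
        nu = y - yhat                                        # Eq. (28)
        psi = min(max(nu, -self.delta), self.delta)          # Eq. (18)
        dpsi = 1.0 if abs(nu) <= self.delta else 0.0         # psi'(nu)
        z2 = sum(zj * zj for zj in z)
        # Eq. (27), with a small floor on r(i) to keep the gain bounded.
        self.r = max(self.r + (dpsi * z2 - self.r) / self.i, 1e-3)
        gain = psi / (self.i * self.r)                       # Eq. (26)
        self.theta = [t + gain * zj for t, zj in zip(self.theta, z)]
        return nu

# Usage sketch: static regression y = 2 z1 - z2 with Gaussian noise plus
# 5 % gross symmetric outliers; the clipped psi keeps them from derailing theta.
rng = random.Random(0)
p = RobustGradientPredictor(dim=2)
for _ in range(50000):
    z = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    noise = rng.choice([-25.0, 25.0]) if rng.random() < 0.05 else rng.gauss(0.0, 0.1)
    p.step(z, 2.0 * z[0] - 1.0 * z[1] + noise)
print(p.theta)  # near [2.0, -1.0]
```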
Eqs. (26) ÷ (29) define the adaptive robust predictor. This algorithm represents a compromise between the rate of convergence and computational complexity. The next task is to analyse the convergence of the algorithm (26) ÷ (29); this is important because it defines the conditions under which the algorithm is applicable.

Convergence analysis
The convergence property of the proposed adaptive robust predictor can be investigated using the Martingale theory [2, 16, 17, 18]. The basic convergence result is the lemma of Neveu [18]. The result has been restated in a number of forms that suit specific theoretical analyses better. A unified treatment of a number of almost sure convergence theorems, based on the fact that the processes involved possess a common "almost super-martingale" property, has been proposed in the literature [19]. To be precise, let {T(i)}, {α(i)} and {β(i)} be sequences of non-negative random variables, measurable with respect to a non-decreasing family of σ-algebras {F(i)}, and such that

E{T(i+1) | F(i)} ≤ T(i) − α(i) + β(i).

Then, the following theorem can be proven [19].

Theorem 1. If Σᵢ β(i) < ∞ w.p.1, then the sequence {T(i)} converges w.p.1 to a finite random variable, and Σᵢ α(i) < ∞ w.p.1.

In addition, the following propositions on sequences (Lemma 1) are frequently used in establishing convergence results [20].
The results of Theorem 1 and Lemma 1 can be used to prove the convergence of the proposed adaptive robust predictor (26) ÷ (29). However, one first needs to prove the following auxiliary lemmas.
Lemma 2. Consider the model (3) and the algorithm (26) ÷ (29), and let us assume further that the first derivative ψ′(•) of the influence function is bounded and continuous. The proof is given in Appendix B.
Lemma 3. The sequence {r(i)} generated by (27) satisfies, w.p.1,

Σ_{i=1}^{∞} 1/(i r(i)) = ∞.

The proof is given in Appendix C. Starting from the results of Theorem 1 and Lemmas 1, 2 and 3, one can prove the following convergence theorem.

Theorem 2. Consider the model (3) and the algorithm (26) ÷ (29), subject to the conditions:
C1: all zeros of the polynomial C(q⁻¹) lie strictly inside the unit circle;
C2: {e(i)} is a sequence of bounded, independent and identically distributed (i.i.d.) random variables, such that the probability distribution function P(•) is symmetric;
C3: the influence function ψ(•) is odd and continuous;
C4: ψ(•) is monotonically increasing, positive (negative) for positive (negative) arguments;
C5: ψ(•) is bounded;
C6: the first derivative ψ′(•) of ψ(•) exists and is bounded;
C7: the transfer function C⁻¹(q⁻¹) − 1/2, associated with the polynomial C(q⁻¹) in (3), is strictly positive real, i.e. the real part Re{C⁻¹(e^{jω}) − 1/2} > 0 for all ω;
C8: the system input and output signals satisfy Σ_{i=1}^{∞} [y²(i) + u²(i)] < ∞.
Then the adaptive robust prediction ŷ(i|i−1), generated by (26) ÷ (29), converges, in the Cesaro sense, to the optimal one-step ahead predictor (13) with probability one (w.p.1), i.e.

lim_{N→∞} (1/N) Σ_{i=1}^{N} [ŷ(i|i−1) − ŷ*(i|i−1)]² = 0 w.p.1. (30)
The proof is given in Appendix D. The conditions C1 and C7 are commonly used when the standard martingale results are applied for convergence analysis [16, 17]. The assumption C2 represents a standard noise condition in robust estimation [11]. The conditions C3 ÷ C6 define a class of nonlinearities ψ(•), or influence functions, that have to cut off the outliers. Many ψ(•) functions that are commonly used in robust estimation, besides Huber's influence function in (18), such as Hampel's, Tukey's or Andrews' nonlinearities, satisfy the above assumptions [7 ÷ 9]. Furthermore, it is fairly obvious that some condition on the input sequence must be introduced in order to secure a reasonable result. Clearly, an input that is identically zero will not be able to yield full information about the system input-output properties. The condition C8 represents a reasonable practical assumption that the input-output sequences are discrete-time signals with finite energy.
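The distinction between Cesaro-sense and pointwise convergence invoked here can be illustrated numerically (the particular sequence chosen is an assumption of this illustration): an error sequence that departs from zero infinitely often may still have Cesaro means tending to zero.

```python
def cesaro_means(x):
    """Running arithmetic means (1/N) * sum of the first N terms."""
    s, out = 0.0, []
    for n, v in enumerate(x, start=1):
        s += v
        out.append(s / n)
    return out

# 1 at every power of two, 0 elsewhere: infinitely many unit departures
# from zero, yet the Cesaro means vanish as N grows.
N = 1 << 16
seq = [1.0 if (n & (n - 1)) == 0 else 0.0 for n in range(1, N + 1)]
means = cesaro_means(seq)
print(means[-1])  # 17 / 65536, about 2.6e-4
```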
Remark: The convergence in the Cesaro sense allows the number of departures of the adaptive robust prediction from the optimal one to be infinite. Therefore, a stronger result would be to prove almost sure convergence, or convergence with probability one, for which ŷ(i|i−1) − ŷ*(i|i−1) → 0 as i → ∞ w.p.1.


Robust adaptive minimum variance control

To demonstrate the feasibility of the derived theoretical results, the proposed approach will be applied to the problem of designing a robust adaptive minimum variance controller [1, 2, 21]. Let the dynamic plant under consideration be represented by the ARMAX model (3) with unit delay, for which the parameter k is equal to 1. The dynamic plant has to be controlled so as to make the behaviour of the entire control system, with a stationary random set point, approach the desired behaviour of the pre-specified reference system [1, 2, 21]. In other words, the system output, y, should differ as little as possible, in some sense, from the desired output, y*, with the given set point. The measure of this difference can be specified by the performance index (1) with

ν(i) = y(i) − y*(i).

Here the one-step prediction of the output, ŷ, is replaced by the desired output, y*. With the system parameters in (3) known, what we have is the problem of designing an optimal controller minimizing the adopted criterion (1). The optimal control is given by (5), with ŷ(i|i−1) replaced by y*(i), i.e.

B(q⁻¹) u(i−1) = C(q⁻¹) y*(i) − G(q⁻¹) y(i−1), (31)

where the polynomial G(•) is defined by (7), and satisfies the equation (6), i.e.
C(q⁻¹) = A(q⁻¹) + q⁻¹ G(q⁻¹). (32)

Taking into account (8) and (9), one concludes that under the optimal control the misalignment, ν, is equal to the white noise, e. Moreover, the minimal value of the criterion (1) is equal to the noise variance. The controller equation (31) may be represented in an explicitly recursive form, which is more convenient in adaptive systems. Namely, by introducing the notation of the controller parameters vector

θᵀ = [g₀, g₁, …, gₙ, b₀, b₁, …, bₘ, −c₁, −c₂, …, −cₗ] (33)

and the observation vector

Zᵀ(i) = [y(i), y(i−1), …, y(i−n), r u(i), u(i−1), …, u(i−m), s(i), s(i−1), …, s(i−l+1)],

where s(i) denotes the set-point sequence, the controller equation (31) takes the regression form

Zᵀ(i) θ = s(i+1), (34)

which is solved at each step for the current control u(i). Here the parameter r is either 0 or 1, and Z(i) in (14) corresponds to the value of r equal to one.
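Solving the controller equation for the current control, as in (31) and (36), can be sketched as follows; the function name and the most-recent-first list layout of the histories are assumptions of this illustration, with b₀ = b[0] assumed non-zero.

```python
def mv_control(c, b, g, y_star, y, u):
    """Minimum variance control for unit delay (k = 1): solve
    b0 u(i) = C(q^-1) y*(i+1) - G(q^-1) y(i) - [B(q^-1) - b0] u(i)
    for u(i), cf. Eq. (31). Arguments (most recent value first):
    c = [c1, ...], b = [b0, b1, ...], g = [g0, ...],
    y_star = [y*(i+1), y*(i), ...], y = [y(i), y(i-1), ...],
    u = [u(i-1), u(i-2), ...]."""
    acc = y_star[0] + sum(cj * ys for cj, ys in zip(c, y_star[1:]))
    acc -= sum(gj * yj for gj, yj in zip(g, y))
    acc -= sum(bj * uj for bj, uj in zip(b[1:], u))
    return acc / b[0]

# First-order example: regulation to y* = 0 reduces to u(i) = -g0 y(i) / b0.
print(mv_control(c=[0.3], b=[1.0], g=[1.2], y_star=[0.0, 0.0], y=[2.0], u=[]))  # -2.4
```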
However, with unknown system parameters in (3), the need for an adaptive system arises. Such a system employs an adaptation algorithm which adjusts the controller parameters (33) so as to make the entire control system meet the requirements. An adaptive control system can be obtained in several ways [1, 2]. One possibility is to determine the unknown controller parameters (33) directly [21]. This can be done by robust algorithms predicting the desired reference value y*, and minimizing the functional (15) of the prediction error, ν. The solution of the prediction problem relies on the one-step ahead prediction, ŷ, of y*, and the controller equation (34). This leads to the recursive algorithm (26) ÷ (29) with ν(i) = y(i) − y*(i). It should be noted that the relation (34) can be rewritten in the form of equation (13), that is

ŷ(i+1|i) = Zᵀ(i) θ = y*(i+1). (36)

Thus, the minimum variance strategy is obtained by robustly predicting the output, y, one step ahead, using (29), and then choosing a control, u, that makes the prediction, ŷ, equal to the desired output, y*, as shown in (36). The performance of this algorithm, compared with the conventional non-robust minimum variance type adaptive controller, has been analysed by simulations in the literature [21].
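With known parameters, the property stated above, namely that the minimum variance law drives the misalignment to the white noise e so that the criterion attains σ², can be checked numerically; the first-order plant and its coefficients below are assumptions of this illustration.

```python
import random

# Regulation (y* = 0) of a first-order ARMAX plant
#   y(i) = -a1 y(i-1) + b0 u(i-1) + e(i) + c1 e(i-1),   k = 1.
# The minimum variance law u(i) = -g0 y(i) / b0, with g0 = c1 - a1 from
# identity (6), cancels the predictable part, leaving y(i) ~ e(i).
a1, b0, c1, sigma = -0.9, 1.0, 0.3, 0.5
g0 = c1 - a1
rng = random.Random(2)
N = 20000
y_prev, e_prev, acc = 0.0, 0.0, 0.0
for _ in range(N):
    e = rng.gauss(0.0, sigma)
    u_prev = -g0 * y_prev / b0          # control computed at the previous step
    y = -a1 * y_prev + b0 * u_prev + e + c1 * e_prev
    acc += y * y
    y_prev, e_prev = y, e
var_y = acc / N
print(var_y)  # close to the noise variance sigma^2 = 0.25
```

Without the feedback term the plant pole at 0.9 would inflate the output variance well above σ², so the comparison makes the minimum variance property visible.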

Conclusion
A new adaptive robust one-step ahead predictor was synthesized by minimizing a suitably chosen non-linear prediction error criterion. Given the importance of the occurrence of pulse noise, or outliers, within the Gaussian samples of the measurement noise population, this nonlinearity should look like a quadratic function for small values of the argument, whereas it has to grow more slowly than the quadratic one for large values of the argument. In addition, the non-linear loss function derivative, named the influence function, provides the resistance property, along with efficiency robustness. The unknown parameters of the proposed adaptive robust predictor are estimated at each step by applying a recursive algorithm of the stochastic gradient type. The convergence of the adaptive robust prediction algorithm, in the Cesaro sense, is established theoretically using standard Martingale theory. It has been shown that the proposed adaptive robust prediction converges to the optimal system output prediction. The obtained theoretical results are used to solve the problem of designing a robust version of an adaptive minimum variance type controller.
Further problems of practical interest in the robust prediction context include a multi-step predictor, which plays a significant role in processes involving delay. It is well known from engineering practice that the delay phenomenon renders the generation of adequate control action rather difficult. Because of convergence speed, it would also be of interest to consider a robust version of the parameter estimation algorithm in which the scalar gain factor is replaced with a suitable matrix. This, in turn, increases the computational complexity of the parameter estimation algorithm.

Appendix C: Proof of Lemma 3
Similarly as in the proof of Lemma 2, one concludes that there exists a finite positive constant k₂, such that r(i) ≤ k₂ w.p.1, so that

Σ_{i=1}^{∞} 1/(i r(i)) ≥ (1/k₂) Σ_{i=1}^{∞} 1/i = ∞. (C1)

The right-hand side of relation (C1) is a consequence of the Abel-Dini theorem [19], which completes the proof.
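The role of the Abel-Dini theorem, that for a divergent series with partial sums Sₙ the series Σ aₙ/Sₙ still diverges (while Σ aₙ/Sₙ^{1+ε} converges), can be illustrated numerically; the choice aₙ = n is an assumption of this illustration.

```python
# a_n = n: S_n = n(n+1)/2 diverges; sum a_n / S_n = sum 2/(n+1) grows like
# 2 log n (divergence), whereas sum a_n / S_n**1.5 converges.
S, div_sum, conv_sum = 0.0, 0.0, 0.0
for n in range(1, 200001):
    a = float(n)
    S += a
    div_sum += a / S
    conv_sum += a / S ** 1.5
print(div_sum, conv_sum)  # roughly 23.6 and a value below 3
```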

Appendix D: Proof of Theorem 2
Let us introduce Lyapunov's stochastic function

T(i) = ‖θ̃(i)‖², θ̃(i) = θ̂(i) − θ, (D1)

where θ̂(i) is the predictor parameter vector estimate, generated by (26), θ is the true unknown predictor parameter vector to be estimated, and θ̃(i) is the estimation error at the i-th step. The symbol ‖•‖ denotes the Euclidean norm. Following the methodology presented in [12], one obtains for the prediction error the relation (D2). Taking into account (D2), the relation (26) can be rewritten as (D3). Additionally, let us define the functions (D4) and (D5). Under the hypotheses C2 and C5 of Theorem 2, one concludes (D6). In addition, the hypothesis C4 of Theorem 2 assumes the function ψ(•) in (D5) to be monotonically increasing and positive (negative) for positive (negative) arguments. Moreover, under the hypotheses C3, C4 and C7 of Theorem 2, one obtains (D7) and (D8). Using the relations (D4), (D7) and (D8), one can write (D9), where the terms playing the roles of α(i) and β(i) in Theorem 1 have to be non-negative. The condition i) is obviously fulfilled, while the condition ii) is satisfied due to Lemma 2. Thus, by applying Theorem 1 to the relation (D9), one concludes (D10) and (D11).

However, using C6 of Theorem 2 and Lemma 3, it follows from (D22) and (D23) that T* = 0, so that the relation (D19) is proven. Following the methodology exposed in [12], one can also show from (D18) that the assertion (30) of Theorem 2 holds, which completes the proof.

Furthermore, let us analyse the relation (D11) in more detail. Taking into account (B1) and (27), one concludes, under the assumptions of Lemma 1 and the relation (D18), that T(i) is a discrete super-martingale.