Accelerated Proximal Algorithm for Finding the Dantzig Selector and Source Separation Using Dictionary Learning

In most applications, signals acquired from different sensors are composite and corrupted by noise. In the presence of noise, separating a composite signal into its components without losing information is challenging. Separation becomes even more difficult when only a few samples of the noisy, undersampled composite signal are available. In this paper, we find the Dantzig selector with overcomplete dictionaries using the Accelerated Proximal Gradient Algorithm (APGA) for the recovery and separation of undersampled composite signals. We have successfully diagnosed leukemia using our model and compared it with the Alternating Direction Method of Multipliers (ADMM). As a test case, we have also recovered an Electrocardiogram (ECG) signal with high accuracy from its noisy version using this model, with the Proximity Operator based Algorithm (POA) for comparison. APGA has lower computational complexity than ADMM and POA, and the leukemia diagnosis demonstrates its good clustering capability.


INTRODUCTION
Compressed sensing (CS) considers the recovery and separation of signals that have a sparse representation in some known transform domain, e.g., the Fourier, wavelet, sine, or cosine transform. In CS, the original signal is recovered from far fewer samples than conventional Shannon (Nyquist) sampling requires [1][2][3][4][5]. Most practical signals are sparse in one domain or another. For example, sinusoidal signals are sparse in the Fourier domain, whereas images typically have a sparse representation in the cosine domain. Applying ℓ1-norm recovery techniques to such signals has produced good results.
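To make the Fourier-sparsity claim concrete, the short sketch below builds a sum of two sinusoids whose frequencies fall exactly on DFT bins and counts the nonzero DFT coefficients. It is an illustration we added, not part of the paper's experiments. The length 64 and the bins 5 and 12 are arbitrary choices.

```python
import numpy as np

n = 64
k = np.arange(n)
# Two on-grid sinusoids: 5 and 12 cycles per 64 samples (arbitrary example values).
x = np.cos(2 * np.pi * 5 * k / n) + 0.5 * np.sin(2 * np.pi * 12 * k / n)

X = np.fft.fft(x)
# Treat coefficients below a tiny relative threshold as numerical zeros.
support = np.flatnonzero(np.abs(X) > 1e-9 * np.abs(X).max())
print(len(support), support.tolist())  # 4 [5, 12, 52, 59]
```

Only 4 of the 64 coefficients are nonzero (each real sinusoid contributes a bin and its mirror), so the signal is 4-sparse in the Fourier domain.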
The analysis of stationary and non-stationary signals using CS theory is discussed in [6], with the main goal of achieving high-resolution time-frequency (TF) distributions. There, composite signals are treated as sparse by keeping only a few ambiguity-function samples around the origin, which makes the problem undersampled. Instead of regular optimization techniques for CS recovery, [7] uses modified two-dimensional L-statistics techniques to reconstruct signals heavily contaminated by impulsive noise in the ambiguity domain. The few samples corrupted by noise are identified by a sorting operation and removed, which turns the task into a CS recovery problem. Applications of sparse signal processing and CS, including signal decomposition, speech processing, and medical image denoising and reconstruction, are given in [8][9][10][11].
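The sorting-and-removal step can be illustrated in one dimension. The sketch below is a simplified reading, not the two-dimensional method of [7]. It sorts samples by magnitude and discards the largest fraction, and the retained indices form the incomplete sample set that a CS solver would then work with. The helper name `l_statistics_keep`, the 20% discard fraction and the toy signal are hypothetical.

```python
import numpy as np

def l_statistics_keep(samples, discard_frac=0.2):
    """Return indices kept after discarding the largest-magnitude fraction of samples.

    Sorting by magnitude and dropping the top `discard_frac` is the basic
    L-statistics step; the kept indices define an incomplete (CS-style) sample set.
    """
    samples = np.asarray(samples)
    n_keep = int(round(len(samples) * (1 - discard_frac)))
    order = np.argsort(np.abs(samples))  # ascending magnitude
    return np.sort(order[:n_keep])

# Toy example: a constant signal hit by two impulses (made-up values).
x = np.full(10, 0.5)
x[[3, 7]] = 50.0
kept = l_statistics_keep(x, discard_frac=0.2)
print(kept.tolist(), x[kept].mean())  # kept indices exclude the impulses at 3 and 7
```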
In contrast to the approaches in [6,7], the algorithms in [12,13] use overcomplete dictionaries to recover sparse signals by exploiting the low-pass characteristics of the signals. If the measurement matrix satisfies the restricted isometry property (RIP), then recovery of nearly sparse signals through ℓ1-norm minimization is guaranteed.
In [14], an attempt is made to separate sinusoids from other signals by considering the ambiguity-function points along the zero time-lag. This approach is inspired by the theory used in direction-of-arrival estimation and array signal processing. To recover narrowband signals corrupted by noise in the form of non-stationary signals, [15] applies a CS approach with observations in the TF domain instead of the time domain used in standard CS theory. It takes the short-time Fourier transform (STFT) of the signal and uses windows of different sizes to map the local sparsity information in the STFT to global information in the Fourier transform. Non-stationary signals that are not sparse over the entire record may still be sparse within a local STFT window, which reflects the local behavior of the composite signal. For a successful separation of non-stationary signals, the TF regions corresponding to these signals are identified and removed using the L-filter, as described in [16,17]. L-statistics filtering is applied first to separate the composite signal and is then followed by CS recovery. The separation of stationary and non-stationary signals is motivated by radar signal processing applications involving micro-Doppler effects and rigid-body points. The signals reflected from the main body of an aircraft are considered stationary and are sparse in the Fourier domain, while the signals from the moving fans are treated as non-stationary [18]. To extract the signal of interest, the TF points occupied by the overlapping signals are removed, keeping only the points where the narrowband (stationary) signals exist.
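The idea that a signal can be locally sparse in the STFT while not being sparse globally can be checked with a linear chirp. The sketch below is our illustration with a minimal NumPy-only STFT. It uses a Hann window, a window length of 64 and a hop of 32, all arbitrary choices. It measures the fraction of frequency bins needed to hold 90% of the energy, both for the whole-signal Fourier transform and on average per STFT window.

```python
import numpy as np

def bins_for_energy(spectrum, frac=0.9):
    """Smallest number of bins whose combined energy reaches `frac` of the total."""
    energy = np.sort(np.abs(spectrum) ** 2)[::-1]
    cumulative = np.cumsum(energy)
    return int(np.searchsorted(cumulative, frac * cumulative[-1]) + 1)

def stft_frames(x, win_len=64, hop=32):
    """Minimal STFT: Hann-windowed rFFT of overlapping frames (no padding)."""
    w = np.hanning(win_len)
    return [np.fft.rfft(w * x[s:s + win_len])
            for s in range(0, len(x) - win_len + 1, hop)]

fs = 1000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * (50 * t + 75 * t ** 2))  # linear chirp sweeping 50 -> 200 Hz

global_spec = np.fft.rfft(x)
global_frac = bins_for_energy(global_spec) / len(global_spec)
local_frac = float(np.mean([bins_for_energy(F) / len(F) for F in stft_frames(x)]))
print(f"90% energy: {global_frac:.2f} of global bins vs {local_frac:.2f} of bins per window")
```

The chirp's global spectrum spreads over the whole 50 to 200 Hz sweep, while inside each short window it occupies only a few bins.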
In some cases, composite signals may be degraded by noise. Separating such a signal into its components without losing a good portion of it is not easy, and separation is especially difficult for undersampled composite signals. The Proximity Operator based Algorithm (POA) introduced in [19] is used to separate undersampled composite signals and handwriting images in the presence of noise [20]. There, the Dantzig selector with overcomplete dictionaries is used to separate composite signals and images. In [19,20], a fixed-point proximity-operator-based algorithm is used. It relies on a special case of the proximal mapping: the proximal operator of an indicator function, also known as the projection operator. In this work, we use the accelerated version of the proximal mapping with overcomplete dictionaries to obtain better convergence rates. The resulting algorithm performs better in data clustering and source separation. Comparisons of the Dantzig selector with the Least Absolute Shrinkage and Selection Operator (LASSO) are given in [21][22][23][24]. LASSO penalizes the squared misfit between the candidate solution and the observations, while the Dantzig selector bounds the correlation between the residuals and the columns of the design matrix. For the same tuning parameter, the solution produced by the Dantzig selector always has an ℓ1-norm no larger than that of the LASSO solution [25].
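For reference, the standard LASSO and Dantzig selector problems are written out below. We assume the same tuning parameter λ in both, since that pairing is what the ℓ1-norm comparison needs. The last line sketches why the comparison holds: the LASSO optimality condition makes the LASSO solution feasible for the Dantzig constraint, so the Dantzig minimum cannot have a larger ℓ1-norm.

```latex
\begin{aligned}
\hat\beta_{\mathrm{L}} &= \arg\min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1
  && \text{(LASSO)}\\
\hat\beta_{\mathrm{DS}} &= \arg\min_{\beta}\ \|\beta\|_1
  \ \ \text{s.t.}\ \ \|X^{\top}(y - X\beta)\|_\infty \le \lambda
  && \text{(Dantzig selector)}\\
X^{\top}(y - X\hat\beta_{\mathrm{L}}) &\in \lambda\,\partial\|\hat\beta_{\mathrm{L}}\|_1
  \;\Rightarrow\; \|X^{\top}(y - X\hat\beta_{\mathrm{L}})\|_\infty \le \lambda
  \;\Rightarrow\; \|\hat\beta_{\mathrm{DS}}\|_1 \le \|\hat\beta_{\mathrm{L}}\|_1
\end{aligned}
```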
The authors of [26] solve the problem of finding the Dantzig selector in four steps. First, the problem is expressed in conic form. Second, its dual is formed. Third, smoothing is applied. Fourth, the smoothed dual is solved with an optimal first-order method. The approach is flexible, as its use in solving CS problems shows, and the various convex cone problems that arise in signal processing, machine learning, mathematics, etc. can be solved with it. In stability and computational efficiency, the algorithm is comparable to that of LASSO. The ADMM is used to find the Dantzig selector in [27], where a non-monotone gradient algorithm solves the problem in multiple steps and is compared with the first-order method of [26]. Using ADMM to find the Dantzig selector outperforms the method of [26] in calculation time while producing results of comparable quality. In this paper, we use APGA to find the Dantzig selector with overcomplete dictionaries and apply the model to source separation. We use this model, along with ADMM, to diagnose leukemia in patients. We have also successfully recovered and separated a noisy ECG signal using this model and POA.
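As a baseline for the first-order and ADMM solvers cited above, the Dantzig selector can also be solved exactly as a linear program, which is the classical reformulation used by Candès and Tao. The sketch below uses SciPy's `linprog` with the HiGHS backend. It is neither the method of [26] nor that of [27]. The helper name `dantzig_lp`, the problem sizes and δ are our own illustrative choices, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_lp(X, y, delta):
    """Dantzig selector  min ||b||_1  s.t.  ||X^T (y - X b)||_inf <= delta,  as an LP.

    Variables are [b; u] with |b_i| <= u_i, and the objective is sum(u).
    """
    n = X.shape[1]
    G, c = X.T @ X, X.T @ y
    I, Z = np.eye(n), np.zeros((n, n))
    A_ub = np.block([[I, -I],    #  b - u <= 0
                     [-I, -I],   # -b - u <= 0
                     [G, Z],     #  X^T X b <= X^T y + delta
                     [-G, Z]])   # -X^T X b <= -X^T y + delta
    b_ub = np.concatenate([np.zeros(2 * n), c + delta, delta - c])
    cost = np.concatenate([np.zeros(n), np.ones(n)])
    bounds = [(None, None)] * n + [(0, None)] * n  # b free, u >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# Synthetic check (made-up sizes): a 3-sparse vector, 30 noiseless measurements, n = 50.
rng = np.random.default_rng(0)
m, n = 30, 50
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=0)  # unit-norm columns
beta = np.zeros(n)
beta[[4, 17, 33]] = [1.5, -2.0, 1.0]
beta_hat = dantzig_lp(X, X @ beta, delta=1e-6)
print(np.round(beta_hat[[4, 17, 33]], 4))
```

The LP grows to 4n inequality rows, which is fine for small problems. This cost is the reason first-order and ADMM-type methods are preferred at the sizes used later (n = 1024).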

The Dantzig Selector with Overcomplete Dictionaries
Consider the estimation of a parameter vector $\beta \in \mathbb{R}^n$ from the linear regression model
$$y = X\beta + \xi,$$
where $X$ is an $m \times n$ sensing matrix with $m < n$, $y \in \mathbb{R}^m$ is the observation vector, and $\xi \in \mathbb{R}^m$ is an independent and identically distributed (i.i.d.) noise vector. Because $m < n$, this is an underdetermined problem. The goal is to find the most suitable vector $\beta^* \in \mathbb{R}^n$ among the possible candidate solutions and to split it into its components. If $\beta$ is a vector of parameters and the $p \times 1$ vector $s$ is the sparse representation of the composite signal in the $n \times p$ overcomplete dictionary $\Psi$, then $\beta$ ($n \times 1$) is given by the relation
$$\beta = \Psi s.$$
The vector $\beta$ and the composite signal $s$ are unknown, but in many problems a suitable overcomplete dictionary $\Psi$ can be chosen in advance. For example, for sinusoidal signals with periodic impulses, concatenating the discrete Fourier transform and identity matrices gives a proper dictionary representation. Let $p = 2n$, and form the $n \times p$ dictionary $\Psi$ by horizontally concatenating two orthonormal bases. The matrix $D$ is a diagonal matrix with diagonal entries $d_{jj}$ and is used to normalize the dictionary pair. For $D$ to be invertible, the entries of $X$ must be i.i.d. and the columns of $\Psi$ must form nonrandom (fixed) bases.
A random sensing matrix is incoherent with, though not orthogonal to, fixed bases. In particular, for $D$ to be invertible we need $d_{jj} \neq 0$ for each $j$. The greater the randomness of the sensing matrix, the greater its incoherence with the overcomplete dictionary, the smaller the isometry constant $\delta > 0$, and the more likely successful recovery through ℓ1-norm minimization becomes.
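The incoherence argument can be checked numerically. The sketch below builds $\Psi = [F, I]$ from the unitary DFT matrix $F$ and the identity, as in the sinusoid-plus-impulse example. For this pair the mutual coherence equals $1/\sqrt{n}$. The sketch then forms column norms of $X\Psi$ for a random Gaussian $X$. Taking $d_{jj} = \|X\Psi_j\|_2$ is our assumption about the normalization; the text only states that $D$ is diagonal and needs $d_{jj} \neq 0$. The sizes are arbitrary.

```python
import numpy as np

n, m = 64, 32
F = np.fft.fft(np.eye(n)) / np.sqrt(n)  # unitary DFT matrix
Psi = np.hstack([F, np.eye(n)])         # n x 2n overcomplete dictionary [F, I]

G = np.abs(Psi.conj().T @ Psi)          # magnitudes of all column inner products
np.fill_diagonal(G, 0.0)
mu = G.max()                            # mutual coherence of the dictionary
print(mu, 1 / np.sqrt(n))               # both approximately 0.125

rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))         # i.i.d. Gaussian sensing matrix
d = np.linalg.norm(X @ Psi, axis=0)     # assumed normalization: d_jj = ||X Psi_j||_2
D = np.diag(d)
print(np.all(d > 0))                    # nonzero diagonal -> D is invertible
```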

Proximal Algorithms
The Dantzig selector is computed by characterizing a solution to the model in Eq. (4) using APGA and the ℓ1-norm, and the model is then used to separate composite signals. Suppose a composite objective $f(x) = g(x) + h(x)$ is to be minimized, where $g$ is convex and differentiable while $h$ is convex but not necessarily differentiable. APGA converges faster than the plain proximal gradient method. For sparsity and separation, we use overcomplete dictionaries, so that the signal of interest has a sparse representation in the dictionary and can be recovered from it.
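According to [19], the proximal mapping (or proximal operator) of the convex function $h$ is defined as
$$\operatorname{prox}_t(x) = \arg\min_{z}\left( h(z) + \frac{1}{2t}\|x - z\|_2^2 \right),$$
and the proximal gradient method updates
$$x^{(k)} = \operatorname{prox}_{t_k}\left(x^{(k-1)} - t_k \nabla g(x^{(k-1)})\right),$$
where $t_k > 0$ is the step size. Depending on the choice of $h(x)$, we have the following three cases.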
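The definition can be checked numerically in one dimension. The sketch below is our illustration. It minimizes $h(z) + (x - z)^2/(2t)$ over a fine grid for $h(z) = \lambda|z|$ and compares the result with the closed-form soft-threshold at $\lambda t$. The values λ = 1 and t = 0.5, and the grid, are arbitrary.

```python
import numpy as np

def prox_numeric(h, x, t, grid):
    """Brute-force prox_t(x) = argmin_z h(z) + (x - z)^2 / (2t) over a 1-D grid."""
    objective = h(grid) + (x - grid) ** 2 / (2 * t)
    return grid[np.argmin(objective)]

lam, t = 1.0, 0.5
grid = np.linspace(-5, 5, 100_001)  # grid spacing 1e-4
h = lambda z: lam * np.abs(z)       # h(z) = lambda * |z|
for x in [-3.0, -0.2, 0.4, 2.5]:
    closed_form = np.sign(x) * max(abs(x) - lam * t, 0.0)  # soft threshold at lambda * t
    print(x, round(float(prox_numeric(h, x, t, grid)), 4), closed_form)
```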

CASE-1
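When $h(x) = 0$, the proximal operator is the identity, $\operatorname{prox}_t(x) = x$, and the proximal algorithm reduces to gradient descent:
$$x^{(k)} = x^{(k-1)} - t_k \nabla g(x^{(k-1)}).$$
In this case convergence is slow: the rate is $O(1/\epsilon)$.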
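Case 1 in code is plain gradient descent. The sketch below uses a made-up least-squares objective $g(x) = \tfrac12\|Ax - b\|_2^2$ with step $t = 1/L$, where $L = \|A\|_2^2$. It checks the result against NumPy's least-squares solver. The helper name `gradient_descent` is hypothetical.

```python
import numpy as np

def gradient_descent(grad_g, x0, t, n_iter):
    """Case 1 (h = 0): prox is the identity, so each step is x <- x - t * grad_g(x)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x - t * grad_g(x)
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)  # synthetic data
grad = lambda x: A.T @ (A @ x - b)  # gradient of g(x) = 0.5 ||Ax - b||^2
t = 1.0 / np.linalg.norm(A, 2) ** 2  # step 1/L, L = largest eigenvalue of A^T A
x_gd = gradient_descent(grad, np.zeros(5), t, 2000)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_gd - x_ls)))
```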

CASE-2
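When $h(x) = I_C(x)$, the method becomes projected gradient descent. If $C \subseteq \mathbb{R}^n$ is a closed convex set, then, following [20],
$$I_C(x) = \begin{cases} 0, & x \in C \\ +\infty, & x \notin C \end{cases}$$
is the indicator function of $C$, and its proximal operator is the Euclidean projection onto $C$:
$$\operatorname{prox}_t(x) = \arg\min_{z \in C}\|x - z\|_2^2 = P_C(x).$$
Hence $x^{(k)} = P_C\left(x^{(k-1)} - t_k \nabla g(x^{(k-1)})\right)$. This projection is used in [20], where composite signals are separated with overcomplete dictionaries.

CASE-3
When $h(x) = \lambda\|x\|_1$, the update step is
$$x^{(k)} = \operatorname{prox}_{t_k}\left(x^{(k-1)} - t_k \nabla g(x^{(k-1)})\right),$$
i.e., $\operatorname{prox}_t$ is the soft-threshold (shrinkage) operation, given componentwise by
$$[\operatorname{prox}_t(x)]_i = \begin{cases} x_i - \lambda t, & x_i > \lambda t \\ 0, & |x_i| \le \lambda t \\ x_i + \lambda t, & x_i < -\lambda t \end{cases}$$
or, more compactly, $\operatorname{prox}_t(x) = \operatorname{sign}(x) \odot \max(|x| - \lambda t, 0)$. This method also converges slowly, with rate $O(1/\epsilon)$.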
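Minimal sketches of the two proximal operators discussed above follow: the projection for the indicator case and the soft-threshold for the ℓ1 case. The box $C = \{z : lo \le z \le hi\}$ is only an example of a closed convex set chosen by us, and [20] may use a different $C$. The function names are hypothetical.

```python
import numpy as np

def project_box(x, lo, hi):
    """Prox of the indicator of C = {z : lo <= z <= hi}: Euclidean projection onto C."""
    return np.clip(x, lo, hi)

def soft_threshold(x, tau):
    """Prox of tau * ||x||_1 applied componentwise (tau = lambda * t)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def projected_gradient_step(x, grad_g, t, lo, hi):
    """One projected gradient step: gradient update, then projection back onto C."""
    return project_box(x - t * grad_g(x), lo, hi)

print(project_box(np.array([-2.0, 0.5, 3.0]), -1.0, 1.0))
print(soft_threshold(np.array([3.0, -0.5, -2.0, 0.2]), 1.0))
```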

Accelerated Proximal Algorithm
For all three cases discussed above, the convergence rate is $O(1/\epsilon)$. This includes the projection (indicator) operator $h(x) = I_C(x)$ used in [20], which performs a usual gradient update and then projects back onto $C$. The method is faster than the subgradient method. However, for most functions the proximal operator is not easy to express in closed form, and for a specific problem one must know it in closed form. Moreover, each iteration in [20] evaluates $\operatorname{prox}_t(\cdot)$ twice, so iterations can be time-consuming or fast depending on $h$. We use the accelerated proximal algorithm to find the Dantzig selector and then apply the overcomplete dictionaries to separate the signals into their respective components. To do this, we consider $h = 0$ and then apply the acceleration.
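The idea of accelerating the proximal algorithm was first introduced by Nesterov [28]. In that scheme, each step uses the entire history of previous steps and calls the proximal operator twice. We instead use the extension introduced by Beck and Teboulle [29], which accelerates the proximal gradient algorithm and achieves the optimal $O(1/\sqrt{\epsilon})$ rate for composite objectives, because each step uses information from the last two iterates and makes a single proximal call. For the accelerated version, [29] introduces an additional term in the update equation, which turns Eq. (7) into
$$y = x^{(k-1)} + \frac{k-2}{k+1}\left(x^{(k-1)} - x^{(k-2)}\right), \qquad x^{(k)} = \operatorname{prox}_{t_k}\left(y - t_k \nabla g(y)\right).$$
The first step starts with $k = 1$ and is just a usual proximal gradient update. After that, the extrapolation term carries some "momentum" from previous iterations. The momentum term with the updates is given by
$$y = (1-\theta_k)\,x^{(k-1)} + \theta_k v^{(k-1)}, \qquad x^{(k)} = \operatorname{prox}_{t_k}\left(y - t_k \nabla g(y)\right), \qquad v^{(k)} = x^{(k-1)} + \frac{1}{\theta_k}\left(x^{(k)} - x^{(k-1)}\right),$$
where $\theta_k = 2/(k+1)$ and $v^{(0)} = x^{(0)}$. Substituting the expression for $v^{(k)}$ into the expression for $y$ gives the complete accelerated proximal gradient update.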
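The substitution described above can be written out as follows, for $k \ge 2$. It shows that the $\theta_k$ momentum form and the $(k-2)/(k+1)$ extrapolation are the same update.

```latex
\begin{aligned}
y &= (1-\theta_k)\,x^{(k-1)} + \theta_k\, v^{(k-1)},
\qquad v^{(k-1)} = x^{(k-2)} + \tfrac{1}{\theta_{k-1}}\bigl(x^{(k-1)} - x^{(k-2)}\bigr) \\
  &= x^{(k-1)} + \theta_k\Bigl(\tfrac{1}{\theta_{k-1}} - 1\Bigr)\bigl(x^{(k-1)} - x^{(k-2)}\bigr) \\
  &= x^{(k-1)} + \frac{k-2}{k+1}\bigl(x^{(k-1)} - x^{(k-2)}\bigr),
\qquad \text{since } \theta_k\Bigl(\tfrac{1}{\theta_{k-1}} - 1\Bigr) = \frac{2}{k+1}\cdot\frac{k-2}{2}.
\end{aligned}
```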
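If $s$ is a solution to Eq. (21), it is computed iteratively with the accelerated proximal gradient updates above. The pseudo code below summarizes APGA.

Pseudo Code 1: APGA
Initialize: $k = 1$, $v^{(0)} = x^{(0)}$, tolerance $\epsilon$
while the stopping criterion is not met do
    $\theta_k = 2/(k+1)$
    $y = (1-\theta_k)\,x^{(k-1)} + \theta_k v^{(k-1)}$
    $x^{(k)} = \operatorname{prox}_{t_k}\left(y - t_k \nabla g(y)\right)$, where $t_k$ is the step size
    $v^{(k)} = x^{(k-1)} + (x^{(k)} - x^{(k-1)})/\theta_k$
    $k = k + 1$
end while
Post-processing: use an appropriate processing scheme to construct the Dantzig estimator $\hat{s}$ from the final output of the while loop.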
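A runnable sketch of the APGA loop in Python follows (the paper's simulations use MATLAB). It is generic in $g$ and the proximal operator, because the objective of Eq. (21) is not reproduced here. The usage example therefore swaps in an ℓ1-regularized least-squares problem with made-up data. The relative-change stopping rule is our assumed reading of "stopping criterion ε", and the function name `apga` is hypothetical.

```python
import numpy as np

def apga(grad_g, prox_h, x0, t, eps=1e-6, max_iter=10_000):
    """Accelerated proximal gradient in the theta_k = 2/(k+1) momentum form.

    Stops when ||x_k - x_{k-1}|| <= eps * max(||x_{k-1}||, 1) (assumed criterion).
    Returns (x, number_of_iterations).
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    v = x_prev.copy()  # v^(0) = x^(0)
    for k in range(1, max_iter + 1):
        theta = 2.0 / (k + 1)
        z = (1 - theta) * x_prev + theta * v  # extrapolated point ("y" in the text)
        x = prox_h(z - t * grad_g(z), t)      # a single proximal call per iteration
        v = x_prev + (x - x_prev) / theta
        if np.linalg.norm(x - x_prev) <= eps * max(np.linalg.norm(x_prev), 1.0):
            return x, k
        x_prev = x
    return x_prev, max_iter

# Usage on a small l1-regularized least-squares problem (hypothetical data).
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((40, 8)), rng.standard_normal(40), 0.5
grad = lambda x: A.T @ (A @ x - b)
prox = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - lam * t, 0.0)
x_hat, iters = apga(grad, prox, np.zeros(8), 1.0 / np.linalg.norm(A, 2) ** 2, eps=1e-10)
print(iters, np.round(x_hat, 4))
```

Each iteration makes exactly one call to `prox_h`, which matches the single-proximal-call property attributed to [29].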

RESULTS AND DISCUSSION
In this section, we consider two tasks: classifying two types of leukemia patients using a standard, publicly available data set, and recovering and separating an ECG signal from its noisy version. Both use APGA with overcomplete dictionaries, with ADMM and POA as baselines. ECG bio-signals are corrupted by noise from muscular movements, sensors, etc. We sparsely recover the components with high accuracy. However, ADMM does not perform well with complex-valued dictionaries [20].

Leukemia Diagnosis in Patients
We have taken the leukemia data set [19] and compared our algorithm with ADMM [27] in diagnosing 35 cancer patients, i.e., in determining which specific type of leukemia each patient has. We assign "1" to patients suffering from Acute Lymphocytic Leukemia and "0" to patients with Acute Myelogenous Leukemia. Fig. 1 shows that the APGA values are more spread out than the ADMM values, meaning that they divide the data more easily into two groups corresponding to the two types of leukemia. This indicates that APGA adapts better than ADMM to the linear regression model and to separating the values into two clusters. Tab. 1 shows the number of iterations and the run time taken by ADMM and APGA to recover the Dantzig selector β* for different values of the tuning parameter, which depends on the noise level.
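One way to make "more spread out" measurable is sketched below. It thresholds the fitted outputs at 0.5, midway between the labels 1 and 0, and computes a Fisher-style separation score (difference of class means over the pooled standard deviation). The helper `cluster_separation`, the threshold and the toy scores are hypothetical. They are neither the paper's data nor its metric.

```python
import numpy as np

def cluster_separation(scores, labels, threshold=0.5):
    """Return (accuracy at `threshold`, separation score) for fitted outputs of two classes."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    accuracy = float(np.mean((scores >= threshold).astype(int) == labels))
    a, b = scores[labels == 1], scores[labels == 0]
    pooled_std = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return accuracy, float((a.mean() - b.mean()) / pooled_std)

# Toy fitted values for three "1" patients and three "0" patients (made up).
acc, sep = cluster_separation([0.9, 0.8, 1.0, 0.1, 0.2, 0.0], [1, 1, 1, 0, 0, 0])
print(acc, round(sep, 3))
```

A larger separation score means the two groups of outputs are farther apart relative to their spread.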

Separation and Recovery of ECG Signal
A composite signal is generated by combining an ECG signal with a sparse spike signal. The ECG signal is defined on the samples x = 0, 1, 2, 3, ..., 1023. The sparse spike component is formed by choosing a set A of size a uniformly at random. The overcomplete dictionary Ψ is a concatenation of the discrete Fourier transform and the Euclidean basis (identity matrix). The composite signal is observed through $y = X\beta + \xi$. Here X is an m × n sensing matrix with entries drawn from a normal distribution and normalized columns, and ξ is a noise vector with zero mean and variance σ². The parameters are n = 1024 and m = 512, with stopping criterion ε = 10⁻⁶ and η = 6 for σ = 0.01. APGA is applied to the noisy incomplete observations $y = X\Psi s + \xi$ to separate the ECG signal from the sparse spike component. Fig. 2 shows simulation results for the recovery and separation of the ECG signal by APGA and by the Proximity Operator based Algorithm (POA) used in [19,20], together with the error between the original and recovered signals for each algorithm. The simulations were run in MATLAB R2016a on a system with an Intel i5-5200U CPU (2.20 GHz) and 8 GB RAM.
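The structure of this experiment can be mimicked on synthetic data. The sketch below is not the authors' MATLAB code, and it differs from the setup above in several labeled ways:

- a smaller size (n = 256, m = 128);
- an orthonormal DCT basis in place of the DFT, to keep the arithmetic real;
- a synthetic 5-sparse DCT component instead of the ECG;
- the ℓ1-penalized least-squares (LASSO) objective solved with the accelerated proximal gradient iteration, instead of the Dantzig constraint.

The values λ = 0.05 and the seed are arbitrary, and `fista_lasso` is a hypothetical helper name.

```python
import numpy as np
from scipy.fft import idct

def fista_lasso(A, y, lam, n_iter=3000):
    """Accelerated proximal gradient for min_s 0.5 ||y - A s||^2 + lam ||s||_1."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2  # step 1/L
    x_prev = np.zeros(A.shape[1])
    v = x_prev.copy()
    for k in range(1, n_iter + 1):
        theta = 2.0 / (k + 1)
        z = (1 - theta) * x_prev + theta * v
        w = z - t * (A.T @ (A @ z - y))
        x = np.sign(w) * np.maximum(np.abs(w) - lam * t, 0.0)  # soft threshold
        v = x_prev + (x - x_prev) / theta
        x_prev = x
    return x_prev

rng = np.random.default_rng(1)
n, m, sigma = 256, 128, 0.01
Phi1 = idct(np.eye(n), norm="ortho", axis=0)  # orthonormal DCT synthesis basis (stand-in for DFT)
Psi = np.hstack([Phi1, np.eye(n)])            # overcomplete dictionary [DCT, I]

s_true = np.zeros(2 * n)
smooth_idx = rng.choice(n, 5, replace=False)
spike_idx = rng.choice(n, 5, replace=False)   # random spike set A with a = 5
s_true[smooth_idx] = rng.choice([-1, 1], 5) * rng.uniform(1, 2, 5)
s_true[n + spike_idx] = rng.choice([-1, 1], 5) * rng.uniform(1, 2, 5)

X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=0)                # normal entries, normalized columns
y = X @ (Psi @ s_true) + sigma * rng.standard_normal(m)

s_hat = fista_lasso(X @ Psi, y, lam=0.05)
smooth_true, smooth_hat = Phi1 @ s_true[:n], Phi1 @ s_hat[:n]
rel_err = np.linalg.norm(smooth_hat - smooth_true) / np.linalg.norm(smooth_true)
found = set(np.argsort(np.abs(s_hat[n:]))[-5:].tolist())
true_spikes = set(spike_idx.tolist())
print(found == true_spikes, round(float(rel_err), 3))
```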
Tab. 2 shows the point-wise error at a few points for APGA and POA, together with the Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE) values.
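For reference, the sketch below uses common definitions of the three metrics. We assume these definitions because the paper does not state its exact formulas. MSE is the mean squared difference, SNR is the ratio of signal energy to error energy in dB, and PSNR uses the squared peak magnitude of the original signal. The toy vectors are made up.

```python
import numpy as np

def mse(x, x_hat):
    return float(np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2))

def snr_db(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return float(10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2)))

def psnr_db(x, x_hat):
    x = np.asarray(x, float)
    return float(10 * np.log10(np.max(np.abs(x)) ** 2 / mse(x, x_hat)))

x, x_hat = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(mse(x, x_hat), round(snr_db(x, x_hat), 3), round(psnr_db(x, x_hat), 3))  # 0.25 14.771 18.062
```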

CONCLUSIONS
APGA with overcomplete dictionaries is used for the separation of composite signals and for denoising. Fig. 1 and Pseudo Code 1 show that the APGA-based Dantzig selector model separates the data into clusters better than ADMM. The model is at least one order of magnitude faster than ADMM in simulation time. ADMM gives good enough results with real-valued dictionaries, but with complex-valued dictionaries such as the Fourier transform dictionary it has difficulty recovering the composite signals efficiently. Using APGA to find the Dantzig selector, combined with overcomplete dictionaries for source separation, is robust to the noise level and the problem size and works well with both real- and complex-valued dictionaries. The model is equally applicable to medical signal processing tasks such as the Electroencephalogram (EEG) and Electrocardiogram (ECG). Since the method is rooted in CS, improving the sparsity of the noisy signals would also allow it to be used for further compression of the signal.