Non-Rigid Registration via Global to Local Transformation

Non-rigid point set and image registration are key problems in many computer vision and pattern recognition tasks. Non-rigid registration is typically formulated as an optimization problem; however, its accuracy is limited by local optima. To address this problem, we propose a global-to-local transformation method for non-rigid point set registration, which can also be applied to infrared (IR) and visible (VIS) image registration. First, an objective function based on Gaussian fields is designed to cast non-rigid registration as an optimization problem. A global transformation model, which describes the regular pattern of non-linear deformation between point sets, is then proposed to achieve coarse registration at the global scale. Finally, with the coarse registration result as the initial value, a local transformation model is employed to perform fine registration using local features. The optimal global and local transformation models estimated from the edge points of IR and VIS image pairs are also used to achieve non-rigid image registration. Qualitative and quantitative comparisons demonstrate that the proposed method performs well under various types of distortion and produces accurate IR and VIS image registration results.


INTRODUCTION
Image registration is a fundamental problem in many applications, such as pattern recognition, image mosaicking and stereo vision, and especially multi-sensor image fusion [1][2][3][4]. All of these applications depend on successful image registration. Complementary information from multiple sensors offers richer and more comprehensive scene representations, which greatly benefits human visual perception as well as object detection, tracking and recognition. Because multi-sensor image registration is a prerequisite for image fusion, it has long been an active research topic [5][6][7][8][9][10].
However, because multi-sensor images such as infrared (IR) and visible (VIS) images are complementary, they share little mutual information: features present in VIS images are commonly lost in IR images. Thus, multi-sensor image registration, which is critical for fusion, remains a challenging task. One solution is to transform image registration into point set registration and then estimate the spatial transformation model from point features [5,11]. This paper focuses on point set registration as a means to achieve IR and VIS image registration.
The spatial transformation model is central to registration because it directly determines both registration accuracy and computational complexity. Various transformation models have been proposed for point set and image registration. Commonly used models include the rigid model, the affine model [12], the thin-plate spline (TPS) model [13], the B-spline model [14] and the transformation model within a vector-valued reproducing kernel Hilbert space (RKHS) [15].
A rigid transformation contains only translation and rotation. The affine transformation additionally includes scaling and skew. Because it has more degrees of freedom than the rigid model, the affine model is also applicable to non-rigid registration, and owing to its simplicity and good performance it has been widely used. The key idea is to estimate the coefficients of the affine model from correspondences between point sets. In [16], correspondence point pairs extracted from IR and VIS images by visual saliency and an optimization procedure based on random sample consensus (RANSAC) are used to calculate the coefficients of the affine transformation model. In [17], the optimal affine model is estimated from correspondences obtained by ranking geometric similarity and Euclidean distance. In [18], silhouette extraction is introduced to select matching point pairs between IR and VIS images so that the affine transformation coefficients can be determined. However, the affine transformation is linear and cannot produce accurate alignment when the deformation between point pairs is anisotropic, which occurs in many applications, especially multi-sensor image fusion [19,20]. Therefore, non-linear transformation models have been proposed to address this problem.
TPS is an effective non-linear, non-rigid transformation model. It can be cleanly decomposed into affine and non-affine subspaces while minimizing a bending energy, and can be regarded as a non-rigid extension of the affine transformation. Because TPS has more transformation coefficients than the affine model, the algorithms used to estimate the optimal TPS model are more complex. In [21], robust point matching (RPM) is developed with TPS as the parameterization of the non-rigid transformation and softassign for the correspondence. In [22], Speeded-Up Robust Features (SURF) are used to extract corresponding feature point pairs from which the optimal TPS coefficients are determined. Changcai Yang et al. [23] propose a self-adaptive weighted objective function based on a mixture model and use the expectation-maximization (EM) algorithm as the optimization procedure to achieve non-rigid point set registration. Because B-splines preserve the smoothness of non-linear deformation, they are commonly used to construct non-linear spatial transformation models. Suicheng Gu et al. [24] propose a B-spline affine transformation and use the iterative closest point (ICP) method to register three-dimensional (3D) volumetric computed tomography (CT) data. In [25], a B-spline transformation model is constructed by control point parameterization, and a gradient optimization procedure is applied to obtain the optimal B-spline basis coefficients. Wei Sun et al. [25] use lower-order B-spline basis functions and a random perturbation technique to implement efficient non-rigid registration.
Jiayi Ma et al. [15] propose a non-linear transformation model within an RKHS. With the RKHS-based transformation model, the displacement of points is determined by a high-dimensional coefficient vector constructed from control points in a neighborhood. Owing to its high flexibility, this model has been applied in various registration approaches. In [27], the L2-minimizing estimator (L2E) is used to estimate the RKHS-based transformation coefficients by building robust sparse and dense correspondences between point sets. Ref. [28] introduces an objective function built on Gaussian mixture models (GMMs) to measure registration performance under different transformation coefficients, and the EM algorithm is employed to determine the optimal solution.
The non-linear transformation models used in existing registration methods assume that the displacement of each point depends largely on local features in its neighborhood, and they describe the non-linear deformation between point pairs using feature points or control points. However, the optimization may then get trapped in local optima or converge prematurely. To address this problem, this paper proposes a global-to-local transformation method for non-rigid point set and image registration. An objective function is built with Gaussian fields to cast non-rigid registration as an optimization problem. To avoid convergence to local optima, registration proceeds in two stages, called coarse registration and fine registration. In the coarse registration stage, a global transformation model, which describes the regular pattern of non-linear deformation between all point pairs at the global scale, is designed and applied. With the coarse registration result as the initial value, a local transformation model is then employed to complete the registration in the fine registration stage. To achieve non-rigid image registration, edge maps are extracted from the IR and VIS images by the Canny edge detector, and the optimal image transformation model is then estimated from the edge maps using the global-to-local transformation strategy.
The primary contributions of this work are as follows: (1) A global transformation model, which describes the regular pattern of global deformation between point sets, is proposed for coarse registration to avoid convergence to local optima.
(2) A global-to-local transformation strategy is designed and applied to improve the accuracy of non-rigid registration.
The rest of the paper is organized as follows. In section 2, we present the details of the proposed method. Section 3 illustrates the proposed method on synthesized point set registration, and then tests it on real images under various scenarios with comparisons to other approaches. Finally, section 4 presents the concluding remarks for our work.

PROPOSED METHOD
Essentially, our method is a gradient-based optimization process for non-rigid registration, so this section focuses on the objective function, the global transformation, the local transformation and the optimization procedure. The objective function is a criterion that quantifies the accuracy of registration. In this paper, Gaussian fields are used to establish the objective function as follows.

Objective Function for Non-Rigid Registration
In Eq. (1), c(v_n, f(u_m)) denotes the correspondence between v_n and f(u_m), || · || denotes the L2 norm, and σ_d is a range parameter. The first term of Eq. (1) measures the Euclidean distance between corresponding points. The second term enforces smoothness on the non-rigid transformation function f, which keeps the optimization problem smooth. λ is a regularization constant that balances these two terms. The optimal transformation f is determined by minimizing Eq. (1).
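Because the exact expression of Eq. (1) is not reproduced in this text, the sketch below assumes a common Gaussian-field form for its data term: each pair (v_n, f(u_m)) contributes c(v_n, f(u_m)) · exp(−||v_n − f(u_m)||² / σ_d²), and the negated sum is minimized together with a smoothness penalty weighted by λ. The function name and the scalar `smooth_penalty` argument are hypothetical placeholders; only the defaults σ_d = 5 and λ = 0.1 come from the experimental settings.

```python
import numpy as np

def gaussian_field_energy(V, FU, C, sigma_d=5.0, lam=0.1, smooth_penalty=0.0):
    """Assumed Gaussian-field objective (a sketch, not Eq. (1) verbatim).

    V: (N, 2) target points v_n; FU: (M, 2) transformed source points f(u_m);
    C: (M, N) correspondence weights c(v_n, f(u_m));
    smooth_penalty: value of a hypothetical smoothness term for f.
    """
    # Squared Euclidean distance between every f(u_m) and every v_n, shape (M, N).
    d2 = ((FU[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian field: close pairs contribute about 1, distant pairs about 0.
    overlap = (C * np.exp(-d2 / sigma_d**2)).sum()
    # Negated so that minimizing the energy maximizes the weighted overlap.
    return -overlap + lam * smooth_penalty
```

With perfectly aligned points and one-to-one weights the energy reaches its minimum of −M; any misalignment raises it smoothly, which is what makes gradient-based optimization possible.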
Let C denote the correspondence matrix between U and V, with C_mn = c(v_n, f(u_m)). Ideally, C is a binary matrix: C_mn = 1 if v_n corresponds to u_m, and C_mn = 0 otherwise. However, it is generally very difficult to obtain the true correspondence matrix without manual annotation. Hence, we need a model that indicates the potential correspondence between the two point sets. In this paper, C_mn is defined by Eq. (2), where R is an attribute matrix that indicates the similarity between U and V. In Eq. (2), the row and column summation constraints guarantee that the correspondence is one-to-one. Because this work focuses on 2D point registration, shape context [29] is used to construct the attribute matrix R, whose entries R_mn are given by Eq. (3), where S_i(u_m) and S_i(v_n) denote the i-th bin of the I-bin normalized histograms at u_m and v_n, respectively. Substituting Eq. (2) into Eq. (1), the objective function becomes Eq. (4), where w_mn is defined by Eq. (5).
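Eqs. (2) and (3) are not reproduced here either, so the following sketch shows one common way to realize the ingredients described: log-polar shape-context histograms [29], the usual χ² cost between normalized histograms for R_mn, and alternating row/column normalization (Sinkhorn iterations) as one way to impose the summation constraints. The bin layout, the temperature `tau` and all function names are assumptions, not the authors' settings.

```python
import numpy as np

def shape_context(P, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar shape-context histograms, one row per point (assumed variant of [29])."""
    n = len(P)
    diff = P[None, :, :] - P[:, None, :]              # diff[i, j] = P[j] - P[i]
    dist = np.linalg.norm(diff, axis=-1)
    dist = dist / dist[dist > 0].mean()                # normalize for scale invariance
    theta = np.mod(np.arctan2(diff[..., 1], diff[..., 0]), 2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.searchsorted(r_edges, dist) - 1         # -1 or n_r: outside the grid
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    H = np.zeros((n, n_r * n_theta))
    for i in range(n):
        ok = (r_bin[i] >= 0) & (r_bin[i] < n_r) & (np.arange(n) != i)
        np.add.at(H[i], r_bin[i, ok] * n_theta + t_bin[i, ok], 1.0)
    return H / np.maximum(H.sum(axis=1, keepdims=True), 1.0)  # normalized histograms

def chi2_cost(HU, HV, eps=1e-12):
    """R[m, n] = 0.5 * sum_i (S_i(u_m) - S_i(v_n))^2 / (S_i(u_m) + S_i(v_n))."""
    a, b = HU[:, None, :], HV[None, :, :]
    return 0.5 * ((a - b) ** 2 / (a + b + eps)).sum(axis=-1)

def soft_correspondence(R, tau=0.2, n_iter=50):
    """Hypothetical soft assignment: alternately normalize rows and columns of exp(-R / tau)."""
    C = np.exp(-R / tau)
    for _ in range(n_iter):
        C /= C.sum(axis=1, keepdims=True)   # row sums -> 1
        C /= C.sum(axis=0, keepdims=True)   # column sums -> 1
    return C
```

Because the histograms are normalized by the mean pairwise distance, a scaled and translated copy of a point set yields essentially identical histograms, so the minimum of each row of R falls on the true partner.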

Global Transformation Model
The global transformation model is represented by Eq. (6), where θ is the rotation angle of the affine transformation, s_x and s_y are the scaling coefficients, and t_x and t_y are the translation coefficients. G(x, y, ξ) represents the non-linear transformation model and is formulated in Eq. (7). As Eq. (7) shows, the non-linear transformation consists of quadratic, cubic and quartic polynomial terms, and ξ is the coefficient vector of this polynomial transformation. σ is a constant that balances the affine and polynomial transformation models; we set σ = 1×10^-4 throughout this work.
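The exact form of Eqs. (6) and (7) is not reproduced here, but the monomials listed in s_m^G in the Optimization subsection fix the polynomial part. The following is a minimal sketch, assuming the affine part is a scaled rotation diag(s_x, s_y)·R(θ) followed by translation, and that ξ is stored as a 12×2 matrix `Xi` with one column per output coordinate; both choices and the function names are hypothetical.

```python
import numpy as np

def monomials(P, sigma=1e-4):
    """sigma-scaled quadratic-to-quartic terms, in the order of s_m^G."""
    x, y = P[:, 0], P[:, 1]
    return sigma * np.stack([x**4, y**4, x * y**3, x**3 * y, x**2 * y**2,   # quartic
                             x**3, y**3, x * y**2, x**2 * y,                # cubic
                             x**2, y**2, x * y], axis=1)                    # quadratic

def global_transform(P, theta, sx, sy, tx, ty, Xi, sigma=1e-4):
    """Hypothetical f_G: scaled rotation + translation + polynomial warp.

    P: (M, 2) points; Xi: (12, 2) polynomial coefficients.
    """
    c, s = np.cos(theta), np.sin(theta)
    A = np.diag([sx, sy]) @ np.array([[c, -s], [s, c]])  # assumed composition order
    return P @ A.T + np.array([tx, ty]) + monomials(P, sigma) @ Xi
```

Setting `Xi` to zero reduces the model to a pure affine map, which mirrors the description of the global model as an affine transformation extended by a polynomial term whose influence is damped by σ.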

Local Transformation Model
In this work, the RKHS-based transformation in [15] is employed as the local transformation model, because it can be optimized using the local neighborhood structure. The local transformation model f_L is defined in Eq. (10), which is built on a diagonal Gaussian kernel.
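A minimal sketch of an RKHS-style local model in the spirit of [15], assuming the common form f_L(p) = p + Σ_k k(p, c_k)·T_k with a scalar Gaussian kernel k multiplied by the identity (which makes the matrix-valued kernel diagonal). The control points `ctrl`, the width `beta` and its parameterization are assumptions; [15] may scale the kernel differently.

```python
import numpy as np

def gaussian_kernel(X, Y, beta=0.2):
    """k(x, y) = exp(-||x - y||^2 / beta^2); beta is a hypothetical width."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / beta**2)

def local_transform(P, ctrl, T, beta=0.2):
    """Hypothetical f_L(p) = p + sum_k k(p, ctrl_k) * T[k].

    P: (M, 2) points; ctrl: (K, 2) control points; T: (K, 2) coefficients.
    The diagonal kernel applies the same scalar weight to the x and y offsets.
    """
    return P + gaussian_kernel(P, ctrl, beta) @ T
```

Each point's offset is a kernel-weighted blend of nearby coefficients, so distant control points have essentially no influence; this locality is what lets the fine stage adjust individual points within their neighborhoods.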

Optimization
The objective functions (9) and (11) are based on Gaussian fields, so they are continuously differentiable with respect to the transformation coefficients Q and T, respectively. The derivative of Eq. (9) with respect to Q is given by Eq. (12), where
s_m^G = [σx^4, σy^4, σxy^3, σx^3y, σx^2y^2, σx^3, σy^3, σxy^2, σx^2y, σx^2, σy^2, σxy]^T,
and the derivative of Eq. (11) with respect to T is given by Eq. (14). With Eqs. (12) and (14), a gradient-based numerical optimization technique can be used to determine the optimal transformation coefficients. In this paper, the quasi-Newton method is employed to solve the optimization problem. However, the quasi-Newton method is sensitive to the initial value; with a poor initialization, the optimization converges to a local optimum. Thus, a coarse-to-fine strategy with global-to-local transformation is designed to improve the chance of reaching the global minimum. In the coarse registration stage, the corresponding objective function is minimized using the global transformation model. In the fine registration stage, with the coarse registration result as the initial value, the final registration is achieved by minimizing the fine-registration objective function with the local transformation model. Because the global transformation model combines affine and polynomial transformations, the offsets of all points conform to a regular pattern, so coarse registration finds the optimal transformation at the global scale. Because the local transformation model is constructed from local features, the individual offset of every point is optimized within its neighborhood in the fine registration stage. The initial value estimated by coarse registration thus improves the performance of fine registration. The proposed method is outlined in Algorithm 1.
Algorithm 1
--Coarse registration:
2 Initialize the coefficient vector Q of the global transformation;
3 Using the derivative (12), optimize the objective function (9) by the quasi-Newton method to obtain the optimal transformation coefficient vector Q°;
4 Compute the transformation result f_G(U) using Eq. (8);
--Fine registration:
5 With f_G(U) as input, initialize the coefficient matrix T of the local transformation;
6 Using the derivative (14), optimize the objective function (11) by the quasi-Newton method to obtain the optimal transformation coefficient matrix T°;
7 Compute the final transformation result F = f_L(f_G(U)) using Eq. (10).
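The two-stage procedure of Algorithm 1 can be sketched end to end as follows. This is an illustrative toy version, not the authors' Matlab implementation: it uses SciPy's BFGS (a quasi-Newton method) with finite-difference gradients instead of the analytic derivatives (12) and (14), assumes a Gaussian-field data term and an RKHS-norm regularizer tr(TᵀKT) for the fine stage, takes the correspondence weights C as given, and uses hypothetical parameter values chosen for unit-scale coordinates.

```python
import numpy as np
from scipy.optimize import minimize

SIGMA_D, LAM, SIGMA_POLY, BETA = 0.5, 0.1, 1e-4, 0.5  # hypothetical settings

def energy(FU, V, C, sigma_d=SIGMA_D):
    """Negative weighted Gaussian-field overlap (assumed form of the data term)."""
    d2 = ((FU[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return -(C * np.exp(-d2 / sigma_d**2)).sum()

def poly_terms(P):
    """sigma-scaled monomials in the order of s_m^G."""
    x, y = P[:, 0], P[:, 1]
    return SIGMA_POLY * np.stack([x**4, y**4, x * y**3, x**3 * y, x**2 * y**2, x**3, y**3,
                                  x * y**2, x**2 * y, x**2, y**2, x * y], axis=1)

def f_global(q, P):
    """q = [theta, sx, sy, tx, ty, 24 polynomial coefficients] (assumed layout)."""
    th, sx, sy, tx, ty = q[:5]
    rot = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    A = np.diag([sx, sy]) @ rot
    return P @ A.T + np.array([tx, ty]) + poly_terms(P) @ q[5:].reshape(12, 2)

def kernel(X, Y):
    return np.exp(-((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / BETA**2)

def coarse_to_fine(U, V, C):
    # Coarse stage: BFGS over the global model, starting from the identity map.
    q0 = np.concatenate([[0.0, 1.0, 1.0, 0.0, 0.0], np.zeros(24)])
    q = minimize(lambda q: energy(f_global(q, U), V, C), q0, method="BFGS").x
    UG = f_global(q, U)
    # Fine stage: kernel-weighted local offsets around the coarse result.
    K = kernel(UG, UG)

    def fine_obj(t):
        T = t.reshape(-1, 2)
        # tr(T^T K T) is the usual RKHS norm of the displacement field.
        return energy(UG + K @ T, V, C) + LAM * np.sum(T * (K @ T))

    t = minimize(fine_obj, np.zeros(UG.size), method="BFGS").x
    return UG, UG + K @ t.reshape(-1, 2)
```

On a rotated and translated toy set with known correspondences (C = I), the coarse stage alone removes most of the misalignment, and the fine stage cannot increase the data term because it starts from the coarse result and its regularizer is non-negative.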

EXPERIMENT
To evaluate our method, we first test it on synthesized point sets and then on real IR and VIS images. Our method requires four parameters: β, λ, σ_d and σ_s. We set β = 0.2, λ = 0.1, σ_d = 5 and σ_s = 0.3 throughout this work.
The algorithms are implemented in Matlab and run on a computer with a 3.9 GHz Intel Core CPU and 4 GB of memory. The average runtime of the proposed method is about 35 seconds for point set registration and about 1 minute for IR and VIS image registration, excluding edge discretization.

Results on Point Set Registration
The synthesized data sets were constructed by Chui and Rangarajan [21] and consist of two point sets, which we call Fish and Fu in this paper. The Fish point set contains 98 points and the Fu point set contains 105 points, as shown in Fig. 1. Fig. 1 also reports the registration results with the global, local and global-to-local transformation models. The results obtained with the global transformation alone or the local transformation alone are not accurate enough, which indicates that optimization with either model alone easily falls into a local optimum and fails to reach the global minimum. Compared with the global transformation, registration with the local transformation is more easily degraded by local convergence, especially for the Fish point set. This shows that the global transformation model used in coarse registration helps prevent the optimization from falling into local optima. Our method clearly produces more accurate alignment, which demonstrates that a good initial value estimated by coarse registration improves the performance of fine registration. To evaluate the proposed method further, we compare it with CPD [32], RGF [15] and RPM-L2E [27], which are state-of-the-art non-rigid registration methods. The Matlab codes of these algorithms are provided by their authors.
Five degradation models, namely deformation, noise, occlusion, outliers and rotation, are employed to test the performance of these non-rigid registration algorithms. To analyze the influence of point set distortion, each degradation model is used to generate target sets from each point set, and 100 samples are created for each degradation level. Some examples from the Fish and Fu point sets are shown in the top rows of Fig. 2 and Fig. 3. The qualitative results of the various registration methods are also shown in Fig. 2 and Fig. 3. CPD performs well under the deformation, noise and occlusion models, especially for the Fish point set, but fails under the outlier and rotation models.
RGF frequently falls into local optima and does not achieve good point registration. RPM-L2E performs well under the deformation, noise, occlusion and outlier models, especially for the Fu point set; however, it cannot adequately handle rotation. Compared with CPD, RGF and RPM-L2E, the proposed method produces the best registration results under the various degradation models, which suggests that the global-to-local transformation strategy has a good chance of reaching the global optimum.
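The five degradation types can be mimicked with simple generators for experimentation. The sketch below is hypothetical: the meaning of each `level` (noise standard deviation, outlier ratio, occluded fraction, rotation angle in degrees, warp amplitude) is illustrative and is not the protocol of [21] or of this paper.

```python
import numpy as np

def degrade(P, kind, level, rng):
    """Hypothetical generators for the five degradation types; P is (M, 2)."""
    if kind == "noise":        # Gaussian jitter with standard deviation `level`
        return P + rng.normal(scale=level, size=P.shape)
    if kind == "outlier":      # append round(level * M) uniform outliers in the bounding box
        k = int(round(level * len(P)))
        return np.vstack([P, rng.uniform(P.min(0), P.max(0), size=(k, 2))])
    if kind == "occlusion":    # drop the fraction `level` of points nearest a random seed point
        seed = P[rng.integers(len(P))]
        order = np.argsort(np.linalg.norm(P - seed, axis=1))
        return P[np.sort(order[int(round(level * len(P))):])]
    if kind == "rotation":     # rotate about the centroid by `level` degrees
        t = np.deg2rad(level)
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        c = P.mean(0)
        return (P - c) @ R.T + c
    if kind == "deformation":  # smooth sinusoidal warp with amplitude `level`
        x, y = P[:, 0], P[:, 1]
        return P + level * np.stack([np.sin(np.pi * y), np.sin(np.pi * x)], axis=1)
    raise ValueError(f"unknown degradation: {kind}")
```

Calling `degrade` 100 times per level with a shared `rng` would reproduce the "100 samples per degradation level" structure of the experiment, although the actual distortion parameters used in the paper are not given here.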
Fig. 4 reports the quantitative results of the registration methods. In most cases, our method has the lowest mean and standard deviation of registration error among the compared approaches, and its advantage becomes increasingly obvious as the degradation level increases. This is because the global-to-local optimization helps to estimate the non-rigid transformation robustly.

Results on IR and VIS Image Registration
In this section, our method is tested on four IR and VIS image pairs, namely Stairs, Building, Car and Block, from the CVC datasets [30]. All images have a resolution of 320×240. The performance of our method is compared with that of the state-of-the-art non-rigid registration methods CPD, RGF and RPM-L2E, all of which estimate the transformation model from point features.
Edge maps are obtained by the Canny edge detector, and the sampling method introduced in [29] is employed to discretize each edge map into a point set. The goal is then to register the IR images to the VIS images. To present the registration results visually, a simple image fusion method based on a bilateral filter is used: the detail layer is extracted from the VIS image and fused with the corresponding IR image by average fusion.
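For the discretization step, the following is a minimal sketch of one common reading of the sampling in [29]: draw a random subset of edge pixels, then repeatedly discard one point of the closest remaining pair so that the samples end up roughly evenly spaced along the edges. The binary edge map is assumed to come from any Canny implementation; the 3× oversampling factor and the function name are assumptions.

```python
import numpy as np

def sample_edge_points(edge_map, n_points, rng):
    """Discretize a binary edge map into about n_points (x, y) samples.

    Assumed procedure: keep at most 3 * n_points random edge pixels, then
    repeatedly drop one point of the closest remaining pair.
    """
    ys, xs = np.nonzero(edge_map)
    pts = np.stack([xs, ys], axis=1).astype(float)
    if len(pts) > 3 * n_points:
        pts = pts[rng.choice(len(pts), 3 * n_points, replace=False)]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # ignore self-distances
    keep = np.ones(len(pts), dtype=bool)
    while keep.sum() > n_points:
        i, j = np.unravel_index(np.argmin(d), d.shape)
        keep[j] = False                          # drop one member of the closest pair
        d[j, :] = np.inf
        d[:, j] = np.inf
    return pts[keep]
```

Thinning the closest pairs first removes clusters of redundant pixels while preserving coverage of the whole contour, which is a reasonable property for point sets that feed a Gaussian-field objective.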
The qualitative results of edge map registration and image registration are shown in Fig. 5, where the edge maps of the IR images are shown in blue and those of the VIS images in red. CPD does not register the Block image pair well. RGF fails on the Stairs image pair. RPM-L2E cannot correctly register the Stairs and Building image pairs. In contrast, our method correctly registers all IR images to their corresponding VIS images, and its results are better than those of the other methods on all tested image pairs, especially Stairs, Building and Block. This is because the transformation models estimated from point features with the global-to-local strategy generalize well and reach the global optimum. Hence, the qualitative comparison demonstrates that our method consistently produces more accurate registration results than the state-of-the-art methods.
Next, we quantitatively compare CPD, RGF, RPM-L2E, our method with the affine model, and our method on the dataset. For each IR and VIS image pair, a set of point correspondences is constructed manually as ground truth and used to calculate the recall of the registration results. Fig. 6 reports the quantitative comparisons of the five methods on the four image pairs. The quantitative results agree well with the qualitative results. The average matching errors of CPD, RGF, RPM-L2E and our method are about 7.56, 6.61, 6.57 and 5.69 pixels, respectively, while the average matching error of our method with the affine model is about 8.87 pixels. This confirms the advantage of a non-rigid transformation model for IR and VIS image registration. Moreover, our method has the lowest matching error, and its recall curves lie mostly above those of the other methods on all image pairs. This demonstrates that the proposed global-to-local transformation strategy improves the accuracy of IR and VIS image registration.
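The text does not define exactly how matching error and recall are computed from the manual correspondences. One natural definition, given here as an assumption, is the mean Euclidean distance between transformed IR ground-truth points and their VIS partners, with recall at threshold τ being the fraction of pairs whose error is at most τ; sweeping τ traces a recall curve. The function name and signature are hypothetical.

```python
import numpy as np

def matching_error_and_recall(transform, src_gt, dst_gt, thresholds):
    """Hypothetical evaluation on manual ground-truth correspondences.

    transform: callable mapping (K, 2) IR points into the VIS frame.
    Returns the mean matching error (pixels) and, for each threshold, the
    fraction of ground-truth pairs whose error is within that threshold.
    """
    err = np.linalg.norm(transform(src_gt) - dst_gt, axis=1)
    recall = np.array([(err <= t).mean() for t in thresholds])
    return err.mean(), recall
```

For example, with errors of 5, 0 and 10 pixels the mean error is 5 pixels and the recall at thresholds 0, 5 and 10 pixels is 1/3, 2/3 and 1.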

CONCLUSION
In this paper, a global-to-local transformation method is proposed for non-rigid point set registration and non-rigid registration of IR and VIS images. The global transformation model describes the regular pattern of non-rigid deformation and achieves coarse registration at the global scale. With the coarse registration result as the initial value, the final point set registration is completed by fine registration with the local transformation model. For IR and VIS images, the optimal transformation models are estimated from edge points using the global-to-local strategy. Experiments on point sets and real images show that the proposed method performs well on non-rigid registration; its average matching error is at least 13.4% lower than those of the state-of-the-art methods.
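The 13.4% figure follows from the average matching errors reported for IR and VIS registration; the smallest relative reduction is the one against RPM-L2E:

```latex
\frac{6.57 - 5.69}{6.57} = \frac{0.88}{6.57} \approx 13.4\% \quad (\text{RPM-L2E}), \qquad
\frac{6.61 - 5.69}{6.61} = \frac{0.92}{6.61} \approx 13.9\% \quad (\text{RGF}), \qquad
\frac{7.56 - 5.69}{7.56} = \frac{1.87}{7.56} \approx 24.7\% \quad (\text{CPD}).
```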