Study of Camera Spectral Reflectance Reconstruction Performance using CPU and GPU Artificial Neural Network Modelling

Abstract: Reconstruction of reflectance spectra from camera RGB values is possible if the characteristics of the illumination source, optics and sensors are known. If they are not, this information has to be acquired in some other way. If, alongside the pictures taken, RGB values of colour patches with known reflectance spectra are obtained under the same illumination conditions, reflectance reconstruction models can be created based on artificial neural networks (ANN). In Matlab, multilayer feedforward networks can be trained with different algorithms. In our study we hypothesised that the scaled conjugate gradient backpropagation (BP) algorithm, when executed on a graphics processing unit (GPU), is very fast, but does not match the Levenberg-Marquardt (LM) algorithm in convergence and performance; the LM algorithm, on the other hand, executes only on the CPU and is therefore much more time-consuming. We also presumed that a correlation between the two algorithms exists and manifests itself through the dependence of the MSE on the number of hidden layer neurons, so that the faster BP algorithm could be used to narrow the search span within which the LM algorithm then finds the best ANN for reflectance reconstruction. The conducted experiment confirmed the speed superiority of the BP algorithm, as well as the better convergence and reconstruction accuracy of the LM algorithm. The correlation of reflectance recovery results between ANNs trained with the two algorithms was confirmed, and a strong correlation was found between the 3rd-order polynomial approximations of the LM and BP test performances, for both the mean and the best performance.


INTRODUCTION
Colour information of an object surface, or of a coloured patch sample, can be described in many device-dependent or device-independent ways. An ideal and unambiguous description should represent colour independently of the capturing device characteristics. A widely used approach involves the specification of colour with three components in CIE 1931 XYZ colour space, where the illuminant spectral power distribution must be known. These tristimulus values are, however, illuminant-dependent: they describe the colour as perceived under a specific light source. By defining the colour in a spectral space via surface reflectance, the light source dependency is removed. During the last decades, many research efforts have been put into studying methods for converting camera readings into spectral data. Approaches vary and are based on either trichromatic or multispectral imaging systems and numerous methods, from strictly mathematical approaches of finding a model that conforms to training data and all the constraints, to other optimisation methods, among which artificial neural networks (ANNs) have gained a lot of popularity in recent years [1][2][3][4][5][6][7][8].
Much research has been done in order to find appropriate mathematical methods for estimating the spectral reflectance. Mansouri, Sliwa and Hardeberg proposed an algorithm that makes principal component analysis (PCA) adaptive in the framework of reflectance recovery from colour camera tristimulus values [9]. Bianco developed a local optimisation-based method to recover reflectance spectra from tristimulus values using adaptive estimation with metameric shape correction [10]. Zhang and Xu assumed that three PCs are not sufficient to accurately estimate the spectral reflectance; the spectral space was therefore first divided into 11 subgroups, and the PCs were calculated for each subgroup individually. The number of PCs was then further extended from three to nine through the residual spectral error of the reflectance in each subgroup [11]. Koh, Moroney and Gottwals, testing their irregular-sampling void-filling algorithm in a mobile colour sensing application, used an in-scene camera-captured colour chart with known colourimetric data to perform a dynamic tetrahedral tessellation in RGB space. The selected region of interest of RGB values was then compared with the tessellation, the enclosing tetrahedron was found, and tetrahedral interpolation was performed to transform the RGB values into the device-independent CIELAB space [12].
Similarly, several studies were conducted focusing on applying ANNs to reflectance reconstruction. Usui, Nakauchi and Nakano built a wine-glass-type five-layered network with a bottleneck three-neuron hidden middle layer to extract three-dimensional colour attributes, where the inputs and outputs of the network were the same spectral data of Munsell colour chips [13] as in our study. Mansouri, Marzani and Gouton proposed a cascade of ANNs in the form of auto-associative and hetero-associative memory to build robust reflectance reconstruction systems whose inputs were readings from a multispectral camera [5]. Cheung, Westland, Connah and Ripamonti compared the usefulness of two camera characterisation approaches. Both polynomial transforms and ANNs exhibited similar performance in mapping camera outputs to CIE tristimulus values, but the authors concluded that ANNs can be difficult and time-consuming to train [14].
After the turn of the millennium, the rapid development of computer graphics cards offered a platform for massively parallel processing on the graphics processing unit (GPU). ANNs are typical parallel computing applications, and as such the power of new GPUs can be efficiently harnessed for a remarkable speedup of ANNs and their training algorithms [15][16][17][18][19]. Different types of ANNs are known in terms of architecture and learning algorithms, and many new ones have been proposed, especially in the field of deep learning [19][20][21][22][23]. ANNs are usually programmed and run on conventional computers with appropriate simulators installed, but also on supercomputers, on custom neural circuits, i.e. application-specific integrated circuits (ASIC), and on other forms of computer configurations such as neural accelerators and neuromorphic computers [24]. Training of a larger ANN can be time-consuming, and many efforts have been made to reduce the training time.
ANNs can serve many different purposes. Common examples include classification and function approximation. A typical ANN has one input layer, one output layer and one or more hidden layers in-between. Each layer consists of several basic components: neurons. Usually the number of inputs is larger than the number of outputs, as in the case of picture classification, where inputs can be image pixel values or image features and outputs are the names of individual classes. Similarly, when dealing with function approximation, inputs can be reflectances at equidistant wavelengths and outputs can be CIE XYZ values that unambiguously describe a specific colour.
Alternatively, if the task of an ANN is the reconstruction of spectral values from a 3D colour space, the topology of the ANN is reversed, having a small number of inputs and many outputs, each representing one spectral component at a specific wavelength. When the input colours are precisely defined in a device-independent colour space, such as CIE XYZ or, if the light source characteristics are known, CIE Lab, an ANN that computes spectral components from these inputs can be trained efficiently and unambiguously. However, if we only have RGB values retrieved from camera readings, the spectral power distribution of the light source is not known, and the characteristics of the camera optics and sensors are not available, an alternative approach has to be adopted. In this case, to reconstruct reflectances, additional information about the lighting conditions and the camera has to be taken into account implicitly. One possibility, which we used in our experiment, is an array of colour patches with known reflectances that is photographed under the same conditions as the object itself. The known reflectances, along with the RGB values obtained from the array of patches, become a new learning set used to train an ANN for these specific conditions; if training is successful, the reflectance of an unknown object can be reconstructed using the newly trained ANN.

REFLECTANCE RECONSTRUCTION METHOD
It has been proven that multilayer feedforward networks are, under very general conditions on the hidden unit activation function, universal approximators provided that sufficiently many hidden units are available [25], and Cybenko [26] demonstrated that any continuous function can be uniformly approximated by a continuous neural network with a single internal hidden layer and an arbitrary continuous sigmoidal nonlinearity. Taking this into account, we modelled the ANN with an input layer receiving the RGB input vector, one hidden layer with a varying number of neurons, and an output layer producing an output vector of reflectance values at equidistant 10 nm steps of wavelength from 380 nm to 730 nm. The network map is depicted in Fig. 1, where x_{k,j} are the k-th inputs (RGB), b_j^{(L)} and w_{j,i}^{(L)} denote biases and connection weights, and y_{k,j} are the values of the output layer neurons, i.e. the spectral components of the k-th reconstructed reflectance; j is the neuron index in the L-th layer and i the index in the previous layer.
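As a minimal illustration, the forward pass of such a single-hidden-layer network can be sketched in Python/NumPy (this is not the Matlab toolbox code; the tanh hidden activation and linear output layer are assumptions matching common Matlab fitnet defaults):

```python
import numpy as np

def forward(rgb, W1, b1, W2, b2):
    """Forward pass of the single-hidden-layer network: 3 RGB inputs in,
    36 spectral components (380-730 nm, 10 nm steps) out."""
    h = np.tanh(W1 @ rgb + b1)   # hidden layer activations
    return W2 @ h + b2           # linear output: reconstructed reflectance

# Toy dimensions: 3 inputs, 20 hidden neurons, 36 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 3)), np.zeros(20)
W2, b2 = rng.normal(size=(36, 20)), np.zeros(36)
y = forward(np.array([0.4, 0.5, 0.2]), W1, b1, W2, b2)
print(y.shape)  # (36,)
```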
Having a learning set of (RGB, reflectance) pairs from the photographed set of patches, the supervised training of the network is done by adjusting the connection weights and biases so as to minimise the network cost function, which in our case is the spectral mean squared error (Eq. (1)):

$$\mathrm{MSE} = \frac{1}{N_p N_\lambda} \sum_{k=1}^{N_p} \sum_{j=1}^{N_\lambda} \left(r_{k,j} - y_{k,j}\right)^2 \qquad (1)$$

where N_p is the number of patches, N_λ the number of reflectance wavelengths, k the patch index and j the wavelength index; r_{k,j} is the j-th spectral component of the k-th patch's measured reflectance and y_{k,j} the j-th spectral component of the k-th patch's reconstructed reflectance.
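The cost of Eq. (1) can be computed in a few lines; this is a NumPy sketch, not the toolbox implementation:

```python
import numpy as np

def spectral_mse(R, Y):
    """Cost function of Eq. (1): average squared error over all patches
    (rows) and all wavelengths (columns).  R holds measured spectra,
    Y the corresponding reconstructed spectra."""
    N_p, N_lam = R.shape
    return np.sum((R - Y) ** 2) / (N_p * N_lam)

# Two patches, two wavelengths: errors are 0, 0.1, 0.1, 0.
R = np.array([[0.2, 0.4], [0.6, 0.8]])
Y = np.array([[0.2, 0.5], [0.5, 0.8]])
print(spectral_mse(R, Y))  # 0.005
```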
For the purpose of function approximation, Matlab supports backpropagation (BP) network training algorithm based on standard gradient descent which can be run on a GPU and is thus very fast but also supports Levenberg-Marquardt algorithm (LM) which is specifically designed to minimise sum-of-square error functions [27]. As stated in the hypotheses below, we assume that LM learning of ANN leads to a better convergence of the cost function and hopefully to a smaller output error and an improved reflectance spectra reconstruction.
LM training updates the ANN's parameters by combining standard gradient descent and the Newton method (Eq. (2)):

$$w_{j+1} = w_j - \left(J^T J + \lambda I\right)^{-1} J^T e \qquad (2)$$

where w_{j+1} is the new and w_j the previous vector of network parameters (weights and biases), e is the network output error vector, and J is the Jacobian matrix of the errors with respect to the parameters (Eq. (3)). If the parameter λ is large, the method acts like standard gradient descent, while if λ is small, the LM algorithm performs like the Newton method. Even though the Hessian matrix in the algorithm is approximated by the Jacobian product (H ≈ J^T J), the LM algorithm requires a lot of computer memory and is not adapted to run on a GPU.
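The update of Eq. (2) can be sketched as follows (a NumPy illustration on a toy linear least-squares problem; the full LM algorithm also adapts λ between iterations, which is omitted here):

```python
import numpy as np

def lm_step(w, e, J, lam):
    """One Levenberg-Marquardt update (Eq. (2)):
    w_new = w - (J^T J + lam*I)^(-1) J^T e.
    A large lam gives a gradient-descent-like step, a small lam a
    Newton-like step; J^T J approximates the Hessian (H ~ J^T J)."""
    H = J.T @ J                  # Gauss-Newton Hessian approximation
    g = J.T @ e                  # gradient of 0.5 * ||e||^2
    return w - np.linalg.solve(H + lam * np.eye(len(w)), g)

# Toy demo: the iteration converges to the least-squares solution [1, 1].
J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
t = np.array([1.0, 2.0, 2.0])
w = np.zeros(2)
for _ in range(50):
    e = J @ w - t                # residual for the current parameters
    w = lm_step(w, e, J, lam=0.1)
print(np.round(w, 3))  # [1. 1.]
```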
If there exists a correlation between the standard BP training algorithm, adapted to run on the GPU, and LM, the former can be used to find a narrower span of the number of hidden layer neurons (HLN). The pursuit of a better-performing ANN model would then continue within this span with the much slower but more efficient LM algorithm. If the correlation exists, it should manifest itself through the dependence of the MSE on the number of HLN.
Our hypotheses in this study are as follows:
1. The standard BP training algorithm converges slowly, but when running on a highly parallel GPU it is computationally fast.
2. The LM training algorithm runs on the CPU and becomes computationally time-consuming with an increasing number of neurons, but converges much better than BP and gives better reflectance reconstruction results.
3. A correlation between the BP and LM learning algorithms manifests itself through the dependence of the MSE on the number of HLN.
4. Running the BP training algorithm on the GPU can successfully narrow the search span of the number of HLN, and training ANNs within this HLN span with the LM algorithm results in a better-performing reflectance reconstruction model.

MATERIALS AND METHODS
Our research focuses on recovering reflectance spectra from trichromatic camera values by supervised learning of a multilayer perceptron, realised with the Matlab Neural Network Toolbox on a general-purpose personal computer. The inputs of the ANN model are the RGB values of camera readings, and the outputs are vectors of reflectance spectral components, i.e. readouts of the spectrophotometer.
As a source of colourimetric data for the experiment, we selected the Munsell Book of Color Matte Collection, which includes 44 sheets with 1301 patches of different chroma, value and hue, including many natural colours of soil, skin, foliage and neutral tones. 40 sheets cover the Munsell hue circle in 2,5 steps (2,5; 5; 7,5; 10 for R, YR, Y, GY, G, BG, B, PB, P and RP), and four sheets contain neutral colours: neutral and lightly hued greys. Each patch was measured with an i1Pro 2 spectrophotometer (X-Rite) at five points (near the corners and at the patch centre). Reflectance vectors with 107 components for wavelengths from 376,66 nm to 730 nm with a step size of 3,33 nm were obtained, with a maximum standard deviation over the five measurements of less than 0,4%. The averages of the five patch measurements were computed, and the reflectances were downsampled to 36 components from 380 nm to 730 nm.
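The downsampling step from the 107-component spectrophotometer grid to the 36-component ANN target grid can be sketched as follows (the paper does not specify the exact resampling method; linear interpolation is one plausible stand-in, and the reflectance curve below is synthetic):

```python
import numpy as np

# Spectrophotometer grid: 107 samples, 3.33 nm steps from ~376.66 nm.
wl_meas = 376.66 + 3.33 * np.arange(107)
# ANN target grid: 36 samples, 10 nm steps from 380 nm to 730 nm.
wl_ann = np.arange(380.0, 731.0, 10.0)

refl_meas = 0.5 + 0.3 * np.sin(wl_meas / 50.0)   # stand-in measured spectrum
refl_ann = np.interp(wl_ann, wl_meas, refl_meas) # resampled target vector
print(wl_ann.size, refl_ann.size)  # 36 36
```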
The sheets were photographed in a photo studio, under constant lighting conditions, with a Nikon D600 camera and constant manual settings (Fig. 2). The light source's spectral power distribution was measured with a spectrophotometer at the position of the sheet with colour samples, and a correlated colour temperature (CCT) of 3019 K was obtained. To ensure constant conditions and to process all the RAW images equally, the images were normalised with Adobe Lightroom software to the CCT, the tint was balanced to 0, and chromatic aberration, even though unnoticeable, was corrected for the profile "Nikon AF-S NIKKOR 50 mm f/1,4G". Median RGB values for each patch were acquired over the inner 50% square surface of the patch. The complete set of (RGB, reflectance) pairs for all 1301 Munsell Matte patches, with three values for each RGB and 36 values for each reflectance, was assembled to feed into the ANN supervised learning algorithm. Before training, independent samples were separated from the complete set to calculate additional measures of reflectance reconstruction performance. The remaining set formed the learning set, which was randomly split into training, validation and testing sets in a 70:15:15 ratio at the beginning of each ANN model training.
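The random 70:15:15 split performed at the start of each training run can be sketched as follows (a NumPy stand-in for Matlab's random data division, shown here on the full 1301-patch set for illustration):

```python
import numpy as np

def split_indices(n, seed=0):
    """Randomly split n sample indices into training, validation and test
    subsets in a 70:15:15 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_tr, n_val = int(0.70 * n), int(0.15 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]

tr, val, te = split_indices(1301)
print(len(tr), len(val), len(te))  # 910 195 196
```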
Due to the nature of ANN training algorithms, which tend to find local rather than global minima of the cost function, the search for optimal ANN parameters was repeated 41 times for each model, with the size of the hidden layer varied in steps of 1 from 3 to 48 neurons. In our previous experiments with 21 learning repetitions we observed that increasing the number of HLN does not noticeably improve the probability of finding better ANN parameters. In the presented study, however, we wanted to find the best ANN models with greater certainty, so the number of repetitions was approximately doubled; the odd number 41 was chosen to ease the calculation of some additional statistics not discussed in this article. The calculations were performed for five different learning set sizes (Tab. 1) and both learning algorithms (BP and LM).
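The experiment's search loop can be sketched as follows (illustrative Python; `train_and_test` is a hypothetical stand-in returning a synthetic test MSE, not an actual Matlab training run):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_and_test(n_hidden):
    """Hypothetical stand-in for one ANN training run: returns a fake
    test MSE that shrinks with the hidden layer size plus random noise."""
    return 0.01 / n_hidden + rng.normal(scale=1e-4) ** 2

results = {}
for n_hidden in range(3, 49):          # 46 hidden layer sizes, 3..48
    mses = [train_and_test(n_hidden) for _ in range(41)]  # 41 repetitions
    results[n_hidden] = (np.mean(mses), np.min(mses))     # mean and best

print(len(results))  # 46
```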

RESULTS
To reiterate: for each of the 46 ANN models with 3 to 48 HLN, and for each of the five learning sets, learning with the BP and LM training algorithms was repeated 41 times. The average and best results were collected, visualised and compared. In total, more than 200 hours of 4-core parallel CPU time and almost 12 hours of GPU time were spent. Parallel CPU computing is supported by the Matlab Parallel Computing Toolbox; ANN modelling on all CPU cores was activated with the useParallel training option.
The average time for one ANN model learning with both compared algorithms, with the largest learning set, is visualised using a logarithmic y-axis in Fig. 3, along with the LM/BP learning time ratio. The speed of convergence, expressed as the number of epochs (how many times the output error had to be back-propagated through the network to adjust its free parameters and minimise the cost function), is visualised in Fig. 4, along with the BP/LM number-of-epochs ratio. After the training of each ANN model with a particular number of HLN is finished, the performance is calculated on the test set of samples. After 41 training repetitions of ANNs with the same HLN size, the average test performance is calculated and the best result, at the minimum of the test performance, is found. The average and minimum test performance (MSE values) as a function of the number of HLN are shown in Fig. 5 and Fig. 6, along with the BP/LM performance ratio and the 3rd-order polynomial trend lines of the performance. The correlation between the LM and BP algorithm performances was calculated for each learning set, along with the p-value at the significance level 0,05. Correlations were obtained for the mean performance, the minimum performance, and the 3rd-order polynomial approximations of the mean and minimum performance (Tab. 2). All correlations were found to be statistically significant (p < 0,05).
The minimum of the LM and BP algorithm average performance and best performance, and of their 3rd-order polynomial approximations, was found for each learning set, along with the number of HLN at these points, i.e. of the best ANN models for reflectance reconstruction (Tab. 3). Differences in the number of HLN at these points were calculated between the "LM algorithm average performance" and its 3rd-order polynomial approximation, and between the "BP algorithm average performance" and its 3rd-order polynomial approximation. The same differences were also calculated for the best performances. Mean values and standard deviations of the obtained differences were calculated (Tab. 4).
Based on the observed results, we proposed and tested a new procedure to search for the best ANN model. Reflectance spectra of the independent samples, which had been excluded from each of the five learning sets (see Tab. 1), were reconstructed with the best-performance ANN models found with the LM algorithm using the proposed procedure, and with the BP algorithm. L*a*b* values were calculated from the measured and reconstructed spectra for the illuminants A, D50, D65 and F2. Colour differences were calculated and classified: since the ΔE00 colour difference formula was used, classification into seven quality groups, as proposed by Yang, Ming and Yu [28], was performed. Average relative differences between the LM and BP algorithm performance, normalised to the BP performance, were calculated over all learning sets to assess the efficiency of the LM algorithm. The classification of colour differences and the relative differences between the LM and BP algorithms are presented in Tab. 5. The peak signal-to-noise ratio (PSNR) and the goodness-of-fit coefficient (GoFC) for the reconstructed reflectances of the independent samples were calculated (Eq. (4) and Eq. (5)) and classified into three classes based on the PSNR value span and into four classes based on the GoFC value span [29, 30]. Relative differences between the LM and BP algorithms were calculated and are presented in Tab. 6.
where r is the measured and y the reconstructed n-dimensional reflectance vector. Examples of some bad and good reflectance reconstructions are shown in Fig. 7. Burnt Orange, Dark Olive Green and Amethyst Violet have an "appreciable" perceptible colour difference and either "poor" MSE or "poor" GoFC; Medium Olive Green, Dark Turkish Blue and Thulian Pink have a "hardly" or "slightly" perceptible colour difference, a good GoFC and either an "accurate" or "good" MSE. The reconstructions shown were made by an ANN trained on the learning set containing 30% of the patches. The reconstructed samples belong to the separate independent test set.
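The two quality measures can be sketched as follows. These are common literature definitions of PSNR and GoFC for reflectance spectra; the paper's exact Eq. (4) and (5) may differ in normalisation, so treat the formulas below as assumptions:

```python
import numpy as np

def psnr(r, y, peak=1.0):
    """Peak signal-to-noise ratio between measured (r) and reconstructed
    (y) reflectance vectors; reflectance factors assumed in [0, 1]."""
    mse = np.mean((np.asarray(r) - np.asarray(y)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def gofc(r, y):
    """Goodness-of-fit coefficient: normalised inner product of the two
    spectra (1 means identical spectral shape)."""
    r, y = np.asarray(r, float), np.asarray(y, float)
    return abs(r @ y) / (np.linalg.norm(r) * np.linalg.norm(y))

r = np.linspace(0.1, 0.8, 36)      # stand-in measured reflectance
y = r + 0.01                       # small uniform reconstruction error
print(round(psnr(r, y), 1))        # 40.0
print(gofc(r, y) > 0.999)          # True
```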

DISCUSSION
Comparison of the convergence of the BP and LM training algorithms (Fig. 4) indicates a linear growth of the BP/LM number-of-epochs ratio with respect to the number of HLN. As soon as the number of neurons in the hidden layer exceeds five, the LM algorithm converges in fewer epochs than BP: at ten HLN it is 3 times, at 20 HLN 5 times and at 30 HLN almost 8 times better. For example, modelling an ANN with 20 neurons takes about 50 epochs with the LM algorithm but almost 250 epochs with BP. This clearly shows the superiority of the LM algorithm in terms of convergence efficiency. Why, then, use the BP algorithm at all?
Looking from a different point of view, i.e. focusing on the calculation speed (Fig. 3), the BP training algorithm running on the GPU is faster than the LM algorithm running in parallel on a quad-core CPU right from the start. The LM/BP average learning time ratio shows that at five HLN the BP algorithm is about five times faster, at ten HLN about ten times, at twenty about twenty times, and so on. Modelling an ANN with 20 neurons takes 150 seconds with the LM algorithm, but only 7 seconds with BP. Such time saving clearly favours the BP training algorithm.
Finally, we can examine the accuracy of the reflectance reconstruction (Fig. 5 and Fig. 6), i.e. the performance, calculated as the mean squared error between the reconstructed and the measured reflectances. When comparing the average performance of the ANN models trained with the LM and BP algorithms, the ratio shows about 30% better performance for the LM algorithm. Comparing the best-performing ANN models, we found the best performance to be approximately 30% better than the average performance, at the same BP/LM performance ratio.
Reducing the learning set size (Tab. 1) does not change this picture: the BP algorithm running on the GPU remains much faster in terms of ANN modelling time, the LM algorithm still converges in fewer epochs, and the BP/LM performance ratio remains the same. The first two hypotheses have therefore been confirmed.
When monitoring the shape of the mean and best performance curves for the test sets, a fluctuation of values is evident; it results from the randomness of the initial values of the ANN free parameters (weights and biases), so that each run finds a different local minimum of the cost function. These fluctuations remain even after a larger number of ANN modelling repetitions. Linear correlations between the LM and BP algorithm average and best performances were found to be high and statistically significant, with small p-values, so the third hypothesis was also confirmed. In an attempt to smooth the performance curves effectively, different methods were tested. Averaging of neighbouring values and polynomial approximations were promising approaches, and a high correlation was found between the 3rd-order polynomial approximations of the LM and BP training algorithm mean and best performance curves. The test performance correlation coefficients for the raw mean and best LM and BP values are, understandably, lower than those for the smoother 3rd-order polynomial approximations. It was also found that the fluctuation of the best performance curves of the LM and BP algorithms becomes more pronounced at a certain point (at 20% of samples or less), and the correlation coefficient rapidly decreases, although the high correlation coefficients of the 3rd-order polynomial approximations still indicate a strong correlation.
The test performance mean and best (i.e. minimum) curves as a function of the number of HLN at some point reach their minimum values, and further increasing the number of neurons does not improve the performance. This minimum is the point we are searching for: we propose that it indicates the optimal number of HLN for the best reflectance reconstruction results. To narrow the search area for finding this point on the LM test performance curve, differences were calculated between the numbers of HLN at the positions of the best ANN models. We compared the number of HLN at the minimum of the LM algorithm average performance curve with its 3rd-order polynomial approximation, with the BP algorithm average performance curve, and with the 3rd-order polynomial approximation of the latter. In the same way, differences were calculated between the numbers of HLN at the minimum of the LM algorithm best performance curve, its 3rd-order polynomial approximation, the BP algorithm best performance curve and the 3rd-order polynomial approximation of the latter. The mean and standard deviation were calculated for the obtained differences over all five learning set sizes. It can be seen that the mean and standard deviation of the differences between the LM algorithm performance and the 3rd-order approximation of the BP algorithm performance are far smaller than those between the LM algorithm performance and the raw BP algorithm performance, for both the average and the best performance.
Based on these findings, we propose the following procedure to search for the best ANN model at any other specific size of the learning set:
1. Train ANNs with the BP algorithm on the GPU, varying the HLN number over a broad span, and repeat the calculation many times at each HLN number to increase the probability of finding a satisfactory ANN model.
2. Approximate the resulting BP test performance curve with a 3rd-order polynomial and locate its minimum.
3. Train ANNs with the LM algorithm only within a narrow span of HLN numbers around this minimum and select the best-performing model.
After executing this procedure, we found a good LM-trained ANN model in less than 30% of the time necessary to scan the complete span from the smallest to the largest number of HLN (where the best ANN model is expected to be found) with the LM algorithm alone.
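The proposed two-stage search can be sketched as follows (illustrative Python; the window width, repetition count and the synthetic MSE functions are assumptions, not the paper's actual training runs):

```python
import numpy as np

rng = np.random.default_rng(3)

def bp_test_mse(n):   # hypothetical fast BP training run on the GPU
    return 0.004 + 2e-6 * (n - 27) ** 2 + abs(rng.normal(scale=5e-5))

def lm_test_mse(n):   # hypothetical slow but better-converging LM run
    return 0.003 + 2e-6 * (n - 27) ** 2 + abs(rng.normal(scale=5e-5))

# Stage 1: broad BP scan over the full HLN span, keeping the best of 41 runs.
hln = np.arange(3, 49)
bp = np.array([min(bp_test_mse(n) for _ in range(41)) for n in hln])

# Stage 2: smooth the BP curve with a 3rd-order polynomial, find its minimum.
poly = np.polyfit(hln, bp, 3)
n_star = hln[np.argmin(np.polyval(poly, hln))]

# Stage 3: search only a narrow HLN window around n_star with LM.
window = [n for n in range(n_star - 5, n_star + 6) if 3 <= n <= 48]
best_n = min(window, key=lm_test_mse)
print(n_star in window, 3 <= best_n <= 48)  # True True
```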
Results of reflectance reconstruction with the best-performing ANN model found with the LM algorithm using the proposed procedure, and with the best model found with the BP algorithm alone, are presented in Tab. 5 as differences between the measured and the reconstructed colours, taking into account different illuminants. It can be seen that, compared to the BP algorithm, the LM algorithm puts about 40% more reconstructed colours into the best category, where the colour difference is hardly perceptible. About 30% more samples fall into the second-best category, with a slightly perceptible colour difference, while in the third category, where the colour difference is noticeable, LM places about 15% fewer reconstructed colours than BP. Samples classified into the lower categories with the LM algorithm in most cases amount to less than 10% of all. In Tab. 6 the results are put into three categories based on the logarithm of the mean squared error (PSNR) and into four based on the goodness of fit. Relative to BP, the LM algorithm places more than 40% more reconstructed reflectances into the highest category (Good) and almost 15% fewer into the lowest (Poor), while the middle one (Accurate) is almost equally populated. In terms of GoFC, the LM algorithm puts (relative to BP) about 25% more reconstructions into the category where GoFC is higher than 0,999 and about 20% fewer into the lowest category with GoFC under 0,995.

CONCLUSIONS AND FUTURE WORK
By executing the proposed procedure, effective ANN models with an optimal number of neurons in the hidden layer can be found in less than 30% of the time necessary to find such models using the LM algorithm alone. If the search step is widened so that, instead of testing the performance for every number of HLN, the search advances in steps of 2 or 3, the time could be reduced further.
Observation of the results, especially the comparison of measured and reconstructed reflectance values (Fig. 7), reveals some discrepancies where the reconstructed reflectance falls into the "appreciable" perceptible colour difference category, but these form less than 10% of all reconstructed reflectances, and with a larger learning set this figure falls below 5%.
We assume that the results could improve further by increasing the number of hidden layers and, presumably, that performance could also increase by capturing RGB values with two or more different cameras. We will test both assumptions in our following experiments.