Condition Monitoring and Fault Diagnosis of a Marine Diesel Engine with Machine Learning Techniques

A marine engine room is a complex system in which many different subsystems are interacting with each other. At the center of this system is the main diesel engine which produces the propulsion force. Many other components such as compressed air, cooling, heating, lubricating oil, fuel, and pumping systems act as auxiliary machines to the main engine. Automation of many functions in the engine room is starting to play an important role in new generation ships to provide better control using sensors monitoring the engine and its environment. Sensors exist in the current generation ships, but engineers evaluate the sensor data for the presence of any problems. Maintenance actions are taken based on these manual analyses or regular maintenance is carried out at times determined by manufacturers, whether such actions are needed or not. With machine learning, it is possible to develop an algorithm using past evaluations made by engineers. Recent studies show that highly accurate results can be obtained using machine learning methods when there is sufficient data. In this study, we develop new learning-based algorithms and evaluate them on data obtained from a realistic ship engine room simulator. Data for a predetermined set of parameters of a high-power diesel engine were collected and analyzed for their role in a set of fault situations. These fault conditions and the associated sensor data are used to train a set of classifiers achieving fault detection up to 99% accuracy. These are promising results in preventing future damage to the engine or its supporting components by predicting failures before they occur.


Introduction
Nowadays, computers can carry out many tasks that used to be done by human experts. This is achieved using machine learning, provided there is sufficient data. Since computers can acquire and store very large amounts of data and process them quickly, for some tasks they can achieve better results than humans. There are various machine learning methods, and good performance can be obtained by choosing the appropriate method with its best parameterization. Of course, whether better-than-human performance is reached depends on the type of problem and data.
Well-known machine learning algorithms for regression [1] and classification problems include Support Vector Machines (SVM) [2], Decision Trees [3], Decision Forests [4], Boosting [5], and Artificial Neural Networks (ANN) [6]. Selection of the best method or algorithm depends on the nature of the algorithm as well as the given modeling problem and its data. One important factor in this decision is the dimensionality of the data. Many problems are very high dimensional (e.g., a typical VGA-resolution image has over 300,000 pixels, hence dimensions), and some machine learning methods suffer from the so-called curse of dimensionality, where increasing the number of input dimensions (for example, the number of sensors) degrades the performance of the trained model. In practice, the selection is done via trial and error, supported by the experience of the expert building or training the algorithm. One of the recent popular methods is deep learning [7]. Highly accurate results can be obtained using this approach, yet it relies on even more data for training.
A ship engine room is a very complex machinery system consisting of many subsystems. Since automation is becoming more and more important in new generation ships, data from hundreds or even thousands of sensors can be processed and recorded in real time. These data are usually observed and evaluated by the engineer officer on the monitors in the control room. The engineer officers use their experience and engineering knowledge to evaluate the operation and performance of the system. In addition, the critical values for the systems are decided and set by the engineers, and the system gives an alarm when these levels are reached. However, there is no system that can understand and evaluate a faulty condition while the values are slowly changing through a series of malfunctions. In other words, there is no intelligent system that can do the work of the engineer officer (in terms of evaluating the data). In this study, we target an intelligent system that can evaluate the status of machinery systems. This system is intended to predict a malfunction before it happens, when the sensor indications are already significant but not yet observable by a human expert. This allows cost reductions by switching from planned maintenance to predictive maintenance.
There are studies that develop machine learning models to predict engine room failures using a multitude of data such as vibration, speed, and tribology. Li et al. [8] use a simple dynamic model of a small-power (40 kW) 4-stroke diesel engine together with data received from a real engine fitted with an encoder that measures the instantaneous angular velocity. They define the ratio of the instantaneous angular velocity to the average angular velocity and use this ratio mainly for fault detection. Situations such as incomplete lubricating oil in a cylinder and no oil in a cylinder are examined. Input properties are arranged with kernel independent component analysis (KICA) and Wigner bispectrum analysis. A total of 300 inputs were used in the study [8]. This work is expanded in [9] to learn error detection with fuzzy neural networks (FNN) after fusing, with fast independent component analysis (FastICA), the vibration data received from 4 acceleration sensors placed on the engine in different directions. Xi et al. [10] obtained vibration data from a small-power 4-cylinder diesel engine (boat or yacht engine) by running the engine at constant speed with accelerometers placed on the cylinders; error detection and visualization were performed using the ICA method. Antonic et al. [11] performed fault detection in a fully equipped engine room simulator. They proposed a method to correct the expert opinion using fuzzy logic methods without the need to train a machine learning model. Khelil et al. [12] modeled the lubrication system of a ship diesel engine in terms of hydraulics and thermals and performed fault detection on this model using artificial neural networks. As input data to the artificial neural network, they first used the machine speed and the fuel lever position, but then reduced these to a single parameter because they observed that more accurate results can be obtained with only the machine speed. Kowalski et al. [13] measured 15 parameters such as ambient air temperature, air humidity, exhaust gas temperature, fuel injection pressure, and combustion pressure from a small-power (250 kW, 1000 rpm) 4-stroke ship diesel engine operating at constant speed (750 rpm). Ten different failure conditions such as air inlet valve leak, exhaust valve leak, and injector blockage were examined. A total of 798 observations were created and learning was carried out with the ensemble method. Nixon et al. [14] used sensor data and unscheduled maintenance records to monitor the condition of a diesel engine. The fuel pump failure was studied, but no information was given about which sensors were monitored and how much data was used. They experimented with LDA, SVM, and Random Forest methods. Lazakis et al. [15] used the data of the noon report (a set of important parameters reported from the ship to the operating company on land at noon every day) to evaluate the machine performance. From the noon report, the power, lubricating oil inlet temperature, lubricating oil pressure, cooling water inlet temperature, cylinder maximum exhaust temperature, and cylinder minimum exhaust temperature of the diesel generator were extracted and used in modeling. The SVM method was applied to 804 data points obtained during a 317-day trip. Zhong et al. [16] built a fault simulation model based on AVL BOOST. They took data for six different fault conditions at 100% load of the machine in their simulations. The conditions studied were variable oil inlet to the cylinder, insufficient air cooler, injector timing error (late injection and early injection), and normal condition. The data (with 21 thermodynamic parameters and 1320 samples) were used to build a model using deep belief networks (DBN). The paper does not give any details on the 21 parameters used. Tan et al. [17] conducted a study to detect faults in the fuel injection system of a ship diesel engine. The SVM algorithm was applied to detect failure conditions such as fuel circulation pump wear, circulation pump motor failure, and fuel filter pollution. Hou et al. [18] reported some failure conditions in a ship diesel engine, such as cylinder liner crack, burst fuel pump high pressure pipes, scavenge fire, and fuel pump wear, using the cylinder exhaust temperature, cylinder coolant temperature, piston cooling oil temperature, cylinder combustion pressure, and cylinder compression pressure as parameters. A total of 7000 samples were used to train a multilayer ANN along with a genetic algorithm to detect failures. The long run-time of the proposed method is stated as a disadvantage. Qi et al. [19] proposed a simple regression model for a ship diesel engine using machine power, engine speed, scavenge volume, scavenge pressure, and exhaust temperature to detect malfunctions (by thresholding the regressor output). A total of 15800 samples from a MAN B&W 6S42 type machine were used. Ellefsen et al. [20] used an error-type independent spectral anomaly detection method to determine the degradation in a diesel engine of an autonomous ferry. Samples were taken from a diesel engine in a laboratory environment. The data were observed for two different load states, for normal operation and for each degradation state. Air filter, turbo, and cooling deterioration conditions were investigated. A total of 47 parameters (including machine power, coolant temperature, exhaust gas temperature, and engine speed) were used to train a model with a variational autoencoder (VAE) to predict deterioration conditions.
As one of the most important subsystems in a ship, the main engine is frequently addressed in many automated prediction studies. However, most of these studies concentrate on a part of the main engine. In this study, we address some fault conditions in a ship's main engine. We collect a large amount of data from a high-fidelity simulator representing the fault conditions and use this data to train machine learning models with high accuracy and run-time performance. We evaluate the effectiveness of various machine learning methods. The data and results of the models are presented in detail.

Methodology
We address the problem of failure detection in a diesel engine used as the main engine of a tanker ship. As mentioned above, learning-based methods for this problem require enough data representing all the possible failure and non-failure cases in a balanced way. A realistic engine room simulator, shown in Figure 1, was used to obtain such data. The simulator is the engine room simulator of Kongsberg Maritime (Norway), and the model is the K-Sim Engine MAN B&W 5L90MC VLCC (Very Large Crude Oil Carrier) tanker. The modelled tanker has a deadweight of 187997 tons and a navigation speed of 14 knots, while the main engine delivers 18 MW (CSR: Continuous Service Rating) at 74 rpm. The biggest advantage of using a realistic simulator is that we can create a fault and observe how the observations (sensors) change before, during, and after the fault. We examined a few fault conditions in the main engine system and selected the sensors or parameters relevant to them. These fault conditions were created in the simulator and the sensor readings were recorded. In addition, data were obtained in the simulator for the normal or non-defective situation. The obtained data and their relations with each other were examined, various machine learning models were trained, and their performance was analyzed.

Cylinder liner crack (Fault code: M2507)
There are many sensors in the ship engine room. When modeling each fault condition, the parameters related to this fault need to be examined. We decide on the parameters to be used for each fault situation separately. There are different parameters for different faults as well as some common ones. For example, the 'G02050', 'T02040', 'T02041', and 'T02043' parameters are common to all faults. Table 1 lists all these parameters and which parameter is used in which fault condition. After determining the necessary set of parameters, scenarios of failure situations were generated in the simulator, and the changes in the parameters were observed. Faults were inspected for the first cylinder of the main engine. The image of the simulator for this cylinder is shown in Figure 2.
Scenarios for all fault cases were created as follows. After the simulation starts, it runs normally for 5 seconds and then the fault is triggered. Faults start from 0% and reach their maximum value within 30 minutes. The maximum value differs according to the fault type, because the main engine automatically slows down for some types of fault. Data collected after this point are not meaningful to us, as the whole system behavior changes when the main engine is in the slow-down condition. The maximum fault percentage for the clogged fuel injection valve is 59%, because the main engine slows down when the fault value is around 59-60%. The maximum fault percentage for the exhaust valve leakage is determined as 62%. The maximum value is taken as 100% in the case of cylinder liner leakage, because the main engine does not slow down. The creation of scenarios in the simulation is illustrated in Figure 3, which shows the general view of a scenario. The left menu of the scenario screen shows how the "trigger" and "action" operations are modelled. Here, "trigger" specifies the state we want to trigger, and "action" can be any failure condition or another process. It is possible to create time-dependent scenarios by using these processes and operations with the timer.
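The scenario timing described above can be sketched in a few lines of Python. This is only an illustration: the linear shape of the ramp is an assumption (the text states only that the fault grows from 0% to its maximum within 30 minutes), and the function and dictionary names are hypothetical rather than taken from the simulator.

```python
def fault_percentage(t_seconds, max_percent, start_s=5.0, ramp_s=30 * 60.0):
    """Fault level (in %) at simulation time t, assuming a linear ramp."""
    if t_seconds <= start_s:
        return 0.0  # normal operation before the fault is triggered
    progress = min((t_seconds - start_s) / ramp_s, 1.0)
    return max_percent * progress


# Maximum fault levels reported in the text for the three scenarios.
MAX_FAULT = {
    "fuel_injection_valve_clogged": 59.0,
    "exhaust_valve_leakage": 62.0,
    "cylinder_liner": 100.0,
}

halfway = 5.0 + 15 * 60.0  # 15 minutes after the trigger
for name, max_percent in MAX_FAULT.items():
    print(name, fault_percentage(halfway, max_percent))
```

Under this assumption, the value returned for each time stamp is the target (output) value attached to the sensor readings recorded at that moment.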
Figure 4 shows the "action" activated by this trigger. The action created here is the clogging of the fuel injection valve of main engine cylinder no. 1. In this example, we can see that there are multiple parameters to model the fault condition, such as ramp-up duration, on value, off value, and on duration.
Figure 5 shows how the values of two selected parameters change during a fault scenario. Automating the data collection was not possible because the simulator allows resetting only the fault state, while the other data continue from the last value in the previous fault state. Therefore, at each iteration the parameters are reset by …
Once the data is obtained (and analyzed), the machine learning stage starts. We are aiming for classifiers to detect failure cases. This can also be posed as a regression problem on the fault values (as percentages, with 0% indicating no fault and 100% indicating full fault). The number of features (the dimensionality) is selected per fault problem (the three fault conditions discussed above). For example, while the input data is 10-dimensional for the clogged fuel injection valve failure, it is 13-dimensional for the exhaust valve leakage failure. If we think of the input as a 13-dimensional vector, the fault percentage, which is the output parameter, is added as the 14th dimension. The normal and faulty states of the parameters are collected with the corresponding output value, and the machine learning algorithm builds the model from this data. In the normal or fault-free condition, the fault value is taken as zero. At test time, the trained model receives the input parameters and estimates the fault value, which is the output. About 60% of the data was used for learning and 40% for testing. While the system is running, it is possible to obtain the percentage of the modeled fault condition continuously. Thus, it will be possible to take precautions when the fault condition reaches a certain level. The flow chart of the algorithm used is shown in Figure 8.
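The data layout and the 60/40 partition described above can be sketched as follows. The rows are synthetic placeholders, and the random shuffling before the split is an assumption, since the text does not state how the samples were assigned to the learning and test sets; `split_train_test` is a hypothetical helper name.

```python
import random


def split_train_test(rows, train_fraction=0.6, seed=0):
    """Shuffle the rows and split them into training and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows: 13 sensor readings followed by the fault percentage
# (the 14th dimension). Real rows would come from the simulator logs.
rows = [[float(i + k) for k in range(13)] + [float(i % 63)] for i in range(1000)]

train, test = split_train_test(rows)
X_train = [r[:-1] for r in train]  # 13-dimensional inputs
y_train = [r[-1] for r in train]   # fault percentage (output)
print(len(train), len(test), len(X_train[0]))
```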
Prediction models can be explained as a process of obtaining an output using specified system parameters or sensor inputs. The resulting output may differ depending on the model. In this study, the fault percentage is predicted. Regression models explain the relationship between the multidimensional input parameters and the continuous output parameters. By utilizing supervised learning, it is possible to predict the value of the output parameter when building these models. The purpose of the training process is to minimize the cost function in order to find the function that best represents the data. Thus, the cost function also allows us to measure the error. The most commonly used cost functions are the mean squared error (MSE) and the root mean squared error (RMSE). In this study, RMSE was used as the cost function in the learning phase.
Linear regression predicts the value of a dependent variable based on a set of independent variables. The hypothesis function for linear regression is:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
where $\hat{y}$ is the predicted value of the dependent variable, $x_1, x_2, \dots, x_n$ are the independent variables, and $\theta_0, \theta_1, \dots, \theta_n$ are the regression coefficients.
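As an illustration of the hypothesis function, the coefficients can be estimated by ordinary least squares. The sketch below uses NumPy on synthetic, noise-free data; it is not the tool chain used in the study, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return [theta_0, theta_1, ..., theta_n] estimated by least squares."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of 1s for theta_0
    theta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return theta


def predict(theta, X):
    """Evaluate y_hat = theta_0 + theta_1 x_1 + ... + theta_n x_n."""
    X = np.asarray(X, dtype=float)
    return theta[0] + X @ theta[1:]


# Synthetic data generated from known coefficients (2.0, 1.5, -0.5, 4.0).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + 4.0 * X[:, 2]

theta = fit_linear_regression(X, y)
print(np.round(theta, 3))
```

Because the synthetic target is an exact linear function of the inputs, the fit recovers the generating coefficients up to numerical precision.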
If the effect of one variable depends on the value of another variable, we must consider interactions between the variables in linear regression. For two inputs, the interaction produces an additional input term for their product:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2$$
Here $\theta_3$ is a regression coefficient and $x_1 x_2$ is the interaction. The interaction between $x_1$ and $x_2$ is called a two-way interaction, as it is the interaction between two independent variables. Two-way interaction was also used in this study.
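A two-way interaction model can be fitted with an ordinary linear regression once the pairwise products are appended to the inputs. The following sketch shows only this feature expansion; the helper name is hypothetical.

```python
from itertools import combinations


def add_two_way_interactions(row):
    """Append the product x_i * x_j for every pair of inputs with i < j."""
    products = [a * b for a, b in combinations(row, 2)]
    return list(row) + products


print(add_two_way_interactions([2.0, 3.0]))       # two inputs, one product
print(len(add_two_way_interactions([1.0] * 10)))  # 10 inputs plus 45 products
```

For a 10-dimensional input this expansion yields 55 columns, so the number of coefficients grows quadratically with the number of sensors.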
Another linear regression method is stepwise regression. This regression model performs multiple regressions several times, removing the weakest correlated variable each time. In the end, the variables that best explain the distribution remain. Robust linear regression, on the other hand, provides a more accurate model by clearing outliers. In this study, linear regression, interactive linear regression, robust linear regression and stepwise linear regression models are used.
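The idea of repeatedly refitting and discarding the weakest variable can be illustrated with a simple backward elimination. This is a simplified stand-in, not the stepwise procedure used in the study: the removal criterion (smallest increase in the residual sum of squares) and the fixed number of variables to keep are assumptions made for the sketch.

```python
import numpy as np


def residual_sum_of_squares(X, y):
    """Fit least squares with an intercept and return the residual sum of squares."""
    design = np.column_stack([np.ones(len(X)), X])
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ theta
    return float(residuals @ residuals)


def backward_elimination(X, y, keep):
    """Refit repeatedly, dropping the variable whose removal hurts the fit least."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    remaining = list(range(X.shape[1]))
    while len(remaining) > keep:
        scores = [
            residual_sum_of_squares(X[:, [k for k in remaining if k != j]], y)
            for j in remaining
        ]
        remaining.pop(int(np.argmin(scores)))
    return remaining


# Synthetic data: only columns 0 and 2 influence the output.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=500)
selected = backward_elimination(X, y, keep=2)
print(selected)
```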
The support vector machine (SVM) is a popular machine learning tool for classification and regression. There are both linear and nonlinear SVM methods. SVM regression can be extended to a nonlinear domain using a kernel. A nonlinear function maps the original dataset to a higher dimensional space, in which the data is hopefully separable. The polynomial kernel function used in this study is as follows:
$$K(x_i, x_j) = \left(1 + x_i^{\top} x_j\right)^d$$
where $d$ is the degree of the polynomial function. The value of $d$ defines the increase in dimensionality, where $d = 1, 2, 3$ yield linear, quadratic, and cubic kernels, respectively. Apart from these, the Gaussian kernel provides good discrimination in high dimensions for nonlinear problems. Linear SVM, quadratic SVM, cubic SVM, coarse Gaussian SVM, medium Gaussian SVM, and fine Gaussian SVM methods were used in this study.
Another family of nonlinear algorithms is regression trees. Tree-based models split the data multiple times based on certain cutoff values. Different subsets of the dataset are created, with each sample belonging to one subset. Final subsets are called end or leaf nodes, and intermediate subsets are called internal nodes or split nodes. The outcome at each leaf node is predicted as the average output of the training data at that node. The depth of the tree may affect the fit of the tree (a fully grown tree may overfit). Ensemble models of regression trees are also possible. Ensemble models combine many weak (and potentially overfit) learners into a high-quality ensemble model. In this study, various levels of fully grown and pruned (fine, medium, and coarse) trees, as well as boosted ensemble trees and bagged ensemble trees, are built.
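The kernel functions mentioned above can be written out directly. In this sketch the polynomial kernel is assumed to have the common form (1 + u·v)^d, and the Gaussian kernel is written as exp(-gamma ||u - v||²), where `gamma` is a hypothetical width parameter; the exact parameterization used in the study is not stated.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, d):
    """Polynomial kernel of degree d, assumed form (1 + u.v)^d."""
    return (1.0 + dot(u, v)) ** d


def gaussian_kernel(u, v, gamma):
    """Gaussian kernel exp(-gamma * ||u - v||^2); gamma sets the width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
for degree in (1, 2, 3):  # linear, quadratic, cubic
    print(degree, polynomial_kernel(u, v, degree))
print(gaussian_kernel(u, v, gamma=0.5))
```

A larger `gamma` makes the Gaussian kernel decay faster with distance, which corresponds to a finer kernel; a smaller value gives a coarser one.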
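The performance of a model's prediction (regression) can be measured using the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), or the coefficient of determination ($R^2$). While RMSE is used as the objective function in finding the best regression model, these additional metrics capture different aspects of the error made during testing. Assuming the expected output is $y_j$, the estimated output is $\hat{y}_j$, and $\bar{y}$ is the mean of the expected outputs (the prediction of a constant baseline model), the following equations can be used to calculate the error metrics (for $j = 1, \dots, n$ samples in the test set):
$$\mathrm{MAE} = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - \hat{y}_j \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2}$$
$$R^2 = 1 - \frac{\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2}{\sum_{j=1}^{n} \left( y_j - \bar{y} \right)^2}$$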
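The four metrics can be computed with a few lines of Python using their standard definitions. The sample values below are made up for illustration and are not results from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired expected and estimated outputs."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    # Total sum of squares: error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


y_true = [0.0, 10.0, 20.0, 30.0]  # illustrative fault percentages
y_pred = [1.0, 9.0, 22.0, 30.0]   # illustrative model outputs
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```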

Results
The methods described in the previous section were implemented and run using the data obtained for three different fault conditions. A computer with an Intel i7-10750H 2.60 GHz processor and 16 GB of RAM was used to train and test the models on the failure data.
In addition, principal component analysis (PCA) is applied to observe how well each parameter represents the data. The parameters obtained by determining the PCA 95% variance threshold for fault M2508 are shown in Table 2. The most distinguishing parameter is 'G02012', while the least distinguishing parameter is 'Q02004'. The parameters 'G02012' and 'E02050' were observed as the main separators. The learning and test results for this fault condition are shown in Table 3. The best performance is obtained with the "Fine Tree" method. During learning, a high $R^2$ value of 0.9996 was achieved with this method. Even though the fully grown tree is expected to overfit the data, this model generalizes well during the tests with a high $R^2$ value of 0.9992.
For the fault condition M2507, the 14 relevant parameters are 'G01154', 'G02050', 'G02052', 'G02053', 'G02057', 'L01150', 'P01005', 'T01010', 'T02040', 'T02041', 'T02043', 'T02044', 'T02046', and 'Z01164'. For this fault, 270,000 samples were used for learning and 180,000 samples were used for testing. The output range is 0-100% since the main engine does not slow down (as opposed to the first failure case analyzed above).
Parameter selection was done by determining the PCA 95% variance threshold, as shown in Table 4. It is observed that the most distinguishing parameter was 'G02057', while the least distinguishing parameter was 'T02044'. The main distinguishing parameters are 'G02057', 'G02053', 'G02050', 'P01005', and 'T02046'.
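A 95% variance threshold of this kind can be evaluated from the singular values of the centred data matrix. The sketch below uses NumPy on synthetic data; the columns are only centred, since the text does not state whether the parameters were also scaled, and the function name is hypothetical.

```python
import numpy as np


def components_for_variance(X, threshold=0.95):
    """Return the number of principal components reaching the threshold
    and the explained-variance ratio of every component."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(explained)
    count = int(np.searchsorted(cumulative, threshold) + 1)
    return count, explained


# Synthetic data: one dominant direction plus two low-variance columns.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
X = np.column_stack([
    10.0 * base,
    0.1 * rng.normal(size=1000),
    0.1 * rng.normal(size=1000),
])
count, explained = components_for_variance(X)
print(count, np.round(explained, 4))
```

The loadings of the retained components indicate which original parameters contribute most to the variance, which is one way to rank parameters as more or less distinguishing.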
In the case of this failure, 100% learning performance was achieved with the 'Ensemble Bagged Trees' method during the learning phase. This performance can be expected due to the use of the expansion tank level and main engine cooling water gas detector parameters, which are known to be related to this fault. During the test phase, the highest performance value is obtained for 'Fine Gaussian SVM'.
The third fault is the exhaust valve leakage, labelled as M2506. For this fault condition, the parameters 'E02005', 'G02050', 'N01610', 'P01602', 'Q02004', 'T01603', 'T01613', 'T02040', 'T02041', 'T02043', 'T02044', 'T02045', and 'T02046' were analyzed.
A total of 150,000 samples were used for learning and 100,000 samples for testing. The fault value ranges between 0% and 62%. When PCA is applied, the first principal component alone provides 98.8% representation of the failure. It has been observed that the 'E02005' and 'P01602' parameters have no distinguishing character. Since these parameters have no effect on the model, they do not need to be monitored for this fault condition.
An $R^2$ value of 0.9994 was obtained with the 'Fine Tree' and 'Coarse Tree' methods. In testing, an $R^2$ value of 0.9993 is achieved with both the 'Fine Tree' and 'Coarse Tree' methods, while the other errors are lower for the 'Coarse Tree' method. These metrics are very close, but 'Coarse Tree' may be preferable considering that it uses a smaller tree.
While the accuracy of the trained models is very important, it is critical to look at the run-time performance as well. Table 8 shows how long it took to process 180,000 samples for each learning method. It is possible to increase the speed even more with a higher capacity computer and parallel processing. With the current configuration, the fastest methods, as expected, are the linear regression methods.
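A run-time measurement of this kind can be sketched as follows. The predictor is a hypothetical linear model with made-up coefficients and the inputs are synthetic, so the reported time says nothing about the values in Table 8; the sketch only shows how one pass over 180,000 samples can be timed.

```python
import time


def time_predictions(predict, samples):
    """Return (elapsed seconds, outputs) for one pass over the samples."""
    start = time.perf_counter()
    outputs = [predict(x) for x in samples]
    return time.perf_counter() - start, outputs


# Hypothetical stand-in model: intercept plus 13 made-up coefficients.
theta = [0.5] + [0.01] * 13


def linear_predict(x):
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


samples = [[float(i % 100)] * 13 for i in range(180_000)]
elapsed, outputs = time_predictions(linear_predict, samples)
print(f"{len(outputs)} samples in {elapsed:.3f} s")
```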

Conclusion
We developed automated fault detection algorithms and evaluated them on a realistic ship engine room simulator. When all error cases are evaluated, we see that decision trees have high accuracy rates in both learning and testing phases. High accuracy rates are also obtained for other algorithms. The lowest value was seen in the linear regression model in the test phase for the cylinder liner crack. Even this is good enough to be used in a practical system. Linear regression gave the best run-time performances among all the algorithms, followed by decision trees. When the fault detection algorithm needs to run on an embedded system, trained linear regression models can be a good choice since they also provide comparable accuracy.
Using a simulator allows us to obtain useful information about the malfunction status of the machines and systems in the engine room. As a result, a step has been taken to develop a system that can understand the status of machine systems. Considering the dozens of systems and hundreds of malfunctions in the engine room, the need for multiple models emerges. Starting with the most important systems and malfunctions, such a system can be realized in stages. While engineers working on the ship evaluate the data, their knowledge can be captured in the data. Using these types of learning-based systems, expert knowledge can be captured and augmented with simulated data for building better predictive maintenance systems.

Figure 1 Two screenshots from the simulator used in the experimental studies: Engine room system (top), process directory (bottom) Source: Author using KS Model MC90V

Figure 2 Figure 3
Figure 2 Main engine cylinder no. 1

Figure 4 Figure 5
Figure 4 The action settings of a scenario

Figure 6
Figure 6 shows the normal and fault state values of some of the parameters used. Temporal variation of four parame-

Figure 6 The temporal behavior of some parameters for faulty (left column) and normal (right column) situations for Fault M2508 Source: Author

Figure 7 The temporal behavior of some parameters for faulty (left column) and normal (right column) situations for Fault M2507 Source: Author

Table 1
The parameters and the related malfunctions Source: Author

Table 3
Training and test results of M2508 fault

Table 2
95% PCA results for M2508 fault Source: Author

Table 4
95% PCA results for M2507 fault Source: Author

Table 5
Learning and test results of M2507 fault Source: Author

Table 7
Training and test results of M2506 fault Source: Author

Table 6
95% PCA results for M2506 fault for only the first component Source: Author

Table 8
Run-time performance of algorithms for 180,000 samples in testing Source: Author