Reliability Analysis of Full-Face Tunnel Boring Machines with the Monte Carlo Simulation Technique

The high boring capability of full-face tunnel boring machines, especially in urban tunnels, has led to their increased use in tunnel excavation in various and unfavourable geological conditions. Therefore, many efforts have been made to predict and improve the performance of these machines. In this regard, most of the previous studies have focused on the effect of geological and mechanical properties of rock or soil. However, delays due to the maintenance and repair of these machines, which contribute to a major share of unwanted and unpredicted stops at work, have not been considered. Reliability analysis is a practical method based on studying the behaviour of breakdowns and maintenance of machines and systems. This approach can be suggested as part of the appropriate planning for machine maintenance and consequently reducing downtime and costs. In this way, it is possible to identify weaknesses and critical points of a machine or system of the boring process. In the present study, the reliability of the full-face tunnelling machine was analysed with the Monte Carlo simulation method. The studied machine is divided into 5 subsystems including mechanical, electrical, hydraulic, water and compressed air subsystems. Using breakdown data of about 24 months of boring operation, the reliability of each subsystem was simulated and evaluated. Eventually, the reliability of the boring machine was simulated using the Kamat-Riley (K-R) method. The results showed that if no maintenance operation is performed on the subsystems, the overall reliability of the boring machine will decline to zero after about 38 hours of continuous boring operation. Finally, to improve the overall reliability of the boring machine, based on accomplished reliability analysis, we suggest an effective preventive maintenance and repair system for keeping the machine in optimal operating conditions for a longer period.


Introduction
The high capability of full-face mechanized tunnel boring machines (TBMs) in adverse geological conditions, including rocky and alluvial structures, has led to the widespread use of these machines in the construction of urban mechanized tunnels, water conveyance tunnels, and road tunnels (Amini Khoshalan et al., 2017). Evaluating the performance and progress of these machines is of great importance in project planning and control and in estimating project costs. Currently, factors affecting the progress of full-face tunnel boring machines are categorized into three groups: the characteristics of intact rock, the characteristics of the rock mass, and machine characteristics (Jahed Armaghani et al., 2019). Research on TBM performance has focused on the geological and mechanical properties of rock or soil and comprises experimental models, theoretical-experimental models based on laboratory data and field observations, and models based on artificial intelligence methods to predict penetration rate and machine advance rate. Examples of reported experimental studies in this regard include the determination of penetration rate using the tensile strength of rock and the average force of cutter discs (Farmer and Glossop, 1980), prediction of penetration rate using uniaxial compressive strength, cutter face strength, and the angle between the tunnel axis and discontinuity plane at the tunnel face (Boyd, 1986), and the estimation of TBM penetration rate and advance rate based on the Q rock mass classification system (Barton, 1999).
Examples of theoretical-experimental models are the Colorado School of Mines (CSM) model based on rock characteristics (Ozdemir, 1977) and cutter disc geometry for penetration rate prediction (Rostami, 1997), TBM penetration rate prediction using rock mass characteristics and data from mechanized tunnelling projects in Norway (Bruland, 2000), evaluation of tunnel boring machine performance in hard rocks (Ramezanzadeh, 2005), prediction of the field penetration index (FPI) using the uniaxial compressive strength (UCS) of intact rock and discontinuity spacing (Hassanpour et al., 2009), determination of the average advance rate of TBM based on rock mass drillability (Bieniawski et al., 2007), prediction of TBM penetration rate using water conveyance tunnel data in New York City (Yagiz, 2008), and the effect of rock mass parameters on the efficiency of full-face boring machines in hard rock (Frough, 2012). Several other studies have been reported on the performance of TBMs using a wide range of artificial intelligence and soft-computing methods (Alvarez ...). Extensive studies on the performance of TBM show that this research is based on the characteristics and geological conditions of the tunnel environment and on parameters such as the geometry of cutter discs and the thrust force of the machine. The penetration rate and advance rate estimated in this way are therefore not very accurate, because the advance rate is the ratio of the distance bored and supported in the tunnel to the total time required, which includes delays due to breakdowns and maintenance of the tunnel boring machine.
Rudarsko-geološko-naftni zbornik i autori (The Mining-Geology-Petroleum Engineering Bulletin and the authors) ©, 2022, pp. 149-160, DOI: 10.17794/rgn.2022.3.12
Since most studies report that a large share of the delays in mechanized boring is related to the boring machines themselves, improving the performance of these machines requires extensive and comprehensive studies of their delays and failures. According to previous relevant studies, in most mechanized tunnelling projects, the average active boring time is less than 50% of the total working time of the machine. The average boring times and downtimes of 10 mechanized tunnelling projects (see Figure 1) show that downtime due to failure and maintenance of the TBM accounts for about 44% of the total working time and more than 60% of the total downtime (Laughton, 1998).
Also, in another comprehensive study to determine the efficiency of TBM using a rock mass classification system and a database consisting of 682 days of tunnelling operations, the delays due to geological conditions were about 20% of the drilling operation time, whereas the delays related to the boring machine took up to 60% of the boring time (Frough and Torabi, 2013).
Given the importance of downtime and delays of full-face boring machines, a reliability analysis of these machines seems necessary. This analysis is a practical method based on the study of the failure and maintenance behaviour of machines and systems. The method allows appropriate planning of machine maintenance and cost reduction (Kumar et al., 1989). In the most comprehensive study on the reliability of an earth pressure balance (EPB) TBM, the machine was divided into 5 subsystems (mechanical, electrical, hydraulic, compressed air, and water), and the failures and delays of each subsystem were recorded. It is noteworthy that these subsystems were considered as a series network. Finally, by calculating the reliability of each subsystem, the reliability of the whole machine was calculated, the results of which are shown in Figure 2. The results of that research showed that the compressed air and water subsystems have the lowest failure rate and therefore the highest reliability, whereas the mechanical subsystem, with the highest frequency of failures and delays, has the lowest reliability. Also, according to Figure 2, if no preventive maintenance and periodic technical inspection are performed on the subsystems, after about 45 hours of continuous operation, the reliability of the machine will be zero and the machine will stop due to failure in one of the subsystems. A TBM is a tunnelling machine with high investment and operating costs that is used in many tunnelling projects. Given the importance of the reliability of these machines, in this study the Monte Carlo simulation method was applied to simulate the reliability of a TBM in order to provide a suitable maintenance program to improve its reliability. It is worth noting that the method used for reliability analysis in this research could be used, with minor adaptations, for other TBMs working in various mechanized tunnelling projects.

Methodology
Given the widespread use of EPB TBMs in most mechanized tunnelling projects, the TBM used in Line 1 of the Tabriz Metro was selected as the case study for this research.

Tunnel route geological condition
Geologically, the route of the tunnel of line 1 of the Tabriz Metro (about 8 km length) mainly consists of grains of sand, silt, clay (with high plasticity), and rubble. The ground materials of this project have an average composition of 8% gravel, 66% sand, and 26% silt and clay. Since urban subway tunnels pass under the dense residential part of the city, excavation of these tunnels with traditional and common tunnelling methods is difficult and sometimes impossible such that it requires the use of mechanized tunnelling methods. According to the engineering geology studies and geotechnical parameters of the project route, the EPB TBM was selected for this project (TURO, 2004).

Structure of the EPB TBM
The machine applied in this project is the product of NFM, one of the major manufacturers of TBM. The machine is an EPB-type TBM with a cutter face of a diameter of 6.88 m and a total length of about 103 m. The TBM consists of three main parts: a cutter head (including cutter face, screw conveyor, propulsion cylinders, and segments installer), a connecting beam (including foam preparation units, segments conveyor control box, a drainage pump, a power box, bentonite pressure vessels, a dewatering pump, a conveyor, and control room), and a backup (consisting of 9 trailers including required water tanks, slurry and bentonite tanks, hydraulic systems, electricity and lighting, and an emergency generator).

Data Collecting
To analyse the reliability of systems (machines), each system is divided into the appropriate subsystems. Then, the failure data of each subsystem are recorded separately and the reliability of each subsystem is calculated. Finally, according to whether subsystems are parallel or in series, the reliability of the whole system is calculated. In this study, 5 subsystems in a series network including electrical, mechanical, hydraulic, compressed air, and water subsystems were considered for the studied TBM (according to Amini Khoshalan et al., 2015). The required data were collected from the operation of about 24 months of tunnelling in Line 1 of the Tabriz Metro. For this purpose, raw data recorded by maintenance personnel, operators, and operations shift supervisors in various reports were used and data related to all breakdowns related to the TBM were extracted from these reports. Finally, data of each subsystem were classified and arranged in chronological order. The results of the Pareto analysis showed that 49.4% of the breakdown data were related to the mechanical subsystem, 18.1% to the hydraulic subsystem, 14.1% to the electrical subsystem, 9.4% to the compressed air subsystem, and 9% to the water subsystem. For example, Table 1 and Table 2 show some of the prepared failure data of the hydraulic and electrical subsystems of the studied machine in the database.

Reliability
Reliability is a function of time that expresses the probability that a device or system correctly performs its assigned and predetermined tasks under specific conditions and schedules. In other words, reliability is the probability of no breakdown during a period, which is expressed by Equation 1:

$$R(t) = 1 - \int_{0}^{t} f(x)\,dx \tag{1}$$

Where: R(t) is the reliability at time t and f(t) is the probability density function of failure (Dhillion, 2008).

Sample failure data of the electrical subsystem
No.   Failure description                                             Recorded time (h)
1     An electrical problem of propulsion jacks and fixing it         3.83
2     An electrical problem of the foam injection system              27.67
3     20 kV power cabling of boring machine                           208
4     A problem in the main water tank thermostat                     26.33
5     An electrical problem of the foam injection system              105.83
6     20 kV power cabling of boring machine                           53.33
7     An electrical problem of the foam injection system              23.5
8     An electrical problem in the monitoring of the machine          11.83
9     Screw conveyor sensor problem                                   25.83
10    Erector sensor problem                                          15.33

Sample failure data of the hydraulic subsystem
No.   Failure description                                             Recorded time (h)
4     A problem in the hydraulic valve of the erector jack            33
5     Lowering the oil level of the main tank and injecting oil       47.83
6     Injection of hydraulic oil into the main tank of the machine    9.5
7     A sudden increase in the pressure of hydraulic pumps            31
8     Burst hydraulic hose of the segmented conveyor                  154.5
9     Segment conveyor hydraulic problem                              6.5
10    Defect in the oil filter                                        65.17
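As a worked instance of this definition, the reliability can be written in closed form when the failure density is a three-parameter Weibull distribution, one of the families fitted later in the paper. The reading of α, β, and γ as shape, scale, and location parameters is an assumption made here for illustration; the source does not state its parameter convention.

```latex
f(t) = \frac{\alpha}{\beta}\left(\frac{t-\gamma}{\beta}\right)^{\alpha-1}
       \exp\!\left[-\left(\frac{t-\gamma}{\beta}\right)^{\alpha}\right],
       \qquad t \ge \gamma

R(t) = 1 - \int_{\gamma}^{t} f(x)\,dx
     = \exp\!\left[-\left(\frac{t-\gamma}{\beta}\right)^{\alpha}\right],
       \qquad t \ge \gamma, \qquad R(t) = 1 \text{ for } t < \gamma
```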

Monte Carlo simulation
Since reliability is a probabilistic value, the accuracy and confidence of the results have been a concern of practitioners in this field. Simulation can address these issues to some extent and provide practical solutions. The Monte Carlo simulation is a random simulation method that generates random samples from the failure probability density function of each subsystem and then combines them according to the way the subsystems interact in the overall system.
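One common way to generate such random failure times is inverse transform sampling. The sketch below is illustrative only: it assumes a three-parameter Weibull distribution with α, β, γ read as shape, scale, and location, and borrows the compressed air parameters reported later in the paper. The function name `sample_weibull3` is hypothetical and is not taken from the authors' MATLAB code.

```python
import math
import random


def sample_weibull3(alpha, beta, gamma, rng):
    """Draw one failure time by inverting F(t) = 1 - exp(-((t - gamma) / beta) ** alpha)."""
    u = rng.random()  # uniform on [0, 1)
    return gamma + beta * (-math.log(1.0 - u)) ** (1.0 / alpha)


rng = random.Random(0)
# Assumed convention: alpha = shape, beta = scale, gamma = location.
samples = [sample_weibull3(1.13, 100.9, 9.266, rng) for _ in range(5)]
print(samples)
```

Every sampled time is at least γ, and the fraction of samples exceeding a given t approaches the closed-form reliability at t as the number of samples grows.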

K-R Monte Carlo simulation method
The first step of this method is to determine the cumulative distribution function of the failures of each subsystem and the parameters of that function (Kamat and Riley, 1975). The steps of the method for assessing reliability at a given time t are as follows (Hoseinie, 2011):
A) Find the paths through the subsystems of the device that keep the whole device active.
B) Generate a random failure time t_i from the cumulative failure distribution function of each subsystem, where i is the subsystem number and n is the number of subsystems (1 ≤ i ≤ n).
C) Compare t_i with t for every subsystem: if t_i > t, subsystem i is active at time t; if t_i < t, subsystem i has failed.
D) Using the results of steps A and C, check whether the whole device is active. If at least one path consists entirely of working subsystems, the device is active at time t; if every path contains a failed subsystem, the device is inactive at time t.
E) Repeat steps B, C, and D m times and count n_s(t), the number of repetitions in which the device is active at time t, and n_f(t), the number of repetitions in which it has failed, so that Equation 2 holds:
$$m = n_s(t) + n_f(t) \tag{2}$$
F) Determine the reliability at time t using Equation 3:
$$R(t) = \frac{n_s(t)}{m} \tag{3}$$
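The steps above can be sketched for a series arrangement, where the only working path contains every subsystem. This is a minimal illustration in Python, not the authors' MATLAB program: the function names are hypothetical, only two of the five subsystems are included, and their parameters (compressed air: Weibull; water: normal) are taken from the fits reported later, with α, β, γ assumed to be shape, scale, and location.

```python
import math
import random


def weibull3_sampler(alpha, beta, gamma):
    """Return a function that draws three-parameter Weibull failure times."""
    def draw(rng):
        return gamma + beta * (-math.log(1.0 - rng.random())) ** (1.0 / alpha)
    return draw


def normal_sampler(mu, sigma):
    """Return a function that draws normally distributed failure times."""
    def draw(rng):
        return rng.gauss(mu, sigma)
    return draw


def kr_series_reliability(samplers, t, m, rng):
    """Estimate R(t) = n_s(t) / m for subsystems connected in series."""
    n_s = 0
    for _ in range(m):
        # Step B: one random failure time per subsystem.
        failure_times = [draw(rng) for draw in samplers]
        # Steps C and D: a series system is active only if every subsystem outlives t.
        if all(t_i > t for t_i in failure_times):
            n_s += 1
    # Steps E and F: n_f = m - n_s, and R(t) = n_s / m.
    return n_s / m


# Illustrative subset of subsystems (assumed parameter convention).
samplers = [weibull3_sampler(1.13, 100.9, 9.266), normal_sampler(106.45, 53.11)]
rng = random.Random(0)
for t in range(0, 101, 25):
    print(t, kr_series_reliability(samplers, float(t), 3000, rng))
```

A normal distribution can produce negative failure times with small probability; the sketch simply counts those draws as failures before t, which is one possible treatment among several.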

Reliability simulation of Tabriz Metro TBM
Since reliability is a probabilistic value, it is always associated with uncertainty. Therefore, in this study, the Monte Carlo simulation method was used to evaluate the reliability of the TBM. To simulate the reliability of the TBM, which is a repairable machine, the Kamat and Franzmeier method could be used, as it is the most widely used method for such machines. However, because this method considers both the times between failures and the repair times, assessing reliability with it involves many practical complexities and makes the calculation long and difficult. Accordingly, when the downtime for repairs is negligible relative to the operating time of the device, the K-R method, which was developed for non-repairable devices, can be employed instead. It is of note that Kamat and Franzmeier's method is itself based on the Kamat and Riley method (Hoseinie, 2011). The studies of the TBM showed that the total downtime of the machine for repairs was negligible compared to its useful working time. Therefore, to simplify and shorten the simulation, the machine downtime was ignored and the K-R method was used to simulate the reliability of the TBM used in Line 1 of the Tabriz Metro. To perform a simulation with this method, it is first necessary to identify all the paths through the subsystems that keep the device active. Due to the arrangement of the subsystem series of this device, it is required to perform this step be-
Running the simulation requires a computer program. MATLAB is widely used by programmers and researchers in various fields; therefore, in this research, a computer program was developed in MATLAB to perform the simulation. This program calculates the reliability of the device at time T using m iterations.
The main inputs of the developed code are the time, the iteration number, and the cumulative distribution functions of the device subsystems. The iteration number affects both the running time of the code and the accuracy of the results, so the optimal iteration number must be determined before running the main code. For this purpose, before starting the main simulation, the reliability of the device at time T = 10 was assessed using 100 to 10,000 iterations and an epoch value of 50 units. The results of these calculations are presented in Figure 5.
As shown in Figure 5, as the iteration number increases, the calculated reliability values settle in the range between 0.25 and 0.35, and beyond 3000 iterations the simulated reliability value shows negligible variation. Therefore, 3000 was selected as the iteration number for the main simulation.
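A convergence check of this kind can be sketched as follows. The example is a simplification under stated assumptions: it uses a single subsystem (the mechanical Weibull fit reported later, with α, β, γ assumed to be shape, scale, and location) rather than the full five-subsystem model, so its values are not expected to match Figure 5. The function names are hypothetical.

```python
import math
import random


def estimate_reliability(t, m, rng, alpha=0.798, beta=12.257, gamma=0.5):
    """Fraction of m sampled Weibull failure times that exceed t."""
    survived = 0
    for _ in range(m):
        t_fail = gamma + beta * (-math.log(1.0 - rng.random())) ** (1.0 / alpha)
        if t_fail > t:
            survived += 1
    return survived / m


def convergence_table(t, iteration_counts, seed=0):
    """Estimate R(t) once for each iteration count to see how the value settles."""
    rng = random.Random(seed)
    return [(m, estimate_reliability(t, m, rng)) for m in iteration_counts]


for m, r in convergence_table(10.0, [100, 500, 1000, 3000, 10000]):
    print(f"m={m:6d}  R(10) ~ {r:.3f}")
```

For a Monte Carlo estimate of a probability R from m independent repetitions, the standard error is sqrt(R(1 - R)/m), so the scatter shrinks in proportion to 1/sqrt(m); this is the usual reason such curves flatten as the iteration number grows.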
After determining the optimal iteration number, the failure data of each subsystem were checked and the best-fitted distribution function for each subsystem was identified based on the minimum Kolmogorov-Smirnov (K-S) value. Afterward, the parameters of the cumulative distribution function of each subsystem were entered into the developed code to generate random failure times. When the code is run, the user is first asked for the time T up to which the reliability is to be calculated. The calculations start from 0 and continue up to the time specified by the user in steps of 1 hour. In the next section, each subsystem is reviewed, and the reliability of each subsystem and then of the whole TBM is calculated.
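The selection rule described here, choosing the candidate distribution with the smallest K-S value, can be sketched with the standard library alone. This is an illustration rather than the fitting software used in the study: parameter estimation is not shown, the candidate parameters below are placeholders rather than fitted values, and the function names are hypothetical. The sample data are the ten electrical subsystem values listed in the paper's sample table.

```python
import math


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of `data` and a candidate `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the candidate CDF with the empirical step just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def weibull3_cdf(alpha, beta, gamma):
    def cdf(x):
        return 0.0 if x <= gamma else 1.0 - math.exp(-((x - gamma) / beta) ** alpha)
    return cdf


def normal_cdf(mu, sigma):
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf


def best_fit(data, candidates):
    """Return (name, K-S value) of the candidate with the smallest statistic."""
    scores = {name: ks_statistic(data, cdf) for name, cdf in candidates.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


electrical_sample = [3.83, 27.67, 208, 26.33, 105.83, 53.33, 23.5, 11.83, 25.83, 15.33]
candidates = {
    "weibull3 (placeholder parameters)": weibull3_cdf(1.0, 50.0, 0.0),
    "normal (placeholder parameters)": normal_cdf(50.0, 60.0),
}
print(best_fit(electrical_sample, candidates))
```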

Mechanical subsystem
Mechanical subsystem failure data were evaluated and the histogram of these data is plotted in Figure 6. According to the available data and calculations, the three-parameter Weibull distribution function has the lowest K-S value (0.0755) among the distribution functions considered. Therefore, this distribution function, with parameters α: 0.798, β: 12.257, and γ: 0.5, was selected as the best-fitted function for these failure data. By running the code, the reliability diagram of the mechanical subsystem of the TBM was calculated. As shown in the mechanical subsystem reliability diagram (see Figure 7), if no maintenance is performed on this subsystem, the reliability of the subsystem will be 0 after about 64 hours. In other words, the mechanical subsystem will stop completely after about 64 hours of continuous boring operations.

Water subsystem
Figure 8 presents the histogram of the failure data of the water subsystem. According to the available data and calculations, the normal distribution function has the lowest K-S value (0.031) among the distribution functions considered and, with σ = 53.11 and μ = 106.45, is the best-fitted function for these failure data. By running the developed code, the reliability diagram of the water subsystem of the TBM was calculated (see Figure 9). As can be seen from this diagram, if no maintenance is performed on this subsystem, the reliability will be close to 0 after about 258 hours.

Electrical subsystem
In this section, electrical subsystem failure data were analysed. According to the available data and calculations (see Figure 10), the three-parameter log-normal distribution function has the lowest K-S value (0.081) among the distribution functions considered and, with parameters σ: 0.93, μ: 3.79, and γ: -4.69, is the best-fitted function for these breakdown data. The reliability of the TBM's electrical subsystem was calculated by running the developed code, and the related curve is presented in Figure 11. As shown in the electrical subsystem reliability diagram, if no maintenance is performed on this subsystem, the reliability will be 0 after about 165 hours.

Compressed air subsystem
Figure 12 presents the histogram of the failure data of this subsystem. According to the available data and calculations, the three-parameter Weibull distribution function has the lowest K-S value (0.127) among the distribution functions considered and, with α = 1.13, β = 100.9, and γ = 9.266, is the best-fitted function for these failure data. By running the developed code, the reliability of the compressed air subsystem of the TBM was calculated and is shown in Figure 13. As shown in the compressed air subsystem reliability diagram, if no maintenance is performed, this subsystem will stop after about 418 hours of machine boring operations.
Figure 12: Histogram of failure data for the compressed air subsystem of the TBM

Hydraulic subsystem
In this section, the failure data of the hydraulic subsystem were analysed, and the histogram and the optimal distribution function of these data were selected and plotted (see Figure 14). According to the available data and calculations, the generalized gamma distribution function has the lowest K-S value (0.083) among the distribution functions considered and, with parameters k: 0.889, α: 0.874, and β: 54.25, is the best-fitted function for these failure data. By running the developed code, the reliability diagram of the hydraulic subsystem was prepared (see Figure 15).
As shown in Figure 15, if no maintenance is performed on this subsystem, the reliability will be 0 after about 182 hours.

Overall reliability of the boring machine
In this step, the code is run by entering the desired time and the optimal iteration number of 3000. Next, the reliability of the whole TBM is simulated by considering the cumulative distribution functions of all subsystems. The overall reliability diagram of line 1 of the Tabriz Metro's EPB TBM is shown in Figure 16.
As shown in Figure 16, the overall reliability of the machine decreases with a relatively steep slope, and after 10 hours of continuous boring operations, its reliability is reduced by about 70%. Also, if no maintenance operations are performed on the subsystems, the overall reliability of the TBM reaches 0 after about 38 hours of continuous excavation operations.
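For a series network, the simulated curve can be cross-checked against the product rule below. The rule additionally assumes that the subsystem failures are statistically independent, which the source does not state explicitly; the subscripts are introduced here only for illustration.

```latex
R_{\mathrm{TBM}}(t) = \prod_{i=1}^{5} R_i(t)
  = R_{\mathrm{mech}}(t)\, R_{\mathrm{elec}}(t)\, R_{\mathrm{hyd}}(t)\, R_{\mathrm{air}}(t)\, R_{\mathrm{water}}(t),
\qquad
R_{\mathrm{TBM}}(t) \le \min_{i} R_i(t)
```

The inequality is consistent with the reported results: the overall reliability reaches 0 (about 38 hours) earlier than that of the weakest subsystem, the mechanical one (about 64 hours).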
As a limitation of this study, it should be noted that a comprehensive TBM reliability analysis needs to consider conditional or environmental variables, such as the effects of adverse geological conditions, the skill of the operators and maintenance personnel, temperature, and changes in design. Due to a lack of sufficient data, this was not possible in the present study.

Improving the reliability of the Boring Machine
One of the most important points to consider in a system is the proper timing of maintenance. In the repair program provided for the system, the entire downtime for repairs is predetermined. In addition, reducing the occurrence of unwanted breakdowns, increasing the service life of machines, and increasing the efficiency of maintenance staff, all of which could increase the overall productivity of the system (Levitt, 2011), are considered before starting the project. The most widely used method for preparing this repair program is time-based and reliability-centred maintenance (RCM). This program sets a schedule for inspection and maintenance of the device and its various parts (Hosseini et al., 2013). John Moubray (Moubray, 1997) defines maintenance based on reliability as a way for a device or system to perform its mission properly according to a set schedule without unwanted breakdowns and time delays. Reliability plays an important role in this method because the maintenance program is based on keeping the overall reliability of the device at a desirable, specified value. However, because the reliability level differs between the subsystems of the TBM, the maintenance program is set for each subsystem according to its own reliability level, so that the program for each subsystem is optimal (Dhillion, 2008). According to previous works on the desirable level of reliability in different industries, a reliability level of 80 to 90% is usually considered suitable (Amini Khoshalan et al., 2017). In this study, by analysing the reliability results of the subsystems of the device (especially the low reliability of the mechanical subsystem) and considering the expert opinion of maintenance engineers, a reliability of about 80% was set as the target reliability and the minimum level of operational reliability for the machine subsystems.
Therefore, the start of maintenance for each subsystem is based on maintaining an 80% reliability value for the relevant subsystem. Table 3 shows the time when each subsystem reached the 80% reliability level.
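The time at which a subsystem falls to the 80% level can also be found directly from its fitted distribution. The sketch below uses bisection on a closed-form reliability function for two of the subsystems, assuming α, β, γ are shape, scale, and location. It is an analytical cross-check, not the simulation-based procedure behind Table 3, so its values may differ from those in the table; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def time_to_reliability(reliability, target, t_max, tol=1e-6):
    """Bisection for the time at which a decreasing reliability(t) falls to target."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reliability(mid) > target:
            lo = mid  # still above the target, so the crossing is later
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weibull3_reliability(alpha, beta, gamma):
    def r(t):
        return 1.0 if t <= gamma else math.exp(-((t - gamma) / beta) ** alpha)
    return r


def normal_reliability(mu, sigma):
    dist = NormalDist(mu, sigma)

    def r(t):
        return 1.0 - dist.cdf(t)
    return r


compressed_air = weibull3_reliability(1.13, 100.9, 9.266)
water = normal_reliability(106.45, 53.11)
for name, r in [("compressed air", compressed_air), ("water", water)]:
    print(name, round(time_to_reliability(r, 0.80, 1000.0), 1))
```

Bisection is used so that the same helper works for any monotonically decreasing reliability function, including ones without a convenient inverse.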
Although applying the repair program at short intervals maintains the desired reliability, excessive downtime will reduce the productivity of the device. Therefore, Table 4 presents the preventive maintenance program for the TBM, in which the maintenance times are shifted slightly from the hours at which the reliability of each subsystem reaches the desired level of 80%. Another noteworthy point about the proposed maintenance program is that it is designed to repair at least two subsystems at 10- and 15-hour intervals, at least four subsystems every 30 hours, and the entire machine every 60 hours. This type of maintenance program design, in turn, plays a key role in increasing the productivity and efficiency of the repair staff.
According to Table 4, the mechanical, electrical, hydraulic, and compressed air subsystems are repaired and maintained after every 30 hours of boring operations, and the whole machine is maintained after 60 hours of continuous boring operations.

Results of preventive maintenance
To design a preventive maintenance system for a device, the subsystems need to be studied from a maintenance point of view. Studying the five subsystems of the TBM revealed that each has renewable reliability: after a repair, the reliability of the subsystem returns to its original value of one (100%). A preventive maintenance system increases the reliability, productivity, and lifetime of the machine by reducing malfunctions and unwanted stops. Table 5 presents the results of applying the preventive maintenance program to the TBM, under which the reliability of the machine is improved. According to the obtained results (see Figure 17), by applying this reliability-based preventive maintenance program with accurate and correct repairs, the reliability of the machine is maintained at the desired level and reaches 80% in a complete repair period after 60 hours. However, if the boring operation is performed without interruption, the reliability of the machine will reach 0 after about 38 hours and the machine will stop. Over time, reliability decreases and increases periodically. Nevertheless, it is clear from the curve that reliability declines much more slowly after repairs than in the reliability curve without preventive repairs. Also, the rate of the reliability changes shows that by applying the schedule proposed in this research, the reliability of the TBM could be maintained above 60%. Maintaining the reliability of the machine at this level allows the failure potential of the machine to be controlled and reduced and the continuity of boring to be maintained.
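The periodic rise and fall of reliability under preventive maintenance can be illustrated with a simple renewal model: each maintenance is assumed to restore the subsystem to a reliability of one, as described above, and reliability is then measured from the last maintenance. The sketch uses only two subsystems, and the 10-hour and 30-hour intervals are placeholders chosen for illustration, not the per-subsystem intervals of Table 4. Independence of the subsystems and the reading of α, β, γ as shape, scale, and location are further assumptions, and the function names are hypothetical.

```python
import math


def weibull3_reliability(alpha, beta, gamma):
    def r(t):
        return 1.0 if t <= gamma else math.exp(-((t - gamma) / beta) ** alpha)
    return r


def maintained_reliability(reliability, interval):
    """Reliability measured from the last maintenance, assumed to renew the subsystem."""
    def r(t):
        return reliability(t % interval)
    return r


def series_reliability(subsystems, t):
    """Product of subsystem reliabilities (independent subsystems in series)."""
    product = 1.0
    for r in subsystems:
        product *= r(t)
    return product


mechanical = weibull3_reliability(0.798, 12.257, 0.5)
compressed_air = weibull3_reliability(1.13, 100.9, 9.266)
# Placeholder maintenance intervals in hours (hypothetical, not from Table 4).
plan = [
    maintained_reliability(mechanical, 10.0),
    maintained_reliability(compressed_air, 30.0),
]
for t in range(0, 61, 5):
    print(t, round(series_reliability(plan, float(t)), 3))
```

The printed values form a sawtooth: reliability decays between maintenance actions and jumps back whenever a subsystem is renewed, which is the qualitative behaviour described for Figure 17.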

Conclusion
In this study, the reliability of the EPB TBM of Line 1 of the Tabriz Metro was evaluated based on the Monte Carlo simulation method. For this purpose, the studied machine was divided into 5 subsystems: mechanical, electrical, hydraulic, water, and compressed air. Then, using operational reports of about 24 months of machine boring, a database containing the failure times of each subsystem was extracted. The results showed that beyond 3000 iterations the simulated reliability value changed only negligibly. Thus, 3000 was selected as the iteration number of the code developed in MATLAB to simulate the reliability. Since the K-R method is widely used in Monte Carlo simulation, this technique was applied to simulate the reliability of the individual subsystems and ultimately the whole machine. According to the minimum values of the Kolmogorov-Smirnov (K-S) test, the best-fit distributions for the mechanical, electrical, hydraulic, water, and compressed air subsystems were the three-parameter Weibull, three-parameter log-normal, generalized gamma, normal, and three-parameter Weibull distributions, respectively. The reliability of each subsystem was simulated according to these distributions, and the results showed that if no preventive maintenance operations are performed on the subsystems, the overall reliability of the TBM will reach zero after about 38 hours of continuous boring operations. Finally, to improve the reliability of the machine, a suitable reliability-based preventive maintenance system was proposed to keep the machine in optimal operating conditions, reduce breakdowns, and help maintain the continuity of boring operations.