Robustness Analysis and Experimental Validation of a Fault Detection and Isolation Method for the Modular Multilevel Converter

This paper presents a fault detection and isolation (FDI) method for open-circuit faults of power semiconductor devices in a modular multilevel converter (MMC). The proposed FDI method is simple, using only one sliding-mode observer (SMO) equation, and requires no additional transducers. The method is based on an SMO for the circulating current in an MMC. An open-circuit fault of a power semiconductor device is detected when the observed circulating current diverges from the measured one. The fault is located by an assumption-verification process. To improve the robustness of the proposed FDI method, a new technique based on the observer injection term is introduced to estimate the value of the uncertainties and disturbances; this estimated value is used to compensate for the uncertainties and disturbances. As a result, the proposed FDI scheme can detect and locate an open-circuit fault in a power semiconductor device while ignoring parameter uncertainties, measurement error, and other bounded disturbances. The FDI scheme has been implemented in a field-programmable gate array using fixed-point arithmetic and tested on a single-phase MMC prototype. Experimental results under different load conditions show that an open-circuit faulty power semiconductor device in an MMC can be detected and located in less than 50 ms.

Index Terms: Fault detection and isolation, modular multilevel converter, sliding mode observer.

I. INTRODUCTION
The modular multilevel converter (MMC) is the state of the art in multilevel converters and is receiving great interest from both academia and industry. It has a number of desirable features, such as a modular configuration, low harmonic distortion, low voltage stress on the semiconductor devices, high-voltage and high-power capability, and simple realization of redundancy [1]. In addition, the cells of an MMC are fed by capacitors, and no multi-phase transformers are required. A comprehensive introduction to the operation of the MMC is given in [2]. The review paper [3] summarizes the latest achievements regarding the MMC in terms of modeling, control, modulation, applications and future trends.
Power semiconductor switches are amongst the most failure-prone components in a power converter, and each of these devices is a potential failure point [4]. Because an MMC contains a large number of semiconductor devices, the probability of a fault is much higher than for conventional two-level voltage source converters (VSCs). Faults in power semiconductor devices cause a power converter to operate far from its set point, and this abnormal operation cannot be corrected by a feedback controller. If faulty operation is allowed to continue, other devices may be damaged and a shut-down of the plant may follow. Therefore, it is vital to detect and isolate these faults immediately after they occur.
Fault detection and isolation (FDI) deals with detecting anomalous situations (fault detection) and identifying their causes (fault isolation) [5]. An FDI scheme can be implemented either by a hardware method or by an analytical (software) method [5], [6]. Hardware FDI employs redundant components or additional sensors; a fault is detected if the behaviour of the process components differs from that of the redundant ones, or if the additional sensors detect anomalous signals. It is straightforward and reliable but increases the cost, size and hardware complexity of the plant. The basic idea of analytical FDI is to check the consistency between the actual system behaviour and its estimated behaviour [7]. The estimated behaviour can be obtained either from a mathematical model of the system (for example, using observers) or from an analysis of historical data (for example, using data mining or neural networks). Although the algorithms are more sophisticated, the cost and hardware complexity of an analytical method are lower than those of a hardware method. The application of analytical FDI methods has been boosted by the great advances in computer technology in recent decades [6].
There are two types of faults in a fully controlled power semiconductor device: a short-circuit fault (the device remains ON regardless of the gate signal) and an open-circuit fault (the device remains OFF regardless of the gate signal). A short-circuit fault needs to be detected within 10µs to save the semiconductor devices from destruction and to avoid a shoot-through fault with the complementary device [8]. A short circuit in an insulated-gate bipolar transistor (IGBT) is usually detected using a hardware circuit, often with additional sensors and associated circuits. These sensors and circuits are usually integrated into the gate driver to form an active/smart gate driver [9], [10]. The additional sensors and circuits add extra cost and size to the system. Furthermore, these active gate drivers can themselves fail because of their complexity and hence decrease the reliability of the power converter.
This paper deals with the detection and isolation of open-circuit faults. The effect of an open-circuit fault is illustrated in Fig. 1, where the parameters are the same as those of an industrial 24MW MMC [11] and an open-circuit fault occurs at 0.1s. Only one of the phases is considered. It can be seen that an open-circuit fault is not immediately fatal to an MMC; however, the fault needs to be detected and removed within 0.1s to avoid secondary damage to other devices. An open-circuit fault can have various causes: lifting/fusing of bonding wires, a driver failure, or a communication problem between the controller and the driver. The gate driver is recognised as the third most failure-prone component according to an industry-based survey [12]. The simplest detection method is to use an active gate driver, as mentioned previously. Alternatively, analytical redundancy can be used to detect an open-circuit fault, because this type of fault is not immediately fatal and can be tolerated by the power converter for some time [13]. Several analytical FDI methods based on the analysis of the output voltage waveform have been reported. In [14], a faulty cell in a flying capacitor (FC) converter is detected and localized by analysing the switching frequency of the output phase voltage. This technique has also been applied to a cascaded H-bridge [15], where an open- or short-circuit fault can be detected. In [16], the characteristics of the output phase voltage are analysed in the time domain; the occurrence of a fault is detected from the degradation of the output voltage, and the fault is located by comparing the output phase voltage with all the possible phase fault voltages. In [17], an artificial intelligence (AI) FDI algorithm is proposed, where historical data of the output phase voltages in both normal and faulty conditions are used to train a neural network.
The survey in [18] presents a comprehensive review of the reliability of power electronic systems, including methodologies for assessing reliability, methods to detect and locate faults, and fault-tolerant operation. The survey in [19] summarises recent fault-tolerance techniques for three-phase voltage source converters. A sliding mode observer (SMO) based FDI technique for an MMC was proposed in [20], [21], where a faulty power semiconductor device can be detected and located within 100ms. This paper presents an improved version of that method. It is simpler, using only one SMO equation, and can detect and locate an open-circuit fault in less than 50ms. Furthermore, a technique is proposed to compensate for parameter uncertainties, measurement errors and other bounded disturbances. The resulting FDI scheme can detect an open-circuit faulty power semiconductor device while rejecting these uncertainties and disturbances. The practical implementation of the SMO-based FDI scheme in an FPGA (field-programmable gate array) is also discussed, and experimental results under different load conditions are presented.

II. SLIDING MODE OBSERVER
A. Introduction
An observer is a mathematical replica of a system that estimates its internal states; it is driven by the input of the system and by a signal representing the discrepancy between the estimated and actual states [22]. In early observers such as the Luenberger observer, the differences between the estimated and actual outputs of the plant are fed back to the observer linearly, and the estimated states cannot converge to the measured states in the presence of a disturbance [22], [23]. The sliding mode observer instead employs a high-gain switching function of the discrepancy between the estimated and actual outputs to force the estimated states to converge to the actual states. The first-order system (1) is used in this paper:
$$\dot{x} = ax + bu \qquad (1)$$
An SMO for (1) is introduced:
$$\dot{\hat{x}} = bu - L\,\mathrm{sgn}(\hat{x} - x) \qquad (2)$$
where $\hat{x}$ denotes the estimated/observed state of $x$ and $L$ denotes the observer gain, designed to drive $\hat{x} \to x$ in finite time. Subtracting (1) from (2) yields the dynamic error between the observed and measured states:
$$\tilde{x} = \hat{x} - x \qquad (3)$$
$$\dot{\tilde{x}} = -ax - L\,\mathrm{sgn}(\tilde{x}) \qquad (4)$$
Choosing $L > |ax|$, we obtain
$$\tilde{x}\dot{\tilde{x}} \le |\tilde{x}|\,(|ax| - L) < 0 \qquad (5)$$
which forces $\tilde{x}$ and $\dot{\tilde{x}}$ to zero and keeps them at zero thereafter; this motion along the line $\tilde{x} = 0$ is the so-called sliding mode [24].
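The convergence argument above can be checked numerically. The following is a minimal Python sketch, assuming the forms $\dot{x} = ax + bu$ for (1) and $\dot{\hat{x}} = bu - L\,\mathrm{sgn}(\hat{x} - x)$ for (2). The plant coefficients, the 50 Hz input and the step size are hypothetical values, chosen only so that $L > |ax|$ holds over the whole run. Both equations are integrated with forward Euler, and the function returns the final observation error.

```python
import math


def sgn(v: float) -> float:
    """Signum function used in the observer injection term."""
    return float((v > 0) - (v < 0))


def simulate_smo(a: float = -5.0, b: float = 10.0, L: float = 50.0,
                 dt: float = 1e-5, t_end: float = 0.2,
                 x0: float = 1.0, xhat0: float = 0.0) -> float:
    """Forward-Euler simulation of plant (1) and sliding mode observer (2).

    Plant:    dx/dt    = a*x + b*u
    Observer: dxhat/dt = b*u - L*sgn(xhat - x)
    Returns the final observation error xhat - x.
    """
    x, xhat = x0, xhat0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        u = math.sin(2 * math.pi * 50 * t)      # hypothetical 50 Hz input
        dx = a * x + b * u                      # true dynamics
        dxhat = b * u - L * sgn(xhat - x)       # model input plus switching injection
        x += dt * dx
        xhat += dt * dxhat
    return xhat - x


print(f"final error: {simulate_smo():.2e}")
```

With a finite step $dt$ the error does not settle exactly at zero but chatters within roughly $dt\,(L + |ax|)$ of it, which is the discrete-time counterpart of the sliding mode.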

B. Sliding mode observer for an MMC
An SMO can be built for an MMC based on (2). In this paper a single-phase eight-cell MMC is considered; nevertheless, the method is versatile and can be used for MMCs with hundreds of cells.
The circuit diagram and parameters of the MMC used for the analysis and simulation are presented in Fig. 2 and Table I. Applying Kirchhoff's voltage law (KVL) to the loop formed by the DC sources and the two arms of the MMC (Fig. 2) gives
$$E_p + E_n - \sum_{i=1}^{8} S_i v_{ci} = l\frac{di_p}{dt} + l\frac{di_n}{dt} \qquad (6)$$
where $i_p$ and $i_n$ are the upper and lower arm currents, $l$ is the inductance of the arm inductors, $E_p$ and $E_n$ are the DC voltages, and $v_{ci}$ and $S_i$ are the capacitor voltage and switching state of Cell $i$, respectively. $S_i$ is defined in Table II, where $g_1$ and $g_2$ are the gate signals for the upper and lower switches in a cell.
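Since the circulating current of the MMC is $i_z = (i_p + i_n)/2$ [25], (6) can be rewritten as
$$\frac{di_z}{dt} = \frac{1}{2l}\Big(E_p + E_n - \sum_{i=1}^{8} S_i v_{ci}\Big) \qquad (7)$$
Based on (2) and (7), an SMO can be obtained for the MMC:
$$\frac{d\hat{i}_z}{dt} = \frac{1}{2l}\Big(E_p + E_n - \sum_{i=1}^{8} S_i v_{ci}\Big) - L\,\mathrm{sat}(\hat{i}_z - i_z) \qquad (8)$$
Note that the saturation function sat(·) defined in (9) is used instead of sgn(·) to reduce the chattering of the observed states, following [26]:
$$\mathrm{sat}(x) = \begin{cases} x/h, & |x| \le h \\ \mathrm{sgn}(x), & |x| > h \end{cases} \qquad (9)$$
where $h$ is a constant.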
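To make the structure of (8) and (9) concrete, here is a hedged Python sketch of one forward-Euler step of the circulating-current observer. It assumes that (7) has the form $di_z/dt = (E_p + E_n - \sum_i S_i v_{ci})/(2l)$ with no resistive term. The arm inductance, sampling period, DC voltages, capacitor voltages and switching pattern in the hypothetical `demo` function are illustrative values, while $L = 6 \times 10^4$ and $h = 1$ are the values used for the simulation of Fig. 3. The plant is simulated with the same model, so the observer should track it exactly once the error enters the boundary layer.

```python
def sat(v: float, h: float) -> float:
    """Saturation function (9): v/h inside the boundary layer |v| <= h, sgn(v) outside."""
    if abs(v) <= h:
        return v / h
    return 1.0 if v > 0 else -1.0


def mmc_observer_step(iz_hat, iz_meas, E_p, E_n, S, v_c, l, L, h, Ts):
    """One forward-Euler step of the circulating-current observer (8).

    S and v_c list the switching states S_i and capacitor voltages v_ci
    of all cells in both arms, in the same order.
    """
    inserted = sum(s * v for s, v in zip(S, v_c))      # sum of S_i * v_ci
    model = (E_p + E_n - inserted) / (2.0 * l)         # assumed right-hand side of (7)
    injection = -L * sat(iz_hat - iz_meas, h)          # sliding-mode correction
    return iz_hat + Ts * (model + injection)


def demo(steps: int = 2000) -> float:
    """Observer tracking a plant built from the same model; returns the final error."""
    l, L, h, Ts = 5e-3, 6e4, 1.0, 1e-5    # l and Ts hypothetical; L and h as for Fig. 3
    E_p = E_n = 200.0                      # hypothetical DC voltages [V]
    v_c = [100.0] * 8                      # hypothetical capacitor voltages [V]
    iz, iz_hat = 2.0, 0.0                  # true and observed initial currents [A]
    for k in range(steps):
        n_on = 3 if k % 2 == 0 else 5      # hypothetical pattern: 3 or 5 cells inserted
        S = [1] * n_on + [0] * (8 - n_on)
        iz_hat = mmc_observer_step(iz_hat, iz, E_p, E_n, S, v_c, l, L, h, Ts)
        iz += Ts * (E_p + E_n - sum(s * v for s, v in zip(S, v_c))) / (2.0 * l)
    return iz_hat - iz
```

Inside the boundary layer the error obeys $e(k+1) = (1 - T_s L/h)\,e(k)$, so this discrete implementation needs $0 < T_s L/h < 2$; with the values above the factor is 0.4.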
A simulation has been carried out in SIMULINK/PLECS to verify the SMO (8). The parameters of the MMC are listed in Table I, the observer gain is $L = 6 \times 10^4$ and $h = 1$. Fig. 3 shows the simulation results, where it can be seen that $\hat{i}_z$ follows $i_z$ closely.

III. FAULT DETECTION AND ISOLATION USING SMO
A. Mathematical Basis
Fault detection is considered first. A fault is added to the first-order system (1):
$$\dot{x} = ax + bu + kf \qquad (10)$$
where $f$ represents the value of the fault and $k$ the corresponding coefficient. Note that $f$ is often very large and cannot be overcome by the feedback control. The difference between the observed and measured states is obtained by subtracting (10) from (2):
$$\dot{\tilde{x}} = -ax - kf - L\,\mathrm{sgn}(\tilde{x}) \qquad (11)$$
If we choose
$$L < |kf| - |ax| \qquad (12)$$
then under the faulty condition $\tilde{x}\dot{\tilde{x}} > 0$ (once $\tilde{x}$ takes the sign of $-kf$), the observer cannot enter the sliding mode, and $\hat{x}$ diverges significantly from $x$. For an open-circuit fault at Cell $i$ of the MMC, $f = v_{ci}/(2l)$ and $k_i = 1$; since the circulating-current model has no $ax$ term ($a = 0$), $L$ must satisfy the following condition to detect an open-circuit faulty switch:
$$L < \frac{v_{ci}}{2l} \qquad (13)$$
The occurrence of a fault can be detected by comparing $|x - \hat{x}|$ with a given threshold value.
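A minimal Python sketch of the detection principle follows. It assumes $a = 0$ (as for the circulating-current model), a constant fault term $kf$ switched on at `t_fault`, and a hypothetical 50 Hz input term; the function name and the 1 A threshold are illustrative. The fault magnitude $2.5 \times 10^5$ and the gain $6 \times 10^4$ mirror the values of $V_c/(2l)$ and $L$ quoted in the simulation study. A real open-circuit fault term depends on the arm-current direction and the gating, so this only illustrates why (12) makes the error grow.

```python
import math


def sgn(v: float) -> float:
    """Signum function: +1, 0 or -1."""
    return float((v > 0) - (v < 0))


def first_detection_time(kf: float, L: float, threshold: float,
                         t_fault: float = 0.01, dt: float = 1e-6,
                         t_end: float = 0.02):
    """Return the first time |x - x_hat| exceeds threshold, or None.

    Faulty plant (10) with a = 0: dx/dt     = b*u + k*f  (k*f switched on at t_fault)
    Fault-free observer (2):      dx_hat/dt = b*u - L*sgn(x_hat - x)
    """
    x = x_hat = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        bu = 1.0e3 * math.sin(2 * math.pi * 50 * t)   # hypothetical input term b*u
        fault = kf if t >= t_fault else 0.0           # fault term k*f
        dx = bu + fault
        dx_hat = bu - L * sgn(x_hat - x)
        x += dt * dx
        x_hat += dt * dx_hat
        if abs(x - x_hat) > threshold:                # FD test against a threshold
            return t
    return None


print(first_detection_time(2.5e5, 6.0e4, 1.0), first_detection_time(3.0e4, 6.0e4, 1.0))
```

With `kf = 2.5e5` the error grows by about $dt\,(kf - L)$ per step and crosses the threshold within a few microseconds of the fault; with `kf = 3e4`, smaller than $L$, the observer stays in the sliding mode and no fault is reported.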
For fault isolation, an assumption-verification method was proposed in [20], [21]. The procedure is to assume a location for the fault, modify the observer equation accordingly, and then compare the observed states with the measured states again. If the assumption is correct, $\hat{x}$ converges to $x$. In this case $kf$ is also included in the observer:
$$\dot{\hat{x}} = bu + kf - L\,\mathrm{sgn}(\hat{x} - x) \qquad (14)$$
Subtracting (10) from (14) yields the dynamic error:
$$\dot{\tilde{x}} = -ax - L\,\mathrm{sgn}(\tilde{x}) \qquad (15)$$
which is the same as (4); the sliding condition $\tilde{x}\dot{\tilde{x}} < 0$ is satisfied and $\hat{x} \to x$ in finite time. On the other hand, if the assumed fault location is incorrect, $\hat{x}$ keeps diverging from $x$. In this way the fault can be located.
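The assumption-verification loop can be sketched as follows. This is a hedged Python illustration rather than the MMC procedure itself: it assumes $a = 0$ and $u = 0$, and it replaces the real fault signatures with hypothetical toy ones in which candidate $j$ contributes $kf$ only while a 50 Hz pattern with phase $j\pi/2$ is positive. The names `PHASES`, `verify` and `isolate` are illustrative, $I_{th2} = 5$ A is arbitrary, and for simplicity every candidate is tested from scratch on the same interval instead of sequentially on-line.

```python
import math

KF = 2.5e5   # fault term magnitude, set equal to V_c/(2l) from the simulation study [A/s]

# Hypothetical toy signatures: candidate j adds k*f only while a 50 Hz
# pattern with phase j*pi/2 is positive. Real MMC fault signatures depend
# on the gate signals and on the arm-current direction.
PHASES = {f"candidate_{j}": j * math.pi / 2 for j in range(4)}


def sgn(v: float) -> float:
    return float((v > 0) - (v < 0))


def fault_term(phase: float, t: float) -> float:
    return KF if math.sin(2 * math.pi * 50 * t + phase) > 0 else 0.0


def verify(assumed: str, actual: str, L: float = 6e4, I_th2: float = 5.0,
           dt: float = 1e-6, t_end: float = 0.02) -> bool:
    """Return True if the observer with the assumed fault term stays within I_th2.

    Plant (10) with a = 0 and u = 0: dx/dt = k*f of the actual fault.
    Observer (14): dx_hat/dt = k*f of the assumed fault - L*sgn(x_hat - x).
    """
    x = x_hat = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        dx = fault_term(PHASES[actual], t)
        dx_hat = fault_term(PHASES[assumed], t) - L * sgn(x_hat - x)
        x += dt * dx
        x_hat += dt * dx_hat
        if abs(x_hat - x) > I_th2:
            return False          # assumption rejected: observer diverges
    return True                   # assumption accepted: observer stays in sliding mode


def isolate(actual: str) -> list:
    """Try every candidate location and keep those that are verified."""
    return [name for name in PHASES if verify(name, actual)]


print(isolate("candidate_2"))
```

Only the candidate whose assumed fault term matches the actual one keeps the error inside the threshold, which is the essence of (14) and (15).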

B. Flowchart
The flowchart of this algorithm is shown in Fig. 4. The algorithm has two modes, FD (fault detection) mode and FI (fault isolation) mode:
[FD mode] This mode monitors whether a fault has occurred. If the difference between the observed and measured circulating currents, $|i_z - \hat{i}_z|$, exceeds a threshold value $I_{th1}$ and this condition persists for 0.4ms, an open-circuit fault is declared and the FDI scheme enters FI mode; otherwise, the FDI scheme stays in FD mode.
[FI mode] This mode locates the open-circuit fault using the assumption-verification process. Cell $i$, $T_j$ is assumed to be the faulty device, and the switching state $S_i$ in SMO (8) is modified according to Table II in [20]. If Cell $i$, $T_j$ is the actual faulty device, $\hat{i}_z$ converges to $i_z$; otherwise, $\hat{i}_z$ diverges from $i_z$. It is important to note that at some instants during the faulty period the current of the faulty arm can be clamped to zero by the fault, and the converter is unobservable at these instants. Therefore $\hat{i}_z$ is set to $i_z$ whenever the current of the assumed faulty arm is zero, as shown in Fig. 4.
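The mode logic of the flowchart can be sketched as below. This is an illustrative Python sketch, not the FPGA/DSP code used on the rig: the hypothetical class `FaultDetector` implements the FD-mode test with the 0.4 ms persistence window, and the hypothetical function `fi_clamp` implements the FI-mode reset of $\hat{i}_z$ to $i_z$ while the assumed faulty arm carries no current. The tolerance used to decide that a current is zero is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FaultDetector:
    """FD mode: flag a fault when |i_z - i_z_hat| > I_th1 persists for t_persist seconds."""
    I_th1: float                 # detection threshold [A]
    Ts: float                    # sampling period [s]
    t_persist: float = 0.4e-3    # persistence time used in the flowchart [s]
    _count: int = field(default=0, init=False)
    _n_required: int = field(default=1, init=False)

    def __post_init__(self) -> None:
        # Number of consecutive samples that must exceed the threshold.
        self._n_required = max(1, round(self.t_persist / self.Ts))

    def update(self, iz: float, iz_hat: float) -> bool:
        """Feed one sample; return True once the scheme should enter FI mode."""
        if abs(iz - iz_hat) > self.I_th1:
            self._count += 1
        else:
            self._count = 0      # the condition must persist without interruption
        return self._count >= self._n_required


def fi_clamp(iz_hat_next: float, iz_meas: float, assumed_arm_current: float,
             tol: float = 1e-3) -> float:
    """FI mode: reset i_z_hat to i_z while the assumed faulty arm carries no current.

    `tol` is a hypothetical tolerance for deciding that the arm current is zero.
    """
    return iz_meas if abs(assumed_arm_current) < tol else iz_hat_next
```

With a 10 µs sampling period the detector needs 40 consecutive samples above $I_{th1}$; a single sample below the threshold restarts the count.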
Note that the threshold values $I_{th1}$ and $I_{th2}$ are load dependent. In the case of a faulty power semiconductor device, $\hat{i}_z$ diverges from $i_z$ more slowly under light load than under heavy load. The divergence rate between $\hat{i}_z$ and $i_z$ also depends on the observer gain $L$, according to (11). There are many possible choices for $I_{th1}$ and $I_{th2}$; one example is given in (16), where $L_o$ denotes the observer gain at full load, $I_z$ the circulating current, and $I_{zo}$ the circulating current at full load. As (16) shows, it is recommended that $L$, $I_{th1}$ and $I_{th2}$ be kept above certain minimum values to reject parameter uncertainties and measurement noise.
Simulations have been carried out to verify the proposed algorithm with the parameters listed in Table I. According to (12), $L$ must satisfy $L < V_c/(2l) = 2.5 \times 10^5$; $L$ is set to $6 \times 10^4$ so that an open-circuit fault can be detected and located within 50ms.
In Figs. 5 to 7, an open-circuit fault occurs at Cell 1, $T_1$ at 0.1s. In Fig. 5, no FDI scheme is applied, and $\hat{i}_z$ diverges from $i_z$ at a very high rate after the fault occurs. In Figs. 6 and 7, the FDI algorithm enters FI mode once $|\hat{i}_z - i_z| > I_{th1}$ has persisted for 0.4ms; FI mode is indicated with a grey background. In Fig. 6 the assumed faulty switch is the actual one, and $\hat{i}_z$ converges to $i_z$ in FI mode. In Fig. 7 the assumed faulty switch is Cell 2, $T_1$, which is not the actual faulty device; $\hat{i}_z$ diverges from $i_z$ in FI mode, and $|\hat{i}_z - i_z| > I_{th2}$ within 50ms.

COMPENSATION
In any analytical FDI scheme, certain assumptions, including accurate physical parameters, precise measurements and linear, time-invariant operation, are made when modelling a plant [5]. However, these assumptions may not hold. The parameters may contain uncertainties (for example, the parasitic resistance of an inductor) and may degrade over time. Measurements usually have errors superimposed on the signals; these can include electronic white noise and incorrect scaling factors between the measured and actual variables. Furthermore, all real plants are non-linear to some degree, even when they behave almost linearly. These uncertainties and disturbances may lead to divergence between the actual system behaviour and its estimated behaviour, giving false alarms. The robustness of an FDI scheme is the degree to which it maximises the sensitivity to actual malfunctions while discriminating between genuine faults and apparent faults caused by measurement noise, parameter uncertainty or transients [5].
The desirable features of this FDI method are:
• White noise in the measurements does not affect the observed states, and therefore does not affect the FDI.
• The value of the parameter uncertainties, scaling errors in the measurements and other bounded disturbances is estimated using the observer injection term, and this estimated value is used to compensate for the uncertainties and disturbances.
In summary, the proposed method is able to detect and locate an open-circuit fault of a power semiconductor device while ignoring parameter uncertainties, measurement noise and other bounded disturbances. These features are discussed in this section.

A. Mathematical basis
The first-order system (1) and its SMO (2) are considered to demonstrate the features described above. Adding the uncertainties and disturbances to (2) gives (17), where $\Delta a$ and $\Delta b$ denote the parameter uncertainties, and $\Delta u$ denotes the measurement noise, consisting of white noise $\Delta r$ and a scaling error $\Delta s$ between the measured and actual variable. The values of these uncertainties and disturbances are assumed to be bounded and smaller than the value of a fault.
Subtracting (17) from (1) gives the error between the measured and observed states, (18), where $D$ collects the effect of the uncertainties and disturbances. If $L$ is chosen to satisfy
$$L > |ax| + |D| \qquad (19)$$
then $\tilde{x}\dot{\tilde{x}} \le |\tilde{x}|(|ax| + |D| - L) < 0$, the sliding mode in (18) occurs, and $\tilde{x} \to 0$ (namely $\hat{x} \to x$) in finite time; $\hat{x}$ is therefore not affected by the uncertainties or disturbances. Based on (12) and (19), the observer gain needs to satisfy the following condition to discriminate an open-circuit fault from uncertainties and disturbances:
$$|ax| + |D| < L < |kf| - |ax| \qquad (20)$$
Two simulations have been carried out to verify the above analysis. In these simulations, parameter uncertainties and measurement noise are added to the observer; all other conditions are the same as for Figs. 6 and 7. An open-circuit fault in Cell 1, $T_1$ occurs at 0.1s, and in FI mode the assumed faulty switch is the actual one. In the first simulation (Fig. 8), 5% white noise is added to all the measurements, as shown in (21). In the second simulation (Fig. 9), parameter uncertainties and 1% scaling errors in the measurements are added to the SMO, as shown in (22).
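Condition (20) can be turned into a quick design check. The sketch below assumes (20) in the form $|ax| + |D| < L < |kf| - |ax|$ and works with bounds on each term; the function name `gain_window` is hypothetical. In the usage line, $a = 0$ is assumed for the circulating-current model, $|D| = 2 \times 10^4$ A/s is the disturbance estimate reported for Fig. 10, and $|kf| = 2.5 \times 10^5$ A/s is the value of $V_c/(2l)$ from the simulation study.

```python
from typing import Optional, Tuple


def gain_window(ax_max: float, D_max: float,
                kf_min: float) -> Optional[Tuple[float, float]]:
    """Open interval of observer gains L satisfying |ax| + |D| < L < |kf| - |ax|.

    ax_max : bound on |a*x| over the operating range
    D_max  : bound on the lumped uncertainty/disturbance |D|
    kf_min : smallest fault term |k*f| that must still be detected
    Returns (L_min, L_max), or None when no gain separates faults from disturbances.
    """
    low = abs(ax_max) + abs(D_max)      # sliding condition (19)
    high = abs(kf_min) - abs(ax_max)    # divergence requirement (12)
    return (low, high) if low < high else None


window = gain_window(0.0, 2.0e4, 2.5e5)
print(window, window is not None and window[0] < 6.0e4 < window[1])
```

The gain $L = 6 \times 10^4$ used in the simulations lies inside this window; if the disturbance bound approached the fault magnitude, the window would close and no single gain could separate the two.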
where the subscript $mes$ denotes measured variables; $r_1$, $r_2$ and $r_3$ are random numbers ranging from -1 to 1 that change at every calculation cycle; $\hat{l}$ denotes the inductance used in the observer; and $R_l$ denotes the parasitic resistance of the arm inductors.
In the fault-free condition, it can be seen in Figs. 8 and 9 that $\hat{i}_z$ converges to $i_z$ and is not affected by the uncertainties and disturbances. Fig. 8 also shows that white noise in the measurements does not affect the fault isolation (indicated with a grey background). Since the average value of white noise is zero, its effect on the observer is self-cancelling, and therefore the observer and FDI scheme are not affected. However, parameter uncertainties and scaling errors in the measurements lead to incorrect fault isolation: as shown in Fig. 9, there is a noticeable difference between $\hat{i}_z$ and $i_z$. A larger observer gain and larger threshold values can be used to alleviate incorrect fault isolation, but more time is then needed to detect and locate a fault.
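The different effects of white noise and scaling errors can be reproduced with a toy model. In this hedged Python sketch, the circulating current obeys $di_z/dt = (E - v)/(2l)$ with a hypothetical DC voltage $E$ and a hypothetical arm voltage $v$ oscillating around it, and the observer is fed corrupted measurements: either 5% uniform white noise on both signals, or a 1% scaling error on $E$. The function `mean_injection` (a hypothetical name) returns the mean of the injection term $-L\,\mathrm{sgn}(\hat{x} - x)$. With white noise the mean stays near zero; with a scaling error it settles near $-0.01E/(2l) = -400$ A/s for these values, a bias of the kind that causes the mismatch seen in Fig. 9.

```python
import math
import random


def mean_injection(noise: float = 0.0, scale_err: float = 0.0, seed: int = 0,
                   L: float = 6e4, l: float = 5e-3, E: float = 400.0,
                   dt: float = 1e-5, t_end: float = 0.1) -> float:
    """Mean of the injection term -L*sgn(x_hat - x) with corrupted measurements.

    noise     : relative amplitude of uniform white noise on both measurements
    scale_err : relative scaling error on the measurement of E
    """
    rng = random.Random(seed)
    x = x_hat = 0.0
    total = 0.0
    n = int(round(t_end / dt))
    for k in range(n):
        t = k * dt
        v = E + 50.0 * math.sin(2 * math.pi * 50 * t)     # hypothetical arm voltage [V]
        E_m = E * (1 + scale_err) * (1 + noise * rng.uniform(-1, 1))
        v_m = v * (1 + noise * rng.uniform(-1, 1))
        e = x_hat - x
        inj = -L * ((e > 0) - (e < 0))                     # -L*sgn(x_hat - x)
        x += dt * (E - v) / (2 * l)                        # true circulating current
        x_hat += dt * ((E_m - v_m) / (2 * l) + inj)        # observer with measured inputs
        total += inj
    return total / n


print(mean_injection(noise=0.05), mean_injection(scale_err=0.01))
```

In both cases the observer stays in the sliding mode because $L$ exceeds the mismatch; what differs is that the scaling error leaves a persistent offset in the switching term, which is what the compensation technique estimates and removes.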

B. Compensation of uncertainties and disturbances
In this section, the value of the parameter uncertainties, scaling errors in the measurements and other bounded disturbances is estimated, and this estimated value is used to compensate the observer to achieve robust FDI. Once (18) enters the sliding mode, $\tilde{x} \to 0$ and $\dot{\tilde{x}} \to 0$, which gives (23). When the MMC is fault free (0 to 0.1s in Figs. 8 and 9), the uncertainties and disturbances $D$ are counterbalanced by the observer injection term $-L\,\mathrm{sgn}(\tilde{x})$, according to (23). Therefore, the value of $D$ can be extracted from $-L\,\mathrm{sgn}(\tilde{x})$. Since $-L\,\mathrm{sgn}(\tilde{x})$ is a high-frequency switching term, a low-pass filter is applied to obtain the estimated value of $D$, as in (24), where $\hat{D}$ denotes the estimated value of the uncertainties and disturbances and $\tau$ denotes the time constant of the low-pass filter. A simulation has been undertaken with the white noise (21) and the scaling errors and parameter uncertainties (22); the results are shown in Fig. 10. The value of $\hat{D}$ is about 20000 A/s and is caused by the parameter uncertainties and scaling errors in the measurements (the effect of the white noise is self-cancelling). Because of the uncertainties and disturbances, the observer injection term $L\,\mathrm{sgn}(\tilde{i}_z)$ operates with an offset of 20000 A/s; as a result, the observer becomes sensitive to noise and incorrect fault isolation occurs. To achieve robust FDI, $\hat{D}$ is added to the SMO to compensate for the uncertainties and disturbances, as in (25). Note that $\hat{D}$ is updated only when the system is fault free. Simulations have been carried out to test the FDI with compensation of the uncertainties and disturbances, considering white noise (condition (21)) and parameter uncertainties and scaling errors in the measurements (condition (22)), with $\hat{D}$ added to the observer. The simulation results in Figs. 11 and 12 show that the uncertainties and disturbances are compensated, and the open-circuit fault is detected and located without being influenced by them.
The diagram and a photograph of the laboratory set-up are shown in Figs. 13 and 14. The assembled power module with gate driver and heatsink is shown in Fig. 15; the power module is soldered to a module interface board and attached to a heatsink. The cell capacitances are selected such that the ripple of the capacitor voltages is less than 10% [27], and the arm inductances are chosen such that the switching harmonic is less than 60% of the nominal circulating current. The parameters of the experimental rig are listed in Table III. The control scheme of the MMC experimental rig is shown in Fig. 16, where the subscripts p and n denote the upper and lower arms, respectively. $K_v(s)$ and $K_i(s)$ are the PI compensators for the regulation of the average capacitor voltages, and $G_{PR}(s)$ is a proportional resonant (PR) compensator that suppresses the second harmonic of the MMC circulating current. The details of these compensators are listed in Table IV. $v_z$ is the output of these compensators, and $V_o^*$ is the command for the AC voltage. The modulation indices for the upper and lower arms, $m_p$ and $m_n$, are obtained from $v_z$ and $V_o^*$. $m_{Bi}$ is the term for balancing the capacitor voltages and is obtained using the block diagram shown in Fig. 17. $m_{i,p}$ and $m_{i,n}$ are the modulation indices for Cell $i$ in the upper and lower arms, respectively. Phase-shifted PWM is used to generate the gate signals for the IGBTs.
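Returning to the compensation of uncertainties and disturbances: estimating $\hat{D}$ with a low-pass filter while the system is fault free, and then feeding it into the observer, can be illustrated with a toy model (hypothetical $E$, $v$ and $l$, and a 1% scaling error on the measurement of $E$). This Python sketch follows the experimental procedure in which the estimate is obtained first and then put into the observer. During the estimation window the injection term is filtered with a first-order low-pass filter of time constant $\tau = 0.1$ s (the value used on the rig), after which $\hat{D}$ is frozen and added to the observer. Here $\hat{D}$ is taken as the filtered injection term itself, so it enters the observer with a plus sign; the sign convention of (24) and (25) may differ, and the function name is hypothetical.

```python
import math


def estimate_and_compensate(scale_err: float = 0.01, L: float = 6e4,
                            l: float = 5e-3, E: float = 400.0, tau: float = 0.1,
                            dt: float = 1e-5, t_est: float = 0.5,
                            t_check: float = 0.1):
    """Estimate D_hat while fault free, then freeze it and compensate the observer.

    Returns (D_hat, mean injection term after compensation).
    """
    x = x_hat = 0.0
    D_hat = 0.0
    n_est = int(round(t_est / dt))
    n_chk = int(round(t_check / dt))
    inj_sum = 0.0
    for k in range(n_est + n_chk):
        t = k * dt
        v = E + 50.0 * math.sin(2 * math.pi * 50 * t)   # hypothetical arm voltage [V]
        E_m = E * (1 + scale_err)                       # measurement with a scaling error
        e = x_hat - x
        inj = -L * ((e > 0) - (e < 0))                  # -L*sgn(x_hat - x)
        compensating = k >= n_est
        comp = D_hat if compensating else 0.0
        x += dt * (E - v) / (2 * l)
        x_hat += dt * ((E_m - v) / (2 * l) + comp + inj)
        if compensating:
            inj_sum += inj                              # residual bias after compensation
        else:
            D_hat += (dt / tau) * (inj - D_hat)         # first-order low-pass filter
    return D_hat, inj_sum / n_chk


print(estimate_and_compensate())
```

Freezing $\hat{D}$ before applying it matters in this sketch: if the raw injection term were filtered while the compensation is already active, the loop would settle at only half of the required correction.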

B. FPGA implementation of the SMO
The sliding mode observer is implemented in the FPGA to obtain quasi-analog behaviour of the observed states. Fixed-point arithmetic is used because the A3P1000 FPGA has no floating-point unit (FPU). The implementation consists of three steps.
Step 1: Convert the analog observer into discrete form.
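Approximating the derivative in (8) by the forward difference $d\hat{i}_z/dt \approx (\hat{i}_z(k+1) - \hat{i}_z(k))/T_s$, the discrete sliding mode observer can be expressed as
$$\hat{i}_z(k+1) = \hat{i}_z(k) + \frac{T_s}{2l}\Big(E_p(k) + E_n(k) - \sum_{i=1}^{8} S_i(k)\,v_{ci}(k)\Big) - T_s L\,\mathrm{sat}\big(\hat{i}_z(k) - i_z(k)\big) \qquad (26)$$
Step 2: Scale the variables according to (27), where $m_I$, $m_V$ and $m_E$ are the scaling factors. Substituting (27) into (26) gives (28).
Step 3: Convert the parameters from floating point to fixed point and implement the observer in the FPGA using Verilog. The observer equation is broken down into three parts, as shown in (28). The block diagram of the FPGA program is illustrated in Fig. 18. Subtraction is performed by adding the two's complement of the subtrahend, and multiplication is carried out by bit shifting.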
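Steps 1 to 3 can be sketched in integer arithmetic. The Python sketch below evaluates one step of the discretized observer (26) with 32-bit two's-complement arithmetic, subtracting by adding the two's complement and multiplying by arithmetic right shifts. All scaling values are hypothetical and chosen so that the coefficients are exact powers of two: 1024 LSB/A for currents, 4.096 LSB/V for voltages, $T_s = 10$ µs, $l = 5$ mH, $L = 6 \times 10^4$ and a boundary layer $h = 1.2$ A. They are not the rig's parameters, and the function names are illustrative. A floating-point version of the same step is included for comparison. Python's `>>` on negative integers rounds toward minus infinity, like an arithmetic right shift in hardware.

```python
# Hypothetical fixed-point scaling (not the rig's values):
#   currents: 1024 LSB/A, voltages: 4.096 LSB/V, Ts = 10 us, l = 5 mH,
#   L = 6e4 A/s, h = 1.2 A.
# Then Ts/(2l) * 1024/4.096 = 2**-2 and Ts*L/h = 2**-1, so both
# multiplications become arithmetic right shifts.
BITS = 32
K_SHIFT = 2        # model coefficient Ts/(2l) in scaled units = 2**-2
SAT_SHIFT = 1      # Ts*L/h inside the boundary layer = 2**-1
H_LSB = 1229       # h = 1.2 A in current LSBs (1228.8 rounded)
STEP_LSB = 614     # Ts*L = 0.6 A per step in current LSBs (614.4 rounded)


def wrap(v: int, bits: int = BITS) -> int:
    """Interpret the low `bits` bits of v as a signed two's-complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def twos_sub(a: int, b: int, bits: int = BITS) -> int:
    """Compute a - b by adding the two's complement of b (~b + 1)."""
    mask = (1 << bits) - 1
    return wrap(a + ((~b + 1) & mask), bits)


def observer_step_fixed(iz_hat: int, iz: int, e_sum: int, sv_sum: int) -> int:
    """One step of (26) in integers.

    iz_hat, iz : observed and measured circulating current [current LSBs]
    e_sum      : E_p + E_n [voltage LSBs]
    sv_sum     : sum of S_i * v_ci over all cells [voltage LSBs]
    """
    model = twos_sub(e_sum, sv_sum) >> K_SHIFT        # (Ts/2l)(E_p + E_n - sum S_i v_ci)
    err = twos_sub(iz_hat, iz)                        # i_z_hat - i_z
    if abs(err) <= H_LSB:
        inj = err >> SAT_SHIFT                        # Ts*L*err/h inside the boundary layer
    else:
        inj = STEP_LSB if err > 0 else -STEP_LSB      # Ts*L*sgn(err) outside it
    return twos_sub(wrap(iz_hat + model), inj)


def observer_step_float(iz_hat, iz, e_sum_V, sv_sum_V,
                        Ts=1e-5, l=5e-3, L=6e4, h=1.2):
    """Floating-point reference for the same step, in amperes and volts."""
    err = iz_hat - iz
    s = err / h if abs(err) <= h else (1.0 if err > 0 else -1.0)
    return iz_hat + Ts * ((e_sum_V - sv_sum_V) / (2 * l) - L * s)
```

For example, with $\hat{i}_z = 2.0$ A (2048 LSB), $i_z = 1.5$ A (1536 LSB), $E_p + E_n = 400$ V (1638 LSB) and $\sum S_i v_{ci} = 390$ V (1597 LSB), `observer_step_fixed` returns 1802 LSB (about 1.760 A), against 1.76 A from `observer_step_float`. Rounding a coefficient to a power of two when the scaling does not allow an exact one acts like a small parameter uncertainty, which the compensation of Section IV-B is designed to absorb.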

C. Experimental results
In the experimental tests, the open-circuit fault condition on a power semiconductor device is created by holding the gate drive signal of the device low. The experimental results are captured using a C6713 host-port interface (HPI) daughtercard, and the waveforms are shown in Figs. 19 to 26. In these experimental tests, parameter uncertainties and measurement noise are considered: a 10% error in the inductance $l$, a 0.11Ω parasitic resistance in the arm inductors, and a 5% scaling error in the measurement of $e_p$. A low-pass filter with a time constant of 0.1s is used to filter the switching injection term, according to (24); this filter is implemented in the DSP. The estimated value of the uncertainties and disturbances is about -2400 A/s, as shown in Fig. 19. This estimated value is put into the observer to compensate for the uncertainties and disturbances, and this compensation is included in all the experimental results in Figs. 20 to 26. Fig. 20 shows experimental waveforms of the fault occurrence: an open-circuit fault occurs at Cell 6, $T_1$ at 0.1s, and no FDI algorithm is applied. Before the fault, $\hat{i}_z$ follows $i_z$ closely; after the fault, $\hat{i}_z$ diverges significantly from $i_z$. Figs. 21 and 22 show waveforms with different assumed fault locations; in these two figures, an open-circuit fault occurs at Cell 6, $T_2$ at 0.1s. At full load the circulating current is 5.2A, and the threshold values for FDI are chosen as $I_{th1} = 10.4$A and $I_{th2} = 5.2$A according to (16). $I_{th2} = 5.2$A is indicated by a black dashed line. In Fig. 21, the assumed faulty switch is the actual one (Cell 6, $T_2$), and $\hat{i}_z$ converges to $i_z$; in Fig. 22, the assumed faulty switch is Cell 7, $T_2$, and $\hat{i}_z$ diverges from $i_z$.
In Figs. 23 and 24, the MMC rig operates under light load with a circulating current $I_z = I_{zo}/12 = 0.43$A.
Transient operation does not disturb the proposed FDI method. In an experimental test, the modulation index of the AC voltage is changed from 0.6 to 0.95 at 0.07s and changed back at 0.12s. The experimental results in Fig. 25 show that $\hat{i}_z$ follows $i_z$ closely despite the fluctuation of $i_z$.

D. Discussion on the detection time
The choice of threshold value in a fault detection system such as the one we have described is always a compromise between the detection time and the certainty of a correct detection. In the simulation and experimental results above, we have used a very conservative threshold, which yields a detection time of 50ms. During this time, the capacitor voltage of the faulty cell in the 24MW MMC rises to approximately 2300V, according to Fig. 1. While this is unlikely to be an issue for the semiconductors (usually rated at 3.3kV), it might be unacceptable in terms of the headroom on the capacitor voltage rating. In addition, careful coordination would be required with any local overvoltage protection. The detection time can be reduced by selecting a less conservative threshold, as shown in Fig. 26 for the experimental rig, where an open-circuit fault occurs at Cell 5, $T_1$ at 0.05s and is automatically detected and removed once located. Here we have selected a threshold of $I_{th2} = 2$A (indicated in Figs. 21 to 24), which still gives good certainty of fault detection and yields a detection time of 20ms, reducing the impact on the capacitor voltages considerably. Clearly, the situation in a practical converter will differ from that in our laboratory prototype, and the selection of an appropriate threshold will be an important consideration.

VI. CONCLUSION
This paper has presented a sliding mode observer based fault detection and isolation technique for a modular multilevel converter (MMC). The technique can detect and locate an open-circuit fault of a power semiconductor device, or a gate driver failure, in less than 50ms. The method is simple, with only one sliding mode observer equation, and requires no additional transducers or circuits. However, it is not suitable for the detection and isolation of a short-circuit faulty device because of the very fast response required (10µs). It is therefore suggested that the proposed method be used together with hardware detection methods (for short-circuit faults) to achieve a more reliable MMC.
To improve the robustness of the fault detection and isolation method, a technique is proposed to estimate parameter uncertainties, measurement errors and other bounded disturbances, and the estimated value is used to compensate for the influence of the uncertainties and disturbances. As a result the proposed technique can detect and locate an open-circuit faulty power semiconductor device whilst ignoring the parameter uncertainties, measurement noise or other disturbances.
The fault detection and isolation algorithm has been implemented in an FPGA using fixed-point arithmetic and tested on an experimental scaled-down, single-phase, eight-cell MMC. The experimental results have verified the analysis and simulation results. They also show that a smaller threshold value can be used to detect and locate an open-circuit fault in less than 20ms.
This fault detection and isolation method can be applied to other converters with modular topologies using similar analysis and principles. Furthermore, it is possible to apply this method to the detection and isolation of multiple open-circuit faults in an MMC, although finding the faults will take longer because many more fault scenarios have to be assumed.