Removal of masking effect for damage detection of structures

Damage detection of civil engineering structures relies heavily on outlier analysis and novelty detection. Generally, data captured from a structure under its normal environmental conditions are used to create a model and to compute control limits that represent the normal range of variation of the damage sensitive features of the structure. However, the training database usually includes outlier measurements, which may introduce a masking effect. These outlier measurements can affect the mean and the standard deviation/covariance matrix of the training database, and hence affect both the model and the control limits. As a result, small damage may not be detected. Therefore, this paper proposes an approach for selecting a ‘clean’ training database for the construction of the baseline of the undamaged structure, so that damage can be detected at an earlier stage. The approach uses Principal Component Analysis and Median Absolute Deviation to identify outlier measurements, and it can be applied before any damage detection method. The proposed approach is applied to a numerical beam model and to the Z24 Bridge in Switzerland. The results demonstrate that damage can be detected at an earlier stage using the proposed approach. The proposed method also allows the type of model (e.g. linear or nonlinear) to be used for damage detection to be determined.


Introduction
The development of damage detection methods in the past decade has focused on separating or eliminating the effects of changing environmental and operational conditions from the effects of damage on civil engineering structures. This is because the damage sensitive features analysed (e.g. natural frequencies) are also affected by the changing environments the structures face, which leads to false alerts if not considered [1,2]. A wide range of damage detection methods has been proposed to address these environmental and operational issues, using different approaches such as regression analysis [3,4,5,6,7,8,9], multivariate statistical tools [10,11,12,13,14] and combinations of both [15,16,17].
Although different approaches are adopted, most of them share the same concept: a baseline/model of the undamaged structure is first created using features captured under a range of environmental conditions. New measurements are then compared with this baseline/model to obtain a deviation index, which represents how much the new observation deviates from the ideal state of the structure. Finally, through an outlier analysis or a novelty detection analysis, the deviation is classified as a normal measurement or an outlier measurement. Hawkins [18] defined an outlier as 'an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism'. Thus, for damage detection, an outlier measurement can be attributed to damage of structural components, since the features were generated from a new state of the structure.
Email address: William.soo@nottingham.edu.cn (William Soo Lon Wah 1,2 *)
In both outlier analysis and novelty detection analysis, control limits are created from a training database to represent the normal condition of the structure. By definition, the difference between the two techniques lies in the training database: for novelty detection, the training database consists of measurements free of outliers, while for outlier analysis, outliers can be present in the training database [19]. For damage detection, since outliers are usually considered to represent damage events, the training database, which is composed of undamaged measurements, is usually assumed to be free of outliers. However, even though the data come from the undamaged state of the structure, outliers not caused by damage are generally present. Fuentes [20] noted that outliers from the undamaged structure can take two forms. The first is observations that manifest themselves as extreme values, generally due to a high noise level in the data collection process or to data corruption. Outliers of this type are usually few in number but very large in value, and they may increase the variance and widen the control limits, as well as pull the mean of the database towards them [4,20]. The second type is measurements created through different mechanisms; one example is features gathered at temperatures below and above zero degrees. For example, some bridge structures [5,21,22,23] change behaviour when the temperature drops below freezing point. This change in behaviour is usually attributed to an increase in the stiffness of the structure due to the asphalt layer on the bridge or to the stiffening of the supports [5,21,22,23]. It is important to identify and take these outliers into account to avoid creating a defective model. However, as noted by Dervilis et al. [4], the outlier measurements are generally not known a priori, and hence the inclusive approach, which consists of including outliers in the training database, is usually adopted.
Outlier measurements present in the training database may cause a masking effect, where real outliers (due to damage) are hidden during future analysis, or undamaged measurements are flagged as damaged when they are not [4]. The masking effect generally occurs because most control limits, in either novelty detection or outlier analysis, are computed from the mean and standard deviation/covariance matrix of the training database. Hence, if the outliers increase the standard deviation or move the mean towards them, the sensitivity of the method to small damage is reduced. Therefore, it is important to identify and remove these outlier measurements to obtain a 'clean' database for the computation of the control limits. The term 'clean' is used here to mean free of outliers.
Outlier measurements can also affect the baseline/model constructed to obtain the deviation index. For example, one popular technique for obtaining the deviation index is the Mahalanobis Squared Distance [24], which relies heavily on the mean and covariance of the features of the undamaged data set. Thus, if outlier measurements are present, the model will be affected. The same problem occurs in other damage detection methods, such as regression analysis, where the fitted model is pulled towards the outlier measurements, creating a defective model. Therefore, these outliers need to be identified and removed before any baseline/model is created, and a more robust model (usually nonlinear) should be adopted to tackle the second type of outliers (measurements coming from different environmental mechanisms).
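To make the masking effect concrete, the following minimal Python/NumPy sketch (an illustration, not the authors' implementation) computes the Mahalanobis squared distance of one shifted observation from a clean training set and from the same set contaminated by two extreme outliers. The function name `mahalanobis_sq` and the two-feature data are hypothetical; the sketch only shows that the outliers inflate the covariance and shrink the distance of a genuinely shifted observation.

```python
import numpy as np

def mahalanobis_sq(x, train):
    """Squared Mahalanobis distance of x from a training set (rows = observations)."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)  # sample covariance (ddof = 1)
    diff = x - mean
    # Solve cov @ z = diff rather than forming the explicit inverse
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical two-feature training sets (e.g. two normalised frequencies)
clean = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
contaminated = np.vstack([clean, [[10.0, 0.0], [-10.0, 0.0]]])  # two extreme outliers
new_obs = np.array([3.0, 0.0])                                  # a shifted observation

d_clean = mahalanobis_sq(new_obs, clean)
d_contaminated = mahalanobis_sq(new_obs, contaminated)
print(d_clean, d_contaminated)  # about 9.0 versus about 0.26 (= 9/34)
```

With the clean set the covariance is the identity matrix, so the distance is 9; with the contaminated set the first variance rises to 34, and the same observation now looks entirely normal.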
Damage detection methods presented in the literature can indicate the presence of damage under changing environmental and operational conditions. However, the lowest level of damage that can be detected has not been assessed. This is because structures can be subjected to a wide range of damage scenarios and types, so it is difficult to define the severity of damage for each scenario. If the lowest damage severity that a method can identify already makes the structure unsafe, or corresponds to a scenario that is easily seen through visual inspection (severe cracking or loss of a pier), then the method becomes redundant. Hence, the factor that should be considered and improved is the sensitivity of these methods, so that damage can be detected at an earlier stage. This depends on how sensitive the methods are to changes in the damage sensitive features and on how clean the database representing the undamaged state of the structure is. However, a clean database also makes the methods more prone to flagging undamaged cases as damage, because features with high noise levels will lie outside the control limits.
Therefore, an approach is proposed in this paper to create a clean database so that damage can be detected at an earlier stage without compromising performance through false alerts on the healthy condition. The approach consists of a data cleaning step in which outlier measurements in the training data set are identified and removed before damage detection methods are applied. It also helps determine whether a linear or nonlinear model should be adopted for the baseline. The approach uses Principal Component Analysis (PCA) and Median Absolute Deviation (MAD) to identify the outliers, and it can be combined with existing damage detection methods to improve their sensitivity to small damage. To test the proposed approach, two case studies are analysed in this paper: the first is a numerical beam structure model and the second is the Z24 Bridge in Switzerland, which was subjected to complicated environmental and operational conditions. The rest of the paper starts with an introduction to the different types of measurements that can be obtained from the undamaged structure. The method for detecting outlier measurements, together with the mathematical tools used, is then introduced. The application of the method to a beam structure model and to the Z24 Bridge is then presented, and a conclusion closes the paper.

Introduction on normal and abnormal observations for creating the baseline
As mentioned previously, measurements from the undamaged state of a structure consist of features captured under a normal range of environmental and operational conditions, plus some unwanted outlier measurements. The normal range of conditions may include the daily and seasonal changes in ambient temperature, the daily flow of traffic, and the normal wind speed acting on the structure. An outlier observation, in other words an abnormal observation not due to damage, may be attributed to several factors. Excessive traffic due to severe congestion, or extremely high wind speed due to a cyclone, are examples of events leading to outliers. For example, the natural frequencies of the Tamar Bridge, in the UK, were found not to be affected by low wind speeds, but at high wind speeds (>25 mph) the frequencies were affected [25].
These cases are outliers because they rarely occur; hence, they need to be identified and taken into account when constructing the baseline of the undamaged structure.
Another common source of outliers is noise and processing errors. Farrar et al. [26] noted that variability in modal testing procedures and data reduction can cause changes in the identified vibration properties of a structure. If these effects are large, they will affect the measurements greatly, leading to outliers. Therefore, these effects also need to be detected before constructing the baseline and computing the control limits of the features of the undamaged structure.
Fig. 1 gives a graphical representation of the different types of measurements that can be obtained from the undamaged condition of a structure. The plot shows one damage sensitive feature versus another (e.g. first natural frequency versus second natural frequency). It should be noted that the data have not been generated by any model or distribution. The plot represents data gathered continuously, since structures are generally monitored permanently and continuously (i.e. there is no large unmonitored gap in the environmental and operational conditions).
The blue and yellow observations in the plot are the clean observations. Both types of measurements follow the linear model given by the black line. The observations do not lie exactly along the linear model because of noise, variability in modal testing procedures and data reduction, and some minor environmental and operational conditions modifying the features. The blue data set is more compact than the yellow data set because the two data sets represent observations generated by two different mechanisms. In the context of this paper, the different ways in which the changing environmental and operational conditions (e.g. different temperature ranges or different wind speed ranges) affect the features are referred to as the different mechanisms generating the features.
A real-life example of two different mechanisms giving a similar plot is when the natural frequencies of a bridge structure are gathered below and above zero degrees [5,21,22,23]. Below freezing point, different structural components contribute to the stiffness of the structure, while above zero degrees these components contribute less. As a result, the vibration properties are affected differently, and the relationship between temperature and frequencies is different. The effects of the different mechanisms on the features should be taken into account during the construction of the baseline so as to avoid creating a defective model and giving false alerts.
Another type of measurement is given by the grey observations. These observations are due to high noise levels, large errors occurring during the extraction of the features, and some minor environmental and operational effects. The clean data set is surrounded by these unwanted observations. If these observations are included in the training database, the sensitivity of damage detection methods to small damage may be reduced, because the standard deviation of the database will be larger and hence the control limits will cover a wider range of normal conditions. Therefore, these grey observations need to be identified and omitted from the baseline. However, if these observations are omitted, a few damage alerts may be raised for undamaged measurements. A compromise should therefore be made between detecting small damage and not flagging undamaged cases with high noise levels as damage.
It should be noted that outliers flagged because of high noise levels generally do not occur continuously. Hence, a threshold can be set on the number of consecutive observations outside the control limits required to raise a damage alert.
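A minimal Python sketch of such a threshold is given below, assuming the deviation indices and control limits are already available. The function `damage_alerts` and its `min_run` parameter are hypothetical names, and the rule shown (raise one alert when a run of consecutive exceedances reaches `min_run`) is one plausible reading of the idea, not the paper's exact criterion.

```python
def damage_alerts(values, lcl, ucl, min_run):
    """Return the start index of every run of at least `min_run`
    consecutive values lying outside [lcl, ucl]."""
    alerts = []
    run_start, run_len = 0, 0
    for i, v in enumerate(values):
        if v < lcl or v > ucl:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == min_run:      # alert once per run, when the threshold is reached
                alerts.append(run_start)
        else:
            run_len = 0                 # a value back inside the limits breaks the run
    return alerts

# Hypothetical deviation indices: one isolated noisy exceedance (index 1),
# then a sustained run of exceedances starting at index 4
index = [0.1, 3.5, 0.2, 0.0, 3.2, 3.4, 3.8, 3.1, 0.3]
print(damage_alerts(index, lcl=-3.0, ucl=3.0, min_run=3))  # [4]
```

A larger `min_run` suppresses more noise-induced alerts at the cost of a longer delay before real damage is reported.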

Methodology for identifying outlier measurements
This section describes the proposed approach for identifying and removing outlier measurements to create the baseline of the undamaged structure. PCA, MAD and the Gaussian Mixture Model (GMM) are used in this paper, so an introduction to these mathematical tools is given first. The proposed approach is then described in detail.

Principal Component Analysis
PCA is a multivariate statistical tool used to reduce the dimensions of a data set while retaining most of its information. It is a generative latent variable model, in which the data being analysed are seen as generated by a set of latent, unobserved variables [20]. Here the data are the damage sensitive features, and the latent variables could be mathematical abstractions or could have actual physical meaning [20]. For example, the latent variables may represent the changing environmental and operational conditions (e.g. temperature) affecting the values of the damage sensitive features. They may also represent damage of structural components affecting the features.
PCA creates new uncorrelated variables called 'principal components' to represent the factors/latent variables creating the largest variances in the original data set. The principal components are obtained through a rotational transformation of the original data set, as shown in Fig. 2, and they highlight the directions of maximum variance in the data set. Mathematically, each principal component is formed as a linear combination of the variables in the original data set [27]; for the first principal component of the $j$ th observation,
$$Y_{1,j} = L_{1,1} S_{1,j} + L_{1,2} S_{2,j} + \dots + L_{1,b} S_{b,j} \qquad (1)$$
The coefficients in Eq. (1) are used to compute one principal component only; a set of coefficients is required to compute all the principal components. These coefficients are grouped in a matrix commonly referred to as the loading matrix (matrix $L$ with dimensions $b \times b$, or $m \times b$ if only the first $m$ principal components need to be constructed). The rows of this loading matrix are the eigenvectors of the covariance matrix of the original data set (data set $S$ with dimensions $b \times q$). The covariance matrix is assumed to be non-singular; as a covariance matrix, it is also positive semi-definite, which means that its eigenvalues are non-negative. It should be noted that the data set $S$ should be mean-centred before PCA is applied.

The principal components of the matrix $S$ can therefore be given as
$$Y = LS$$
where $Y$ (dimensions $b \times q$, or $m \times q$ if only the first $m$ principal components are retained), called the score matrix, combines all the principal components into a single matrix. In the score matrix, the first principal component accounts for most of the variance of the original data set, the second principal component for the second largest share, and so on. Therefore, in the loading matrix, the eigenvectors are arranged in descending order of their eigenvalues (i.e. the first principal component has the largest eigenvalue, while the last principal component has the smallest eigenvalue).
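The PCA steps described above (mean-centring, eigen-decomposition of the covariance matrix, sorting the eigenvectors by descending eigenvalue, and projecting the centred data onto the loading matrix to obtain the score matrix) can be sketched in Python with NumPy as follows. This is an assumed implementation for illustration; the function name `pca` and the synthetic temperature-driven data are hypothetical.

```python
import numpy as np

def pca(S, m=None):
    """PCA of a b x q feature matrix S (rows = features, columns = observations).

    Returns the eigenvalues (descending), the loading matrix L (one eigenvector
    per row) and the score matrix Y = L @ S_centred."""
    Sc = S - S.mean(axis=1, keepdims=True)       # mean-centre each feature
    C = np.cov(Sc)                               # b x b covariance, rows are variables
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-order into descending order
    eigvals = eigvals[order]
    L = eigvecs[:, order].T                      # rows of L are the eigenvectors
    if m is not None:                            # keep only the first m components
        eigvals, L = eigvals[:m], L[:m]
    Y = L @ Sc                                   # score matrix, one row per component
    return eigvals, L, Y

# Hypothetical data: two frequencies driven by one temperature record plus small noise
rng = np.random.default_rng(0)
T = rng.uniform(-5.0, 25.0, 100)
S = np.vstack([5.0 - 0.010 * T, 12.0 - 0.025 * T]) + rng.normal(0.0, 0.002, (2, 100))
eigvals, L, Y = pca(S)
print(eigvals)  # the first eigenvalue dominates: temperature drives most of the variance
```

The sample variance of each row of `Y` equals the corresponding eigenvalue, and the rows of `Y` are uncorrelated, which is the defining property of the principal components.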
For a data set of damage sensitive features, temperature is generally considered to be the dominant environmental effect creating the variations in the features [28,5,29,30,25], while other environmental conditions, and noise and errors occurring during data processing, have minor effects. For example, Desjardin et al. [31] found that the natural frequencies under constant environmental conditions varied with a standard deviation of 0.5 % of the mean. This variation was attributed to errors and noise occurring in the extraction of the frequencies. When the structure was subjected to varying environmental conditions, the frequencies varied with a standard deviation of 1.1 %, and they were found to vary linearly with temperature.
Therefore, for a data set of natural frequencies captured under a range of environmental conditions, the first principal component will represent the temperature effect (latent variable), which creates most of the variation in the data set. The other principal components will represent the other minor effects, such as noise and minor environmental conditions (e.g. humidity) affecting the frequencies. To demonstrate this, consider a data set composed of three observations with two features (e.g. the first and second natural frequencies of a structure):
$$S = \begin{bmatrix} S_{1,1} & S_{1,2} & S_{1,3} \\ S_{2,1} & S_{2,2} & S_{2,3} \end{bmatrix}$$
where $S_{i,j}$ represents the $i$ th feature from the $j$ th observation.

In this data set, temperature is the main effect defining the values of the features. Other minor environmental effects (e.g. humidity) and errors occurring during the extraction of the features are combined into a single effect called noise. Therefore, the features can be obtained as
$$S_{i,j} = \beta_i T_j + NE$$
where $T_j$ is the temperature condition of the $j$ th observation, $\beta_i$ is a coefficient representing the rate of change of the $i$ th feature with temperature, and $NE$ is the noise term, which has a small range of variation.
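Since two different features are monitored here, two coefficients ($\beta_1$ and $\beta_2$) are used. All the noise terms ($NE_1, NE_2, \dots, NE_6$) have different values in the data set, so that
$$S = \begin{bmatrix} \beta_1 T_1 + NE_1 & \beta_1 T_2 + NE_2 & \beta_1 T_3 + NE_3 \\ \beta_2 T_1 + NE_4 & \beta_2 T_2 + NE_5 & \beta_2 T_3 + NE_6 \end{bmatrix} \qquad (5)$$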

As mentioned previously, the principal components are arranged according to their eigenvalues, which represent the amount of variance the components account for in the original data set. Therefore, the principal components are directly related to the eigenvalues and to the factors/latent variables generating the variance in the data set. In this data set, the quantities that vary are the temperature conditions and the noise, so the eigenvalues depend on those variables. The eigenvalues can be obtained by performing an eigenvalue analysis on the covariance matrix of data set $S$. Since Eq. (5) is a 2 × 3 matrix with eleven variables, performing this eigenvalue analysis by hand is cumbersome. To illustrate how the eigenvalues depend on these variables, consider the data set of Eq. (5) with the variables given in Table 1.
The two eigenvalues of the covariance matrix of the data set are plotted in Fig. 3 for $T_1$ ranging from 1 °C to 20 °C. It can be seen that the larger the temperature range, the larger the largest eigenvalue and the smaller the smallest eigenvalue. Thus, it can be concluded that the eigenvalues and the principal components depend mostly on the temperature range. Since the noise terms are restricted to a small range of variation, they have limited effects on the eigenvalues. For a data set of damage sensitive features captured under a range of environmental conditions, the first principal component will therefore represent the temperature condition creating most of the variance in the data set, while the other component will represent the noise effects.
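Since the values of Table 1 are not reproduced here, the following Python sketch uses hypothetical values for β_1, β_2, T_2, T_3 and the six noise terms to illustrate the trend described above for the 2 × 3 data set of Eq. (5): the largest eigenvalue of the covariance matrix grows with the temperature range, while the smallest stays small because it is bounded by the noise. The numbers are assumptions and will not reproduce Fig. 3; in particular, whether the smallest eigenvalue decreases depends on the chosen values.

```python
import math

# Hypothetical values standing in for Table 1 (not reproduced here)
BETA = (0.1, 0.2)                      # beta_1, beta_2: temperature sensitivity of each feature
T2, T3 = 0.0, 0.0                      # temperatures of the 2nd and 3rd observations
NOISE = ((0.01, -0.02, 0.015),         # NE_1, NE_2, NE_3
         (-0.01, 0.02, 0.005))         # NE_4, NE_5, NE_6

def eigenvalues(T1):
    """Eigenvalues (largest first) of the 2 x 2 covariance matrix of the 2 x 3 data set."""
    temps = (T1, T2, T3)
    rows = [[BETA[i] * temps[j] + NOISE[i][j] for j in range(3)] for i in range(2)]
    means = [sum(r) / 3 for r in rows]

    def cov(a, b):                     # sample covariance across the 3 observations
        return sum((rows[a][j] - means[a]) * (rows[b][j] - means[b]) for j in range(3)) / 2

    c11, c22, c12 = cov(0, 0), cov(1, 1), cov(0, 1)
    # Closed-form eigenvalues of the symmetric matrix [[c11, c12], [c12, c22]]
    half_trace = (c11 + c22) / 2
    radius = math.sqrt(((c11 - c22) / 2) ** 2 + c12 ** 2)
    return half_trace + radius, half_trace - radius

for T1 in (1, 5, 10, 20):
    big, small = eigenvalues(T1)
    print(f"T1 = {T1:2d} degC: largest = {big:.4f}, smallest = {small:.2e}")
```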
In this example, temperature was assumed to be the dominant effect; however, if for some structures another environmental condition (e.g. traffic) or a combination of conditions (e.g. temperature and traffic) creates the main variation in the features, then the first principal component will represent that condition. Therefore, if the features were generated by different environmental mechanisms, this will be reflected in the first principal component. Moreover, the outlier measurements shown in grey in Fig. 1 (due to high noise levels) can be identified in the minor principal components.

Median Absolute Deviation
To detect the presence of outliers, the mean plus/minus three standard deviations method is common practice in univariate statistics. This method is based on the property that, for a normally distributed data set (including both clean and outlier measurements), 99.7 % of the observations lie within this range; these are assumed to be clean, and the remaining 0.3 % are designated as outliers. However, using the mean and standard deviation poses several problems [32]. For example, although most features captured from civil engineering structures are usually assumed to follow a normal (or almost normal) distribution, the presence of outliers may alter this distribution. Also, the mean and standard deviation are strongly affected by the presence of outliers, which may distort the control limits.
An alternative to the mean and standard deviation is the median. Like the mean, the median is a measure of central tendency; however, it has the advantage of being insensitive to the presence of outliers [32]. An indicator of this insensitivity is the breakdown point [33], which is the proportion of (large or small) extreme values that must be introduced into a data set to make the estimate yield an arbitrarily bad result. The breakdown point of the median is 0.5, which means that the median becomes unreliable only when more than 50 % of the observations are extreme; for the mean it is 0 [32]. MAD, which is the median of the absolute deviations from the median, given by Eq. (6) [34], can be used to detect the presence of outlying measurements:
$$MAD = cons \cdot med_j\left(\left|A_j - med(A)\right|\right) \qquad (6)$$
where $A_j$ is the $j$ th observation of the original data set $A$, $med$ represents the median, and $cons$ is a constant linked to the assumption of normality of the data, disregarding the abnormality introduced by the outlier measurements. In this paper, the gathered data are assumed to follow a normal distribution, so the constant is 1.4826 [34] (the reciprocal of the 0.75 quantile of the standard normal distribution).
Similar to the mean plus/minus three standard deviations method, the control limits using the MAD method can be computed as follows:
$$UCL_{med} = med(A) + \alpha \cdot MAD, \qquad LCL_{med} = med(A) - \alpha \cdot MAD$$
where $UCL_{med}$ is the upper control limit calculated using the MAD method, $LCL_{med}$ is the lower control limit calculated using the MAD method, and $\alpha$ is a coefficient defining the range of the control limits (usually 2, 2.5 or 3) [32].
To demonstrate the advantage of MAD over the mean and standard deviation approach, consider a data set (Fig. 4) composed of fifteen observations with values 2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 70 and 100. Two observations (70 and 100) are outliers, representing around 13 % of the data set. Two outliers are chosen because, if the percentage of outliers were very small and their values not extreme, the analysis would not be affected by them. The mean of the data set is 16.2, which is inconsistent with the majority of the observations. The mean plus three standard deviations is 101.9, which is larger than both extreme observations, so neither is flagged. The median and $UCL_{med}$ (with $\alpha = 3$) are 6 and 10.4, respectively, which are consistent with the majority of the observations (the unscaled median absolute deviation is 1, so $MAD = 1.4826$ and $UCL_{med} = 6 + 3 \times 1.4826 \approx 10.4$). Using the MAD method, the two extreme values are classified as outliers. Thus, if data gathered from civil engineering structures include outliers, MAD is better suited to computing the control limits. In this paper, MAD is used to identify outliers in the undamaged data set before damage detection methods are applied.
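The fifteen-observation example can be reproduced with the Python standard library, as in the sketch below. The helper names `mad` and `mad_limits` are hypothetical; the sketch assumes the sample (n − 1) standard deviation, which matches the 101.9 reported above.

```python
import statistics

def mad(data, cons=1.4826):
    """Scaled median absolute deviation: cons * median(|A_j - median(A)|)."""
    med = statistics.median(data)
    return cons * statistics.median([abs(a - med) for a in data])

def mad_limits(data, alpha=3.0, cons=1.4826):
    """Control limits (LCL_med, UCL_med) = median -/+ alpha * MAD."""
    med = statistics.median(data)
    spread = mad(data, cons)
    return med - alpha * spread, med + alpha * spread

data = [2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 70, 100]

mean, sd = statistics.mean(data), statistics.stdev(data)   # stdev uses the n - 1 form
print(round(mean, 1), round(mean + 3 * sd, 1))             # 16.2 101.9 -> nothing flagged

lcl, ucl = mad_limits(data)
print(statistics.median(data), round(ucl, 1))              # 6 10.4
print([a for a in data if a < lcl or a > ucl])             # [70, 100]
```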
To the authors' knowledge, this is the first time that MAD has been used in the context of damage detection of civil engineering structures under changing environmental and operational conditions.

Gaussian Mixture Model
As mentioned previously, damage sensitive features may be generated under different environmental mechanisms. To take this into account, the features can be clustered into different data sets before damage detection methods are applied, as was done by Kerschen and Golinval [35], Yan et al. [36] and Kullaa [37]. In this paper, GMM is used as the clustering technique. GMM is a probabilistic clustering method which assumes that a data set that is not normally distributed can be represented by a set of normally distributed components. It is adopted because it uses the mean and covariance of the clusters as the basis of clustering, which has the advantage that the relationship between the variables being clustered is taken into account. A brief introduction to GMM is given here.
Consider a multivariate data set $X$ composed of nonlinear data $\{x_1, \dots, x_N\}$ from $N$ observations, such as natural frequencies captured under the bilinear effects of changing temperature conditions. Nonlinear data are not normally distributed, so they cannot be modelled by a single Gaussian distribution. Instead, a mixture of Gaussian components, whose distribution is written as a linear superposition of $K$ Gaussian densities, can be assumed to model the data [38]:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \qquad (8)$$
Each Gaussian component of the mixture, $\mathcal{N}(x \mid \mu_k, \Sigma_k)$, has its own mean $\mu_k$ and covariance $\Sigma_k$. The parameters $\pi_k$ in Eq. (8) are called the mixing coefficients; they range between 0 and 1 ($0 \le \pi_k \le 1$) and sum to one. The goal is to maximise the likelihood function given in Eq. (9) with respect to the parameters ($\mu_k$, $\Sigma_k$ and $\pi_k$) so as to determine the component to which each data point $x_n$ belongs:
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\} \qquad (9)$$
Figure 4: Plots of the fifteen observations with control limits calculated using MAD, and mean and standard deviation. *Circle represents clean measurement and dot represents outlier measurement.
However, these parameters are unknown since it is not known which observation belongs to which component.

These unknown parameters can therefore be estimated using the Expectation-Maximization (EM) algorithm. The EM algorithm is an iterative process composed of two steps, namely the expectation (E) step and the maximization (M) step. In the E step, the parameters (an initial guess at the beginning) are held fixed and the posterior probability of component $k$ given the observation $x_n$, called the responsibility $\gamma(z_{nk})$, is evaluated as follows:
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$
in which $z_{nk}$ is an element of a $K$-dimensional binary random variable $z$ with a 1-of-$K$ representation, in which a particular element $z_k$ is equal to one while all other elements are zero.
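Then, in the M step, the parameters are re-estimated using the posterior probabilities calculated in the E step as follows:
$$\mu_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad \Sigma_k^{new} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k^{new})(x_n - \mu_k^{new})^{T}, \qquad \pi_k^{new} = \frac{N_k}{N}$$
where
$$N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$
The log-likelihood given in Eq. (9) can then be evaluated. Convergence of either the parameters or the log-likelihood is checked, and if the criterion is not satisfied, the process iterates using the updated values until the criterion is met.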
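The E and M steps above can be illustrated with a univariate (one-dimensional) simplification, in which each covariance $\Sigma_k$ reduces to a variance. The following standard-library Python sketch is an assumed illustration, not the clustering code used in the paper: the function names, the log-space computation of the responsibilities, the variance floor and the convergence tolerance are all choices made here. In the paper, GMM is applied to principal component scores, which may be multivariate.

```python
import math

def log_normal(x, mu, var):
    """Log density of a univariate normal N(x | mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def em_gmm_1d(x, mu, var, pi, max_iter=100, tol=1e-9, var_floor=1e-6):
    """EM for a univariate Gaussian mixture; returns mu, var, pi and the log-likelihood history."""
    mu, var, pi = list(mu), list(var), list(pi)
    K, N = len(mu), len(x)
    history = []
    for _ in range(max_iter):
        # E step: responsibilities gamma(z_nk), computed in log space for numerical stability
        log_terms = [[math.log(pi[k]) + log_normal(xn, mu[k], var[k]) for k in range(K)] for xn in x]
        log_px = [logsumexp(row) for row in log_terms]
        gamma = [[math.exp(t - lp) for t in row] for row, lp in zip(log_terms, log_px)]
        history.append(sum(log_px))            # log-likelihood at the current parameters
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break                               # converged
        # M step: re-estimate the parameters from the responsibilities
        for k in range(K):
            Nk = max(sum(g[k] for g in gamma), 1e-12)    # guard against an empty component
            mu[k] = sum(g[k] * xn for g, xn in zip(gamma, x)) / Nk
            var[k] = max(sum(g[k] * (xn - mu[k]) ** 2 for g, xn in zip(gamma, x)) / Nk, var_floor)
            pi[k] = Nk / N
    return mu, var, pi, history

# Hypothetical first-principal-component scores from two environmental regimes
scores = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
mu, var, pi, history = em_gmm_1d(scores, mu=[1.0, 8.0], var=[1.0, 1.0], pi=[0.5, 0.5])
print(mu, pi)  # means close to 0 and 10, mixing coefficients close to 0.5 each
```

Because EM never decreases the log-likelihood, the recorded history is non-decreasing, which makes it a convenient convergence check.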

Identifying outlier observations approach
The approach for identifying outliers proposed in this paper is described here. It consists of using PCA to transform the original data set into a new coordinate system that highlights the locations of the clean and outlier observations. As mentioned previously, outliers usually surround the clean observations, and the two groups cluster separately on the principal component axes. Therefore, this paper proposes to first apply PCA to the features data set before applying damage detection methods.
After the application of PCA, this paper proposes to rank the observations in ascending order of their principal component scores, so that the observations are arranged according to their locations relative to one another. Since the outlier observations surround the clean observations, the outliers will cluster together at the two ends of the principal component axes, while the clean observations will cluster in the middle portion. To separate the clean observations from the outliers, this paper proposes to apply MAD analysis to each set of principal component scores. The analysis creates control limits ($UCL_{med}$ and $LCL_{med}$) that can be used to identify the extreme outlier observations at the two ends of the principal component axes. The advantage of applying MAD analysis to the principal component scores, instead of directly to the damage sensitive features, is that each principal component reduces the multidimensional features data set to a single dimension. Worden et al. [24] noted that detecting outliers in a multivariate data set is more difficult than in the univariate case because the outliers may hide in the data mass. Therefore, representing the multidimensional features data set by its principal components makes the analysis more sensitive to outliers. Moreover, as mentioned previously, the noise space is represented by the second and lower principal components, and the extreme outliers (grey in Fig. 1) usually appear in those spaces, so these lower principal component axes can be used to locate them. It is important to clean those spaces, since Kullaa [39] noted that damage is usually detected in the noise space.
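A minimal standard-library Python sketch of the ranking and MAD step for one set of principal component scores is shown below. The function name `rank_and_flag` and the nine noise-space scores are hypothetical; the sketch shows how the two extreme observations end up at the two ends of the ranking and fall outside the MAD control limits.

```python
import statistics

def rank_and_flag(scores, alpha=3.0, cons=1.4826):
    """Rank observations by one set of principal component scores and flag
    those outside the MAD control limits (LCL_med, UCL_med)."""
    med = statistics.median(scores)
    mad = cons * statistics.median([abs(s - med) for s in scores])
    lcl, ucl = med - alpha * mad, med + alpha * mad
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending scores
    flagged = [j for j in ranked if not lcl <= scores[j] <= ucl]
    return ranked, flagged

# Hypothetical second-principal-component (noise-space) scores of nine observations
pc2 = [0.1, -0.2, 0.05, 2.5, -0.1, 0.0, -3.0, 0.15, -0.05]
ranked, flagged = rank_and_flag(pc2)
print(ranked)   # [6, 1, 4, 8, 5, 2, 0, 7, 3]
print(flagged)  # [6, 3] -> the two ends of the ranking
```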
As demonstrated in the previous section, the first principal component highlights the environmental conditions that create the main variance in the features data set. This component can be used to determine whether the data set is composed of features generated under the same or different environmental mechanisms. Generally, data captured from civil engineering structures are assumed to follow a normal distribution. This assumption implies that only one mechanism is affecting the features. However, if different mechanisms are present, each group of features will have its own normal distribution, and the groups will form clusters in the data set. To determine the number of mechanisms generating the features, this paper proposes to plot the arranged scores of each principal component against observation number. On such a plot, the observations generated under the same mechanism cluster together and have their own pattern, as shown in Fig. 6(a). Since the first principal component represents the dominating environmental conditions affecting the features, the different environmental mechanisms appear on the first principal component plot. If the different mechanisms also affect the noise space, this is also reflected on the minor principal component plots. In the case shown in Fig. 6, the noise space is not affected by different mechanisms, as the second principal component plot (Fig. 6(b)) has only one main cluster of observations. By analysing the principal component plots, an indication of the model (linear or nonlinear) to adopt for damage detection can be obtained. It should be noted that this approach only points out the observations affected by different environmental effects; it gives no indication of the effects themselves.
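A sketch of how the arranged scores might be produced with NumPy's SVD. The standardisation of the features before PCA, the helper names and the synthetic three-frequency data with a kink in their temperature dependence at 0 °C are all assumptions for illustration. Plotting `ranked[:, 0]` against observation number should show two regions with different slopes, one per mechanism.

```python
import numpy as np

def principal_component_scores(features):
    """Project an (n_observations, n_features) array onto its principal components.

    Columns are mean-centred and scaled to unit variance (an assumption made
    for this sketch). Column 0 of the result is the first principal component.
    """
    X = np.asarray(features, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T

def arranged_scores(scores):
    """Sort every component's scores in ascending order (one column per component)."""
    return np.sort(scores, axis=0)

# Hypothetical example: three frequencies whose temperature dependence
# changes slope at 0 degC, i.e. two mechanisms.
rng = np.random.default_rng(0)
temperature = rng.uniform(-20.0, 40.0, 500)
trend = np.where(temperature < 0, -0.02 * temperature, -0.002 * temperature)
features = np.column_stack([f0 + k * trend + rng.normal(0.0, 0.002, 500)
                            for f0, k in [(4.0, 1.0), (5.2, 0.8), (9.8, 1.5)]])
ranked = arranged_scores(principal_component_scores(features))
```

Sorting each column independently matches the text: every principal component gets its own ranking, so an observation can sit at the end of one plot and in the middle of another.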

The identified extreme outlier measurements can be removed from the features data set to obtain a clean database for damage detection. Once the model (linear or nonlinear) used to create the baseline has been decided, damage detection methods can be applied to the new, outlier-free database to detect damage at an earlier stage.
To take into account the nonlinear effects of environmental conditions, two approaches are usually adopted in the literature: either using nonlinear analysis tools [40], or clustering the features into different linear data sets before applying damage detection methods [35,36,37]. In this paper, the latter is adopted, with GMM as the clustering tool. GMM is applied to the principal component scores that highlight the different mechanisms and the dominating environmental effects affecting the features. The resulting clusters represent the groups of observations affected by different environmental mechanisms, and damage detection methods can then be applied to each group separately.
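The GMM implementation used in the paper is not specified. The sketch below is a plain expectation-maximisation fit of a Gaussian mixture in NumPy, with hypothetical names `fit_gmm` and `gaussian_pdf` and synthetic two-dimensional data. In the paper's setting the columns of `X` would be the first principal component scores and the temperature, preferably rescaled to comparable ranges; scikit-learn's `GaussianMixture` class is a production-grade alternative.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at every row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def fit_gmm(X, k=2, n_iter=200, seed=0):
    """Fit a k-component Gaussian mixture by expectation-maximisation (EM).

    Returns hard cluster labels, mixing weights, means and covariances.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation keeps the starting means apart.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    means = X[idx].copy()
    covs = np.array([np.cov(X, rowvar=False) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        dens = np.column_stack([weights[j] * gaussian_pdf(X, means[j], covs[j])
                                for j in range(k)])
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: update weights, means and covariances from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return resp.argmax(axis=1), weights, means, covs

# Hypothetical data: two well-separated groups standing in for two mechanisms.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([6.0, 6.0], 1.0, size=(200, 2))])
labels, weights, means, covs = fit_gmm(X, k=2)
```

Each label then selects one (approximately linear) group, and the MAD cleaning and damage detection steps are run per group.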

To summarise, this paper proposes to apply PCA to the damage sensitive features data set first, and then to rank the observations in ascending order of their principal component scores. To identify whether the features data set is composed of features generated under different mechanisms, the arranged scores of each principal component are plotted against observation number. From these plots, the different groups of observations (generated by different environmental effects) can be identified, and a decision can be made on the type of model (linear or nonlinear) to adopt for damage detection. In this paper, GMM is used to cluster the principal component scores together with the environmental conditions affecting the data set, so as to group the observations with the same environmental effect. MAD analysis is then applied to the arranged principal component scores. This separates the clean observations from the outliers, and these unwanted measurements can then be discarded from the database. Damage detection methods can then be applied using the data set free of outliers. A flowchart of the procedure for identifying outlier measurements is given in Fig. 7.

…
6. Apply MAD analysis to identify the extreme outlier measurements and discard them.
7. The new clean data sets of damage sensitive features can be used to construct the baseline/model of the undamaged structure using damage detection methods.

Case studies
To illustrate the proposed approach of identifying outlier measurements to allow detection of damage at an earlier stage, two case studies are examined in this section. The first is a numerical beam model subjected to changing temperature conditions and varying mass distribution. The second is a real-life bridge structure, the Z24 Bridge in Switzerland, which was subjected to complex environmental conditions.

Beam structure model
The beam structure model under consideration is presented in Fig. 8. The structure is 10 m long and consists of ten beam elements of 1 m each. The cross-sectional area and second moment of area of the structure are 0.08 m² and 0.0006 m⁴, respectively. The Young's modulus of the material is assumed to be temperature dependent. The relationship between the Young's modulus and temperature is assumed to be bilinear (Fig. 9) and is the same as the one proposed by Kullaa [41]. It should be noted that in reality such a Young's modulus-temperature relationship does not exist. It is adopted here to simulate the common bilinear relationship between natural frequencies and temperature. […] the other 420 observations obtained at temperature conditions between -20 °C and 0 °C. For this first set, the density of each element varies by ±10% from the original density of the material. This variation in mass may represent traffic or pedestrian loading on a structure, and it creates a variation of the natural frequencies from their ideal values. The second data set has 610 observations (400 from 0 °C to 40 °C and 210 from -20 °C to 0 °C) with a ±25% variation in density. This second data set is included in the database as a set of outlier measurements. These outlier measurements may move the mean towards them and will increase the standard deviation of the undamaged database. It should be noted that, although the second data set contains half as many measurements as the first data set, not all of these measurements act as outliers; some cluster together with the first data set. This number of observations is chosen because, if the outliers were insignificant in number compared with the clean measurements, they would not affect the mean and standard deviation.
Five damaged cases with increasing severity are applied to the structure, where damage is simulated as a reduction in the elemental stiffness of the 4th element. The stiffness reductions of the five cases are 25 %, 30 %, 35 %, 40 % and 45 %, respectively. These severities are chosen because the variations in density (±10%) prevent smaller damage severities from being identified: the changes in frequencies from their ideal values (no effect from density variation or damage, only from temperature) caused by the density variations are larger than those caused by smaller damage. Since the purpose of this case study is to show that the proposed approach allows damage to be detected at an earlier stage than in the normal situation, the exact severities are not critical. Data from each damaged case are assumed to be obtained 200 times with temperature conditions between 0 °C and 40 °C, and 100 times for conditions between -20 °C and 0 °C. Moreover, each element is assumed to have a ±10 % variation in density.
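To make the simulation setup concrete, the following sketch assembles a ten-element Euler-Bernoulli beam with the stated geometry (10 m, 0.08 m², 0.0006 m⁴) and solves for its first four natural frequencies. Everything else is an assumption for illustration only: simple supports (Fig. 8 is not reproduced here), E0 = 30 GPa, a density of 2500 kg/m³, and a hypothetical bilinear E(T) that is not Kullaa's [41] curve. As in the text, damage scales the stiffness of the 4th element and the density variation is applied through per-element factors.

```python
import numpy as np

# Geometry from the text; material values and supports are hypothetical.
LENGTH, N_ELEM = 10.0, 10
AREA, INERTIA = 0.08, 0.0006          # m^2, m^4
E0, RHO0 = 30e9, 2500.0               # Pa, kg/m^3 (assumed)

def youngs_modulus(temp_c):
    """Hypothetical bilinear E(T): stiffens much faster below 0 degC."""
    slope = 0.02 if temp_c < 0 else 0.002
    return E0 * (1.0 - slope * temp_c)

def natural_frequencies(temp_c, density_factors=None, stiffness_factors=None, n_modes=4):
    """First natural frequencies (Hz) of a simply supported Euler-Bernoulli beam."""
    l = LENGTH / N_ELEM
    dens = np.ones(N_ELEM) if density_factors is None else np.asarray(density_factors, float)
    stiff = np.ones(N_ELEM) if stiffness_factors is None else np.asarray(stiffness_factors, float)
    n_dof = 2 * (N_ELEM + 1)          # deflection and rotation at every node
    K, M = np.zeros((n_dof, n_dof)), np.zeros((n_dof, n_dof))
    k_unit = np.array([[12, 6 * l, -12, 6 * l],
                       [6 * l, 4 * l**2, -6 * l, 2 * l**2],
                       [-12, -6 * l, 12, -6 * l],
                       [6 * l, 2 * l**2, -6 * l, 4 * l**2]]) / l**3
    m_unit = np.array([[156, 22 * l, 54, -13 * l],
                       [22 * l, 4 * l**2, 13 * l, -3 * l**2],
                       [54, 13 * l, 156, -22 * l],
                       [-13 * l, -3 * l**2, -22 * l, 4 * l**2]]) * l / 420
    E = youngs_modulus(temp_c)
    for e in range(N_ELEM):
        dofs = slice(2 * e, 2 * e + 4)    # nodes e and e + 1
        K[dofs, dofs] += stiff[e] * E * INERTIA * k_unit
        M[dofs, dofs] += dens[e] * RHO0 * AREA * m_unit
    free = [i for i in range(n_dof) if i not in (0, n_dof - 2)]  # pin both end deflections
    K, M = K[np.ix_(free, free)], M[np.ix_(free, free)]
    # Reduce K phi = w^2 M phi to a standard symmetric problem via Cholesky of M.
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    eigvals = np.linalg.eigvalsh(L_inv @ K @ L_inv.T)
    return np.sqrt(eigvals[:n_modes]) / (2.0 * np.pi)

rng = np.random.default_rng(0)
healthy = natural_frequencies(20.0, density_factors=rng.uniform(0.9, 1.1, N_ELEM))
damaged = natural_frequencies(20.0, stiffness_factors=np.r_[1, 1, 1, 0.75, np.ones(6)])
```

Repeating such calls over sampled temperatures and density factors is one way to generate data sets of the kind described above.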
The plots of temperature conditions versus the four natural frequencies of the undamaged (black) and damaged (red) observations are presented in Fig. 10. Only the damaged […] These outliers are the undamaged measurements that lie far away from the majority of the undamaged observations. Therefore, these outliers need to be identified and removed from the database before applying damage detection methods.

PCA is applied to all the undamaged observations, and the observations are arranged in ascending order of their principal component scores. The plots of principal component scores versus observation number are given in Fig. 11. In the first principal component plot (Fig. 11(a)), two different regions exist, which are attributed to two different environmental mechanisms experienced by the structure. These two regions represent the observations obtained at temperatures below and above 0 °C. Therefore, it is important either to separate the observations into two data sets representing the different conditions, or to use nonlinear data processing techniques when applying damage detection methods. In this paper, the former is adopted and GMM is used to cluster the frequencies into linear regions. GMM is applied to the first principal component scores plotted against the temperature conditions. The clustering is shown in Fig. 12, where the two temperature regions (blue for temperatures below 0 °C and red for temperatures above 0 °C) can be seen.
In the principal component plots (Fig. 11), the outlier measurements can also be seen at the two ends of each plot, where the rate of change of the principal component scores is high. This is because outlier measurements usually have extreme values of the damage sensitive features, with large differences between them, and these large differences carry over to the principal component scores.
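The steep ends of the arranged-score plots can be checked numerically by differencing the ranked scores. A minimal sketch with hypothetical scores (an evenly spread clean core plus four extreme values); the threshold of ten times the median increment is an arbitrary illustrative choice, not part of the proposed method.

```python
import numpy as np

def ranked_increments(scores):
    """Differences between consecutive ranked scores, i.e. the slope of the arranged-score plot."""
    return np.diff(np.sort(np.asarray(scores, dtype=float)))

# Hypothetical scores: a clean core from -1 to 1 plus four extreme values.
scores = np.concatenate([np.linspace(-1.0, 1.0, 101), [-5.0, -3.0, 3.5, 6.0]])
inc = ranked_increments(scores)
steep = inc > 10 * np.median(inc)     # much steeper than the clean middle portion
```

Only the increments at the two ends are flagged, mirroring the behaviour visible in Fig. 11.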
To separate the clean measurements from the bad ones, MAD analysis is applied to each set of principal component scores. A coefficient of 2 (α in Eq. (7)) is used for the control limits in the MAD analysis, representing the confidence that around 95 % of the observations will lie within the control limits. This value is chosen instead of larger coefficients so that the control limits have a narrower range and more outlier observations are identified. A new database is created after the identified outlier measurements have been discarded from the original database. From now on, this new database is referred to as the clean database.
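The link between α = 2 and the quoted 95 % holds for normally distributed scores when the MAD-based spread is scaled to estimate the standard deviation, which is an assumption about the form of Eq. (7). The coverage for a few candidate coefficients can be checked with the error function.

```python
import math

def normal_coverage(alpha):
    """Probability that a normal variable lies within +/- alpha standard deviations."""
    return math.erf(alpha / math.sqrt(2.0))

for alpha in (2.0, 2.5, 3.0):
    print(f"alpha = {alpha}: {normal_coverage(alpha):.4f}")
```

This prints 0.9545, 0.9876 and 0.9973, so α = 2 leaves roughly 4.5 % of normal observations outside the limits, compared with about 0.3 % for α = 3, which is why the smaller coefficient flags more candidate outliers.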
The plots of temperature versus the first four natural frequencies of the beam structure model for the identified outlier undamaged observations and the clean undamaged observations are given in Fig. 13. The damaged cases (temperature conditions of 10 °C to 20 °C, and -5 °C and -15 °C only) are also included in the figure.

It can be seen that most of the outliers, shown in purple, surround the clean measurements, shown in black. Some of the outlier measurements are mixed with the clean observations: they cluster together with the clean observations in some of the plots, while in other plots they are separated from the clean observations and act as extreme value outliers. The plots also show that the damaged cases lie in the space occupied by the identified outlier observations. As mentioned previously, it is important to clean that space to allow damage to be detected at an earlier stage.
To test whether eliminating the outlier measurements improves the sensitivity of damage detection methods, the linear regression damage detection method is used. This method creates a linear model of natural frequency as a function of temperature. During future monitoring, natural frequencies are predicted from the measured temperature conditions, and the residual error between the predicted value and the value obtained from the structure, given in Eq. (15), can be used as a deviation index.

$$\delta f = f_p - f_o \qquad (15)$$
where $\delta f$ is the residual error between the predicted and the real value of the natural frequency and is used as a deviation index in this paper, $f_p$ is the predicted value of the natural frequency, and $f_o$ is the original (measured) value of the natural frequency. An outlier analysis can be performed on this deviation index to classify whether the structure is damaged or not.
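A minimal sketch of the regression model and the deviation index of Eq. (15), assuming a straight-line least-squares fit of one natural frequency against temperature within a single (linear) cluster. The names `fit_frequency_model` and `deviation_index` and the baseline numbers are hypothetical.

```python
import numpy as np

def fit_frequency_model(temperature, frequency):
    """Least-squares straight line f = a + b*T fitted to the baseline data."""
    b, a = np.polyfit(np.asarray(temperature, float), np.asarray(frequency, float), deg=1)
    return a, b

def deviation_index(model, temperature, frequency_observed):
    """Residual delta_f = f_p - f_o of Eq. (15) for new measurements."""
    a, b = model
    f_p = a + b * np.asarray(temperature, dtype=float)
    return f_p - np.asarray(frequency_observed, dtype=float)

# Hypothetical baseline: 4.0 Hz at 0 degC, falling by 0.01 Hz per degC.
T_train = np.linspace(0.0, 40.0, 50)
model = fit_frequency_model(T_train, 4.0 - 0.01 * T_train)
delta_f = deviation_index(model, [10.0], [3.8])   # predicted 3.9 Hz, measured 3.8 Hz
```

With this sign convention a stiffness loss, which lowers the measured frequency, produces a positive residual.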

Since damage detection methods proposed in the literature usually use the mean and standard deviation/covariance matrix to compute the control limits, the mean plus/minus three standard deviations method (Eq. (16)) is adopted here.

$$UCL = \overline{\delta f} + 3\sigma, \qquad LCL = \overline{\delta f} - 3\sigma \qquad (16)$$
where $UCL$ and $LCL$ are the upper and lower control limits calculated using the mean and standard deviation, $\overline{\delta f}$ is the mean of the residual error $\delta f$, and $\sigma$ is the standard deviation of the residual error $\delta f$. The damage detection method is applied using both the clean database and the original database. Damaged observations with temperature conditions below and above 0 °C are analysed separately using their respective database (after clustering was applied to separate the features into two data sets for conditions below and above 0 °C). As a sample of the outlier analysis results, the five damaged cases with temperature conditions between 0 °C and 40 °C are presented in Fig. 14. Similar results are obtained for those with temperature conditions below 0 °C.
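The effect of outliers on the limits of Eq. (16) can be illustrated with synthetic residuals. In this sketch the clean baseline, the injected outliers and the damage-induced shift are hypothetical values chosen only to show the mechanism, and `ddof=1` for the standard deviation is also an assumption.

```python
import numpy as np

def three_sigma_limits(residuals):
    """Eq. (16): mean of the residuals plus/minus three standard deviations."""
    r = np.asarray(residuals, dtype=float)
    mean, sigma = r.mean(), r.std(ddof=1)
    return mean - 3.0 * sigma, mean + 3.0 * sigma

def fraction_outside(residuals, limits):
    """Share of observations falling outside (LCL, UCL)."""
    lcl, ucl = limits
    r = np.asarray(residuals, dtype=float)
    return float(np.mean((r < lcl) | (r > ucl)))

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 0.01, 1000)                             # hypothetical clean residuals
original = np.concatenate([clean, rng.normal(0.0, 0.05, 200)])  # clean plus outliers
damaged = rng.normal(0.035, 0.01, 300)                          # hypothetical damage shift

limits_clean = three_sigma_limits(clean)
limits_original = three_sigma_limits(original)
alerted_clean = fraction_outside(damaged, limits_clean)
alerted_original = fraction_outside(damaged, limits_original)
```

The outliers inflate σ, widening the limits so that the same moderate shift is mostly hidden by the original database but largely flagged by the clean one.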
In Fig. 14, the plots on the left are computed with the clean database, while those on the right are from the original database. More damaged observations fall outside the control limits when the clean database is used, which demonstrates that damage can be detected at an earlier stage with the clean database. The mean of the residual errors for each case (each damage severity) is also given in the figure as a bold horizontal line. For the first natural frequency, the mean moves outside the control limit from the third case onwards for the clean database, while for the original database it is outside only for the last case.

Similar results are obtained for the second and fourth frequencies. For the third frequency, since the undamaged and damaged observations are mixed together, the mean remains within the control limits. For all the frequencies, the range of variation represented by the control limits is smaller for the clean database than for the original database. This smaller range enables smaller damage scenarios, with small deviations from the healthy state of the structure, to be identified. Therefore, it is important to remove outlier measurements before applying damage detection methods so that damage can be flagged at an earlier stage.

Z24 Bridge
The Z24 Bridge (Fig. 15), a post-tensioned concrete box girder bridge, was located in Switzerland, connecting Koppigen and Utzenstorf and overpassing the A1 highway. It was a three-span bridge with a main span of 30 m and two side spans of 14 m each. It was monitored for almost a year to collect different environmental parameters as well as acceleration measurements. The acceleration measurements were recorded almost every hour, and an automatic system identification scheme was in place to derive the modal parameters of the bridge. The bridge was gradually damaged near the end of the monitoring period to create a benchmark structure for structural health monitoring. The damaged cases to which the bridge was subjected are presented in Table 2, and a detailed description of the cases can be found in Kramer et al. [42].
The first four natural frequencies of the bridge are used as damage sensitive features in this paper. A bilinear relationship between the four natural frequencies of the undamaged cases and the ambient temperature can be observed, as shown in Fig. 16. Peeters and De Roeck [5] attributed the bilinear relationship to the asphalt layer, which contributed to an increase in the stiffness of the structure at temperatures below 0 °C, while at warmer temperatures it had less influence.
The damaged cases are also included in Fig. 16. For the first, third and fourth natural frequencies, the damaged cases lie mostly in the space containing the outlier observations (undamaged observations that are far away from the majority of the undamaged measurements). Therefore, as mentioned previously, it is important to clean that space so that smaller damage can be detected using damage detection methods. For the second natural frequency, the undamaged and damaged cases are separated from each other.
The first 4000 undamaged observations form the undamaged database used to create the baseline, while the rest of the observations are used for testing. It should be noted that the natural frequencies of some observations were not extracted; these observations cannot be analysed.
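One way to form the baseline and test sets described above, sketched under several assumptions: the observations are stored in chronological order in an array with one column per natural frequency, frequencies that were not extracted are stored as NaN, and the split is made by position before incomplete rows are dropped. The name `split_baseline` is hypothetical.

```python
import numpy as np

def split_baseline(features, n_baseline=4000):
    """Split chronological observations into baseline and test sets, dropping incomplete rows."""
    X = np.asarray(features, dtype=float)

    def complete(rows):
        # Keep only observations for which every natural frequency was extracted.
        return rows[~np.isnan(rows).any(axis=1)]

    return complete(X[:n_baseline]), complete(X[n_baseline:])
```

If the split were instead meant to count only observations with extracted frequencies, the NaN rows would be dropped before slicing.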
PCA is first applied to the undamaged database used to create the baseline. Fig. 17 gives the plots of the principal […] the structure at temperature conditions below and above 0 °C. GMM is applied to the first principal component scores and temperature for clustering (Fig. 18). The outlier measurements can also be seen at the two ends of each principal component plot. A MAD analysis is conducted on the principal component scores to identify these outlier measurements, and these outliers are removed from the database.

Table 2 (excerpt): damaged cases of the Z24 Bridge
…   Rupture of 2 tendons out of 16
16  Rupture of 4 tendons out of 16
17  Rupture of 6 tendons out of 16

The linear regression damage detection method is applied with both the clean and the original database. All the undamaged cases (those not included in the database used to construct the baseline) and the damaged cases are analysed. Since the temperature conditions of these observations are above 0 °C, only the features of the database (after clustering) with conditions above 0 °C are used to compute the control limits. The results of the outlier analysis are presented in Fig. 19. The success rates of alerting damage when the structure was damaged, and of not alerting damage when the structure was healthy, using both the clean and the original database, are given in Table 3.
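The two success rates reported in Table 3 can be computed from a boolean alert vector and the known state of each observation. A minimal sketch; the function name `success_rates` and the example vectors are hypothetical.

```python
import numpy as np

def success_rates(alerted, is_damaged):
    """Share of damaged observations alerted as damage, and of healthy ones not alerted."""
    alerted = np.asarray(alerted, dtype=bool)
    is_damaged = np.asarray(is_damaged, dtype=bool)
    damage_rate = alerted[is_damaged].mean()        # true damage alerts
    healthy_rate = (~alerted[~is_damaged]).mean()   # correct "no damage" decisions
    return float(damage_rate), float(healthy_rate)

damage_rate, healthy_rate = success_rates([1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0])
```

Here two of the three damaged observations and two of the three healthy ones are classified correctly, so both rates are 2/3.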
For all four natural frequencies, more damaged observations are alerted as damage using the clean database, with an improvement of up to 19 % compared with the original database. This indicates that damage can be detected at an earlier stage using the clean database. The results obtained with either database are not close to 100 % because of the simple damage detection method used; better results are expected if more robust methods are applied. However, the main purpose of this paper is to propose an approach for cleaning the undamaged database to improve the sensitivity of damage detection methods. Therefore, the simple linear regression method is deemed adequate for demonstration purposes.
The success rate of not alerting damage when the structure was healthy is slightly better using the original database. It should be noted that, although more undamaged observations are alerted as damage using the clean database, these observations are scattered rather than consecutive, as can be seen in Fig. 19. Therefore, a threshold on the number of consecutive measurements outside the control limits can be set before a damage alert is raised. This prevents undamaged measurements with a high noise level from being classified as damage.
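A sketch of the persistence rule suggested above, in which a damage alert is raised only when at least `min_run` consecutive observations fall outside the control limits. The paper does not fix the threshold; `min_run = 3` and the name `persistent_alarm` are illustrative assumptions.

```python
import numpy as np

def persistent_alarm(outside, min_run=3):
    """Raise an alarm only where at least min_run consecutive observations exceed the limits."""
    outside = np.asarray(outside, dtype=bool)
    alarm = np.zeros_like(outside)
    run = 0
    for i, flag in enumerate(outside):
        run = run + 1 if flag else 0
        if run >= min_run:
            alarm[i - min_run + 1 : i + 1] = True   # mark the whole qualifying run
    return alarm

alarm = persistent_alarm([True, False, True, True, True, False, True, True], min_run=3)
```

The isolated exceedance at the start and the short run at the end are ignored; only the run of three consecutive exceedances triggers the alert.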

Conclusion
A wide range of damage detection methods have been proposed in the literature. However, most of these methods create the baseline of the undamaged structure with outlier measurements present in the training database. Even though damage can be identified, these outlier measurements prevent the methods from alerting small damage. Therefore, an approach is proposed in this paper to identify these outliers so that a clean baseline can be constructed. The proposed approach makes use of Principal Component Analysis and Median Absolute Deviation to highlight the outlier measurements. To the best of the authors' knowledge, this is the first time that Median Absolute Deviation has been used in the context of damage detection of civil engineering structures under changing environmental and operational conditions. A beam structure model and the Z24 Bridge are analysed using the proposed approach, and the results demonstrate that damage can be detected at an earlier stage. The results obtained also highlight the importance of identifying outliers before applying damage detection methods, because a defective model of the undamaged structure can be created if these outliers are not taken into account.