Break Date Estimation for Models with Deterministic Structural Change

In this article, we consider estimating the timing of a break in level and/or trend when the order of integration and autocorrelation properties of the data are unknown. For stationary innovations, break point estimation is commonly performed by minimizing the sum of squared residuals across all candidate break points, using a regression of the levels of the series on the assumed deterministic components. For unit root processes, the obvious modification is to use a first differenced version of the regression, while a further alternative in a stationary autoregressive setting is to consider a GLS‐type quasi‐differenced regression. Given uncertainty over which of these approaches to adopt in practice, we develop a hybrid break fraction estimator that selects from the levels‐based estimator, the first‐difference‐based estimator, and a range of quasi‐difference‐based estimators, according to which achieves the global minimum sum of squared residuals. We establish the asymptotic properties of the estimators considered, and compare their performance in practically relevant sample sizes using simulation. We find that the new hybrid estimator has desirable asymptotic properties and performs very well in finite samples, providing a reliable approach to break date estimation without requiring decisions to be made regarding the autocorrelation properties of the data.


I. Introduction
The recent literature is replete with analysis focusing on structural change in the trend function of a time series, motivated by the apparent prevalence of breaks in level and/or trend in macroeconomic time series; e.g. Stock and Watson (1996, 1999) and Perron and Zhu (2005). Correct specification of a break in the deterministic trend path of a series is vital for modelling, estimation and forecasting efforts, and is crucial for achieving reliable unit root test inference (see, inter alia, Perron, 1989). Given that in most macroeconomic series, uncertainty also exists as to whether the underlying stochastic component is best modelled by a stationary (I(0)) or unit root (I(1)) process, much recent work (e.g. Vogelsang, 1998; Harvey, Leybourne and Taylor, 2009, 2010; Perron and Yabu, 2009; Sayginsoy and Vogelsang, 2011) has been directed at testing for the presence of structural break(s) when the true order of integration of the series is assumed unknown.

JEL Classification numbers: C22
Of equal importance to the presence of a break in level and/or trend in a series is the related issue of the timing of the change, and it is the estimation of such break points that this article addresses. While a number of methods of break date estimation have been proposed in the literature, selection of an efficient break fraction estimator is complicated by the aforementioned fact that the order of integration is typically not known with certainty. In the context of stationary innovations, Bai (1994) and Bai and Perron (1998), inter alia, consider choosing the break date which corresponds to minimizing the sum of squared residuals, across all candidate break points, from a regression of the level of the series on the appropriate deterministic regressors. In a unit root setting, a more efficient approach is obtained by minimizing the sum of squared residuals from a first-differenced version of the relevant regression; see Harris et al. (2009). A further alternative, adopted by Carrion-i-Silvestre, Kim and Perron (2009) in an assumed local-to-unity setting, is to again date the break according to the minimum residual sum of squares, but using a quasi-differenced regression. Practitioners are then faced with choosing between a number of candidate break fraction estimators, inevitably without knowledge of the underlying integration properties of the series.
In this article, we focus on developing a minimum sum of squared residuals-based break fraction estimator that performs well across unit root and stationary processes, and in the stationary case, across a range of serial correlation structures. In common with recent literature on this topic, for example, Perron and Zhu (2005) and Yang (2012), we view our analysis as complementary to methods of break detection, for two reasons. First, many testing procedures explicitly require an estimated break date, and the power of such break detection tests is inherently limited by the accuracy of the dating procedure. An accurate dating procedure is what this article provides, hence our proposed estimator could feed into a number of break detection methods. Second, even for break detection procedures that do not require an a priori break date estimator (e.g. the exp-Wald statistic proposed by Perron and Yabu (2009)), interest still lies in the timing of the break should one be detected, therefore our proposed procedure is equally relevant there. The relevance also extends to unit root testing in the presence of a break at an unknown time.
The outline of the article is as follows. In section II, we begin by establishing, using a local-to-zero break magnitude assumption, the asymptotic properties of break fraction estimators based on both quasi-differenced (which includes levels as a special case) and first-differenced regressions, and confirm that the former is to be preferred for a stationary series, and in general the latter for a unit root process. In section III, we then develop a hybrid estimator which selects between the first-differenced-based estimator and a range of quasi-differenced-based estimators according to which achieves the global minimum sum of squared residuals. The large sample behaviour of the new estimator is also established. Section IV demonstrates through a finite sample Monte Carlo analysis that the hybrid estimator performs extremely well across a wide range of possible DGPs, outperforming established break fraction estimators (which perform badly outside of their respective assumptions regarding the integration order of the data). The hybrid estimator is simple to compute, and is found to comprise a reliable approach to break date estimation without requiring a priori decisions to be made regarding the autocorrelation properties of the data. Section V concludes. Proofs of the main results are provided in the online Appendix S1 to this article.
In the following, '⌊·⌋' denotes the integer part of its argument, '⇒' denotes weak convergence, and 1(·) denotes the indicator function.

II. The model and standard break date estimators
We consider the following model allowing for a break in level and trend:

y_t = α + βt + γ_1 DU_t(τ*) + γ_2 DT_t(τ*) + u_t,  t = 1, …, T,  (1)
u_t = ρu_{t−1} + ε_t,  t = 2, …, T,  (2)

with u_1 = ε_1, where DU_t(τ*) = 1(t > ⌊τ*T⌋) and DT_t(τ*) = 1(t > ⌊τ*T⌋)(t − ⌊τ*T⌋), with ⌊τ*T⌋ the break point with associated break fraction τ*, and level and trend break magnitudes γ_1 and γ_2, respectively. Here, τ* is unknown but satisfies τ* ∈ Λ, where Λ = [τ_L, τ_U] with 0 < τ_L < τ_U < 1. To make our theoretical developments as transparent as possible, we assume that the innovation process {ε_t} of equation (2) is an IID sequence with variance σ_ε² and finite fourth moment. The partial sum process of {ε_t} then satisfies a functional central limit theorem [FCLT],

T^{−1/2} Σ_{t=1}^{⌊rT⌋} ε_t ⇒ σ_ε W(r),

where W(r) is a standard Brownian motion process on [0, 1].
We consider two cases for the order of integration of the autoregressive process u_t. The I(0) case for u_t is represented by setting |ρ| < 1 in equation (2). In this situation, the long run variance of u_t is given by ω_u² = σ_ε²/(1 − ρ)². Here, we will also assume that γ_1 = γ_{1,T} = κ_1 T^{−1/2} and γ_2 = γ_{2,T} = κ_2 T^{−3/2}. The T^{−1/2} and T^{−3/2} scalings provide the appropriate Pitman drifts for the asymptotic analysis of break date estimators in this case. The I(1) case for u_t is represented by setting ρ = 1 in equation (2). Here, we assume γ_1 = κ_1 and γ_2 = γ_{2,T} = κ_2 T^{−1/2}, which are now the appropriate scalings. For future brevity, the two cases are summarized as follows:

Assumption I(0): |ρ| < 1 with γ_1 = γ_{1,T} = κ_1 T^{−1/2} and γ_2 = γ_{2,T} = κ_2 T^{−3/2}.
Assumption I(1): ρ = 1 with γ_1 = κ_1 and γ_2 = γ_{2,T} = κ_2 T^{−1/2}.

We consider estimating τ* by minimizing the residual sum of squares from a quasi-differenced version of equation (1), that is,

τ̂_ρ̄ = arg min_{τ∈Λ} S(ρ̄, τ),  (3)

where S(ρ̄, τ) denotes the residual sum of squares from an OLS regression of y_ρ̄ on the correspondingly quasi-differenced deterministic regressors, with y_ρ̄ = [y_1, y_2 − ρ̄y_1, …, y_T − ρ̄y_{T−1}]′. If ρ were known, standard GLS-based efficiency considerations would lead us to set ρ̄ = ρ. For example, if ρ = 0, we would obtain τ̂_0 from the levels of y_t regressions as τ̂_0 = arg min_{τ∈Λ} S(0, τ), while if ρ = 1, we would obtain τ̂_1 from the first differences of y_t regressions as τ̂_1 = arg min_{τ∈Λ} S(1, τ). In practice, of course, the value of ρ is typically unknown, so we begin by establishing the asymptotic behaviour of different estimators under both I(0) and I(1) specifications. To this end, the next two theorems detail the large sample properties of τ̂_ρ̄ for an arbitrary ρ̄ where −1 < ρ̄ ≤ 1.
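To fix ideas, the quasi-differenced break date estimator τ̂_ρ̄ is straightforward to compute directly. The following minimal Python sketch (the article's own simulations used Gauss; the function names, grid and parameter values here are illustrative, not the authors') generates data from equations (1)–(2) and minimizes S(ρ̄, τ) over a candidate break fraction grid:

```python
import numpy as np

def simulate_y(T, tau_star, gamma1, gamma2, rho, rng, alpha=1.0, beta=1.0):
    """Generate y_t = alpha + beta*t + gamma1*DU_t + gamma2*DT_t + u_t,
    with u_t = rho*u_{t-1} + eps_t and u_1 = eps_1."""
    t = np.arange(1, T + 1)
    tb = int(np.floor(tau_star * T))
    du = (t > tb).astype(float)          # level-break dummy DU_t
    dt = du * (t - tb)                   # trend-break dummy DT_t
    eps = rng.standard_normal(T)
    u = np.empty(T)
    u[0] = eps[0]
    for s in range(1, T):
        u[s] = rho * u[s - 1] + eps[s]
    return alpha + beta * t + gamma1 * du + gamma2 * dt + u

def ssr_qd(y, tau, rho_bar):
    """S(rho_bar, tau): residual sum of squares from the OLS regression of the
    quasi-differenced y on the quasi-differenced deterministic regressors
    (first observation kept in levels)."""
    T = len(y)
    t = np.arange(1, T + 1)
    tb = int(np.floor(tau * T))
    du = (t > tb).astype(float)
    Z = np.column_stack([np.ones(T), t, du, du * (t - tb)])
    yq = np.concatenate([[y[0]], y[1:] - rho_bar * y[:-1]])
    Zq = np.vstack([Z[0], Z[1:] - rho_bar * Z[:-1]])
    resid = yq - Zq @ np.linalg.lstsq(Zq, yq, rcond=None)[0]
    return float(resid @ resid)

def tau_hat(y, rho_bar, taus):
    """Break fraction estimate: arg min of S(rho_bar, tau) over the grid."""
    return taus[int(np.argmin([ssr_qd(y, tau, rho_bar) for tau in taus]))]
```

Setting rho_bar = 0 yields the levels-based estimator τ̂_0 and rho_bar = 1 the first-difference-based estimator τ̂_1.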

Remark 2
The corresponding limit in equation (4) for τ̂_ρ̄ with ρ̄ = 1 can only be written in an implicit form, because it depends on only two individual disturbance terms, hence no FCLT is applicable. The more pertinent feature here, however, is that equation (5) does not involve τ*. Hence, τ̂_1 can never be considered an effective estimator of τ* under Assumption I(0).
Figure 1 provides histograms of the limit distribution L_0(κ_1, κ_2, τ*) for various combinations of non-zero values of κ_1 and κ_2 for the case of τ* = 0.5. Here, we set ρ = 0 and σ_ε = 1, such that ω_u = 1 and the normalized break magnitudes coincide with κ_1 and κ_2. We approximate the limit functionals by normalized sums of 1,000 steps using normal IID(0, 1) random variates. In the simulations here and in the remainder of the article we set Λ = [0.15, 0.85] and employ 10,000 Monte Carlo replications. All simulations were programmed in Gauss 9.0. The results are largely as we would expect; the accuracy of τ̂_ρ̄, |ρ̄| < 1, as an estimator of τ*, measured subjectively, improves with increasing κ_1 and/or κ_2. When κ_1 is zero, τ̂_ρ̄, |ρ̄| < 1, has a bimodal and (near) symmetric distribution around τ* = 0.5, and when neither κ_1 nor κ_2 is zero the distribution is bimodal but not symmetric; both these properties are consistent with the results documented by Perron and Zhu (2005) under an assumption of fixed magnitude (as opposed to local-to-zero) breaks.

Theorem 2
Under Assumption I(1), the limit distributions of τ̂_1 and of τ̂_ρ̄, |ρ̄| < 1, are characterized in terms of θ_2 = κ_2/σ_ε and a functional J(θ_2, τ*, τ) built from θ_2 G_21(τ*, τ), the integrals ∫_0^1 W(r)dr and ∫_0^1 rW(r)dr, and the term (1 − τ*)²(2 + τ*)/6, with B_1(·), B_2(·), G_21(τ*, ·) and G_22(τ*, ·) as defined in Theorem 1.

Remark 3
Theorem 2 shows that τ̂_ρ̄ has a limit distribution which is invariant to ρ̄ for all |ρ̄| < 1. The more pertinent feature here, however, is that this limit does not involve κ_1.

Figure 2 provides histograms of the limit distribution L_1(κ_1, κ_2, τ*) for various non-zero values of κ_1 and κ_2 when τ* = 0.5. Again we set σ_ε = 1, so that θ_i = κ_i. Once more, we observe the accuracy of τ̂_1 improving with increasing κ_1 and/or κ_2. Figure 3 gives the corresponding histograms for τ̂_ρ̄, |ρ̄| < 1; note that Figures 3a and 3b are identical, since κ_2 = 0 here and the limit does not depend on κ_1 (see Remark 3).
For non-zero trend break magnitudes, τ̂_ρ̄, |ρ̄| < 1, detects the break with increasing accuracy as κ_2 rises, but a comparison with the corresponding histograms for τ̂_1 in Figure 2 shows that while it is competitive for κ_2 = 5 (although neither estimator could be considered at all accurate here), it is clearly inferior to τ̂_1 for κ_2 = 15. In related work, Yang (2012) considers the relative performance of levels- and first-differenced-based estimators for a model with unit root errors and a local break in trend only, showing that the levels estimator can outperform the first-differenced estimator for very small breaks. However, for such small break magnitudes, both break point estimators display very poor accuracy (cf. Figures 2c and 3c), so the relative differences here are of limited practical importance.

III. A hybrid break date estimator
The above asymptotic results suggest, fairly unambiguously, that in constructing τ̂_ρ̄ we should choose some |ρ̄| < 1 if |ρ| < 1, and choose ρ̄ = 1 if ρ = 1. This follows since τ̂_1 cannot be considered a suitable estimator of the break fraction under Assumption I(0), and τ̂_ρ̄, |ρ̄| < 1, is effectively outperformed by τ̂_1 under Assumption I(1). However, given that we consider the true value of ρ to be unknown in practice, we now consider developing a hybrid break fraction estimator that selects between the τ̂_ρ̄, |ρ̄| < 1, and τ̂_1 possibilities depending on the sample's properties. To begin, if we consider just two possible values for ρ̄: ρ̄ = ρ̄_1, where |ρ̄_1| < 1, and ρ̄ = 1, that is, τ̂_{ρ̄_1} and τ̂_1 are the only possible estimators of τ*, then we might consider selecting between τ̂_{ρ̄_1} and τ̂_1 according to which corresponds to the lowest residual sum of squares, that is, choose τ̂_{ρ̄_1} if min_{τ∈Λ} S(ρ̄_1, τ) ≤ min_{τ∈Λ} S(1, τ), and τ̂_1 otherwise.
Another way of writing this is to define the hybrid estimator

τ̂_{D_2} = arg min_{τ∈Λ, ρ̄∈D_2} S(ρ̄, τ),  D_2 = {ρ̄_1, 1}.

To examine the asymptotic behaviour of this hybrid estimator τ̂_{D_2}, we first establish the limiting properties of S(ρ̄, τ) under Assumption I(0) and Assumption I(1).
If we first consider behaviour under Assumption I(1), where ρ = 1, it follows from Theorem 4 that T^{−2}S(ρ̄_1, τ) converges to a distribution while T^{−1}S(1, τ) converges to σ_ε² for all τ. Asymptotically, then, min_{τ∈Λ} S(ρ̄_1, τ) > min_{τ∈Λ} S(1, τ) > 0. Next, under Assumption I(0), where |ρ| < 1, Theorem 3 implies that T^{−1}S(ρ̄, τ) converges in probability to σ_ε²{1 + (ρ̄ − ρ)²/(1 − ρ²)} for all τ. Since 1 − ρ > 0 here, it follows that, asymptotically,

min_{τ∈Λ} S(ρ̄_1, τ) < min_{τ∈Λ} S(1, τ) if and only if ρ < (ρ̄_1 + 1)/2.  (7)

Consequently, we find that τ̂_{D_2} = τ̂_1 asymptotically if ρ = 1, which is as we would desire. For |ρ| < 1, (7) shows us that τ̂_{D_2} = τ̂_{ρ̄_1}, the desired outcome, unless ρ > ρ̄_1 and ρ is closer to 1 than it is to ρ̄_1, in which case τ̂_{D_2} = τ̂_1, which is the ineffective estimator in the I(0) case. By way of an example, suppose |ρ| < 1 and we set ρ̄_1 = 0 (so that τ̂_{D_2} selects between the levels- and the first differences-based estimators τ̂_0 and τ̂_1); we find that τ̂_{D_2} = τ̂_1 (the ineffective estimator) in the region ρ > 0.5. If on the other hand we choose ρ̄_1 = 0.9, we find that τ̂_{D_2} = τ̂_1 now only in the region ρ > 0.95. Purely asymptotic considerations would therefore indicate that the problem region associated with |ρ| < 1 can be made arbitrarily small by setting ρ̄_1 = 1 − ε, where ε > 0 is made arbitrarily close to 0, thereby reducing the problem region to ρ > 1 − ε/2. Notwithstanding our asymptotic results under Assumption I(0), in finite samples the choice of ρ̄_1 will have a significant influence on the behaviour of τ̂_{D_2} even when the condition ρ < (ρ̄_1 + 1)/2 is satisfied. Therefore, from a finite sample (i.e. empirical) perspective, the idea of setting ρ̄_1 = 1 − ε is unlikely to prove an attractive proposition unless ρ is actually very close to 1, despite its asymptotic appeal.
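The selection mechanism in the I(0) case can be illustrated numerically. Ignoring the deterministic terms, whose contribution is asymptotically negligible for this comparison, the quasi-differenced error u_t − ρ̄u_{t−1} = ε_t + (ρ − ρ̄)u_{t−1} has variance σ_ε²{1 + (ρ̄ − ρ)²/(1 − ρ²)}, which is smallest at the candidate ρ̄ closest to ρ; this underlies the selection condition ρ < (ρ̄_1 + 1)/2. A quick Python check with illustrative values (ρ = 0.8, σ_ε² = 1; all choices here are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
T, rho = 200_000, 0.8
eps = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):            # AR(1) errors, u_1 = 0 (initial condition negligible)
    u[t] = rho * u[t - 1] + eps[t]

def qd_var(u, rho_bar):
    """Sample variance of the quasi-differenced series u_t - rho_bar*u_{t-1}."""
    d = u[1:] - rho_bar * u[:-1]
    return float(np.mean(d ** 2))

def plim(rho_bar, rho, sigma2=1.0):
    """Limit of the scaled residual sum of squares under I(0) errors
    (deterministic components ignored)."""
    return sigma2 * (1.0 + (rho_bar - rho) ** 2 / (1.0 - rho ** 2))

for rb in (0.0, 0.5, 0.9, 1.0):
    print(rb, round(qd_var(u, rb), 3), round(plim(rb, rho), 3))
```

With ρ = 0.8, the candidate ρ̄ = 0.9 attains the smallest variance among {0, 0.5, 0.9, 1}; ρ̄ = 1 would be selected over ρ̄ = 0.9 only if ρ exceeded (0.9 + 1)/2 = 0.95.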
As noted above, efficiency considerations suggest we should (infeasibly) always set ρ̄ = ρ. As a step in this direction, we consider generalizing the hybrid estimator by at least allowing ρ̄ to cover a subset of possible values for ρ, replacing the two-element set D_2 with the m-element set D_m = {ρ̄_1, ρ̄_2, …, ρ̄_{m−1}, 1}, where |ρ̄_i| < 1 for all i and, without loss of generality, −1 < ρ̄_1 < ρ̄_2 < · · · < ρ̄_{m−1} < 1. Therefore, we now consider the following hybrid (pseudo-GLS) estimator:

τ̂_{D_m} = arg min_{τ∈Λ, ρ̄∈D_m} S(ρ̄, τ).  (8)

Note that τ̂_{D_m} could equivalently be defined as choosing one of τ̂_{ρ̄_1}, τ̂_{ρ̄_2}, …, τ̂_{ρ̄_{m−1}}, τ̂_1 according to which of these estimators achieves the smallest residual sum of squares.
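Computationally, the hybrid estimator amounts to a joint grid search over (ρ̄, τ). A minimal Python sketch (the function name and inputs are ours; D_m and the candidate break fraction grid are supplied by the user):

```python
import numpy as np

def hybrid_break_date(y, D_m, taus):
    """Hybrid estimator: jointly minimize S(rho_bar, tau) over rho_bar in D_m
    and tau in the candidate grid; returns the minimizing (tau, rho_bar) pair."""
    T = len(y)
    t = np.arange(1, T + 1)
    best_ssr, best_tau, best_rb = np.inf, None, None
    for tau in taus:
        tb = int(np.floor(tau * T))
        du = (t > tb).astype(float)
        Z = np.column_stack([np.ones(T), t, du, du * (t - tb)])
        for rb in D_m:
            # quasi-difference y and Z; first observation kept in levels
            yq = np.concatenate([[y[0]], y[1:] - rb * y[:-1]])
            Zq = np.vstack([Z[0], Z[1:] - rb * Z[:-1]])
            resid = yq - Zq @ np.linalg.lstsq(Zq, yq, rcond=None)[0]
            ssr = float(resid @ resid)
            if ssr < best_ssr:
                best_ssr, best_tau, best_rb = ssr, tau, rb
    return best_tau, best_rb
```

Equivalently, one could compute the break fraction estimate for each ρ̄ ∈ D_m separately and retain the one achieving the smallest residual sum of squares.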
The next corollary is a useful initial step in explaining the limit behaviour of τ̂_{D_m}.
Corollary 5
Part (ii) follows directly from Theorem 4, since T^{−1}S(ρ̄, τ) diverges to +∞ in T unless ρ̄ = 1. Part (i) follows from Theorem 3, since we have arg min_{ρ̄∈D_m} plim T^{−1}S(ρ̄, τ) = arg min_{ρ̄∈D_m} |ρ̄ − ρ|. Notice that this condition coincides with that in the second part of (10) (on replacing ρ̄_1 with ρ̄_{m−1}). Intuitively, in the limit, arg min_{ρ̄∈D_m} S(ρ̄, τ) is always the element of D_m closest to the true value of ρ.
The asymptotic behaviour of τ̂_{D_m} can now be established under both Assumption I(0) and Assumption I(1).
We find, therefore, that under Assumption I(1), the hybrid estimator τ̂_{D_m} has the same limit behaviour as τ̂_1, as we would wish, while under Assumption I(0), it has the same asymptotic properties as τ̂_ρ̄, |ρ̄| < 1, again as desired, unless ρ happens to lie in the problem region ρ > (ρ̄_{m−1} + 1)/2. The hybrid estimator therefore has a clear asymptotic appeal, particularly if one makes the judicious choice of setting ρ̄_{m−1} very close to 1. In the next section, we examine the behaviour of τ̂_{D_m} in samples of practically relevant size.
Finally, our asymptotic results imply that τ̂_{D_m} has the same asymptotic properties as the following sequential break fraction estimator: in the first step, minimize S(ρ̄, τ) across ρ̄ ∈ D_m for any single value of τ ∈ Λ; in the second step, minimize S(ρ̄, τ) across τ ∈ Λ, imposing the value of ρ̄ obtained from the first step. Such a sequential approach, while asymptotically valid, is likely to perform rather poorly in finite samples, since it is entirely possible that two different choices for τ in the initial minimization will lead to two different ρ̄ and, consequently, two different break fraction estimates.⁶ We therefore do not advocate use of this two-step approach, instead recommending τ̂_{D_m} as defined in equation (8).

⁶ Notice that if it happens to be the case that ρ ∈ D_m, then arg min_{ρ̄∈D_m} |ρ̄ − ρ| = ρ, at which point the limit behaviour of τ̂_{D_m} coincides with that of the infeasible estimator τ̂_ρ.
A choice must be made regarding D_m. Here, we set D_m = {0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.975, 1}. This choice is motivated by two empirical observations regarding economic time series. The first is that serial correlation is not usually found to be negative, so we exclude negative values of ρ̄. The second is that serial correlation is often found to be very strongly positive (as exemplified by the ongoing I(0)/I(1) debate), so we include some large values of ρ̄ < 1 as well as ρ̄ = 1; moreover, inclusion of the value 0.975 confines the problem region discussed above to the small interval 0.9875 < ρ < 1.
IV. Finite sample simulations
For further comparison, we also examine an AR(1)-based estimator of τ*. This is calculated by minimizing, across τ ∈ Λ, the residual sum of squares from a fitted OLS regression in which the AR(1) parameter is estimated rather than imposed. Here, the one-time dummy variable D_t(τ) is included to identify a level break in the I(1) case (corresponding here to ρ = 1). We denote this estimator by τ̂_{AR}.
With four different break fraction estimators, sixteen combinations of break magnitudes and two sample sizes, it is not practical to show full histograms across different values of τ. Instead, in Tables 1–5, we simply provide the empirical probability that each estimator lies in the ranges τ* ± 0.010, τ* ± 0.025 and τ* ± 0.050. Other things equal, the larger these probabilities, the better the estimator. Table 1 concerns the case of I(0), white noise errors, a situation where τ̂_0 represents the optimal estimator. What is immediately apparent is that τ̂_1 is by some considerable margin the poorest performing estimator across all γ_1 and γ_2. When T = 150 it does have some ability to detect the larger breaks, but even then remains much inferior to the other three; for T = 300 its performance levels for non-zero γ_1 and γ_2 are similar to the no-break reference case, in line with the result of Theorem 1 (ii), which showed that τ̂_1 is asymptotically ineffective in this setting. We also see that, on balance, τ̂_{AR} does not perform as well as either τ̂_0 or τ̂_{D_m}. It is competitive when there is a pure trend break, but loses out everywhere else. The estimators τ̂_0 and τ̂_{D_m} show almost identical behaviour everywhere, highlighting the attractive performance of the hybrid estimator in this white noise case.
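The accuracy measure reported in Tables 1–5 is simply the proportion of Monte Carlo estimates falling within a band of the true break fraction. A small Python illustration (the data below are synthetic placeholders of our own, not the article's simulation output):

```python
import numpy as np

def hit_rate(estimates, tau_star, band):
    """Empirical probability that a break fraction estimator lies in
    tau_star +/- band, the accuracy measure used in the tables."""
    est = np.asarray(estimates)
    return float(np.mean(np.abs(est - tau_star) <= band))

# toy comparison: a tightly and a loosely concentrated set of estimates
rng = np.random.default_rng(1)
tight = np.clip(0.5 + 0.01 * rng.standard_normal(10_000), 0.15, 0.85)
loose = rng.uniform(0.15, 0.85, 10_000)
for band in (0.010, 0.025, 0.050):
    print(band, hit_rate(tight, 0.5, band), hit_rate(loose, 0.5, band))
```

An accurate estimator concentrates mass near τ*, driving all three probabilities towards one, whereas an uninformative estimator (close to the no-break reference case) leaves them near the width of the band relative to Λ.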
In Table 2, the errors are I(0), AR(1) with ρ = 0.5. The comparative behaviour of the estimators remains much the same as seen in Table 1, only with the differences between worst and best slightly less pronounced. Table 3 considers the case of I(0), AR(1) errors with ρ = 0.925. Here, the stronger autoregressive component begins to diminish the ability of the estimators to identify the true break point, other than τ̂_1, which improves relative to the ρ = 0.5 case. However, rather encouragingly, it is the hybrid estimator τ̂_{D_m} that shows the best performance overall. Table 4 represents a problem case of I(0) but near-I(1) errors, with ρ = 0.994, this value of ρ being chosen to lie in the middle of the asymptotic problem region for τ̂_{D_m}. For T = 150, perhaps not surprisingly given the strength of the autoregressive component, τ̂_1 is generally the best performing estimator here, although again τ̂_{D_m} performs very well and is a close competitor to τ̂_1, comfortably outperforming τ̂_{AR} and τ̂_0. When T = 300, although the probabilities associated with all the procedures are lower, it could be argued that τ̂_1 is still the best performing estimator (despite its asymptotic shortcomings); once again, τ̂_{D_m} behaves very similarly to τ̂_1 here, so it appears that the asymptotic problem region associated with τ̂_{D_m} is unlikely to be of much concern in any practical setting. Table 5 shows the results for I(1), random walk errors, where τ̂_1 now assumes the role of the optimal estimator. The worst performing estimator is now τ̂_0, except for the pure trend break case, where τ̂_{AR} performs most poorly. Both τ̂_1 and τ̂_{D_m} are very clearly better performing estimators and behave very similarly everywhere. The results for T = 300 also show that the probabilities associated with τ̂_0 are largely insensitive to γ_1, and close to the no-break reference case when γ_2 = 0, which accords with our asymptotic results in Theorem 2 (i).
Results for additional values of 0 < ρ < 1 are reported in Harvey and Leybourne (2013). These further highlight the attractive properties of τ̂_{D_m} discussed above; in particular, we find that as the value of ρ increases further towards unity, results for τ̂_{D_m} become increasingly similar to those for τ̂_1, with both these estimators outperforming τ̂_0 and τ̂_{AR}. This discussion paper also presents results for the case where ε_t follows an MA(1) process. In summary, when the errors are ARMA(0,1), the behaviour of the estimators is qualitatively similar to that in Table 1, hence the comments made above for that case apply here also. When the errors are ARIMA(0,1,1), the rankings of τ̂_0, τ̂_1 and τ̂_{AR} appear quite highly dependent on the particular γ_1, γ_2 settings; what is clear throughout, however, is that τ̂_{D_m} either performs the best or, if not, then virtually always as well as the highest ranking of the other three estimators.
Finally, we also experimented with a finer grid of values for D_m, re-running all the finite sample simulations with D_m = {0, 0.01, 0.02, …, 0.99, 1}. We found the results to be almost identical to those obtained using our recommended grid, with the probabilities being within ±0.01 of the values reported in the tables.

V. Conclusion
In summary, we have considered the asymptotic and finite sample performance of a number of minimum sum of squared residuals-based break fraction estimators. We first considered the asymptotic performance of estimators based on a levels or quasi-differenced regression of y_t on the relevant deterministic components, and also an estimator based on a first-differenced version of the regression. It was found that the levels/quasi-differenced approach performed well under an assumption of I(0) errors, while the first-differenced-based estimator was inappropriate in this context. Essentially the reverse was observed under I(1) errors, with the first-differenced approach now preferred. Given this inherent lack of robustness in the performance of the estimators across I(0) and I(1) environments, we proposed a hybrid estimator, τ̂_{D_m}, which selects between the first-differenced estimator and a number of quasi-differenced alternatives according to which achieves the smallest minimum sum of squared residuals.
This new procedure was found to achieve most of the desirable properties of the appropriate estimators for the stationary and unit root worlds, without the inherent downsides involved in selecting purely one approach. This finding was also shown to carry over to sample sizes of practical relevance, with the hybrid estimator always competitive with the better of the levels- and first-differenced-based estimators across a range of I(0) and I(1) data generating processes. Indeed, the qualitative behaviour of τ̂_{D_m} would extend to more general dynamic processes, since the autoregressive filtering inherent in τ̂_{D_m} is only ever intended to remove the dominant autoregressive behaviour present in a series; there is no need to whiten the series entirely.
An alternative approach to constructing τ̂_{D_m} using a discrete number of quasi-difference parameters in D_m would be to use all values in the continuous set (−1, 1], that is, to modify the pseudo-GLS hybrid estimator (8) to instead minimize S(ρ̄, τ) over τ ∈ Λ and ρ̄ ∈ (−1, 1]. Such an approach would represent a considerable increase in the computational burden of the procedure, due to the requirement for numerical Newton-type minimization methods. Moreover, marginal changes in ρ̄ have an almost negligible effect on the resulting quasi-differenced break fraction estimator, and so implementing τ̂_{D_m} using a reasonably fine set of discrete values for D_m (as in our simulations) is entirely sufficient.
In practical applications, where the order of integration of the stochastic component of a series is typically unknown, it is desirable to have available a break fraction estimator that works well without having to take a potentially incorrect, and therefore costly, stand on the data's order of integration. We consider that the hybrid estimator τ̂_{D_m} proposed in this article goes a long way towards fulfilling this role, and should therefore have practical appeal; moreover, extension of the hybrid procedure to the case of estimating the timing of multiple breaks is entirely straightforward.