Testing for strict stationarity in a random coefficient autoregressive model

We propose a procedure to decide between the null hypothesis of (strict) stationarity and the alternative of non-stationarity, in the context of a Random Coefficient AutoRegression (RCAR). The procedure is based on randomizing a diagnostic which diverges to positive infinity under the null, and drifts to zero under the alternative. We thus propose a randomized test, which can be used directly, and, building on it, a decision rule to discern between the null and the alternative. The procedure can be applied under very general circumstances: albeit developed for an RCAR model, it can be used in the case of a standard AR(1) model, without requiring any modifications or prior knowledge. Also, the test works (again with no modification or prior knowledge being required) in the presence of infinite variance, and in general requires minimal assumptions on the existence of moments.


Introduction
In this article, we propose a procedure to decide in favor of, or against, the strict stationarity of a series generated by a random coefficient autoregressive (RCAR) model, X_t = (φ + b_t) X_{t−1} + e_t (1.1), where X_0 is an initial value. Model (1.1) has received considerable attention in the literature, mainly due to its flexibility and analytical tractability; see Nicholls and Quinn (2012) and the references in Aue and Horváth (2011), and also the article by Diaconis and Freedman (1999), where several examples are discussed. Eq. (1.1) has also become increasingly popular in econometrics. Indeed, it is immediate to see that (1.1) nests the AR(1) model as a special case, with the advantage that it can be viewed as a competitor for a model with an abrupt break in the autoregressive root (see, especially, a related article by Giraitis et al., 2014). Further, a closely related specification is the so-called double autoregressive (DAR) model X_t = φX_{t−1} + v_t with v_t = (a + bX²_{t−1})^{1/2} η_t, where η_t is an i.i.d. process; this model, in turn, nests the popular ARCH specification (see Theorem 1 in Tsay, 1987). Finally, (1.1) has been employed as a more general alternative to deterministic unit root processes, with φ = 1 and b_t not identically zero; this is known in the literature as the stochastic unit root (STUR) process, and we refer to the contributions by Granger and Swanson (1997), McCabe and Tremayne (1995), and Leybourne et al. (1996), among others, for an overview.
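As an illustration, model (1.1) can be simulated directly. The sketch below uses Gaussian b_t and e_t purely for concreteness (the model only requires i.i.d. symmetric innovations with mild moment conditions), and the function name and parameter values are ours, not the article's.

```python
import numpy as np

def simulate_rcar(T, phi, sigma_b, sigma_e, x0=0.0, rng=None):
    """Simulate X_t = (phi + b_t) X_{t-1} + e_t, t = 1, ..., T.

    Gaussian b_t and e_t are illustrative choices; the RCAR model itself only
    requires i.i.d. symmetric innovations with mild moment conditions.
    """
    g = np.random.default_rng(rng)
    b = g.normal(0.0, sigma_b, size=T)  # random coefficient perturbations b_t
    e = g.normal(0.0, sigma_e, size=T)  # innovations e_t
    x = np.empty(T)
    prev = x0
    for t in range(T):
        prev = (phi + b[t]) * prev + e[t]
        x[t] = prev
    return x

# sigma_b = 0 recovers the AR(1) model; phi = 1 with sigma_b > 0 gives a STUR
stationary_path = simulate_rcar(500, phi=0.5, sigma_b=0.2, sigma_e=1.0, rng=0)
stur_path = simulate_rcar(500, phi=1.0, sigma_b=0.5, sigma_e=1.0, rng=0)
```

Note that the STUR path, despite φ = 1, remains bounded in this parameterization, anticipating the classification discussed in Section 2.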
Various aspects of the inference on (1.1) are well developed. The estimation of φ, in particular, has been studied in numerous contributions, for both the stationary case (Aue et al., 2006) and the nonstationary case (Berkes et al., 2009); Aue and Horváth (2011) suggest using the quasi-maximum likelihood (QML) estimator, showing that the estimator of φ is always consistent and asymptotically normal, irrespective of the stationarity (or lack thereof) of X_t, as long as b_t is not equal to zero almost surely. The same result has also been shown for several other estimators, such as the Weighted Least Squares (WLS) estimator (see Horváth and Trapani, 2016) and the empirical likelihood (EL) estimator, with Hill and Peng (2014) and Hill et al. (2017) showing that standard normal inference holds in all possible cases.
Conversely, other parts of the inference on (1.1) are not fully established. In particular, only a few results are available as far as testing for the stationarity/ergodicity of X_t is concerned. Most contributions focus on restricted versions of (1.1), e.g. testing whether X_t is a genuine unit root process versus the alternative of a STUR process; see McCabe and Tremayne (1995), Leybourne et al. (1996), Distaso (2008) and Nagakura (2009). This literature has recently experienced a revival, starting from the contribution by Aue (2008), with the introduction, within the STUR context, of the notion of local-to-unity roots (we refer to the recent article by Lieberman and Phillips, 2017, and to the references therein, and to Banerjee et al., 2020). A common feature of these approaches, however, is that they do not settle whether X_t is stationary or not, since a STUR process may well be strictly stationary (we refer to Yoon, 2006 for an insightful discussion). The case of a stochastic unit root is of great interest, since it is a marked departure from the usual deterministic unit root case. Indeed, whilst the unit root hypothesis may often hold for several series, nonstationarity itself may not; in this respect, the literature has explored, especially in the context of financial data, the notion of "volatility induced stationarity," where a series may have a unit root, but its variance grows to infinity so as to compensate for the deviations from the mean (see Nielsen and Rahbek, 2014 and Kanaya, 2011). There are several reasons why it would be useful to find out whether X_t is stationary and ergodic or not: estimating the variance of the error term e_t is possible only under stationarity; further, in order to recover standard normal inference when X_t is nonstationary through, e.g.,
the QML estimator discussed in Aue and Horváth (2011), it is required that b_t is not equal to zero almost surely; otherwise the standard OLS estimator has a faster rate of convergence (Wang and Yu, 2015). Although there is a plethora of tests for a unit root in the AR(1) case, to the best of our knowledge there are no contributions general enough to be applicable to both AR(1) and RCAR(1) models. In addition, even in the AR(1) case, testing procedures usually require the existence of at least the first two moments; tests for a unit root in the presence of infinite variance have been studied (Phillips, 1990), but their implementation usually requires a different limiting distribution (and, consequently, different critical values). Note, however, that in a related contribution Cavaliere et al. (2018) provide a solution by using the bootstrap; their results do not cover the RCAR case, and it is not immediately clear how to generalize the bootstrap to the RCAR model under general conditions (see Fink and Kreiss, 2013).
Such a lack of procedures to check for the (strict) stationarity of an RCAR model is undesirable, in the light of the empirical potential of this family of models. Indeed, the literature has developed a fair number of tests for stationarity (either as the null or, perhaps more frequently, the alternative hypothesis) in various contexts. There are, as is well known, many contributions on the unit root problem in the case of a linear model (such as the AR(1), which as noted above is nested in the RCAR set-up), but there are also quite a few contributions for more general frameworks. For example, Kapetanios (2007), Lima and Neri (2013), and Busetti and Harvey (2010) propose tests for the null of stationarity using quantile-based techniques, thus avoiding the need to specify a model for the data. In addition, there are also several tests that deal with specific nonlinear models, such as TAR (Caner and Hansen, 2001; Tsay, 1997), STAR (Kapetanios et al., 2003), and ESTAR (Kılıç, 2011). In particular, building on the notion of top Lyapunov exponent, Guo et al. (2016) propose a test for the null of strict stationarity in the context of a DAR model; see also Shintani and Linton (2004), Ling (2004), and Francq and Zakoïan (2012). However, all the contributions cited above have in common the assumption that some moments of the distribution of the data (typically, the variance) exist. This article extends the literature in at least three directions. Firstly, as mentioned above, contributions on testing for the stationarity of the RCAR model are rare; furthermore, the existing exceptions (e.g. Zhao and Wang, 2012) typically require moment restrictions such as finite variance. Secondly, our test, which is constructed for the null of (strict) stationarity in the RCAR framework, can also be applied to the case of an AR(1) model with no modifications and no need to test for genuine randomness of the coefficient in (1.1).
Indeed, even though in the main part of the article we consider the construction of a test for stationarity without any deterministics in (1.1), as is typical in the case of infinite variance, we also consider extensions to include these: our test can be applied without any pretreatment of the data in the presence of bounded deterministics (including constant and piecewise constant functions) and, subject to detrending, in the case of trends. We also propose, as a byproduct, a test for the null of nonstationarity with the same level of generality as discussed above. Thirdly, our test can be applied in the presence of infinite variance and even infinite mean: indeed, all that is required is the existence of some moments, i.e. E|e_0|^ε < ∞ and E|b_0|^ε < ∞ for arbitrarily small ε > 0. Again, the test does not require any prior knowledge as to the existence of moments, and it can be applied directly with no modifications (even in this case, irrespective of having an RCAR or an AR(1) model). Having a test which can be readily applied in the presence of heavy tails, with no need to estimate nuisance parameters, is arguably useful in empirical applications: data with infinite variance (or even infinite mean) occur in finance and in other disciplines; we refer to the textbook by Embrechts et al. (2013) for details and worked examples. An important feature of several inferential procedures is that they rely on the estimation of the so-called "tail index," which characterizes the largest existing moment of a random variable: this parameter is, however, notoriously difficult to estimate (see Embrechts et al., 2013). Note also that, since we allow for E|X_t|² = ∞, our focus is on strict, as opposed to weak, stationarity.
From a technical point of view, we construct a scale-invariant statistic which diverges to positive infinity under the null of stationarity, and drifts to zero at a polynomial rate under the alternative hypothesis, in all possible circumstances. In the context of a nonlinear model like the RCAR (also with the possibility of having infinite variance or even infinite mean) it is not easy to construct a statistic which converges to a unique and "easy" limiting distribution: hence, we do not derive any distributional limit for our test statistic, only rates of convergence/divergence. Given that our proposed statistic does not have a usable limiting distribution, we propose to randomize it, in a similar spirit to Corradi and Swanson (2006) and Bandi and Corradi (2014). The test can then be employed as is; however, given that it is a randomized test whose outcome depends on the auxiliary randomization, we complement our methodology by also proposing a strong decision rule which is independent of the randomization, thereby giving the same outcome to different users.
The article is organized as follows. The main assumptions, the test statistic, and the asymptotics are all in Section 2. In Section 3 we report extensions: including deterministics, testing for the null of nonstationarity, and developing a family of related statistics. Monte Carlo evidence is reported in Section 4, where we also carry out an empirical application to illustrate the use of our procedure and, in particular, of the proposed strong decision rule. Section 5 concludes. Technical results and all proofs are in the Appendix.

Notation
We use "→" to denote the ordinary limit; "a.s." stands for almost sure (convergence); C_0, C_1, … denote positive and finite constants that do not depend on the sample size (unless otherwise stated), and whose value may change from line to line; I_A(x) is the indicator function of a set A; strictly positive, arbitrarily small constants are denoted as ε; again, the value of ε may change from line to line. Finally, since all results in the article hold almost surely, orders of magnitude for an a.s. convergent sequence (say s_T) are denoted as O(T^κ) and o(T^κ) when, for some ε > 0 and T̄ < ∞, P[|T^{−κ} s_T| < ε for all T ≥ T̄] = 1 and T^{−κ} s_T → 0 a.s., respectively.

Testing for strict stationarity
This section contains all the relevant theory. In Section 2.1 we spell out the necessary and sufficient conditions for strict stationarity, and discuss under which circumstances the second moment of X_t is finite. Assumptions are in Section 2.2, and the test statistic is reported in Section 2.3.

Classification and hypothesis testing framework
Recall (1.1). It is well known that, under minimal assumptions such as the existence of logarithmic moments for e_0 and φ + b_0, three separate regimes can hold for the solutions of (1.1), depending on the value taken by E ln|φ + b_0|: i. If E ln|φ + b_0| < 0, then (1.1) admits a unique strictly stationary and ergodic solution. Note that, when Eb_0² > 0, E ln|φ + b_0| can be negative even when φ = 1: thus, the STUR process can converge to a strictly stationary solution, although in such a case X_t has an infinite second moment (see Hwang and Basawa, 2005). More generally, the variance of X_t need not be finite under strict stationarity; a necessary and sufficient condition for this is Eb_0² + φ² < 1 (Quinn, 1982). ii. If E ln|φ + b_0| > 0, then X_t is nonstationary and exhibits explosive behavior. This case has also been studied in depth in the literature: by Corollary 1 in Berkes et al. (2009), |X_t| diverges exponentially fast, i.e. exp(−C_0 t)|X_t| → ∞ a.s. for all 0 < C_0 < E ln|φ + b_0|. iii. In the boundary case E ln|φ + b_0| = 0, X_t is nonstationary (see also the comments after Assumption 2). Even in this case |X_t| diverges, but at a slower rate than exponential. This case has received comparatively less attention in the literature.
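The three regimes can be checked numerically by estimating E ln|φ + b_0| by plain Monte Carlo. The sketch below (Gaussian b_0 and all parameter values are our illustrative choices) covers the cases just discussed, including a STUR case with φ = 1 whose Lyapunov-type quantity is nonetheless negative.

```python
import numpy as np

def lyapunov_mc(phi, sigma_b, n=200_000, seed=0):
    """Monte Carlo estimate of E ln|phi + b_0| for b_0 ~ N(0, sigma_b^2)
    (an illustrative distributional choice)."""
    b = np.random.default_rng(seed).normal(0.0, sigma_b, size=n)
    return np.mean(np.log(np.abs(phi + b)))

lam_i = lyapunov_mc(0.5, 0.2)     # regime i: negative, strictly stationary
lam_ii = lyapunov_mc(1.2, 0.2)    # regime ii: positive, explosive
lam_stur = lyapunov_mc(1.0, 0.5)  # phi = 1 yet negative: stationary STUR
```

The sign of each estimate determines the regime; the third case illustrates how a stochastic unit root can still be strictly stationary.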
Clearly, this classification also holds for the basic AR(1) model, i.e. for the case b_0 = 0 a.s. On the grounds of the classification above, we propose a procedure to decide between H_0: X_t is strictly stationary, and H_A: X_t is nonstationary (2.2). We point out that, based on the above, a natural approach to testing for stationarity would be to estimate E ln|φ + b_0| and to construct a test for the null that E ln|φ + b_0| < 0. Indeed, based on standard arguments (see e.g. Eq. (4.53) in Douc et al., 2014), it is easy to see that E ln|φ + b_0| is the top Lyapunov exponent associated with (1.1). The literature has proposed some estimators of the top Lyapunov exponent, starting from the contributions by Eckmann and Ruelle (1985) and Eckmann et al. (1986) (see also the article by Barnett et al., 1997); in particular, Shintani and Linton (2004) show the asymptotic normality of the estimated top Lyapunov exponent, which can therefore be readily employed to test for H_0: E ln|φ + b_0| < 0. However, their theory requires finite second moments. It can be conjectured that, in principle, it would be possible to derive the limiting distribution in the case of heavy tails, but this is bound to depend on the tail index.

Assumptions
We now introduce and discuss the main assumptions. The first assumption must be satisfied by X t irrespective of the regime it belongs to, and it can be compared to the assumptions in Aue et al. (2006).
Assumption 1. It holds that: (i) {b_t, −∞ < t < ∞} and {e_t, −∞ < t < ∞} are independent sequences; (ii) {b_t, −∞ < t < ∞} are independent and identically distributed random variables; (iii) {e_t, −∞ < t < ∞} are independent and identically distributed random variables; (iv) b_0 and e_0 are symmetric random variables; (v) E|b_0|^ε < ∞ and E|e_0|^ε < ∞ for some ε > 0; (vi) X_0 is independent of {e_t, b_t, t ≥ 1} with E|X_0|^ε < ∞. Assumption 1 contains minimal requirements as far as the existence of moments is concerned, and, in this respect, it is very general. Note that, by part (iv) of the assumption, we require that, when the mean of e_t and b_t exists, it is zero. As far as the i.i.d. requirement for e_t and b_t is concerned, it is typical in this literature (see Aue et al., 2006), and it is imposed only in order for the main arguments in the proofs not to be overshadowed by technical details. Indeed, relaxing the assumption of independence is possible: the conditions for stationarity mentioned above hold as long as {e_t, b_t} is strictly stationary and ergodic (see Theorem 4.1 in Douc et al., 2014) as well as having logarithmic moments. Also, the technical arguments used in the article (essentially, the ergodic theorem, the SLLN, and the almost sure invariance principle) can all be extended to the case of weakly dependent data. A major advantage of our approach, in this case, is that our test statistic is based only on rates and therefore, even in the presence of dependence, it would not require any modifications such as, for example, the estimation of long-run variance matrices.
Under stationarity, X_t must also satisfy the following assumption.
Assumption 2. If E ln|φ + b_0| < 0, it holds that: (i) P(|X_0| = 0) < 1; (ii) P(|e_0| = 0) < 1. Assumption 2 is also quite standard in the literature. Part (i) is, in essence, a nondegeneracy requirement; as far as part (ii) is concerned, its most immediate consequence is that, under the other assumptions in this article, the condition E ln|φ + b_0| < 0 is necessary and sufficient for strict stationarity; see Aue et al. (2006). When X_t is nonstationary, we need the following assumptions in addition to Assumption 1.
Assumption 3. If E ln|φ + b_0| ≥ 0, it holds that: (i) e_0 has a bounded density; (ii) when P(b_0 = 0) < 1, E|ln|φ + b_0||^k < ∞ for some k > 2; (iii) EX_0² < ∞. Assumption 4. When E ln|φ + b_0| = 0 with b_0 = 0 a.s., it holds that either (i) E|X_0|² < ∞ and E|e_0|^{ε_0} < ∞ for some ε_0 > 2; or (ii) (a) {e_t, −∞ < t < ∞} are symmetric random variables with common distribution F(x) satisfying a power-law tail condition with exponent c, where C_0 > 0, c ∈ (0, 2], and ℓ(x) → 0 as x → ∞, with ℓ(x)x^{−c} decreasing for all x ≥ x_0; and (b) E|X_0|^{c_0} < ∞ for all c_0 < c. Assumption 3 is relatively common in this literature (see e.g. Berkes et al., 2009). The main part of the assumption is part (ii), which poses a moment restriction on ln|φ + b_0|: the asymptotics is based on this quantity rather than on |φ + b_0|; note that this is not, therefore, a requirement on the existence of, e.g., the second moment of b_0. Assumption 4 deals with the standard unit root case, where φ = 1 and b_0 = 0 a.s.; all other cases are covered by Assumption 3. Part (ii) of the assumption, in particular, allows for infinite variance, and indeed only requires minimal moment existence conditions: all that is needed is that the tail index, c, be strictly positive, so that the variance, and even the first absolute moment, need not be finite. Note that we do not need to estimate c at any stage of the proposed testing procedure. The requirements on the tail behavior of the distribution function are rather standard in the literature (Berkes and Dehling, 1989; Berkes et al., 1986). Some of the technical results derived under this assumption are of general interest, such as the anti-concentration bound in Lemma 8.

Detecting strict stationarity
We start by discussing the rationale underpinning the construction of the test statistic. In the light of the comments above, |X_t| → ∞ when X_t is nonstationary, whereas it does not when X_t is stationary: this holds under quite general circumstances, e.g. whether b_t = 0 or not, or whether E|X_t|² < ∞ or not. Thus, it is possible to exploit this fact to decide between H_0 and H_A. Heuristically, a possible indicator could be based on the conditional variance E(u_t² | I_{t−1}) (2.3), where u_t = X_t − φX_{t−1} and I_{t−1} is the information set available up to t − 1; (2.3) represents the (conditional) variance of the "error term" u_t. Testing for stationarity or the lack thereof based on the growth rate of variances has already been considered (see e.g. Bandi and Corradi, 2014; Cai and Shintani, 2006; Corradi, 1999). Albeit natural, this approach suffers from several drawbacks: Eb_0² and/or Ee_0² may not exist (thus limiting the applicability of the test); Eb_0² could be zero, which would prevent the test from being applied to genuine AR(1) specifications; and approaches based on (2.3) may require estimates of Eb_0² and Ee_0², with the latter not always being consistent (see Horváth and Trapani, 2019). In order to overcome all these difficulties, one could instead think of using the transformation Y_t defined in (2.4), where 0 < a < 1 is chosen so as to ensure scale invariance. Heuristically, since a > 0, Y_t should not be equal to 0 when X_t is stationary; conversely, since a < 1, when X_t is nonstationary, Y_t should drift to 0. The variable Y_t is not affected by X_t having infinite variance or infinite mean, since all moments of Y_t exist by construction. Also, the definition of Y_t does not depend on Eb_0²: thus, Y_t uses the full force of X_t when the latter diverges, even in the presence of a genuine AR(1) specification for which b_t = 0. Finally, upon choosing a in a suitable way, no estimation of Eb_0² or Ee_0² is required, making the problem much more tractable.
We build on these considerations in order to propose a test for the strict stationarity of X_t, based on the scale-invariant transformation D_T defined in (2.5), where v_p = p^{−1} Σ_{t=1}^p X_t² can be viewed as a sample second moment (or, if using demeaned data, a sample variance). We point out that, as shown in the remainder of the article, v_p can still be used even when the second moment of X_t is not finite. Based on the heuristic considerations above, D_T should converge to a strictly positive number under stationarity, and to zero otherwise. As a final, ancillary comment, we note that a quantity related to D_T is used in the context of the estimation of φ; see Janečková and Prášková (2004).
The computation of v_p should satisfy the following assumption. Assumption 5. It holds that p = p(T) with: (i) lim_{T→∞} p(T) = ∞; (ii) lim sup_{T→∞} p(T)/ln ln T = C_0 < ∞. The rates of convergence are summarized in the following theorem.
Theorem 1. Under Assumptions 1-5, it holds that lim_{T→∞} D_T = C_0 a.s. under H_0 (2.6), and D_T = O(T^{−δ}) a.s. under H_A (2.7), where 0 < C_0 < ∞ and δ > 0. The constant C_0 in (2.6) is explicitly calculated in the Appendix, where it is shown that its value differs according as EX_0² < ∞ or EX_0² = ∞; in the latter case, C_0 = 1. However, for the purpose of implementing the test, it suffices to have C_0 > 0. The result in (2.7) means that, under nonstationarity, D_T drifts to zero. We do not know the value of δ in general; however, on account of (2.7), D_T converges a.s. to zero at a polynomial rate.
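The display defining D_T in (2.5) is not reproduced here, but the rate dichotomy in Theorem 1 can be illustrated with a hypothetical stand-in: the scale-invariant ratio of the sample second moment over the first p observations to that over the whole sample. This proxy is our construction, not the article's statistic; it settles at a positive constant under stationarity and collapses to zero under explosiveness, mirroring (2.6) and (2.7).

```python
import numpy as np

def toy_diagnostic(x, p):
    """Scale-invariant ratio v_p / v_T, with v_p the sample second moment of
    the first p observations; an illustrative proxy for the behavior of D_T,
    not the statistic defined in (2.5)."""
    return np.mean(x[:p] ** 2) / np.mean(x ** 2)

def ar1(phi, T, seed):
    g = np.random.default_rng(seed)
    x, prev = np.empty(T), 0.0
    for t in range(T):
        prev = phi * prev + g.normal()
        x[t] = prev
    return x

T = 2000
p = int(50 * np.log(np.log(T)))  # p(T) proportional to ln ln T (Assumption 5)
d_stationary = toy_diagnostic(ar1(0.50, T, seed=0), p)  # bounded away from 0
d_explosive = toy_diagnostic(ar1(1.05, T, seed=0), p)   # drifts to 0
```

Under explosiveness the late observations dominate the full-sample second moment, so the ratio vanishes rapidly; no moments or nuisance parameters are estimated.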
Based on Theorem 1, we can propose a test for H_0. Consider a sequence w(T) such that, as T → ∞, w(T) → ∞ and T^{−δ} w(T) → 0, with δ as in (2.7) (2.8). Then, by virtue of (2.6) and (2.7), it holds that lim_{T→∞} w(T)D_T = ∞ a.s. under H_0 (2.9), and lim_{T→∞} w(T)D_T = 0 a.s. under H_A (2.10). Thus, w(T)D_T diverges to positive infinity under the null, whereas it drifts to zero under the alternative. Eqs. (2.9) and (2.10) suggest that w(T)D_T is a suitable diagnostic to discriminate between E ln|φ + b_0| < 0 and E ln|φ + b_0| ≥ 0. Let now g(·) be a continuous, monotonically increasing function such that g(0) = 0 and lim_{x→∞} g(x) = ∞, and define l_T = g(w(T)D_T); based on (2.9) and (2.10), it holds that lim_{T→∞} l_T = ∞ under H_0, and lim_{T→∞} l_T = 0 under H_A. Note that, on account of (2.8), w(T) may not be allowed to diverge too fast; w(T) = (ln T)^b, for some b > 0, is a possible choice. However, depending on the choice of the function g(·), the sequence l_T can be made to diverge arbitrarily fast. Our test is based on a randomized version of l_T. We propose a "classical" randomization scheme, which has been employed in the literature; we refer to Corradi and Swanson (2006) and Bandi and Corradi (2014), inter alia. Of course, other schemes are also possible.
Step 1. Generate an i.i.d. sequence {ξ_j}, 1 ≤ j ≤ R, with common distribution G(·). Step 2. Generate the Bernoulli sequence ζ_j(u) = I(l_T^{1/2} ξ_j ≤ u), with u extracted from a distribution F(u). Step 3. Compute ϑ_{R,T}(u), a centered and rescaled partial sum of the ζ_j(u). Step 4. Define H_{R,T} = ∫ |ϑ_{R,T}(u)|² dF(u). (2.14) We need the following regularity conditions on G(·) and F(·). Assumption 6. It holds that: (i) G(·) has a bounded density and G(0) ≠ 0 or 1; (ii) ∫_{−∞}^{+∞} u² dF(u) < ∞. We are now ready to present the main results. Let P* denote the conditional probability with respect to {e_t, b_t, −∞ < t < ∞}; we use the notation "→ D*" and "→ P*" to denote, respectively, conditional convergence in distribution and in probability according to P*. Theorem 2. Under Assumptions 1-6, as min(R,T) → ∞ with (2.8), the null limit (2.15) holds for almost all realizations of {b_t, e_t, −∞ < t < ∞}; under Assumptions 1 and 3-5, as min(R,T) → ∞ with (2.8), the divergence result (2.17) holds for almost all realizations of {b_t, e_t, −∞ < t < ∞}. Theorem 2 provides the limiting behavior of the test statistic H_{R,T} under the null and under the alternative. The results are derived conditional on the sample, and they hold for all possible realizations, apart from a set of measure zero. The theorem also illustrates how the choice of R impacts on the behavior of H_{R,T}. On the one hand, one should choose R as large as possible, in order to maximize the rate of divergence of H_{R,T} under the alternative; this is evident from (2.17). On the other hand, the test statistic has a noncentrality parameter which grows with R under the null: this is illustrated by (2.15). Consequently, R should not be too large, in order to avoid size distortions.
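Steps 1-4 can be sketched as follows. The choices G = F = N(0,1), the centering at G(0) = 1/2 in Step 3, and the Monte Carlo approximation of the integral in (2.14) are our assumptions for illustration; only the overall construction (randomizing l_T, which is large under H_0 and close to zero under H_A) comes from the text.

```python
import numpy as np

def randomized_stat(l_T, R, n_u=1000, seed=0):
    """Sketch of Steps 1-4 with G = F = N(0,1) (illustrative choices).

    Step 1: draw xi_j ~ G;  Step 2: zeta_j(u) = I(sqrt(l_T) xi_j <= u);
    Step 3: center and rescale the partial sum of the zeta_j(u);
    Step 4: integrate |theta(u)|^2 over F(u), here by Monte Carlo over u.
    """
    g = np.random.default_rng(seed)
    xi = g.normal(size=R)                                 # Step 1
    u = g.normal(size=n_u)                                # draws of u from F
    zeta = np.sqrt(l_T) * xi[:, None] <= u[None, :]       # Step 2, R x n_u
    theta = 2.0 * np.sqrt(R) * (zeta.mean(axis=0) - 0.5)  # Step 3, G(0) = 1/2
    return np.mean(theta ** 2)                            # Step 4

h_null = randomized_stat(l_T=1e8, R=400)   # l_T large (H_0): H_{R,T} = O(1)
h_alt = randomized_stat(l_T=1e-8, R=400)   # l_T near 0 (H_A): H_{R,T} ~ R
```

The two evaluations mimic the two regimes: with l_T huge the ζ_j become fair coin flips and the statistic stays bounded, while with l_T near zero the ζ_j degenerate and the statistic grows proportionally to R, matching the selection trade-off for R discussed above.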
Theorem 2 implies that, as min(T,R) → ∞, the size and power of the test converge to the limits in (2.18) and (2.19) for almost all realizations of {b_t, e_t, −∞ < t < ∞}, where c_a is defined as P{N(0,1) ≥ c_a} = a, a ∈ (0, 1).

Deciding between H_0 and H_A

Testing using H_{R,T} is, in essence, based on checking a rate of convergence, and it is therefore very similar to the idea in Bandi and Corradi (2014); we also refer to Kanaya (2011) for a discussion. In principle (although, possibly, with some interpretational difficulties related to having a null hypothesis spelt out in terms of a rate of divergence), the test can be employed as it is, and its main properties (size and power) are reported in (2.18) and (2.19). Based on the latter equation, the test rejects the null with probability one when false, and it is therefore consistent. Conversely, the meaning of (2.18) is nonstandard. The randomness in H_{R,T} is added by the researcher, and indeed it is the only randomness present in the statistic: such randomness does not vanish asymptotically. Thus, different researchers using the same dataset will obtain different p-values; indeed, if an infinite number of researchers applied the test to the same data (and the null held), the resulting p-values would be uniformly distributed on [0, 1]. This is a well-known feature of randomized tests, and it may be viewed as an undesirable property, which may explain their relatively infrequent use; see the discussion and the solutions proposed by Geyer and Meeden (2005). We build on the notion of "randomized confidence function," proposed by Song (2016), in order to propose a strong rule to decide between H_0 and H_A, whose outcome is the same for all researchers using the same dataset.
In order to remove the randomness from the test statistic, each researcher, instead of computing H_{R,T} just once, computes the test statistic S times using, at each iteration s, an independent sequence {ξ_j^{(s)}, 1 ≤ j ≤ R}, 1 ≤ s ≤ S, then defining Q(a) as the fraction of the S statistics that do not exceed c_a. In our context, Q(a) is the randomized confidence function proposed by Song (2016), computed under the null. Intuitively, based on Theorem 2, Q(a) should converge to 1 − a under H_0, and to 0 under H_A. This dichotomous behavior is not subject to the randomness added by the researcher (which is washed away as S → ∞), and it can be employed to construct a decision rule based on the Law of the Iterated Logarithm; in particular, we propose to decide in favor of H_0 if the bound in (2.21) holds, for almost all realizations of {b_t, e_t, −∞ < t < ∞} and for every a > 0. The rule in (2.21) is similar to the one proposed by Corradi (1999), who proposes a bound to discern between I(0) and I(1); related contributions, where criteria are proposed in the context of choosing between I(0) and I(1), have been developed by Stock (1994), Phillips and Ploberger (1994), and Phillips and Ploberger (1996). A typical advantage of this approach is that, at least asymptotically, it yields probability zero of both a Type I and a Type II error, as (2.23) and (2.24) show. In addition, as far as our context is concerned, we note that, based on Corollary 1, under H_0 each researcher will make the same decision ("accept"), with no discrepancies among researchers and probability one of being correct, and similarly under H_A.
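The strong rule can be sketched as follows. The decision band used below (accept H_0 when Q(a) is closer to its H_0 limit 1 − a than to its H_A limit 0) is a simplified stand-in for the LIL-based bound in (2.21), and the synthetic draws of the S statistics only mimic the two limiting regimes of Theorem 2.

```python
import numpy as np

def q_function(h_draws, c_a):
    """Q(a): fraction of the S randomized statistics not exceeding the
    critical value c_a (sketch of the randomized confidence function)."""
    return np.mean(np.asarray(h_draws) <= c_a)

def strong_decision(h_draws, c_a, a=0.05):
    """Decide H0 iff Q(a) is closer to its H0 limit (1 - a) than to its
    HA limit (0); a simplified stand-in for the LIL bound in (2.21)."""
    q = q_function(h_draws, c_a)
    return "H0" if abs(q - (1.0 - a)) < q else "HA"

g = np.random.default_rng(3)
h_under_null = g.chisquare(1, size=200)  # stylized draws of the S statistics, H0
h_under_alt = 400.0 + g.normal(size=200) # stylized divergent draws, HA
dec_null = strong_decision(h_under_null, c_a=3.84)  # 3.84: chi-square(1) 95%
dec_alt = strong_decision(h_under_alt, c_a=3.84)
```

Because Q(a) averages over S independent randomizations, two researchers running this rule on the same data reach the same verdict, which is the point of the construction.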

Discussion and extensions
In this section, we discuss possible extensions and generalizations of the basic test statistic. We consider three: (a) the case of deterministics being present in (1.1); (b) the construction of a test for the null of nonstationarity; and (c) the construction of different, but related, test statistics. All results are proved in the Appendix when necessary; the proofs of some results, however, follow readily from the existing proofs, and, when possible, we omit the details.

Employing the test in the presence of deterministics
So far, we have assumed that there are no deterministics in (1.1). This is common in this literature when heavy tails are considered, and we also refer to the comments in Cavaliere et al. (2018). In this section, we show that it is possible to extend the test to incorporate deterministics in (1.1). We assume that the observed data, say X*_t, are generated as X*_t = d_t + X_t, where X_t is defined as in (1.1). There are two types of deterministic processes d_t. In the case of square-integrable d_t satisfying condition (3.2),
We show that the test can be used with no modifications. Condition (3.2) includes several possible cases: d t can be constant; it can be piecewise constant, thus allowing for shifts in the mean; or it could be a weighted average of sines and cosines, which could be useful to model seasonalities and, in general, smooth, bounded processes (see Enders and Lee, 2012). In all these cases, and indeed whenever (3.2) is satisfied, the test can be applied directly, with no modifications or prior knowledge of the nature of d t .
It is also possible to apply the test in the presence of trends, and in general when (3.2) does not hold. In this case, it is necessary to detrend the data first, by estimating d_t via, say, d̂_t; the test statistic D*_T is then computed on the raw data X*_t when (3.2) holds, and on the detrended data X*_t − d̂_t when (3.2) does not hold. We formalize our discussion in the following assumption. Assumption 7. It holds that: (i) either (a) (3.2) holds; or (b) E|d̂_t − d_t|² = O(T^{−κ_1}) for some κ_1 > 0; (ii) when E ln|φ + b_0| < 0, P(|X_0| = c) < 1 for all c ∈ R. Part (i)(b) of the assumption is very similar, in spirit, to Assumption 5 in Kapetanios (2007), and it essentially requires that d̂_t be a consistent estimator of the (trend) function d_t. Although several choices are possible, we refer to the contribution by Peng and Yao (2004) on the estimation of trend functions in the presence of heavy tails, where MSE consistency is still ensured; note that κ_1 need not take any special value, as long as the MSE drifts to zero at a polynomial rate. Part (ii) strengthens Assumption 2, and it is again a nondegeneracy condition.
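A minimal sketch of the two cases in Assumption 7, with an ordinary least-squares linear trend as a hypothetical choice of d̂_t; any estimator whose MSE vanishes at a polynomial rate would do, and for heavy-tailed data the approach of Peng and Yao (2004) cited above may be preferable.

```python
import numpy as np

def prepare_data(x_star, trending):
    """Return the series on which the test statistic is computed: the raw
    data when (3.2) holds (bounded deterministics), the detrended data
    otherwise.  A linear OLS trend is an illustrative choice of d_hat_t."""
    if not trending:  # (3.2) holds: no pretreatment of the data is needed
        return np.asarray(x_star, dtype=float)
    t = np.arange(len(x_star), dtype=float)
    coef = np.polyfit(t, x_star, deg=1)  # d_hat_t = fitted OLS linear trend
    return x_star - np.polyval(coef, t)

# example: a linear trend plus stationary noise
g = np.random.default_rng(4)
t = np.arange(1000, dtype=float)
x_star = 3.0 + 0.5 * t + g.normal(size=1000)
detrended = prepare_data(x_star, trending=True)
```

After detrending, the residual series carries no remaining linear trend, so the test can be applied exactly as in the no-deterministics case.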
Theorem 3. Under Assumptions 1-5 and 7, Eqs. (2.6) and (2.7) hold. Theorem 3 entails that D*_T can be used in the same way as D_T, obtaining the same results for the corresponding test (under the same assumptions), and it can be generalized as we do with D_T in the next sections.

Testing for the null of nonstationarity
In the spirit of the KPSS test (see Kwiatkowski et al., 1992; see also the contribution by Giraitis et al., 2006), and of other contributions in the context of nonlinear models (see e.g. Kapetanios, 2007), all the results developed so far are based on the hypothesis testing framework set out in (2.2), where H_0: X_t is strictly stationary.
However, a test for the null of nonstationarity can be readily derived from the theory developed above. Our testing approach requires a test statistic which diverges under the null, whilst being bounded under the alternative. On account of (2.9) and (2.10), one could use a statistic based on the weight $w(T)$ defined in (2.8); by continuity, a test based on a randomized version of $\mu^*_T$ can then be constructed. Using the same algorithm as proposed above, we obtain a test statistic denoted by $H^*_{R,T}$, whose asymptotics is given in the following theorem, reported without proof.
Theorem 4. We assume that Assumptions 1-6 are satisfied. Then, under $H^*_0$, the stated limits hold as $\min(T, R) \to \infty$, for almost all realizations of $\{b_t, e_t, -\infty < t < \infty\}$. As for $H_{R,T}$, (3.5) provides a selection rule for $R$: if, as suggested in the next section, one were to choose $g(x) = \exp(x) - 1$ and $w(T) = (\ln T)^\beta$, as recommended above, then setting $R = T$ would satisfy (3.5). Finally, note that instead of $X_t$ the same arguments could be applied to $Y_t$.

Modifications of the test statistic
Heuristically, our test statistic is based on studying the variance of $u_t = e_t + b_t X_{t-1}$ in (1.1). The main intuition is that, under stationarity, $E(u_t^2 \mid \mathcal{I}_{t-1})$ should be bounded as $t$ elapses, whereas it grows as $t \to \infty$ when $X_t$ is nonstationary. Owing to the reasons discussed at the beginning of Section 2.3, the testing procedure uses a modified statistic, rather than $E(u_t^2 \mid \mathcal{I}_{t-1})$ directly. In a similar vein, one could use different moments of the error term $u_t$, say $E(|u_t|^\nu \mid \mathcal{I}_{t-1})$ for some $\nu > 0$: it can be expected that the stationarity/nonstationarity of $X_t$ will still entail the convergence/divergence of $E(|u_t|^\nu \mid \mathcal{I}_{t-1})$. From a technical point of view, our arguments and our proofs differ depending on whether $E|X_t|^2 = \infty$ or not, and therefore it can be envisaged that, if relying upon $E(|u_t|^\nu \mid \mathcal{I}_{t-1})$, the proofs will differ according as $E|X_t|^\nu = \infty$ or not; apart from this, the final results are the same.
Theorem 5 entails that $D_T(\nu)$ has the same properties as $D_T$, and it can therefore be used, and generalized, in the same way. Indeed, upon making sure that the artificial samples used in the various randomizations are generated independently, it would even be possible to run a meta-test by trying several values of $\nu$ and combining the outcomes, e.g. according to Fisher's method. As before, the same ideas could be applied to $Y_t$. Although this extension is theoretically possible, unreported simulations show that $\nu = 2$ affords the best results; choosing $\nu < 2$ makes the test very conservative (e.g. when $\nu = 1$ the power is cut by two thirds), whereas $\nu > 2$ results in the opposite problem.
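The Fisher combination mentioned above can be sketched as follows; this is our own generic illustration (the function name and the example p-values are hypothetical, not taken from the article), and it assumes that the p-values being combined come from independently randomized tests:

```python
import math

def fisher_combine(pvals):
    """Combine m independent p-values via Fisher's method.

    Under the null, t = -2 * sum(log p_i) is chi-square with 2m degrees
    of freedom; for an even number of degrees of freedom the survival
    function has the closed form exp(-t/2) * sum_{i<m} (t/2)^i / i!.
    """
    t = -2.0 * sum(math.log(p) for p in pvals)
    m = len(pvals)
    half = t / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(m))

# hypothetical p-values from tests run at, say, nu = 1, 2, 3
print(round(fisher_combine([0.04, 0.10, 0.07]), 4))  # prints 0.0119
```

Note that $e^{-t/2}$ is just the product of the individual p-values, which makes the closed form convenient to evaluate without any distributional tables.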

Simulations and empirical illustration
This section contains two separate contributions. In Section 4.1 we report some evidence from synthetic data on the empirical rejection frequencies of our test in order to assess size and power; we analyze the performance of decision rules based on $D_{\alpha,S}$; and we discuss possible guidelines for the implementation of the test statistic. In Section 4.2, we illustrate our approach, and in particular the use of $D_{\alpha,S}$, through an application to several US macro aggregates.

Monte Carlo evidence
This section contains three separate subsections. Firstly, we study the performance of our test when the assumptions spelt out in the article hold. Secondly, we assess the robustness of our results by studying the performance of the test when the i.i.d. assumption is relaxed. Finally, we consider a small scale comparison of our test against possible alternative tests.
In all cases, the design of the reported experiments is as follows.
We use (1.1) as a DGP; $b_t$ is generated as i.i.d. $N(0, \sigma_b^2)$, with $\sigma_b^2 \in \{0, 0.1, 0.25\}$, in order to consider the genuine AR case as well as cases with random coefficients. According to the theory, it would also be possible to consider a heavy-tailed distribution for $b_t$; a few trials indicated that doing so does not change the results in any decisive way. We report results for three different specifications of $e_t$: i.i.d. $N(0, \sigma_e^2)$, i.i.d. $t_2$ and i.i.d. $t_1$, where $t_k$ denotes a Student's $t$ distribution with $k$ degrees of freedom, so as to consider the cases of infinite variance and infinite mean. In the Gaussian case, we have used $\sigma_e^2 = 1$; we note, however, that the test is completely insensitive to the value of $\sigma_e^2$, which suggests that the use of $v_p$ is very effective at ensuring scale invariance. We have used $\phi \in \{0, 0.5, 0.75, 0.95, 1, 1.05\}$; larger values of $\phi$, for the nonstationary cases, could also be considered, but in those cases, as can be expected, the test has unit power even for very small samples.
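As an illustration only, a minimal sketch of the DGP just described could look as follows; the function name, defaults and seed handling are ours, not the authors' simulation code:

```python
import numpy as np

def simulate_rcar(T, phi, sigma2_b, eps_df=None, burn=1000, seed=0):
    """Simulate X_t = (phi + b_t) X_{t-1} + eps_t, as in model (1.1).

    b_t ~ i.i.d. N(0, sigma2_b); eps_t is N(0, 1) when eps_df is None,
    otherwise Student's t with eps_df degrees of freedom (eps_df = 2 or 1
    gives the infinite-variance / infinite-mean designs in the text).
    The first `burn` observations are discarded, as in the experiments.
    """
    rng = np.random.default_rng(seed)
    n = T + burn
    b = rng.normal(0.0, np.sqrt(sigma2_b), n)
    eps = rng.standard_t(eps_df, n) if eps_df else rng.normal(0.0, 1.0, n)
    x = np.empty(n)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = (phi + b[t]) * x[t - 1] + eps[t]
    return x[burn:]

x = simulate_rcar(500, phi=0.5, sigma2_b=0.1, eps_df=2)
print(x.shape)  # (500,)
```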
The various combinations $\{\phi, \sigma_b^2\}$ deserve attention. All cases where $\phi \le 0.95$ entail that $X_t$ is stationary: the corresponding empirical rejection frequencies represent the size of the test. Also, upon computing the value of $E\ln|\phi + b_0|$, it can be noted that the two cases $\{\phi, \sigma_b^2\} = \{1, 0.1\}$ and $\{1, 0.25\}$ correspond to a stationary STUR; even in these cases the empirical rejection frequencies represent the size, and it should be noted that, when $X_t$ is a (stationary) STUR process, it has infinite variance irrespective of the distributions of $b_t$ and $e_t$. Finally, again upon computing $E\ln|\phi + b_0|$, it turns out that the case $\{\phi, \sigma_b^2\} = \{1.05, 0.25\}$ is also an instance of $X_t$ being stationary, again with infinite variance.
Thus, the nonstationary cases considered in our experiment are a pure explosive case, corresponding to $\{\phi, \sigma_b^2\} = \{1.05, 0\}$, and the pure unit root case $\{\phi, \sigma_b^2\} = \{1, 0\}$. In the latter case, clearly $E\ln|\phi + b_0| = 0$, and therefore $X_t$ is on the cusp between explosive and strictly stationary behavior; note that, by virtue of the several possible distributions of $e_t$, we are also considering cases of a random walk with infinite variance and mean, in a similar spirit to Cavaliere et al. (2018). However, in our case the null is stationarity, not a unit root, and therefore the empirical rejection frequencies represent the power of our test. We also point out that the case $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$ is of particular interest because $E\ln|\phi + b_0| = 3.3 \times 10^{-3}$, that is, it is positive (and, therefore, $X_t$ is nonstationary) but very small. Finally, in a separate experiment (the outcomes are in Tables 2 and 4), we have considered several combinations of $\{\phi, \sigma_b^2\}$ for which $E\ln|\phi + b_0| = 0$, so as to evaluate the behavior of the test in those cases.
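The classification of designs by the sign of $E\ln|\phi + b_0|$ can be checked numerically; the sketch below is our own illustration (not part of the article's procedure) and estimates this quantity by Monte Carlo for Gaussian $b_0$:

```python
import math
import numpy as np

def top_lyapunov(phi, sigma2_b, n=200_000, seed=0):
    """Monte Carlo estimate of E ln|phi + b_0| with b_0 ~ N(0, sigma2_b).

    The sign of this quantity separates strict stationarity (negative)
    from nonstationarity (zero or positive) in model (1.1).
    """
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, math.sqrt(sigma2_b), n)
    return float(np.mean(np.log(np.abs(phi + b))))

print(top_lyapunov(0.5, 0.1) < 0)   # clearly stationary design -> True
print(top_lyapunov(1.05, 0.0) > 0)  # explosive AR(1): ln(1.05) > 0 -> True
```

For boundary designs such as $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$, where the quantity is of order $10^{-3}$, a much larger $n$ (or numerical quadrature) would be needed to pin down the sign reliably.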
We now turn to describing the specifications of the test; as a general note, their impact vanishes for large samples. Our reported experiments are based on the following choices, which delivered the best results and are thus recommended as guidelines for the applied user. The choices in (4.1) and (4.2) are designed to ensure that $g(w(T) D_T)$ grows like $\exp(T)$ under $H_0$, and that $g(w(T) D_T)$ drifts to zero as $T \to \infty$ under $H_A$. Thus, the double exponential in (4.2) serves the purpose of "divaricating" as much as possible the case where $w(T) D_T$ diverges from the case where it does not; other choices would also be possible, but (4.1) and (4.2) work well in all cases considered. Based on (2.15), we set $R = T$. Further, by Assumption 5, in the computation of $v_p$ (which we carry out with demeaned data) we need to choose $p = C_0 \ln\ln T$ for some $C_0$, which implies that $p$ does not vary much as $T$ increases. We have used $C_0 = 2$, rounding $C_0 \ln\ln T$ up to the nearest integer; varying $p$ around this number does not affect the results anyway. Finally, we have implemented the decision rule based on $D_{\alpha,S}$ using $S = 1000$; increasing this number results in better outcomes, at the (obvious) cost of a higher computational burden. Under each scenario, we compute the percentage of times that the decision rule is in favor of $H_0$, using this as a measure of performance. We generate $\{\eta_j\}_{j=1}^R$ as i.i.d. $N(0,1)$, and $u$ is drawn from $\{-\sqrt{2}, \sqrt{2}\}$ with equal probability. The sample sizes have been chosen as $T \in \{250, 500, 1000, 2000\}$; the first 1000 observations have been discarded to avoid dependence on initial conditions. The number of replications is set equal to 2000; when evaluating the size, this entails that empirical rejection frequencies have a confidence interval of $[0.04, 0.06]$.

4.1.1. Empirical rejection frequencies and the $D_{\alpha,S}$ criterion
Results are reported in Tables 1 and 2 below.
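The recommended tuning choices described above can be sketched as follows; the function name and packaging are ours, and this is an illustration of the specification rather than the authors' implementation:

```python
import math

def tuning(T, C0=2.0):
    """Specification used in the experiments: the weight
    w(T) = (ln T)^{5/4}, the 'divaricating' transformation
    g(x) = exp(exp(x) - 1) - 1, the truncation lag
    p = ceil(C0 * ln ln T), and the number of randomizations R = T."""
    w = math.log(T) ** 1.25
    p = math.ceil(C0 * math.log(math.log(T)))
    g = lambda x: math.exp(math.exp(x) - 1.0) - 1.0
    return w, p, T, g

w, p, R, g = tuning(1000)
print(p, R)  # prints 4 1000, since ceil(2 ln ln 1000) = 4
```

Note how slowly $p$ grows: it equals 4 for all the sample sizes used in the Monte Carlo, which is consistent with the remark that $p$ "does not vary much as $T$ increases".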
Table 1 contains the empirical rejection frequencies for the test of $H_0$: $X_t$ is strictly stationary, using the combinations of $\{\phi, \sigma_b^2\}$ indicated above and, in brackets, the percentage of times that the decision rule based on $D_{\alpha,S}$ leads to accepting $H_0$. As can be noted, the test has the correct size in almost all cases considered; exceptions are cases on the boundary (such as the combinations with $\sigma_b^2 = 0.25$), but even there the size becomes correct as $T$ increases. The distribution of the error term $e_t$ does not, in general, affect the empirical rejection frequencies, with few exceptions. As far as power is concerned, in the pure unit root case, viz. when $\{\phi, \sigma_b^2\} = \{1, 0\}$, the test exhibits good power, which is found to be higher than 50% whenever $T \ge 500$, despite not being designed explicitly for that specific alternative hypothesis; even in this case, the results are broadly similar across distributions of $e_t$. In conclusion, the test seems to work well in discerning between a genuine unit root process and a (stationary) STUR process. The test is also powerful in the purely explosive case $\{\phi, \sigma_b^2\} = \{1.05, 0\}$, and has some power also versus the "boundary" case $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$; in this case, the power is affected by the distribution of the error term for small $T$, declining as the tails of the distribution of $e_t$ become heavier, but this effect vanishes as $T$ increases. Similar considerations hold for $D_{\alpha,S}$; note that, when the data have heavy tails, in the case $\{\phi, \sigma_b^2\} = \{0.95, 0\}$ the procedure requires $T \ge 1000$ in order to work sufficiently well. As mentioned above, we have also considered a broader set of cases where $X_t$ is nonstationary, in which $E\ln|\phi + b_0| = 0$. The power of our test versus these alternatives is reported in Table 2; the test has very good power in all cases considered, the only possible exception being the case with normally distributed errors and $T = 250$, but even there the power picks up for larger $T$. Note the major increase in power when the error term $e_t$ has a Student's $t$ distribution; this does not seem to be sensitive to the degrees of freedom of the distribution. Note also the excellent performance of $D_{\alpha,S}$ for large $T$.

Further experiments
In order to further assess the performance and properties of the proposed test, we have considered additional experiments based on variations of the main test statistic and of the main assumptions. In this section, we report three sets of results which, albeit in a more succinct way, complement the previous ones. We first report results for the test of $H_0$: $X_t$ is nonstationary. We subsequently explore the properties of the test under failure of the i.i.d. assumption on the shock $b_t$. Finally, we compare the performance of our approach against other existing methodologies.
4.1.2.1. Testing for the null of nonstationarity. We begin by presenting results on testing for $H_0$: $X_t$ is nonstationary, based on the discussion in Section 3.2. For brevity, we did not experiment with $D_{\alpha,S}$; otherwise, the design of the simulations, and the specification of the test statistic, are exactly as in the previous case, using the choices suggested in Section 3.2 in the construction of the test (Tables 3 and 4).
The test has the correct size in all cases considered. The power versus stationarity is strong when $X_t$ is "very stationary", i.e. when $\phi = 0$ or $0.5$, and it is in any case above 50% in the "less stationary" case $\phi = 0.95$ when $T \ge 500$. Similarly, Table 4 shows that the test has the correct size even in the nonstationary, but boundary, case $E\ln|\phi + b_0| = 0$. We note that, in all cases considered, the distribution of $e_t$ does not seem to play a role in the final results.

4.1.2.2. Testing for the null of stationarity in the non-i.i.d. case. We now turn to the test's performance when the i.i.d. restriction in Assumption 1 fails to hold. In particular, we report the empirical rejection frequencies (we omit the values of $D_{\alpha,S}$ because these are essentially in line with the ones in Tables 1 and 2) in the case where the innovation $b_t$ has an autoregressive structure, viz.
where $e_t^b$ is generated as i.i.d. $N(0, \sigma_e^2)$, i.i.d. $t_2$ and i.i.d. $t_1$, where, as before, $t_k$ denotes a Student's $t$ distribution with $k$ degrees of freedom. We have considered the cases $\rho_b \in \{0.5, 0.75\}$, which complement the results in Tables 1 and 2; we point out that, in these cases, we have set $\sigma_e^2 = c_e/(1 - \rho_b^2)$, with $c_e$ chosen so that, as in the previous experiments, $\sigma_b^2 \in \{0, 0.1, 0.25\}$. Also, in (4.3), we consider only cases with a positive $\rho_b$, as these are arguably more interesting; we have also run a series of experiments with negative roots, which we do not report here, and the results are essentially unchanged (Tables 5 and 6).¹ As can be noted, results are, broadly, very similar to the case of i.i.d. innovations: the size and power of the test do not change much in general, especially for cases which are not "too close" to the boundary between the null and the alternative. Conversely, the performance of the test does deteriorate, both in terms of size and power, in the cases close to that boundary. As can be seen, the test tends to be oversized in the stationary cases (see in particular the empirical rejection frequencies for $T = 250$ with $\phi = 1$, and $\sigma_b^2 = 0.1$ and $0.25$), although this tends to resolve itself for $T \ge 1000$. Similarly, the power against the "local" alternative $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$ is lower for small $T$, although, again, it picks up as $T$ increases.
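One way to implement the variance targeting described above is sketched below; scaling the innovation variance so that the stationary variance of $b_t$ equals the target $\sigma_b^2$ is our reading of the design, and the function name is hypothetical:

```python
import numpy as np

def ar1_coefficient_noise(n, rho_b, sigma2_b, seed=0):
    """Generate b_t = rho_b * b_{t-1} + e_t (Gaussian case), with the
    innovation variance scaled so that the stationary variance of b_t
    equals sigma2_b -- one way to implement the variance targeting
    described in the text."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, np.sqrt(sigma2_b * (1.0 - rho_b**2)), n)
    b = np.empty(n)
    b[0] = rng.normal(0.0, np.sqrt(sigma2_b))  # start from the stationary law
    for t in range(1, n):
        b[t] = rho_b * b[t - 1] + e[t]
    return b

b = ar1_coefficient_noise(100_000, rho_b=0.5, sigma2_b=0.25)
print(abs(b.var() - 0.25) < 0.02)  # sample variance near the target -> True
```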
Finally, we point out that we have also experimented with allowing the innovation $e_t$ to follow an autoregressive specification, in a limited set of experiments which are available upon request; the results are essentially the same as when $e_t$ is i.i.d.

Comparison with other tests.
As a final set of experiments, we have compared the performance of our test against alternative procedures available in the literature. The most obvious term of comparison is the test for the null of strict stationarity developed by Zhao and Wang (2012) in the context of an RCA model. Whilst we refer to Zhao and Wang (2012) for details, we point out that their test is based on $\theta = \phi^2 + \sigma_b^2$, thus requiring $\sigma_b^2 < \infty$ (indeed, the asymptotics requires $E|X_t|^8 < \infty$). The test is formulated in terms of (4.4),

¹As a final remark, we note that, when $\sigma_b^2 = 0$, we have initialized $b_0 = 0$, and we have used the same seed as in the simulations for the i.i.d. case; thus, results for the case $\sigma_b^2 = 0$ are the same, modulo some rounding errors, as in the i.i.d. case.
where $0 < h < 1$ is a user-defined parameter. When implementing the test, we have used $h = 191/192$, as suggested in the original article (other choices are possible). As further terms of comparison, we have also considered several variants of the KPSS test (Kwiatkowski et al., 1992), which may be regarded as a classical way of choosing between stationarity and nonstationarity, albeit in a different setup; in fact, the article by Corradi et al. (2000) suggests that these tests can be applied even in broader, nonlinear contexts. In particular, we have implemented the test under two choices of the bandwidth for estimating the long-run variance, both based on Schwert's rule as suggested in Kwiatkowski et al. (1992): a "short" bandwidth, selected as $\lfloor 4(T/100)^{1/4}\rfloor$, and a "long" bandwidth, chosen as $\lfloor 12(T/100)^{1/4}\rfloor$. In addition to the original versions of the KPSS test, we have also used the modified version proposed by Leybourne and McCabe (1994), with the same choices of bandwidth. In Tables 7-9, we also report the empirical rejection frequencies of our test (which are the same as in Table 1) to facilitate the comparison; the set-up of the test is the same as employed in the construction of Table 1, and we only report results for $T = 500$ and $1000$ to save space. Results show, first of all, an extremely dichotomous behavior of the test by Zhao and Wang (2012), which is also reported by the authors. This is due to having $h > 0$; in turn, this entails that situations at the boundary, where $\theta = \phi^2 + \sigma_b^2$ is close to 1, may be missed by the test. In particular, this emerges clearly in the standard unit root case where $\phi = 1$ and $b_t = 0$ a.s., where the null of stationarity is (mistakenly) never rejected, and also in the close-to-boundary case $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$. Upon inspection, this seems to be due to a downward bias in estimating $\theta$. Interestingly, the test does not seem to be strongly affected by whether the variance of $X_t$ is finite or not.
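The two Schwert-rule bandwidths can be computed as in the sketch below (our own illustration; the function name is hypothetical):

```python
import math

def schwert_bandwidths(T):
    """Schwert-rule bandwidths for the KPSS long-run variance:
    'short' floor(4 * (T/100)^(1/4)) and 'long' floor(12 * (T/100)^(1/4))."""
    base = (T / 100.0) ** 0.25
    return math.floor(4 * base), math.floor(12 * base)

print(schwert_bandwidths(500))   # (5, 17)
print(schwert_bandwidths(1000))  # (7, 21)
```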
As far as the KPSS tests are concerned, as in the original article these have, broadly speaking, excellent properties under the assumptions that $E(X_t^2) < \infty$ (see Table 7) and $E(b_0^2) = 0$: note in particular the very high power against the unit root alternative. The tests also work well in the explosive case, and in the mildly nonstationary case $\{\phi, \sigma_b^2\} = \{1.05, 0.1\}$. Conversely, even when $e_t$ follows a normal distribution, the test seems to work less satisfactorily in boundary cases when $\sigma_b^2 > 0$; in particular, in the stationary case $\{\phi, \sigma_b^2\} = \{1, 0.1\}$ the test massively over-rejects; something similar was also noted in Arltova and Fedorova (2016). Note also that the test, again in boundary cases (chiefly $\{\phi, \sigma_b^2\} = \{0.95, 0.1\}$), performs very differently depending on the choice of the bandwidth. In all other cases, KPSS-based tests work better than the test proposed in this article. This changes when $E(X_t^2) = \infty$ (Tables 8 and 9), with the KPSS test becoming undersized when $\phi$ is small, and (massively) oversized as $\phi$ approaches 1; the very high power of the test should be read in conjunction with this. Compared with the KPSS, our test seems to be preferable in the presence of heavy tails.
We have also carried out a smaller-scale experiment where we use our procedure, as suggested in Section 3.2, to test for $H_0$: $X_t$ is nonstationary, comparing it against the variance-ratio tests proposed by Cai and Shintani (2006), which are in a very similar spirit to our test statistic. In particular, Cai and Shintani (2006) define four alternative tests based on variants of a statistic built on a weighted sum of covariances $\hat\omega(\cdot,\cdot)$. As in Cai and Shintani (2006), we define the four statistics: C0 (using $M = 1$ and $K = \lfloor T^{1/3}\rfloor$); CC (using $K = M = \lfloor T^{1/3}\rfloor$); CI (using $M = T$ and $K = \lfloor T^{1/3}\rfloor$); and II (using $K = M = T$). The set-up for the experiments is the same as in Table 3 above, and as above we only report results for $T = 500$ and $T = 1000$ (Tables 10-12). Results show, in general, that variance-ratio tests work very well, and often outperform our test. This indicates that our procedure works better as a test for (the null of) stationarity than for (the null of) nonstationarity. However, variance-ratio tests go entirely astray in the explosive (and mildly explosive) cases. Note also the size distortion when $E(X_t^2) = \infty$.

(Notes to Table 7: empirical rejection frequencies for the test of $H_0$: $X_t$ is stationary, compared with other tests in the case $e_t \sim N(0,1)$. Our test is referred to as the "$\mu_T$ test"; "ZW" refers to the test by Zhao and Wang (2012) based on (4.4); KPSS(a) and LMC(a) refer, respectively, to the KPSS test and to the variant of Leybourne and McCabe (1994) with bandwidth $\lfloor 4(T/100)^{1/4}\rfloor$; KPSS(b) and LMC(b) are the same tests with bandwidth $\lfloor 12(T/100)^{1/4}\rfloor$.)

Empirical illustration
The purpose of this section is primarily to illustrate the use of $Q(\alpha)$ and of the decision rule based on $D_{\alpha,S}$. We apply our procedure to several US macroeconomic aggregates (similarly to Hill and Peng, 2014). We consider the logs of: real GDP, M2 (as a measure of the aggregate money supply), CPI (and we also consider inflation, defined as the log-difference of CPI), and the industrial production index. We also apply our methodology to the (untransformed) rate of unemployment. Finally, we consider the 3-month T-bill, inspired by the contribution by Nielsen and Rahbek (2014). The decision rule (2.21) is applied in order to choose between $H_0$: $X_t$ is strictly stationary, and $H_A$: $X_t$ is nonstationary. As a further illustration, we apply the test to first-differenced data in the cases where a series is found to be nonstationary. Finally, we also use (2.21) to decide with the null and the alternative swapped, again considering data in levels and (if need be) in first differences. As far as the implementation is concerned, a note on deterministics is in order. We know from Section 3.1 that our test can, in general, be applied to nonzero-mean data; the test can also be applied in the presence of trends (albeit only after detrending), which should make our procedure particularly suitable for macroeconomic aggregates. There has been much debate on the presence (or absence) of a linear trend in macroeconomic aggregates and, in general, as to whether macroeconomic series are better characterized as having a linear trend or a unit root (the so-called "uncertain unit root"). Starting at least from the seminal article by Nelson and Plosser (1982), various contributions have questioned whether such series ought to be modeled as having a unit root or a linear trend. GDP is a prime example of this debate, and it has been the subject of several studies: we refer to the classical article by Rudebusch (1993), and also to Murray and Nelson (2000) and the extensive literature review therein.
Similarly, some studies seem to suggest that trends may be present in the CPI (see Beechey and Österholm, 2008), and that money aggregates may also have trends (Brand et al., 2002). Whilst the empirical exercise in this article is not aimed at addressing the "uncertain unit root" debate in a comprehensive way, we have taken this literature into account by detrending all the series using the GLS detrending scheme proposed in Elliott et al. (1996).
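A minimal sketch of GLS detrending in the spirit of Elliott et al. (1996) is given below; the noncentrality constant $\bar c = -13.5$ for the linear-trend case is the value proposed in that article, while the packaging and function name are our own simplification:

```python
import numpy as np

def gls_detrend(y, cbar=-13.5):
    """GLS detrending in the spirit of Elliott, Rothenberg and Stock (1996).

    Quasi-difference y_t and the deterministics z_t = (1, t) at
    a_bar = 1 + cbar/T (cbar = -13.5 for the linear-trend case), estimate
    the trend coefficients by OLS on the quasi-differenced data, and
    return the detrended series y_t - z_t' beta_hat.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    abar = 1.0 + cbar / T
    z = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    yq = np.concatenate([[y[0]], y[1:] - abar * y[:-1]])  # first obs kept in levels
    zq = np.vstack([z[0], z[1:] - abar * z[:-1]])
    beta, *_ = np.linalg.lstsq(zq, yq, rcond=None)
    return y - z @ beta

# a pure linear trend is removed up to numerical error
t = np.arange(1, 201)
resid = gls_detrend(2.0 + 0.5 * t)
print(np.max(np.abs(resid)) < 1e-6)  # -> True
```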
In the computation of (2.21), we have used $Q(0.05)$, setting $S = 5000$. We have used the same specifications as described in the previous section, namely: $R = T$; $p = 5$; $w(T) = (\ln T)^{5/4}$; $g(x) = \exp(\exp(x) - 1) - 1$; and we compute $v_p$ using demeaned data. In Table 12, we also report the estimated values of $\phi$ and $\sigma_b^2$ computed using the WLS estimator studied in Horváth and Trapani (2019). Based on (2.21), and on the fact that $\alpha = 0.05$ and $S = 5000$, the decision rule consists of not rejecting $H_0$ whenever $Q(0.05) \ge 0.9436$, (4.5) rejecting otherwise. Results are reported in Table 12, where we have also reported, for illustration purposes, the outcomes of the unit root test by Elliott et al. (1996) and of the KPSS test (see Kwiatkowski et al., 1992), which we have carried out for those series which do not have a random autoregressive root. As a preliminary comment, based on the test for no randomness ($H_0$: $\sigma_b^2 = 0$) developed by Horváth and Trapani (2019), two series (industrial production and unemployment) are found to have a random autoregressive root, whereas the others do not. We have (very) heuristically checked these two series by calculating the value of $E\ln|\phi + b_0|$, taking $\hat\phi$ at face value and assuming that $b_0$ is Gaussian; in both cases, $E\ln|\phi + b_0|$ turns out to be very close to zero. As far as testing for stationarity is concerned, results are quite clear-cut: all series are found to be nonstationary. This is perfectly in line with the findings from applying the other unit root tests reported in the table, although of course these can only be applied to the series with a deterministic autoregressive root. Interestingly, all series become stationary after first differencing: this cannot be taken for granted, since some series have a random autoregressive coefficient and therefore there is no guarantee that first differencing induces stationarity; see Leybourne et al. (1996). Exactly the same pattern of results is found when swapping the null and the alternative hypotheses.

(Notes to Table 12: we report the WLS estimators of $\phi$ and of $\sigma_b^2$ ($\hat\phi$ and $\hat\sigma_b^2$), as studied in Horváth and Trapani (2019); the symbol "(*)" next to the values of $\hat\sigma_b^2$ denotes rejection of the null of no coefficient randomness, expressed as $H_0$: $\sigma_b^2 = 0$ (we refer to Horváth and Trapani, 2019, for the theory of estimation and the test). As mentioned in the article, we have computed two popular unit root tests: the Elliott-Rothenberg-Stock test (reported in the column ERS) and the KPSS test. The former has been carried out under the hypothesis of a constant and a trend, using the Bartlett kernel to estimate the spectral density and choosing the related bandwidth via the criteria discussed in Andrews (1991); the same specifications were used for the KPSS test. In the last four columns, we report the values of $Q(\alpha)$ (for $\alpha = 0.05$) for the data in levels and in first differences, considering both the null of stationarity and the null of nonstationarity. In all tests considered, the symbol "(*)" indicates rejection of the null hypothesis.)
Finally, we note that, as a robustness check, we have experimented with different specifications of our procedure, e.g. varying $R$ from $T/2$ to $2T$, increasing $p$ to $4\ln\ln T$, and trying $S = 1000$ and $S = 10000$. In all these cases, results were entirely unchanged compared to the ones in Table 12, suggesting that the decision rule based on (2.21) is quite robust to different choices of the user-defined parameters.

Conclusions
In this article, we have developed a test for the null of strict stationarity of an RCAR(1) model. Our testing approach can be applied in a wide variety of situations, without requiring any modification or any prior knowledge. Chiefly, the test is still usable if the autoregressive root is not random, i.e. in the case of an AR(1) specification. Also, the test does not require, either as an assumption or for the purpose of the actual implementation, the existence of the variance of $X_t$, or indeed of virtually any moment. Finally, the test can be applied in the presence of deterministics: even in this case (with the exception of trends), the implementation of the test does not require any prior analysis. To the best of our knowledge, no existing test has such a level of generality.
Technically, the test is based on the (almost sure) limiting behavior of a statistic which either diverges to positive infinity or drifts to zero; since this limit has no randomness, we propose to use the statistic as part of a randomization procedure. Numerical evidence shows that the test performs very well, with very promising results also in the cases where $X_t$ is borderline between stationarity and nonstationarity, that is, in cases where $E\ln|\phi + b_0|$ is either positive or negative, but "small".
Other, more traditional approaches would of course be possible, at least in principle.² Indeed, considering $Y_t$ defined in (2.4), one could define a statistic $D'_T(a)$ such that, using the same arguments as in this article, it could be shown that $E(D'_T(a)) > 0$ if and only if $X_t$ is stationary. Thus, it would be possible to study the limiting distribution of $T^{1/2}(D'_T(a) - E(D'_T(a)))$, and testing whether $X_t$ is stationary would be tantamount to constructing a confidence interval for $E(D'_T(a))$ and verifying whether it contains 0 (nonstationarity) or is entirely positive (stationarity). Technically, this is perfectly feasible: the sequence $X_t$ can be shown to be strong mixing with a geometric mixing rate under stationarity (see e.g. Carrasco and Chen, 2002); hence, $Y_t$ has the same mixing properties, and has finite moments of any order; thus, a CLT for $D'_T(a)$ can be shown, based e.g. on Ibragimov (1962), or alternatively using the same approach as in the proof of Lemma 7.3 in Horváth and Trapani (2016). This approach is not free from tuning: e.g., one would have to estimate the relevant long-run variance, and the choice of $a$ would be bound to play a role, at least in finite samples; moreover, given that the alternative hypothesis is on the boundary, such an approach would only guarantee pointwise size control if implemented as described above. However, this is a very promising approach, which is under investigation by the author.

²I wish to thank a Referee for suggesting the idea in this paragraph to me.

Technical lemmas
The first few lemmas are for the case $E\ln|\phi + b_0| < 0$. The first lemma is an immediate consequence of Assumption 2, and we therefore report it without proof.
Lemma 1. Under Assumption 2, if $E\ln|\phi + b_0| < 0$ with $|\phi| < 1$, it holds that $E|X_0|^d > 0$ for all $d \ge 0$.

Lemma 2. Under Assumption 1, if $E\ln|\phi + b_0| < 0$, it holds that $\sum_{t=1}^T \big| |X_t|^j - |\tilde X_t|^j \big| = O(1)$, for any $j > 0$.

Proof. Using Lipschitz and Hölder continuity, we have $\big| |X_t|^j - |\tilde X_t|^j \big| \le C_0 (|X_t|^{j-1} + |\tilde X_t|^{j-1}) |X_t - \tilde X_t|$ when $j \ge 1$, and $\big| |X_t|^j - |\tilde X_t|^j \big| \le C_0 |X_t - \tilde X_t|^j$ when $j < 1$. Horváth and Trapani (2019) show that $|X_t - \tilde X_t| = O(e^{-C_0 t})$ for some $C_0 > 0$. Also, by Lemma 2 in Aue et al. (2006), there exists a $d_0 > 0$ such that $E|\tilde X_t|^{d_0} < \infty$; hence, by the Borel-Cantelli lemma (see e.g. Chow and Teicher, 2012, Corollary 3 on p. 90), $|\tilde X_t| = O(|t|^{1/d_0} (\ln t)^{(1+\epsilon)/d_0})$. The desired result now follows immediately by putting everything together.
We now distinguish between the cases $E X_0^2 = \infty$ and $E X_0^2 < \infty$. In the former case, the following lemmas are needed.
The next two lemmas contain some anti-concentration bounds for the case of nonstationary $X_t$.
Lemma 7 covers the cases $E\ln|\phi + b_0| > 0$; $E\ln|\phi + b_0| = 0$ with a genuinely random coefficient; and $E\ln|\phi + b_0| = 0$ with no randomness and $E|e_0|^{\kappa} < \infty$ for some $\kappa > 2$. The next lemma is useful to study the case of a nonrandom unit root process with infinite variance.
Lemma 8. Under Assumptions 1 and 4(ii), it holds that $P(|X_t| \le t^a) \le C_0 t^{1-ac-\epsilon} + C_1 t^{a-1/c}$, for some $\epsilon > 0$.

Proof. By Theorem 1 in Berkes et al. (1986), on a suitably enlarged space we can construct two independent sequences of i.i.d. random variables, say $\{y_i, i \ge 1\}$ and $\{z_i, i \ge 1\}$, such that: $y_i$ and $z_i$ are both symmetric; the common characteristic function of the $y_i$ is $\exp(-C_0 |x|^c)$ with $C_0 > 0$; the $z_i$ have common symmetric distribution function $F_z(x)$ satisfying $1 - F_z(x) = \ell(x) x^{-c}$ for $x \ge x_0$; and
$$\sum_{i=1}^t (e_i - y_i - z_i) = O(t^{1/c - \epsilon_0}), \quad (6.5)$$
for some $\epsilon_0 > 0$. Let $w_i = y_i + z_i$. Note also that, using Eq. (2.32) in Berkes and Dehling (1989) and the Markov inequality, (6.5) entails the estimate
$$P\Big(\Big|\sum_{i=1}^t (e_i - w_i)\Big| \ge t^a\Big) \le C_0 t^{1-ac-\epsilon}. \quad (6.6)$$
Let now $Q(X; \lambda) = \sup_x P(x \le X \le x + \lambda)$ denote the concentration function of a random variable $X$ (see Petrov, 1995), and let $Y = \sum_{i=1}^t y_i$ and $Z = \sum_{i=1}^t z_i$. By the independence between the $y_i$'s and the $z_i$'s,
$$Q(Y + Z; t^a) \le \min\{Q(Y; t^a), Q(Z; t^a)\} \le Q(Y; t^a).$$
Now, given that the $y_i$'s have a distribution belonging to the stable family, we have
$$P\Big(x \le \Big|\sum_{i=1}^t y_i\Big| \le x + t^a\Big) = P\big(x t^{-1/c} \le |y_1| \le (x + t^a) t^{-1/c}\big); \quad (6.7)$$
further, since the characteristic function of the $y_i$'s is integrable, the density of the $y_i$'s is bounded, with upper bound $m_y$. Hence
$$\sup_x P\big(x t^{-1/c} \le |y_1| \le (x + t^a) t^{-1/c}\big) \le 2 m_y t^{a-1/c},$$
which concludes the proof.
Lemma 9. Let $a_T$ be a positive, real-valued sequence diverging to positive infinity as $T \to \infty$. Under Assumptions 1, 3 and 4, $P(v_p \ge a_T)$ can be bounded in terms of $E|X_i|^{\zeta}$, for some $0 < C_0 < \infty$; here $\zeta$ is defined in Assumption 1 and, when Assumption 4(ii) holds, it is chosen so that $\zeta < c$.

Proof. The lemma follows immediately upon noting that $P(v_p \ge a_T) = P(v_p^{\zeta/2} \ge a_T^{\zeta/2})$, and applying convexity (when $\zeta/2 > 1$) or the $C_r$-inequality (otherwise) to $v_p^{\zeta/2}$.

Lemma 10. Under Assumptions 1 and 3-5, the stated bound holds in all cases considered, for some $\epsilon > 0$.

Proof. Let $k > 0$. On account of Lemmas 6 and 9, it is immediate to see that, in the worst case, $P(v_p \ge T^k) \le C_0 T^{-k\zeta/2} (\ln\ln T)^2 (\ln T)^d$ for some $d > 0$; similarly, a bound holds for $P(|X_t| \le t^a)$. Using Lemmas 7 and 8, the desired result follows.
Finally, we need the following lemma.
Lemma 11. Consider a sequence $U_T$ for which $E|U_T| \le a_T$, where $a_T$ is a positive, monotonically nondecreasing sequence. Then there exists a $C_0 < \infty$ such that
$$\limsup_{T\to\infty} \frac{|U_T|}{a_T (\ln T)^{2+\epsilon}} \le C_0 \quad \text{a.s.}$$

Proof. By Eq. (2.3) in Serfling (1970), it holds that $E\max_{1\le t\le T} |U_t| \le C_1 a_T \ln T$, whence the result follows.

[Proof of Theorem 3, continued.] Therefore the first term converges to zero on account of the strong LLN (conditional on the sample), whereas the second one, $E^* Y_s$, drifts to zero in the ordinary limit sense by Theorem 2, again conditional on the sample. As far as (7.3) is concerned, we need the relevant expectation to be bounded by some $C_2 < \infty$. This follows directly when Assumption 7(i)(a) holds; under Assumption 7(i)(b), it can be shown by elementary arguments that $E\big|\sum_{t=p+1}^T |d^*_t|^d\big| \le \sum_{t=p+1}^T E|d^*_t|^2 \le C_0 T^{1-\nu}$, so that Lemma 11 ensures that $(T-p)^{-1}\sum_{t=p+1}^T |d^*_t|^d = o(1)$, whence (7.3). In conclusion, for any $\epsilon > 0$ there is a random $p_0$ such that (7.2) holds a.s. for all $p \ge p_0$, even in the case of infinite variance. Thus, the same passages as above yield that, under $H_0$, $D^*_T = C_0 + o(1)$ for some $0 < C_0 < \infty$. Under $H_A$, the proof that $D^*_T = O(T^{-\epsilon})$ follows immediately if we show that Lemmas 7 and 8 still hold; this can be easily verified by noting that $P(|Y_t| \le t^a) \le P(|X_t| \le t^a + |d^*_t|)$.