Testing for randomness in a random coefficient autoregression model

We propose a test to discern between an ordinary autoregressive model and a random coefficient one. To this end, we develop a full-fledged estimation theory for the variances of the idiosyncratic innovation and of the random coefficient, based on a two-stage WLS approach. Our results hold irrespective of whether the series is stationary or nonstationary and, as an immediate result, they afford the construction of a test for "relevant" randomness. Further, building on these results, we develop a randomised test statistic for the null that the coefficient is non-random, as opposed to the alternative of a standard RCA(1) model. Monte Carlo evidence shows that the test has the correct size and very good power for all cases considered.


Introduction
In this paper we study the Random Coefficient Autoregressive (RCA) model
$$X_t = (\varphi + b_t) X_{t-1} + e_t, \qquad 1 \le t < \infty, \tag{1.1}$$
where $X_0$ is an initial value. Model (1.1) has received considerable attention in the literature, mainly due to its flexibility and analytical tractability. We refer to the monograph by Nicholls and Quinn (2012) for an excellent survey of early results, and also to the references in the contribution by Aue and Horváth (2011). Estimation of (1.1), and in particular of $\varphi$, has been extensively studied. Aue et al. (2006) and Berkes et al. (2009) use the quasi-maximum likelihood (QML) method to estimate the regression coefficient, showing that the asymptotic distribution of the estimated $\varphi$ is normal irrespective of the actual value of $\varphi$, as long as $E b_t^2 > 0$. Hill and Peng (2014) propose an empirical likelihood (EL) based estimator which is shown to be asymptotically normal even in the boundary case $E b_t^2 = 0$. Finally, Schick (1996) and Koul and Schick (1996) discuss the weighted least squares (WLS) approach, showing that it works well in comparison to the maximum likelihood approach.
Fewer contributions are available on inference for $E b_t^2 = \tau^2$ and $E e_t^2 = \sigma^2$. An exception is the article by Aue and Horváth (2011), who develop a QML approach to estimate $\varphi$, $\tau^2$ and $\sigma^2$, showing that the estimators of $\varphi$ and $\tau^2$ are consistent and always asymptotically normal, irrespective of whether $X_t$ is stationary or not, as long as $\tau^2 > 0$. Horváth and Trapani (2016) extend the theory to a panel data context. In particular, the results in Horváth and Trapani (2016) also show that, when $X_t$ is nonstationary, the WLS estimator has a slower rate of convergence than the ordinary least squares (OLS) estimator (see Aue et al., 2006, for a review of results). Given that standard normal inference is desirable, employing the WLS (or a QML) approach is well-justified when $\tau^2 > 0$. However, when $\tau^2 = 0$, asymptotic normality does not hold for the WLS or the QML estimators in the nonstationary case, thereby making such classes of estimators less desirable than the standard OLS estimator, whose limiting distribution is known, although it is non-standard and not normal; a notable exception is the contribution by Hill and Peng (2014), who propose a way of avoiding this shortcoming. Thus, it is important to have a test to understand whether (1.1) is genuinely a random coefficient model or not, which corresponds to the cases $\tau^2 > 0$ and $\tau^2 = 0$ respectively. Despite its importance, testing for the randomness of the autoregressive root in (1.1) has not been fully studied in the literature, in particular when $X_t$ is nonstationary. This is primarily due to the well-known "boundary problem" (see e.g. Davies, 1977); Akharif and Hallin (2003) provide an insightful discussion of this issue in the context of the RCA set-up. Nicholls and Quinn (2012) consider a test for $\tau^2 = 0$, based on the LM principle, but this is valid only for stationary data.
More recently, in a seminal contribution on inference when a parameter lies on the natural boundary of the parameter space, Andrews (2001) studies the case of a random coefficient model under the assumption of stationarity. Similarly, Akharif and Hallin (2003) develop an efficient test for randomness in an RCA(p) model for $p \ge 1$. In addition to requiring the estimation of the density of the innovation $e_t$, however, their test also hinges on the assumption that the roots of the autoregressive polynomial (obtained by setting the random shocks $b_t$ equal to zero) are outside the unit circle, thereby restricting the presence of nonstationarity. Similar restrictions are also needed for other tests developed in the literature: examples include Ramanathan and Rajarshi (1994), Lee (1998), Nagakura (2009), and also Carrasco et al. (2014), although the last paper fits within a more general set-up. Parallel to these contributions, the econometric literature has also investigated the use of RCA models as a flexible alternative to a unit root specification; in particular, tests have been developed for $H_0: \tau^2 = 0$ when $\varphi = 1$. This set-up is arguably of great importance: when $\tau^2 = 0$, equation (1.1) is the standard unit root process; conversely, the case in which $\varphi = 1$ and $\tau^2 > 0$, known in the literature as a Stochastic Unit Root (STUR) process, represents a series which has periods of explosive and stationary dynamics (see the seminal contribution by Granger and Swanson, 1997). Within this framework, McCabe and Tremayne (1995) propose a test for $H_0: \tau^2 = 0$; however, this approach constrains (1.1) to be either a unit root or a STUR process (see also Leybourne et al., 1996, and Distaso, 2008, for related contributions).
Despite the significant application potential of STUR processes, this somewhat limits the generality of the approach: for example, the test by McCabe and Tremayne (1995) is inconsistent in the presence of a STUR process with explosive behaviour (see Nagakura, 2009), which may be potentially useful when testing for bubbles (see Phillips et al., 2011; and Banerjee et al., 2013).
To the best of our knowledge, no test is available which can be applied even when X t is nonstationary, and indeed without any prior knowledge as to whether X t is stationary or not. Indeed, testing for genuine randomness in (1.1) with such a level of generality is not a trivial problem, on account of the several factors mentioned above. In this paper, we fill this gap by proposing a test for H 0 : τ 2 = 0. Thanks to the self-normalised nature of the WLS estimator, our test is robust to the stationarity or lack thereof of X t , and no prior knowledge of this is required; also, our test is not affected by the well-known inconsistency of the estimator of σ 2 when X t is nonstationary (see Aue and Horváth, 2011). In addition to this, motivated by a recent paper by Dette and Wied (2016), we also study a test for "relevant" randomness as an application of our results on the limiting distribution of the WLS estimator of τ 2 .

Hypotheses of interest and testing approach
Our main contribution is to develop a test for the following hypothesis testing framework:
$$H_0: \tau^2 = 0 \quad \text{versus} \quad H_A: \tau^2 > 0. \tag{1.2}$$
Due to the difficulties in constructing a test statistic for $H_0: \tau^2 = 0$ that does not require prior knowledge of the dynamic properties (stationarity or not) of $X_t$, we construct a test statistic which diverges to positive infinity under $H_0$ in all possible cases, whilst being bounded under the alternative $H_A$. Thence, we exploit such divergence by proposing an approach based on randomising the test statistic. In particular, we follow Corradi and Swanson (2006): randomisation is employed in conjunction with sample conditioning, so that the asymptotics is derived conditional on the sample. We point out that this approach, and this mode of convergence, may be viewed as related to the bootstrap; indeed, in a cognate contribution on testing for a unit root, Chang (2012) relies on similar arguments, explicitly based on the bootstrap. As mentioned above, we also consider a test for
$$H_0: \tau^2 \ge \Delta \quad \text{versus} \quad H_A: \tau^2 < \Delta, \tag{1.3}$$
where $\Delta > 0$ is a threshold below which the researcher may consider randomness in the autoregressive root to be negligible for practical purposes.
The paper is organised as follows. We state and discuss the main assumptions, and derive the WLS estimator in Section 2, where we also study the asymptotics of the estimator of $\tau^2$. Section 3 contains the construction of the test statistic and the relevant asymptotics. Evidence from synthetic data is reported in Section 4.1, and some illustrations using real data are in Section 4.2. Section 5 concludes. Technical lemmas, the proofs of the main results, further discussion and numerical evidence are reported in the online supplement.

Assumptions and estimation
Depending on the value taken by E ln |ϕ + b 0 |, three separate regimes can be identified for the causal solutions of (1.1).
(i) If $E \ln|\varphi + b_0| < 0$, then $X_t$ converges exponentially fast, for all initial values $X_0$, to the strictly stationary solution of (1.1) (see Nicholls and Quinn, 2012). In particular, the stationary solution $\bar{X}_t$ satisfies $\bar{X}_t = (\varphi + b_t)\bar{X}_{t-1} + e_t$ for $-\infty < t < \infty$. Note that, when $\tau^2 > 0$, $E \ln|\varphi + b_0|$ can be negative even when $\varphi = 1$: thus, the STUR process can converge to a strictly stationary solution, although in such a case $\bar{X}_t$ has an infinite second moment (see Hwang and Basawa, 2005).
(ii) If E ln |ϕ + b 0 | > 0, then X t exhibits an explosive behaviour. This case has also been studied in depth in the literature: Berkes et al. (2009) show that |X t | → ∞ exponentially fast.
(iii) In the boundary case $E \ln|\varphi + b_0| = 0$, $X_t$ is nonstationary. This case has received comparatively less attention in the literature: Horváth and Trapani (2016) show that $|X_t|$ diverges in probability, but at a rate slower than exponential.
Note that the classification above holds for the RCA(1) as well as the AR(1) models, i.e. for $\tau^2 > 0$ and $\tau^2 = 0$. However, the behaviour of the estimators for the parameters will be different in the two models. Also, it is important to note that the STUR model mentioned above does not naturally fall into any of these categories.
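As an illustration of the three regimes, the sign of $E \ln|\varphi + b_0|$ can be approximated by simulation. The sketch below assumes $b_0 \sim N(0, \tau^2)$ (the distribution used later in the Monte Carlo design); the function name `classify_regime` and the tolerance `tol` used to flag the boundary case are hypothetical choices made for this example only.

```python
import numpy as np

def classify_regime(phi, tau2, n_draws=200_000, tol=1e-3, seed=0):
    """Approximate E ln|phi + b_0| by Monte Carlo and label the regime.

    Assumption: b_0 ~ N(0, tau2). `tol` is an illustrative band around
    zero, since a simulated mean is never exactly zero when tau2 > 0.
    """
    rng = np.random.default_rng(seed)
    b = np.sqrt(tau2) * rng.standard_normal(n_draws)
    with np.errstate(divide="ignore"):  # ln(0) = -inf is a legitimate value here
        drift = float(np.mean(np.log(np.abs(phi + b))))
    if drift < -tol:
        return drift, "stationary"   # case (i)
    if drift > tol:
        return drift, "explosive"    # case (ii)
    return drift, "boundary"         # case (iii)

# A STUR-type specification (phi = 1, tau2 > 0) can fall in the stationary regime.
print(classify_regime(1.0, 0.25))
print(classify_regime(1.05, 0.0))
```

With $\tau^2 = 0$ the simulation is redundant, as the quantity reduces to $\ln|\varphi|$; the simulated version is only useful when $\tau^2 > 0$.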
We now discuss the WLS estimators for $\sigma^2$ and $\tau^2$. Let $u_t = e_t + b_t X_{t-1}$. We have
$$X_t = \varphi X_{t-1} + u_t.$$
Assuming that $e_t$ and $b_t$ are independent, and independent over $t$, we get
$$E\left(u_t^2 \mid X_{t-1}\right) = \sigma^2 + \tau^2 X_{t-1}^2.$$
Based on these considerations, the infeasible WLS estimators of $\sigma^2$ and $\tau^2$ can be defined as in (2.2) (see also Janečková and Prášková, 2004). The feasible version of (2.2) can be obtained as follows. Let $\hat{u}_t = X_t - \hat{\varphi} X_{t-1}$, where $\hat{\varphi}$ is the WLS estimator of $\varphi$. Then, based on (2.2), the feasible estimators $\hat{\tau}^2$ and $\hat{\sigma}^2$ are obtained by replacing $u_t$ with $\hat{u}_t$.
We now introduce and discuss the main assumptions. The first assumption must be satisfied by $X_t$ irrespective of the regime it belongs to.
Assumption 1. It holds that: (i) $\{b_t, -\infty < t < \infty\}$ and $\{e_t, -\infty < t < \infty\}$ are independent sequences; (ii) $\{b_t, -\infty < t < \infty\}$ are independent and identically distributed random variables; (iii) $\{e_t, -\infty < t < \infty\}$ are independent and identically distributed random variables; (iv) $E b_0 = E e_0 = 0$; (v) $E|b_0|^{\nu} < \infty$ and $E|e_0|^{\nu} < \infty$ for some $\nu > 2$; (vi) $\sigma^2 > 0$; and (vii)
Assumption 1 is relatively standard, and it can be compared to the assumptions in Berkes et al. (2009), or Aue et al. (2006). A useful consequence of the assumption, which we will use extensively in the remainder of the paper, is that if $\tau^2 = 0$, then $b_t = 0$ must hold with probability 1. The assumption of serial independence can be relaxed to consider (weak) dependence. All our results would hold, unmodified, as long as the following results hold: the ergodic theorem; the strong approximation for the partial sums of $b_t$ and of $e_t$; the Central Limit Theorem. All of these have been derived for dependent data (see e.g. the papers by Wu, 2005, Wu, 2007, and Berkes et al., 2014). Thus, our results and our test can be extended to the case of serial dependence. Indeed, as we show later on, our test statistic requires only rates, thus avoiding having to estimate any long run variance.
Stationary units must also satisfy the following assumption.
We point out that, in part (i), $\bar{X}_0$ is the stationary solution at $t = 0$, and not an initial value for $X_t$. In essence, by Assumption 2, the process $\bar{X}_t$ and the innovations $e_t$ are "proper" random variables, as opposed to constants. Assumption 2 is a technical requirement to avoid degeneracy of the estimators. In particular, it is needed in order for the denominators τ 2 2 and σ 2 2 to be nonzero with probability 1 (see Lemma A1).
When X t is an explosive process, we need the following assumption in addition to Assumption 1.
Assumption 3 is also used in Berkes et al. (2009), and, together with Assumption 1, it ensures that $|X_t| \to \infty$ at an exponential rate, and that, when suitably normed, $X_t$ converges to a nonzero limit (cf. Lemma A3).

Consistency and limiting distribution of $\hat{\tau}^2$
This section contains the strong rates of convergence of $\hat{\tau}^2$, which are needed in order to construct the randomised test in Section 3; as an ancillary result, we also report the limiting distribution of $\hat{\tau}^2$. In order to derive the rates of convergence and the asymptotic distribution, we also need to strengthen Assumption 1.
Theorem 1. We assume that Assumptions 1-4 are satisfied. Then, for all $\epsilon > 0$: (i) if $\tau^2 = 0$, we have
Theorem 1 provides strong rates for $\hat{\tau}^2$ in both the cases of non-random coefficients ($\tau^2 = 0$) and random coefficients ($\tau^2 > 0$). When $\tau^2 = 0$, in particular, $\hat{\tau}^2$ is always consistent at a rate (at least) $T^{-1/2}$. We show in Theorem 2 that, at least in the case $E \ln|\varphi + b_0| \neq 0$, the almost sure rates are optimal up to the $(\ln T)^{3/2+\epsilon}$ terms. The only case which is left out of the theorem is the case where $E \ln|\varphi + b_0| = 0$ and $\tau^2 > 0$. This is due to the fact that, in this case, we are not able to compute the rate of divergence of the denominator τ 2 2; in turn, this is due to the fact that we are not able to find the rate at which $|X_t|$ diverges as $t \to \infty$. This issue is highly non-trivial, and we also refer to the comments, albeit in a different set-up, by Francq and Zakoïan (2012); similarly, Berkes et al. (2009) derive a full-blown set of results for the divergence of $X_t$ when this is nonstationary, but no results are derived for the boundary case.
Although not directly needed for the construction of our test statistic, we now report the limiting distribution of $\hat{\tau}^2$ for both cases of $\tau^2 = 0$ and $\tau^2 > 0$ and under the restriction $E \ln|\varphi + b_0| \neq 0$. When $\tau^2 > 0$, we show that asymptotic normality holds, similarly to the estimator for $\varphi$ in Aue et al. (2006) and Berkes et al. (2009). When $\tau^2 = 0$, we show that asymptotic normality holds for $E \ln|\varphi + b_0| < 0$, whereas it fails when $E \ln|\varphi + b_0| > 0$. We consider the following estimator of the asymptotic variance of $\hat{\tau}^2$ (which is constructed under the assumption of independence) having defined the short-hand notation and $z_t = u_t^2 - \left(\sigma^2 + \tau^2 X_{t-1}^2\right)$. The proof of Theorem 2 requires a mildly stronger version of Assumption 4.
Theorem 2. We assume that Assumptions 1-5 are satisfied, and consider the case E ln |ϕ (2.9) Theorem 2 states that τ 2 always follows a normal distribution when τ 2 > 0: this is due to the self-normalised nature of the WLS estimator. The same result holds even in the explosive case E ln |ϕ + b 0 | > 0. Note that the asymptotic variance of T 1/2 τ 2 − τ 2 is different for stationary or explosive regimes (cf. Lemmas A9 and A11), but V T provides the correct norming for both cases. Thus, irrespective of whether the data are stationary or explosive, standard normal inference can be applied whenever τ 2 > 0. On the other hand, standard normal inference does not hold when E ln |ϕ + b 0 | > 0 and τ 2 = 0. Technically, this is due to the fact that the variance of the leading term stays bounded as T → ∞, which is a degenerate case in which the central limit theorem cannot be shown -see e.g. Davidson (1993). Also, we do not have a distributional result for the case E ln |ϕ + b 0 | = 0. This is because, when τ 2 > 0, we are not able to derive the rate at which |X t | diverges as t → ∞, which, in turn, makes it impossible to show the consistency of σ 2 . As far as rates of convergence are concerned, we are not aware of an OLS based inferential theory for τ 2 ; however, on the grounds of available results on the OLS estimation of ϕ when |ϕ| > 1 (see Wang and Yu, 2015, and the references therein), it can be expected that OLS based inference would have faster rates of convergence when E ln |ϕ + b 0 | > 0, but standard normal inference would no longer hold. Finally, note that we are able to obtain consistent estimators for ϕ and τ 2 ; however, when X t is explosive, σ 2 is inconsistent (cf. Lemma A10).
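To make the two-stage logic of this section concrete, the following sketch estimates $\varphi$ by WLS and then regresses the squared residuals on a constant and $X_{t-1}^2$. The weights $1/(1+X_{t-1}^2)$ in the first stage and $1/(1+X_{t-1}^2)^2$ in the second stage are assumptions made for illustration, chosen because they keep every weighted term bounded; the function name `wls_rca1` is likewise hypothetical, and the code should not be read as a transcription of (2.2).

```python
import numpy as np

def wls_rca1(x):
    """Two-stage WLS sketch for an RCA(1) sample x[0], ..., x[T].

    Stage 1 estimates phi with weights 1 / (1 + x_{t-1}^2).
    Stage 2 regresses squared residuals on (1, x_{t-1}^2) with weights
    1 / (1 + x_{t-1}^2)^2. Both weight choices are assumptions.
    """
    x = np.asarray(x, dtype=float)
    lag, cur = x[:-1], x[1:]
    w = 1.0 / (1.0 + lag**2)

    # Stage 1: weighted least squares for the autoregressive coefficient.
    phi_hat = np.sum(w * cur * lag) / np.sum(w * lag**2)

    # Stage 2: u_t^2 = sigma^2 + tau^2 * x_{t-1}^2 + z_t, estimated by WLS.
    # Multiplying both sides by w applies the square root of the weight w^2.
    u2 = (cur - phi_hat * lag) ** 2
    design = np.column_stack([w, w * lag**2])
    sigma2_hat, tau2_hat = np.linalg.lstsq(design, w * u2, rcond=None)[0]
    return phi_hat, sigma2_hat, tau2_hat
```

Nothing in this sketch constrains the estimates to be non-negative, so in small samples `tau2_hat` may come out below zero.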

Testing for $H_0: \tau^2 = 0$
In this section, we propose a randomised test statistic for (1.2). Define for some user-defined $\kappa \in (0, 1/2)$; we defer comments on the choice of $\kappa$ until the end of this subsection. Note that, in (3.1), the presence of $V_T^{-1/2}$ makes the test statistic scale invariant; using the absolute value prevents the argument of the exponential from diverging to negative infinity.
In the Supplement, we show that, under H 0 (and for all values of ϕ), it holds that therefore, we can assume that lim T →∞ ψ T = ∞ holds under the null, and that lim T →∞ ψ T = 1 holds under the alternative (except for the case E ln |ϕ + b 0 | = 0, which we comment on after Theorem 4). Given that ψ T → ∞ under the null, we cannot use ψ T directly and we instead propose a randomised version of it. We present the construction of the test statistic as a four step algorithm.
Step 1 Generate an artificial sample {ξ j , 1 ≤ j ≤ R} of independent and identically distributed (across j) random variables according to the distribution function G.
Step 2 Define Step 3 Compute Step 4 Define the test statistic where F is a distribution function.
We point out that there are various ways in which the randomisation algorithm can be carried out; the one proposed above has been employed in several contributions (we refer to Corradi and Swanson, 2006, where it is proposed). Consider the following regularity conditions.
Assumption 6. It holds that (i) $\int_{-\infty}^{\infty} |u|^2 \, dF(u) < \infty$; and (ii) $G$ has a bounded density function, with $G(0) \neq 0$ and $G(0) \neq 1$.
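A minimal numerical sketch of the four-step randomisation is given below, taking the value of $\psi_T$ as an input. The specific forms used for Steps 2 to 4 are assumptions made for illustration: the indicator $\zeta_j(u) = 1\{\psi_T \xi_j \le u\}$, the standardised sum $\vartheta(u) = [R\,G(0)(1-G(0))]^{-1/2} \sum_j (\zeta_j(u) - G(0))$, and $\Theta$ computed as the average of $\vartheta(u)^2$ over a discrete $F$. The draws $\xi_j$ are standard normal, so that $G(0) = 1/2$, and the support $\{-1, 1\}$ for $u$ follows the choices reported in the simulation section. The name `randomised_statistic` is hypothetical.

```python
import numpy as np

def randomised_statistic(psi, R, rng, u_grid=(-1.0, 1.0), g0=0.5):
    """Sketch of Steps 1-4 for a given value psi of psi_T.

    Assumed forms (illustrative, not taken from the text):
      Step 1: xi_j ~ i.i.d. N(0, 1), hence G(0) = g0 = 1/2;
      Step 2: zeta_j(u) = 1{psi * xi_j <= u};
      Step 3: theta(u) = sum_j (zeta_j(u) - G(0)) / sqrt(R G(0) (1 - G(0)));
      Step 4: Theta = mean of theta(u)^2 over u_grid (F uniform on u_grid).
    """
    xi = rng.standard_normal(R)                                      # Step 1
    u = np.asarray(u_grid, dtype=float)[:, None]
    zeta = (psi * xi[None, :] <= u).astype(float)                    # Step 2
    theta = (zeta - g0).sum(axis=1) / np.sqrt(R * g0 * (1.0 - g0))   # Step 3
    return float(np.mean(theta**2))                                  # Step 4

rng = np.random.default_rng(0)
print(randomised_statistic(1e12, 400, rng))  # very large psi: chi-square(1)-type draw
print(randomised_statistic(1.0, 400, rng))   # psi = 1: value of order R
```

Under these assumed forms, a very large $\psi_T$ makes each $\zeta_j(u)$ behave like a Bernoulli variable with success probability $G(0)$, which gives a chi-square-type limit, whereas $\psi_T = 1$ shifts the success probability to $G(u) \neq G(0)$ and the statistic grows proportionally to $R$.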
Let P * denote the conditional probability with respect of {e t , b t , −∞ < t < ∞}; we use the notation " D * →" and " P * →" to define, respectively, conditional convergence in distribution and in probability according to P * . Finally, we let χ 2 1 denote a chi-square with one degree of freedom. Theorem 3. We assume that Assumptions 1-6 are satisfied. If H 0 holds, then, as min(T, R) Theorem 3 states that, under the null, Θ T,R follows a chi-squared distribution with one degree of freedom; the result holds for all samples, except for a set of measure zero. Condition (3.2) poses a restriction on the relative rate of expansion between the actual sample size T and the artificial sample size R as they both pass to infinity, as is customary in this literature. In principle, (3.2) offers a selection rule for R, although it is arguably a very mild condition.
Theorem 4. We assume that Assumptions 1-6 are satisfied, and consider the case $E \ln|\varphi + b_0| \neq 0$.
Theorem 4 states that, under the alternative, the test statistic diverges to positive infinity as fast as $R$, thus ensuring consistency. From a technical point of view, the difference with the previous theorem is that we are now ruling out the case $E \ln|\varphi + b_0| = 0$, for the same reasons as above (namely, the difficulty in determining at which rate $|X_t|$ diverges when $E \ln|\varphi + b_0| = 0$). It could be shown that a sufficient condition to have power, in this case, would be $T^{-(1/2+\epsilon)}$ τ 2 2 → ∞ a.s. as $T \to \infty$, but there is no way of formally verifying this.
After presenting Theorems 3 and 4, we are able to make some heuristic comments on the tuning parameter $\kappa$, defined in (3.1). According to the theory presented above, $\kappa$ impacts on the well-known trade-off between size and power. Under $H_0$, essentially, it is required that $\psi_T \to \infty$ as fast as possible as $T \to \infty$. Indeed, (3.2) arises from the fact that the test statistic $\Theta_{T,R}$ has a non-centrality parameter which vanishes as long as $R^{1/2} \exp(-T^{\kappa}) \to 0$; upon setting $R = T$, such non-centrality will vanish more quickly, the larger $\kappa$ is. On the other hand, under $H_A$, it is required that $\psi_T$ converge to a finite constant (which is 1 by construction); again it would be desirable for this to happen as fast as possible as $T \to \infty$. In (3.1), our results show that V −1/2 T τ 2 is driven by V −1/2 T τ 2 , and that, in the worst case scenario, $V_T^{-1/2}$ diverges to infinity as fast as $T^{1/2}$. Since we require that, under $H_A$, T κ V −1/2 T τ 2 → 0, this entails that it must be that $\kappa < 1/2$: the smaller $\kappa$, the faster the drifting to zero. This explains why we require $\kappa \in (0, 1/2)$, and also in which way the choice of $\kappa$ will affect the power and the size of the test; we note, however, that in practice the test is not sensitive to the value of $\kappa$ even for small to moderate sample sizes $T$.
These comments should also shed some light on the impact of R: on the one hand, R should be as big as possible, in order to boost the power of the test; on the other hand, by (3.2), a large value of R will result in size distortion. Finally, a note on Assumption 1 and its relevance is in order. As mentioned above, the requirements in the assumptions could be relaxed, as long as "strong" results (such as the LLN and an almost sure version of the Invariance Principle) hold. Under serial dependence, V T is incorrect, and, as a consequence, the results in Theorem 2 do not hold any longer. However, the test is based on rates rather than limiting distributions, and therefore the results in Theorems 3 and 4 are robust to the presence of serial dependence.

Deciding between H 0 and H A
As is customary, our test is carried out to decide between the null hypothesis of non-random autoregression versus the alternative of a random autoregressive root. To this end, in view of the randomised nature of our statistic, some comments on Theorems 3 and 4 are in order. Under $H_A$, Theorem 4 readily implies that, as $\min(T,R) \to \infty$, the null is rejected with conditional probability approaching one. This result holds for all values of $E \ln|\varphi + b_0| \neq 0$ and, similarly to a bootstrap-based procedure, conditional on the sample, or, equivalently, for almost all realizations of $\{e_i, b_i, -\infty < i < \infty\}$. In essence, the result entails that whenever a researcher uses $\Theta_{T,R}$, the null will be rejected, when false, with probability one.
Conversely, the implications of Theorem 3 are subtler. The theorem ensures that, under $H_0$, the conditional rejection probability converges to the nominal level as $\min(T,R) \to \infty$, again conditional on the sample and for each value of $E \ln|\varphi + b_0|$ (including $E \ln|\varphi + b_0| = 0$). However, our test is constructed using a randomisation which does not vanish asymptotically, as would be the case e.g. when using the bootstrap, and therefore the asymptotics of $\Theta_{T,R}$ is driven by the added randomness. Thus, different researchers using the same data will obtain different values of $\Theta_{T,R}$ and, consequently, different p-values; indeed, if an infinite number of researchers were to carry out the test, the p-values would follow a uniform distribution on $[0, 1]$. This is a well-known feature of randomised tests. Of course, notwithstanding this potential shortcoming, the randomised test can be applied as is, accepting that different researchers may obtain different outcomes with the same data: randomised tests are well-known in the literature and, in some cases, may even have optimality properties (see e.g. Lehmann and Romano, 2006), although the arbitrariness described above is undesirable. However, it is possible to ensure that the decision between $H_0$ and $H_A$ is not subject to such arbitrariness. In particular, Geyer and Meeden (2005) propose the use of the so-called "fuzzy confidence intervals", which are also used in Song (2016) and can be computed as follows. Each researcher, instead of computing $\Theta_{T,R}$ just once, will compute the test statistic $S$ times, at each time $s$ using an independent sequence $\xi_j^{(s)}$ for $1 \le j \le R$ and $1 \le s \le S$, thence computing the quantity in (3.4). The function $Q(\tau^2; \alpha)$ is called the "randomised confidence function" (see Song, 2016); an immediate consequence of Theorems 3 and 4 is (see also Corollary 3.1 in Song, 2016) that $\lim_{\min(T,R,S)\to\infty} P^*\{Q(\tau^2; \alpha) = 1 - \alpha\} = 1$ for $\tau^2 = 0$, and $\lim_{\min(T,R,S)\to\infty} P^*\{Q(\tau^2; \alpha) = 0\} = 1$ for $\tau^2 > 0$, under (3.2).
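The repeated-randomisation idea can be illustrated numerically. The sketch below assumes that the randomised confidence function is the share of the $S$ statistics that do not exceed the chi-square critical value, which is consistent with the limits $1-\alpha$ and $0$ stated above but is an assumption about the exact form of (3.4). The function name `randomised_confidence` is hypothetical, and the chi-square draws are stand-ins for the $S$ values of $\Theta_{T,R}$ a researcher would obtain under the null.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 95% quantile of the chi-square(1) distribution

def randomised_confidence(theta_draws, crit=CHI2_1_CRIT_5PCT):
    """Share of the S randomised statistics that do not exceed `crit`.

    Assumption: this proportion of non-rejections plays the role of Q.
    """
    theta_draws = np.asarray(theta_draws, dtype=float)
    return float(np.mean(theta_draws <= crit))

rng = np.random.default_rng(0)
# Stand-in for S = 10,000 repetitions under the null (chi-square(1) limit).
print(randomised_confidence(rng.chisquare(1, size=10_000)))   # close to 0.95
# Stand-in for repetitions under the alternative (statistics of order R).
print(randomised_confidence(np.full(10_000, 180.0)))          # 0.0
```

Because $S$ is under the researcher's control, the Monte Carlo noise in this proportion can be made as small as desired, which is what removes the arbitrariness of a single randomised draw.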
As S → ∞, there is no randomness in Q τ 2 ; α ; note also that no restrictions are required for the rate at which S diverges to infinity, so that S can be chosen arbitrarily large. From a computational point of view, it is possible to obtain Q τ 2 ; α by modifying (3.1) as on a grid of values of τ 2 (this could actually be done for a generic test for H 0 : τ 2 = τ 2 0 , for any value of τ 2 0 , although in this paper we focus on the case τ 2 0 = 0). There are at least three ways in which Q τ 2 ; α can be employed. Firstly, it can be reported as it is: the interpretation of (the graph of) Q τ 2 ; α is, in essence, the same as that of ordinary ("crisp") confidence intervals, namely it is a measure of the uncertainty about τ 2 (see Geyer and Meeden, 2005). 1 Another possible approach, proposed by Song (2016) is to use a (mildly conservative) rule based on the confidence set deciding in favour of H 0 if τ 2 0 ∈ C α,β , and against otherwise (in our case, of course, τ 2 0 = 0). As said, this non-randomised decision rule is conservative, but less and less so the smaller β; we refer to the discussion in Song (2016), where this idea is developed and illustrated. Finally we note that (3.5) could also lend itself directly to the construction of a decision rule. It could be shown based on standard arguments that, under H 0 Hence, a "strong rule" to decide in favour of H 0 is Decisions made on the grounds of (3.9) have vanishing probabilities of both Type I and Type II errors; of course, the same result would be obtained by directly thresholding ψ T , but in that case the choice of the threshold may be difficult to justify.

Testing for $H_0: \tau^2 \ge \Delta$
Building on Theorem 2, and inspired by an idea by Dette and Wied (2016), it is possible to propose a test for "relevant" randomness. As mentioned in the introduction, it is possible that a small amount of randomness in the autoregressive coefficient (measured as $\tau^2$) may actually be negligible for practical purposes, with a standard and simpler AR(1) model working better than the RCA(1) model. This corresponds to testing for (1.3), viz.
$$H_0: \tau^2 \ge \Delta \quad \text{versus} \quad H_A: \tau^2 < \Delta,$$
where $\Delta > 0$ is user-defined and represents a threshold below which randomness, even if present, can be ignored. In light of Theorem 2, this corresponds to running a one-tailed t-test, whose decision rule (do not reject $H_0$, or reject $H_0$) is based on the critical value $c_\alpha$, defined such that $P[N(0,1) < c_\alpha] = \alpha$, for a pre-specified level $\alpha \in (0, 1)$.
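As a hedged illustration of the one-tailed decision rule, the sketch below standardises the distance between the estimate and the threshold $\Delta$ and compares it with the lower-tail normal quantile $c_\alpha$. The argument `std_err` is assumed to be a standard error for the estimate of $\tau^2$ (in the notation of this section it would be derived from $V_T$ and $T$); this argument and the function name are hypothetical, and rejecting for values below $c_\alpha$ is the reading adopted here for a null of the form $\tau^2 \ge \Delta$.

```python
from statistics import NormalDist

def relevant_randomness_test(tau2_hat, std_err, delta, alpha=0.05):
    """One-tailed test of H0: tau^2 >= delta against HA: tau^2 < delta.

    `std_err` is an assumed standard error of tau2_hat. The null is
    rejected when the standardised statistic falls below c_alpha,
    where P[N(0, 1) < c_alpha] = alpha.
    """
    z = (tau2_hat - delta) / std_err
    c_alpha = NormalDist().inv_cdf(alpha)
    return z, c_alpha, z < c_alpha

# Illustrative numbers only: an estimate well below the threshold delta = 0.25.
z, c_alpha, reject = relevant_randomness_test(0.05, 0.05, 0.25)
print(z, c_alpha, reject)
```

The threshold `delta` and the level `alpha` are user choices; the simulation section uses $\Delta = 0.25$.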

It holds that
Theorem 5. Consider the case $E \ln|\varphi + b_0| \neq 0$, with $\tau^2 \ge 0$. Under the assumptions of Theorem 2, it holds that
Theorem 5 is a direct application of Theorem 2, and thus it can only be applied to the case $E \ln|\varphi + b_0| \neq 0$. One main advantage of the theorem is that it exploits the standard normal inference afforded by the WLS estimator, which holds true even for the case of an explosive process.

Numerical and empirical evidence
In this section, we illustrate the use of our test and its properties, both by simulation (Section 4.1) and through an empirical application (Section 4.2). In particular, in Section 4.1, consistently with similar contributions where randomised tests are developed, we consider the performance of tests based on the direct use of Θ T,R . In Section 4.2, in order to complement the results in the previous section, we illustrate through an application the use of (3.9).

Simulations
In this section, we present some evidence on the properties of $\hat{\tau}^2$, and of inference based on it. In particular, we consider the bias and the Mean Squared Error (MSE) of $\hat{\tau}^2$, and we also study the empirical rejection frequencies of tests for $H_0: \tau^2 = 0$ based on $\Theta_{T,R}$. Experiments are based on the RCA(1) model defined in (1.1). We have used $\varphi \in \{-1.05, -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, 1.05\}$. We have conducted two sets of experiments, which differ in the way the errors have been simulated: particularly, we have set $e_t \sim$ i.i.d. $N(0, 1)$ and $e_t \sim$ i.i.d. $\sqrt{3/5}\, t_5$, where $t_5$ denotes a Student t distribution with 5 degrees of freedom; the factor $\sqrt{3/5}$ rescales the variance to unity. The latter choice should provide some evidence on the test performance with heavy-tailed data (see e.g. Rachev, 2003, for examples of applications of heavy-tailed distributions to financial data). We have initialised the DGP by setting, in all cases considered, $X_0 = 0$; however, we have carried out some unreported experiments with other initialisations, which show that results are virtually unchanged when using different values of $X_0$. In all experiments involving $\tau^2 > 0$, we have simulated $b_t \sim$ i.i.d. $N(0, \tau^2)$; we have also carried out some trials with $b_t \sim$ i.i.d. $\sqrt{3/5}\, \tau\, t_5$ to evaluate if the power of our test changes in a significant way, but this does not appear to be the case. Different set-ups, and other specifications for the other parameters, which have been used for specific experiments, are described later on. We generate samples of size $T$, with a burn-in period of 1,000 observations to avoid dependence on initial conditions, and use $T \in \{100, 200, 400, 800\}$. Results are based on 500 simulations; all routines have been written using Gauss 10.
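The data generating process described above can be sketched as follows. This is an illustrative reimplementation in Python rather than the Gauss routines used for the reported results; the function name `simulate_rca1` and its arguments are hypothetical. It covers the Gaussian and rescaled $t_5$ innovations, draws $b_t \sim N(0, \tau^2)$, starts from $X_0 = 0$ and discards the burn-in observations.

```python
import numpy as np

def simulate_rca1(phi, tau2, T, burn_in=1000, errors="normal", seed=None):
    """Simulate X_t = (phi + b_t) X_{t-1} + e_t and drop the burn-in.

    b_t ~ N(0, tau2); e_t is N(0, 1) or sqrt(3/5) * t_5 (unit variance).
    Returns T + 1 consecutive values, i.e. T observed transitions.
    """
    rng = np.random.default_rng(seed)
    n = T + burn_in
    if errors == "normal":
        e = rng.standard_normal(n)
    elif errors == "t5":
        e = np.sqrt(3.0 / 5.0) * rng.standard_t(5, size=n)
    else:
        raise ValueError("errors must be 'normal' or 't5'")
    b = np.sqrt(tau2) * rng.standard_normal(n)
    x = np.empty(n + 1)
    x[0] = 0.0  # initial value X_0 = 0, as in the design above
    for t in range(n):
        x[t + 1] = (phi + b[t]) * x[t] + e[t]
    return x[burn_in:]

sample = simulate_rca1(phi=0.5, tau2=0.25, T=400, seed=1)
print(sample.shape)  # (401,)
```

In explosive designs the simulated path grows very quickly during the burn-in, so the magnitudes returned can be extremely large; this is a property of the model rather than of the code.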

WLS estimation of τ 2 and testing for relevant randomness
In the first set of experiments we let $\hat{\tau}_j^2$ denote the estimate of $\tau^2$ for iteration $j$ of the simulation. To save space, results are in Tables D1 and D2 in the Supplement, where we report the following measures:
$$\text{bias} = \frac{1}{500} \sum_{j=1}^{500} \left(\hat{\tau}_j^2 - \tau^2\right), \qquad \text{MSE} = \frac{1}{500} \sum_{j=1}^{500} \left(\hat{\tau}_j^2 - \tau^2\right)^2. \tag{4.1}$$
We set $\tau^2 \in \{0, 0.125, 0.25, 0.5\}$: we have also experimented with other values of $\tau^2$, but results are, conceptually, the same. Similarly, we do not report bias and MSE for the case where $e_t \sim$ i.i.d. $\sqrt{3/5}\, t_5$; the reported results are in Tables D1 and D2.
The main findings are as follows. Based on the two metrics considered here (bias and MSE), the WLS estimator $\hat{\tau}^2$ seems to behave in the correct way in the majority of the cases considered, with both measures declining as $T$ increases, with some exceptions, e.g. when $\varphi$ is around zero and $T = 200$ and $400$ (in such cases, the bias does not decline, although it does so markedly when $T = 800$). This is true also for the boundary case $\tau^2 = 0$, where rates of convergence indicate that the estimator is consistent, but distributional results differ depending on whether $X_t$ is stationary or not; even in this case there are some exceptions, e.g. when $\varphi = -0.75$ and $\varphi = -0.5$, where the bias slightly increases when switching from $T = 100$ to $T = 200$, and seems to plateau between $T = 200$ and $T = 400$. Note how, in all cases considered, the bias declines when $T = 800$; the results suggest that bias and MSE are not affected by $\tau^2$. Finally, it is interesting to note that, for finite samples, the WLS estimator $\hat{\tau}^2$ seems to have a downward bias.
We also report the empirical rejection frequencies of the test for $H_0: \tau^2 \ge \Delta$, in Tables D4 and D5 in the Supplement. We have used $\Delta = 0.25$; unreported experiments show that altering $\Delta$ does not, essentially, change results. The performance of the test is so poor when $T = 100$ that we do not report this case.
When the errors are normal, the test has the correct size for $T \ge 400$, with a tendency to be oversized in some cases for smaller samples; the empirical rejection frequencies are all below 5% when $\tau^2 = 0.3$, even for $T = 200$, as expected. As far as power is concerned, this is always good for $T \ge 800$, and, as one may expect, it is better for smaller values of $\tau^2$. Conversely, when $\tau^2$ approaches 0.25, the power worsens, again as expected; noticeably, the power is good for the cases where $|\varphi| \ge 1$, but much worse whenever $|\varphi| < 1$. In the presence of errors with heavier tails (i.e., when $e_t \sim$ i.i.d. $\sqrt{3/5}\, t_5$), the size of the test, when $\tau^2 = 0.25$, has a much worse behaviour, showing a marked tendency to over-reject, which can be interpreted as a result of Assumption 5 failing to hold. Interestingly, the test has better power than in the presence of Gaussian errors, at least when $\tau^2$ is much smaller than 0.25, but results become worse when $\tau^2$ approaches the boundary; as noted before, when $|\varphi| < 1$ the power is usually far worse than when $|\varphi| \ge 1$.

Empirical rejection frequencies for H 0 : τ 2 = 0: size and power
In order to validate the results in Theorems 3 and 4, and in line with other contributions in this literature, we report the empirical rejection frequencies when using Θ T,R ; for the sake of reproducibility, we note that numbers have been generated with seed set equal to 5 13 . Results have been obtained by generating ξ j as i.i.d. N (0, 1), and drawing the values of u from a discrete uniform distribution with support {−1, 1}, merely for computational simplicity; we have also set κ = 0.1, and R = T . We have tried to modify these specifications, but results are only marginally affected; we therefore recommend these values in applications. Prior to discussing the results, we point out that, as mentioned at the end of Section 3, one may choose not to use Θ T,R directly, instead preferring decisions based on Q(τ 2 ; α) or even using (3.9). In particular, we have simulated the outcomes of using C α,β and (3.9) for some of the specifications used in the main Monte Carlo exercise. Although we have not done this for all possible scenarios, due to the computational burden, we note in particular that (3.9) discerns between the null and the alternative with a success rate of 100%, even when T = 100. We refer to the empirical application in the next section for a full-blown implementation of C α,β and (3.9).
[Insert Table 1 somewhere here]
Table 1 contains the empirical rejection frequencies of the test (carried out at a nominal level α = 0.05) under the two distributional set-ups. Since the number of replications is set equal to 500, the empirical rejection frequencies have a confidence interval [0.03, 0.07]. With the exceptions noted below, all empirical rejection frequencies fall within this interval, indicating that the test has the correct size across the cases and sample sizes considered. The exceptions are the cases where ϕ = −1.05 and ϕ = −1 for T = 100, where the test is grossly oversized, at least when $e_t \sim$ i.i.d. $\sqrt{3/5}\, t_5$, whereas no particular problems arise with Gaussian errors. This may however be viewed as a small sample problem, since when T = 200 the size aligns itself with its theoretical values for all cases considered. Interestingly, the distribution of the error term e t has virtually no impact on the test size (with the exceptions detailed above); we have carried out a few extra experiments, which are available upon request, with e t having a Student t distribution with more degrees of freedom, and as can be expected results are very similar to those reported. The results in Table 1 are quite remarkable, since having an error term with a Student t distribution with 5 degrees of freedom is at the boundary of our assumptions, where the existence of at least the 4-th moment is required in order to have our convergence rates (indeed, in the case E ln |ϕ + b 0 | = 0, this is not enough; see Assumption 4(ii)). We conjecture that such robustness may arise from the fact that our test is based on rates, rather than limiting distributions, for which a slightly higher moment condition would be required.
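The interval [0.03, 0.07] quoted for 500 replications is consistent with a normal approximation to the binomial distribution of a rejection frequency around the nominal level. The sketch below shows this arithmetic; the use of a 95% band with critical value 1.96 is an assumption made here for illustration, and the function name `rejection_frequency_band` is hypothetical.

```python
import math


def rejection_frequency_band(alpha, n_reps, z=1.96):
    """Approximate band for an empirical rejection frequency under correct size.

    Uses the normal approximation alpha +/- z * sqrt(alpha * (1 - alpha) / n_reps).
    The value z = 1.96 (a 95% band) is an assumption, not stated in the text.
    """
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return alpha - half_width, alpha + half_width


lower, upper = rejection_frequency_band(alpha=0.05, n_reps=500)
print(round(lower, 2), round(upper, 2))  # 0.03 0.07
```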
We now turn to considering the power. In our experiments under the case E ln |ϕ + b 0 | ≠ 0, we have considered τ 2 ∈ {0.25, 0.5, 1}, and the same values of T as above; we do not report the results when τ 2 = 1 and T ≥ 400 since the empirical rejection frequencies are all 100%, similarly to the case T = 200.
[Insert Tables 2 and 3 somewhere here]
Based on Tables 2 and 3, the test seems to have good power: the empirical rejection frequencies are higher than 50% for all values of ϕ and for both distributional assumptions on e t . The only exception is the case τ 2 = 0.25 when T = 100, but even in this case the power picks up when T ≥ 200. As noted in the case of the test for (1.3), the power of the test is lower when |ϕ| < 1, compared to the cases |ϕ| ≥ 1.
Finally, we have run some simulations -again under the two distributional assumptions of Gaussian errors and errors with heavier tails -to investigate the power of the test when E ln |ϕ + b 0 | = 0.
[Insert Table 4 somewhere here]
Results are reported only for T = 100, since whenever T ≥ 200 the power is always 100% with no exceptions: our test, even in the case E ln |ϕ + b 0 | = 0 which is not covered by the theory, has very good power for all cases considered.

Power versus local alternatives
All the results derived so far, and in particular Theorem 4, refer to a fixed parameter case, thereby leaving out the case of local-to-zero τ 2 (and, also, of local-to-unit ϕ). We discuss this case in the Supplement. In particular, we manage to show that, when ϕ is fixed and bounded away from unity, the test for H 0 : τ 2 = 0 based on Θ T,R has nontrivial power as long as condition (4.3) holds, where we have now made the dependence of τ 2 on T explicit. The Supplement (see Appendix C) also contains some (essentially negative) theoretical results on the "near-integrated" case $\varphi_T - 1,\ \tau_T^2 = O(T^{-1})$; although this case is left out in (4.3), it can be argued that there is no power in such a case. The case of near integration is clearly of theoretical, but also of empirical interest, since it entails that X t is mildly explosive or stationary, but close to a unit root behaviour. In the context of a standard AR(1) model, the literature has extensively studied the case of the autoregressive root shrinking towards unity as the sample size passes to infinity, starting from the seminal contribution by Phillips (1988). Phillips and Magdalinos (2007), in particular, have studied the case where the autoregressive root may be close to unity from below (that is, a stationary near unit root), and also from above (that is, an explosive near unit root). In the RCA context, the STUR case has also received some attention, despite its technical difficulties, and we refer to the contributions by Berkes et al. (2005) (who study near-integrated GARCH sequences) and Aue (2008). In a recent advance, Lieberman and Phillips (2017a) bridge the two approaches (random vs deterministic autoregressions) by introducing a hybrid case where local-to-unit root behaviour may be due to the autoregressive coefficient converging to unity deterministically, and also because of a random shock with vanishing variance; see also Lieberman and Phillips (2014) and Lieberman and Phillips (2017b).
Thus, we also report a comprehensive Monte Carlo investigation of the properties of our test in the local-to-STUR case. We use the same specification as above with

$$\varphi = \varphi(T) = 1 \pm T^{-q} \quad \text{and} \quad b_t = \tau_T \gamma_t, \quad \tau_T = T^{-q/2}; \tag{4.4}$$

in (4.4), γ t has zero mean, unit variance, and the same distributions (Gaussian and Student t) as considered above. By (4.4), we are considering both near-stationary and near-explosive cases (depending on whether ϕ T = 1 − T −q or ϕ T = 1 + T −q ); we use these expressions with some abuse of terminology, based only on the deterministic part of the autoregressive root, since the case ϕ (T ) = 1 + T −q may well correspond to a stationary case. Based on (4.4), we set q = q − κ, where κ = 0.1 as before, and q ∈ {1, 0.9, 0.75, 0.6, 0.5}. Thus, in this set-up, q takes into account two factors that impact on the power of our test: the proximity of ϕ T to unity (via q), and the fact that our procedure, by construction, loses some ability to discern local alternatives (due to κ).
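To make the design in (4.4) concrete, the following is a minimal sketch of how one path of the local-to-STUR RCA(1) process could be simulated. Several choices are assumptions made only for illustration: the function name `simulate_local_stur`, the initial value X 0 = 0, the seed, the use of the same distribution for γ t and e t within a design, and the reading of `q` as the exponent appearing in (4.4). The heavy-tailed draw rescales a Student t variate with 5 degrees of freedom by the square root of 3/5 so that it has unit variance.

```python
import math
import random


def simulate_local_stur(T, q, sign=+1, heavy_tails=False, x0=0.0, seed=0):
    """Simulate X_t = (phi_T + b_t) X_{t-1} + e_t with phi_T = 1 + sign * T**(-q).

    b_t = tau_T * gamma_t with tau_T = T**(-q/2); gamma_t and e_t are i.i.d.
    with zero mean and unit variance. x0 = 0 and the seed are assumptions.
    """
    rng = random.Random(seed)
    phi_T = 1.0 + sign * T ** (-q)   # sign=-1: near-stationary, sign=+1: near-explosive
    tau_T = T ** (-q / 2.0)

    def unit_variance_draw():
        if not heavy_tails:
            return rng.gauss(0.0, 1.0)
        # Student t with 5 d.o.f.: Z / sqrt(chi2_5 / 5), chi2_5 ~ Gamma(shape=2.5, scale=2).
        t5 = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.5, 2.0) / 5.0)
        return math.sqrt(3.0 / 5.0) * t5  # rescaled to unit variance

    path = [x0]
    for _ in range(T):
        b_t = tau_T * unit_variance_draw()   # random part of the coefficient
        e_t = unit_variance_draw()           # idiosyncratic innovation
        path.append((phi_T + b_t) * path[-1] + e_t)
    return path


# One near-stationary path with heavy-tailed errors (illustrative settings).
path = simulate_local_stur(T=100, q=1.0, sign=-1, heavy_tails=True, seed=42)
print(len(path))  # 101: the initial value plus T observations
```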
[Insert Table 5 somewhere here]
We know from the literature (see e.g. Phillips and Magdalinos, 2007 and Aue, 2008) that the two cases of near-stationarity and near-explosive behaviour entail very different properties for the sequence X t . This can also be evaluated in the light of Lemma C2 in the Supplement, and especially equations (C14) and (C15). Based on the latter, in the near-explosive case there should be nontrivial power versus alternatives which are closer to STUR than in the near-stationary case. Indeed, the table shows that as long as q < 1 (and, therefore, as long as q < 0.9), the power is higher than 50% even for T = 100, with results improving as T increases. Conversely, in the near-stationary case, equation (C14) suggests that power can be attained for smaller values of q and, therefore, of q. Indeed, the test has no power for q > 0.75, suggesting that the test is unable to discern alternatives which are "too close" to a local STUR. The power picks up when q ≤ 0.75, although a large T is required to have power larger than 50%. These results reinforce the heuristic conclusion that having a local-to-STUR from above (near-explosive) or from below (near-stationary) are very different situations, which correspond to very different test performances. Note also that the distribution of the error term e t does not seem to play any perceivable role in affecting the performance of the test.
Finally, we have also investigated the bias and MSE of the estimator of τ 2 (Table D3 in the Supplement) and the power of the test for (1.3) under local-to-zero alternatives (Table D6 in the Supplement), mainly in order to assess the quality of the estimator of τ 2 . Bias and MSE exhibit a similar behaviour to the ones in non-local cases (see Tables D1 and D2); note the negative bias. As far as testing for H 0 : τ 2 ≥ ∆ is concerned, in (3.11) we require ∆ > 0. Under ∆ = ∆ T → 0, we are not able to use Theorem 2 if E ln |ϕ + b 0 | ≥ 0.
This is due to the fact that the WLS estimator yields standard normal inference even in the nonstationary case, but this fails to hold in the case of a non-random coefficient (see equation (2.9)). In our experiments, the null hypothesis is represented by the local-to-explosive case, whereas the alternative is the local-to-stationary case. Under the null, the empirical rejection frequency has the desired behaviour, with the test having the correct size even when T = 100 and q = 1, and irrespective of the distribution of the error term e t : this reinforces the idea that WLS inference is reliable in the local-to-STUR case from above (i.e. near-explosive). Conversely, the test has good power versus the alternative hypothesis, but only when q ≤ 0.6: again, this confirms the intuition that the asymptotic distribution of the estimator of τ 2 is a poorer approximation in the near-stationary case; note the difference in power between the case of e t being Gaussian and Student t, also noted in the previous set of simulations.

Empirical illustration
Inspired by the empirical exercise in Hill and Peng (2014), we illustrate our testing procedure (and, in particular, the use of Q(τ 2 ; α) and C α,β ) by applying it to several U.S. macroeconomic and financial time series. We have considered the logs of: real GDP, M2 (as a measure of the aggregate money supply), CPI, S&P 500, and Industrial Production. We have also analysed the 3 month Treasury Bill rate, and the rate of unemployment. All data have been downloaded from the website of the St Louis Federal Reserve Bank; we refer to Table 6 for a description of the data, including whether they are seasonally adjusted or not, sample periods and sample sizes.
[Insert Table 6 somewhere here]
The table also contains a test for ARCH effects, applied to the first-differenced data; as can be seen, all series seem to have conditional heteroskedasticity. Note however that, although our theory is derived under the i.i.d. assumption, as mentioned above our results would hold also in the presence of serial dependence and heteroskedasticity, with no need to modify the test statistic. We base our analysis on C α,β and (3.9). We have used Q(τ 2 ; 0.05), with S = 5,000. In the computation of C α,β we have set β = 0.005. Unreported trials show that reducing β has no effect on the results; thus, we recommend β = 0.005 as the value to be employed in applications. Also, we have used the same specifications as in the previous section, in particular R = T and κ = 0.1. We note however that results are entirely unaffected by these choices. As can be seen from Table 7, the estimates of τ 2 can be negative. In these cases, from an empirical perspective, running the test is probably not even needed, since a negative estimate of τ 2 suggests very strongly that τ 2 = 0. Based on (3.9), and on the fact that α = 0.05 and S = 5,000, the decision rule is not to reject H 0 whenever

$$Q(0; 0.05) \geq 0.9436, \tag{4.5}$$

and to reject otherwise. Results are reported in Table 7.
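Rule (3.9) is not reproduced in this section, so the sketch below rests on an assumption: that the threshold takes a law-of-the-iterated-logarithm form, namely 1 − α minus the square root of α(1 − α) times the square root of 2 ln ln S / S. This assumed form does reproduce the value 0.9436 quoted above for α = 0.05 and S = 5,000, but it should be checked against (3.9) before being relied upon. The function name `lil_threshold` is hypothetical.

```python
import math


def lil_threshold(alpha, S):
    """Threshold for Q(0; alpha) under an assumed iterated-logarithm form of (3.9).

    Assumed formula: (1 - alpha) - sqrt(alpha * (1 - alpha)) * sqrt(2 * ln(ln(S)) / S).
    """
    correction = math.sqrt(alpha * (1.0 - alpha)) * math.sqrt(2.0 * math.log(math.log(S)) / S)
    return (1.0 - alpha) - correction


threshold = lil_threshold(alpha=0.05, S=5000)
print(round(threshold, 4))  # 0.9436


def do_not_reject(q_value, alpha=0.05, S=5000):
    """Apply the decision rule: do not reject H0 when Q(0; alpha) is at or above the threshold."""
    return q_value >= lil_threshold(alpha, S)
```

Under this form the threshold approaches 1 − α from below as S grows, so a larger number of randomisations S makes the rule stricter.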
[Insert Table 7 somewhere here]
For all series, the estimate of ϕ is always very close to 1 (recall that the estimator of ϕ is consistent in all cases considered), which indicates that series are likely to be in the case of near-stationarity, or near-explosiveness (or even in a pure deterministic or stochastic unit root case). Considering only the series for which the estimate of τ 2 is non-negative, both the confidence sets C α,β and the strong rule (4.5) indicate that unemployment and Industrial Production do have a stochastic autoregressive root, which might suggest that these two series exhibit a STUR-type behaviour. These results should be taken with some caution, especially for Industrial Production: the fact that ϕ < 1 but close to 1, and the very small value of τ 2 , may indicate that this is a local-to-STUR case with a possibly near-stationary behaviour; we know from the previous section that in such a case the test may have low power. Conversely, the case of unemployment seems more clear-cut, with a smaller ϕ, and consequently a behaviour which seems to be further away from a STUR. The case of unemployment is also interesting in light of the value of Q (0; 0.05) and the very small (compared with τ 2 ) lower bound of the confidence set C α,β (bearing in mind the fuzziness of the endpoints of confidence intervals): both suggest rejection of the null of no randomness, but in a less decisive way than in the case of Industrial Production. By way of robustness check, in this case we have experimented with different values of β, but the set C α,β always stays clear of zero.
On the other hand, in the cases of the CPI and the T-bill series, the null of no randomness cannot be rejected, indicating that these series have a (probably unit root) deterministic autoregressive behaviour. Heuristically, even these cases -in light of the small values of ϕ − 1 and τ 2 -may be considered local-to-STUR; however, the fact that ϕ > 1 seems to suggest that we are in a near-explosive case, where the test should have sufficient power.

Conclusions
Being able to discern an RCA(1) specification from a standard AR(1) one has important practical implications. On the one hand, the RCA(1) model offers more flexibility than its AR(1) counterpart, being able to capture possible nonlinearities in the dynamics of a series. On the other hand, the nonlinear nature of the RCA(1) framework makes inference more complicated: although the WLS estimator affords standard normal inference irrespective of the stationarity (or lack thereof) of the series, this is true only in the case of genuine randomness. Conversely, if no randomness is present, the OLS estimator is superior to the WLS one, due to its faster rate of convergence. This paper bridges the existing gaps in the literature, by proposing a test for the null of no randomness H 0 : τ 2 = 0. The test does not require any knowledge as to which regime (stationary, nonstationary, or on the boundary) X t belongs to, and it only requires minimal assumptions on moment existence. Given that our proposed test statistic diverges under the null, we regularise it by employing a randomised version of it, developed conditional on the data. We then employ a "de-randomised" indicator, namely the so-called fuzzy confidence interval, so as to ensure that different researchers using the same data will have the same outcome from the test. From a technical point of view, we develop an estimator of τ 2 which is related to the one studied in Aue and Horváth (2011); we complement the existing results by deriving explicit and near-optimal rates of convergence, thereby extending the existing literature; an immediate practical consequence of this is that we are able to develop a test for relevant randomness (Section 3.3). In our theory, we manage to derive the full-fledged asymptotics under the null; conversely, we are not able to derive results under the alternative for the boundary case E ln |ϕ + b 0 | = 0, or for cases which are local to it (see Appendix C in the Supplement).
The latter case is under current investigation by the authors. From a methodological point of view, our approach to testing for a null hypothesis which is on the boundary of the parameter space can be viewed as a possible alternative to the one proposed by Andrews (2001), being based on rates rather than the limiting distribution.

Table 5. Power versus local-to-STUR alternatives; we refer to (4.4) for details, noting that q = q + κ. The cases termed "near-stationary" and "near-explosive" refer, respectively, to having set ϕ (T ) = 1 − T −q and ϕ (T ) = 1 + T −q .

Table 6. Data description of the series employed in the empirical exercise; the column headed SA refers to whether data are seasonally adjusted or not. In the last column, we carry out, for completeness, a test for the null of no ARCH effects (using an ARCH(7) specification in the auxiliary regression of the test) on the first differences of each series; the "*" denotes rejection of the null of no ARCH effect at 5% level.

Table 7. Outcomes of estimation and testing. We have reported: the WLS estimators of ϕ and of τ 2 ; the confidence set C α,β defined in (3.7); and the value taken by Q(τ 2 ; α), computed for a level α = 0.05 and for τ 2 = 0 (i.e. under the null). In this case, the threshold based on S = 5,000 is 0.9436, and the symbol "*" denotes rejection of the null of no coefficient randomness.