Testing for Common Trends in Nonstationary Large Datasets

We propose a testing-based procedure to determine the number of common trends in a large nonstationary dataset. Our procedure is based on a factor representation, where we determine whether there ...


A.1 Preliminary lemmas
Henceforth, ν^{(p)}(A) denotes the p-th largest eigenvalue (i.e., the eigenvalues sorted in decreasing order) of a matrix A; we occasionally employ the notation ν^{(min)}(A) to denote the smallest eigenvalue of A. Also, "=_D" denotes equality in distribution. As far as matrix notation is concerned, Λ^{(1)} is N × r_1; Λ^{(2)} is N × r_2; and, finally, Λ^{(3)} is N × r_3.
We begin with the following lemma, which is useful to derive almost sure rates.
Let now γ^{(p)} and ω^{(p)} denote the p-th largest eigenvalues of Λ' [T^{-1} Σ_{t=1}^T E(Δf_t Δf_t')] Λ and of T^{-1} Σ_{t=1}^T E(Δu_t Δu_t'), respectively. By Assumption 6, it can be easily verified, using the arguments in the proof of Lemma 1 in Trapani (2018), that γ^{(p)} = C_p N for 1 ≤ p ≤ r; ω^{(1)} ≤ C_1; and lim inf_{N→∞} ω^{(N)} > 0.
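These eigenvalue rates can be illustrated numerically. The sketch below is illustrative only: it draws i.i.d. N(0,1) loadings (an assumption, not the paper's DGP) and normalises E(Δf_t Δf_t') = I_r, so that γ^{(p)} becomes the p-th eigenvalue of Λ'Λ; the point is that the r non-zero eigenvalues grow linearly in N.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 2

def top_eigs(N):
    # loadings with i.i.d. N(0,1) entries, so Lambda'Lambda is close to N * I_r
    Lam = rng.standard_normal((N, r))
    # with E(df df') = I_r, gamma^(p) is the p-th largest eigenvalue of Lambda'Lambda
    return np.linalg.eigvalsh(Lam.T @ Lam)[::-1]

g_small, g_large = top_eigs(500), top_eigs(8000)
# gamma^(p) = C_p N: the estimated growth exponent in N should be close to 1
slope = np.log(g_large[0] / g_small[0]) / np.log(8000 / 500)
print(slope)
```

The same experiment with a bounded number of non-zero loadings per factor would break the linear rate, which is the "weak factor" situation discussed in Section A.5.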
We will often need the following lemma, shown in Trapani (2018) (see Lemma A1), which we report here for convenience.
Let b be a nonzero vector of dimension r_1 + r_2, such that ‖b‖ < ∞. We will prove that the relevant lim inf is strictly positive for every such b, thus proving the lemma. Clearly, IV is dominated by Assumption 2(iv). Consider now II and III. By the Law of the Iterated Logarithm (henceforth, LIL), there exists a random t_0 such that, for all t ≥ t_0, there exists a positive finite constant C_0 such that ‖W(t)‖_2 ≤ C_0 t^{1/2} (ln ln t)^{1/2}. Thus, using Assumption 2(iv), (ln ln T / T^2) Σ_{t=t_0}^T t^{1/2} (ln ln t)^{1/2} = o_{a.s.}(1).
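For completeness, the order computation behind the last claim can be written out; this is a hedged reconstruction, using only the LIL bound just stated and the elementary inequality Σ_{t≤T} t^{1/2} ≤ C T^{3/2}:

```latex
\frac{\ln\ln T}{T^{2}} \sum_{t=t_{0}}^{T} t^{1/2}\left(\ln\ln t\right)^{1/2}
\;\le\; C_{0}\,\frac{(\ln\ln T)^{3/2}}{T^{2}} \sum_{t=1}^{T} t^{1/2}
\;\le\; C\,\frac{(\ln\ln T)^{3/2}}{T^{1/2}} \;\longrightarrow\; 0 ,
```

which is exactly the claimed o_{a.s.}(1) rate.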

Finally, it holds that lim inf_{T→∞} (ln ln T / T^2)(·) > 0, with B(t) a scalar, standard Wiener process, by applying equation (4.6) in Donsker and Varadhan (1977) and by the positive definiteness of Σ_{Δf*}. Since this holds for all b, the lemma follows.
We will now make extensive use, for short, of the notation f_t^{(2)}. It holds that …, where the first passage is the usual spectral norm inequality, and the last passage follows from applying (twice) the C_r-inequality (Davidson, 1994, p. 140).

Let now u_{i,t}^{(2)} … and note that Σ_{i,k} f_{k,t} f_t^{(2)} … We have …; on account of Assumption 2(iv), it holds that …, and the final result follows from Donsker and Varadhan (1977, Example 2). Thus … Finally, consider …; having used Assumption 3(ii), it follows that … Putting all together, we have … Note that

Similar passages yield
We now consider the next term in equation (A14). We have …, having used Assumption 3(ii). Thus, using Lemma A1, … Using (A10) and putting all together, the desired result obtains.
… for short. As before, … Consider the first term; by (A12), … Similarly, considering the second term in (A16), we have …, having used equation (2.3) in Serfling (1970), Assumption 4(i) and Assumption 3(ii). From here on, the proof is the same as for the first term in (A16); the proof for the third term in (A16) is also exactly the same, and it is therefore omitted. Putting everything together, the lemma follows.

A.2 Proofs of main results
Proof of Lemma 1. When d_1 = 0, the lemma follows immediately from B having full rank. When d_1 = 1, the proof follows the arguments in Maciejowska (2010). Let …; by Assumption 1(ii), C has full rank. It is therefore possible to re-write the expression above as …, where D_1 = [1, 0, ..., 0] is r × 1, and P and D_2 are r × r and have full rank. Among the possible matrices that satisfy this representation, one can consider … The desired result follows immediately after computing …, and Lemma A6 immediately yields the desired result.

Proof of Theorem 2. The proof is similar to that of related results in other papers - see e.g. Trapani (2018). We begin with (22). Note that, under H_{0,1}^{(p)}, (11) and Lemma A2 entail that P(ω : lim …) … for every ε > 0, and therefore we can henceforth assume that lim_{min(N,T)→∞} φ … Let E* and V* denote, respectively, expectation and variance conditional on P*; we have, for 1 ≤ j ≤ R_1, …, with d_u = 1 for u ≥ 0 and −1 otherwise. Letting m_{G_1} denote the upper bound for the density of G_1, we have …, which drifts to zero under (21) by (A17) and Assumption 7. Also, consider …; by the independence of the ξ_{1,j}^{(p)}, elementary arguments yield …, with the last passage following from the CLT for Bernoulli random variables and continuity. This proves (22). We now turn to (23). By (12), we can write ζ …, having used again the independence of the ζ …

Proof of Lemma 2. Let Z be an N(0, 1) random variable. By (22), using Bernstein's concentration inequality we have that …, which implies that P*(Θ …) … Thus, under the alternative there is zero probability of a Type II error. This proves the desired result.
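The mechanics of the randomisation argument in the proof of Theorem 2 (Bernoulli indicators plus a CLT) can be sketched numerically. This is a stylised illustration, not the exact statistic of the theorem: φ, u and R below are placeholders, and the decision rule is reduced to a single threshold u.

```python
import numpy as np

rng = np.random.default_rng(0)
R, u = 10_000, 1.0

def randomised_stat(phi):
    # draw i.i.d. N(0,1) variates and turn the (unobservable) limit of phi into
    # coin flips: when phi diverges, P(xi * sqrt(phi) <= u) -> 1/2
    xi = rng.standard_normal(R)
    zeta = (xi * np.sqrt(phi) <= u).astype(float)
    # CLT for Bernoulli(1/2) variables: approximately N(0,1) when phi is large
    return (zeta.sum() - R / 2) / np.sqrt(R / 4)

z_null = randomised_stat(1e6)   # null-like regime: phi diverges, statistic is O_P(1)
z_alt = randomised_stat(1e-6)   # alternative-like regime: indicators degenerate, statistic explodes
print(z_null, z_alt)
```

The contrast between the two regimes is what delivers a test with no Type II error in the limit, as in Lemma 2.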
Proof of Theorem 3. The proof is exactly the same as the proof of Theorem 2.
Proof of Lemma 3. The proof is exactly the same as the proof of Theorem 3 in Trapani (2018).

A.3 Discussion of the main assumptions
In this section, we shed further light on Assumptions 2 and 3. We begin by spelling out some easier-to-verify sufficient conditions for the assumptions to hold (Section A.3.1). We then verify such sufficient conditions under various dependence assumptions which are typically employed in the literature (Section A.3.2). Finally, we present several examples of DGPs for which the assumptions are satisfied (Section A.3.3).
Recall that the vector of zero-mean, I(1) common factors f_t^* has dimension r_2 + d_2; henceforth, we use the short-hand notation d = r_2 + d_2.

A.3.1 Sufficient conditions
We present a set of sufficient conditions which imply Assumption 3 and are easier to verify. In all our subsequent arguments, we will check under which assumptions these sufficient conditions hold, and prove the validity of Assumption 3 by showing that they are satisfied. Let …, and let |Z|_p denote the L_p-norm of an n-dimensional vector Z, viz. …
A.3.2 Results under various dependence assumptions

A general result for mixingales Henceforth, for a generic process x_t, we let F_{t+m} … We consider the following assumptions.

Assumption 1. …, and g_t are all L_4-bounded; all sequences are zero mean, weakly stationary, and strong mixing with mixing numbers α_m = O(ρ^m), where 0 < ρ < 1.

Assumption 2. … and g_t are all L_4-bounded, zero mean, weakly stationary, and uniformly mixing, with mixing numbers …

Remarks
We note that the exponential rates of strong mixing in Assumption 1 could be replaced, as is typical in this literature (see the book by Davidson, 1994), with higher-order moment conditions. Assumption 3 is the same as in Corollary 2.2 in Corradi (1999), and it requires that the variance of the partial sums of the e_{i,t}'s diverges, albeit possibly at a very slow rate.
Results under Near Epoch Dependence Consider the following - possibly vector-valued - sequences {v_t^j}_{t=−∞}^{∞}, which form four mutually independent groups. We assume that the DGPs of e_t, u_{i,t}, f_t^{(3)} and g_t are given by … for all 1 ≤ i ≤ N, where the functions g_j(·) are measurable for all j ∈ {e, u_1, ..., u_N, g, 3}.
Assumption 5. We assume that {v_t^j}_{t=−∞}^{∞}, for j ∈ {e, u_1, ..., u_N, g, 3}, are L_{4+ε}-bounded (for some ε > 0), zero mean, stationary, and uniformly mixing of size −4/3 + ε for some ε > 0.

Assumption 6. We assume that, for p = 4, … Assumption 6 entails that all the sequences are L_p-NED of size −1 on the relevant mixing basis on which they are defined.
Results for causal processes We consider the following DGPs …, where the shocks {ε_t}, {v_{i,t}} (for 1 ≤ i ≤ N), {ε_t^g} and {ε_t^3} are mutually independent groups of i.i.d. variables, and the functions f_j(·) are all measurable.
We define the functional measures of dependence (Wu, 2005) as …, where ε_0' is a copy of ε_0 such that ε_0' =_D ε_0 and ε_0' is independent of {ε_t}, and v_{i,0}' is defined similarly. We also define δ_{t,2}^g and δ_{t,2}^3 in the same way.
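The coupling construction behind the functional dependence measure can be checked by simulation. The sketch below uses an AR(1) as a stand-in causal process (an illustrative choice, not one of the paper's DGPs): since X_t − X_t' = ρ^t (ε_0 − ε_0'), the measure δ_{t,2} equals ρ^t √2 for standard normal shocks.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, t, burn, nrep = 0.6, 5, 200, 50_000

# innovations: columns 0..burn-1 are "the past", column `burn` is eps_0, the rest up to time t
eps = rng.standard_normal((nrep, burn + t + 1))
eps_prime = eps.copy()
eps_prime[:, burn] = rng.standard_normal(nrep)   # swap eps_0 for an independent copy

def last_value(e):
    # run the AR(1) recursion x_t = rho * x_{t-1} + eps_t and return the final value
    x = np.zeros(e.shape[0])
    for j in range(e.shape[1]):
        x = rho * x + e[:, j]
    return x

delta_emp = np.sqrt(np.mean((last_value(eps) - last_value(eps_prime)) ** 2))
delta_theory = rho ** t * np.sqrt(2.0)   # X_t - X_t' = rho^t (eps_0 - eps_0'), |eps_0 - eps_0'|_2 = sqrt(2)
print(delta_emp, delta_theory)
```

The geometric decay of δ_{t,2} in t is exactly the property exploited for the GARCH and RCA examples below.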

Remarks
We refer to Section A.3.3 for several examples of DGPs that satisfy these assumptions.

A.3.3 Results for various DGPs
In this section, we build on the results in Section A.3.2 to study various DGPs for which Corollary 5 can be applied.

Transformations of linear processes
The set-up in Section A.3.2 lends itself (similarly to the NED set-up; see Chapter 17 in Davidson, 1994) to studying nonlinear transformations of causal processes. Consider in particular …, where e_t, u_{i,t}, g_t and f_t^{(3)} are the linear processes defined in (A33)-(A36), and the same holds for h_{u,i}(·), h_g(·) and h_3(·).

Nonlinear autoregression models Consider the nonlinear autoregressions
with the DGPs for f_t^{(3)} and g_t defined similarly. The functions f_e(·) and f_{u_i}(·) are assumed to be contracting maps, i.e. ‖f_{u_i}(x) − f_{u_i}(y)‖ ≤ c_0 ‖x − y‖ with 0 ≤ c_0 < 1. A possible example is the Threshold AutoRegression model, defined as …, where max{|ρ_1|, |ρ_2|} < 1.
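Both the contraction property and the resulting stability can be checked numerically for the threshold autoregression; the slopes and the error law below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rho1, rho2 = -0.4, 0.7          # max(|rho1|, |rho2|) < 1
c0 = max(abs(rho1), abs(rho2))

def f(x):
    # threshold autoregression map: a different slope on each side of zero
    return np.where(x <= 0.0, rho1 * x, rho2 * x)

# numerical check of the contraction |f(x) - f(y)| <= c0 |x - y|
x, y = rng.standard_normal(10_000) * 5, rng.standard_normal(10_000) * 5
assert np.all(np.abs(f(x) - f(y)) <= c0 * np.abs(x - y) + 1e-12)

# simulate x_t = f(x_{t-1}) + eps_t; the contraction keeps the path stable
path = np.zeros(5_000)
for t in range(1, path.size):
    path[t] = f(path[t - 1]) + rng.standard_normal()
print(path[-5:])
```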

Random coefficient autoregressive models We consider …, with the DGPs for f_t^{(3)} and g_t defined similarly.
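A minimal RCA simulation, with illustrative parameter values chosen so that E(φ + b_t)² < 1: the stationary variance then solves v = E(φ + b)² v + 1, which the sample variance should match.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sb = 0.5, 0.1              # x_t = (phi + b_t) x_{t-1} + eps_t, with b_t ~ N(0, sb^2)
T, burn = 100_000, 1_000

x = np.zeros(T + burn)
for t in range(1, T + burn):
    x[t] = (phi + sb * rng.standard_normal()) * x[t - 1] + rng.standard_normal()
x = x[burn:]

# stationary variance: v = E(phi + b)^2 v + 1, i.e. v = 1 / (1 - phi^2 - sb^2)
v_theory = 1.0 / (1.0 - phi ** 2 - sb ** 2)
print(x.var(), v_theory)
```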
GARCH-type models These models lend themselves to being written as causal processes similar to the ones discussed in Section A.3.2. Recall that a d-dimensional causal process is defined as …, where ε_0' is an independent copy of ε_0. Recalling the functional dependence measure for X_t, viz. δ_{t,p}^X = |X_t − X_t'|_p, the typical result for GARCH-type DGPs is that δ_{t,p}^X declines exponentially in t, similarly to the RCA case considered in Section A.3.3, viz. δ_{t,p}^X = O(ρ^t) for some 0 < ρ < 1. Upon inspecting the proof of Corollary 5, this automatically yields all the results required for Lemma A7.
We report some examples of GARCH-type models which could be considered for the innovations e t , u i,t , f (3) t and g t . The key difference is between univariate and multivariate GARCH-type models, since in the latter case there are fewer specifications usually considered in the literature.
We now provide examples of multivariate GARCH models which have an exponential rate of decay for the functional dependence measure coefficients. While we refer only to e_t in the following, the same results apply to g_t and f_t^{(3)}. We consider the following specifications (see also Aue, Hörmann, Horváth, and Reimherr, 2009), where ∘ denotes the Hadamard product:
1. the CCC-GARCH (Bollerslev, 1990), …, where the vector ω is coordinate-wise strictly positive and the vectors {α_l}_{l=1}^p and {β_l}_{l=1}^q are coordinate-wise nonnegative;
2. the CCC-GARCH variant of Jeantheau (1998), …, where {A_l}_{l=1}^p and {B_l}_{l=1}^q are nonnegative definite matrices;
3. the multivariate exponential GARCH of Kawakatsu (2006), …, where … is a measurable function and C is a symmetric d × d matrix; we require that … for some t > √8q.
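As an illustration of the first specification, a bivariate CCC-GARCH(1,1) can be simulated as follows. All parameter values are illustrative; they satisfy the usual condition α + β < 1 coordinate-wise, so each marginal unconditional variance is ω_i/(1 − α_i − β_i).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20_000
omega = np.array([0.1, 0.2])
alpha = np.array([0.1, 0.05])
beta = np.array([0.8, 0.85])              # alpha + beta < 1 coordinate-wise
Rcorr = np.array([[1.0, 0.3], [0.3, 1.0]])  # constant conditional correlation
L = np.linalg.cholesky(Rcorr)

h = omega / (1.0 - alpha - beta)          # start at the unconditional variance
e = np.zeros((T, 2))
for t in range(T):
    z = L @ rng.standard_normal(2)        # correlated innovations with corr(z) = Rcorr
    e[t] = np.sqrt(h) * z                 # e_t = h_t^{1/2} o z_t (Hadamard product)
    h = omega + alpha * e[t] ** 2 + beta * h   # conditional variance recursion
print(e.var(axis=0))                      # close to omega / (1 - alpha - beta) = (1.0, 2.0)
```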

A.3.4 Proofs
Proof of Lemma A7. Consider (A25), and note that …, having used the Cauchy-Schwarz inequality in the fourth passage, (A23) in the fifth passage, and (A20) in the last one. Consider (A27), and note that …; from here on, the proof is the same as for (A25), and it is therefore omitted. Similarly, the proof of (A26) follows exactly the same passages and is therefore omitted. Turning to (A28), we have …, having used the Cauchy-Schwarz inequality in the fifth passage and (A23) in the seventh passage. We now consider (A29)-(A31), and provide a full proof only for the first result (the other two follow from the same passages); note that, for all 1 … We finally consider (A32): …, having used the Cauchy-Schwarz inequality in the fifth and sixth passages, and (A24) in the eighth passage.
Proof of Corollary 1. It is immediate, by direct calculation, to verify (A20)-(A22). Similarly, (A23) also follows by direct calculation. Finally, note that … from the Burkholder inequality. Using Hölder's inequality, …, recalling that e_j is i.i.d. with finite fourth moment. Thus, Lemma A7 follows, and therefore Assumption 3 is proven. As far as Assumption 2(iv) is concerned, it follows from Theorem 1 in Berkes and Philipp (1979).
Proof of Corollary 2. We begin by showing that (A20) holds; as before, the proofs of (A21) and (A22) follow from the same arguments, and we therefore omit them to save space. Let c_{u_i}(L) = Σ_{j=0}^∞ c_{u_i,j} L^j; using the Beveridge-Nelson decomposition and equation (28) in Phillips and Solo (1992), we can write … Indeed, using the same logic as in the proof of Lemma 3.6 in Phillips and Solo (1992), …, having used (A38) repeatedly, and the fact that it implies … Putting all together, the desired result follows. The proof of (A22) and (A21) follows from the same logic. We …, where λ_i(A) denotes the i-th smallest eigenvalue of a matrix A (see Bushell and Trustrum, 1990), recalling that (c_e(1))(c_e(1))' and Σ_ε are both positive definite. Further, by the proof of Theorem 3.2 in Phillips and Solo (1992), it follows that, under our assumptions, |ε_0|_4 < ∞, whence (A24) follows from (A71). Similarly, note that … and …, which yields the desired result. Finally, we prove that Assumption 2(iv) holds. Note that (A37) entails that the condition in Liu and Lin (2009) holds with e.g. p = 3. This entails that we can use Corollary 3.7 in Liu and Lin (2009), whence it follows that, on a richer probability space, there exists a standard Wiener process such that …, which ensures that Assumption 2(iv) holds.
Proof of Corollary 3. We show (A20); (A21) and (A22) follow from exactly the same logic. Recall first that u_{i,t} is weakly stationary and L_4-bounded. Then it holds that (see Davidson, 1994, p. 212) …, under Assumptions 1 and 2 respectively. This entails that the |E(u_{i,0} u_{i,m})| are summable across m under both Assumptions 1 and 2, whence (A20) follows immediately. We now turn to showing (A23)-(A24). Note that Assumptions 1 and 2 entail (see Davidson, 1994, p. 248) …, according as Assumption 1 or 2 holds. Recall that e_t is (weakly) stationary; also … Thus, using (A74) and (A75), it is easy to see that … Thus, by Theorem 1 in Peligrad, Utev, and Wu (2007), …, which yields (A23) and (A24) for p = 2 and p = 4 respectively. We now verify that Assumption 2(iv) holds. Write f_t^m = Σ_{j=m+1}^{m+t} e_j. Equations (A74) and (A75) entail that the ξ_{2,m}^e's are summable across m; hence, by the definition of a mixingale, … for all m and t. Thus, equation (1.3) in Eberlein (1986) is satisfied; this, and Assumption 3 (see Corollary 2.2 in Corradi, 1999), entail that all the assumptions in Theorem 2 in Eberlein (1986) are satisfied, which in turn entails that there exists a κ > 0 such that, on a richer probability space, there exists a standard Wiener process such that …, which ensures that Assumption 2(iv) holds.
Proof of Corollary 4. We start by showing (A20) (the proofs of (A21) and (A22) are, as usual, the same). Since u_{i,t} is an L_4-NED sequence of size −1, Theorem 17.7 in Davidson (1994) stipulates that Σ_{m=1}^T |E(u_{i,0} u_{i,m})| < ∞, which immediately yields (A20). We now turn to showing that (A23)-(A24) and Assumption 2(iv) hold. By Theorem 17.5 in Davidson (1994), our assumptions imply that e_t is an L_4-mixingale of size −1. In turn, this immediately entails that (A23)-(A24) hold, by Corollary 3. Further, from Corollary 3, it also holds that |E(f_t^m | F_{e,m})|_2 ≤ c_0 for all m and t. Also, by the measurability of g_e(·), e_t is a stationary sequence; thus, Corollary 2.2 in Corradi (1999) entails that all the assumptions in Theorem 2 in Eberlein (1986) hold, which yields Assumption 2(iv).
Proof of Corollary 5. We start with (A20). Note that, by the measurability of f_{u_i}, u_{i,t} is a stationary sequence. Thus … Note now that … Similarly, …, by stationarity. As shown in Wu (2005), |P_0 u_{i,l}|_2 ≤ δ_{l,2}^{u_i}, so that ultimately … Putting together (A76) and (A77), and recalling (A47), (A20) follows. The same applies to (A21) and (A22). We now turn to showing (A23) and (A24). Assumption 8 entails that, by Lemma A.2 in Liu and Lin (2009), … for all p ≤ 4, thus providing the desired result. Note that the result only requires Σ_{j=0}^∞ δ_{j,p}^e < ∞. Finally, consider Assumption 2(iv). All the assumptions of Theorem 2.2 in Liu and Lin (2009) are satisfied, and therefore it follows that, on a richer probability space, there exists a standard, real-valued, d-dimensional Wiener process such that … for any ε > 0, which ensures that Assumption 2(iv) holds.
Proof of Corollary 6. The proof hinges on the fact that e_t^*, u_{i,t}^*, g_t^* and f_t^{(3)*} can be represented as causal processes, i.e. …, where v_{i,0}' is an independent copy of v_{i,0}. Assumption (A38) therefore entails that Σ_{m=1}^T Σ_{t=m}^∞ δ_{t,2}^{u_i*} < ∞; thus, using the same passages as in the proof of Corollary 5, the desired result obtains. We now turn to (A23)-(A24). It holds that …, and by (A37) we have Σ_{t=0}^∞ δ_{t,p}^{e*} < ∞ for all p ≤ 4; hence, Lemma A.2 in Liu and Lin (2009) holds with e.g. p = 3. This yields (A73).
Proof of Corollary 7. Note that, under the assumptions on f_e(·), it is possible to write all sequences as causal processes, viz. …, for a measurable function g_e(·) such that, for all p ≤ 4, δ_{t,p}^e = O(ρ^t), with 0 < ρ < 1; the same holds for all the other sequences considered. The proof then follows the same arguments as the proof of Corollary 5.
Proof of Corollary 8. We begin with (A20), noting that, as usual, (A21) and (A22) follow exactly the same logic. Under the assumptions of the corollary, it is well known (see e.g. Aue, Horváth, and Steinebach, 2006) that (A52) converges exponentially fast, for all initial values u_{i,0}, to a unique stationary solution defined as … We estimate … (see Horváth and Trapani, 2019), and using the Cauchy-Schwarz inequality it also follows that IV = O(exp(−c_1 s − c_2 t)). Finally, standard algebra yields … Thus, putting everything together, … We now turn to (A23)-(A24) and Assumption 2(iv). Solving (A51) recursively, we have …; defining now e_t' as in (A79), but with ε_0 replaced by an independent copy ε_0', it follows that, for all p ≤ 4, …, where ρ = E(Φ_e + b_{e,0})^p < 1 by assumption. From here on, the proof follows exactly the same arguments as in the proof of Corollary 7.
Proof of Corollary 10. For all three models, it is possible (see e.g. Aue et al., 2009) to write y_t = g(ε_t, ε_{t−1}, ...); thus, we show that the functional dependence measure δ_{t,4} = |y_t − y_t'|_4 - where, as usual, y_t' = g(ε_t, ..., ε_0', ...) with ε_0' an independent copy of ε_0 - is such that δ_{t,4} = O(ρ^t) for some 0 < ρ < 1. Hence, Corollary 5 affords the desired result. We begin by noting that our assumptions entail that, for all three models (A63)-(A67), |e_t|_4 < ∞ (see Aue et al., 2009, where this result is shown). We begin with (A63); the use of the Hadamard product entails that the functional dependence measure can be computed coordinate-wise, i.e. for each 1 ≤ j ≤ d. Standard arguments (see also Aue et al., 2006) yield the unique stationary solution …, having repeatedly used Minkowski's inequality. Recalling the definition of n, this ultimately entails that δ_{t,4} = O(ρ^t) for some 0 < ρ < 1, which proves the desired result.
The arguments for (A65) are very similar, except that the model needs to be studied as a whole as opposed to coordinate-wise. Indeed, we have the stationary solution … Since γ_J < 1, putting all together, we obtain that |e_t − e_t'|_4 = O(ρ^t) for some 0 < ρ < 1. Finally, we consider (A67)-(A68); by Theorem 4.6 in Aue et al. (2009), (A69) and ‖A‖ < 1 entail that the unique, nonanticipative, stationary and ergodic solution of (A68) is … Thus, defining the coupling H_t' in the usual way with ε_0', we have … Letting mat(·) be the inverse of vech(·), we can write … We know by assumption that |ε_0|_4 < ∞. Lemma B.2 in Aue et al. (2009), Minkowski's inequality and (A70) entail … Also, by the same arguments as in the proof of Theorem 4.7 in Aue et al. (2009), it holds that … Note that these calculations assume t "large enough"; otherwise, we can set H_t' = 0. Putting all together, we obtain …, which entails that the functional dependence measure δ_{t,4} is, even in this case, O(ρ^t) with 0 < ρ < 1.

A.4 Testing for trend stationarity versus unit root
When r_1 is found to be equal to 1, the question arises as to whether the corresponding factor is a trend-stationary series, or an I(1) process with drift. Indeed, our methodology can only determine whether r_1 = 0 or 1, but it does not provide an answer to this question: the sum of the squares of the common factor, in both cases, diverges at a rate which is exactly O(T^3). If the common factor were observable, this issue could be tackled via a battery of "classical" tests (see e.g. Bierens, 1997). In our context, however, we can only estimate the space spanned by the common factors. This entails that the first estimated common factor is a weighted average of all the latent ones, and thus contains, by construction, an I(1) component even when the (unobservable) first common factor is a genuinely trend-stationary series.
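The O(T^3) claim can be checked directly: for both a trend-stationary series and a random walk with the same drift b, T^{-3} Σ_t f_t^2 converges to b^2/3, so the two DGPs are indistinguishable at this level of normalisation. The values of b and T below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, b = 20_000, 1.0
t = np.arange(1, T + 1)

f_ts = b * t + rng.standard_normal(T)             # trend-stationary: b*t + I(0) noise
f_rw = b * t + np.cumsum(rng.standard_normal(T))  # random walk with drift b

# both normalised sums of squares approach b^2 / 3
ratios = [np.sum(f ** 2) / T ** 3 for f in (f_ts, f_rw)]
print(ratios)
```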
In order to propose a solution to this issue, suppose that we have found that one factor has a trend (i.e. r_1 = 1), and define the vector f_t, where f_t^{(2)} and f_t^{(3)} are defined in Lemma 1 in the main paper. By construction, f_t is of dimension k = 1 + r_2 + r_3, where we recall that r_2 is the number of zero-mean I(1) common factors, and r_3 is the number of I(0) common factors. We can write …, where the vector μ is nonzero in its first element and zero elsewhere, and φ_t = f … By construction, φ_t has r_3 I(0) components. There are now two alternatives:
1. φ_t has only r_3 I(0) components: this means that the first component of f_t is trend-stationary;
2. φ_t has r_3 + 1 I(0) components: this means that the first component of f_t is a random walk with drift.
In order to find out which of the two alternatives holds, if f_t were observable we could determine the rank of the cointegrated system f_t (say R) from the hypothesis testing problem …, which is customarily carried out using the Likelihood Ratio tests developed in Johansen (1991). In our case, f_t is not observable; however, a possible approach would be to apply Johansen's procedure to the estimated factors.
Another possible approach, which is more in line with this paper (and can be viewed as related to variance ratio tests; see e.g. Cai and Shintani, 2006), is based on the following arguments. Note first that a rotation of μ can be estimated using …

Lemma A8. We assume that Assumptions 2 and 4 hold, and that Assumptions A-D in Maciejowska (2010) are satisfied. Then it holds that … for a nonsingular matrix H.
Let now ∆φ_t = ∆f_t − μ, and assume that ∆φ_t admits an MA(∞) representation, viz. ∆φ_t = C(L) u_t, where u_t has finite fourth moments and covariance matrix Σ_u.
By definition, the long-run variance matrix Σ = C(1) Σ_u C(1)' has reduced rank. We can propose the following estimator for a rotation of Σ: …, where … The following result, which is very similar to Theorem 1 in Ipatova and Trapani (2013), states the consistency of this estimator.
Lemma A9. Under the assumptions of this paper and of Maciejowska (2010), it holds that …

We are now ready to discuss a methodology to test for … Let λ_{r_3+1}(H'ΣH) denote the (r_3 + 1)-th largest eigenvalue of H'ΣH. On account of H being full rank, we have … Upon choosing B = min{T^{1/4}, N^{1/4}}, it holds that … From here on, it is possible to construct a randomised test based on c_{N,T}, along exactly the same lines as in the main paper.
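The idea behind the eigenvalue-based check can be sketched numerically with a Bartlett-kernel long-run covariance estimator. This is an illustrative implementation, not the paper's exact estimator: the bivariate input mixes a genuinely I(0) series (long-run variance ≈ 1) with an over-differenced one (long-run variance 0), so the smallest eigenvalue of the estimate should be close to zero.

```python
import numpy as np

def bartlett_lrv(z, B):
    """Bartlett-kernel long-run covariance of the columns of z (T x k)."""
    z = z - z.mean(axis=0)
    T = z.shape[0]
    sigma = z.T @ z / T
    for k in range(1, B + 1):
        gamma = z[k:].T @ z[:-k] / T          # k-th sample autocovariance matrix
        w = 1.0 - k / (B + 1.0)               # Bartlett weight
        sigma += w * (gamma + gamma.T)
    return sigma

rng = np.random.default_rng(1)
T = 4096
x = rng.standard_normal(T)                    # I(0): long-run variance close to 1
y = np.diff(rng.standard_normal(T + 1))       # over-differenced: long-run variance 0
z = np.column_stack([x, y])
B = int(min(T ** 0.25, 50))                   # bandwidth rule B = min{T^{1/4}, N^{1/4}}, pretending N is large
lam = np.linalg.eigvalsh(bartlett_lrv(z, B))  # eigenvalues in ascending order
print(lam)
```

The small-but-nonzero bottom eigenvalue (of order 1/B) illustrates why the bandwidth must diverge for the rank-deficiency to be detected asymptotically.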

A.4.1 Proofs
Proof of Lemma A8. It holds that …, where the last line follows from Proposition 3 in Maciejowska (2010). Putting all together, the desired result obtains.
Proof of Lemma A9. We start by showing that … for all 0 ≤ k ≤ B. It holds that … We have …, having used Proposition 3 in Maciejowska (2010). Also, …, after Lemma A8. Hence, E‖I‖ ≤ c_0 min{T^{−1/2}, N^{−1/2}}; the same result holds for E‖II‖. Finally, … Lemma A8 immediately entails that …, and, by construction, …; the same holds for E‖III_d‖. Finally, it is easy to see that … Putting all together, it follows that … The final result now obtains from the proof of Theorem 1 in Ipatova and Trapani (2013).

A.5 Weak factors: discussion
By Assumption 4, all the common factors are assumed to be strongly pervasive. This is a direct consequence of having ‖Λ‖^2 = O(N). It is however possible to imagine a situation in which some of the common factors are "weak", or "less pervasive": this can arise e.g. from having genuinely weak factors, or from having strong factors which impact only a small number of units - see, for example, Onatski (2012) and the references therein.
In this section, we report some heuristic arguments (similar to Trapani, 2018) on the ability of our procedure to detect weak factors. For the sake of a concise discussion, but with no loss of generality, we consider the case where all r factors are zero-mean I(1), and Λ'Λ is diagonal, with diagonal elements c_p(N) = N^{1−κ_p}. Allowing for κ_p ∈ (0, 1) corresponds to the case of weak factors, and the larger κ_p, the weaker the corresponding factor. Suppose that the researcher uses Σ_2 and its eigenvalues ν_2^{(p)} in order to determine r. Repeating exactly the same arguments as in the proof of Theorem 1, it can be shown that … Equation (A85) entails that, whenever p < p' ≤ r, … Recall that our procedure is essentially based on testing whether, as min(N, T) → ∞, … On the grounds of (35), the constraint in (A87) explains the extent up to which weak factors can be detected. When β ≤ 1/2, that is N/√T = O(1), then δ = 0, and we need κ_p < 1. This entails that, when N is much smaller than T, our procedure is able to detect even very weak factors. Conversely, when β > 1/2, it is required that κ_p < 1 − 1/(2β): as β (i.e. N) increases, the test is less and less able to detect weak factors. Note that, when N and T have the same order of magnitude (and thus β = 1), weak factors can be detected as long as κ_p < 1/2 - that is, when the eigenvalues associated with that factor diverge to infinity a bit faster than √N.
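The N^{1−κ_p} eigenvalue scaling can be verified numerically. The sketch below is illustrative and assumes (as one simple way to generate this scaling) loadings λ_i ~ N(0, N^{−κ}), so that the single non-zero eigenvalue of Λ'Λ is Σ_i λ_i^2 ≈ N^{1−κ}.

```python
import numpy as np

rng = np.random.default_rng(0)
kappa = 0.4

def weak_factor_eig(N):
    # weak loadings: lambda_i ~ N(0, N^{-kappa}), so sum_i lambda_i^2 ~ N^{1-kappa}
    lam = rng.standard_normal(N) * N ** (-kappa / 2)
    return np.sum(lam ** 2)     # the single non-zero eigenvalue of Lambda'Lambda (r = 1)

e1, e2 = weak_factor_eig(1_000), weak_factor_eig(64_000)
slope = np.log(e2 / e1) / np.log(64)    # estimated growth exponent in N
print(slope)                            # close to 1 - kappa = 0.6
```

With κ = 0.4 < 1/2, the eigenvalue still diverges faster than √N, which is the detectability boundary for β = 1 discussed above.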

B Additional numerical results
We report further Monte Carlo evidence under different specifications of the tests. In all cases, when estimating r_2, we use the true r_1; results using the estimated r_1 are similar and available upon request.
B.1 Bai Information Criterion for estimating r_2 when r_1 is known
As a complement to Tables 2 and 4 in the main paper, we report here the results obtained with the Information Criterion of Bai (2004), denoted as IC - this corresponds to IC_3 in the original paper; we note that the other criteria, IC_1 and IC_2, deliver a similar (or worse) performance and are therefore not reported. In each cell of the corresponding tables we report the average and standard deviation of the estimated r_2 over all Monte Carlo replications, as well as the fraction of times in which the estimate equals r_2; the configuration and DGP are the same as for the tables in the main text.
B.2 Cases ρ = 0 and ρ = 0.8
We report results obtained when, in (32), ρ_j ∼ U[ρ, 0.8], with ρ = 0 and ρ = 0.8. As an overall comment on the impact of ρ, results are in general unaffected, except in two cases. The first case is when r_1 = 0 and r_2 = 1 and we estimate r_1; in this case, we find that lower values of ρ improve the results. Conversely, higher values of ρ make the innovations of the zero-mean I(1) factors more persistent, thus making the associated eigenvalues larger: in this case, we are therefore more likely to falsely detect trends. The second case arises when r_2 = 2, and we estimate r_2. In this case, we find the exact opposite. This can be explained upon noting that, for lower values of ρ, the two I(1) factors become closer to two pure random walks which are highly collinear, thus making the second eigenvalue ν_2^{(2)} much smaller than the first one ν_2^{(1)}: in this case, we are therefore less likely to detect the second factor. For the same reason, higher values of ρ make the two factors less collinear, so that the second factor is detected more easily. In each cell of the corresponding tables we report the average and standard deviation of the estimates of r_1 and r_2 over all Monte Carlo replications, as well as the fraction of times in which the estimate equals the true value.

B.2.2 Case ρ
B.3 Sensitivity to F_1(u), F_2(u), G_1(u) and G_2(u)
We consider different, alternative choices of F_1(u), F_2(u), G_1(u) and G_2(u). In particular, as far as F_1(u) and F_2(u) are concerned, we use u = √a and u = −√a with equal weights, using a = 2 and a = 25. In the latter case, the theory predicts that the tests will have higher power (at the expense of size distortion), thus yielding understatements of r_1 and r_2 for small N and T. In each cell of the corresponding tables we report the average and standard deviation of the estimates of r_1 and r_2 over all Monte Carlo replications, as well as the fraction of times in which the estimate equals the true value.

B.4 Setting δ* = 10^{−1} in equation (35)
As discussed in the main paper, δ* should be very close to zero, and in the main paper we use δ* = 10^{−5}.

B.3.3 G(·) Student-t with 4 degrees of freedom
The theory predicts that a large δ* will compress diverging eigenvalues, thus leading (for finite N) to an understatement of r_1 and r_2. As can be seen, this is particularly evident for r_2. In each cell of the corresponding tables we report the average and standard deviation of the estimates over all Monte Carlo replications, as well as the fraction of times in which the estimate equals the true value.

B.5 Sensitivity analysis to the choice of R_1 and R_2
B.5.1 R_1 = N and R_2 = N
In each cell of the corresponding tables we report the average and standard deviation of the estimates of r_1 and r_2 over all Monte Carlo replications, as well as the fraction of times in which the estimate equals the true value.
B.5.2 R_1 = 3N and R_2 = 3N
In each cell of the corresponding tables we report the average and standard deviation of the estimates of r_1 and r_2 over all Monte Carlo replications, as well as the fraction of times in which the estimate equals the true value.