Testing for (In)Finite Moments

This paper proposes a test to verify whether the k-th moment of a random variable is finite. We use the fact that, under general assumptions, sample moments either converge to a finite number or diverge to infinity according as the corresponding population moment is finite or not. Building on this, we propose a test for the null that the k-th moment does not exist. Since, by construction, our test statistic diverges under the null and converges under the alternative, we propose a randomised testing procedure to discern between the two cases. We study the application of the test to raw data, and to regression residuals. Monte Carlo evidence shows that the test has the correct size and good power; the results are further illustrated through an application to financial data. JEL codes: C12. Keywords: finite moments, randomised tests, Chover-type Law of the Iterated Logarithm, Strong Law of Large Numbers. Cass Business School, Faculty of Finance, 106 Bunhill Row, London EC1Y 8TZ. Tel.: +44 (0) 207 0405260; email: L.Trapani@city.ac.uk.


Introduction
An assumption common to virtually all studies in statistics and econometrics is that the moments of a random variable are finite up to a certain order. Existence of population moments is naturally required when computing sample moments. Moment restrictions are also routinely assumed in the various statements of the Law of Large Numbers (LLN) and of the Central Limit Theorem (CLT), thus playing a crucial role in estimation and testing; we refer to Davidson (2002), inter alia, for a comprehensive treatment of asymptotic theory. In addition to statistics and econometric theory, several applications in economics and finance require the calculation (and, therefore, the finiteness) of moments. However, a well-known stylised fact, e.g. when using high frequency financial data, is that heavy tails are often encountered (see e.g. Phillips and Loretan, 1994, and a recent contribution by Linton and Xiao, 2013; see also the references therein). Hence the importance of verifying whether assumptions on the finiteness of moments are satisfied.
In order to formally illustrate the problem, let X be a random variable with distribution F(x), and consider the functional

μ_k^X(t) ≡ ∫_{−t}^{a} |x|^k dF(x) + ∫_{a}^{t} |x|^k dF(x),

where a ∈ (−t, t) is finite and the two integrals exist for any a. Then the raw absolute moment of order k is defined as

E|X|^k ≡ lim_{t→∞} μ_k^X(t). (1)

It is well known that, when the support of X is not bounded, the integral in (1) need not be finite, which entails that the k-th moment (and of course also moments of order higher than k) does not exist. Testing procedures to check for the existence of moments are available, although not always employed. A typical approach (see e.g., in the context of testing for covariance stationarity, Phillips and Loretan, 1991, 1994 and 1995) is based on estimating the so-called "tail index". This usually requires some assumptions on F(x): typically, it is assumed that the tails of F(x) can be approximated as L(x) x^{−α}, where L(x) is a slowly varying function. The parameter α is referred to as the "tail index", and it is related to the highest finite moment of X: formally, E|X|^k < ∞ for all k < α, and E|X|^k = ∞ for all k > α. Hence, one could use an estimate of α in order to test for the null hypothesis that α > k, which is tantamount to testing for H_0 : E|X|^k < ∞. A routinely employed technique is the Hill estimator (Hill, 1975), or some variants thereof; we refer to Embrechts, Kluppelberg and Mikosch (1997) and de Haan and Ferreira (2006) for excellent reviews which also consider several improvements of the original Hill estimator. In general, however, estimation of α is fraught with difficulties. Considering the Hill estimator as a leading example, it is well known that its rate of convergence may be relatively slow: indeed, this is a common feature of all tail index estimators. Moreover, the quality of the Hill estimator depends crucially on selecting the appropriate number of order statistics; see Section 3.2 for details, and in particular the discussion after equation (21).
If this is not chosen correctly, the Hill estimator can yield very poor inference; Resnick (1997) provides an insightful discussion of the main pitfalls of the Hill estimator, and also several possible variants to overcome such pitfalls.

Hypotheses of interest and the main result of this paper
In this paper, we propose a test for the null that the k-th raw moment μ_k ≡ E|X|^k of X does not exist; formally, we develop a test for

H_0 : lim_{t→∞} μ_k^X(t) = ∞ versus H_A : lim_{t→∞} μ_k^X(t) < ∞. (3)

We base our analysis on the divergent part of the Strong LLN (SLLN). Defining the k-th sample moment, based on the sample {x_i}_{i=1}^n, as

μ̂_k ≡ (1/n) Σ_{i=1}^n |x_i|^k, (4)

the SLLN entails that μ̂_k → μ_k a.s. when μ_k is finite, and μ̂_k → ∞ a.s. otherwise. (5)

Based on (5), we use μ̂_k to test for H_0 : lim_{t→∞} μ_k^X(t) = ∞ in (3).
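The dichotomy in (5) is easy to visualise numerically. The sketch below is illustrative only (the distributions and sample sizes are hypothetical choices, not from the paper): it contrasts the sample fourth moment of t_3 data, for which E|X|^4 = ∞, with that of standard normal data, for which E|X|^4 = 3.

```python
import numpy as np

def sample_abs_moment(x, k):
    """k-th raw absolute sample moment, mu_hat_k = (1/n) * sum |x_i|^k."""
    return float(np.mean(np.abs(x) ** k))

rng = np.random.default_rng(0)
heavy = rng.standard_t(df=3, size=200_000)   # E|X|^4 = infinity: sample moment diverges
light = rng.standard_normal(200_000)         # E|X|^4 = 3: sample moment converges

for n in (1_000, 10_000, 200_000):
    print(n, sample_abs_moment(heavy[:n], 4), sample_abs_moment(light[:n], 4))
```

As n grows, the heavy-tailed column stays large and erratic while the normal column settles near 3, which is precisely the dichotomy the test exploits.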
The literature has proposed several contributions that use (5), both for the purpose of estimating α and for conducting hypothesis testing. As far as the former is concerned, Meerschaert and Scheffler (1998; see also the related contribution by McElroy and Politis, 2007, and the references therein) exploit the generalised version of the CLT to propose a moment-based estimator of α. As far as the latter issue (hypothesis testing) is concerned, Fedotenkov (2013; see also the related papers by Fedotenkov, 2015a and 2015b) develops a bootstrap-based methodology whose main idea is closely related to the contribution of the present paper. In particular, Fedotenkov (2013) proposes comparing two statistics: the full-sample estimator of μ_k, and a subsample-based one. Under the null hypothesis that μ_k is finite, both statistics would converge to μ_k by virtue of (5). Conversely, under the alternative that μ_k is not finite, the two statistics diverge at different rates. Building on this, the test proposed by Fedotenkov (2013) is essentially based on comparing (by means of the bootstrap) the two statistics, checking whether their difference is bounded or diverges.
In the context of this paper, (5) is employed in order to test for the null hypothesis that μ_k does not exist. From a technical point of view, however, (5) is not used directly; rather, the main results in the paper hinge on a version of the Law of the Iterated Logarithm (LIL) for random variables that do not admit a finite first absolute moment, known in the literature as the "Chover-type LIL" (Chover, 1966). Thus, an ancillary contribution of this paper is the development of a Chover-type LIL for dependent data. From a methodological point of view, the results in this paper share, with the works cited above, the (desirable) feature of not having to determine an optimal number of order statistics to carry out inference, which is one of the main problems of the Hill estimator. However, note that, under the null hypothesis of an infinite k-th order moment, there is no randomness in (5): the statistic μ̂_k does not converge to any distribution (it diverges to positive infinity), and it cannot be used directly in order to conduct the test. Consequently, we employ a randomised testing procedure, which builds on a contribution by Pearson (1950).
From a conceptual point of view, such an approach is based on the idea that, when a statistic has no randomness under the null (e.g. because it diverges), or when it has a non-standard limiting distribution, randomness can be added by the researcher. Corradi and Swanson (2006) and Bandi and Corradi (2014) have recently employed randomised testing procedures. In particular, Bandi and Corradi (2014) propose a test to evaluate rates of divergence, which, albeit in a very different context, is essentially the same problem investigated in this paper. As far as conducting inference is concerned, we follow the approach used in Corradi and Swanson (2006), where randomisation is employed in conjunction with sample conditioning. This entails adding randomness to the basic statistic, and then deriving the asymptotics conditional on the sample, showing that the limiting distribution and consistency hold for all samples save for a set of zero measure. Such an approach is somewhat akin to bootstrap-based inference, which is also carried out conditional on the sample, although using the bootstrap in this context would be problematic, e.g. due to the difficulties in extending the theory to the case of data with infinite first moment (see Cornea-Madeira and Davidson, 2014). A key difference with bootstrap-based inference is the interpretation that the notion of test size has in this context. Indeed, it is well known that, in a classical hypothesis testing context, the level α of a test means that, if a researcher applies the test B times and the null is valid, then (s)he will reject the null with frequency α; that is, (s)he will be wrong αB times. Conversely, as illustrated by Corradi and Swanson (2006), in this context α is interpreted thus: out of J researchers who apply the test, αJ of them will reject the null when this is true.
Despite such an interpretational difference, as we show in Section 2, using this approach we overcome the issue of μ̂_k diverging under the null, and we obtain a test statistic which, for a given level α, rejects the null with probability α when true, and with probability one when false.
The remainder of the paper is organised as follows. In Section 2, we discuss the test, its theoretical properties (null distribution and consistency), and possible extensions to regression residuals (Section 2.1). Section 3 contains, in addition to a set of guidelines on how to use the test (Section 3.1), a Monte Carlo exercise (Section 3.2), and an application (Section 3.3). Section 4 concludes. Proofs are in the Appendix.
NOTATION We denote ordinary limits as "→"; convergence in distribution as "→_d"; convergence in probability and almost sure convergence as "→_p" and "→_{a.s.}" respectively. We use "a.s." as short-hand for "almost surely", "i.o." for "infinitely often", and "≡" for definitional equality. Finite constants that do not depend on the sample size are denoted as M, M′, ..., etc. Other relevant notation is introduced in the remainder of the paper.

The test
This section contains a description of how the test statistic is constructed, and its theoretical properties (reported in Theorems 1 and 2). In Section 2.1, we study the application of the test to regression residuals.
We start by reporting the testing procedure as a four-step algorithm.

Step 1 Compute the k-th sample absolute moment μ̂_k defined in (4) (or a scale-invariant transformation thereof), and the quantity exp(μ̂_k).

Step 2 Randomly generate an i.i.d. N(0, 1) sample of size r, say {ξ_j}_{j=1}^r, and define the sample {√(exp(μ̂_k)) ξ_j}_{j=1}^r.

Step 3 Generate the sequence {ζ_j,n(u)}_{j=1}^r as ζ_j,n(u) ≡ I[√(exp(μ̂_k)) ξ_j ≤ u] for all j, where u ≠ 0 is any real number and I[·] is the indicator function. The values of u can be selected from a density φ(u) on a bounded support U.

Step 4 For each u ∈ U \ {0}, define

θ_nr(u) ≡ (2/√r) Σ_{j=1}^r [ζ_j,n(u) − 1/2], (7)

and the test statistic

Θ_nr ≡ ∫_U θ²_nr(u) φ(u) du.

The following remarks contain comments on the specifications of the test, and a heuristic preview of how the test works; the choice of the artificial sample size, r, defined in Step 2, is discussed after Theorems 1 and 2, and in Section 3.1.
Note also that the statistic μ̂_k is bound to be sensitive to the unit of measurement; thus, a scale-invariant transformation thereof should be employed instead. For the sake of a concise discussion, we assume henceforth that μ̂_k is scale-free; in Section 3.1, we explore ways in which scale invariance can be obtained.
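A minimal sketch of Steps 1-4, assuming the raw sample moment μ̂_k in place of its scale-invariant transformation, r = n^{4/5}, and equal weights on U = {−1, 1}; the function name and these defaults are illustrative choices, not the paper's:

```python
import numpy as np

def finite_moment_test_stat(x, k, r=None, u_set=(-1.0, 1.0), seed=0):
    """Illustrative randomised statistic for H0: E|X|^k = infinity (Steps 1-4);
    the raw sample moment is used here in place of the paper's rescaled one."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = int(n ** 0.8)                    # artificial sample size, r = n^(4/5)
    mu_k = np.mean(np.abs(x) ** k)           # Step 1: k-th sample absolute moment
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(r)              # Step 2: i.i.d. N(0,1) draws
    scale = np.exp(0.5 * min(mu_k, 1400.0))  # sqrt(exp(mu_k)), capped to avoid overflow
    theta_sq = 0.0
    for u in u_set:                          # Step 3: Bernoulli indicators for each u
        zeta = (scale * xi <= u).astype(float)
        theta_u = 2.0 / np.sqrt(r) * np.sum(zeta - 0.5)   # Step 4: CLT-type statistic
        theta_sq += theta_u ** 2 / len(u_set)             # equal weight on each u
    return theta_sq                          # approx. chi-squared(1) under H0
```

Under the null of an infinite moment the statistic behaves like a χ²(1) draw, whereas under the alternative it diverges, so the null is rejected when the statistic exceeds a χ²(1) critical value.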

A heuristic description of the main idea of Step 3 is the following. Under the null hypothesis H_0 that E|X|^k does not exist, as n → ∞, √(exp(μ̂_k)) ξ_j should follow a normal distribution with mean zero and infinite variance. This entails that, under H_0 with n → ∞, for any real number u, the random variable ζ_j,n(u) has a Bernoulli distribution with P[ζ_j,n(u) = 1] = 1/2. Therefore, under H_0 as n → ∞, ζ_j,n(u) has mean 1/2 and variance 1/4. Conversely, under the alternative that E|X|^k < ∞, μ̂_k converges to a finite value. Hence, √(exp(μ̂_k)) ξ_j should follow a normal distribution with mean zero and finite variance, so that, for any u ≠ 0, E[ζ_j,n(u)] ≠ 1/2.

Step 4 follows directly from Step 3, and it is an application of the CLT. It can be expected that, under the null with n → ∞ and r → ∞, θ_nr(u) →_d N(0, 1) for every choice of u. Conversely, under the alternative that E|X|^k < ∞, the ζ_j,n(u)s do not have mean 1/2 and therefore a CLT does not apply to (7). Given that in (7) there is a sum involving a sequence with non-zero mean, it can be expected that θ_nr(u) diverges at a rate √r. This ensures the consistency of the test under H_A.

In Step 3, we consider the possibility that several values of u could be tried. The definition of Θ_nr in Step 4 is based on combining a continuous set of values of θ_nr(u), attaching a different weight to each u according to some density φ(u). Monte Carlo evidence (Section 3.2) shows that choosing U = {−1, 1} with equal probability works well under any scenario. From a theoretical point of view, it can be expected that the width of U is positively related to both power and size: as it increases, the power versus the alternative that E|X|^k < ∞ will increase, but there will also be some size distortion under the null.
We now lay out the main assumptions on dependence and tail behaviour. Prior to that, recall the definition of uniform mixing (see e.g. Davidson, 2002, p. 209). Let (Ω, F, P) denote the probability space on which {x_i}_{i=1}^n is defined, let F_j^{j+k} denote the σ-field generated by (X_i : j ≤ i ≤ j + k), and consider the sets A ∈ F_{−∞}^k and B ∈ F_{k+m}^{∞}; finally, define

φ_m ≡ sup_k sup_{A,B: P(A)>0} |P(B|A) − P(B)|.

Then, {x_i}_{i=1}^n is said to be uniformly mixing if φ_m → 0 as m → ∞.
Consider the following assumption.
Assumption 1 (i) {x_i}_{i=1}^n is uniformly mixing with mixing numbers φ_m of size ε, for some ε > 0; (ii) as x → ∞, the tails of F(x) behave as L(x) x^{−α}, with L(x) ≥ 0 slowly varying at infinity in the Karamata sense and c_i(x) ≥ 0.

Part (i) of the assumption imposes some structure on the dependence of the data: having ε > 0 is a very mild requirement on the amount of memory allowed in the data. As mentioned in the Introduction, the main tool employed in the proofs, which is also one of the contributions of this paper, is a Chover-type LIL for uniformly mixing sequences. In addition to accommodating dependence (with a quite flexible amount of memory, since only ε > 0 is required), uniform mixing affords great analytical tractability. Other forms of dependence could also be considered, as long as Chover's LIL holds; see e.g. the results in Trapani (2014) for the case of strongly mixing data.
Part (ii) of the assumption contains the null hypothesis, represented by having α ≤ k.
The exact speci…cation of the slowly varying function L (x) is not required, and thus, basically, part (ii) is needed only to ensure that some moments of X exist.
Defining P* as the probability law of {ζ_j,n(u)}_{j=1}^r conditional on the sample, let "→_{d*}" denote convergence in distribution according to P*.
The limiting distribution of Θ_nr under the null is given in the following theorem.

Theorem 1 Let Assumption 1 hold, and let n → ∞ and r → ∞ with r satisfying (11). Then, under H_0, Θ_nr →_{d*} χ²(1) for almost all samples.

Theorem 1 stipulates that Θ_nr has, under the null, a chi-squared distribution with one degree of freedom, as can be expected from the discussion above. Convergence to the null limiting distribution requires both n → ∞ and r → ∞. As far as the latter is concerned, r needs to pass to infinity subject to (11), in order to ensure that a CLT holds. Since the test statistic is based on an i.i.d. sequence of uniformly distributed random variables, it can be expected that convergence should be quite fast, and therefore in practice r need not be too large.
We now consider the consistency of the test versus the alternative that the k-th moment exists.
Theorem 2 Let Assumption 1 hold with α = k + ε for some ε > 0 in part (ii). Define c_α such that P[χ²(1) > c_α] = α. Then, under H_A, P*[Θ_nr > c_α] → 1 for almost all samples. (12)

Theorem 2 states that tests based on Θ_nr have non-trivial power versus the alternative that E|X|^k exists. In the proof, we show that, under H_A, θ_nr(u) has a non-centrality parameter proportional to √(r/exp(μ_k)), whence (12).

Application to regression residuals
We now turn to discussing the application of the test to regression residuals. Indeed, this is a natural application, since the existence of moments up to a certain order is routinely assumed for the error term in the regression model

y_i = β x_i + ε_i; (13)

for example, a typical assumption in (13) is that the ε_i s have finite second moment, i.e. E|ε_i|² < ∞. In order to verify the validity of this (and similar) assumptions, we show under which conditions the test developed above can be applied to the OLS residuals computed from (13).
Henceforth, we denote the distribution of ε_i as F_ε(x), and define the OLS residuals ε̂_i. Also, let k ≥ 1 and

μ_k^ε(t) ≡ ∫_{−t}^{a} |x|^k dF_ε(x) + ∫_{a}^{t} |x|^k dF_ε(x),

with both integrals existing for any finite a ∈ (−t, t); we propose to test for

H_0 : lim_{t→∞} μ_k^ε(t) = ∞ versus H_A : lim_{t→∞} μ_k^ε(t) < ∞

by using μ̂_k^ε ≡ n^{−1} Σ_{i=1}^n |ε̂_i|^k, or a scale-invariant transformation thereof. Using √(exp(μ̂_k^ε)) ξ_j, we define the corresponding test statistic as Θ_nr^ε.
Consider the following extension of Assumption 1.
Assumption 2 requires, in addition to conditions analogous to those in Assumption 1, that (iii) {x_i} and {ε_i} are two mutually independent groups, and (iv) x_i ≠ 0 a.s. for 1 ≤ i ≤ n.
Corollary 1 shows that the test can be extended to regression residuals, with the same null distribution and consistency properties as above. The only proviso is that the regressors have enough moments: in essence, the corollary requires that E|x_i|^k < ∞. As shown in the proof, this is needed in order for the test to have power: the regression residuals ε̂_i contain x_i, and therefore, even when E|ε_i|^k < ∞, if E|x_i|^k = ∞ this may entail that μ̂_k^ε diverges, thereby yielding zero power. From a methodological point of view, therefore, the test ought to be applied to residuals after checking whether E|x_i|^k = ∞ or not.
Note that the test is designed for k ≥ 1 only. When k < 1, the test may still work, but this depends on the maximum moments of both x_i and ε_i; the reason why the test may fail is that, under the alternative, the OLS estimator of β in (13) may be inconsistent, and even diverge to positive infinity; this is e.g. the case if E|x_i|^{2+ε} < ∞ for some ε > 0; see also Cline (1989).
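The corollary's setting can be sketched with made-up data: a regressor with all moments finite and t_3 errors whose fourth moment is infinite. The snippet computes the OLS residuals and the sample fourth moment to which the test would then be applied; the DGP and all numbers are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.standard_normal(n)            # regressor: all moments finite
eps = rng.standard_t(df=3, size=n)    # errors: fourth moment infinite
y = 0.5 * x + eps                     # hypothetical DGP for a model like (13)

beta_hat = np.sum(x * y) / np.sum(x * x)   # OLS slope estimate
resid = y - beta_hat * x                   # OLS residuals, inheriting the error tails
mu4_resid = np.mean(resid ** 4)            # sample fourth moment to be tested
```

Because the regressor has finite moments of all orders, the residuals inherit the tail behaviour of the errors, which is the condition under which the corollary guarantees power.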

Applying the test: guidelines, simulations and empirical applications
In this section, we study three inter-related issues concerning how to apply the test. Firstly, we discuss how to make the test statistic scale invariant (Section 3.1). Secondly, we evaluate the properties of our test through a simulation exercise, also proposing some guidelines as to how to choose some of the test specifications (Section 3.2). Finally, we illustrate the implementation of the test and the interpretation of its results by means of an application to financial data (Section 3.3).

Scale invariance
As pointed out in Section 2, the raw absolute moment of order k, μ̂_k, cannot be employed directly as it is not scale invariant. Hence the need for a re-scaling of the statistic μ̂_k: although several alternatives can be proposed, as a general rule it is necessary to have a scaling factor that does not diverge (or diverges at a slower rate than μ̂_k) under the null, and is bounded under the alternative.
Several choices are possible; in particular, based on Lemma 1 in the Appendix, we propose a family of scaling factors based on lower-order sample moments μ̂_δ, where δ is chosen such that δ ∈ (0, k); thence, the re-scaled test statistic is denoted μ̂*_k. The test could then be carried out exactly as in Section 2, using μ̂*_k instead of μ̂_k. Although in principle any choice of δ ∈ (0, k) can be employed, "natural" choices are δ = 1 or δ = 2 (i.e., using the variance of the data); in the simulations, we employ δ = 2, which proves to be a good choice in terms of empirical rejection frequencies; only minor changes are noted, anyway, when setting δ = 1. Whilst μ̂*_k fulfils the purpose of making μ̂_k scale invariant, in the simulations we also found that, at least in finite samples, better results are achieved by further rescaling μ̂*_k by the value it would have if X were normally distributed. Letting μ_k^{(N)} be the k-th absolute moment of a standard normal, the test statistic is thus based on this further rescaled version, denoted μ̂**_k (equation (18)). The rationale for employing μ̂**_k can be illustrated as follows. Under the null, μ̂_k diverges, and the same holds for μ̂*_k; under the alternative, μ̂*_k is bounded by the SLLN, so that μ̂**_k is also bounded.
Finally, we return to the issue of selecting the size of the artificial sample, r. Equations (11) and (18) can be combined in order to choose r; after standard algebra, it can be verified that a rate condition linking r and n must hold (equation (19)). When k > δ, any choice such that r is a polynomial function of n will be appropriate. The case k = δ is of special interest: in such a case, (19) boils down to requiring r/n → 0. Thus, choosing r = o(n) always satisfies (19).
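One rescaling consistent with the discussion above can be sketched as follows; the exact forms of μ̂*_k and μ̂**_k in the paper may differ, so the formulas below (dividing μ̂_k by μ̂_δ^{k/δ}, then by the corresponding standard-normal ratio) are an assumption for illustration only.

```python
import numpy as np
from math import gamma, sqrt, pi

def abs_moment_normal(k):
    """E|Z|^k for Z ~ N(0,1): 2^(k/2) * Gamma((k+1)/2) / sqrt(pi)."""
    return 2.0 ** (k / 2.0) * gamma((k + 1.0) / 2.0) / sqrt(pi)

def rescaled_moment(x, k, delta=2.0):
    """Scale-invariant sample moment (illustrative formula): rescaling
    x -> c*x multiplies numerator and denominator by the same factor c^k."""
    x = np.asarray(x, dtype=float)
    mu_k = np.mean(np.abs(x) ** k)
    mu_d = np.mean(np.abs(x) ** delta)
    ratio = mu_k / mu_d ** (k / delta)
    # further divide by the value the ratio takes under a standard normal
    return ratio / (abs_moment_normal(k) / abs_moment_normal(delta) ** (k / delta))

def artificial_size(n, power=0.8):
    """Artificial sample size r as a polynomial in n, e.g. r = n^(4/5)."""
    return int(n ** power)
```

With δ = 2 and k = 4 the ratio is the sample kurtosis, and the normal rescaling makes the statistic equal to roughly one for Gaussian data, matching the finite-sample motivation given above.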
Monte Carlo evidence

The innovation ε_i is generated according to two schemes. In the first set of experiments, we generate data according to a Student t distribution with ν degrees of freedom (t_ν); such a distribution is often found to be good at capturing the features of financial data (see e.g. Hurst and Platen, 1997, and Usmen, 1996a, 1996b), and therefore the results from this experiment should provide a set of guidelines for the applied user; indeed, we analyse the impact, on the test, of several specifications under this distributional design.
We also consider a second set of experiments, where ε_i is generated as having a power law, as a robustness check to assess how the test responds to a different distribution; data are generated according to standard procedures, and we refer to e.g. Clauset, Shalizi and Newman (2009).
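Power-law innovations can be drawn by inverse transform sampling from a strict Pareto distribution, a standard procedure of the kind referenced above; the helper below is an illustrative sketch.

```python
import numpy as np

def pareto_sample(alpha, size, x_min=1.0, rng=None):
    """Inverse-transform draws from a strict Pareto: P(X > x) = (x_min/x)^alpha
    for x >= x_min.  Moments E|X|^k are finite iff k < alpha."""
    rng = rng or np.random.default_rng()
    u = 1.0 - rng.uniform(size=size)   # u in (0, 1], avoids division by zero
    return x_min * u ** (-1.0 / alpha)
```

Setting alpha below 4 generates data whose fourth moment is infinite (the null of the experiment below), while alpha above 4 generates data under the alternative.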
As far as the testing problem is concerned, without loss of generality, we consider testing for the existence of the fourth moment, viz.

H_0 : E|ε_i|^4 = ∞ versus H_A : E|ε_i|^4 < ∞.

Let ν denote the degrees of freedom of the Student t distribution data are drawn from in the first set of experiments, and α the tail index of the power law from which data are generated in the second set of experiments. We set ν, α ∈ {2, 3, 4, 5, 6}: the first three values are used to assess the size of the test, and the last two values are used in order to assess the power.
Based on Corollary 1, we apply the test to pre-whitened data, by fitting an AR(7) model to y_i and then applying the test to the residuals; unreported simulations show that applying the test directly to the raw data results in a massive oversizement when there is dependence; hence, a guideline is that the test ought to be applied to pre-whitened data. In order to make the test statistic scale invariant we use, as suggested in Section 3.1, the rescaled statistic μ̂**_k.

As far as the test specifications are concerned, on account of Theorems 1 and 2, it can be expected that the empirical rejection frequencies will be affected by the size of the artificial sample, r, and by the values of u employed (that is, by the set U). As regards the former, in the proof of Theorem 2 we show that, under the alternative, the test statistic has a noncentrality parameter proportional to √(r/exp(μ_k)): thus, a large r is bound to increase the power of the test. This is in a trade-off with (11), which indicates that a large value of r will yield size distortion; of course, this is valid for finite samples, since asymptotically the test will have the correct size as long as (11) is satisfied. Similarly, the width of U also has an impact on power and size. Indeed, as shown in the proof of Theorem 2, under the alternative the test has a noncentrality parameter which increases as the width of U increases: hence, using large values of u should boost the power, at the expense of size.
In order to analyse the impact of r and U on the size and power of the test, we run four different experiments for the leading case of Student t data. In the first, benchmark case, we set r = n^{4/5} and U = {−1, 1}, with each value drawn with equal probability of 1/2; this proved to be the best choice for all cases considered. We also consider the cases r = n^{1/2} with U = {−1, 1}, and r = n^{4/5} with U = {−2, 2}. In the former case, the test is expected to be less powerful, whereas in the latter higher power should be observed, in the presence of size distortion, at least for small samples. Finally, we also report the empirical rejection frequencies for the intermediate case r = n^{1/2} and U = {−2, 2}. Results should look similar as n → ∞, and in this respect having n = 100000 in the simulations should shed some light on the asymptotic performance of the test.
By way of comparison, we also report a set of experiments to determine the size and power of a test for H_0 : α ≤ 4 based on a direct estimate of the tail index α. Based on Hill (1975), we compute

α̂_Hill ≡ [ (1/h) Σ_{s=1}^{h} ln( ε̂^{(s)} / ε̂^{(h+1)} ) ]^{−1},

where ε̂^{(s)} denotes the s-th order statistic (in descending order) of the sample of the absolute values of the residuals from fitting an AR(7) model to the y_i s, i.e. from {|ε̂_i|}_{i=1}^n.

The estimator is applied to the residuals of the AR(7) model, rather than directly to the data: according to Embrechts, Kluppelberg and Mikosch (1997; see in particular Figure 5.5.4 on p. 270), this is an effective way of attenuating the impact of serial dependence in the data. As far as the threshold h is concerned, we use h = n^{3/4}/ln n. We found this to be the best choice for h; note that, in the second set of experiments, data are generated according to a strict Pareto model for the tails, and therefore in that case we can expect the Hill estimator to be unbiased. Alternatively, one could employ data-driven rules to select the optimal h; we refer to e.g. Drees and Kaufmann (1998). We point out that the test based on α̂_Hill is not meant to be the only possible alternative to our approach. Indeed, the performance of the Hill estimator can be quite poor, and various improvements have been suggested; see de Haan and Ferreira (2006). Rather, we would suggest interpreting the test based on α̂_Hill as a naive benchmark. It is however worth noting that the applied literature customarily uses this approach to verify whether the data have finite moments of a certain order or not (e.g. Phillips and Loretan, 1991, 1994 and 1995).
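The benchmark estimator above can be sketched as follows; the function applies the standard Hill (1975) formula to the largest order statistics of the absolute data, with the threshold h = n^{3/4}/ln n used in the simulations as the default.

```python
import numpy as np

def hill_estimator(x, h=None):
    """Hill (1975) tail index estimate from the h largest order statistics of |x|;
    by default h = n^(3/4) / ln(n), the threshold used in the simulations."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]   # descending order
    n = a.size
    if h is None:
        h = max(1, int(n ** 0.75 / np.log(n)))
    log_excess = np.log(a[:h] / a[h])   # log spacings above the (h+1)-th order stat
    return h / np.sum(log_excess)
```

On strict Pareto data the estimator is essentially unbiased, which is why the power-law design is the most favourable case for this benchmark.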
Tables 1a-1c contain empirical rejection frequencies for the cases of data having a Student t distribution and a power law distribution, respectively. The number of simulations has been set equal to 1000, so that, when evaluating the size of the test, the empirical rejection frequencies should lie in the interval [0.036, 0.064].
[Insert Tables 1a-1c somewhere here]

The effect of pre-whitening (in both cases: Student t and, as shown in Table 1c below, power law) is that the test is nearly unaffected by the presence of serial correlation: size and power almost do not change across the various combinations of the dependence and tail parameters.
As far as size is concerned, consider first the benchmark case where r = n^{4/5} and U = {−1, 1} (Table 1a). The test has the correct size in both non-boundary cases ν = 2 and ν = 3. As could be expected, the test exhibits higher empirical rejection frequencies when ν = 4; this, however, attenuates when n ≥ 500, with the test having the correct size again in all cases considered. Turning to the power (cases ν = 5 and ν = 6), this is higher than 50% for all cases considered when n ≥ 500, and anyway higher than the 50% threshold for n ≥ 250 in the non-boundary case ν = 6: the power increases monotonically with n and ν, both features being in line with what can be expected. As mentioned above, these specifications (for r and U) correspond to the best results under all scenarios considered: thus, a guideline from Table 1a and, in general, from this section, is to choose r quite close to n, and use U = {−1, 1}.
The other cases, displayed in Table 1b, yield broadly similar results for r = n^{1/2} and U = {−1, 1}, although the power is slightly lower and the size, when ν = 4, is never correct unless n ≥ 10000.
Turning to the case of data following a power law (Table 1c), there are few instances of oversizement when n is small (n = 100) and α is equal to 3, but such a tendency is relatively infrequent and it disappears for larger sample sizes. When α = 4, the empirical rejection frequencies are higher than in the Student t case, with oversizement attenuating when n ≥ 1000. Further, as far as power is concerned, the test is less powerful, for large samples, than with Student t data. This can be further considered in conjunction with the high rejection rates when α = 4.
Note, finally, that tests based on the Hill estimator have lower power, even for relatively large n: the power does increase above 50% for n ≥ 10000 when data follow a Student t distribution, whereas it tends to be lower when the data follow a power law.

Application
In this section we illustrate how the test works through an empirical application to financial data. We consider daily returns from 3rd January, 2008, until 30th September, 2013, which corresponds to a sample size of n = 1499. We consider two groups of stocks from the FTSE 100: the banking sector (5 stocks) and the financial services sector (4 stocks), for a total of 9 stocks. A list of the constituents is in Tables 2a-2b. We test for the existence of the first, second, third and fourth moments. In particular, letting y_{j,i} be the return on stock j at day i, we apply to each stock the test statistics of Section 2 with k = 1, 2, 3 and 4. Based on the results from the simulations, the test is applied to pre-whitened data, using U = {−1, 1}.

Results are reported in Tables 2a-2b:

[Insert Tables 2a-2b somewhere here]

Two stocks in the financial services sector appear to have a finite third moment (Aberdeen and Ashmore), although the null that the fourth moment is infinite is accepted. Indeed, in both cases there is not an overwhelming amount of evidence in favour of the null (e.g., in the case of Aberdeen, the null is accepted at the 5% level, but it would be rejected at 10%). Heuristically, this is in line with the descriptive statistics: both stocks have kurtosis around 9, which (were one to assume a Student t distribution for the data) would correspond to ν = 5 degrees of freedom, thus admitting finite fourth moments. The other stocks in the financial sector have the same behaviour as observed for the banking sector, namely infinite third moments. Again, these results are reinforced by the estimated values of the kurtosis, which are similar to the ones found in the banking sector.

Discussion and conclusions
This paper proposes a test for the null that the k-th moment of a random variable does not exist. The test uses the SLLN, which stipulates that sample moments diverge or converge according as their population counterparts are infinite or finite. Since, under the null, sample moments diverge to infinity and therefore have no randomness, we propose a randomised testing procedure. From a methodological point of view, this approach to testing for the finiteness of moments avoids having to estimate the tail index, which is known to be fraught with difficulties. Our simulations show that the test has the correct size and good power. It is important to point out that the test can be applied to verify the existence or not of any moment, including fractional moments, i.e. the case where k in E|X|^k is not an integer number. This is bound to prove useful e.g. when evaluating estimators or predictors which are of infinite variance, and whose properties, therefore, cannot be studied in terms of the customarily employed L²-norm or its derivatives (details, and a comprehensive literature review, can be found in Matsui and Pawlas, 2015).
A natural question that stems from this paper is whether it is possible to derive an estimate of α. This contribution is focused on providing a test for the null that E|X|^k does not exist; this can be of relevance e.g. when computing descriptive statistics, or when employing a theory that requires the existence of moments up to a certain order. It would be possible to use a sequential approach, based on testing for the existence of consecutive, (possibly) integer values of k, although in general this methodology would have to take into account the risk of a high procedure-wise rejection frequency (see in particular a very insightful paper by Fedotenkov, 2015b). On the other hand, the approach proposed in this paper complements the estimator of α suggested by Meerschaert and Scheffler (1998), showing that it is consistent, albeit at the slow rate O_p(1/ln n). Finally, a word of warning on the meaning of the hypothesis testing framework is in order. Indeed, testing whether a quantity is passing to infinity, when samples are naturally finite, is bound to be conceptually unclear. In order to understand the rationale of the test, note that the null hypothesis is tested for by evaluating the rate of divergence of a sample moment (or, rather, of a scale-invariant transformation thereof); this lends itself to being put into an asymptotic setup. However, despite such an asymptotic characterisation, the purpose of the analysis is to test for the magnitude of a sample moment, rather than for its actual behaviour at infinity. In this respect, the approach suggested in this paper is strongly related to the contribution of Bandi and Corradi (2014), who also propose a test for rates of divergence: as the authors put it, "evaluating magnitudes is essential to a variety of econometric problems".
Thus, the purpose of this paper is to propose a procedure that allows the researcher to decide whether the moments of a random variable are "small enough" for the underlying distribution to be assumed to admit such moments.
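A stylised sketch of the randomisation idea may help fix intuition. Suppose theta below stands in for a transformation of a sample moment that diverges when the population moment is infinite and settles at a finite value otherwise; the paper's exact transformation, scaling and choice of u differ, so everything here is an illustrative stand-in rather than the proposed procedure. Multiplying the square root of theta by i.i.d. N(0,1) draws and checking whether the products fall below a threshold u yields Bernoulli variables whose success probability tends to 1/2 as theta diverges; a standardised sum of these is then approximately N(0,1) under divergence, and drifts away at rate sqrt(r) otherwise.

```python
import math
import numpy as np

def randomised_test(theta, r=4000, u=1.0, rng=None):
    """Illustrative randomised statistic (not the paper's exact construction).

    theta: a number standing in for a statistic that diverges under H0
    (moment infinite) and converges to a finite limit under the alternative.
    Returns a statistic that is approximately N(0,1) when theta is huge.
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(r)           # artificial, added randomness
    zeta = (np.sqrt(theta) * xi <= u)     # Bernoulli(p) with p -> 1/2 as theta -> inf
    # Standardise: a Bernoulli(1/2) mean has standard deviation 0.5/sqrt(r).
    return (zeta.mean() - 0.5) * math.sqrt(r) / 0.5

# Huge theta (proxy for divergence under H0): the statistic behaves like N(0,1).
stat_null = randomised_test(1e12, rng=np.random.default_rng(1))
# Small finite theta (proxy for the alternative): a drift of order sqrt(r) appears.
stat_alt = randomised_test(0.01, rng=np.random.default_rng(2))
```

Because the artificial randomness would wash out any fixed critical value in repeated use, procedures of this kind are typically combined with an averaging or integration step over u, as the paper does.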

Acknowledgement
I am grateful to the Editor (Oliver Linton), one anonymous Associate Editor and two anonymous Referees for very constructive feedback which has greatly improved the generality of this paper. I also wish to thank Lajos Horvath, as well as the participants at the 5th International Conference on Computational and Financial Econometrics (CFE '11; London, December 17-19, 2011), the 12th OxMetrics User Conference (London, September 3-4, 2012), the New York Camp Econometrics IX (Syracuse University, April 4-6, 2014) and the 6th ICEEE conference (University of Salerno, January 21-23, 2015).
The usual disclaimer applies.

Appendix: Proofs and Derivations
Recall that $P^*$ is the probability law of $\{\xi_j\}_{j=1}^r$ conditional on the sample; henceforth, we let $E^*$ and $V^*$ denote the expected value and the variance calculated with respect to $P^*$.
We start with a preliminary Lemma, which contains a Law of the Iterated Logarithm (LIL). Proof. The result in the lemma is a LIL for $\alpha$-stable processes, and it is also known as a Chover-type LIL (Chover, 1966). Several results on Chover-type LILs are available in the literature (see, e.g., Cai, 2006; Wu and Jiang, 2010; Trapani, 2014); thus, when possible, passages in the proof are omitted to save space.
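For reference (a reminder of the classical benchmark under its standard i.i.d. assumptions, not part of the paper's lemma): the Chover LIL for partial sums $S_n$ of i.i.d. symmetric $\alpha$-stable variables, $0 < \alpha < 2$, reads

\[
\limsup_{n \to \infty} \left( \frac{|S_n|}{n^{1/\alpha}} \right)^{1/\ln \ln n} = e^{1/\alpha} \quad \text{a.s.}
\]

The Lemma adapts this template to the non-negative, uniformly mixing sequence $y_i = |x_i|^k$, for which the normalisation becomes $n^{k/\alpha}$ and the limit $e^{k/\alpha}$.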
Let $y_i = |x_i|^k$; on account of Assumption 1(i) and of Theorem 14.1 in Davidson (2002, p. 210), the non-negative sequence $y_i$ is also uniformly mixing, with mixing numbers of the same size. Define also $y_i^{(n)} = y_i I[|y_i| < a_n]$, and $a_n = [n L(n) f(n)]^{k/\alpha}$, with $f(n)$ a function such that $\sum_{n=1}^{\infty} [n f(n)]^{-1} < \infty$. We start by showing the upper half of the LIL, i.e.

\[
\limsup_{n \to \infty} \left( \frac{\sum_{i=1}^n y_i}{n^{k/\alpha}} \right)^{1/\ln \ln n} \le e^{(1+\delta)k/\alpha} \quad \text{a.s.},
\]
for every $\delta > 0$. This requires showing that
\[
\sum_{n=1}^{\infty} \frac{1}{n} P\left[ \max_{1 \le j \le n} |S_j| > \varepsilon a_n \right] < \infty,
\]
for some $\varepsilon > 0$, where $S_j = \sum_{i=1}^j y_i$ and $S_j^{(n)} = \sum_{i=1}^j y_i^{(n)}$. The passages are very similar, e.g., to those in the proof of Theorem 2.3 in Cai (2006). Indeed,
\[
P\left[ \max_{1 \le j \le n} |S_j| > \varepsilon a_n \right] \le P\left[ \max_{1 \le j \le n} y_j > a_n \right] + P\left[ \max_{1 \le j \le n} |S_j^{(n)}| > \varepsilon a_n - \max_{1 \le j \le n} |E S_j^{(n)}| \right].
\]
Under Assumption 1, as $n \to \infty$ we have $a_n^{-1} \max_{1 \le j \le n} |E S_j^{(n)}| \to 0$; the proof is in Cai (2006). Combining (25) and (26), it suffices to show that $\sum_{n=1}^{\infty} n^{-1} P[\max_{1 \le j \le n} |S_j^{(n)}| > \varepsilon' a_n] < \infty$, for some $0 < \varepsilon' < \varepsilon$ and $n$ large enough. Assumption 1(ii) entails that $\sum_{j=1}^n n^{-1} P[y_j > a_n]$ is summable over $n$. This completes the proof of (24). Equation (23) follows from (24) upon choosing $f_n = \ln^{1+\delta} n$; see, e.g., the proof of Corollary 2.4 in Cai (2006).
We now turn to the lower half of the LIL, i.e.
\[
\limsup_{n \to \infty} \left( \frac{\sum_{i=1}^n y_i}{n^{k/\alpha}} \right)^{1/\ln \ln n} \ge e^{(1-\delta)k/\alpha} \quad \text{a.s.},
\]
for any $\delta \in (0,1)$. This requires showing that if, for some sequence $b_n$ of positive numbers, $\sum_{n=1}^{\infty} P[y_n > b_n] = \infty$, then
\[
\limsup_{n \to \infty} b_n^{-1} \left| \sum_{i=1}^n y_i - d_n \right| = \infty \quad \text{a.s.},
\]
for any non-decreasing sequence $d_n$ of positive numbers. To show this, recall that the sequence $\{y_i\}$ is uniformly mixing of the same size as $x_i$. Hence, the second Borel-Cantelli Lemma holds; see Lemma 1.1.2 in Iosifescu and Theodorescu (1969). Thus, a 0/1 law can be shown, yielding $P[y_n > b_n \text{ i.o.}] = 0$ or $1$ according as $\sum_{n=1}^{\infty} P[y_n > b_n] < \infty$ or $= \infty$; see, e.g., Lemma 4(ii) in Wu and Jiang (2010). Hence, equation (28) can be shown by contradiction. Assuming that (28) does not hold when $\sum_{n=1}^{\infty} P[y_n > b_n] = \infty$, this entails stating that there is a $d_0 \in [0, +\infty)$ such that $\limsup_{n \to \infty} b_n^{-1} |\sum_{i=1}^n y_i - d_n| = d_0$ almost surely. Clearly $\limsup_{n \to \infty} b_n^{-1} |y_n - (d_n - d_{n-1})| \le 2 d_0 + \limsup_{n \to \infty} b_n^{-1} |d_n - d_{n-1}|$. However, since $b_n^{-1} y_n \overset{p}{\to} 0$, we also have $\limsup_{n \to \infty} b_n^{-1} |d_n - d_{n-1}| = 0$, so that $\limsup_{n \to \infty} b_n^{-1} y_n \le 2 d_0$ a.s.; hence, $P[y_n > M b_n \text{ i.o.}] = 0$ for some finite $M > 2 d_0$. By the Borel-Cantelli Lemma, this entails that $\sum_{n=1}^{\infty} P[y_n > M b_n] < \infty$.
But this contradicts the initial statement. Thus, (28) holds for any sequence $d_n$. Now, given a non-decreasing sequence of positive numbers $f_n$, by Assumption 1(ii) we have $\sum_{n=1}^{\infty} P[y_n > (n f_n)^{k/\alpha}] = \infty$, as long as $\sum_{n=1}^{\infty} (n f_n)^{-1} = \infty$. In such a case, it follows that $\limsup_{n \to \infty} (n f_n)^{-k/\alpha} \sum_{i=1}^n y_i = \infty$ a.s.; hence, equation (27) follows. In the decomposition in (29), the second equality comes from the fact that $\zeta_{j,n}(u)$ is generated independently across $j$, and the last equality comes from (30) and the passages thereafter. This holds uniformly in $u$ by the same passages as above. Thus, a CLT can be applied to $I$, so that, as $(n, r) \to \infty$, $I \overset{d}{\to} N(0,1)$. Putting everything together, as $(n, r) \to \infty$ with (11), $\vartheta_{nr}(u) \overset{d}{\to} N(0,1)$ uniformly in $u$. This entails that $\lim_{(n,r) \to \infty} \Theta_{nr}$ has the limiting behaviour stated in the theorem. Proof of Theorem 2. Under Assumption 1, with part (ii) modified so as to allow for $\alpha = k + \varepsilon$, it holds that $E[|Y| \ln |Y|] < \infty$. Since uniform mixing implies strong mixing, we can apply an SLLN for strong mixing sequences (e.g. Rio, 1995): thus, $\hat{\mu}_k$ converges a.s. Similarly to the proof of Theorem 1, define the set $\Omega_n^+$ such that under $H_A$ we have $P[\lim_{n \to \infty} \Omega_n^+] = 1$. All the passages below are reported conditional on $\omega \in \Omega_n^+$.
Consider (29). Term $I$ still satisfies a CLT by construction, so that, under $H_A$, $I \overset{d}{\to} N(0,1)$. As far as $II$ in (29) is concerned, by (30), considering the case of $u > 0$, the non-centrality parameter in $II$ increases with the width of $U$. These passages entail that $\vartheta_{nr}(u)$ has a non-centrality parameter proportional to $\sqrt{r}\, Q(\mu_k)$; therefore, $\Theta_{nr}$ has a non-centrality parameter that diverges under (12). In order to study the behaviour under the null, it suffices to show that, when $E|\varepsilon_1|^k = \infty$, Lemma 1 holds for $\hat{\mu}_k$. Let $\|b\|_k$ denote the $L_k$-norm of an $n$-dimensional vector $b$, i.e. $\|b\|_k = [\sum_{j=1}^n |b_j|^k]^{1/k}$; we can write $\hat{\mu}_k = n^{-1} \|\hat{\varepsilon}\|_k^k = n^{-1} \|M \varepsilon\|_k^k$. Based on Lemma 2.2 in Grcar (2003), there exists a finite and strictly positive constant, say $C_M$, such that $n^{-1} \|M \varepsilon\|_k^k \ge C_M\, n^{-1} \|\varepsilon\|_k^k$, so that $\hat{\mu}_k \ge C_M\, n^{-1} \|\varepsilon\|_k^k$. By applying Lemma 1, $n^{-1} \|\varepsilon\|_k^k = O_{a.s.}(n^{k/\alpha - 1} \ln^{k/\alpha} n)$ under $H_0$. The same arguments as in the proof of Theorem 1 yield the asymptotics under the null of $\Theta_{nr}$.
As far as the behaviour of $\Theta_{nr}$ under alternatives is concerned, it suffices to show that, under $H_A$, $\hat{\mu}_k$ converges to a finite limit as $n \to \infty$.
By applying the $C_r$-inequality, we have $n^{-1} \sum_{i=1}^n |\hat{\varepsilon}_i|^k \le 2^{k-1} (I + II)$. As far as $I$ is concerned, using Assumption 1*(i) under the alternative, the SLLN entails that $n^{-1} \|\varepsilon\|_k^k$ converges a.s. to a finite limit, so that $I = O_{a.s.}(1)$. Further, note that it is not important whether $n^{-1} \sum_{i=1}^n x_i^2$ converges or not in order to prove that $II$ is bounded: what matters is that it is bounded away from zero, and part (iii) of Assumption 1* rules out the degenerate case. By virtue of Assumption 1*(ii), the SLLN yields $n^{-1} \sum_{i=1}^n |x_i|^k = O_{a.s.}(1)$. Consider the remaining term; by virtue of the independence between $x_i$ and $\varepsilon_i$, the sequence $\{x_i \varepsilon_i\}_{i=1}^n$ is also $\alpha$-mixing (see Theorem 5.2 in Bradley, 2005). Under the alternative that $\alpha > k$, and on account of Assumption 1*(ii), the SLLN can be employed again, yielding $n^{-1} \sum_{i=1}^n |x_i \varepsilon_i| = O_{a.s.}(1)$; this is not necessarily the sharpest bound, but it suffices for our purposes. Hence, $II = O_{a.s.}(1)$; this yields that, under the alternative, $\hat{\mu}_k$ converges to a finite limit.