NONLINEARITIES IN THE RELATIONSHIP BETWEEN DEBT AND GROWTH: (NO) EVIDENCE FROM OVER TWO CENTURIES

I revisit the popular concern over a nonlinearity or threshold in the relationship between public debt and growth employing long time series data from up to 27 countries. My empirical approach recognizes that standard time series arguments for long-run equilibrium relations between integrated variables (cointegration) break down in nonlinear specifications such as those predominantly applied in the existing debt–growth literature. Adopting the novel cosummability approach, my analysis overcomes these difficulties to find no evidence for a systematic long-run relationship between debt and growth in the bivariate and economic theory-based multivariate specifications popular in this literature.


INTRODUCTION
The latest research [by Reinhart and Rogoff] suggests that once debt reaches more than about 90% of GDP the risks of a large negative impact on long term growth become highly significant.
George Osborne, Mais Lecture, February 24, 2010
The study [Reinhart and Rogoff (2010b)] found conclusive empirical evidence that total debt exceeding 90 percent of the economy has a significant negative effect on economic growth.
"The Path to Prosperity," House Committee on the Budget, April 5, 2011
Despite the rhetoric adopted by a number of governments and opposition parties over recent years, determining a causal link from public debt to long-run growth and the potential nonlinearity of this relationship are widely regarded as unresolved empirical issues [IMF (2012), Panizza and Presbitero (2014)]. As the above quotes indicate, the most influential research on the debt-growth nexus in recent years is unarguably the work by Reinhart and Rogoff (2010b), which has been adopted as justification for fiscal austerity measures by politicians on both sides of the Atlantic. Although recent revelations challenged the descriptive analysis carried out in their paper, Reinhart and Rogoff maintain that "the weight of the evidence to date, including this latest comment [by Herndon et al. (2014)], seems entirely consistent with our original interpretation of the data" (Wall Street Journal "Real Time Economics" blog, April 16, 2013), namely that "high debt/GDP levels (90% and above) are associated with notably lower growth outcomes" [Reinhart and Rogoff (2010b, p. 577), see also Rogoff (2013)]. Perhaps aware of the tension between the causal interpretation typically read into this type of statement and the descriptive nature of their analysis, some of their earlier work [Reinhart et al. (2012)] already pointed to a set of empirical studies that are argued to address concerns regarding both causality and the identification of a nonlinearity in the long-run debt-growth relationship [e.g., Kumar and Woo (2010), Balassone et al. (2011), Cecchetti et al. (2011), Checherita-Westphal and Rother (2012)] 1 in support of their findings.
This paper investigates the debt-growth nexus from a new angle and with a somewhat more modest aim, focusing on the persistence of the long time series data used in the original Reinhart and Rogoff (2010b) study. I adopt annual data for over two centuries (1800-2010) to investigate whether linear or various nonlinear specifications of the debt-growth nexus constitute "long-run equilibrium relations" in four Organisation for Economic Co-operation and Development (OECD) countries: the United States, Great Britain, Japan, and Sweden 2; additional work presented in an online appendix extends the analysis to 27 advanced and developing economies. The analysis employs the most popular specifications in this empirical literature, polynomial functions and piecewise linear (threshold) specifications, to model the hypothesized nonlinearity. The basic premise of my analysis is that if variable series are integrated (nonstationary), then the popular implementations of nonlinearity in the debt-growth literature (squared debt terms or debt terms interacted with threshold dummies) are invalid, since these transformations of the variable are not defined within the (co-)integration framework. Therefore, any empirical results building on these polynomial or threshold specifications may be spurious.
My empirical strategy addresses this problem by adopting novel methods for summability and cosummability testing [Berenguer-Rico and Gonzalo (2014a,b)]. These concepts provide a framework encompassing integration and cointegration that however extends to nonlinear relationships. The analysis in this study is thus (narrowly) focused on the question of potential nonlinearities in the long-run debt-growth relationship, bypassing any concerns over the direction of causation, which does not impact the statistical validity of the results. If there is no evidence for (nonlinear) long-run relations, then standard empirical specifications in the literature adopting thresholds or polynomial functions are misspecified and the causal interpretation assigned in these studies is questionable: The presence of a long-run equilibrium is a prerequisite for the existence of any long-run causal relationship in the data. Results have important policy implications given that the most vocal supporters of fiscal austerity have pointed to the above-cited studies as providing empirically sound evidence for (this type of) nonlinearities in the debt-growth relationship.
The primary empirical focus is to investigate data for debt and the gross domestic product (GDP) in Great Britain, Japan, Sweden, and the United States over the 1800-2010 time horizon. In additional analysis, I investigate subperiods of 60 years using rolling window analysis to allow for structural breaks in the debt-growth relationship and also to reduce the impact of global shocks such as World War II on the presence or absence of a long-run relationship. A host of further robustness checks is confined to an online appendix.
My core analysis finds no evidence for any long-run relationship between debt and growth in the linear or nonlinear specifications for the four countries investigated. Subsample analysis does not fundamentally challenge this finding although it provides an indication that there may have been long-run relationships between debt and growth at different points in time, but not in the post-WWII period typically studied in the existing literature. The general patterns revealed by the subsample analysis support the notion that the debt-growth relationship differs across countries and "with economic circumstances" (Larry Summers, Witness Statement to the US Senate Budget Committee, June 4, 2013). Additional empirical analysis goes to great lengths to determine whether the choice of countries, time periods, and/or atheoretical specifications drives this finding but arrives at a fairly consistent picture across all different modes of investigation. These findings help challenge the apparent consensus in parts of the empirical literature of both the existence and the common nature of a debt threshold across countries.
The remainder of the paper is organized as follows: The next section discusses the existing literature on debt and growth, with Section 3 providing the theoretical background for my econometric approach. Section 4 introduces the data and describes the debt-growth nexus in each of the four OECD countries that are at the core of my analysis. Results for these and a larger set of countries are presented and discussed in Section 5, before Section 6 concludes.

EXISTING LITERATURE
The existing empirical literature on the debt-growth nexus builds on somewhat ambiguous theoretical foundations [for a recent survey see Panizza and Presbitero (2014)]. Some theoretical models argue that higher stocks of public debt may create increased uncertainty or even fear of future financial repression among investors and thus lead to a negative long-run relationship [Elmendorf and Mankiw (1999), Teles and Mussolini (2014)] between debt and growth. Other work maintains that this negative relationship disappears once sticky wages and unemployment are taken into account in the modeling process [Greiner (2011)]. The nonlinearity or debt threshold hypothesized and investigated in most empirical work can be motivated for developing countries by pointing to the issue of debt overhang [Krugman (1988), Sachs (1989)], although it may be difficult to extend this argument to advanced economies such as those investigated in this paper. Nonlinearities may further arise if there is a tipping point of fiscal sustainability as is developed in Ghosh et al. (2013); however, I am not aware of any theoretical models incorporating such debt tipping points into a framework for economic growth over the long run.
As was suggested above, the work by Reinhart and Rogoff (2009, 2010a,b, 2011) is largely descriptive in nature, although this should not distract from the significant contribution these authors have made to the literature in the construction of long data series for empirical analysis. Regression analysis of the debt-growth nexus conducted using panel data typically shares the unease about misspecification and endogeneity with the wider cross-country growth literature [for a discussion of the latter see Durlauf et al. (2005), Eberhardt and Teal (2011)]. Empirical specifications in this literature are across the board partial adjustment models in the mould of Barro (1991) and Mankiw et al. (1992), regressing growth on a lagged level of per capita GDP and a measure for debt stock as well as typically a large number of control variables, in a pooled model specification, thus assuming away the possibility of parameter heterogeneity across countries. 3 The standard practice in the cross-country literature to average data over three- or five-year intervals in the panel is also adopted in all but the most recent papers [Checherita-Westphal and Rother (2012), Baum et al. (2013), Panizza and Presbitero (2014)]. Samples differ significantly across existing studies, with the work by Kumar and Woo (2010), Cecchetti et al. (2011), Checherita-Westphal and Rother (2012), Baum et al. (2013), and Panizza and Presbitero (2014) primarily focused on OECD and other high-income economies and thus most relevant to this study. Among these OECD country studies, the only one to adopt a polynomial specification is the paper by Checherita-Westphal and Rother (2012), although this practice is popular in the study of developing economies [e.g., Cordella et al. (2010), Calderon and Fuentes (2013), Presbitero (2012)].
With the exception of Cecchetti et al. (2011), who apply the within (fixed effects) estimator and thus cannot address concerns over reverse causality, all of the above empirical studies implement their panel analysis adopting the Blundell and Bond (1998) System GMM estimator originally developed for firm-level panel data analysis. 4 Despite different sample periods, country coverage, control variables, modeling of the nonlinearity, and choice of moment conditions for identification, these studies come to remarkably similar conclusions, namely that beyond a threshold at around 90% debt-to-GDP the relationship between debt and growth is negative and significant. However, as demonstrated by Panizza and Presbitero (2013), these studies either yield findings that are not robust to small changes in the sample, suggesting the results are driven by outliers, or fail to formally test the coefficients on the piecewise linear terms, which on closer inspection typically cannot support the notion of a statistically significant change in the debt coefficient above the threshold.
All of the above studies are focused on pooled panel data modeling, implicitly assuming that the long-run equilibrium relationship between debt and growth is the same for all countries in the sample. Existing research has found very different results when moving away from full sample analysis in homogeneous parameter regression models and toward subsample analysis along geographic, institutional, or income lines [IMF (2012), Kourtellos et al. (2013), Eberhardt and Presbitero (2015)]. There are a number of reasons to assume that the equilibrium relationship between debt and growth could differ across countries. Vulnerability to public debt depends not only on debt levels, but also on debt composition [IDB (2006)]. Unfortunately, existing data for the analysis of debt and development often represent a mix of information relating to general and central government debt, debt in different currency denominations and with different terms attached (be they explicit or implicit). All of this implies that comparability of the debt data across countries may be compromised [Panizza and Presbitero (2013)]. In addition, even assuming that debt stocks are comparable across countries and over time, the possible effect of public debt on GDP may depend on the reason why debt has been accumulated and on whether it has been consumed or invested (and in the latter case in which economic activities). Furthermore, different stocks of debt may impinge differently on economic growth: Debt can clearly hinder GDP growth when it becomes unsustainable, affecting interest rates and triggering a financial crisis, thus affecting the level of GDP. However, the capacity to tolerate high debts depends on a number of country-specific characteristics, related to past crises and the macro-and institutional framework [Reinhart et al. (2003), Kraay and Nehru (2006), Manasse and Roubini (2009)]. 
For these reasons, the focus of analysis in this paper is on the country-by-country investigation of the long-run relationship between debt and growth.
A recent study that empirically investigates the debt-growth nexus with a time series econometric approach is the paper by Balassone et al. (2011) on Italy (1861. By adopting unit root and cointegration testing prior to estimation, they establish a long-run relationship between per capita GDP, per capita capital stock, and debt-to-GDP ratio (all in logarithms). They then go on to estimate (among other models) a piecewise linear specification for the debt-to-GDP ratio where values beyond a threshold of 100% are found to create a significantly stronger negative effect on growth. It is precisely this form of interaction between a threshold dummy and the debt-to-GDP ratio that is not defined under (linear) cointegration and that necessitates the present analysis. 5 It should also be noted that cointegration does not imply causation from debt to growth.

Methodology
In this section, I highlight the difficulties arising for conventional time series analysis when assuming a nonlinear model in the presence of integrated variables and discuss a novel approach to tackle these issues.
Suppose a time series relationship $y_t = f(x_t, \theta) + u_t$ for a nonstationary regressor $x_t \sim I(1)$, stationary $u_t$, and some nonlinear function $f(\cdot)$. Assuming for illustration $f(x_t, \theta) = \theta x_t^2$ and $x_t = x_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$, then we know that
$$\Delta x_t = x_t - x_{t-1} = \varepsilon_t \quad\Rightarrow\quad V[\Delta x_t] = \sigma_\varepsilon^2. \qquad (1)$$
In other words, it can be shown that the Engle and Granger (1987) [henceforth EG] characterization of a stationary process holds for $\Delta x_t$ (finite variance is one of five EG characteristics). Now investigate the same property for $x_t^2$:
$$\Delta x_t^2 = x_t^2 - x_{t-1}^2 = 2 x_{t-1}\varepsilon_t + \varepsilon_t^2.$$
Here, the finite variance characteristic is clearly violated, given that the variance is a function of time: $x_{t-1}$ accumulates all past shocks, so the variance of the term $2 x_{t-1}\varepsilon_t$ grows with $t$. Since this problem cannot be solved by further differencing, it is not possible to determine the order of integration of $x_t^2$. This in turn creates fundamental problems if the empirical analysis of $y_t = \theta_1 x_t + \theta_2 x_t^2 + u_t$ is to be based on the arguments of cointegration. The difficulty arises from the requirement of the EG characterization to investigate the differences of a process, with the intrinsic linearity of the difference operator creating obvious problems for nonlinear processes.
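The contrast between the differenced random walk and the differenced squared random walk can be checked by simulation. The following is a minimal sketch, not part of the original analysis: it assumes Gaussian i.i.d. shocks and a starting value of zero, and the function name and default settings are hypothetical. Under these assumptions the variance of the first difference of the squared series equals $(4t - 2)\sigma_\varepsilon^4$, so it should rise roughly linearly in $t$, while the variance of the first difference of the series itself stays near $\sigma_\varepsilon^2$.

```python
import random
import statistics


def simulate_diff_variances(n_paths=2000, n_steps=200, sigma=1.0, seed=0):
    """Cross-path variance of the first differences of x_t and x_t^2 at each t.

    Assumptions (illustrative only): x_t is a pure random walk with
    Gaussian i.i.d. shocks and starting value x_0 = 0.
    """
    rng = random.Random(seed)
    dx = [[] for _ in range(n_steps)]    # dx[t] collects x_t - x_{t-1} across paths
    dx2 = [[] for _ in range(n_steps)]   # dx2[t] collects x_t^2 - x_{t-1}^2 across paths
    for _ in range(n_paths):
        x_prev = 0.0
        for t in range(n_steps):
            x = x_prev + rng.gauss(0.0, sigma)
            dx[t].append(x - x_prev)
            dx2[t].append(x * x - x_prev * x_prev)
            x_prev = x
    var_dx = [statistics.pvariance(values) for values in dx]
    var_dx2 = [statistics.pvariance(values) for values in dx2]
    return var_dx, var_dx2


var_dx, var_dx2 = simulate_diff_variances()
for period in (10, 100, 200):
    print(period, round(var_dx[period - 1], 2), round(var_dx2[period - 1], 1))
```

The first printed column of variances should hover around one in every period, whereas the second should keep increasing with the period index, which is the time dependence that prevents assigning an order of integration to the squared series.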
The following briefly introduces a novel set of methods for nonlinear processes that closely resemble the standard toolkit in linear time series analysis (tests for unit root behavior and cointegration). The motivation for these new methods is to create "a summary measure of the stochastic properties-such as persistence-of the time series without relying on linear structures" [Berenguer-Rico and Gonzalo (2014b)]. The implementation of these tests is straightforward, involving OLS regressions of transformed variable series, where transformations avoid the first differencing so central to the Dickey-Fuller-type unit root analysis and instead build on running sums. Like in the case of unit root analysis, the distributions of these test statistics are nonstandard, but estimates for subsamples can be used to create confidence intervals for inference.
Berenguer-Rico and Gonzalo (2014b) build on earlier work by Gonzalo and Pitarakis (2006) to develop a nonlinear alternative to linear integration, based on the "order of summability." 6 The empirical procedure to determine the order of summability analyzes the rate of convergence of a rescaled sum $Y_k^*$ of the variable of interest $y_t$. Using least squares, we can estimate for $k = 1, \ldots, T$:
$$Y_k^* = \beta^* \log k + U_k^*,$$
where $Y_k^* = Y_k - Y_1$ with $Y_k = \log\left(\sum_{t=1}^{k} y_t\right)^2$ and $U_k^*$ is the regression error, from which the estimate of the order of summability $\hat{\delta}^* = (\hat{\beta}^* - 1)/2$ is obtained. Inference can be established using confidence intervals constructed from subsample estimation [Politis et al. (1999)], whereby the above procedure is applied to subsamples of the series.
Summability is a more general concept than integration, but they are closely related: If a series $x_t$ is integrated of order $d$, $I(d)$ for $d \geq 0$, then it is also summable of order $d$, $S(d)$; however, not all $S(d)$ processes are also $I(d)$. Summability analysis thus provides important insights into the time series properties of a variable but in contrast to unit root analysis is not limited to linear processes. In the case of the debt-growth application I pursue here, this allows me to investigate the time series properties of squared and cubed debt-to-GDP ratios as well as piecewise linear debt-to-GDP series.
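The least-squares step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure used in the paper: it takes the rescaled sum to be the log of the squared partial sum measured relative to its first value, regresses it on $\log k$ without an intercept, and applies no correction for deterministic components. The function name is hypothetical.

```python
import math


def summability_order(series):
    """Least-squares estimate of the order of summability.

    Assumed form (illustrative): Y_k = log((sum_{t<=k} y_t)^2),
    Y*_k = Y_k - Y_1, regression through the origin of Y*_k on log k,
    and delta = (beta - 1) / 2. No deterministic terms are removed.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    partial_sum = 0.0
    log_sq_sums = []
    for value in series:
        partial_sum += value
        if partial_sum == 0.0:
            raise ValueError("partial sum of zero: log is undefined")
        log_sq_sums.append(math.log(partial_sum * partial_sum))
    numerator = 0.0
    denominator = 0.0
    for k, y_k in enumerate(log_sq_sums, start=1):
        y_star = y_k - log_sq_sums[0]
        numerator += y_star * math.log(k)
        denominator += math.log(k) ** 2
    beta = numerator / denominator
    return (beta - 1.0) / 2.0


print(summability_order([1.0] * 50))
```

For the constant series in the usage line the partial sums equal $k$, so the regression slope is 2 and the estimate is approximately 0.5. This illustrates why deterministic components matter for the estimate and would need to be handled before the statistic is interpreted.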
In a second step, Berenguer-Rico and Gonzalo (2014a) offer a test to investigate the "balance" of the empirical relationship, namely the condition that the two sides of the empirical equation have the same order of summability. Again there is a close analogy with the linear unit root and cointegration case: Before cointegration between two or more variables can be tested, it is necessary to establish that these variables possess the same order of integration. Regressing stationary on nonstationary variables, as would be the case if we regressed the per capita GDP growth rate on the debt-to-GDP ratio in levels, is referred to as an inconsistent regression that leads to invalid inference. However, in the present study, I do not test for balance due to an unresolved problem with the testing procedure that invalidates the results. 9 It should be noted that the main arguments put forth in this paper are based on the cosummability tests, which do not suffer the same problem.
Finally, the concept of cosummability is tested by investigating the error terms of a candidate specification. In empirical practice, let $\hat{e}_t$ be the least squares residuals from a balanced regression $y_t = \hat{\theta} g(x_t) + \hat{e}_t$; "strong cosummability" will then imply that the order of summability of $\hat{e}_t$ is statistically close to zero, S(0) [Berenguer-Rico and Gonzalo (2014a)]. Note the analogy to a linear cointegrating relationship, where the residuals from a linear regression between I(1) variables will be I(0). The order of summability for $\hat{e}_t$ can be estimated to determine whether a candidate model is cosummable. 10 Inference follows the subsampling approach as in the previous testing procedures, and under the null of cosummability, the confidence interval includes zero.
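The residual-based logic can be sketched as follows. This is an illustrative outline only, with hypothetical function names and a made-up toy data set: it fits a single-regressor model without intercept by least squares, and then applies an assumed log-partial-sum regression to the residuals to obtain the point estimate. It omits the subsampling confidence interval on which inference would rest.

```python
import math


def summability_order(series):
    """Assumed estimator: regress log squared partial sums (relative to k=1) on log k."""
    partial_sum, log_sq_sums = 0.0, []
    for value in series:
        partial_sum += value
        log_sq_sums.append(math.log(partial_sum * partial_sum))  # fails if a partial sum is 0
    numerator = sum((y_k - log_sq_sums[0]) * math.log(k)
                    for k, y_k in enumerate(log_sq_sums, start=1))
    denominator = sum(math.log(k) ** 2 for k in range(1, len(series) + 1))
    return (numerator / denominator - 1.0) / 2.0


def cosummability_statistic(y, g_x):
    """Fit y_t = theta * g(x_t) + e_t by least squares, then estimate the order of e_t."""
    theta = sum(g * v for g, v in zip(g_x, y)) / sum(g * g for g in g_x)
    residuals = [v - theta * g for g, v in zip(g_x, y)]
    return theta, residuals, summability_order(residuals)


# Toy numbers chosen only so that the arithmetic is easy to follow.
theta_hat, resid, delta_hat = cosummability_statistic(
    [3.0, 5.0, 5.0, 8.0], [1.0, 2.0, 3.0, 4.0]
)
print(theta_hat, resid, delta_hat)
```

With real data the point estimate alone is not informative; the decision rests on whether a subsampling confidence interval around it includes zero.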

Specifications
I adopt two specifications for nonlinearity in the debt-to-GDP ratio in line with standard approaches in the literature: First, in addition to a standard linear model (Model 1), I use polynomial specifications including linear and squared (Model 2) or linear, squared, and cubed (Model 3) debt-to-GDP terms (in logarithms); examples for this specification include Calderon and Fuentes (2012) and Checherita-Westphal and Rother (2012). Second, I adopt piecewise linear specifications where the debt-to-GDP ratio (in levels, not logs) is divided into two variables made up of values below and above a specified threshold, which is treated as exogenous [examples for this specification include Kumar and Woo (2010), Baum et al. (2013), Panizza and Presbitero (2014)]. 11 For Great Britain, I adopt three threshold values: 90%, 70%, and 50%. For the United States and Japan, I can only adopt the 50% threshold, since even over the full time horizon too few observations are above the other two thresholds: only 12 (Japan: 22) for 70% and 6 (Japan: 17) for 90%. In Sweden, the debt-to-GDP ratio only surpasses the 50% threshold in 15 sample years (7% of observations) so that I cannot investigate even a 50% threshold for this country. Note that all of the empirical approaches in the debt-growth literature discussed above are based on models that are linear in parameters but nonlinear in the variables; my implementation follows this assumption. Although there are of course alternative transformations [e.g., "integrable functions" proposed by Park and Phillips (2001)] to model the potential nonlinearity in the debt-growth relationship, I restrict myself to the above polynomial and threshold models because these feature in the vast majority of empirical applications; see Panizza and Presbitero (2014) for a recent survey.
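The cosummability analysis thus investigates a number of specifications for the debt-growth relationship, inspired by the simple Reinhart and Rogoff (2010b) setup. The polynomial specifications are
$$\text{Model 1:}\quad y_t = \alpha_0 + \phi t + \theta_1 x_t + \varepsilon_t,$$
$$\text{Model 2:}\quad y_t = \alpha_0 + \phi t + \theta_1 x_t + \theta_2 x_t^2 + \varepsilon_t,$$
$$\text{Model 3:}\quad y_t = \alpha_0 + \phi t + \theta_1 x_t + \theta_2 x_t^2 + \theta_3 x_t^3 + \varepsilon_t,$$
where $y$ is the per capita GDP and $x$ is the debt-to-GDP ratio (both in logarithms), $\alpha_0$ is an intercept, $t$ is the linear trend term with parameter $\phi$, and $\varepsilon_t$ is white noise.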
The threshold model specifications are based on
$$y_t = \alpha_0 + \phi t + \theta_1 X_t \, \mathbf{1}(X_t < \text{threshold}) + \theta_2 X_t \, \mathbf{1}(X_t \geq \text{threshold}) + \varepsilon_t,$$
where $\mathbf{1}(X_t < \text{threshold})$ is an indicator function that is 1 for the debt-to-GDP ratio $X_t$ below the threshold and 0 otherwise, and similarly for $\mathbf{1}(X_t \geq \text{threshold})$ at and above the threshold. I investigate the evidence for long-run equilibrium relationships between debt burden and per capita GDP levels; since the focus of the applied literature is on the long-run relationship, I adopt the levels variable for income, rather than its growth rate. The popularity of the "growth" specification in the cross-country empirical literature is justified by the presence of the lagged level of per capita GDP as additional regressor [as is the case for the "debt-growth" analysis of Kumar and Woo (2010), Cecchetti et al. (2011), Checherita-Westphal and Rother (2012), Baum et al. (2013), among others]. This quasi-error-correction specification provides estimates for a long-run levels relationship although researchers frequently refer to this type of specification as a "growth" equation [see Eberhardt and Teal (2011)].
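The two families of regressors can be constructed as in the following sketch. It is an illustration under assumptions, not the code behind the paper: the function names and the four debt values are made up, the debt-to-GDP ratio is assumed to be recorded in percent, and logarithms are applied only to the polynomial terms, as in the description of Models 1 to 3.

```python
import math


def polynomial_terms(debt_ratio_pct, degree):
    """Powers 1..degree of the log debt-to-GDP ratio (degree 1, 2, 3 for Models 1, 2, 3)."""
    log_debt = math.log(debt_ratio_pct)
    return [log_debt ** power for power in range(1, degree + 1)]


def threshold_terms(debt_ratio_pct, threshold_pct):
    """Split the level of the debt ratio into below- and at-or-above-threshold regressors."""
    below = debt_ratio_pct if debt_ratio_pct < threshold_pct else 0.0
    above = debt_ratio_pct if debt_ratio_pct >= threshold_pct else 0.0
    return below, above


debt_series = [35.0, 62.0, 90.0, 118.0]  # hypothetical debt-to-GDP ratios in percent
poly = [polynomial_terms(d, 3) for d in debt_series]
split = [threshold_terms(d, 90.0) for d in debt_series]
print(split)
```

Each observation contributes to exactly one of the two threshold regressors, so the two columns sum back to the original debt series, and the specification stays linear in parameters while being nonlinear in the variable.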
In addition to the analysis for the full time horizon, I investigate cosummability in the four OECD countries using a window of 60 years, which is moved along the time horizon from the 1800s to 2010. The purpose of this exercise is both to provide an indication of possible changes in the long-run debt-growth relationship over time and to safeguard the analysis from undue impact of severe shocks such as the two world wars or changes in the definition of the debt variable. 12 Due to the nature of the data, this approach is only feasible for the polynomial specifications: As highlighted by Chinn (2012) in his review of Reinhart and Rogoff (2011), there are comparatively few episodes in developed economies where the debt-to-GDP ratio exceeds 90% and I can therefore not implement the moving window for the piecewise linear specification. Since this rolling window analysis represents a form of data mining, I adjust the confidence intervals (CI) for all estimates following a standard Bonferroni correction, whereby $CI^* = 1 - \alpha/m$ for the conventional confidence level $1 - \alpha$ (I adopt $\alpha = 0.05$) and the number of subsamples tested $m$ (which varies from 80 for Japan to 152 for the United States, Great Britain, and Sweden). In practice, this makes the confidence intervals much wider, thus representing a more conservative approach to rejecting the null hypothesis of cosummability. A number of additional robustness checks are carried out, for which the motivation, approach, and results are presented in an online appendix.
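To make the size of the Bonferroni adjustment concrete, the adjusted confidence level can be computed directly from the formula above. This is a small arithmetic sketch with a hypothetical function name; the values of the significance level and the subsample counts are those stated in the text.

```python
def bonferroni_level(alpha, n_tests):
    """Bonferroni-adjusted confidence level 1 - alpha / m."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return 1.0 - alpha / n_tests


subsample_counts = {"United States": 152, "Great Britain": 152, "Sweden": 152, "Japan": 80}
adjusted = {country: bonferroni_level(0.05, m) for country, m in subsample_counts.items()}
for country, level in adjusted.items():
    print(f"{country}: {level:.6f}")
```

With 80 to 152 windows the adjusted level is well above 99.9%, compared with the conventional 95%, which is why the resulting intervals are so much wider.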
The focus of these robustness checks is on (i) a diverse sample of 23 additional economies (including some developing countries); (ii) a reverse specification with debt-to-GDP ratio (in logs) as the dependent variable for models including the (log of) per capita GDP, and its squared and cubed polynomial terms as regressors; (iii) economic theory-based specifications that add a number of determinants of growth as favored by the cross-country growth literature to the model.

I use annual per capita GDP (in 1990 Geary-Khamis $) from an updated version [Bolt and van Zanden (2013)] of the series compiled by Maddison (2010). I match these data to information on the gross government debt-to-GDP ratio (in percent) from Reinhart and Rogoff (2009). The debt figures refer to the total gross central government debt, comprising domestic and external debt (see the online appendix for exceptions). Data coverage differs across countries: For the United States, Britain, and Sweden data series start in 1800, for Japan in 1872; all series end in 2010.
Descriptive statistics for these four countries are presented in the online appendix, where I also plot the levels and first differences of the per capita GDP and debt-to-GDP ratio variables (in logs). Although my summability analysis provides insights into the time series properties of these data, I also carry out a number of unit root tests to illustrate the difference in order of integration between the per capita GDP growth rate and the debt-to-GDP ratio in levels that rules out the existence of any long-run relationship (cointegration, cosummability) between these two variable series.
In the online appendix, data from a further 23 countries using the same sources are employed to carry out summability and cosummability tests. Here, countries were included in the sample provided their per capita GDP and debt-to-GDP ratio series extended back to 1900 or earlier.
Extended empirical models analyzed in an online appendix incorporate inflation and schooling data primarily taken from the Clio Infra project at the International Institute of Social History, population data from the original Maddison (2010) data set, investment and additional debt data from Maddison (1992), Mitchell (2007a,b), and the World Bank World Development Indicators as well as a number of other sources (for details see online appendix). Figure 1 charts the evolution of the debt-to-GDP ratio for the four economies, where in the spirit of Reinhart and Rogoff (2010), I highlight periods with debt burden in excess of 90% of GDP. While the four time series all display idiosyncrasies, it is nevertheless notable how similar, in particular, the patterns for British and American debt-to-GDP ratios are over much of the 20th century, albeit with substantially higher debt in the former. Britain is also the only economy studied that experienced sustained periods of debt-to-GDP above 90%.
In Figure 2, I plot the debt-income relationship in each of the four countries, taking variables in deviation from the country-specific time-series mean. In all four economies, the most significant turning points for the debt-growth nexus were marked by the Great War of 1914-18, the Great Depression that began in the late 1920s, and World War II.

Main Results: Order of Summability
Table 1 provides estimates of the order of summability for all model variables, including polynomial as well as threshold terms for debt. None of the confidence intervals for tests on per capita GDP levels or any of the debt variables include zero, thus rejecting the null of summability of order zero. The estimated order of summability for the per capita GDP growth rates in contrast is always very close to zero. For the linear terms of per capita GDP and the debt-to-GDP ratio (in logs or levels) and their growth rates, these results are perfectly in line with unit root and stationarity test results presented in the online appendix, where I establish stationary growth rates and nonstationary levels series (whether in logarithms or not).
These findings highlight the significant persistence in the data and provide a strong motivation for the concerns over time series properties I argue are of primary importance when analyzing the long-run debt-growth nexus. In analogy to integrated data, we run the risk of spurious results in any regressions containing these variables unless we can confirm our empirical models as balanced and cosummable. Note that with the exception of the study by Balassone et al. (2011) on Italy, none of the papers in this literature shows concern for time series properties of the data.
Table 2 provides results from cosummability tests using per capita GDP levels as dependent variable. Cosummability is rejected in all countries and specifications: residuals from these models were not found to be summable of order zero, S(0). Note that the rejection of cosummability is by no means marginal, with all confidence intervals some distance away from zero. The fact that subsampling confidence intervals are at times very wide is a further strong signal for misspecification. These findings imply that from a long-run perspective per capita income and the debt-to-GDP ratio do not move together, precluding any causal relationship between these variables. 13

Results from Subsample Analysis
Subsample analysis yields three sets of results: (i) country-specific time-varying cosummability statistics for the entire 152 subsamples (80 for Japan) of 60 years, which I present in a graphical form; (ii) a comparison of the cosummability subsample results for the United States, Great Britain, and Sweden, again in graphical form, which is intended to uncover patterns of commonality and difference in the equilibrium relationship across countries; (iii) cosummability statistics for the post-WWII period as well as results omitting the most recent years covering the global financial crisis (2008-2010).
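The subsample counts follow directly from the data coverage and the window width. The sketch below, with a hypothetical function name, enumerates all 60-year windows for a series and reproduces the counts of 152 (series starting in 1800) and 80 (Japan, starting in 1872) for data ending in 2010.

```python
def rolling_windows(first_year, last_year, width=60):
    """All (start, end) pairs of consecutive windows of `width` years, inclusive."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


windows_1800 = rolling_windows(1800, 2010)
windows_japan = rolling_windows(1872, 2010)
print(len(windows_1800), windows_1800[0], windows_1800[-1])
print(len(windows_japan), windows_japan[0], windows_japan[-1])
```

Indexing each window by its end year, as in the plots, means the first window for the longer series ends in 1859 and the last in 2010.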
Graphical results for the subsample analysis of cosummability, including Bonferroni-adjusted confidence intervals, are presented in Figure 3. 14 In each plot, the end-year of the 60-year window of analysis is marked on the x-axis and shading indicates the Bonferroni-adjusted 95% confidence intervals; due to different data availability, the time dimension of the plots differs for Japan. Note first that across all models and countries, the confidence intervals are fairly large, typically from 0 to 2 or larger. Second, while (Japan aside) in each country, the share of samples that satisfy cosummability is typically above 50%, 15 this data property does not appear to be satisfied consistently over longer stretches of time, but instead appears sporadically. Both of these findings provide a strong signal of misspecification and thus echo the full sample results presented above.
In Table 3, I compare the subsample periods for which the 60-year data series constituted cosummable specifications in the data for the United States, Great Britain, and Sweden: Panel A refers to the linear model (Model 1), Panels B and C to the polynomial specifications with (additional) squared terms and squared and cubed debt terms, respectively (Models 2 and 3). For each country, a shaded cell indicates the 60-year subsample ending in the year specified constitutes a cosummable specification, while the intensity of the shading indicates whether this property occurs in one (lightest), two (intermediate), or all three (darkest) countries. Japan is excluded in this graphical analysis, since the difference in available time series data would necessitate different shadings between earlier (excluding Japan) and later periods (including Japan) that would make a mockery of my attempts to use graphs to illustrate commonality. I begin by focusing on those "episodes" of long-run comovement when the tests for all three countries find cosummability: In all models, clusters of such episodes can be found in the 1860s (thus, for the series starting in the early 1800s), the 1890s-1910s (1830s-1850s), and the 1950s and 1960s (1890s-1900s, incorporating both World Wars). Thereafter, isolated episodes pop up in the 1970s. The most recent episodes occurred in the early 2000s, which incorporate sample years during WWII and its immediate aftermath. Taken together these various episodes account for 28% of all subsamples across the three specifications. 16 Note that the years of the global financial crisis (2008-10) do not form part of this cluster of cosummable episodes in all three countries.
Referring back to Figure 2, it can be seen that the first of these clusters, covering subsamples ending in the 1860s, occurred when all three countries substantially reduced their country-specific debt-burden (movement to the left in Figure 2) albeit with a comparatively modest increase in growth in the United States and Sweden (relatively flat line plots). No such pattern is revealed for the second cluster for subsamples ending in the 1890s and 1900s, while the third cluster with end years in the 1950s and 60s occurred when all three countries shifted from a relative debt build-up in years prior to and during WWII to significant debt reduction thereafter, whereby the latter period also represented a return to steady economic growth. The final cluster in the early 2000s again does not reveal any systematic patterns in the evolution of debt burden and growth across these three economies.
FIGURE 3. Cosummability testing (subsamples). The shaded areas represent the Bonferroni-corrected 95% Confidence Intervals for the Cosummability statistic computed in a moving window of 60-year time periods. The solid black line represents the computed Cosummability statistic. I allow for an intercept in the cosummability analysis. The coverage of the data differs across countries: for the United States, Great Britain, and Sweden I have data from 1800 to 2010 (152 subsamples), for Japan from 1872 to 2010 (80 subsamples). Model 1 refers to a specification with linear debt terms only, Model 2 to a specification with linear and quadratic debt terms, Model 3 further includes a cubed debt term.
In between these episodes, there are stretches where two countries have cosummable specifications (around 31-38% of subsamples in each model), although these are often clustered around the episodes just described. The remainder of subsamples is made up of single country episodes (20-34% in each model) and subsamples with no cosummability in any country (7-16% in each model). Table 4 then zooms in on the post-WWII period that forms the focal point for virtually all existing empirical studies on the debt-growth nexus. Here, we find some evidence for cosummability in nonlinear specifications (these subsamples are shaded in gray), especially for the model including a cubed term. Note however that the confidence intervals for the overwhelming majority of these results are very large (indicated with the darker shading) such that they include 1 and at times even 2: A large confidence interval is indicative of serious misspecification and these findings of cosummability should thus be treated with caution.
These robustness checks provide a number of important insights: First, there is no overwhelming evidence that these full sample results are severely distorted by global shocks or structural breaks in the long-run debt-growth relationship, given that a very considerable share of subsamples was found not to be cosummable across all countries and specifications. Second, having said that, my results point to a distinct possibility that certain countries experienced linear or nonlinear comovement between debt and income during certain periods of time over the past two centuries, although seemingly much less so during the 20th century.

CONCLUDING REMARKS
This study took an alternative approach to investigating the presence of nonlinearities in the long-run equilibrium relation between public debt and growth. Empirical results for four OECD countries using data from 1800 to 2010 and the various robustness checks carried out provide limited evidence for nonlinear, or indeed linear, long-run relationships between these variables. There are however certain subperiods over this long time horizon for which tests confirm comovement between debt and income. The timing of these subperiods of comovement frequently appears to differ across countries. These findings are not narrowly confined to the four OECD economies studied in detail but seem to have much wider validity, and further are not an artifact of the simple model specification adopted: I investigated summability and cosummability in a sample of 23 additional countries (including some developing countries), and furthermore studied a number of theory-based extended specifications for the four OECD economies; results in the online appendix provide strong support for the findings presented above.
It is important to emphasize that this study does not and cannot address causality from high(er) debt to low(er) growth as has been the focus in most of the empirical work on this topic. This is by no means a shortcoming of the approach taken. Instead, it highlights a central inconsistency in the empirical analysis of nonlinearities in the debt-growth relationship in the existing literature: In order to establish a long-run causal relationship from debt to growth, it is necessary to first establish a long-run equilibrium relationship. This study documents the difficulties in establishing the latter using standard empirical specifications adopted in the literature when variable series are integrated. Once these difficulties are addressed, I find no evidence for a long-run equilibrium relation in the data for four OECD countries. Various robustness checks provide assurance that this finding is not an artifact of sample selection. Since a long-run equilibrium relationship represents a prerequisite for any long-run causality between variables, my analysis by necessity stops at this point.
The results presented in this paper undermine some of the popular conclusions for this politically charged issue that represent fiscal adjustment as a necessity for long-run economic stability and sustainability. I do not claim that a high debt burden is a matter of no concern for policy makers or that in the short run debt may not be detrimental to growth. Instead, I highlight the absence of evidence for nonlinearities such as the popular 90% debt-to-GDP threshold or polynomial specifications in the long-run relationship with growth and development, which has been the explicit focus of the empirical literature I cite and review.

NOTES
1. A further empirical study by Baum et al. (2013) is cited in Reinhart et al. (2012) but (erroneously) argues to focus on the short-run relationship. They note that their sample selection is driven by the finding that data for 1990-2007 appear stationary, whereas the longer 1980-2007 data appear nonstationary.
2. The former two economies are presently at the center of a policy debate relating sustainable growth to fiscal austerity (e.g., US Senate Budget Committee, June 4, 2013); Japan is at times taken as an example for sustained growth at comparatively high levels of debt; and Sweden (alongside the United States and Britain) represents the country with the longest time series in my matched data set.
3. Notable exceptions include studies by Henderson and Parmeter (2013) and Kourtellos et al. (2013) that emphasize the heterogeneity of the debt-growth nexus across countries and adopt nonparametric methods to identify a threshold in the cross-section dimension.
4. A thorough critique of this implementation in the macro-panel context is beyond the scope of this paper. Eberhardt and Teal (2010) highlight the problems arising, Bun and Sarafidis (2015) provide an analysis of the impact of nonstationary initial conditions on this set of estimators, while Pesaran and Smith (1995) discuss the bias arising from heterogeneity misspecification.
5. Adopting the threshold specification, I find that in my data series for Italy none of the various thresholds adopted pass the cosummability test (100% threshold: CI low 0.313, $\hat{\delta}_{\hat{e}_t}$ = 1.134, CI up 1.954; 90%: CI low 0.486, $\hat{\delta}_{\hat{e}_t}$ = 1.052, CI up 1.619; 70%: CI low 0.910, $\hat{\delta}_{\hat{e}_t}$ = 1.695, CI up 2.480; 50%: CI low 0.811, $\hat{\delta}_{\hat{e}_t}$ = 1.471, CI up 2.130); see results section for notation.
6. For a formal definition of summability see Definition 2 in Berenguer-Rico and Gonzalo (2014b).
7. The deterministic component $m_t$ can be accounted for by the partial mean of $y_t$, namely $m_t = (1/t)\sum_{j=1}^{t} y_j$ in the case of a constant. Given the trending behavior of my data, I focus below on the case of constant and linear trend terms, where partial demeaning of $y_t$ is carried out twice.
8. I am grateful to a referee who emphasized that the validity of the subsampling procedure has only been shown by simulation.
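The partial demeaning described in note 7 can be illustrated with a short sketch: each observation has the mean of all observations up to and including that date subtracted from it, and the operation is applied a second time when a linear trend is allowed for. The function name is hypothetical and the input series are toy values, not data from the paper; the sketch shows the transformation only, not the subsequent estimation of the order of summability.

```python
from itertools import accumulate


def partial_demean(y):
    """Return y_t - m_t, where m_t = (1/t) * sum_{j=1}^{t} y_j (the partial mean)."""
    running_sums = accumulate(y)
    return [value - total / t
            for t, (value, total) in enumerate(zip(y, running_sums), start=1)]


# A constant series is removed entirely by a single pass.
print(partial_demean([5.0, 5.0, 5.0]))   # [0.0, 0.0, 0.0]

# Toy trending series: one pass, then a second pass as in the
# constant-and-linear-trend case mentioned in note 7.
once = partial_demean([1.0, 2.0, 3.0])
twice = partial_demean(once)
print(once)   # [0.0, 0.5, 1.0]
print(twice)  # [0.0, 0.25, 0.5]
```

Because the partial mean at date t uses only observations up to t, the transformation is recursive and differs from subtracting the full-sample mean.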