Partial Identification of Economic Mobility: With an Application to the United States

Abstract. The economic mobility of individuals and households is of fundamental interest. While many measures of economic mobility exist, reliance on transition matrices remains pervasive due to simplicity and ease of interpretation. However, estimation of transition matrices is complicated by the well-acknowledged problem of measurement error in self-reported and even administrative data. Existing methods of addressing measurement error are complex, rely on numerous strong assumptions, and often require data from more than two periods. In this article, we investigate what can be learned about economic mobility as measured via transition matrices while formally accounting for measurement error in a reasonably transparent manner. To do so, we develop a nonparametric partial identification approach to bound transition probabilities under various assumptions on the measurement error and mobility processes. This approach is applied to panel data from the United States to explore short-run mobility before and after the Great Recession.


INTRODUCTION
There has been substantial interest of late in intra- and intergenerational mobility. Dang et al. (2014, p. 112) stated that mobility "is currently at the forefront of policy debates around the world." Within the popular press, it has been noted that "social mobility . . . has become a major focus of political discussion, academic research and popular outrage in the years since the global financial crisis." 1 In this article, we study economic mobility while accounting for measurement error in income data. Specifically, we offer a new approach to addressing measurement error in the estimation of transition matrices.
Measurement error in income data is known to be pervasive, even in administrative data. In survey data, measurement error arises for two main reasons: misreporting (particularly with retrospective data) and imputation of missing data (Jäntti and Jenkins 2015). It is now taken as given that self-reported income in survey data contains significant measurement error, and that the measurement error is nonclassical in the sense that it is mean-reverting and serially correlated (Bound, Brown, and Mathiowitz 2001; Kapteyn and Ypma 2007; Gottschalk and Huynh 2010). Compounding matters, Meyer, Mok, and Sullivan (2015) found that both problems (nonresponse and accuracy conditional on answering) are worsening over time.
In administrative data, measurement error arises for three main reasons: misreporting (tax evasion or filing errors), conceptual differences between the desired and available income measures, and processing errors (Bound, Brown, and Mathiowitz 2001; Kapteyn and Ypma 2007; Pavlopoulos, Muffels, and Vermunt 2012; Meyer, Mok, and Sullivan 2015). Even if administrative data are entirely accurate, they are only available in a handful of developed countries.
However, existing studies of mobility either ignore the issue or use complex solutions that invoke strong (and often nontransparent) identification assumptions and have data requirements that are quite limiting. The most frequent response to measurement error in the empirical literature on mobility is to mention it as a caveat (Dragoset and Fields 2006). While the usual assumption is that measurement error will bias measures of mobility upward, the complexity of mobility measures along with the nonclassical nature of the measurement error makes the direction of any bias uncertain. Glewwe (2012, p. 239) stated that "all indices of relative mobility tend to exaggerate mobility if income is measured with error," yet others offer a different opinion. Dragoset and Fields (2006, p. 1) contended that "very little is known about the degree to which earnings mobility estimates are affected by measurement error." Gottschalk and Huynh (2010, p. 302) noted that "the impact of nonclassical measurement error on mobility is less clear since mobility measures are based on the joint distribution of reported earnings in two periods."
Our approach to the analysis of mobility given measurement error in income data concentrates on the partial identification of transition matrices. We provide informative bounds on the transition probabilities under minimal assumptions concerning the measurement error process and a variety of nonparametric assumptions on income dynamics. To our knowledge, this is the first study to extend the literature on partial identification to the study of transition matrices (see, e.g., Horowitz and Manski 1995; Manski and Pepper 2000). 2 Within this environment, we first derive sharp bounds on transition probabilities under minimal assumptions on the measurement error process. We then show how the bounds may be narrowed by imposing more structure via shape restrictions, level set restrictions that relate transition probabilities across observations with different attributes (Manski 1990; Lechner 1999), and monotonicity restrictions that assume monotonic relationships between the true income and certain observed covariates (Manski and Pepper 2000).
In contrast to existing approaches to address measurement error in studies of mobility (discussed in Section 2), our approach has several distinct advantages. First, the assumptions invoked to obtain a given set of bounds are transparent, easily understood by a wide audience, and easy to impose or not impose depending on the particular context. Moreover, bounds on the elements of transition matrices extend naturally to bounds on mobility measures derived from transition matrices. Second, our approach only requires data at two points in time. Third, our approach is easy to implement (through our creation of a generic Stata command). 3 Fourth, our approach extends easily to applications other than income, such as dynamics related to consumption, wealth, occupational status, labor force status, health, student achievement, etc.
The primary drawback to our approach is the lack of point identification. Two responses are in order. First, our approach should be viewed as a complement to, not a replacement for, existing approaches. Indeed, one use of our approach is to provide bounds with which point estimates derived via alternative estimation techniques may be compared. Second, many existing approaches to deal with measurement error in mobility studies end up producing bounds even though the solutions are not couched as a partial identification approach (e.g., Dang et al. 2014; Lee, Ridder, and Strauss 2017). This arises due to an inability to identify all parameters in some structural model of observed and actual incomes.
Perhaps a secondary drawback of our approach is the focus on transition matrices to capture mobility. Such matrices have the disadvantage of not providing a scalar measure of mobility, which would simplify spatial and temporal comparisons of mobility. While there is merit to this critique, there are several responses. First, transition matrices are an obvious starting point in the measurement of mobility. Jäntti and Jenkins (2015, p. 822) argued that, when measuring mobility across two points in time, "the bivariate joint distribution of income contains all the information there is about mobility, so a natural way to begin is by summarizing the joint distribution in tabular or graphical form." Second, transition matrices are easily understood by policymakers and the general public and thus are frequently referenced within these domains. Third, transition matrices allow one to examine mobility at different parts of the income distribution (Lee, Ridder, and Strauss 2017). Finally, bounds on (scalar) measures of mobility derived from the elements of transition matrices are easily obtained from our approach.
2 In closely related work, Vikström, Ridder, and Weidner (2018) study the partial identification of treatment effects where the outcomes are conditional transition probabilities. In their setup, measurement error is not considered. Rather, point identification fails even under randomized treatment assignment as treatment assignment is not guaranteed to be independent of potential outcomes in future periods conditional on intermediate outcomes. Our approach is also similar to Molinari (2008); she studies the partial identification of the distribution of a discrete variable that is observed with error.
3 Available at http://faculty.smu.edu/millimet/code.html.
We illustrate our approach with an examination of intragenerational mobility in the United States using data from the Survey of Income and Program Participation (SIPP). Specifically, we examine mobility over two four-year periods, 2004-2008 and 2008-2012. Understanding mobility patterns in the U.S. is important as there is convincing evidence that income inequality has been increasing in the U.S. 4 However, the welfare impact of this rise depends crucially on the level of economic mobility. Shorrocks (1978, p. 1013) argued that "evidence on inequality of incomes or wealth cannot be satisfactorily evaluated without knowing, for example, how many of the less affluent will move up the distribution later in life." More recently, Kopczuk, Saez, and Song (2010, pp. 91-92) concluded that "a comprehensive analysis of disparity requires studying both inequality and mobility" as "annual earnings inequality might substantially exaggerate the extent of true economic disparity among individuals."
Our analysis of U.S. mobility yields some striking results. First, we show that relatively small amounts of measurement error lead to bounds that can be quite wide in the absence of other information or restrictions. Second, the restrictions considered contain significant identifying power as the bounds can be severely narrowed. Third, allowing for misclassification errors in up to 10% of the sample, we find that the probability of being in (out of) poverty in 2008 conditional on being in poverty in 2004 is at least 35% (27%) under our most restrictive set of assumptions. The probability of being in (out of) poverty in 2012 conditional on being in poverty in 2008 is at least 36% (25%) under our most restrictive set of assumptions. Finally, the probability of being in poverty in 2008 conditional on not being in poverty in 2004 is at least 2% and no more than 11% under our most restrictive set of assumptions. The probability of being in poverty in 2012 conditional on not being in poverty in 2008 is at least 4% and no more than 13% under our most restrictive set of assumptions.
The rest of the article is organized as follows. Section 2 provides a brief review of existing approaches to address measurement error in studies of mobility. Section 3 presents our partial identification approach. Section 4 contains the empirical application. Section 5 concludes.

LITERATURE REVIEW
Burkhauser and Couch (2009) and Jäntti and Jenkins (2015) provided excellent reviews of the numerous mobility measures. Bound, Brown, and Mathiowitz (2001) and Meyer, Mok, and Sullivan (2015) offered excellent surveys regarding measurement error in microeconomic data. Tamer (2010), Bontemps and Magnac (2017), and Ho and Rosen (2017) provided in-depth reviews of the recent literature on partial identification. 5 Here, we focus on approaches that have been taken to address (or not address) measurement error in analyses of economic mobility. We identify three general approaches in the existing literature: (i) ignore it, (ii) ad hoc data approaches, and (iii) structural approaches. In the interest of brevity, we relegate much of the discussion of the prior literature to Appendix A in the supplementary materials. Here, we discuss only those methods most comparable to our approach. These methods fall within the third category and use structural models to simulate error-free income. Armed with the simulated data, any mobility measure may be computed, including transition matrices. Clearly, the validity of this approach rests on the quality of the simulated error-free data. Obtaining simulated values of error-free data is not trivial and typically relies on complex models invoking a number of fairly opaque assumptions.
Studies pursuing this strategy include McGarry (1995), Glewwe and Dang (2011), Pavlopoulos, Muffels, and Vermunt (2012), Dang et al. (2014), and Lee, Ridder, and Strauss (2017). McGarry (1995) posited a variance components model to isolate the portion of observed income that represents measurement error. Upon simulating error-free income, conditional staying probabilities for the poor are examined. The results indicate substantially less mobility in the simulated data. However, the model defines measurement error as the individual-level, time-varying, serially uncorrelated component of income. Thus, all time-varying idiosyncratic sources of income variation are removed. Moreover, the individual-level, time-varying, serially correlated component of income is not considered measurement error. Finally, parametric distributional assumptions are required for identification in practice. Glewwe and Dang (2011) began with the assumption that log income follows an AR(1) process. The authors then combined OLS and IV estimates of the forward and reverse regressions, along with assumptions about the variance components of the model, to simulate error-free income. The simulated data are then used to assess income growth across the distribution. As in McGarry (1995), the results suggest substantial bias from measurement error. However, also as in McGarry (1995), identification of error-free income relies on strong assumptions, such as serially uncorrelated measurement error, particular functional forms, and valid instrumental variables.
Pavlopoulos, Muffels, and Vermunt (2012) built on Rendtel, Langeheine, and Berntsen (1998) and specified a mixed latent Markov model to examine error-free transitions between low pay, high pay, and nonemployment. The model requires data from at least three periods, as well as perhaps strong assumptions concerning unobserved heterogeneity and initial conditions. In addition, serial correlation in measurement error is difficult to address and extending the model to more than three states is problematic. Nonetheless, the results align with the preceding studies in that mobility is dampened once measurement error is addressed. Dang et al. (2014) considered the measurement of mobility using pseudo-panel data. Since the same individuals are not observed in multiple periods, the authors posit a static model of income using only time invariant covariates available in all periods. The model estimates, along with various assumptions concerning how unobserved determinants of income are correlated over time, are used to bound measures of a two-by-two poverty transition matrix. This approach implicitly addresses measurement error through the imputation process as missing data can be considered an extreme form of measurement error. However, measurement error in observed incomes used to estimate the static model and compute the poverty transition matrix is not addressed. Moreover, it is not clear how one could extend the method to estimate more disaggregate transition matrices.
5 Within the partial identification literature, our analysis is most closely related to Molinari (2008), who posits a direct misclassification approach to bound the distribution of a discrete variable in the presence of misclassification errors, and studies of partial identification of treatment effects under nonrandom selection and misclassification of treatment assignment (e.g., Pepper 2007, 2008; Kreider 2008, 2009; Kreider et al. 2012).
Finally, Lee, Ridder, and Strauss (2017) estimated a complex model based on an AR(1) model of consumption dynamics with time invariant and time-varying sources of measurement error to simulate error-free consumption and estimate transition matrices. Consistent with the preceding studies, significantly less mobility is found in the simulated data. While the authors' model has some advantages compared to earlier attempts to simulate error-free outcomes, these advantages come at a cost of increased complexity, decreased transparency of the identifying assumptions, and a need for four periods of data. In addition, bounds are obtained as not all parameters required for the simulations are identified.
In summary, the literature on addressing measurement error in studies of mobility has witnessed significant recent growth. However, there remains much scope for additional work. While simulation-based methods allow for estimation of transition matrices, these methods are complex, lack transparency, rely on strong functional form and distributional assumptions, and often require more than two years of data. Moreover, the common reliance in the majority of the simulation approaches on an AR(1) model of income or consumption dynamics is worrisome. Lee, Ridder, and Strauss (2017, p. 38) acknowledged that "this model is not so much derived from a well-developed theory, but it is a convenient reduced-form model." Finally, the reliance on precise assumptions concerning the nature of the variance components is unappealing in light of Kapteyn and Ypma's (2007, p. 535) finding that "substantive conclusions may be affected quite a bit by changes in assumptions on the nature of error in survey and administrative data."
Our proposed approach complements these existing approaches. However, in contrast to simulation approaches, which often end up with bounds on transition probabilities, we set out to estimate bounds from the beginning, making it transparent exactly how the bounds are affected by each assumption one may wish to impose. Furthermore, the assumptions imposed to narrow the bounds are optional and much easier for nonexperts to comprehend.

Setup
Let $y^*_{it}$ denote the true income for observation $i$, $i = 1, \ldots, N$, in period $t$, $t = 0, 1$. An observation may refer to an individual or household observed at two points in time in the case of intragenerational mobility or a parent-child pair observed at two points in time in the case of intergenerational mobility. Further, let $F_{0,1}(y^*_0, y^*_1)$ denote the joint (bivariate) cdf, where $y^*_t \equiv [y^*_{1t} \cdots y^*_{Nt}]$. While movement through the distribution from an initial period, 0, to a subsequent period, 1, is completely captured by $F_{0,1}(y^*_0, y^*_1)$, working with the full joint distribution is not practical. A $K \times K$ transition matrix, $P^*_{0,1}$, summarizes this joint distribution and is given by
$$P^*_{0,1} = \begin{bmatrix} p^*_{11} & \cdots & p^*_{1K} \\ \vdots & \ddots & \vdots \\ p^*_{K1} & \cdots & p^*_{KK} \end{bmatrix}. \tag{1}$$
Elements of this matrix have the following form
$$p^*_{kl} = \frac{\Pr(\zeta^0_{k-1} \le y^*_0 < \zeta^0_k,\ \zeta^1_{l-1} \le y^*_1 < \zeta^1_l)}{\Pr(\zeta^0_{k-1} \le y^*_0 < \zeta^0_k)} = \frac{\Pr(y^*_0 \in k,\ y^*_1 \in l)}{\Pr(y^*_0 \in k)}, \quad k, l = 1, \ldots, K, \tag{2}$$
where the $\zeta$s are cutoff points between the $K$ partitions such that $0 = \zeta^t_0 < \zeta^t_1 < \zeta^t_2 < \cdots < \zeta^t_{K-1} < \zeta^t_K < \infty$, $t = 0, 1$. 6 Thus, $p^*_{kl}$ is a conditional probability. A complete lack of mobility implies $p^*_{kl}$ equals unity if $k = l$ and zero otherwise. 7 Finally, we can define conditional transition matrices, conditioned upon $X = x$, where $X$ denotes a vector of observed attributes. Denote the conditional transition matrix as $P^*_{0,1}(x)$, with elements given by
$$p^*_{kl}(x) = \frac{\Pr(y^*_0 \in k,\ y^*_1 \in l \mid X = x)}{\Pr(y^*_0 \in k \mid X = x)}, \quad k, l = 1, \ldots, K. \tag{3}$$
Implicit in this definition is the assumption that $X$ includes only time invariant attributes. 8
6 For example, if $K = 5$, then the cutoff points might correspond to quintiles within the two marginal distributions of $y^*_0$ and $y^*_1$.
7 In contrast, "perfect" mobility may be characterized by origin-destination independence, implying $p^*_{kl} = 1/K$ for all $k, l$, or by complete rank reversal, implying $p^*_{kl} = 1$ if $k + l = K + 1$ and zero otherwise. See Jäntti and Jenkins (2015) for discussion.
8 Note, while the probabilities are conditional on $X$, the cutoff points $\zeta$ are not. Thus, we are capturing movements within the overall distribution among those with $X = x$.
For clarity, throughout the paper we consider two types of transition matrices: (i) those with equal-sized partitions and (ii) those with unequal-sized partitions. With equal-sized partitions, the $\zeta$s are chosen such that each partition contains $1/K$ of the population. For example, equal-sized partitions with $K = 5$ correspond to a quintile transition matrix. In this case, the rows and columns of $P^*_{0,1}$ sum to one and mobility is necessarily zero-sum (i.e., if an observation is misclassified in the upward direction, there must be at least one observation misclassified in the downward direction). With unequal-sized partitions, only the rows of $P^*_{0,1}$ sum to one and mobility is not zero-sum. For example, we shall consider the case of a 2 × 2 poverty transition matrix, where $\zeta^t_1$ is the poverty line in period $t$.
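The two cases can be illustrated with a minimal Python sketch of the empirical transition matrix computed from observed incomes. The function name `transition_matrix`, the toy income vectors, and the poverty line of 10 are assumptions made for illustration only, as is the convention that an income equal to a cutoff falls in the higher partition; none of this is the authors' Stata implementation.

```python
import numpy as np

def transition_matrix(y0, y1, cuts0, cuts1):
    """Empirical K x K transition matrix; rows index the period-0 partition.

    cuts0, cuts1: the K-1 interior cutoffs for periods 0 and 1 (increasing).
    An income equal to a cutoff is assigned to the higher partition.
    A partition with no period-0 observations would yield a row of NaNs.
    """
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    k = np.searchsorted(cuts0, y0, side="right")  # period-0 partition, 0..K-1
    l = np.searchsorted(cuts1, y1, side="right")  # period-1 partition, 0..K-1
    K = len(cuts0) + 1
    counts = np.zeros((K, K))
    np.add.at(counts, (k, l), 1.0)                # tally each (k, l) pair
    # Divide each row by the number of observations starting in that partition
    return counts / counts.sum(axis=1, keepdims=True)

# (i) Equal-sized partitions: quintile cutoffs taken from each marginal distribution
y0 = np.arange(10.0)
y1 = y0[::-1].copy()                              # complete rank reversal, for illustration
q = [0.2, 0.4, 0.6, 0.8]
P_quintile = transition_matrix(y0, y1, np.quantile(y0, q), np.quantile(y1, q))

# (ii) Unequal-sized partitions: 2 x 2 poverty matrix with a hypothetical line of 10
inc0 = [5, 12, 20, 8, 30, 15]
inc1 = [7, 9, 25, 14, 28, 11]
P_poverty = transition_matrix(inc0, inc1, [10.0], [10.0])
```

In the quintile example both the rows and the columns sum to one, whereas in the poverty example only the rows do.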
Given the definition of $P^*_{0,1}$ or $P^*_{0,1}(x)$, our objective is to learn something about its elements. With a random sample $\{y^*_{it}, x_i\}$ and a choice of $K$ and the $\zeta$s, the transition probabilities are point identified as they are functions of nonparametrically estimable quantities. The corresponding plug-in estimator is consistent. However, as stated previously, ample evidence indicates that income is measured with error. Let $y_{it}$ denote the observed income for observation $i$ in period $t$. With data $\{y_{it}, x_i\}$ and a choice of $K$ and the $\zeta$s, the empirical transition probabilities are inconsistent for $p^*_{kl}$ and $p^*_{kl}(x)$. With access only to data containing measurement error, our goal is to bound the probabilities given in (2) and (3).
The relationships between the true partitions of $\{y^*_{it}\}_{t=0}^{1}$ and the observed partitions of $\{y_{it}\}_{t=0}^{1}$ are characterized by the following joint probabilities
$$\theta^{(k'-k,\,l'-l)}_{(k,l)} = \Pr(y_0 \in k',\ y_1 \in l',\ y^*_0 \in k,\ y^*_1 \in l). \tag{4}$$
While conditional misclassification probabilities are more intuitive, these joint probabilities are easier to work with (e.g., Kreider et al. 2012). In (4) the subscript $(k, l)$ indexes the true partitions in periods 0 and 1 and the superscript $(k'-k, l'-l)$ indicates the degree of misclassification given by the differences between the observed partitions $k'$ and $l'$ and true partitions $k$ and $l$. If $k'-k, l'-l > 0$, then there is upward misclassification in both periods. If $k'-k, l'-l < 0$, then there is downward misclassification in both periods. If $k'-k$ and $l'-l$ are of different signs, then the direction of misclassification changes across periods. $\theta^{(0,0)}_{(k,l)}$ represents the probability of no misclassification in either period for an observation with true income in partitions $k$ and $l$. 9
With this notation, we can now rewrite the elements of $P^*_{0,1}$ as
$$\begin{aligned}
p^*_{kl} &= \frac{\Pr(y^*_0 \in k,\ y^*_1 \in l)}{\Pr(y^*_0 \in k)} \\
&= \frac{\Pr(y_0 \in k,\ y_1 \in l) + \sum_{(k',l') \neq (k,l)} \theta^{(k'-k,\,l'-l)}_{(k,l)} - \sum_{(k',l') \neq (k,l)} \theta^{(k-k',\,l-l')}_{(k',l')}}{\Pr(y_0 \in k) + \sum_{k' \neq k} \sum_{m,m'} \theta^{(k'-k,\,m'-m)}_{(k,m)} - \sum_{k' \neq k} \sum_{m,m'} \theta^{(k-k',\,m-m')}_{(k',m')}} \\
&\equiv \frac{r_{kl} + Q_{1,kl} - Q_{2,kl}}{p_k + Q_{3,k} - Q_{4,k}} \qquad (5) \\
&= \frac{r_{kl} + Q_{1,kl} - Q_{2,kl}}{1/K}, \qquad (6)
\end{aligned}$$
where the sums run over $k', l', m, m' = 1, 2, \ldots, K$ and the final line holds only in the case of equal-sized partitions. 10 $Q_{1,kl}$ measures the proportion of false negatives associated with partition $kl$ (i.e., the probability of being misclassified conditional on $kl$ being the true partition). $Q_{2,kl}$ measures the proportion of false positives associated with partition $kl$ (i.e., the probability of being misclassified conditional on $kl$ being the observed partition). Similarly, $Q_{3,k}$ and $Q_{4,k}$ measure the proportion of false negatives and positives associated with partition $k$, respectively.
The transition probabilities in (5) and (6) are not identified from the data alone. The data identify $r_{kl}$ and $p_k$ (and, hence, $p_{kl} \equiv r_{kl}/p_k$), but not the misclassification parameters, $\theta$. One can compute sharp bounds by searching across the unknown misclassification parameters. There are $K^2(K^2 - 1)$ misclassification parameters in $P^*_{0,1}$. However, several constraints must hold (see Appendix B in the supplementary materials). Even with these constraints, obtaining informative bounds on the transition probabilities is not possible without further restrictions. Section 3.2 considers assumptions on the $\theta$s. Section 3.3 considers restrictions on the underlying mobility process.
9 $\theta^{(0,0)}_{(k,l)}$ may be strictly positive even though income is misreported in either or both periods (i.e., $y_{it} \neq y^*_{it}$ for at least some $i$ and $t$) as long as the misreporting is not so severe as to invalidate the observed partitions (i.e., $k' = k$ and $l' = l$ regardless). Throughout the article, we use the term measurement error to refer to errors in observed income ($y_{it} \neq y^*_{it}$) and misclassification to refer to errors in the observed partitions ($k' \neq k$ and/or $l' \neq l$).
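The accounting behind (5) can be checked numerically. The sketch below is illustrative only: it simulates hypothetical true and observed partitions (the 10% per-period error rate, the sample size, and all variable names are assumptions), computes the observed cell and row shares together with the false negative and false positive masses, and confirms that the true transition probability is recovered from them. In real data the true partitions are unobserved, which is why these masses can only be bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5000, 3

# Hypothetical true partitions and error-ridden observed partitions, coded 0..K-1
true0 = rng.integers(0, K, N)
true1 = rng.integers(0, K, N)
obs0 = np.where(rng.random(N) < 0.1, rng.integers(0, K, N), true0)
obs1 = np.where(rng.random(N) < 0.1, rng.integers(0, K, N), true1)

def decompose(k, l):
    """Return the true transition probability and its reconstruction from observables."""
    in_true = (true0 == k) & (true1 == l)
    in_obs = (obs0 == k) & (obs1 == l)
    r_kl = in_obs.mean()                       # observed joint cell share
    p_k = (obs0 == k).mean()                   # observed period-0 share
    Q1 = (in_true & ~in_obs).mean()            # true cell (k,l), observed elsewhere
    Q2 = (in_obs & ~in_true).mean()            # observed cell (k,l), true elsewhere
    Q3 = ((true0 == k) & (obs0 != k)).mean()   # true row k, observed in another row
    Q4 = ((obs0 == k) & (true0 != k)).mean()   # observed row k, truly in another row
    p_star = in_true.mean() / (true0 == k).mean()
    rebuilt = (r_kl + Q1 - Q2) / (p_k + Q3 - Q4)
    return p_star, rebuilt

gaps = [abs(a - b) for a, b in (decompose(k, l) for k in range(K) for l in range(K))]
```

Every entry of `gaps` is zero up to floating-point rounding, since the decomposition is an accounting identity rather than an approximation.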
Prior to continuing, it is worth relating our framework to the direct misclassification approach posited in Molinari (2008). Let $R^*$ denote a $K^2 \times 1$ vector of the stacked elements of $P^*_{0,1}$. One can similarly define a $K^2 \times 1$ vector, $R$, of observed conditional transition probabilities. The direct misclassification approach introduces a $K^2 \times K^2$ matrix of conditional misclassification probabilities, $\Pi$, such that
$$R = \Pi R^*,$$
where the representative element of $\Pi$ is denoted $\pi_{cd}$. This setup is identical to Molinari (2008) with the exception that the probabilities in $R^*$ and $R$ represent conditional transition probabilities. Molinari (2008) proceeded to derive sharp bounds given various assumptions on $\Pi$ using a nonlinear programming approach. The assumptions concerning the joint misclassification probabilities given in (4) that we consider in Section 3.2 can be written in terms of restrictions on $\Pi$. However, it is not obvious if the additional restrictions on the underlying mobility process, $R^*$, considered in Section 3.3 are amenable to this framework. Moreover, the estimation approach in Molinari (2008) becomes computationally challenging as the dimensionality of $R^*$ gets large (above 13 elements). Our code accommodates up to 5 × 5 transition matrices.
10 The expression in (5)

Assumptions. Allowing for measurement error, we obtain bounds on the elements of $P^*_{0,1}$, given in (5). 11 We consider the following misclassification assumptions.
Assumption 1 (Classification-preserving measurement error). Misreporting does not alter an observation's partition in the income distribution in either period. Formally, $\sum_{k,l} \theta^{(0,0)}_{(k,l)} = 1$ or, equivalently, $\theta^{(k'-k,\,l'-l)}_{(k,l)} = 0$ for all $(k', l') \neq (k, l)$.
Assumption 2 (Maximum misclassification rate). (i) (Arbitrary misclassification) The total misclassification rate in the data is bounded from above by $Q \in (0, 1)$. Formally, $1 - \sum_{k,l} \theta^{(0,0)}_{(k,l)} \le Q$ or, equivalently, $\sum_{k,l} \sum_{(k',l') \neq (k,l)} \theta^{(k'-k,\,l'-l)}_{(k,l)} \le Q$. (ii) (Uniform misclassification) The total misclassification rate in the data is bounded from above by $Q \in (0, 1)$ and is uniformly distributed across partitions.
Assumption 1 is quite strong, but is simply used as a benchmark. Under this assumption, measurement error is allowed as long as it does not cause observations to be classified into incorrect partitions. With equal-sized partitions, this could occur if measurement error is rank-preserving. Formally, defining $F_t(y_{it})$ and $F^*_t(y^*_{it})$, $t = 0, 1$, as the marginal cdfs of observed and true income in each period, the measurement error is rank-preserving if $F_t(y_{it}) = F^*_t(y^*_{it})$ $\forall i, t$. This is similar to Heckman, Smith, and Clements' (1997) ranked invariance assumption in the context of the distribution of potential outcomes in a treatment effects framework. With unequal-sized partitions, rank-preserving measurement error is not sufficient to ensure Assumption 1 holds. 12
Assumption 2 places restrictions on the total amount of misclassification allowed in the data. As we discuss below, the amount of misclassification is dependent on the choice of $K$. As such, one could express $Q$ as $Q(K)$; we dispense with this for expositional purposes. 13 For the case of equal-sized partitions, misclassification is necessarily zero-sum; upward misclassification of some observations implies downward misclassification of others. Thus, even if measurement error in income is unidirectional, misclassification errors must be bidirectional.
However, for the case of unequal-sized partitions, this need not be the case. In such cases, we also consider adding the following assumption.

Assumption 3 (Unidirectional misclassification). Misclassification occurs strictly in the upward direction. Formally, $\theta^{(k'-k,\,l'-l)}_{(k,l)} = 0$ whenever $k' < k$ or $l' < l$.
Assumption 3 rules out the possibility of any false positives (negatives) occurring in the worst (best) partition. Note, this assumption is consistent with mean-reverting measurement error as long as the negative measurement errors for observations with high income are not sufficient to lead to misclassification. For example, if $P^*_{0,1}$ is a 2 × 2 poverty transition matrix, Assumption 3 permits observations with true incomes exceeding the poverty threshold to underreport income, but not to a degree whereby they are misclassified as in poverty. This assumption may not hold, for instance, if some households above the poverty threshold report incomes below the poverty threshold in an attempt to qualify for means-tested transfers. Such violations seem plausible in administrative data as responses may have consequences for safety net eligibility; unidirectional errors are more likely to arise in survey data.
12 For example, if $P^*_{0,1}$ is a 2 × 2 poverty transition matrix and all individuals over-report their income by a constant amount, then rank preservation will hold. However, some individuals may now be incorrectly classified as above the poverty line. Instead, Assumption 1 allows measurement error to be unrestricted as long as true poverty status is observed for all observations.
13 As suggested by an anonymous reviewer, two additional restrictions might also be considered in conjunction with Assumption 2. First, one might impose independence between the misclassification probabilities in the initial and terminal periods. This implies that the misclassification probabilities are governed by period-specific probabilities $\alpha^{k'-k}_{k}$ and $\beta^{l'-l}_{l}$, where $\alpha^{k'-k}_{k}$ ($\beta^{l'-l}_{l}$) is the probability of being observed in partition $k'$ ($l'$) in the initial (terminal) period when the true partition is $k$ ($l$). This restriction reduces the number of misclassification parameters from $K^2(K^2 - 1)$ to $2K(K - 1)$. Second, one might wish to assume the misclassification probabilities are time invariant, implying $\alpha^{k'-k}_{k} = \beta^{k'-k}_{k}$ $\forall k$. This restriction further reduces the number of misclassification parameters to $K(K - 1)$. Both assumptions are quite strong. The former restriction requires that individuals' misclassification probabilities are independent of their income history. However, one might suspect different misreporting propensities, say, for an individual who finds him/herself in poverty for the first time versus someone who has been in poverty throughout his/her lifetime. The latter restriction assumes that data accuracy and other sources of measurement error such as stigma are constant over the analysis period. In the interest of brevity, we leave the consideration of such restrictions to future work.
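The distinction between rank preservation and correct classification, and the sense in which misclassification can be unidirectional, can be seen in a small simulation. The sketch below is a hypothetical illustration: the lognormal income distribution, the constant over-report of 2, and the poverty line of 15 are all assumed values, not quantities from the application.

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.lognormal(mean=3.0, sigma=0.5, size=1000)  # hypothetical true incomes
y_obs = y_true + 2.0                                    # everyone over-reports by 2
line = 15.0                                             # hypothetical poverty line

# Rank preservation: ordering observations by true income also orders observed income
rank_preserved = bool(np.all(np.diff(y_obs[np.argsort(y_true)]) >= 0))

poor_true = y_true < line
poor_obs = y_obs < line
upward = poor_true & ~poor_obs      # truly poor, observed as nonpoor
downward = ~poor_true & poor_obs    # truly nonpoor, observed as poor

n_upward, n_downward = int(upward.sum()), int(downward.sum())
```

Here ranks are preserved, so quantile-based partitions would be unaffected, yet some truly poor observations are observed above the line. All such errors are upward and none are downward, which is the pattern Assumption 3 permits.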
Proposition 1. Under Assumption 1 the transition probabilities are nonparametrically identified by
$$p^*_{kl} = \frac{E[I(y_0 \in k,\ y_1 \in l)]}{E[I(y_0 \in k)]},$$
where $E[\cdot]$ is the expectation operator and $I(\cdot)$ is the indicator function. Proof: See Appendix C in the supplementary materials.
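Estimation proceeds by replacing the terms with their sample analogs, given by
$$\begin{aligned}
\hat{p}^*_{kl} &= \frac{\sum_i I(y_{i0} \in k,\ y_{i1} \in l)}{\sum_i I(y_{i0} \in k)} \\
&= \frac{\sum_i I(y_{i0} \in k,\ y_{i1} \in l)}{N/K},
\end{aligned}$$
where the last line follows in the case of equal-sized partitions.
3.2.2.2. Maximum Misclassification Rate (Assumption 2). Under Assumption 2 with $Q > 0$, the transition probabilities are no longer nonparametrically identified. We have the following propositions.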
Proposition 2. Consider a transition matrix, $P^*_{0,1}$, with equal-sized partitions. The transition probabilities are bounded sharply by
$$LB_{kl} \le p^*_{kl} \le UB_{kl},$$
where $LB_{kl} \equiv \max\{K(r_{kl} - \tilde{Q}), 0\}$, $UB_{kl} \equiv \min\{K(r_{kl} + \tilde{Q}), 1\}$, and $\tilde{Q} = Q/2$ under Assumption 2(i) and $\tilde{Q} = Q/K$ under Assumption 2(ii). Proof: See Appendix C in the supplementary materials.
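As a worked illustration of Proposition 2, the following Python sketch evaluates the lower and upper bounds for a matrix of observed joint cell probabilities. The function name `prop2_bounds`, the variable `Qt` for the adjusted misclassification rate (Q/2 or Q/K), and the example numbers are assumptions introduced here; the sketch simply evaluates the stated formulas and is not the authors' Stata command.

```python
import numpy as np

def prop2_bounds(r, Q, uniform=False):
    """Evaluate LB_kl = max{K(r_kl - Qt), 0} and UB_kl = min{K(r_kl + Qt), 1}.

    r: K x K array of observed joint cell probabilities r_kl (equal-sized partitions).
    Q: assumed upper bound on the total misclassification rate.
    Qt = Q/2 under Assumption 2(i); Qt = Q/K under Assumption 2(ii) (uniform=True).
    """
    r = np.asarray(r, dtype=float)
    K = r.shape[0]
    Qt = Q / K if uniform else Q / 2.0
    lower = np.maximum(K * (r - Qt), 0.0)
    upper = np.minimum(K * (r + Qt), 1.0)
    return lower, upper

# Hypothetical above/below-median matrix (K = 2) with up to 10% misclassification
r = [[0.35, 0.15],
     [0.15, 0.35]]
lower, upper = prop2_bounds(r, Q=0.10)
# lower is [[0.6, 0.2], [0.2, 0.6]] and upper is [[0.8, 0.4], [0.4, 0.8]]
```

With K = 2 the two versions of Assumption 2 coincide because Q/2 = Q/K; for larger K the uniform version yields narrower bounds.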
Proposition 3. Consider a transition matrix, $P^*_{0,1}$, with unequal-sized partitions. The transition probabilities are bounded sharply under Assumption 2(i) and under Assumption 2(ii), with separate bounds in each case.
Estimation of the bounds in Propositions 2 and 3 proceeds by replacing $r_{kl}$ and $p_k$ with their sample analogs and then verifying that the required conditions are met.
3.2.2.3. Unidirectional Misclassification (Assumption 3). For simplicity, we only consider Assumption 3 in the case of a 2 × 2 transition matrix. We have the following proposition.
Proposition 4. Under Assumption 3, the four elements of a 2 × 2 transition matrix with unequal-sized partitions are bounded sharply by where LB kl and UB kl denote the lower and upper bounds of p * kl , respectively. Under Assumption 2(i), Q = Q and Q = 0. Under Assumption 2(ii), Q = Q/2 and Q = min Q, p 2 . Proof : See Appendix C in the supplementary materials.
Estimation of the bounds is straightforward using the appropriate sample analogs and then verifying that the required conditions are met.

Restrictions
Propositions 2-4 provide bounds on transition probabilities considering only restrictions on the misclassification process. Here, we explore the identifying power of incorporating restrictions on the mobility process. The restrictions may be imposed alone or in combination.

Shape Restrictions. Shape restrictions place inequality constraints on the population transition probabilities. 14 Here, we consider imposing shape restrictions assuming that large transitions are less likely than smaller ones.
Assumption 4 (Shape restrictions). The transition probabilities are weakly decreasing in the size of the transition. Formally, p * kl is weakly decreasing in |k − l|, the absolute difference between k and l.
This assumption implies that within each row or each column of the transition matrix, the diagonal element (i.e., the conditional staying probability) is the largest. The remaining elements decline weakly monotonically moving away from the diagonal element. This assumption, which may be plausible if large jumps in income are less common than small ones, leads to the following proposition.
Proposition 5. Denote the bounds on p * kl under some combination of Assumptions 2 and 3 as p * kl ∈ [LB kl , UB kl ]. Adding Assumption 4 implies the following sharp bounds
14 See Chetverikov, Santos, and Shaikh (2018) for a recent review of the use of shape restrictions in economics.
Proof : See Appendix C in the supplementary materials.
Estimation is straightforward given estimates of the preliminary bounds, LB kl and UB kl .
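To make Assumption 4 concrete, the following Python sketch checks whether a candidate transition matrix satisfies the shape restriction as described above: within each row and each column, entries weakly decline moving away from the diagonal. The function name is hypothetical. In the example matrix, the first and third rows use the pooled observed tercile transition probabilities reported later in the article (with the staying probability implied by rows summing to one), while the middle row is purely illustrative.

```python
def satisfies_shape_restriction(P, tol=1e-12):
    """Return True if, in every row and column of the square matrix P,
    entries weakly decline moving away from the diagonal element."""
    K = len(P)
    for k in range(K):
        row = P[k]
        col = [P[j][k] for j in range(K)]
        for line in (row, col):
            # Walk from the diagonal toward index 0.
            for j in range(k, 0, -1):
                if line[j - 1] > line[j] + tol:
                    return False
            # Walk from the diagonal toward index K - 1.
            for j in range(k, K - 1):
                if line[j + 1] > line[j] + tol:
                    return False
    return True


P_example = [
    [0.684, 0.245, 0.071],  # pooled observed row for tercile 1
    [0.230, 0.540, 0.230],  # hypothetical middle row, for illustration only
    [0.095, 0.217, 0.688],  # pooled observed row for tercile 3
]
print(satisfies_shape_restriction(P_example))
```

A check of this kind only verifies a point estimate; the proposition itself concerns how the restriction tightens the preliminary bounds.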

Level Set Restrictions. Level set restrictions place equality constraints on population transition probabilities across observations with different observed attributes (Manski 1990; Lechner 1999).
Assumption 5 (Level set restrictions). The conditional transition probabilities, given in (3), are constant across a range of conditioning values. Formally, For instance, if x denotes the age of an individual in years, one might wish to assume that p * kl (z) is constant for all z within a fixed window around x.
From (3) and (5), we have where now Q j,· (x), j = 1, . . . , 4, represent the proportions of false positives and negatives conditional on x. As such, we also consider the following assumption regarding the conditional misclassification probabilities.
Assumption 6 (Independence). Misclassification rates are independent of the observed attributes of observations, x. Formally, , ∀k, k , l, l , x.
The plausibility of Assumption 6 depends on one's conjectures concerning the measurement error process. However, two points are important to bear in mind. First, the misclassification probabilities, θ , are specific to a pair of true and observed partitions. As a result, even if misclassification is more likely at certain parts of the income distribution and x is correlated with income, this does not necessarily invalidate Assumption 6. Second, Assumption 6 does not imply that misclassification rates are independent of all individual attributes, only those included in the variables used to define the level set restrictions. This leads to the following proposition.
Proposition 6. Denote the bounds for p * kl (x) under some combination of Assumptions 2-4 and 6 as Adding Assumption 5 implies the following sharp bounds on the conditional transition probabilities Assuming X is discrete, sharp bounds on the unconditional transition probabilities are given as Proof : See Manski and Pepper (2000).
To operationalize Proposition 6, bounds on the conditional transition probabilities in (8) must be obtained. This is done in the following corollaries.
Corollary 6.3. Consider a 2 × 2 transition matrix, P * 0,1 , with unequal-sized partitions. Under Assumption 3, the four elements are bounded sharply by where LB kl (x) and UB kl (x) denote the lower and upper bounds of p * kl (x), respectively.
Under Corollaries 6.1-6.3, estimation of the bounds for p * kl (x) is straightforward using the appropriate sample analogs and minimizing (maximizing) the lower (upper) bound subject to the appropriate constraints. Upon obtaining bounds for p * kl (x), sharp bounds for the conditional and unconditional transition probabilities are given in (9) and (10). 15
Before continuing, it is worth pointing out a special case of level set restrictions when the conditioning variable, x, represents time. For example, one might separately bound transition matrices from t = 0 → 1 and t = 1 → 2 and then impose the restriction that mobility is constant across the two time periods. Here, the level set restriction is identical to a stationarity assumption about the Markov process governing the outcome variable. This is formalized in the following assumption and proposition.

Monotonicity Assumptions. Monotonicity restrictions place inequality constraints on population transition probabilities across observations with different observed attributes (Manski and Pepper 2000; Chetverikov, Santos, and Shaikh 2018).
Assumption 8 (Monotonicity). The conditional probability of upward mobility is weakly increasing in a vector of attributes, u, and the conditional probability of downward mobility is weakly decreasing in the same vector of attributes. Formally, if u 2 ≥ u 1 , then For instance, if u denotes the education of an individual, one might wish to assume that the probability of upward (downward) mobility is no lower (higher) for individuals with more education. Note, the monotonicity assumption provides no information on the conditional staying probabilities, p * kk (u), for k = 2, . . . , K − 1.
This leads to the following proposition.
Proposition 8. Denote the bounds for p * kl (u) under some combination of Assumptions 2-6 as p * kl (u) ∈ [LB(u), UB(u)] . Adding Assumption 8 implies the following sharp bounds on the conditional transition probabilities Assuming U is discrete, sharp bounds on the unconditional transition probabilities are given as Proof : This is a simple extension of Manski and Pepper (2000, Proposition 1 and Corollary 1).
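The mechanics of a monotonicity restriction can be sketched in Python. The code below assumes the familiar monotone instrumental variable form from Manski and Pepper (2000): for a probability that is weakly increasing in a scalar attribute u, the lower bound at u is the largest preliminary lower bound over values no greater than u, and the upper bound is the smallest preliminary upper bound over values no less than u. The exact expressions in Proposition 8 are not reproduced here, so this is an illustration of the idea rather than the authors' implementation. The function names, the three groups, the preliminary bounds, and the group shares are all hypothetical.

```python
def monotone_increasing_bounds(lb, ub):
    """Tighten bounds on a probability assumed weakly increasing in u.

    lb[j], ub[j] are preliminary bounds at the j-th value of u,
    with the values of u sorted in ascending order.
    """
    n = len(lb)
    new_lb = [max(lb[: j + 1]) for j in range(n)]  # sup over u' <= u
    new_ub = [min(ub[j:]) for j in range(n)]       # inf over u' >= u
    return new_lb, new_ub


def unconditional_bounds(lb, ub, shares):
    """Weight conditional bounds by the population share of each group."""
    lower = sum(s * b for s, b in zip(shares, lb))
    upper = sum(s * b for s, b in zip(shares, ub))
    return lower, upper


# Hypothetical preliminary bounds on upward mobility for three education groups.
prelim_lb = [0.10, 0.05, 0.20]
prelim_ub = [0.40, 0.35, 0.60]
shares = [0.40, 0.35, 0.25]

tight_lb, tight_ub = monotone_increasing_bounds(prelim_lb, prelim_ub)
print(tight_lb, tight_ub)
print(unconditional_bounds(tight_lb, tight_ub, shares))
```

For a probability assumed weakly decreasing in u, such as downward mobility, the same function can be applied after reversing the ordering of the groups.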

Summary Mobility Measures
Several scalar measures of mobility considered in the literature are derived directly from the elements of the transition matrices. The Prais (1955) measure of mobility captures the expected exit time from partition k and is given by Bradbury (2016) defined measures of upward and downward mobility that account for the size of the partitions. The upward mobility measure is given by downward mobility is given by Mobility is decreasing in the value of the Prais measure and increasing in the remaining two measures. The measures in (11)-(13) can be sharply bounded in a straightforward manner using sharp bounds on the individual conditional staying probabilities since each measure depends on only one element from the transition matrix. 16
16 A fourth measure derived from the transition matrix is the immobility ratio (IR), attributable to Shorrocks (1978). The measure is given by where tr(·) denotes the trace of a matrix. Since the trace is a function of multiple elements of the matrix (one from each row and column), bounds on IR using the upper and lower bounds on the diagonal elements of the trace under Assumption 2(i) are not sharp. They are sharp under Assumption 2(ii). Future work may wish to consider sharp bounds on IR under arbitrary errors.

Properties
3.5.1. Bias Correction. In most of the cases considered here, estimates of the bounds are obtained via plug-in estimators relying on infima and suprema. Such estimators are biased in finite samples, producing bounds that are too narrow (Kreider and Pepper 2008). To circumvent this issue, a bootstrap bias correction is typically used in the literature on partial identification. Denote the plug-in estimators of the lower and upper bounds under some set of the preceding assumptions as LB and UB, respectively. The bootstrap bias corrected estimates are given by where LB c and UB c denote the bootstrap bias corrected estimates and E * [·] denotes the expectation operator with respect to the bootstrap distribution. See Kreider and Pepper (2008) and the references therein. However, there is an added complication here. Because we are estimating bounds on probabilities, the upper (lower) bound is constrained by one (zero). It is well known that the traditional bootstrap does not work for parameters at or near the boundary of the parameter space (Andrews 2000). Instead, we employ subsampling, using replicate samples with N/2 observations (Andrews and Guggenberger 2009; Martínez-Muñoz and Suáreza 2010). 17
17 We employ subsampling (without replacement) rather than an m-bootstrap (with replacement), where m < N, as subsampling is valid under weaker assumptions (Horowitz 2001). Nonetheless, our Stata code allows for both options. Moreover, we set m = N/2 as it is unlikely that an optimal, data-driven choice of m is available (or computationally feasible in the present context). Politis, Romano, and Wolf (1999, p. 61) stated that "subsampling has some asymptotic validity across a broad range of choices for the subsample size" as long as m/N → 0 and m → ∞ as N → ∞. Martínez-Muñoz and Suáreza (2010, p. 143) note that setting m = N/2 is "typical."
3.5.2. Inference. A substantial body of literature exists on inference in partial identification models. Early work focused on confidence regions for the identified set (Stoye 2009). Imbens and Manski (2004) instead derived confidence regions for the partially identified parameter of interest. Here, inference is handled via subsampling and the Imbens-Manski (2004) correction to obtain 90% confidence intervals (CIs). 18 As with the bias correction, we set the size of the replicate samples to N/2. Some comments on this choice are necessary as there has been much recent work on inference in partially identified models; Bontemps and Magnac (2017), Canay and Shaikh (2017), and Ho and Rosen (2017) provided excellent reviews. For instance, intersection bounds, (conditional) moment inequalities, random set theory, and Bayesian approaches are also used for estimation and inference in partial identification models.
When a single parameter is being bounded, the endpoints of the bounds are asymptotically normal, and the sample is randomly drawn from an infinite population, then the approach in Imbens and Manski (2004) or Stoye (2009) is applicable and straightforward. When the endpoints are obtained via intersection bounds, as in the case of level set or monotonicity restrictions, methods such as those provided in Chernozhukov, Hong, and Tamer (2007) or Chernozhukov, Lee, and Rosen (2013) are available depending on whether the conditioning variable is discrete or continuous. However, we do not pursue such approaches here for two reasons. First, it is not clear how to convert all the restrictions we wish to consider into a set of (conditional) moments. Second, in the case of our level set or monotonicity restrictions, the method in Chernozhukov, Lee, and Rosen (2013) seems applicable if one is interested in bounds and confidence regions for the conditional transition probabilities, p * kl (x) and p * kl (u). However, as we are ultimately interested in bounds for the unconditional transition probabilities, p * kl , which are weighted averages of the bounds on the conditional transition probabilities, application of this method is not obvious.
18 Since a K × K transition matrix entails the estimation of K(K − 1) free parameters, one might be concerned with issues related to multiple hypothesis testing depending on the nature of the hypotheses being considered. While not considered here, our code does allow for a Bonferroni correction if one so chooses.
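For readers unfamiliar with the Imbens-Manski (2004) correction, the Python sketch below shows how the critical value adapts to the width of the bounds. It assumes the standard form of the interval, in which the critical value c solves Φ(c + Δ/max(se_L, se_U)) − Φ(−c) = α, with Δ the estimated width of the bounds and α the coverage level. The function name is hypothetical, and the standard errors are taken as given inputs; in the article they would come from subsampling, which is not reproduced here.

```python
from statistics import NormalDist


def imbens_manski_ci(lb, ub, se_lb, se_ub, level=0.90):
    """Confidence interval for a partially identified parameter.

    lb, ub       : estimated lower and upper bounds.
    se_lb, se_ub : standard errors of the bound estimates (both positive).
    level        : desired coverage, for example 0.90.
    Returns (ci_lower, ci_upper, critical_value).
    """
    norm = NormalDist()
    width = max(ub - lb, 0.0)
    scale = width / max(se_lb, se_ub)
    # Solve norm.cdf(c + scale) - norm.cdf(-c) = level by bisection;
    # the left-hand side is increasing in c.
    low, high = 0.0, 10.0
    for _ in range(200):
        mid = (low + high) / 2
        if norm.cdf(mid + scale) - norm.cdf(-mid) < level:
            low = mid
        else:
            high = mid
    c = (low + high) / 2
    return lb - c * se_lb, ub + c * se_ub, c


# Hypothetical standard errors applied to illustrative bounds.
print(imbens_manski_ci(0.315, 0.870, 0.02, 0.02))
```

When the bounds collapse to a point, the critical value approaches the usual two-sided value (about 1.645 at the 90% level); when the bounds are wide relative to the standard errors, it approaches the one-sided value (about 1.282).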

Data
To assess U.S. intragenerational mobility, we use panel data from the SIPP. Collected by the U.S. Census Bureau, SIPP is a rotating, nationally representative longitudinal survey of households. Begun in 1984, SIPP collects detailed income data as well as data on a host of other economic and demographic attributes. Households in the SIPP are surveyed over a multiyear period ranging from two and a half years to four years. Then, a new sample of households is drawn. The sample sizes range from approximately 14,000 to 52,000 households. Here, we use the 2004 and 2008 panels to examine mobility leading up to the Great Recession and during the early recovery period. For the 2004 panel, the initial period is November 2003 and the terminal period is October 2007. For the 2008 panel, the initial period is June 2008 and the terminal period is September 2012. Thus, we investigate household-level income dynamics over two separate four-year windows. We also assess mobility pooling the two panels.
For the analysis, the outcome variable is derived from total monthly household income (variable THTOTINC). This includes income from all household members and sources: labor market earnings, pensions, social security income, interest, dividends, and other income sources. When analyzing the 2 × 2 poverty matrix, we determine poverty status for each household in each period by comparing income with the SIPP-reported poverty threshold for the household (variable RHPOV). When analyzing general mobility, we estimate 3 × 3 matrices based on terciles of the income distribution in each period. However, to adjust for household composition, we construct three different measures of so-called equivalized household income. 19 Adjusting income for household size when drawing welfare or policy conclusions is known to be crucial (e.g., Chiappori 2016). In our baseline analysis, we use OECD equivalized household income (OECD 1982). 20 As alternatives, we also construct OECD-modified equivalized household income (Hagenaars, De Vos, and Zaidi 1994) and per capita household income. 21 Specifically, the OECD (OECD-modified) equivalence scale assigns a value of one to the first household member, 0.7 (0.5) to each additional adult, and 0.5 (0.3) to each child. In contrast, the per capita measure assigns a value of one to all household members. In the interest of brevity, results based on these alternative equivalence scales are relegated to Appendix E in the supplementary materials.
19 There is no need to adjust income for household size when estimating the poverty transition matrix since the poverty threshold already accounts for differences in household composition.
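The three equivalence scales can be written out explicitly. The Python sketch below follows the definitions in footnotes 20 and 21 (total household income Y divided by a scale N that depends on the number of adults A and children C) and the per capita rule stated in the text. The function names and the example household are hypothetical, and the code assumes every household has at least one adult.

```python
def oecd_scale(adults, children):
    """OECD scale: 1 for the first adult, 0.7 per additional adult, 0.5 per child."""
    return 1 + 0.7 * (adults - 1) + 0.5 * children


def oecd_modified_scale(adults, children):
    """OECD-modified scale: 1 for the first adult, 0.5 per additional adult, 0.3 per child."""
    return 1 + 0.5 * (adults - 1) + 0.3 * children


def per_capita_scale(adults, children):
    """Per capita scale: every household member counts as one."""
    return adults + children


def equivalized_income(income, adults, children, scale=oecd_scale):
    """Total household income divided by the chosen equivalence scale."""
    return income / scale(adults, children)


# Hypothetical household: monthly income of 4200, two adults, two children.
for scale in (oecd_scale, oecd_modified_scale, per_capita_scale):
    print(scale.__name__, round(equivalized_income(4200, 2, 2, scale), 2))
```

For this household the scales are 2.7, 2.1, and 4, so the same income maps to quite different equivalized values, which is why the alternative scales serve as a robustness check.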
In constructing our estimation sample, we use only the initial and terminal waves for each panel. The sample, by necessity, must be balanced. Households with any invalid or missing information on the relevant variables are excluded. Finally, we restrict the sample to households where the head is between 25 and 65 years old in the initial period. The sample size for the 2004 panel is 7,834 and for the 2008 panel is 16,006. 22 Summary statistics are presented in Table 1.
When assessing the two panels separately and imposing level set restrictions, we use age of the household head in the initial period. Specifically, we group households into 10-year age bins (25-34, . . . , 55-65) and impose the restriction that mobility is constant across adjacent bins. For example, we tighten the bounds on mobility for households where the head is, say, 35-44 by assuming that mobility is constant across households where the head is 25-34, 35-44, and 45-54. When pooling the two panels and imposing level set restrictions, we combine the age of household head restriction used in the case of separate panels with a stationarity assumption that mobility is constant across the two panels. For example, we tighten the bounds on mobility for households where the head is, say, 35-44 in the initial period of the 2004 panel by assuming that mobility is constant across households where the head is 25-34, 35-44, and 45-54 in the 2004 and 2008 panels. When imposing the monotonicity restrictions, we use the education of the household head in the initial period. Here, households are grouped into three bins (high school graduate and below, some college but less than a four-year degree, and at least a four-year college degree).
20 OECD equivalized household income for an individual household is defined as Y/N, where Y is total household income, N = 1 + 0.7(A − 1) + 0.5C, and A (C) is the total number of adults (children) in the household.
21 OECD-modified equivalized household income for an individual household is defined as Y/N, where Y is total household income, N = 1 + 0.5(A − 1) + 0.3C, and A (C) is the total number of adults (children) in the household.
22 The 2004 panel contains 10,503 households observed in the initial and terminal periods. Two observations are dropped due to negative household income. The remainder are dropped because the household head is outside the 25-65 year old age range. The 2008 panel contains 21,616 households observed in the initial and terminal periods. Eighty-eight observations are dropped due to negative or missing household income. The remainder are dropped because the household head is outside the 25-65 year old age range.
Panels II and III in Table 2 allow for misclassification, but impose arbitrary (Assumption 2(i)) and uniform (Assumption 2(ii)) errors, respectively. The assumed maximum misclassification rate is 10% (Q = 0.10). The rationale for this choice is discussed in Appendix D in the supplementary materials; we also explore sensitivity to this choice below. In Panel II the bounds are nearly uninformative on the mobility of households in poverty in the initial period in both SIPP panels. Thus, a relatively small amount of arbitrary misclassification results, in the absence of other information, in an inability to say anything about the four-year mobility rates of households initially in poverty. This arises because the maximum allowable misclassification rate is nearly as large as the fraction of the sample reported to be in poverty in the initial period. For households initially above the poverty line, more can be learned even in the presence of an arbitrary 10% misclassification rate as this includes the majority of the sample. First, the probability of remaining out of poverty four years later is at least 0.825 (0.808) in the first (second) SIPP panel. 25 Second, the probability of being in poverty despite not being in poverty four years prior is at most 0.175 (0.192) in the first (second) SIPP panel. For the second SIPP panel, this provides a useful upper bound on the transition rate into poverty around the time of the Great Recession. In Panel III the bounds are more informative. Thus, the assumption of uniform errors has some identifying power. Under this assumption, the probability of escaping poverty is at least 0.130 (0.142) in the first (second) SIPP panel. The probability of remaining out of poverty is at least 0.882 (0.865) in the first (second) SIPP panel. Conversely, the probability of being in poverty despite not being in poverty four years prior is at most 0.118 (0.135) in the first (second) SIPP panel. This is about a six percentage point decline relative to Panel II.
23 In all cases, we use 25 replicate samples for the subsampling bias correction and 100 replicate samples to construct 90% Imbens-Manski (2004) confidence intervals via subsampling using m = N/2 without replacement. For brevity, we do not report bounds based on all possible combinations of restrictions. Unreported results are available upon request.
24 Throughout the analysis, poverty status is measured only at the initial and terminal period. Thus, for example, "remaining in poverty" does not mean a household is necessarily in poverty continuously over the four-year period. For expositional purposes, however, we describe the results in terms of remaining in or out of poverty.
Finally, in both panels we are able to rule out the possibility (at the 90% confidence level) that no households move into poverty over the four-year period; the probability of transitioning from out of poverty in the initial period to in poverty in the terminal period is at least 0.005 (0.020) in the first (second) SIPP panel.
Panels IV and V in Table 2 add the assumption that misclassification is only in the upward direction (Assumption 3). This assumption has no identifying power on the transition probabilities for households above the poverty line in the initial period. However, it is useful in tightening the bounds on the transition probabilities for households in poverty in the initial period. With arbitrary and unidirectional misclassification (Assumptions 2(i) and 3), bounds on the probability of remaining in poverty four years later are [0.243, 1.000] in the first SIPP panel and [0.258, 1.000] in the second SIPP panel. Under uniform and unidirectional misclassification (Assumptions 2(ii) and 3), bounds on the probability of remaining in poverty four years later are further tightened to [0.315, 0.870] in the first SIPP panel and [0.331, 0.858] in the second SIPP panel. While the assumptions of uniform and unidirectional misclassification certainly tighten the bounds, the width of the bounds under the assumption of a 10% misclassification rate makes it clear that even relatively small amounts of misclassification add considerable uncertainty to estimates of poverty mobility in a (relatively) low poverty environment. That said, one still learns that the four-year poverty persistence rate is at least
Table 3 imposes different combinations of Assumptions 2-7. For the separate SIPP panels, level set restrictions are based on the age of the household head in the initial period. For the pooled panels, households are more likely to maintain the same poverty status over the four-year period than change status.
Within each panel, we present results for different types of misclassification errors under Assumptions 2-3.
Several findings stand out. First, under arbitrary and independent misclassification errors (Assumptions 2(i) and 6), Panels IA and IIA reveal that the level set and shape restrictions have little identifying power. There is some tightening of the lower bounds relative to Panel II in Table 2, but it is modest.
Second, under uniform and independent misclassification errors (Assumptions 2(ii) and 6), Panels IB and IIB reveal that the level set and shape restrictions have some identifying power.   (Table 3, Panel IIB). Thus, we also find under these assumptions that at least 1 in 5 impoverished households in the initial period are out of poverty four years later.
Third, adding the assumption of unidirectional misclassification errors has additional identifying power on the transition probabilities for households below the poverty line in the initial period. Now the bounds on the probability of remaining in poverty over the four-year period in the first SIPP panel are [0.345, 0.822] (Table 3, Panel IIC), implying that at least 3 in 10 impoverished households in the initial period remain in poverty four years later. Finally, adding the stationarity assumption modestly tightens the bounds further; bounds on the probability of remaining in poverty over the four-year period under uniform, independent, and unidirectional errors are [0.357, 0.823] (Table 3, Panel IIC). Furthermore, under the strongest set of assumptions (Table 3, Panel IIC, using the pooled panels), we obtain bounds on the probability of escaping poverty four years later of [0.177, 0.643] and on the probability of entering poverty of [0.030, 0.115]. The minimum probability of escaping poverty and the maximum probability of entering poverty are useful policy parameters, and the bounds appear narrow enough to be informative.
4.2.1.3. Monotonicity Restriction. Table 4 is similar to Table 3, but adds Assumption 8. The monotonicity restriction requires upward mobility to be weakly increasing in the household head's education level in the initial period. The monotonicity assumption has some identifying power. First, under arbitrary and independent misclassification errors (Assumptions 2(i) and 6), Panels IA and IIA reveal wide bounds, but now exclude the endpoints of zero and one in some instances.
Second, under our strongest set of assumptions, bounds on the probability of remaining in poverty over the four-year period are [0.357, 0.723] (
Under the baseline assumption of classification-preserving measurement error (Table 5, Panel I), the conditional staying probabilities in the first (second) SIPP panel are 0.683, 0.533, and 0.692 (0.685, 0.538, and 0.685) for terciles 1, 2, and 3, respectively. Thus, the observed four-year conditional staying probabilities do not vary much across the two panels. Furthermore, we find that larger movements in the income distribution are less likely than smaller movements. For example, pooling the two panels together, the probability of moving from the first to second tercile is 0.245 and the first to third tercile is 0.071. Similarly, the probability of moving from the third to second tercile is 0.217 and the third to first tercile is 0.095.
4.2.2.1. Misclassification Assumptions. Panels II and III in Table 5 allow for misclassification, but impose Assumption 2(i) and 2(ii), respectively. The assumed maximum misclassification rate is 20% (Q = 0.20). The rationale for this choice is discussed in Appendix D in the supplementary materials; we also explore sensitivity to this choice below. Under arbitrary misclassification (Assumption 2(i)), the width of the bounds is 0.6 (= KQ) unless the bounds include one of the boundaries. Under uniform misclassification (Assumption 2(ii)), the width is 0.4 (= 2Q) unless the bounds hit one of the boundaries. Thus, the bounds are guaranteed to be at least somewhat informative only in the latter case. Uniform misclassification is reasonable if misclassification is equally likely in the upward and downward directions. With mean-reverting measurement error in income, this may be plausible.
In the first SIPP panel, we find that the bounds on the conditional staying probabilities are
Bounds on the off-diagonal elements, while generally lower as one moves further from the diagonal, cannot rule out the possibility that large movements in the income distribution are more likely than smaller movements (conditional on changing terciles). Moreover, bounds on the off-diagonal elements provide a useful upper bound on the probability of large income changes. For example, the probability of moving from tercile 1 to tercile 3 (tercile 3 to tercile 1) in the first SIPP panel under uniform misclassification is no greater than 0.271 (0.287).
4.2.2.2. Level Set Restrictions. Table 6 allows for misclassification, but imposes different combinations of Assumptions 2-7. 26 Because of the similarity of the results across the two SIPP panels in Table 5, we focus on the results for the pooled sample where the stationarity restriction (Assumption 7) is imposed. In Panel I, the level set restrictions are not combined with shape restrictions (Assumption 4). In Panel II, shape restrictions are imposed on top of the level set restrictions. This assumption corresponds to the restriction that households are more likely to make smaller movements in the income distribution than larger movements.
Several findings stand out. First, under arbitrary and independent misclassification errors (Assumptions 2(i) and 6), Panels IA and IIA reveal that the level set restrictions have some identifying power. The shape restrictions do not add new information. As stated previously, the bounds under arbitrary errors in Table 5 have a width of 0.6 unless the boundary comes into play. After imposing the level set restrictions, the width of the bounds on the conditional staying probabilities falls to around 0.5. Thus, while still wide, there is some information in the level set restrictions.
Second, under uniform and independent misclassification errors (Assumptions 2(ii) and 6), Panels IB and IIB reveal that the level set restrictions continue to have some identifying power. The shape restrictions continue to add no new information. The bounds under uniform errors in Table 5 have a width of 0.4 unless the boundary comes into play. After imposing the level set restrictions, the width of the bounds on the conditional staying probabilities falls to around 0.3. For example, bounds on the probability of remaining in the bottom tercile over the four-year period in the pooled sample under uniform errors alone are [0.485, 0.885] (
Now, if Q is increased but the remaining assumptions are maintained, suppose the bounds for p * kl (x j ), j = 1, 2, widen to [0.10, 0.30] and [0.25, 0.45], respectively. The level set restrictions now yield identical, tighter bounds on p * kl (x j ), j = 1, 2, given by [0.25, 0.30]. Thus, the increase in Q allows the level set restrictions to now be plausible, leading to significantly tighter bounds. The tighter bounds reflect not just the higher Q, but also the ability to impose the level set restrictions.
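The arithmetic in this example amounts to intersecting the conditional bounds across the values of x in the level set. A minimal Python sketch, assuming the intersection form (largest lower bound, smallest upper bound) that reproduces the numbers above; the function name is hypothetical, and the non-overlapping pair in the second call is made up purely to show the case in which the restriction cannot hold.

```python
def level_set_bounds(bounds):
    """Intersect bounds on a probability assumed constant across a level set.

    bounds : list of (lower, upper) pairs, one per conditioning value.
    Returns the common (lower, upper) pair, or None if the intervals do not
    overlap, in which case the level set restriction is rejected by the bounds.
    """
    lower = max(lb for lb, _ in bounds)
    upper = min(ub for _, ub in bounds)
    if lower > upper:
        return None
    return lower, upper


# Bounds from the example in the text after Q is increased.
print(level_set_bounds([(0.10, 0.30), (0.25, 0.45)]))

# Hypothetical narrower bounds that do not overlap.
print(level_set_bounds([(0.15, 0.20), (0.30, 0.40)]))
```

The first call returns the common interval [0.25, 0.30] from the text; the second returns None because no single value lies in both intervals.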

CONCLUSION
That self-reported income contains complex, nonclassical measurement error is a well-established fact. That administrative data on income is imperfect is also relatively incontrovertible. As such, addressing measurement error in the study of income mobility should no longer be optional. To that end, several recent attempts to address measurement error have been put forth. Here, we offer a new and complementary approach based on the partial identification of transition matrices.
Among other advantages, our approach is transparent, as the assumptions used to tighten the bounds are easily understood and may be imposed in any combination depending on the particular context and the beliefs of the researcher. Moreover, our approach only requires data at two points in time. Finally, our approach extends easily to applications other than income. The primary drawback to our approach is the lack of point identification. Consequently, our approach should be viewed as a complement to existing approaches that produce point estimates under more stringent (or, at least, alternative) identifying assumptions. Using data from the SIPP, we show that relatively small amounts of measurement error lead to bounds that can be quite wide in the absence of other information or restrictions. However, the restrictions we consider contain significant identifying power. We are hopeful that future work will consider additional restrictions that may be used to further tighten the bounds on transition probabilities, as well as bounds on additional summary measures of mobility derived from the transition matrix.

SUPPLEMENTARY MATERIALS
Appendix A: Literature review. Appendix B: Misclassification probabilities. Appendix C: Proofs of propositions. Appendix D: Simulated misclassification rates. Appendix E: Supplemental tables.