Does Class Size Matter in Postgraduate Education?

The paper examines the impact of class size on postgraduate grades using administrative data from one of the largest Schools of a Russell Group university in the UK. As well as estimating fixed effects models on the population of postgraduate candidates in the School, we exploit a policy change aimed at reducing class size to implement a regression discontinuity design (RDD). We find that larger classes adversely affect grades overall, and that the policy aimed at reducing class size improves grades. Our findings are robust to alternative specifications and are supported by the validity tests we conducted.


Introduction and Motivation
The issue of class-size and its effect on student learning has been extensively studied in primary and secondary school settings. The weight of the evidence to date supports the view that smaller classes promote student learning (see Angrist and Lavy 1999 for an extensive review). In the context of tertiary education, where independent learning is a major part of the education landscape, class-size may not be as important. However, this does not necessarily mean that the question of class-size is altogether unimportant in this context, given the resource implications of larger class-sizes, including the ease with which students access their professors. Such concerns may be particularly valid given two important recent developments in the higher education sector in the UK and elsewhere in the OECD. First, there has been a significant rise in tertiary education, driven by supply side policies in these countries, as Bandiera et al. (2010) noted. Secondly, there have been changes in the higher education funding environment, particularly following the great recession of 2008, with a view to sustainable financing of higher education. The Government stopped paying teaching grants to universities in September 2012, and the cap on tuition fees has been raised significantly to make up for funding shortfalls (Crawford et al. 2014). The changes mean that most universities now source a significant part of their funding from tuition fees. 1 This has made the postgraduate sector, particularly the more lucrative international postgraduate market, a lot more attractive to universities generally and, in particular, to the Russell Group universities given their established international reputation.
There is extensive literature linking class-size to test scores in schools (Krueger, 1999; Krueger and Whitmore, 2001; Angrist and Lavy, 1999; Browning and Heinesen, 2007; Leuven et al., 2008; Hanushek, 1979; Hoxby, 2000; Case and Deaton, 1999). However, there is a dearth of evidence relating to tertiary education. De Paola and Scoppa (2011), Monks and Schmidt (2011), Bandiera et al. (2010), Kokkelenberg et al. (2008) and De Paola et al. (2013) are some of the few recent studies that found class-size to have a negative effect on college scholastic outcomes, reaffirming the earlier finding of Gibbs et al. (1996) that students in larger classes perform less well in tertiary education. In a recent study, Huxley et al. (2018) reported significant variation in teaching intensity across higher education in the UK, which they attribute to variation in class-size. Examining the link between tertiary-level class-size and test scores is therefore a worthy exercise, particularly given the on-going debate on the future of higher education funding and the recent recommendation for reducing the cap on tuition fees. 2 In this paper, we provide evidence of the link between class-size and postgraduate grades.
1 Barr and Turner (2013) dwell on these in the context of the US, which they describe as a growing conflict between expanding enrolments in postsecondary education and contracting public budget support. The independent Browne Review also dealt with the issue of sustainable HE funding in the UK, with the recommendation that more of the burden of funding HE be placed on graduates.
Similarly to Bandiera et al. (2010), we first estimate this relationship by means of student fixed effects regressions. Estimates from this approach, however, may still be biased if students self-select into modules for reasons that are unobservable to the researcher. In an ideal experiment, one would control for such unobservables by randomly allocating students into modules of different sizes. The empirical design used in this paper mimics this ideal experiment by exploiting a recent policy change, aimed at maintaining a high standard of teaching, that envisaged double teaching (double-up, hereinafter) on the basis of the number of students enrolled on a module.
Specifically, module convenors with enrolment size above a certain level were expected to split students into two groups and double teach the module content as a result of the policy.
Our use of the discontinuous module enrolment function and the policy change to examine the link between class-size and postgraduate grades is likely to yield superior instrumental variables estimates. 3 We use administrative data from a postgraduate programme of one of the largest Schools of a Russell Group public university in the UK to examine the impact of the class-size policy on postgraduate grades in a Regression Discontinuity Design (RDD). Our empirical results echo earlier research and suggest that class size does indeed matter significantly for postgraduate education. Specifically, our RDD estimates suggest that students exposed to the double-up policy earned significantly higher grades vis-à-vis their counterparts who were not affected by the double teaching policy. We find these results to be strongest for British students.
Moreover, our results suggest that the double-up policy significantly reduced the probability of students failing in their module. Importantly, our results are robust to a variety of specifications. The RDD estimates are insensitive to the inclusion (or exclusion) of various control variables and different functional form specifications of the running variable. More importantly, we demonstrate that our results are unlikely to be driven by discontinuities in pre-intervention individual characteristics, or endogenous sorting around the threshold.
education, but these developments are likely to lend some prominence to existing concerns about increasing student-to-staff ratios and worse student performance in large class settings. 4 The same issues have become increasingly relevant in tertiary education, particularly given the current funding climate in higher education institutions. Examining the link between postgraduate class-size and postgraduate scholastic outcomes, which is the aim of this paper, is therefore of significant policy interest both for the higher education sector and for postgraduate students.
The remainder of the paper is organised as follows. Section 2 describes the data used and the institutional setting. Section 3 outlines the identification strategy used and discusses the results. Section 4 presents the sensitivity analysis before the final Section concludes the paper.

Data and Institutional Background
We use administrative data covering the population of all enrolees in the postgraduate (PG) programme of one of the largest Schools of a Russell Group public university in the UK for the academic year 2017/18. The PG programme has some 16 specialities in total, and recruitment to the programme depends on applicants' prior academic achievement at undergraduate level (typically a strong 2.1 or equivalent), language proficiency in the case of international applicants, and candidates' character references. Candidates join one of 15 speciality areas once enrolled. The programme requires students to attend a set of core and elective modules. A candidate's performance in modular final examinations leads to the award of modular scores that range between 0 and 100. 5 The scores obtained are typically averaged across programme modules to yield an overall postgraduate grade, which is translated into four distinct degree classifications: distinction (>70%), merit/credit (60-70%), pass (50-60%) and fail (<50%). The School's success in attracting an increasing number of postgraduate applicants in recent years, and the desire to maintain a high standard of teaching, prompted the School to pursue a policy of double teaching (double-up hereinafter) since 2008, depending on module enrolment size. The policy stipulates that module convenors with enrolment size above a certain level split students into two groups and double teach. 6 The policy has evolved over the years, with varying cut-off points for triggering the double-up. Since 2016 the recommended cut-off point has been 110 students per module, so that a module with enrolment in excess of 110 students becomes a candidate for double teaching, with the students enrolled on the module split into two (or more) smaller classes.
The study sample includes 987 full-time MSc students, who could attend up to 10 different modules during the academic year, yielding a total of 7,696 student-module observations.
The data includes information on students' modular grades, the number of students enrolled in each module and a set of student attributes including age, gender, and nationality. Table 1 reports basic summary statistics, which indicate that the typical postgraduate student is 23 years of age and that 70% of students are female. 7 Academically, students in the sample achieve an overall mark of 61%, on average, with 20.1%, 42.5% and 34.1% of the students achieving distinction, merit and pass degree classifications, respectively, while 3.4% failed.
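The mapping from module scores to degree classifications described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the unweighted average ignores any credit weighting the School may apply, and the treatment of marks that fall exactly on 50, 60 or 70 is an assumption, since the bands are stated only as >70, 60-70, 50-60 and <50.

```python
def average_mark(module_scores):
    """Unweighted average of module scores (credit weighting is not modelled)."""
    return sum(module_scores) / len(module_scores)


def classify(overall_mark):
    """Map an overall mark on the 0-100 scale to a degree classification.

    Assumption: marks of exactly 50 and 60 fall in the higher band,
    while exactly 70 counts as merit because distinction is stated as >70.
    """
    if not 0 <= overall_mark <= 100:
        raise ValueError("mark must lie between 0 and 100")
    if overall_mark > 70:
        return "distinction"
    if overall_mark >= 60:
        return "merit"
    if overall_mark >= 50:
        return "pass"
    return "fail"


if __name__ == "__main__":
    scores = [72, 68, 75]  # hypothetical module scores for one student
    print(classify(average_mark(scores)))
```

A fail indicator of the kind used later as an outcome is then simply `classify(mark) == "fail"`.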

Baseline Analysis
The paper first conducts a baseline analysis of the link between class-size and grades, following Hanushek (1979) and using the following panel data model:

$$y_{i,m} = \alpha_i + \beta s_m + X_{i,m}'\gamma + \varepsilon_{i,m} \qquad (1)$$

where $y_{i,m}$ represents the standardised module (test) score of student $i$ on module $m$; $\alpha_i$ is a student-specific fixed effect; $s_m$ is the number of students enrolled in module $m$, and $\beta$ is the main parameter of interest, as it measures the effect of class-size on grades; $X_{i,m}'$ is a set of control variables that might affect students' outcomes, which include student characteristics, such as age, gender, and country of origin, as well as module characteristics, such as whether the module was core or optional, or whether the module was taught in the spring term. Finally, $\varepsilon_{i,m}$ is a random error term.
6 The policy was not compulsory, however, and there are some exceptions to it, for example depending on classroom size, where some modules with large lecture halls may opt out of the double-up.
7 These figures are comparable to the national average postgraduate student characteristics in England in 2016/17 as compiled in the Higher Education Student Statistics (https://www.hesa.ac.uk/news/11-01-2018/sfr247-higher-education-student-statistics/numbers).
[Table 2 about here]
Table 2 reports the descriptive results based on Equation (1). Columns (1) and (2) report estimated coefficients from a pooled regression model, while columns (3) and (4) [...] The results in the table reveal that female students and British students tend to achieve better modular scores than their male and international counterparts, respectively. The results also suggest that, on average, students tend to perform better in the spring term and in optional modules, which is intuitive given that students are likely to enrol in optional modules in which they expect to perform better.
For the coefficient estimated from Equation (1) to be reliable, the assumption $E[\varepsilon_{i,m} \mid s_m, X_{i,m}', \alpha_i] = 0$ must be satisfied. This assumption entails that, conditional on characteristics, $X_{i,m}'$, and student-specific fixed effects, $\alpha_i$, students select into modules of different sizes randomly. This may be a strong assumption if students were to sort into modules of different sizes according to their preferences and/or their idiosyncratic gains. For example, as noted by Bandiera et al. (2010), students may choose modules of smaller sizes to maximise their time spent with professors and minimise the effort they put into a specific subject.
This type of sorting would lead to a downward bias in the estimated coefficients. In this study we tackle this potential issue by exploiting the double-up policy mentioned above in a regression discontinuity design framework.
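As a reference point for the baseline specification in Equation (1), the following is a minimal sketch of the within (fixed effects) estimator with a single regressor, run on simulated data. Everything here is an assumption made for illustration: the function names, the simulated parameter values, and the way abler students are made to choose larger modules. It shows only that demeaning within student removes a time-invariant student effect; it does not address sorting on module-specific gains, which is the concern that motivates the RDD.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a one-regressor OLS of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(ids, x, y):
    """Fixed-effects slope: demean x and y within each student, then run OLS."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for i, a, b in zip(ids, x, y):
        t = totals[i]
        t[0] += a
        t[1] += b
        t[2] += 1
    xd = [a - totals[i][0] / totals[i][2] for i, a in zip(ids, x)]
    yd = [b - totals[i][1] / totals[i][2] for i, b in zip(ids, y)]
    return ols_slope(xd, yd)


def simulate_panel(n_students=500, n_modules=8, beta=-0.01, seed=0):
    """Hypothetical student-module panel with an unobserved student effect."""
    rng = random.Random(seed)
    ids, sizes, scores = [], [], []
    for i in range(n_students):
        ability = rng.gauss(0, 1)  # unobserved, constant within student
        for _ in range(n_modules):
            # abler students are made to pick larger modules (an assumption)
            size = rng.randint(20, 150) + 20 * ability
            score = ability + beta * size + rng.gauss(0, 0.5)
            ids.append(i)
            sizes.append(size)
            scores.append(score)
    return ids, sizes, scores


if __name__ == "__main__":
    ids, sizes, scores = simulate_panel()
    print("pooled OLS slope:", ols_slope(sizes, scores))
    print("within slope:    ", within_slope(ids, sizes, scores))
```

In this simulated setting the pooled slope is pulled away from the true coefficient by the correlation between ability and module size, while the within slope recovers it closely.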

Regression Discontinuity Design
We use a Regression Discontinuity Design (RDD) to estimate the effect of the double-up on modular grade. The RDD was first introduced by Thistlethwaite and Campbell (1960) and then formalised by Hahn et al. (2001), who derived the necessary conditions for identification of causal effects. RDDs are becoming increasingly popular in empirical studies given that the assumptions needed for identification of causal effects are quite weak. The defining feature of this class of models is that the probability of receiving the treatment changes discontinuously as a function of an assignment variable being above or below a certain cut-off point. The underlying idea of an RDD is that, as in a randomised experiment, for individuals just above and below the pre-identified cut-off point, assignment to treatment is as good as random. Drawing from Angrist and Lavy (1999), in this case we exploit the fact that the probability of receiving double-up changes discontinuously depending on the enrolment number in a specific module, which we denote by $s_m$, being above a certain cut-off point $s_0$.
There are two types of RDD: the sharp and the fuzzy design (see Trochim, 1984). In the sharp design, treatment status depends deterministically on the running variable being above or below the cut-off point $s_0$. In contrast, in the fuzzy design the probability of receiving the treatment is known to be discontinuous at the cut-off point $s_0$, but it is not a deterministic function of $s_m$, and selection on unobservables may therefore still be an issue. As noted earlier, the double-up policy might not have been implemented in some cases even when $s_m \geq 110$. This makes it essential that we implement the fuzzy RDD.
Formally, let $s_m$ be the (discrete) running variable of enrolment size on module $m$, and let the cut-off point of interest be $s_0 = 110$, such that for students enrolled in modules with $s_m \geq s_0$ the probability of being exposed to the double-up jumps from zero to positive. Further, define the assignment-to-double-up rule $T_m = 1(s_m \geq s_0)$, and let $D_{i,m}$ be the double-up indicator, which identifies students in modules where double-up took place. Then we can write:

$$P(D_{i,m} = 1 \mid s_m) = \begin{cases} g_1(s_m) & \text{if } s_m \geq s_0 \\ g_0(s_m) & \text{if } s_m < s_0 \end{cases} \qquad (2)$$

where, due to the discontinuity at the cut-off point, $g_1(s_0) \neq g_0(s_0)$. 9 In the spirit of Hahn et al. (2001), a regression framework for a fuzzy RDD is offered by the instrumental variable approach so that: [...]
[...] and without the polynomial term, $f(\cdot)$. We chose the parametric approach to preserve sample size, but results are robust when performing the non-parametric approach (see next Section).
11 I.e., as in an experimental framework where all subjects who receive treatment are compliers (see Bloom, 1984). Here we do not have always-takers (see the extensive discussion of this in Angrist and Pischke (2008), pp. 161-166).
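For orientation, a textbook way of writing a parametric fuzzy RDD as a two-equation instrumental variables system is sketched below. The notation is assumed for this sketch and may differ from the paper's own Equation (3): $y_{i,m}$ is the module score, $D_{i,m}$ the double-up indicator, $T_m = 1(s_m \geq s_0)$ the assignment rule used as the excluded instrument, $f(\cdot)$ a polynomial in the running variable $s_m$, $X_{i,m}$ the controls, and $\mu_i$, $\alpha_i$ student fixed effects.

```latex
% Textbook 2SLS form of a fuzzy RDD; notation assumed, not taken from the paper.
\begin{align}
D_{i,m} &= \mu_i + \pi\, T_m + f(s_m) + X_{i,m}'\delta + \nu_{i,m}
  && \text{(first stage)} \\
y_{i,m} &= \alpha_i + \rho\, D_{i,m} + f(s_m) + X_{i,m}'\gamma + \varepsilon_{i,m}
  && \text{(outcome equation)}
\end{align}
```

Under this formulation $\pi$ measures the jump in the probability of double-up at the cut-off, and $\rho$ is the effect of double-up on grades for students whose exposure is determined by the assignment rule.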
We next test the robustness of these findings in a regression framework, as specified in Equation (3), controlling for a number of other confounding factors. Table 3 reports results from this exercise, which represent first stage estimates of the effect of class size on the probability of being exposed to the double-up policy.
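The logic of the first stage and of the resulting instrumental variables estimate can be illustrated with a local Wald estimator on simulated data. This is a simplified sketch and not the specification estimated in the paper: it compares means within a window around the cut-off rather than fitting polynomials, the outcome is simulated as flat in enrolment size so that the mean comparison is unbiased, and all names and parameter values are hypothetical. Take-up depends on an unobservable that also raises grades, and nobody below the cut-off is treated.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate_fuzzy(n=20000, cutoff=110, effect=3.0, seed=1):
    """Hypothetical data with one-sided non-compliance (no always-takers)."""
    rng = random.Random(seed)
    size, treated, outcome = [], [], []
    for _ in range(n):
        s = rng.randint(80, 139)      # enrolment of the module attended
        u = rng.gauss(0, 1)           # unobservable driving take-up and grades
        assigned = s >= cutoff        # assignment rule at the cut-off
        d = 1 if assigned and u + rng.gauss(0, 1) > 0 else 0
        y = 60 + effect * d + 2 * u + rng.gauss(0, 1)
        size.append(s)
        treated.append(d)
        outcome.append(y)
    return size, treated, outcome


def wald_rdd(size, treated, outcome, cutoff, bandwidth):
    """Return (first-stage jump, Wald estimate) within the window."""
    below = [i for i, s in enumerate(size) if cutoff - bandwidth <= s < cutoff]
    above = [i for i, s in enumerate(size) if cutoff <= s < cutoff + bandwidth]
    first_stage = mean(treated[i] for i in above) - mean(treated[i] for i in below)
    reduced_form = mean(outcome[i] for i in above) - mean(outcome[i] for i in below)
    return first_stage, reduced_form / first_stage


def naive_gap(treated, outcome):
    """Raw treated-minus-untreated difference in mean outcomes."""
    return (mean(y for d, y in zip(treated, outcome) if d == 1)
            - mean(y for d, y in zip(treated, outcome) if d == 0))


if __name__ == "__main__":
    size, treated, outcome = simulate_fuzzy()
    fs, late = wald_rdd(size, treated, outcome, cutoff=110, bandwidth=30)
    print("first-stage jump:", fs)
    print("Wald estimate:   ", late)
    print("naive gap:       ", naive_gap(treated, outcome))
```

In this simulation the naive treated-versus-untreated gap overstates the effect because take-up is selected on the unobservable, whereas scaling the jump in outcomes by the jump in exposure recovers the simulated effect.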
[Table 3 about here]
The results confirm the graphical intuition of Figure 1 and show that individuals in classes of size 110 and above, who were assigned to double teaching, were significantly more likely to have actually been exposed to the policy than their counterparts who were not. Table 4 reports RDD estimates of the effect of the double-up policy on grades, using the (partially fuzzy) RDD approach. Our preferred specification accounts for student fixed effects, thus controlling for predetermined unobservable characteristics such as academic ability and family background. Column (1) reports results from the basic specification with a second-order polynomial term and no additional covariates. Column (2) includes the full set of covariates as explained above, while columns (3) and (4) additionally include third- and fourth-degree polynomial terms, respectively, to test the robustness of the results. The standard errors of the RDD estimates in Table 4 and below are clustered by the running variable, as suggested by Lee and Card (2008). 12 The results show a significant and positive effect of the double-up policy on students' academic performance. Specifically, the estimated coefficients in column (2) suggest that students exposed to the double-up policy achieved, on average, significantly higher modular grades than did their counterparts without double-up. These estimates are robust to the inclusion of various controls and different functional form choices. 13
12 The clustering approach of Lee and Card (2008) is the standard approach to date, and involves treating observations with similar/comparable values of the running variable around the cut-off as members of the same cluster. However, in their recent paper Kolesár and Rothe (2018) have recommended against this approach.
13 In Appendix Table A.2, we also report RDD estimates for a number of falsification tests, which show that no significant effect is found for different cut-off points.
[Table 4 about here]
To check the robustness of our results, we estimated the non-parametric RDD as specified in Equation (3), but focusing only on the sub-sample of subjects within arbitrarily small windows around the cut-off point. Results from this analysis are reported in Table 5, which broadly confirm that our results are robust. 14
[Table 5 about here]
In Table 6, we report heterogeneous effects of the double-up policy by disaggregating the sample into gender and broad nationality categories. The results reveal hardly any gender-specific difference in the effects of the double-up policy. On the other hand, the policy appears to have a differential effect linked to the broad nationality category. Specifically, the students who benefitted from the double-up policy are British students. This may suggest that international students, given their relative inexperience of the higher education culture and relative lack of fluency in the medium of instruction, do not gain from the double-up policy as their British counterparts do.
[Table 6 about here]
Finally, in Table 7 we investigate the potential heterogeneous effects of class-size, and report RDD estimates of the effects of the double-up policy separately by degree classification. The results indicate that the policy significantly decreased the probability that students achieved the bottom degree classification, as can be gleaned from column (4).
Overall, the results presented provide compelling evidence that class size does matter, and that students who were exposed to the double-up policy achieved significantly higher grades.
In the section that follows, we present a series of tests and checks to confirm the validity of these findings.

Validity
The main assumption that needs to be satisfied for our identification strategy to produce unbiased estimates is the continuity assumption. Borrowing from the jargon of the treatment effects literature, let $\{Y_1, Y_0\}$ be the potential outcomes for individual $i$ in case of treatment and in the absence of treatment, respectively. Then, the continuity assumption that needs to be satisfied for the validity of a partially fuzzy RDD can be formally written as follows:

$$E[Y_0 \mid s_0^{+}] = E[Y_0 \mid s_0^{-}]$$

where $s_0^{+}$ and $s_0^{-}$ represent, respectively, students just above and below the cut-off point, $s_0$. In our case, this assumption entails that students enrolled in modules just above and just below the pre-identified cut-off level are identical in every respect, both in terms of observables and unobservables, and differ only in the probability of being exposed to the double-up policy. A direct way to assess the validity of this assumption is to examine whether pre-intervention variables change discontinuously around the cut-off point. Figure 3 depicts the results from a set of local polynomial smoothing (LPS) regressions of variables such as age, gender, and country of origin that could not have been affected by the policy.
We do not observe a statistically significant discontinuity at the cut-off point for any of these variables.
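A simple numerical counterpart to this graphical check is to compare the mean of a pre-intervention covariate just below and just above the cut-off. The sketch below swaps in a plain difference in means with a Welch-type t-statistic in place of the local polynomial smoothing used for Figure 3; the function name, the window width and the toy data are all hypothetical.

```python
import math
import statistics


def balance_check(size, covariate, cutoff, bandwidth):
    """Difference in covariate means (above minus below) and its t-statistic.

    Observations with cutoff - bandwidth <= size < cutoff form the 'below'
    group, and those with cutoff <= size < cutoff + bandwidth the 'above' group.
    """
    below = [c for s, c in zip(size, covariate) if cutoff - bandwidth <= s < cutoff]
    above = [c for s, c in zip(size, covariate) if cutoff <= s < cutoff + bandwidth]
    diff = statistics.mean(above) - statistics.mean(below)
    se = math.sqrt(statistics.variance(above) / len(above)
                   + statistics.variance(below) / len(below))
    return diff, diff / se


if __name__ == "__main__":
    # toy data: module enrolment and student age (hypothetical values)
    size = [102, 104, 106, 108, 110, 112, 114, 116]
    age = [22, 23, 24, 23, 22, 23, 24, 23]
    print(balance_check(size, age, cutoff=110, bandwidth=10))
```

A t-statistic close to zero for each pre-intervention covariate is consistent with the continuity assumption, although it cannot prove it, since unobservables are by definition not testable in this way.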
[Figure 3 about here]
As noted by McCrary (2008), however, the continuity assumption may be invalidated in cases where the treatment assignment rule is public knowledge. Specifically, if people knew about the discontinuous nature of the assignment-to-treatment mechanism, those who expect to benefit from the double-up could manipulate the running variable, $s_m$, in order to receive the intervention, and selection bias would still be an issue. Given the nature of the double-up policy, it is quite unlikely that this would happen in our setting. However, in order to dispel any potential concerns about students sorting around the cut-off, we also implemented the "donut hole" approach suggested by Barreca et al. (2016). 15 The main idea behind this approach is that units closest to the cut-off are those most likely to have engaged in manipulation. Consequently, excluding such units from the analysis would eliminate any potential concern. In Table 8 [...] around the cut-off. As the results reported in Table 8 [...]
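The donut hole idea described above amounts to a sample restriction applied before re-estimating the RDD. A minimal sketch follows, assuming a hypothetical record layout with a `size` field for module enrolment and a symmetric hole around the cut-off; the hole width actually used for Table 8 is not stated here, so the value below is purely illustrative.

```python
def donut_sample(rows, cutoff, hole):
    """Keep only observations more than `hole` students away from the cut-off."""
    return [row for row in rows if abs(row["size"] - cutoff) > hole]


if __name__ == "__main__":
    # hypothetical student-module records
    rows = [{"size": s, "score": 60 + (s >= 110)}
            for s in (100, 107, 109, 110, 112, 115)]
    kept = donut_sample(rows, cutoff=110, hole=2)
    print([row["size"] for row in kept])
```

The restricted sample can then be passed to whichever RDD estimator is being used; if the estimates change little, sorting immediately around the cut-off is unlikely to be driving them.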

Conclusion
The paper examined the link between class-size and postgraduate grades using administrative data covering the population of candidates in one of the largest Schools of a Russell Group public university in the UK. As well as estimating fixed effects regressions, we exploited a policy change aimed at reducing class-size to construct instrumental variables estimates of the impact of class-size on postgraduate grades using a regression discontinuity design (RDD). We found that class-size impacts modular grades adversely, confirming the well-established link between class-size and student performance. Consistent with this, the policy designed to reduce class-size is found to have a significant positive impact on postgraduate modular grades. Importantly, we also found that the policy has reduced the probability that postgraduate students fail in their programme of study.
As noted earlier, supply side policies have led to a significant increase in tertiary level education overall. At the same time, the changing funding environment in higher education institutions has made student fees a vital part of higher education funding, particularly at the PG level. This has renewed institutions' drive to recruit more students, further reinforcing the effects of the supply side policies. In turn, this has revived some of the pre-existing concerns regarding student-to-staff ratios and the quality of tertiary education.
Against this background, there has been a dearth of evidence on the effect of class-size on postgraduate grades, a gap that this paper has attempted to help fill.
As noted earlier, it is apparent that independent learning is an integral part of the tertiary education landscape. However, the current funding climate within the higher education sector may make the question of class-size and its impact on student outcomes in that context all the more important. The recent Augar Review into post-18 education and funding has, for example, recommended reducing the cap on tuition fees. Faced with the prospect of a reduction in the cap on tuition fees, higher education institutions may increase student intake to make up for funding shortfalls. If so, this is likely to have implications for class-size and student performance in tertiary education. The recent government undertakings (DFE 2018) to understand the role of contact hours and class-size highlight such concerns and underscore the importance of contact hours and class-size in determining student performance. Given these, the findings in this paper regarding double teaching are likely to be informative for policy makers, higher education institutions and students alike. As Huxley et al. (2017) noted, however, double teaching may not always be beneficial if doing so were to compromise teaching quality in some sense. Future work may usefully examine this issue in a wider context than has been done in this paper.
Note: The table reports summary statistics of the sample of interest.
Note: The table presents baseline estimates of the effect of class size on student overall mark. The difference in sample size is due to the fact that for this exercise we exclude students who were exposed to the double-up teaching. Robust standard errors in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Figure 1: Graphical Analysis - First Stage. Note: The figure shows local polynomial estimates of the probability of being exposed to the double-up teaching as a function of module enrolment size.
Note: The figure shows local polynomial estimates of pre-treatment variables as a function of enrolment size.
Note: The figure shows the number of students registered at the University over the last five years.
Note: Results are conditional on a vector of covariates as shown in Table 3. Robust standard errors in parentheses are clustered by unique values of the running variable. * p < 0.1, ** p < 0.05, *** p < 0.01.