Measures of Campaign Negativity: Comparing Approaches and Eliminating Partisan Bias

We compare measures of the tone of parties’ campaigns in the 2015 General Election in England, based on, respectively, coverage of parties’ campaigns in six national newspapers, citizens’ perceptions, and expert judgments. It is the most extensive study of such measurements outside the United States and one of very few to include expert judgments. We find that citizen perceptions and expert judgments are heavily affected by partisan bias. We show how these biases can be eliminated with a regression-based procedure. After such adjustment, seven of the eight resulting measures of parties’ campaign tone (five based on newspapers, one on citizen perceptions, and one on expert judgments) are strongly correlated. The eighth measure (based on one of the newspapers) depicts the tone of parties’ campaigns very differently owing to different criteria of what to cover in a campaign. Each of the three kinds of empirical information is adequate as a basis for measuring parties’ campaign tone, but adjustment for partisan biases is essential for perception and judgment data. Common apprehensions about the “subjectivity” of citizen perceptions are not justified, while expert judgments are equally useful, as long as sufficient information is available to eliminate their partisan bias.


Introduction
The extensive literature on negative campaigning and its effects on voters' attitudes and behavior contains very few studies that assess the quality and validity of campaign tone measures. Notable exceptions are Ridout and Franz (2008), Sigelman and Kugler (2003), Lipsitz and Geer (2017), Allen and Stevens (2015), Walter and Vliegenthart (2010), and Gélineau and Blais (2015). Yet such assessments are imperative in view of widespread scholarly disagreement about substantive findings and the suggestion that these disagreements are at least in part attributable to differences in measurement (Ridout and Franz 2008; Sigelman and Kugler 2003). Moreover, such assessments help future studies choose among different approaches to measuring the tone of parties' campaigns. This study contributes to such an evaluation by comparing different measures of campaign tone in the context of the British General Election of 2015. As far as we know, this is the most extensive study of this kind in a multiparty context, and one of the few that includes expert judgments. We compare measures of parties' campaign tone based on newspaper coverage, citizens' perceptions, and expert judgments. We demonstrate that citizens' perceptions and expert judgments are strongly biased by partisan preferences, and we present a procedure to eliminate such biases. After this correction, we find that parties' campaign tones are described very similarly when using citizens' perceptions, expert judgments, and five of the six newspapers included in the study. This study concludes not only that the convergent validity of most of our measures is highly satisfactory but also that not all newspapers cover parties' campaigns in similar ways, which demonstrates the problematic nature of aggregation across different newspapers.
This paper starts with a brief discussion on conceptualization, before reviewing existing studies of measurement and instrumentation. It then discusses measuring the tone of parties' campaigns from different kinds of empirical information: voter perceptions, expert judgments, and newspaper content. We then present a novel and simple procedure for eliminating partisan biases in perception and judgment data. Finally, we compare the information from our different measures and find substantial levels of agreement between most of them (i.e., acceptable levels of convergent validity).

Conceptual Considerations
Evaluating existing measures of negative campaigning requires clarity about what should be measured. This involves two interconnected aspects. The first is the specification of the phenomenon to be measured (cf. Adcock and Collier 2001). We subscribe to the most common and generally accepted definition in the literature, which specifies negative campaigning as attacking one's political opponent(s) based on their character, abilities, accomplishments, and policy stands (cf. Geer 2006; Lau and Pomper 2004). A party (or candidate) is, thus, more negative in its campaigning the more it focuses on critiquing opponents, and the less it focuses on its own abilities, accomplishments, and policy stands.
The second aspect of conceptual clarification involves the (epistemological) specification of the "cases" that are to be measured or described. The definition of negative campaigning implies that the concept applies to actors (parties, candidates, other individuals, groups, or institutions) that are involved in election campaigns. This may appear obvious, but it is sometimes disregarded in the literature when measures of "negativity" are introduced that pertain not to actors but to the "overall" campaign (i.e., the ensemble of behaviors of all competing actors). Of course, an overall campaign can, in principle, also be characterized in its "negativity," but then only based on first having measured the extent of negative campaigning of the actors involved and combining these separate measures in a theoretically and empirically defensible manner (cf. Lazarsfeld and Menzel 1961).
In this paper, we focus on political parties as actors, which are represented in their campaigns by candidates, officials, and spokespersons. The election campaign of a party is, thus, the subject of measurements, which we refer to, interchangeably, as negative campaigning, negativity, or campaign tone. We acknowledge that, in some way, the notion of "the" campaign of a party is problematic since some of parties' campaign communications are micro-targeted to very specific audiences (cf. Elmelund-Præstekær 2010). What we focus on here is the campaign that is not personalized or narrowly targeted but aimed at a "general" audience. In this respect, we follow the approach that is dominant in the extant literature.

Existing Work
Research on negative campaigning uses a large variety of empirical indicators, including voters' perceptions (e.g., Donovan et al. 2016; Pattie et al. 2011), experts' judgments (Lipsitz and Geer 2017; Nai and Maier 2018), content analysis of controlled campaign communications (Dolezal et al. 2016; Hoppmann et al. 2018; Walter 2014), and uncontrolled campaign communications (de Nooy and Kleinnijenhuis 2013; Sigelman and Shiraev 2002; Song et al. 2017). An obvious question, addressed by most studies that focus on instrumentation and measurement issues, is whether using different kinds of empirical information leads to different substantive findings. Yet their conclusions are far from congruent.
Some studies conclude that substantive results about campaign tone are the same across different empirical indicators. Ridout and Franz (2008), for example, measure campaign tone using, respectively, campaign advertisements that were aired; advertisements that were produced; voters' perceptions; and election coverage in newspapers. For U.S. Senate races in the 1998-2002 period, they find that all these measures are closely related and that substantive findings are rarely affected by replacing one measure with another. In a similar vein, Walter and Vliegenthart (2010) compare campaign tone as indicated by television ads, televised election debates, and newspaper coverage in the 2006 Dutch parliamentary election campaign. The extent of negative campaigning is very similar for all these indicators, but the specific forms of negative campaigning vary, with personalized attacks being most prominent in newspapers.
Other studies, however, do not find such congruence of findings when using different kinds of information. Sigelman and Kugler (2003) use the 1998 American National Election Study Pilot to compare voters' perceptions of the campaign tone in their state with tone as measured by newspaper coverage and advertisements. They find no congruence, and attribute this to perceptions being biased by partisanship, political information (more informed people perceiving more negativity), and timing in the campaign (more perceived negativity toward later stages of the campaign). Lipsitz and Geer (2017) compare perceptions of negative campaigning between ordinary citizens and scholars who were presented ads from the 2012 U.S. presidential election. Often, the two groups have strongly different perceptions (between 25 and 75 percent disagreement, depending on specific groups being compared) owing to partisan biases in both groups.
Another group of studies compares negativity of texts measured by content analysis with judgments of relevant experts. Such comparisons are particularly important as a form of validation of coding procedures. Gélineau and Blais (2015) compare judgments of experts with a content analysis of television and web-based ads in the context of the 2012 Quebec General election. Rankings of parties' campaign tone correlate strongly. Similarly, Ansolabehere et al. (1994) use two expert judges (prominent consultants for, respectively, the Republicans and Democrats) to validate their content analysis of news coverage data. They report that the two judges agree on the tone of all but one of thirty-five races in the 1992 Senate elections when asked to classify them as positive, negative, or mixed. Finally, Lau et al. (1999) compared negativity of campaign ads as coded directly and the negativity of those same ads as inferred from secondary (newspaper) accounts. They report that both measures lead to the same substantive conclusions, which suggests that newspaper coverage of parties' campaign communication is not unduly biased.
Taken together, these studies do not lead to very strong conclusions or recommendations. Some scholars suggest that choice of indicators of negative campaigning can be based sensibly on convenience grounds (Ansolabehere et al. 1994; Gélineau and Blais 2015; Lau et al. 1999; Ridout and Franz 2008). Others identify strong biasing factors, particularly in voters' perceptions but also in expert judgments (Lipsitz and Geer 2017; Sigelman and Kugler 2003). Walter and Vliegenthart (2010) identify channel-specific factors, not in the degree of negativity but in the forms by which it expresses itself. In view of the relatively small number of studies that pay explicit attention to issues of measurement and instrumentation, this variety of findings is perhaps not surprising. No two of these studies are directly comparable in terms of the kind of elections, the kind of empirical indicators, the precise procedures for comparison, and the wider socio-political context, and all these differences are likely to affect the findings and comparisons.

Context: Electoral System, Party Systems, and Media
We study the 2015 General Election in the United Kingdom. These parliamentary elections are conducted in 650 single-member constituencies using the First-Past-the-Post (FPTP) electoral system. The parties, party systems, and media systems differ across the four countries of the United Kingdom (England, Scotland, Wales, and Northern Ireland), which affects party competition and campaigning. To avoid resulting issues of comparability, we restrict ourselves in this paper to England (which contains almost 84 percent of all eligible voters in the United Kingdom, and 82 percent of the seats in the House of Commons).
Conservatives and Labour are the largest and politically most important parties, and the only ones with realistic hopes of leading the government and providing the prime minister. The previous elections of 2010 resulted in a hung parliament and a coalition government of Conservatives and Liberal Democrats. It was generally expected that 2015 would again produce a hung parliament, and the actual outcome of an absolute majority won by the Conservatives was a total surprise (cf. Cowley and Kavanagh 2016; Green and Prosser 2016). Despite the prominent position of the two major parties, electoral competition is not restricted to them. Over 27 percent of the votes in England in 2015 were cast for parties other than the two major ones, mainly the Liberal Democrats, UKIP (United Kingdom Independence Party), and the Greens (cf. Green and Prosser 2016). Moreover, about half of all citizens held virtually equally strong electoral preferences for at least two of the five parties (van der Eijk and Fox 2015). The English party system is, thus, a multiparty system, skewed toward the two major parties, but with other parties of considerable importance in terms of voter preferences and party competition.
For our measurements of negative campaigning, we focus on the period from the dissolution of parliament (March 30) to polling day (May 7). During this period, British voters experience the campaigns of the various parties through a variety of channels, including:
• party-controlled communications, including televised or radio-broadcast Party Election Broadcasts (PEBs); print and online advertising; pamphlets, leaflets, and billboards; Facebook pages and Twitter accounts of parties and candidates;
• semicontrolled communications, including televised election debates;
• uncontrolled communication, particularly newspaper and television coverage of parties, candidates, and the campaign.
The English media system is generally viewed as between the free market liberal model of the United States and the more regulated democratic corporatist model of many Northern European countries (Curran et al. 2010; Scammell and Langer 2006; Semetko et al. 1991). Newspaper circulation is relatively high in the United Kingdom in comparison with the United States, and the press is characterized by commercial ownership and national circulation (Sanders and Hanna 2012; Scammell and Langer 2006). Newspapers can be divided along the lines of quality versus tabloid press, and in terms of partisanship. Most newspapers openly support one of the main parties, although their endorsements sometimes change across elections (Wring and Deacon 2010; see Leach et al. 2011 for an overview over time). Because they serve distinct audiences, competition between the quality press and the tabloid press is minimal, while competition between tabloid newspapers is fierce and strengthened by their national distribution (Leach et al. 2011; Semetko et al. 1991). In contrast to newspapers, television news in Britain is highly regulated and required by law to be impartial (McNair 2003; Scammell and Semetko 2008).
Paid political advertising on television is prohibited, but major parties receive free broadcasting time on public and commercial television (PEBs) (Holtz-Bacha and Kaid 2006: 10; Leach et al. 2011; Scammell and Langer 2006: 65).

Content Analysis of Newspaper Campaign Coverage
Many researchers use media coverage of campaigns as their empirical basis for measuring the tone of parties' campaigns (cf. Buell and Sigelman 2009; de Nooy and Kleinnijenhuis 2013; Djupe and Peterson 2002; Kahn and Kenney 1999; Lau and Pomper 2004; Papp and Patkos 2019; Sigelman and Shiraev 2002; Song et al. 2017; Walter 2019). News values and commercial pressures lead such coverage to exaggerate negativity, however (cf. Benoit et al. 2003; Lau and Pomper 2004). Implicit in many studies using media coverage is the assumption that this negativity bias is the same across parties, making it a valid basis for measuring the (relative degree of) negativity in parties' campaigns. In this study, we assess whether campaign coverage results in similar descriptions of parties' campaigns as when using other kinds of data. We, therefore, conduct a content analysis of newspaper coverage of the campaigns of English parties.
We selected six newspapers: The Sun, The Daily Mail, The Daily Mirror, The Guardian, The Daily Telegraph, and The Independent. These are the largest in categories defined by tabloid versus quality papers and partisan color (see Supplementary Information Table A1 for more detailed characterization of these papers). We coded 5,019 articles that were kindly made available to us by the research group "Media in Context and the 2015 General Election" at the University of Exeter, and that were identified as campaign coverage by a trained computational classifier applied to Lexis-Nexis (see the supplementary information for details). Coding was performed by four trained PhD students, using an adapted version of Geer's (2006) fine-grained and widely used coding frame. The unit to be coded is the "appeal": any singular statement by a political party or politician containing self-praise or criticism of an opponent. In all, 18,943 appeals were coded for tone (criticism of an opponent vs. self-praise) and source (the party from which the criticism or self-praise originated). The supplementary information provides further details about the method, reliability, and distributions across newspapers and political parties.
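As an illustration of how coded appeals can be turned into a tone score per newspaper and party, the following is a minimal sketch in Python. It assumes each appeal is stored as a (newspaper, party, tone) record; the function name `negative_share`, the record layout, and the toy data are hypothetical and are not taken from the study's coding files.

```python
from collections import Counter

def negative_share(appeals):
    """Percentage of appeals coded as criticism, per (newspaper, party).

    `appeals` is an iterable of (newspaper, party, tone) tuples, where
    tone is either "criticism" or "self-praise" (hypothetical labels).
    """
    total = Counter()
    negative = Counter()
    for paper, party, tone in appeals:
        total[(paper, party)] += 1
        if tone == "criticism":
            negative[(paper, party)] += 1
    return {key: 100.0 * negative[key] / total[key] for key in total}

# Toy data with made-up newspaper and party names, for illustration only.
toy_appeals = [
    ("Paper X", "Party A", "criticism"),
    ("Paper X", "Party A", "self-praise"),
    ("Paper X", "Party B", "self-praise"),
    ("Paper Y", "Party A", "criticism"),
]
shares = negative_share(toy_appeals)
```

Keeping the newspaper in the key, rather than pooling all titles, mirrors the decision not to aggregate across newspapers before checking that they cover parties' campaigns similarly.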
The coded material consists only of quotes and paraphrases from parties, and thus reflects the campaign behavior of the parties involved (as far as covered by a newspaper; this approach is also used by Buell and Sigelman 2009; Lau and Pomper 2004; Sigelman and Shiraev 2002; Walter 2019; Walter and Vliegenthart 2010). We refrain from aggregating the results across newspaper titles, as that is only sensible if they are all highly similar in what they cover from parties' campaign behavior and communications, which cannot be taken for granted. As it turns out, they are not, as we show later where we compare all our measures of parties' negative campaigning.

Citizens' Perceptions of Negative Campaigning
Surveys are frequently used for measuring the tone of parties' election campaigns, particularly when researchers are also interested in the effect of campaign tone on respondents' attitudes and behaviors. When using citizens' perceptions to measure the tone of campaigns, they are implicitly regarded as "judges" (in Coombs' 1964 sense of the word). That assumes that respondents can be regarded as (stochastic) replications and that the variation in their "judgments" is random and unrelated to any antecedent voter characteristics of relevance. If true, the average of individual perceptions is a useful measure of the tone of campaigns. This assumption is untenable, however, given the consistent findings of strong associations between perceptions of campaign tone and partisan orientations (cf. Lipsitz and Geer 2017;Sigelman and Kugler 2003). Perceptions can, therefore, not be used without further ado to characterize the tone of parties' campaigns. However, we will shortly demonstrate that these biases can be modelled and offset, and that doing so leads to valid "adjusted" measures of campaign tone.
Our analyses are based on data from the 2015 British Election Study Internet Panel (BESIP, see Fieldhouse et al. 2015). We use information from waves 4, 5, and 6. Wave 4 was conducted before the formal campaign (March 4-30, 2015); wave 5 took place during the formal campaign (March 31 to May 6, 2015), and wave 6 is a postelection wave of the panel (May 8-26, 2015). We use the 19,123 English respondents who took part in all three of these waves.
We asked respondents for their perceptions of the tone of the election campaigns of each of the five parties that campaigned in England, thus following the approach of, for example, Brooks (1997) and Pattie et al. (2011), and not the approach of, for example, the 1998 American National Election Study (ANES) Pilot Study (Sapiro et al. 1999) where a single perception of the entire campaign was solicited. As discussed earlier, we feel that the latter is not a valid measure of the campaigning behavior of political parties, as it implicitly (and counterintuitively) assumes that citizens do not distinguish between parties or candidates when experiencing a campaign.
Questions about the perception of campaigns are quite often formulated in terms of negative and positive (cf. Ridout and Fowler 2012; Ridout and Franz 2008; Sides et al. 2010; Sigelman and Kugler 2003; Stevens 2012). As Mattes and Redlawsk (2014: 52) report, such formulations contribute to partisan and social desirability bias. To minimize this, and to enhance construct validity by linking the operationalization closer to the theoretical construct of negative campaigning (see earlier), we employed the following question: In their campaigns political parties can focus on criticizing the policies and personalities of other parties, or they can focus on putting forward their own policies and personalities. What is, in your view, the focus of the national campaign of the [fill in party name]?
In England, this question was asked for each of the following parties: Conservatives, Labour, Liberal Democrats, Greens, UKIP. Responses could be given on a 5-point scale: 1 = focuses on criticizing the policies and personalities of other parties, 5 = focuses on putting forward their own policies and personalities.
When analyzing the responses, we find, as expected, that respondents' party preferences are strongly related to the perception of parties' campaign tones. Across all parties, we find that the campaign of the party that a respondent voted for is perceived much more positively than the campaigns of other parties. This difference is 1.22 (in wave 5) and 1.16 (in wave 6) on the 5-point response scale, a partisan difference that is highly significant (see Tables A4 and A5 in the supplementary information). This understates the extent of party preference bias, as this bias also affects perceptions of parties that respondents did not vote for but that they considered highly attractive as an option to vote for. For all parties, we find a strong and virtually linear relationship between the electoral attractiveness of a party and the extent to which that party's campaign is seen as focusing on its own policies and personalities (see "Discussion and Conclusion" and Figure A1 in the supplementary information).
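One plausible way to compute such a partisan difference is sketched below in Python: the mean perception of the campaign of the party a respondent voted for, minus the mean perception of all other parties' campaigns. This is an assumption about the calculation, not the procedure behind Tables A4 and A5; the function name `partisan_gap`, the record layout, and the toy values are hypothetical.

```python
def partisan_gap(records):
    """Mean perception of the own party's campaign minus that of other parties.

    `records` is a list of (voted_party, rated_party, perception) tuples,
    with perception on the 1-5 scale (5 = focuses on own policies).
    """
    own = [p for voted, rated, p in records if voted == rated]
    other = [p for voted, rated, p in records if voted != rated]
    return sum(own) / len(own) - sum(other) / len(other)

# Toy data for two respondents and two made-up parties.
toy_records = [
    ("A", "A", 5), ("A", "B", 3),
    ("B", "B", 4), ("B", "A", 2),
]
gap = partisan_gap(toy_records)
```

A positive gap indicates that respondents see the campaign of their own party as more focused on its own policies and personalities than the campaigns of other parties.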
These strong partisan biases imply that respondents' perceptions (or their central tendency) cannot be validly used to characterize and compare the tone of parties' campaigns. But we can adjust these perceptions for this bias. A simple way to do so is by a regression-based procedure that will be described shortly. The resulting adjusted perception scores (from which partisan biases have been eliminated) can realistically be regarded as stochastic replications, so that their averages (per party) can be used as measures of parties' campaign tones. We will then compare these measures of parties' campaign tones with similar measures derived from newspapers (discussed earlier) and from expert judgments (to be discussed next).

Expert Judgments to Measure Negative Campaigning
Citizens' perceptions of political phenomena are generally marked by misperceptions, misattributions, random noise, and systematic biases (cf. Huckfeldt et al. 1998; Mattes and Redlawsk 2014; Pattie et al. 2011; Sigelman and Kugler 2003). These shortcomings derive from lack of motivation, time, or cognitive sophistication to follow the content of election campaigns. To circumvent these problems, social scientists resort increasingly to surveying experts. The use of experts has become influential, as shown by the impact of studies such as the Chapel Hill Expert Survey (see www.chesdata.eu) and the Quality of Government Expert Survey (https://qog.pol.gu.se/). Experts are also used to measure negative campaigning, although as yet on a limited scale (see Abbe et al. 2001; Gélineau and Blais 2015; Lipsitz and Geer 2017; Nai and Maier 2018; Patterson and Shea 2004; Weaver-Lariscy and Tinkham 1996). Nai recently started a large cross-national survey using expert judgments to measure negative campaigning. In this study, we used election agents as experts. Our choice requires that we clarify the role of this typically British institution, which does not have a direct equivalent in many countries. Moreover, we need to explain why these actors can be considered as experts, and what the main strengths and weaknesses are of considering them as such.
In U.K. elections, every candidate is required by law to have an election agent, who is legally responsible for the campaign of that candidate. Although a candidate may be his or her own election agent, this is very rarely done. Election agents are responsible for ensuring that a campaign is run within the law, a responsibility that extends to fundraising and the authorization of expenses, approving the content of campaign communications, and filing required information to relevant regulatory authorities. Although their legal responsibilities do not include acting as a campaign manager (which is not a legally defined role), the two functions are often combined in a single person. Election agents can, thus, be regarded as a mixture of campaign managers and accounting officers (Fisher et al. 2006). Because of this, they are generally directly involved in the management and organization of the campaign in their constituency, including in implicit or explicit decision making about "going (or not going) negative."
Given their deep involvement in partisan campaigns, one may wonder why we consider election agents as experts. The term expert is often associated with impartiality and absence of partisan preferences. Academics, journalists, or marketing specialists who are not directly involved in a campaign are then seen as plausible experts, but campaign managers are not. We disagree with this perspective for three reasons. First, it defines experts somewhat as unicorns, mythical beings that we do not find in the real world. Indeed, various studies demonstrate that scholars, journalists, and other allegedly impartial "experts" are not at all without biases, including partisan biases (cf. Budge 2000; Lipsitz and Geer 2017; Powell 1989; Steenbergen and Marks 2007; Whitefield et al. 2007; Albright and Mair 2011). Second, the presence of bias in experts' judgments is relatively innocuous if we can correct for it. This notion is well established in the methodology of expert surveys, where it is fully acknowledged that experts of all kinds may well be biased in their judgments and that the way forward is to "model and purge known sources of bias" (Maestas et al. 2014: 358; see also Curini 2010). That is exactly what we will do when adjusting responses of election agents for partisan bias (just as we apply such adjustments to citizens' responses). Third, such adjustment for partisan bias allows us to define experts not on a negative criterion (i.e., absence of bias) but on a positive criterion instead. Experts should be more informed, based on first-hand experience of (in our case) election campaigns. Apart from general cognitive capabilities, this requires two conditions: having wide-ranging opportunities to observe what happens in the campaigns of the various parties and having the motivation to be (and to remain) as fully informed as possible about these campaigns. For election agents, both conditions are fulfilled amply, probably better than for many other alleged experts (such as academics) who often have to rely on secondary sources (including election agents) for empirical information. For these reasons, we consider election agents as relevant experts. Our reasoning here is similar to that of Ansolabehere et al. (1994), who employed highly partisan campaign consultants as experts.
A survey of election agents was conducted in 2015 by Fisher et al. (2016), which included exactly the same questions about the tone of the campaign of their own and of other parties as had been included in the BESIP (see previous section). We use the data from all English agents (n = 968), who represent candidates in 482 constituencies.
Given their role in the campaign, it is not surprising that the answers provided by election agents are biased by their party affiliation. Across all parties, we find that agents perceived the campaign of their own party as more focused on its own policies and personalities than the campaigns of other parties. These differences are on average 1.43 on a 5-point scale, which is even more pronounced than the corresponding differences among ordinary citizens, discussed in the previous section. Therefore, and just as we did for citizens' perceptions of campaign tones, we model and offset this bias with the same regression procedure (described in the next section), using agents' party affiliation (represented by dummies) as the independent variable. Here, too, this procedure results in adjusted judgments of campaign tone that can be realistically regarded as stochastic replications, and their averages (per party) are, therefore, useful indicators of parties' campaign tones. Below, we compare these expert-based measures of campaign tones with similar measures derived from newspapers and from citizens' judgments (both discussed earlier).

Adjusting Citizen and Expert Responses for Partisan Bias
Our procedure to adjust citizens' and experts' views of parties' campaign tone requires two kinds of variables. First, we need to measure perceptions of campaign tone for each of the parties. In both our citizen and expert surveys, we have five such perceptions, one for each of the parties that campaigned in England. Second, we need at least one indicator of respondents' partisan orientations. This may be (actual or intended) party choice, multiple (non-ipsative) party preferences, or any other variable that is deemed to be a valid indicator of partisan orientations. Because many citizens in contemporary multiparty systems have strong preferences for more than just one party (cf. Kroh et al. 2007), multiple party preferences capture partisan orientations more comprehensively than a simpler indicator such as party choice. We, therefore, use the so-called "propensity to vote" measure (cf. van der Eijk et al. 2006), which has been asked for each of the five parties in the mass survey we analyze. The expert survey we analyze provides us with only a single indicator of partisan orientations, the party for which the expert works as an election agent.
Regression can be used to eliminate partisan bias because it allows two components to be distinguished in the biased perception of campaign tone (the dependent variable). One component is the regression prediction, which is accounted for by partisan preferences. The other component is the residual, which is independent of such preferences and can, thus, be regarded as the perception from which partisan biases have been eliminated. However, the means of these perceptions from which biases have been eliminated cannot be used to characterize the tone of parties' campaigns because they are (by definition) zero, for each of the parties. This problem can be solved by rearranging the data into a so-called stacked matrix, where the rows are dyads of respondents and parties. Each respondent is, therefore, represented (in our case) by five records; for Respondent 1, these records are Respondent1-Conservative, Respondent1-Labour, Respondent1-Liberal Democrat, Respondent1-UKIP, and Respondent1-Green. In this stacked matrix, the five campaign perception questions can now be represented as a single variable, which can be regressed on a similarly stacked variable that reflects ties with the respective parties. A visual illustration of this stacking is provided in the supplementary information. In this way, a single regression relates the perceptions of all parties' campaigns to partisan measures for the same parties. Although the mean of all residuals is zero, this is not the case for the residuals for each of the party-stacks; these party-specific means of residuals can, thus, be used as measures of parties' campaign tones that are free from partisan biases.
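The stacking-and-residual logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single partisan predictor (a propensity-to-vote score) in a pooled ordinary least squares regression, and the names `adjusted_tone`, `raw_means`, `TRUE_TONE`, and `BIAS`, as well as all data values, are hypothetical.

```python
from collections import defaultdict

def raw_means(stacked):
    """Unadjusted mean perception per party."""
    by_party = defaultdict(list)
    for _, party, perception, _ in stacked:
        by_party[party].append(perception)
    return {p: sum(v) / len(v) for p, v in by_party.items()}

def adjusted_tone(stacked):
    """Party-specific means of residuals from one pooled regression.

    `stacked` holds one (respondent, party, perception, partisan_tie)
    record per respondent-party dyad, i.e. the stacked data matrix.
    """
    n = len(stacked)
    mean_x = sum(r[3] for r in stacked) / n
    mean_y = sum(r[2] for r in stacked) / n
    sxx = sum((r[3] - mean_x) ** 2 for r in stacked)
    sxy = sum((r[3] - mean_x) * (r[2] - mean_y) for r in stacked)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = defaultdict(list)
    for _, party, perception, tie in stacked:
        residuals[party].append(perception - (intercept + slope * tie))
    return {p: sum(v) / len(v) for p, v in residuals.items()}

# Toy example: party B campaigns more positively, but party A has more supporters.
TRUE_TONE = {"A": 2.5, "B": 3.0}   # assumed "true" tone on the 1-5 scale
BIAS = 0.2                         # assumed partisan bias per preference point
ptv = {
    "r1": {"A": 10, "B": 0},
    "r2": {"A": 10, "B": 0},
    "r3": {"A": 8, "B": 2},
    "r4": {"A": 0, "B": 10},
}
stacked = [
    (resp, party, TRUE_TONE[party] + BIAS * score, score)
    for resp, scores in ptv.items()
    for party, score in scores.items()
]
raw = raw_means(stacked)
adjusted = adjusted_tone(stacked)
```

In this toy example the raw means rank party A as more positive than party B simply because A has more supporters, whereas the party-specific residual means restore the assumed ordering. With dummy-coded party affiliation, as in the expert survey, the same function could be applied with a 0/1 indicator as the partisan tie.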
The effects of this bias-elimination procedure on the parties' scores are reported in Table A7 in the supplementary information; its effects on the correlations between the various measures are reported in Table 1, which is discussed in the following section.

Comparing Measures of Campaign Tones
We used three kinds of empirical information to obtain eight different measures of the tone of the election campaigns of the English parties in the 2015 General Election. Media coverage of parties' appeals yielded six measures, one for each newspaper that we coded. Citizens' perceptions and expert judgments each yielded one measure, resulting in a total of eight measures of parties' campaign tones. Which of these is most appropriate to use depends on one's theoretical framework and research questions, but these may be compatible with several of the measures. The (relative) construct validity of each measure is then to be assessed mainly in terms of convergent validity, that is, the extent to which these measures lead to similar (ideally: identical) results (cf. Campbell and Fiske 1959). Traditionally, such similarity is assessed by correlations, presented in Table 1. Yet correlations reflect only part of the differences and similarities between the various measures. Therefore, we first present in Figure 1 a visualization of the (relative) negativity of each party's campaign according to these eight measures. To compare these measures, they first have to be expressed in a common metric. The original measures based on newspaper coverage are percentages of "positive" (or conversely "negative") appeals per party and per newspaper (for details, see section A1.3 of the supplementary information). The measures based on the mass and expert surveys consist of means of perceptions from which partisan biases have been eliminated (see earlier and section A4 of the supplementary information). To express these different measures uniformly, we standardized them to a mean of 0 and a standard deviation of 1 (see Table A7 in the supplementary information for parties' scores on these measures).
Figure 1 shows the campaign tone for each party according to the various measures. A party with a "positive" campaign tone (compared with the other parties) has a positive score; a party that is "negative" (compared with the other parties) has a negative score. We see that (to the right of Figure 1), each of the eight different measures describes the campaign of the Greens as positive. We also see that all eight measures describe the campaign of the Liberal Democrats as negative, but that some do so much more distinctly than others. Based on the coverage of The Daily Mirror, The Daily Telegraph, and The Guardian, the negativity of the LibDems' campaign was in excess of one standard deviation, while citizens and experts assessed their campaign as much less negative. The largest differences between the various measures involve the Conservatives, UKIP, and Labour. According to the measure based on The Daily Mirror, the Conservatives waged a very positive campaign, but all the other measures consistently suggest that their campaign was negative. The opposite holds for UKIP: its campaign is described as positive by five of our eight measures but as distinctly negative by (again) The Daily Mirror (and as slightly negative by The Sun and expert judgments). All eight measures agree that the Greens fought a positive campaign (according to seven out of our eight measures, theirs was the single most positive campaign) and that Labour and the LibDems were predominantly negative. Conservatives are often seen as waging a negative campaign (with the measure based on The Daily Mirror being once again very different), and UKIP was more often than not seen as positive (but with three measures pointing in the opposite direction). Figure 1 also allows a comparison between the different measures, which clearly shows that The Daily Mirror's coverage provided substantially different information about parties' campaigns than the other newspapers, or than the experiences of citizens and experts. 10 Figure 1 also shows that campaign tone measures based on citizen perceptions 11 and on expert judgments are very similar. Moreover, these two survey-based measures are also very similar to measures based on newspaper coverage (again, apart from The Daily Mirror and, to a lesser extent, The Sun).
When we assess the correlations between the different measures, shown in Table 1, we see that, apart from the measure based on The Daily Mirror's campaign coverage, all correlations are strongly positive and of a reasonably high magnitude, many exceeding .80. Table 1 also includes measures based on citizen perceptions and on expert judgments that were not adjusted to eliminate partisan bias. We find that the adjustments for partisan bias increased the correlations between some measures but decreased the correlations between others. It is noteworthy that the correlation between the citizen-based and expert-based measures increases from .70 to .83 when partisan bias is eliminated from both. The average correlation between the citizen-based measure and all six media-based measures does not increase as a consequence of adjustment for partisan biases. 12 This average correlation declines from .80 to (a still quite respectable) .75 as a result of eliminating partisan biases. For expert judgments, however, adjustment for partisan biases strengthens the average correlation between the expert-based measure and the media-based measures, which increases from .76 to .86.
Note: Measures indicated as "(a)" have been adjusted to eliminate partisan bias; measures indicated as "(r)" have not been adjusted for partisan bias.
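The standardization and the averaging of correlations used in this section (see note 12) can be sketched as follows. This is an illustrative sketch only: the function names and all numbers are invented, and the use of the population standard deviation is an assumption, since the text does not state which variant was applied.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (population SD assumed)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def pearson(a, b):
    """Pearson correlation, computed as the mean product of standardized scores."""
    za, zb = standardize(a), standardize(b)
    return mean([x * y for x, y in zip(za, zb)])

def rms_average(correlations):
    """Square root of the average of the squared correlations (cf. note 12)."""
    return sqrt(mean([r * r for r in correlations]))

# Hypothetical tone scores for five parties on one measure (invented numbers).
measure_a = [-0.4, -0.9, -1.1, 0.6, 1.8]
z = standardize(measure_a)

# Hypothetical correlations of one measure with two others (invented numbers).
avg = rms_average([0.6, 0.8])  # about 0.707
```

Because the correlations are squared before averaging, larger correlations weigh more heavily in this average than they would in a simple arithmetic mean.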
Most importantly, Table 1 shows that the various measures of campaign tone are strongly correlated with each other (with the exception of the measure based on coverage by The Daily Mirror). Moreover, elimination of partisan biases from citizen-based and expert-based measures strengthens, on balance, their relationship with other measures and with each other.

Discussion and Conclusion
We measured the tone of parties' campaigns in the 2015 General Election in England, using voters' perceptions, expert judgments, and campaign coverage in six national newspapers. This breadth of indicators makes this the most extensive study of the measurement of negative campaigning in a non-U.S. context, and one of the first to include expert judgments.
This study leads to several major conclusions. First, we find considerable similarity between measures based on different kinds of empirical evidence (and thus also on different epistemological perspectives). From the standpoint of convergent validity, this implies that all three kinds of empirical bases for measuring campaign tone lead to viable indicators. Hence, when using these instruments in research, the findings will be in broad terms the same irrespective of which of these indicators is used. That, in turn, suggests that the lack of agreement in the extant literature about consequences of negative campaigning is more likely generated by incomplete specification of contexts and conditions of negative campaigning than by the use of different measurement instruments.
The second important conclusion is that, despite a considerable degree of convergent validity between the three kinds of measures, among measures based on newspaper coverage of campaigns, the choice of newspaper may make a large difference. Measures of campaign tone based on the coverage of five of the six newspapers included in this study provide roughly the same description of the (relative) negativity or positivity of parties' campaigns. When using the coverage of the sixth newspaper (The Daily Mirror), however, we observe a very different picture (similar differences between newspaper-based measures have been reported by Sigelman and Shiraev 2002). It is obvious that these differences reflect contrasting criteria for the selection of events and communications to be included in the paper's campaign coverage. What drives these differences in news selection remains to be ascertained, but a "simple" partisan explanation is implausible, since The Daily Mirror leans toward Labour, but its coverage of the Conservatives' campaign included many more "positive" statements than any of the other newspapers. These differences demonstrate that the U.K. newspaper industry is far from monolithic in how it covers campaigns. That, in turn, highlights the risks of measuring campaign tone on the basis of a single newspaper's coverage, assuming that the findings can be generalized to the entire industry. One might wonder whether "averaging" the various newspaper-based measures of campaign tone would be sensible. Implied in doing so is the assumption that the differences between various newspaper-based measures are generated by random error (rather than different criteria for coverage, and thus systematic bias). In the absence of a clear understanding of what drives the differences in newspapers' campaign coverage, we feel that assumption to be rather dubious, and we would, therefore, not advocate such an "averaged" composite measure.
The third major finding of our study is that both citizens' perceptions and expert judgments of the tone of parties' campaigns are strongly biased by partisan orientations. This is in line with many similar findings in the literature. We demonstrated that these biases can be modelled and eliminated, and that doing so increases their mutual correlation, and, on balance, also increases their correlations with measures based on newspaper campaign coverage. Eliminating such biases, thus, increases the (convergent) validity of these perceptual or judgment measures. In turn, this calls into question the common apprehension that citizens' perceptions are wholly subjective (Brooks 1997) and only tenuously related to inter-subjectively shared information. An implication of these insights is that both mass surveys and expert surveys of parties' campaign tone should include sufficient additional information to allow the identification, modelling, and elimination of suspected biases.
A final conclusion that stands out from our findings is that it is somewhat simplistic to speak about an election campaign in the singular. Different parties and candidates run campaigns with different mixes of positive and negative contents, and such differences are clearly observable, irrespective of whether one uses newspaper coverage, perceptions, or expert judgments.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported in this paper is part of the research project "CSNCC: Comparative Study of Negative Campaigning and its Consequences." This work was supported by a Marie Curie Intra-European Fellowship (n° 629012: FP7-PEOPLE-2013-IEF).

Notes
1. Studies that ignore the actor-focus include Sigelman and Kugler (2003) and Donovan et al. (2016). An interesting study by Sides et al. (2010) asks respondents about perceptions of negative campaigning by specific candidates, as well as about the negativity of the campaign as a whole, but fails to assess whether the latter adds relevant information beyond the former.
2. Nonparty actors also contribute to the overall campaign and public debate about the election, but given our focus on campaigning by political parties, we can ignore these.
3. Other studies code all newspaper content about campaigns (see Benoit et al. 2005; de Nooy and Kleinnijenhuis 2013; Ridout and Franz 2008; Song et al. 2017). We consider that less appropriate for characterizing party-specific campaigns.
4. Sigelman and Shiraev (2002) also find that campaign coverage by different papers leads to quite different measures of parties' negative campaigning in their study of the 1996 and 2000 presidential campaigns in Russia.
5. Negative Campaigning Comparative Expert Survey Database (NEG_ex), see http://www.alessandro-nai.com/#!negative-campaigning-comparative-data/x181a.
6. Constituency Campaigning in the 2015 British General Election Study, deposited in the U.K. Data Archive.
7. Detailed descriptions of the judgments by election agents are reported in Table A6 in the supplementary information.
8. The logic of distinguishing "useful" and "contaminated" variance components that is applied here is similar to that in instrumental variable analysis.
9. Stacking data lies at the heart of conditional logit analysis (when the dependent variable is ipsative), and has become well-established for the analysis of non-ipsative party preferences (cf. van der Eijk 2017; van der Eijk et al. 2006).
10. These findings prompted us to check the coding of party appeals covered by The Daily Mirror and an additional sample of appeals covered by other newspapers. This did not lead to any need to revise the coded data.
11. The measure based on citizens' perceptions in Figure 1 and Table 1 is based on data from wave 6 of the panel (i.e., immediately after the elections). Using perceptions from wave 5 (during the campaign) yields exceedingly similar results (not reported separately).
12. This was calculated as the square root of the average of the squared unrounded correlations reported in Table 1.

Supplemental Material
Supplemental material for this article is available online.