Assessing institutional empathy in medical settings

The use of role-play in standardised medical assessments raises many problems, particularly around the measurement of interpersonal skills such as ‘empathy’. Research on the use of simulations has tended to focus on quantified, psychometric assessments of their reliability and validity. However, communication, which often forms a central part of the assessment in medical simulations, is a difficult matter to address through this post hoc analytic method. A sociolinguistic approach to analysing real recordings of simulated medical assessments allows greater insight into communication and interpersonal skills. We use our research on medical licensing exams to illustrate some of the current questions about the examining of such skills through understanding the fine-grained detail of talk. Drawing on Goffman to critique simulated relationships and on Gumperz to analyse the role of differences in communicative style, and based on microanalysis of video-recorded role-players and candidates, we argue that the focus on interpersonal skills in standardised assessments amplifies the problem of using simulated empathy and requires additional interactional work. This focus on interpersonal skills in such assessments can lead to inequalities, since an unfair weight may be put upon candidates trained overseas. The paper concludes that the debate on ‘language’ in such exams and their standardisation in superdiverse societies needs to be reset.


Introduction
In a rather dark set of essays about empathy, the medical actor Leslie Jamison (2014) writes about how standardised, simulated consultations are used to assess medical students' and trainees' clinical skills. While these skills include such obvious categories as diagnostic eliciting and reasoning and clinical management, a significant component concerns what is variously called interpersonal effectiveness, compassion, affective behaviour and empathy. Jamison, as a standardised role-playing patient, is trained in the importance of checklist item 31: 'Voiced empathy for my situation/problem' where students 'have to say the right words to get credit for compassion' (Jamison 2014: 3). She writes: I grow accustomed to comments that feel aggressive in their formulaic insistence: that must be really hard [to have a dying baby], that must be really hard [to be afraid that you'll have a seizure in the middle of the grocery store], that must be really hard [to carry in your womb the bacterial evidence of cheating on your husband]. (Jamison 2014: 4-5, italics and square brackets in original).
These glimpses into the experience of managing simulated empathy hold within them many conflicting positions: (1) the assumed value of standardised, simulated assessments; (2) the taken-for-granted importance of empathy, within the interpersonal skills area, both as a quality for all health professionals that should be assessed and one that can be assessed through simulated consultations; and (3) issues of fairness arising from these two. This article is structured around these three themes and refers to what is, at present, a small set of studies analysing simulated consultations from a linguistic and ethnographic perspective. We argue that the analysis of the fine-grained detail of the talk of standardised assessment and engaging with the larger concerns that shape the local interaction, such as professional discourses and institutional regimes, contribute to a growing debate about reliance on 'the unreal' (Greenhalgh 2014) in medical education and high-stakes assessments. In this paper, we draw on our research on the licensing exam of one professional body in the UK based on what are widely known as Objective Structured Clinical Examinations (OSCEs), and we suggest that the arguments we put forward are relevant for many such gatekeeping assessments.

Standardised, simulated consultations
Simulated consultations are widely used in Australia, Europe and North America, in undergraduate training and interim and final examinations (Bradley 2006; Cleland et al. 2009; Khan et al. 2013) as well as for gaining membership of professional medical associations responsible for postgraduate curricula and licensing examinations (Swanson et al. 2013; First et al. 2013; Khan et al. 2013). Their reputation in the medical world as the gold standard assessment is widely asserted (Boulet et al. 2009). In medical training, simulations provide opportunities for practice, reflection and focusing on certain skills, particularly technical skills (Korkiakangas et al. 2015), and so less importance is attached to authenticity. However, when used in exams, where they are known as standardised assessments or Objective Structured Clinical Examinations (OSCEs), they must be seen to be a fair and authentic reflection of real-life consulting skills. The widespread adoption of these standardised exams offers a replicable experience (Boulet et al. 2009; Cleland et al. 2009; Swanson and van der Vleuten 2013) readily able to be subjected to psychometric validation, thus aligning with the quantitative paradigms and models that are the hallmark of medicine.
The great bulk of research on simulated consultations has used psychometric tests to address reliability in terms of consistency and predictability (Brannick et al. 2011; Lievens and Sackett 2012) but fails to ask more fundamental questions about how like reality the interactions themselves are (Malhotra et al. 2009 being an exception), how power relations in the exam setting play out in the interaction (Atkins, forthcoming) and to what extent such simulations may disadvantage certain groups. Sociolinguistics and linguistic ethnography, in looking at the fine-grain detail of interactions and situating these within professional and institutional practices, can begin to answer these questions (Roberts et al. 2014; Seale et al. 2007; Skelton 2009, 2013; O'Grady and Candlin 2013; Niements 2013).
In medical education, Niements (2013) looks at role-plays of interpreter-mediated consultations, describing how they 'cannot reproduce the orientations of real interactions. […] [W]hat is authentic to those users when they "live" a specific situation cannot be authentic to trainers/trainees when they play it' (Niements 2013: 317). De la Croix and Skelton (2009) similarly find that the reality of the medical encounter is difficult to 'reproduce' in a simulation. They make a detailed sociolinguistic study on interactional power in medical simulations in an evidence-based, corpus linguistic analysis of undergraduate OSCEs. They find role-players talk and interrupt more than candidates, citing this as evidence of the conversational dominance role-players exert in a way that a patient in a real encounter tends not to do. However, there is still considerable scope for better understanding the asymmetric participation structures that can be established in simulated encounters and how these are instantiated in the moment-by-moment interaction.
Within medical research, any doubts about OSCE-type exams are often trumped by the psychometric findings on reliability and most types of validity. How simulated consultations are experienced differently from real ones and how this, in turn, may have a differential impact on certain groups of candidates is masked by the methodologies of standardisation and statistical analysis and their disengagement from the local and the contextual. In addition to the psychometric evidence, these exams are defended on two somewhat contradictory grounds: that the exams are a proxy for the real - a good enough mimic of reality (so it does not matter that they are a set performance) - and that the simulated behaviour they require is appropriate since consultations are a performance anyway (so in some sense real). Goffman's extensive discussions of how we experience face-to-face interaction disturb both of these assumptions. Goffman (2005) argues that the sense of feeling an activity is real depends upon our sense of self as we relate to others. Each interaction creates and reinforces a shared reality to keep the relationship going. It is through interactional frames, Goffman argues, that interactants feel real to each other and evaluate each other as they display their understanding of situated intent. But the simulated consultation is a complex encounter where multiple frames are at play (Goffman 1974: 156-200). For example, in an OSCE-style exam, the frame of showing empathy to a role-playing patient is nested in a frame of displaying competence to an examiner, which in turn is nested in the institutional frame of the overall assessment process. So what matters is not how emotionally and sincerely connected the candidate feels to the role-player but how far they are seen as 'empathic' by the examiner. The real quality of empathy is assessed through the unreal - the display of empathy for institutional assessment (Seale et al. 2007; De la Croix and Skelton 2013).
While one defence of exams using standardised patients is that they can be real enough - their simulated character is not attended to - they are also defended on the grounds that they are a performance and that all consultations are, in effect, 'performed'. This is a common response to those candidates who voice concern that OSCE examinations test acting skills as much as clinical. Again, Goffman helps with unpacking the subtleties of what is meant by performance and acting. The 'frontstage' performance (Goffman 1959) required, for example, of a doctor, teacher or waiter involves constraints on behaviour. And this behaviour requires a heightened performance when monitored and assessed for institutional purposes. But these frontstage and institutional performances are different from a simulated performance, where the event is acted. The actor has to be convincing within the terms of the play (in this case the simulated consultation), while monitoring their own behaviour and so sustaining a 'dual consciousness' (Konijin 2000). The trainee/candidate therefore has to work hard to create a synthetic reality - one that convinces the audience/observer - but not one that is real to candidates in terms of consequences for patients. Whereas the role-player's task is to be authentic within the terms of the simulation/drama, the candidate's task is to display themselves as they would in a real consultation while also monitoring their conduct vis-à-vis the examiner and maintaining the illusion. This is hard communicative work (Thomassen 2009).
An additional complexity in the drama of the exam is some shift in power from the doctor to the role-player (De la Croix and Skelton 2009). Textbooks and training on communication and interpersonal skills assume an asymmetry, as Jamison's 'that must be really hard' illustrates, with the trainee/candidate doctor gracing the 'patient' with their understanding. Being 'empathic' to someone who is in a more powerful interactional position than oneself is harder than it is to someone relatively less powerful, and puts additional demands on acting skills. Role-players' power is made even more difficult to manage because, while their behaviour is ostensibly standardised, precisely how they react and interact depends upon their instinctive resources, meaning that it is near impossible to standardise interaction completely. Role-players are instructed to react in specified ways to the candidate's behaviour but exactly how they do this is up to them (Atkins et al. 2016). It is important to stress that there is no simple inversion from powerful doctor (candidate) to powerful patient (role-player). Rather, it is that the role-player has much more freedom in how the interaction unfolds and the potential to claim power at that micro-level. Sometimes they are actually very compliant (even overly so), but (very often with candidates who are nervous or make a slight mistake) the role-player has much more freedom to claim some power - to withhold responses, highlight mistakes and so on.
A close look at the simulated consultation reveals a highly complex and hybrid activity. As well as multiple roles and identities at play, considerable interactional work has to be carried out to sustain the illusion of reality and to avoid or cover up any overt frame-breaking which would undermine this illusion (Seale et al. 2007: 181). The fact that not just the role-players, but also the candidate doctors themselves, are required to act is avoided in the discourse of the exams. This is, though, voiced backstage by some examiners, as in this feedback session looking at exam video clips: I wonder whether he was having trouble acting this particular consultation. He just seemed a little bit kind of remote from him. I just wondered if he was having an acting problem this candidate (examiner feedback from Roberts et al. 2014: Appendix D 47-54).
The experience of being in a role-play is fundamentally different from a real-life consultation (Niements 2013), and it is in the very elastic notion of 'interpersonal' that candidates' capacity to sustain the illusion of reality is most crucial and most vulnerable to negative criticism.

Patient-centredness and interpersonal skills
The major discursive shift from doctor-centred to patient-centred consulting over the last 50 years (Balint 1970;Stewart 2001) has brought with it changes in how to relate and communicate with patients. While the asymmetric relationship between doctor and patient has remained largely intact in contemporary healthcare interactions, the discourses of patient-centredness, and so the importance of interpersonal skills, have become thoroughly embedded in training and assessment. It is argued that interpersonal skills (IPS) should be rated alongside and as an equal partner with the bio-psychosocial approach (Howie et al. 2004). This discursive shift is part of the more seismic movement in how individuals and relationships are conceptualised. This is the shift to the neo-liberal focus on the self and soft skills in which the social and cultural self becomes the centre of attention in assessment, as much as or even rather than acquired technical expertise (Grugulis and Vincent 2009). So, it is no surprise that the soft skills of interpersonal effectiveness and communication generally (and the distinction between them is not always clear) and 'clinical empathy' (Halpern 2003) in particular, appear in virtually all models of consultation and are taught as a matter of course (Neighbour 1987;Kurtz et al. 1998;Hojat et al. 2004;Bonvicini et al. 2009).
Most research and assessment of interpersonal and communication skills has been modelled on medical sciences. Such modelling reduces the complexity of these phenomena to patterns and procedures that can be statistically analysed within the same psychometric paradigm mentioned above (Skelton 2005; Iedema et al. 2006). This has led, despite some concerns (Campion et al. 2002; Howie et al. 2004), to a relatively unproblematic stance on the assessment of interpersonal effectiveness. However, sociolinguistic analysis has identified several overarching problems which suggest that the interpersonal is not easy to pin down. Firstly, IPS cannot be readily categorised as a separate domain, since the manner in which all aspects of the simulated consultation are carried out contributes to interpersonal evaluation (Roberts et al. 2014). The interpersonal leaks into everything. Secondly, the assessment tools are conceived as universal, and not relative to particular social groups: '[M]uch work has still to be done to make performance measurement into a culture-sensitive and equitable science' (Howie et al. 2004: 464). Thirdly, despite their 'objective' naming, the interpersonal in such exams is, necessarily, subjectively assessed and, in settings of intense social evaluation, small differences and difficulties are amplified (Gumperz 1982).
Arguably, just as the interpersonal is the most problematic component to assess in standardised, simulated consultations, so the concept of empathy is the most untameable element within the interpersonal domain, intensifying the problems outlined above in assessing IPS. The notion of empathy is widely taken for granted as part of good doctoring in mainstream medicine. In textbooks and training on consultation and communication skills, trainees are encouraged to voice empathy just as role-players and examiners are expected to look out for and acknowledge it, as widely used empathy models attest (Suchman et al. 1997). A typical example of modelling empathy is found in the Cambridge-Calgary guidelines on good clinical communication: 'Work out exact phrases which demonstrate empathy in specific situations' (Kurtz et al. 1998: 134). It is hardly surprising, therefore, that candidates voice standard empathy utterances, given the direction to perform these 'phrases' from the medical communication literature. However, from within sociology and ethics the notion of empathy and its 'emotional labour' (Larson and Yao 2005) is contested from three different perspectives:
- is it a quality that is essential for good doctoring?
- if it is, how can it be best understood and assessed?
- if it is to be assessed in OSCE-type exams, can simulated empathy be assessed as real empathy?
The assumption that empathy is a moral requirement for the practice of good medicine has begun to be critiqued. Bouma (2008) argues that there are important ethical values which trump empathy, such as patient autonomy, and that doctors can be caring, ethical professionals without showing 'true' empathy, defined as sharing patients' emotions and being moved by their suffering (Halpern 2003). Similarly, Smajdor et al. (2011) suggest that doctors cannot be the superior moral beings that 'true' empathy requires, and recommend 'etiquette' - that is, being clear and courteous - as an acceptable goal: '[W]e need to make far more moderate claims about what is being taught, and how and why' (Smajdor et al. 2011: 383). But despite such criticisms of the place of empathy in doctoring, it remains a dominant theme in the discourses and practices of training and assessment. While there is some acknowledgement in the medical education literature that empathy is hard to define and measure (Hojat et al. 2004; Stepien and Baernstein 2006; Marshall and Bleakley 2009; Pedersen 2009), the fundamental problem remains the difficulty in objectively judging aspects of IPS such as empathy, rapport or sincerity from the outside. Empathy is an inner state, experienced (or not) only by someone to whom it is directed (Seale et al. 2007). Only those who are on the receiving end of intended empathy or its lack can say how they felt about that moment of the interaction.
In OSCE-type exams, this inner state cannot be judged, since the role-player is simulating their pain and their concerns. All that can be judged is the candidates' display of empathy and, as Jamison points out, empathy and compassion must be 'voiced' (Jamison 2014: 3-4) so that it is explicitly markable by the examiner. The display of empathy phrases also raises questions about whether soft skills such as empathy and rapport are, in real life, always verbalised explicitly or whether such skills are often conveyed much more indirectly and subliminally. A corpus linguistic study of how health professionals show compassion (Brown et al. 2006) shows that explicit compassion phrases are not frequently used.
In our study of an OSCE-type exam for licensing health professionals, successful candidates manage this displayed empathy well and indeed use empathy phrases more frequently than unsuccessful candidates (Roberts et al. 2014). And yet in examiner feedback sessions, where examiners were asked to comment on clips from video-recorded candidates, there were frequent criticisms that some candidates sounded formulaic and insincere. For example: 'It seems just very formulaic and a lot of it seems learned: "I understand why you would be worried"' (examiner feedback from Roberts et al. 2014: Appendix D, 41). Clearly the frequency of these phrases was not the issue, as high-performing candidates used them more often and they index the patient-centred model that frames such exams. What is perceived as 'trained empathy' stems from both the design of the empathy phrases and how they are delivered. Successful candidates customised their voiced empathy, redesigning these phrases by adding small discourse markers and conversational phatics to relax these standardised phrases, as in the example below (here with punctuation added for clarity): 'Do you know what - I really do understand that'. So, in order to meet the requirements of the exam and yet not sound formulaic, candidates have to do additional communicative work. As well as the 'double consciousness' (Konijin 2000) of both acting authentically within the terms of the drama (assessment within a patient-centred model) and monitoring one's performance, there is a third element - voicing 'empathy' laminated with 'sincerity' linguistic tokens; what we might call a 'triple consciousness' is at work.
Candidates also have to deal with how they sound in context. We draw heavily on Gumperz's early work (Gumperz 1982: 100-129) on prosody and how differences can lead to misevaluation, misunderstanding and gaps between speaker intention and pragmatic assessment. And it is now possible to benefit from software that allows us to analyse the waveforms and pitch of each line of a speaker's talk (MacWhinney 2000; ELAN 2017). So we can examine single utterances to see how different aspects of prosody (pitch, volume, rhythm and intonation contours) work together to give off contextual cues which, in turn, may form a particular impression. While the negative effects of different accents (including aspects of prosody) in assessing overseas doctors have been generally discussed (Hoekje 2011), the microanalysis of prosodic features indicates how complex and nuanced these different features can be and how difficult to notice and understand unless recorded and subject to sociolinguistic analysis. In addition, the power relations in simulated consultations are likely to amplify negative attitudes. Although in real medical encounters such attitudes might be mitigated because of the relative high status of doctors, as Rubin et al. (1997) suggest, in the simulation, where the power dynamic is shifted, the same allowances for differences in accent and prosody may well not occur. The following examples look at prosody for two candidates and the assessments that accompanied their overall performance.
The first example, in Extract 1, is a case in which the role-player patient (RPL) has requested a medication not ordinarily provided by the UK National Health Service. The candidate (CAN) achieves full marks across all domains. This extract begins six minutes into the consultation, when the patient returns to the topic of his requested prescription (see Appendix for transcription conventions). The role-player simulates that he is disgruntled and in order to display 'empathy' with him and his frustrations the candidate uses a typical phrase at line 486. Here we can analyse the 'prosodic contour' with which she delivers the utterance. The rise and fall in volume and pitch are shown in Figure 1.
In local British English, information units are produced and processed in smooth prosodic 'envelope contours', with the volume and the pitch register generally following one another: a chunk of information is given within one single prosodic contour, the pitch typically going down at the end, as this candidate's smoothly does. The emphasis at the opening of the utterance, which is a little higher and louder than the rest, triggers a marked affect. After the false start at line 486 'No I do' the candidate says 'do you know what - I really do understand that' with a high pitch and raised volume at the onset of the main utterance 'do'. This whole line is enclosed in this one smooth enveloping affective contour, without a pause, and is rounded off at the end with a drop in pitch. She conveys expressiveness with the higher tone and makes the whole package sound smoothly and sincerely delivered. She also customises her 'understanding' phrase, to make it sound less formulaic and more sincere. Firstly, she is able to adapt the wording a little, so that it sounds less like one of the rote-learned phrases that were commented on by examiners: 'No I do - do you know what - I really do understand that'. Particularly the small emphasis marker 'really' and the more conversational focusing and framing prefix - 'do you know what' - would seem to stress her sincerity. She also follows up her expression with a longer account of why she can understand his frustrations (lines 487-490), at a point in the consultation when she can take up a little more time. In combination, these multi-channelled features of communication produce a highly rated performance.
And although such small moments do not a whole consultation make, cumulatively the voiced empathy and the work done to mitigate any formulaic dangers - much of it below the level of consciousness as the examiner listens in - is a masterful exemplum of the 'language game' which constitutes standardised examinations.

Issues of fairness
The degree to which simulated consultations, and specifically the simulation of interpersonal skills, are valid means of assessment is of critical concern to any institution using them. An equally pressing concern is one of fairness in contexts of superdiversity (Vertovec 2007), particularly when there is evidence that ethnic minority examination candidates in the UK, many of whom are international medical graduates (IMGs), are faring less well than UK-trained colleagues (McManus et al. 2013: 8).
In particular, our research on an OSCE-style licensing exam in the UK suggests that the assessment of simulated empathy may be a contributing factor in the low success rates of IMGs. In other words, the design of such exams can put a burden on some candidates that others do not feel. In Bourdieu's (1977) terms, they are like fish who feel the constant weight of the water they swim in. Many features of the 'hidden curriculum' in postgraduate professional assessment can produce this weight. They cannot be fully discussed in this article (see Roberts et al. 2014), but they include the talk-heavy character of the exam (candidates on average talked for a greater amount of the floor time, approximately 68% in simulated consultation, in contrast to about 60% in real consultations) and its relatively decontextualised environment, with, for example, no shaping role of the computer in the surgery (Swingelhurst et al. 2012). In the specific context of empathy assessment, the weight lies in the difference between how UK medical graduates and IMGs use and deliver empathy phrases. As mentioned above, UK graduates can customise and conversationalise these phrases to inoculate them against criticisms of formulaic consulting. However, IMGs, whose consulting practice overseas was unlikely to focus on such patient-centred manoeuvres, not only had to learn appropriate empathy utterances but also then learn that such phrases had to be redesigned.
More profoundly, the prosodic systems used in the varieties of English spoken by most IMGs differ from those of UK graduates (although of course many of these candidates may be increasingly influenced by the local English they hear around them). Again, this analysis draws on Gumperz's work on prosody and contextualisation cues (Gumperz 1982: 173-186) and how differences can lead to misevaluation and misunderstanding and gaps between speaker intention and pragmatic assessment. In contrast to the prosodic analysis of the successful UK graduate discussed above, the following case is of a more borderline candidate, who still passes the assessment but with a much lower score, achieving his lowest mark in the IPS domain.
The extract below comes from a complex case concerning a child protection issue, in which the role-playing mother (RPL) is concerned that her partner's brother is 'cuddling' their son. The candidate (CAN), of South Asian origin, does badly in IPS, achieving only 1 out of 3, and the examiners tick the following feedback comments: 'Does not appear to develop rapport or show sensitivity for the patient's feelings'; 'Does not make adequate use of verbal and non-verbal cues, poor active listening skills'.
The particular expression of understanding analysed in Extract 2 comes at lines 100-102, just after the mother has described her concerns. We can see that lines 100-102 perform a similar interpersonal function to the utterance in Extract 1, voicing understanding for the patient's situation, albeit in a very different exam case. The three-part turn, reformulating the original phrase twice, suggests very strongly that this candidate both knows about displaying explicit empathy and intends to show that he is 'empathic' within the terms of the exam. The words he uses are very similar to the stock phrases all candidates use. If we look at the prosodic patterning, though, it is different from that of the candidate in Extract 1 and potentially causes some difficulties in how the expressions are received. His pitch is much lower, but it also varies less, not rising and falling as the volume of his talk does (Figure 2). This candidate draws on aspects of North Indian languages such as Hindi in his use of volume, pitch and intonation. Word order is more flexible in languages like Hindi and a lot of the work that is done in English with stress and intonation is done by juxtaposing words and phrases (Gumperz 1982: 119-129). These prosodic patterns are quite common in this system of English influenced by North Indian languages where small units of talk are juxtaposed: 'right / I can see why you are concerned / and er / it's not something to to / be taken lightly / and I can see where you're coming from'. Each unit uses the same melody, so it sounds like a list and potentially rather formulaic to some ears (Gumperz 1982: 149). And while the volume is raised at the beginning of each information package, the pitch remains quite low and flat. In standard or local British English this could sound uncaring but in Hindi and in English influenced by North Indian languages low pitch is conventionally a marker of respect or conveying bad news (Gumperz 1982: 184).
Findings around the intonation of these token expressions are complex to discuss in the bigger picture of clinical skills assessment. Interpretation of meaning and attitude through prosody is largely unconscious and automatic, a result of learning to interact with others who use features 'like us'. This is clearly not something that is explicitly assessed, but nevertheless may impact on how utterances are understood and judged. In an exam where conventional interpersonal expressions pervade much of the talk, candidates' prosodic delivery is an important part of whether they stand out as 'formulaic', despite only marginal differences in the words used. It would seem from the grade and comments given to this candidate that such small differences are amplified under the intense gaze of the examiner and that this candidate is sanctioned for a lack of 'rapport'.
In feedback sessions, where examiners were asked to comment on segments of video-recorded candidate performances, many of the IMGs were perceived as formulaic even though the empathy phrases were similar to those of UK graduates. Formulaic-sounding phrases also led to wider judgements of manner and attitude such as 'not engaging', 'not interested enough', 'a bit Olympian'. So perceived differences can rapidly lead to 'interpretive overdrive' as Blommaert and Rampton discuss:
For much of the time, most of the resources materialised in any communicative action are unnoticed and taken for granted, but it only takes a slight deviation from habitual and expected practice to send recipients into interpretive overdrive, wondering what's going on when a sound, a word, a grammatical pattern, a discourse move, or bodily movement doesn't quite fit. There is considerable scope for variation in the norms that individuals orient to, which affects the kinds of things they notice as discrepant, and there can also be huge variety in the situated indexical interpretations that they bring to bear ('good' or 'bad', 'right' or 'wrong', 'art' or 'error', 'call it out' or 'let it pass', 'indicative or typical of this or that'). (Blommaert and Rampton 2016: 37)
Such perceived differences can affect both overall emotional tone - whether the candidate sounds warm, involved, responsive and so on - and overall behavioural smoothness, i.e. whether the interaction progresses without jarring or uncomfortable moments or not (Erickson and Shultz 1982: 169-173). If both role-player and examiner share normative expectations about communicative resources but these are not shared with candidates - even at the most microprosodic level - the whole encounter may seem to have gone awry.
This may lead to candidates being assessed as lacking compassion or disengaged, and doubts about how they can be trusted to be a caring professional are raised (see also O'Grady and Candlin 2013).
In a superdiverse context, where we are increasingly exposed to one another's different communicative styles, and no more so than in the health service, the ability to produce these phrases in a manner which sounds sincere to an overhearing examiner, in what is a simulated context, may not be an especial priority as a doctoring skill. Indeed, in superdiverse contexts, there is a strong argument for rethinking what the interactional environment of consultations looks like and whether local British normative expectations of expressing compassion or empathy can remain the dominant style of assessment. Deciding whether talk is empathic or not depends upon whether the observing examiner considers that the candidate's behaviour accords with what they consider empathy to be. The objectivity of standardised assessments is difficult to sustain when judgements such as 'empathic' depend upon an examiner's unavoidably subjective assessment of candidates' simulated behaviour towards simulated patients.
If patients also shared the same communicative expectations, there would be some case for arguing that all candidates should be able to draw on the same set of resources. However, the superdiverse patient populations which are characteristic of so many urban areas require a level of communicative flexibility that is not designed into standardised exams; indeed, it would be something of an oxymoron. These exams do not assess whether candidates, whatever their background, are able to manage this level of flexibility in consultations in settings of linguistic diversity. Standardised role-players tend to use a local British way of speaking and so monolingual UK graduates are not judged on whether they can carry out an effective consultation in linguistically challenging situations, with patients from different linguistic/cultural backgrounds. Such standardisation also means that IMGs, many of whom consult regularly in another expert language, have no opportunity to display this linguistic and cultural knowledge, which may be so valuable in everyday practice. In sum, simulations also raise questions about the fairness of assessments and how proportionate they are in an increasingly diverse society where there is no one right way of showing interpersonal effectiveness (Atkins et al. 2016).

Conclusion
The simulated consultation is a highly standardised genre, and in attempting to assess candidates' clinical skills it comes up against the untameable quality 'empathy'. Even if empathy were not assessed, the OSCE-type exam raises as many questions as it answers once we look outside the psychometrics of reliability and most types of validity. Since the institutional frame of these exams overrides any local relationship work, the consultation cannot be experienced as real and instead may depend, in part, on acting skills. Even though candidates can be trained to simulate the examined consultation, there are still several aspects of it which escape from the real. These include the powerful positioning of the role-player and the interactional evidence that they have opportunities to claim this power and asymmetry at a micro level (Atkins forthcoming). In addition, there is a lack of everyday contextual features such as the 'voice' of the computer in shaping the interaction and in eroding some of the modelled features of interpersonal skills, such as amount of eye contact (Swinglehurst et al. 2012).
Together with the tricky business of assessing the real through the unreal, the increasing focus on the soft interpersonal skills, and more especially empathy, raises serious questions about what empathy is and whether, and if so how, it can be experienced in a standardised exam setting. Here, relationship building over time and the deep values inherent in building capability (Fraser and Greenhalgh 2001; Bleakley 2003) are outsourced to an externally timed case where surface skills are voiced so that they can be monitored and assessed. And, to go one step further, there is the matter of whether 'true empathy' (Halpern 2003) is a necessary moral requirement of being a doctor at all.
Tied into these concerns is the issue of fairness. OSCE-style exams are talk-heavy, requiring more voiced empathy phrases and more interactional work to inoculate them against sounding formulaic and insincere. Under the intense gaze of the examiners, a candidate whose style of communicating is likely to be somewhat different from theirs is under particular pressure. Small differences can have large consequences. And IMGs feel the weight of the water, which others may swim through more freely and lightly. At the same time, the diversity of communicative practices, typical of so many family and hospital practices today, is difficult to assess in such exams, again leading to concerns about whether things being equal means that they are fair.
Linguistic ethnography (LE), the approach used in the research illustrated here, drawing on sociolinguistics and anthropology (Copland et al. 2015), can also feel the weight of difference when used in medical settings. These methodologies can be viewed with scepticism by non-sociolinguists: since people talk and interact all the time, knowledge about language, communication and even culture can seem self-evident or simply uninteresting; also, there seems little space for such research in positivistic and psychometric paradigms where the notion of 'small is beautiful' (Schumacher 1973) is squeezed to the periphery. For sociolinguists, like IMGs, there is much hard discursive work to be done. We need to engage with the medical paradigm, which expects generalisations from big data, by tracing the source of small events and difference to wider social forces and theories through close contextual analysis (Burawoy 1998; Mitchell 1983; Small 2009). For example, we can analyse an intonation contour in the ever-widening context of coded typical exam phrases, examiner feedback, interpersonal skills as designed into the exam, institutional systems of assessment, current medical discourses, performance and sociolinguistic theories of language use and difference, and feminised and equality discourses, as the examples above suggest.
A study such as this has value only if its practical relevance is clear and if it contributes, to whatever extent, to re-setting the terms of the debate about 'language' in OSCE-type exams. In the UK, this has involved us in long-term relationships with various institutions that teach and assess postgraduate medical professionals. There is also a wider debate around standardisation in increasingly pluralistic societies and how institutions address the challenges of face-to-face assessments in a globalised context. Chris Candlin in the 1970s was one of the first applied linguists to study the medical consultation when, in the UK, society was becoming increasingly diverse. And, more recently, with his associates, he was a significant voice in using close interactional analysis to enhance assessment and teaching practices (O'Grady and Candlin 2013). In the twenty-first century, this early work looks all too prescient, and the recent work increasingly timely, as we grapple with what it means to be professionally competent with a patient population from everywhere.