Delaying the implementation of Payment by Results in mental health: the application of standardisation. Mental Health Review Journal, 20 (3).

Purpose – The purpose of this paper is to explore the issues surrounding a long-planned expansion of Payment by Results (PbR) into mental health services and to highlight the factors responsible for the delay.
Design/methodology/approach – PbR relies upon “standardisation” of conditions and treatments. This depends upon a scheme of classification that can realistically predict resources required to execute treatment of any one case. Plans to fund NHS mental health services on the basis of tariffs derived in this way have been delayed, and a key reason is the lack of high-quality data. This would require effective “standardisation-to-the-average” of both a system of classification and a repertoire of costed treatment pathways. This paper investigated the delayed implementation by exploring the difficulties in applying standardisation principles to service provision and tariff calculation.
Findings – The paper identified the fundamental difficulty with PbR’s implementation in applying “standardisation” to practice. This is defining the mental disorder that the patient is suffering and designing care pathways at clinical level considering the balance between practical applicability and conceptual/constructional validity. This is necessary to enable the calculation of a national tariff. The conceptual flaws of the Health of the Nation Outcome Scale led to the constructional shortcomings which compromised the credibility and validity of the Mental Health Clustering Tool regarding making accurate classification in a standardised way. The validity and credibility of calculating a national tariff thus became contentious on the basis of this inaccurate clinical classification system.
Originality/value – This paper explored the driving factors of delay in implementing PbR in mental health through connecting the recent reform with the fundamental assumptions of “standardisation-to-the-average”, which provided another perspective to illustrate the current obstacles.


Introduction
An expansion of Payment by Results (PbR) into mental health service provision was initially planned for 2013, but at the time of writing (April 2015) it had yet to become the definitive framework for funding NHS secondary mental health services. In 2013 guidance was published which included indicative costs for each of twenty-one treatment packages or "clusters" (Department of Health, 2013), with a view to implementation in 2013/14. This was delayed until 2014/15 (Lintern, 2013). Subsequently it was reported that implementation was to be further delayed and the expressions "dangerous" and "unintended outcomes" were attributed to key figures in relation to hurried implementation of PbR in this domain (Lintern, 2013). By February 2015 a changing political landscape had moved this debate even further, with focus shifting from PbR as a core feature of competitive tendering, to an emphasis upon a "system-wide approach" (Keohane, 2015).
First introduced to acute services in 2003/04, PbR is intended to control health care costs, enhance capacity and improve quality (Department of Health, 2002). A defining feature of the logic supporting PbR is standardisation of individual treatments. It is based upon the assumption that cases for treatment can be classified into a finite number of categories based upon the likely costs of providing for them. Cases are differentiated on the basis of diagnostic groups within which patients share similar symptoms and service needs. Costs of providing for individual cases within such groups are estimated on the basis of averaged costs of providing for such cases across a range of providers. The validity of this rests upon two assumptions: cases for treatment can be classified into a finite number of categories reflecting the likely costs of providing for them on the basis of information available at the onset of treatment, and a meaningful tariff for each category can be derived as the average current cost of providing for cases falling within such a category. Although these may be valid assumptions for some aspects of healthcare activity, they may not be so for others, and a reason for the delay in implementing PbR in mental health services may be that this is an area of healthcare activity where they are fundamentally invalid.

Mental health "diagnoses": descriptive, but not explanatory routes to treatment
Despite attempts to present them as such (American Psychiatric Association, World Health Organisation) mental health difficulties prove hard to identify as discrete disease entities (Szasz, 1960). Reasons for this include intangible pathology, vague aetiology and the absence of externally verified diagnostic markers such as imaging and laboratory tests that are in daily use elsewhere in medicine (Frances, 2010). As a result, identifying "mental disorder" is heavily dependent upon professional judgment. Psychiatrists' schemes of classification reflect attempts to distinguish "normal" from "abnormal" and arrange the latter into pre-determined categories as if they were discrete disease entities susceptible to distinguishably different treatment approaches (Dalal and Sivakumar, 2009). Although classifying cases on the basis of descriptive criteria can be conducted reliably, the absence of associations between these criteria and mechanisms responsible for the abnormalities they reflect means that such classifications are poor guides to treatment (Middleton, 2008).
The development of PbR in mental health services has avoided heavy reliance upon established descriptive classifications of mental health difficulties such as Chapter V of the International Classification of Diseases (ICD) and the Diagnostic and Statistical Manual (DSM). Nevertheless, the current approach is based upon the assumption that individual cases' needs for treatment, and therefore resource implications, can be sufficiently well estimated on the basis of the intensities of certain distressing and/or abnormal behaviours ("symptoms"), without any reference to how they might have come about or, in any detail, what might be required to "treat" them. This could be considered akin to proposing that someone suffering painful difficulty walking because of osteoarthritis has the same healthcare resource implications as someone suffering the same degree of painful difficulty walking on account of an injury.

The Mental Health Clustering Tool
Attempts to predict the resource implications of treating individual cases, and therefore set a tariff in acute hospital settings have focused upon defining Health Resource Groups (HRGs). These reflect an approach based upon the assumption that, to some useful degree, diagnosis predicts the cost of providing care. In this context, physical medicine and surgery, "diagnosis" incorporates an understanding of why the patient is distressed, in pain or disabled to a level of certainty that a psychiatric "diagnosis" cannot. As a result, acute care "diagnoses" can often provide a sufficiently accurate prediction of what appropriate treatment might involve, and act as the basis of a tariff system. That is not the case in mental health service settings and rather than base mental health PbR tariffs upon "diagnosis" a different system has evolved, which is known as clustering. This is a process of classifying cases into twenty-one "clusters" which are considered to have distinct and distinguishable treatment resource implications. This classification is supported by a process and an algorithm which are together known as the Mental Health Clustering Tool (MHCT), and it is intended to form the basis of tariff allocations and payments in mental health service settings.
The MHCT first assigns each case to one of three "super-clusters": "non-psychotic", "psychotic" or "organic". These are generally agreed distinctions, although the phenomenological boundary between "psychosis" and other forms of disturbed mental state is not fixed; consider for instance the status of hypnagogic and hypnopompic hallucinations, which are hallucinatory but occur while falling asleep or awakening, the distinction between religious conviction and what are considered "delusions", or the phenomena common in some other forms of distressed arousal which are referred to as "pseudo-hallucinations". Furthermore, the distinction between "non-psychotic" and "psychotic" is phenomenological, whereas the distinction between either of these and "organic" is based upon assumptions of cause. In other words, this broad distinction reflects traditional classifications, but these are themselves problematic and quite possibly sometimes misleading.
The MHCT sub-classifies cases falling into each of these "super-clusters" on the basis of symptom severity. These sub-classifications and the relationships between them are illustrated in Figure 1. Each of the twenty-one clusters is considered to define a group of patients with similar health care needs and resource requirements (Care Pathways and Packages Project, 2011), and therefore a group for which a tariff can be derived and applied. The Health of the Nation Outcome Scale (HoNOS) is a twelve-item scale designed to estimate the severity of psychological disturbance. It was developed during the 1990s in pursuit of a measure that could be used to quantify change during the course of psychiatric treatment and thus support expectations of verified service efficacy referred to in the government White Paper Health of the Nation (Department of Health, 1992; Wing et al., 1998a). The Supplementary SARN is a five-item scale estimating the degree of disturbance across domains which are considered to reflect the more notable difficulties that can arise in relation to individuals with mental health difficulties. Each of the eighteen items is scored on a 0-4 basis whereby 0 reflects "No Problem" in that domain and 4 reflects "Severe or Very Severe Problem". On the basis of psychometrics derived from some 530 sets of scores an algorithm has been developed which links a profile of scores to one or other of the twenty-one definitive clusters. Figure 2 illustrates the eighteen items, how an imaginary case might have scored and what the profile would have to be if it is to be allocated to Cluster 19, effectively that of someone with a significant degree of dementia.

Figure 2. Colour coded rules of rating grids.
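The idea of linking a profile of item scores to a cluster can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cluster is described by a permitted range for some of the eighteen items; the rule format, the names `EXAMPLE_RULES`, `fits` and `candidate_clusters`, and the ranges themselves are hypothetical and do not reproduce the published MHCT algorithm or its cluster profiles.

```python
from typing import Dict, List, Sequence, Tuple

N_ITEMS = 18                   # number of rated items, as stated in the text
MIN_SCORE, MAX_SCORE = 0, 4    # 0 = "No Problem", 4 = "Severe or Very Severe Problem"

# Hypothetical rule format: item index (0-based) -> (lowest, highest) permitted rating.
# Items that a rule does not mention are left unconstrained.
Rule = Dict[int, Tuple[int, int]]

# Illustrative rules only; these are NOT the published MHCT cluster profiles.
EXAMPLE_RULES: Dict[str, Rule] = {
    "example_cluster_a": {0: (0, 1), 3: (3, 4)},
    "example_cluster_b": {3: (0, 2)},
}


def validate(scores: Sequence[int]) -> None:
    """Reject profiles with the wrong number of items or out-of-range ratings."""
    if len(scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(scores)}")
    for position, score in enumerate(scores):
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"item {position} has out-of-range rating {score}")


def fits(scores: Sequence[int], rule: Rule) -> bool:
    """True when every item constrained by the rule lies inside its permitted range."""
    return all(low <= scores[item] <= high for item, (low, high) in rule.items())


def candidate_clusters(scores: Sequence[int], rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all hypothetical clusters whose rule the profile satisfies."""
    validate(scores)
    return [name for name, rule in rules.items() if fits(scores, rule)]


# An imaginary case: no problems recorded except a severe rating on item 3.
ratings = [0] * N_ITEMS
ratings[3] = 4
print(candidate_clusters(ratings, EXAMPLE_RULES))
```

Under these assumed rules a profile may satisfy none, one or several clusters, so a sketch of this kind classifies on presenting scores alone and carries no information about which treatments a clinician would judge appropriate.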
As treatment proceeds and is reviewed and adjusted, needs for treatment can change. Figure 3 illustrates the process of reassessment and cluster reallocation that is intended to keep estimates of resource implications up to date with patients' changing needs (Department of Health, 2013). Data populating these pro forma are generated at specified intervals by clinical staff in the course of their work with clients, theoretically as a by-product of routine assessments of progress; an agreed element of good clinical practice.

Figure 3. Cluster and care transition protocol
Cluster allocation is only half of the PbR process theoretically linking clinical condition to resource implications on a case-by-case basis. For that to happen, cluster allocation has to imply the suitability of a particular, costed package of care. Identification of the care packages associated with clusters defined in these ways was still under development at the time of writing (April 2015). A reason for this delay is that clustering is an essentially descriptive process that does not directly incorporate the implications of clinicians' judgements concerning appropriate treatment. Mental health services are rightly obliged to tailor individual treatments to the recipient on the basis of age, gender, background, socio-demographic circumstances and, most importantly, preference. As a result, there remains considerable potential for variance in resource implications even after a case has been allocated a cluster. The resource implications of treating physical illness are largely accounted for by technical costs such as those of operating theatre time, investigations, medication and bed usage, which are all much more closely linked to definable features of the difficulty requiring treatment. In the context of mental health services individual characteristics such as readiness to engage in psychological therapy and keep appointments, sensitivity to drug side effects and the availability of informal support all play a much larger part in determining the resource implications of providing appropriate care (Emmerson et al., 2004; Jones, 2004). An attempt to capture these sources of variance has been made in the form of SARN, but simple ratings of risk, engagement and vulnerability are a very indirect proxy for judgements about the most appropriate course of treatment that are actually made in the field, and it is these judgements that form the basis of attempts to link cluster allocation to resource implications (Cabana et al., 1999; Jones, 2004).

Needs assessment
The foregoing draws attention to the fact that it is more difficult to standardise mental health service clients' needs for treatment, and therefore their resource implications, than tends to be the case amongst acute care patients. The NHS is committed to shaping services according to patients' needs and preferences (Department of Health, 2000). This implies that each client should have an individualised care plan based upon an individualised assessment of needs (Marshall, 1994). Considered realistically this has to be predicated upon an acceptable understanding of "need" as it applies in this context. From a wider perspective the definition of "need" is problematic. There is no broad consensus on the concept of "need" in health and sociology literature (Asadi-Lari et al., 2004). Within the context of mental health services Wing et al. (1992) suggest that "need" should be defined alongside potentially available "state of the art" solutions. A healthcare "need" only exists when there could be a "treatment". In parallel the NHS defines "need" as the capacity to benefit from services (Asadi-Lari et al., 2004), and so the "need" for treatment and its resource implications cannot be separated from judgements concerning the type and amount of health care that clinical expertise believes to be beneficial in a particular situation (Magi and Allander, 1981). If the resource implications of providing for an individual are to be predicted, then the process of doing so has to include, quite directly, the decisions made by clinicians about which courses of action might be most appropriate. In acute care settings these are often implicit; providing for someone with an osteoarthritic hip could involve pain relief, physiotherapy, mobility support, rest or hip replacement. The course followed may well be determined by a detailed "diagnosis", including information about the state of the hip joint in question, the patient's mobility, muscular tone and living conditions, that will predict the course to be followed with little error. The nature of mental health difficulties is such that there is much more scope for variations in "needs for care" even amongst those with the same diagnosis (Wing et al., 1992).

The MRC Needs for Care Assessment Schedule
Attempts to standardise and quantify the care of mental health service clients antedate attempts to implement PbR by several decades. A significant step was development of the MRC Needs For Care Assessment Schedule (NFCAS) (Brewin et al., 1987). The NFCAS has not entered routine use because in its original form it is detailed and dependent upon specialised training, if it is to be used reliably. Nevertheless, those qualities can also be considered virtues, and conceptually it addresses many of the shortcomings already identified with the MHCT. In particular, it sets out to capture clinical judgements about the propriety or otherwise of different courses of action, and therefore generates an assessment of need which much more closely reflects what is deemed to be the appropriate care pathway. As a result, it is worth considering as an approach which could still have useful application.
The NFCAS was originally designed to measure the needs and provide structure to the provision of services for those with long-term mental health difficulties living in the community as large scale mental institutions were wound down (Brewin et al., 1987). In essence it determines the presence or otherwise of difficulties across nine domains of psychiatric symptomatology, such as positive psychotic symptoms, dangerous or destructive behaviour, or distress, and twelve domains of essential everyday living skills, such as the ability to use public transport, to maintain personal hygiene or manage a weekly budget. On the basis of explicit criteria a judgement about the presence or absence of difficulties (Problem Status) is made in relation to each of these twenty-one domains; for each of the nine areas of symptomatology "Problem Status" is classified as "None or Mild", "Recent or Threatened", "Current and significant" or "Unknown", and for each of the eleven areas of essential living skills Problem Status is classified in the same way as "Competence plus performance", "Recent or Threatened Problem", "Lack of competence", "Lack of performance", or "Unknown".
Where there is evidence of threatened, recent or current symptomatology or a skill deficit, the potentially relevant treatments or interventions are evaluated accordingly. The interventions considered conceivably appropriate for each area of symptomatology and living skills are specified, having been pre-determined by consensus from discussion with a wide range of mental health professionals. Therefore, where positive psychotic symptoms are or might be present, enquiries are made into whether or not any intervention such as medication, domiciliary visits, coping advice to the patient and/or relatives which might include alternative strategies, a family intervention or a sheltered environment might be appropriate and if so, whether or not it is being provided (Brewin et al., 1987). If any one of the relevant interventions is considered appropriate but is not being provided, then further enquiry is made into whether or not this is because it has yet to be provided, has been offered and not taken up or has been tried and found to be ineffective. Where there is a problem with communication skills, for instance, similar judgements would be made concerning social skills training, practice in realistic settings or a sheltered daytime environment. Overall the NFCAS results in a detailed catalogue of clients' difficulties and a statement of their "needs" couched in terms of clinically determined judgements about the suitability of consensually agreed interventions for each of them. This is clearly a more bespoke approach to identifying the resource implications of providing for a client and it incorporates the outcome of clinical judgements. During the 1980s and 1990s a number of studies confirmed both the reliability and validity of this approach (Brewin et al., 1988; Brewin and Wing, 1993; Marshall, 1994) but the very detail that endows the NFCAS with these qualities also makes it cumbersome to use. To ensure reliability, judgements about the presence or otherwise of difficulties in each of the twenty-one domains and judgements about the applicability of numerous treatment options all have to be made on the basis of explicit criteria. Although many of these judgements are also, implicitly, the same judgements that might be made by a competent clinician, the NFCAS imposes a structure upon clinical assessment which could only be applied after rigorous training. As a result, it has not found a place in routine practice.
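The sequence of judgements described above (problem status first, then the appropriateness and provision of each candidate intervention, then the reason for any gap) can be sketched as a small data structure. This is an illustrative outline only: the class and function names are hypothetical, and the actual NFCAS decision rules and scoring criteria are considerably more detailed than this.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ProblemStatus(Enum):
    NONE_OR_MILD = "none or mild"
    RECENT_OR_THREATENED = "recent or threatened"
    CURRENT = "current and significant"
    UNKNOWN = "unknown"


class NotProvidedReason(Enum):
    NOT_YET_PROVIDED = "has yet to be provided"
    OFFERED_NOT_TAKEN_UP = "offered and not taken up"
    TRIED_INEFFECTIVE = "tried and found to be ineffective"


@dataclass
class InterventionJudgement:
    """One clinical judgement about one candidate intervention (hypothetical structure)."""
    name: str
    appropriate: bool
    provided: bool
    reason: Optional[NotProvidedReason] = None


def outstanding_interventions(
    status: ProblemStatus, judgements: List[InterventionJudgement]
) -> List[Tuple[str, Optional[NotProvidedReason]]]:
    """List interventions judged appropriate but not provided, with the recorded reason.

    Interventions are only evaluated where a problem is threatened, recent or current.
    """
    if status not in (ProblemStatus.RECENT_OR_THREATENED, ProblemStatus.CURRENT):
        return []
    return [(j.name, j.reason) for j in judgements if j.appropriate and not j.provided]


# An imaginary assessment of the "positive psychotic symptoms" domain.
psychosis_judgements = [
    InterventionJudgement("medication", appropriate=True, provided=True),
    InterventionJudgement(
        "family intervention",
        appropriate=True,
        provided=False,
        reason=NotProvidedReason.NOT_YET_PROVIDED,
    ),
    InterventionJudgement("sheltered environment", appropriate=False, provided=False),
]
print(outstanding_interventions(ProblemStatus.CURRENT, psychosis_judgements))
```

Unlike a classification based on presenting scores alone, the output of a structure like this names specific interventions, which is what would allow it to be costed.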
Attempts to popularise a shortened and simplified derivative were made (Phelan et al., 1995) but needs assessment in this form has remained a research exercise. Despite its conceptual and metric superiority over the MHCT this reputation and the demands of training have hindered its adoption as a basis for PbR in mental health service settings.

Strengths and weaknesses of the MHCT in comparison to the NFCAS
The most obvious advantage of the MHCT and this application of HoNOS is their suitability for incorporation into routine clinical practice. It only takes clinicians some 5-15 minutes to complete a HoNOS form (Jacob, 2009) and it is reasonable to assume that most of the related judgements will have been made in the course of routine clinical activity.
The HoNOS was developed by the Royal College of Psychiatrists to serve as an outcome measure and provide evidence of efficacy across mental health services (Wing et al., 1998a). As has been argued, this is fundamentally different from an assessment of needs for treatment, and by implication an estimate of resource implications. A number of studies have considered its validity as a measure of psychological difficulty and its psychometric reliability (Wing et al., 1998a; Wing et al., 1998b; Shergill et al., 1999; McClelland et al., 2000; Idiani, 2011; Lovagilo and Morgani, 2011) and found it to be relatively reliable in terms of inter-rater reliability and stability, and relatively valid as a measure of distress and change with time. Comparisons with other ways of quantifying mental health difficulties, such as symptom check lists, have not been so encouraging (Brooks, 2000). More importantly for the present purpose, there is little direct evidence of its ability to predict resource implications.
The empirical work underpinning the MHCT and its use of the HoNOS was an attempt to standardise practices in one particular service and "need" was defined descriptively, as the presence of a particular set of difficulties assumed to have homogeneous resource implications. Examples include "Acute Non-Psychotic (Medium Severity)", which refers to people who are characterised by moderate amounts of depression and/or anxiety, are continuing to function in everyday life but may encounter problems in their relationships, and are considered low risk, or "Non-Psychotic Chaotic and Challenging Disorder" which refers to people who are expected to have a wide range of symptoms and chaotic and challenging lifestyles. These people are also characterised by moderate to severe repeat deliberate self-harm, and chaotic, over-dependent engagement with services (Self et al., 2008b). Although grouping clients in this way might predict resource implications to some extent, the question remains: how well does the MHCT, which is based upon it, usefully predict service needs and therefore provide a platform for PbR in mental health services? To do so it has to predict resource implications, and therefore order resource allocation, to a sufficient level of accuracy to avoid financial instability. This review began with a need to consider the validity of two assumptions underpinning PbR as it might apply to mental health services: that cases for treatment can be classified into a finite number of categories reflecting the likely costs of providing for them on the basis of information available at the onset of treatment, and that a meaningful tariff for each category can be derived as the average current cost of providing for cases falling within such a category.
The NFCAS provides an approach to cataloguing clients' difficulties and how they might be addressed in a way that is both a meaningful approach to determining what interventions they might require, and therefore their resource implications, and incorporates the routine clinical judgements involved in determining those treatment plans. It is an approach which would fulfil the first of these assumptions. The MHCT approaches the challenge from a different perspective. Rather than providing a structured way of considering and cataloguing the difficulties and needs for treatment each client presents in a bespoke manner, it "forces" classification of cases into one or other of twenty-one clusters on the basis of their presenting difficulties, and assumes that this in itself is a sufficient measure of resource implications. As such it is a less promising approach. The HoNOS, the core element of the MHCT classification was not designed for this purpose, but as a brief assessment of functioning intended to quantify clinical outcome changes. What evidence there is suggests that although HoNOS might serve as a valid overall outcome measure, it does not provide enough information to identify patients' needs at an individual level (Teesson et al., 2000). In contrast the NFCAS is specifically designed to do just that.
Advantages of the MHCT/HoNOS approach to classifying cases are that it is convenient to use and readily incorporated into contemporary information systems. The fact that PbR is proving difficult to implement in mental health service contexts suggests that the loss of detail and operational validity accompanying these conveniences are proving problematic.
Furthermore, clinicians regularly complain that the clusters do not fit an individual's condition and this creates confusion for clinicians asked to ascribe people to "boxes" with no clinical sense behind them (Communitycare, 2013). Kingdon et al. (2012) point out that the abandonment of a diagnosis-based system makes it difficult to understand how clusters can work in PbR. Finally, with no prior consideration of services, the classification system MHCT/HoNOS itself has been criticised as a "labelling process" (Callard et al., 2013; Middleton, 2013) which may result in its own adverse consequences. As Kingdon et al. (2012) argue, the credibility of the MHCT and its validity in terms of making accurate and proper classification are still to be tested and to do so properly would take years. On the other hand, the NFCAS, cumbersome though it may be, is conceptually more appropriate, clinically grounded and robustly tested. The decision rules, the acquisition of which makes up NFCAS training, can be operationalised, and it is possible that these could be formatted as an automated algorithm. There is some evidence that this approach can be applied in everyday practice (Middleton et al., 1996), and so it could be adapted for this purpose.

Deriving a set of tariffs
Even if cases can be catalogued in a way that meaningfully predicts resource implications, there is still work to be done before tariffs can be established and used as the basis of remunerating NHS provider organisations. The direct costs and other resources needed to support each of a wide range of activities have to be defined. In order to act as a quality improvement mechanism, rather than encouraging competition on the basis of price, PbR requires a national average unit cost for each healthcare activity. Thus for each set of treatment activities a national average unit cost has to be estimated (Self et al., 2008b) and this depends upon the availability of data that identify costs of particular activities across a wide range of provider organisations. This in turn assumes that costs of treating a particular category of cases incurred by different providers follow a roughly normal distribution, in order that their arithmetic mean is a meaningful average. It also implies that "deviances" can be categorised as the extremes at both ends of the cost distribution curve, and that an effect of pursuing a "standardised cost" will be to reduce deviance by employing a "standardisation-to-the-average" principle (Department of Health Payment by Results Team, 2012).
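The dependence of an averaged tariff on the shape of the cost distribution can be shown with a short worked example. The figures below are invented for illustration and are not drawn from any NHS reference cost data; `tariff_summary` is a hypothetical helper, and the comparison of mean with median is simply one convenient way of exposing skew.

```python
from statistics import mean, median
from typing import Dict, Sequence


def tariff_summary(unit_costs: Sequence[float]) -> Dict[str, float]:
    """Compare the arithmetic mean (the candidate tariff) with the median unit cost.

    Assumes positive costs. A large relative gap suggests a skewed distribution,
    in which the mean is a poor description of the typical provider.
    """
    if not unit_costs:
        raise ValueError("at least one provider cost is required")
    average = mean(unit_costs)
    middle = median(unit_costs)
    return {
        "mean": average,
        "median": middle,
        "relative_gap": (average - middle) / middle,
    }


# Invented unit costs (in pounds) reported by five providers for one category of case.
symmetric_costs = [900, 950, 1000, 1050, 1100]   # roughly symmetric around 1000
skewed_costs = [900, 1000, 1000, 1100, 6000]     # one provider with very different practice

print(tariff_summary(symmetric_costs))
print(tariff_summary(skewed_costs))
```

In the symmetric case the mean and median coincide, so a tariff set at the mean sits close to every provider's cost. In the skewed case the mean is double the median, so a tariff set at the mean would overpay four of the five invented providers and substantially underpay the fifth.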
Conventional mental health services involve wide variations in the services they provide, even in relation to very comparable cases. As the Royal College of Psychiatrists (2013) states, minimum standards might be published but wide interpretations and different treatments are clearly apparent. The very nature of mental health difficulties means that the focal point of treatment should be the process of knowing patients and their needs (Jones, 2004). The process of knowing a patient involves building up a relationship, trust and intimacy, which requires clinicians to interact with patients and to be flexible towards the specific individual patient in a bespoke manner (Jones, 2004). Insofar as this has been the approach adopted by mental health services to date, there is very little high-quality data to draw upon which is able to identify the costs of providing for this, that or a third category of client.
Thus there is little to draw upon in pursuit of unit costs that might be averaged to compute a national tariff. In acute care, "diagnosis" and treatment implications are much more closely linked, elements of treatment such as an operation or a course of antibiotics can be much more clearly priced, and so such data are more readily available. Where it has been possible to do so, implementation of PbR without threatening financial instability has relied upon the availability of historical data drawn from already standardised patterns of practice (Appleby et al., 2012). As a much more explicitly patient-centred (and therefore un-standardisable) set of practices mental health services do not have such data to hand.
In order to generate them a period of standardised practice would have to be followed. At present attempts are being made to record service activity alongside MHCT cluster allocation but the inherent weaknesses of this approach to defining "needs" for mental health service interventions threaten the success of that process, and until such a process is followed successfully sufficiently accurate data upon which to base a set of national tariffs cannot be available.
Clinical needs can be assessed in a standardised way and reasonably accurate predictions of resource implications could be made, but considerably more investment in the process is needed if this is to be sufficiently accurate. In comparison with the NFCAS, the MHCT has to be recognised as an expedient short cut, and given the shortcomings that have been considered here, perhaps too peremptory to serve its intended purpose. The debate about implementing PbR in mental health service contexts too easily degenerates into an ideological confrontation. That is no more inescapable in this context than in any other where the pros and cons of monetary evaluation are debated. As in so many other contexts, establishing the financial costs and benefits of providing mental health services is perfectly possible, but it immediately becomes contentious if it is conducted in a clumsy, hurried and inaccurate manner.