Low intrinsic efficacy for G protein activation can explain the improved side effect profiles of new opioid agonists

Low intrinsic efficacy can explain the reduced side effects of apparently biased μ-opioid receptor agonists.
Opioids: Efficacy versus bias
Because of its antinociceptive effects, the μ-opioid receptor (MOR) is an important target for pain management, but serious side effects limit the use of drugs that target this GPCR. Because the MOR stimulates intracellular signaling through both G proteins and β-arrestins, G protein–biased agonists have been developed to promote pain relief without causing β-arrestin–associated side effects. Gillis et al. compared the biochemical, signaling, and physiological properties of some G protein–biased MOR agonists with those of unbiased opioids. The observed reductions in side effects could be explained by the low intrinsic efficacy of the biased agonists rather than by their signaling bias per se. These findings suggest possible strategies for developing new MOR agonists that relieve pain with fewer unwanted side effects.
Biased agonism at G protein–coupled receptors describes the phenomenon whereby some drugs can activate some downstream signaling activities to the relative exclusion of others. Descriptions of biased agonism focusing on the differential engagement of G proteins versus β-arrestins are commonly limited by the small response windows obtained in pathways that are not amplified or are less effectively coupled to receptor engagement, such as β-arrestin recruitment. At the μ-opioid receptor (MOR), G protein–biased ligands have been proposed to induce less constipation and respiratory depressant side effects than opioids commonly used to treat pain. However, it is unclear whether these improved safety profiles are due to a reduction in β-arrestin–mediated signaling or, alternatively, to their low intrinsic efficacy in all signaling pathways. Here, we systematically evaluated the most recent and promising MOR-biased ligands and assessed their pharmacological profile against existing opioid analgesics in assays not confounded by limited signal windows. We found that oliceridine, PZM21, and SR-17018 had low intrinsic efficacy. We also demonstrated a strong correlation between measures of efficacy for receptor activation, G protein coupling, and β-arrestin recruitment for all tested ligands. By measuring the antinociceptive and respiratory depressant effects of these ligands, we showed that the low intrinsic efficacy of opioid ligands can explain an improved side effect profile. Our results suggest a possible alternative mechanism underlying the improved therapeutic windows described for new opioid ligands, which should be taken into account for future descriptions of ligand action at this important therapeutic target.


INTRODUCTION
Agonists of the -opioid receptor (MOR), such as morphine and the synthetic opioid fentanyl, are mainstay analgesics for the treatment for severe acute pain. Unfortunately, MOR agonists also elicit several on-target adverse effects that severely limit their use, including respiratory depression, constipation, and the development of tolerance and addiction. The MOR is predominantly coupled to the G i/o protein family, which signals by inhibiting the production of cyclic adenosine 5′-monophosphate (cAMP) by adenylyl cyclase (AC) through G-subunits and activating G protein-coupled inwardly rectifying potassium (GIRK) channels through G subunits, among other effectors (1). Similar to most other G protein-coupled receptors (GPCRs), MOR signaling is regulated by the phosphorylation of intracellular C-terminal serine and threonine residues, which stabilizes the binding of -arrestins. This family of cytosolic scaffolding proteins is understood to terminate G protein signaling and mediate the formation of endocytic complexes and receptor internalization.
β-Arrestin 2 has attracted much interest in regard to its interactions with MOR due to its suggested involvement in opioid side effects. Early studies showed enhanced morphine analgesia in β-arrestin 2 knockout mice with greatly diminished respiratory depression and constipation (2,3). This has led to the hypothesis that a putative β-arrestin-dependent mechanism downstream of the MOR mediates the unwanted side effects of opioids (4). However, the nature of this proposed signal altering respiratory and gastrointestinal function has not been demonstrated. This "arrestin hypothesis" has been challenged by the persistence of morphine- and fentanyl-induced side effects, including respiratory depression and constipation, in a knock-in mouse expressing a phospho-deficient MOR mutant that is unable to recruit β-arrestin (5) and by the persistence of morphine-induced respiratory depression in the β-arrestin 2 knockout mouse (6). In addition, there is robust physiological evidence demonstrating that classical G protein signaling from the MOR contributes substantially to respiratory depression (7,8), as well as to other side effects, such as constipation (9,10).
The proposed role of -arrestins in the unwanted effects of MOR agonists has led to the development of MOR ligands that do not recruit -arrestin 2 to the receptor, under the assumption that this would avoid on-target side effects mediated by the proposed -arrestin-dependent mechanism. Biased or functionally selective agonists engage a subset of signaling pathways to the exclusion of others, as in the case of G protein-biased ligands that maintain the classical G protein signal while reducing interactions with -arrestin 2. The prototypical such ligand of the MOR is oliceridine (TRV130), initially reported to have an improved preclinical profile over morphine in rodent studies (11) and now in clinical trials. Further studies have reinforced that oliceridine induces constipation and abuse-related effects in rodents (12,13), and although it still induces respiratory depression and constipation in humans, these side effects may be reduced when compared to morphine, giving a potentially wider therapeutic window (14,15). Another MOR ligand proposed to be G protein-biased, PZM21, was found using in silico docking to the inactive MOR crystal structure (16). Studies in cell lines and in neurons of MOR-Venus knock-in mice have further shown that both oliceridine and PZM21 induce very limited -arrestin recruitment or receptor internalization (17). The authors suggested that oliceridine and PZM21 have a signaling signature similar to buprenorphine, an extremely low-efficacy MOR agonist with reduced risk of severe adverse effects. Although initially reported not to induce respiratory depression (16), a later study found that PZM21 did slow respiratory frequency in mice to a similar extent as did morphine (18). Schmid et al. 
(19) generated a series of MOR ligands with increasing degrees of separation between G protein coupling, as measured by guanosine 5′-O-[-thio]triphosphate (GTPS) binding and inhibition of cAMP accumulation, and -arrestin coupling and correlated the quantitative bias factors to the therapeutic window of these compounds. Of these ligands, SR-17018 was the compound with the most extreme apparent bias and the best separation between antinociception and respiratory side effects because it was reported to not alter blood oxygen saturation or respiratory frequency in mice (19).
New MOR agonists proposed to be G protein-biased appear to have other pharmacological properties that could potentially be related to their safety profiles (20), and the mechanism through which reduced recruitment of β-arrestin 2 to the MOR could reduce side effects remains unclear. Characterization of biased agonists is complicated by the difficulty in accurately quantifying agonist activity for a given signaling pathway. The de facto standard for bias quantification is the operational model of agonism (21). This model, by design, mathematically accounts for the distinct pharmacological parameters of affinity (ligand binding to a target) and efficacy (ligand-induced activation of that target). Operational model analysis of bias is routinely performed by estimating a combined "transduction coefficient" as an index of intrinsic relative activity of an agonist for a given pathway. This can be normalized to a reference agonist to allow comparison of activity between pathways, through quantitative test agonist bias factors, presumably without system bias. However, there are important confounding factors that must be considered in the application of operational analysis to any system. Ligand binding kinetics and the temporal pattern of signaling processes are not accounted for in this analysis and can have a profound influence on apparent bias (22). In addition, test compounds with very low activity for a particular end point severely confound accurate quantification. Assays in which agonists produce no or minimal signal prevent satisfactory curve fit and, therefore, any robust estimation of bias using the operational model. Regardless of the model, identification of biased agonists requires estimation of agonist efficacy and affinity independent of system and observational bias, such as cellular background and assay conditions.
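For orientation, the operational model and the bias quantities referred to above are commonly written as follows. This is the standard textbook form, not necessarily the exact parametrization used in the Materials and Methods; the symbols E_m (system maximum), K_A (functional affinity), n (transducer slope), and the subscripts "test", "ref", "path 1", and "path 2" are notation assumed here for illustration.

```latex
% Operational model of agonism: response E to agonist concentration [A]
E = \frac{E_m \, \tau^{n} [A]^{n}}{\left(K_A + [A]\right)^{n} + \tau^{n} [A]^{n}}

% Transduction coefficient, normalized to a reference agonist within one pathway
\Delta\log\!\left(\frac{\tau}{K_A}\right)
  = \log\!\left(\frac{\tau}{K_A}\right)_{\text{test}}
  - \log\!\left(\frac{\tau}{K_A}\right)_{\text{ref}}

% Bias between two pathways (the bias factor is 10 raised to this value)
\Delta\Delta\log\!\left(\frac{\tau}{K_A}\right)
  = \Delta\log\!\left(\frac{\tau}{K_A}\right)_{\text{path 1}}
  - \Delta\log\!\left(\frac{\tau}{K_A}\right)_{\text{path 2}}
```

Because the transduction coefficient combines efficacy (τ) and affinity (K_A) in a single term, it does not by itself reveal whether an agonist has low efficacy.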
Initial descriptions of proposed G protein-biased MOR agonists did not specifically quantify agonist efficacy, either giving combined transduction coefficients (16,19) or not directly quantifying bias (11). Assays of G protein activity are commonly confounded by the presence of receptor reserve (also known as spare receptors), which is the condition whereby an agonist needs to activate only a small fraction of the existing receptor population to produce the maximal system response (23,24). In the presence of high receptor reserve, most test agonists reach a similar maximal response, preventing straightforward determination of relative efficacy (23). It is only when there is low receptor reserve that the differences in efficacy become apparent. Irreversible receptor inactivation removes receptor reserve and allows the quantification of ligand-intrinsic efficacy (23). Low-efficacy agonists, when compared between G protein assays, which measure highly amplified signaling, and β-arrestin recruitment measurements, which measure unamplified signals, may appear biased because of assay conditions (20). Hence, recent studies showing the low efficacy of putatively biased agonists (18,25,26) should prompt a reexamination of the signaling profile of these biased MOR agonists, given that this factor was not initially identified.
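The effect of receptor reserve on apparent maximal response can be illustrated with a minimal sketch. It assumes the operational model with a transducer slope of 1, for which the maximal response is τ/(1 + τ) of the system maximum, and it assumes that τ scales linearly with the density of functional receptors. The τ values and the 95% inactivation level are hypothetical and are not taken from this study.

```python
def emax_fraction(tau):
    """Maximal response as a fraction of the system maximum
    (operational model, transducer slope n = 1)."""
    return tau / (1.0 + tau)


# Hypothetical efficacies for a full and a partial agonist (illustrative only).
tau_full, tau_partial = 100.0, 5.0

for label, remaining in [("high reserve", 1.0), ("after 95% inactivation", 0.05)]:
    # tau is assumed proportional to receptor density, so irreversible
    # inactivation of receptors multiplies tau by the fraction remaining.
    full = emax_fraction(tau_full * remaining)
    partial = emax_fraction(tau_partial * remaining)
    print(f"{label}: full={full:.2f}, partial={partial:.2f}")
```

With these assumed values, the two agonists reach similar maxima when reserve is high and separate clearly once most receptors are inactivated, which is the rationale for removing reserve before comparing efficacy.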
Similar to bias factors, accurate quantification of a preclinical "therapeutic window" of lead compounds in vivo is critical to drug development. In this context, therapeutic window indicates the separation between the dose of a compound producing an analgesic effect and the dose resulting in side effects such as respiratory depression or constipation. Therapeutic window is typically quantified preclinically by comparing the potency of a compound for each response. The activity of test agonists in animal assays of antinociception and side effects is highly context dependent. Hence, measures of agonist bias and safety profiles using different in vitro assays, varying in vivo models, and inconsistent mathematical analyses do not allow for direct comparison of biased MOR agonists described in separate studies.
Here, we provide a systematic evaluation of the most recent and promising MOR-biased ligands and assess their pharmacological profiles against existing opioid analgesics in assays that are not confounded by limited signal windows. We found that oliceridine, PZM21, and SR-17018 have very low intrinsic efficacy. We also demonstrated a strong correlation between measures of efficacy for receptor activation, G protein coupling, and β-arrestin recruitment for all tested ligands, including those that have previously been described as biased. Last, by measuring the antinociceptive and respiratory depressant effects of these ligands, we showed that low intrinsic efficacy of the so-called biased ligands can explain their improved side effect profiles in terms of a continuum of existing analgesics. Our results therefore suggest an alternative mechanism underlying the improved therapeutic windows described for recently developed opioid ligands, which should be taken into account for future descriptions of ligand action at this important therapeutic target.

In vitro responses of different opioid ligands reveal a spectrum of maximal effects
To understand the relationship between the intrinsic efficacy of opioids for different signaling pathways and, eventually, to assess their correlation with therapeutic indices, we constructed concentration-response curves for several opioid ligands and multiple downstream signaling pathways proximally linked to MOR activation in human embryonic kidney (HEK) 293 cells (Fig. 1A). An analog of the endogenous opioid peptide Met-enkephalin, DAMGO, was used as a reference agonist (16,19,24,27). Existing clinical opioid agonists fentanyl, methadone, morphine, oxycodone, and buprenorphine were profiled along with the three most recently described G protein-biased agonists oliceridine, PZM21, and SR-17018 (11,16,19). The ability of ligands to induce the active conformation of MOR was monitored through bioluminescence resonance energy transfer (BRET) assays for the recruitment of a conformationally selective nanobody, Nanobody 33-Venus (Nb33-Venus), to the receptor (Fig. 1B) (28). Similarly, coupling to and activation of Gαi proteins were assessed using, respectively, BRET assays with truncated, soluble "mini" Gαi proteins fused to a Venus fluorescent protein (mGsi-Venus; Fig. 1C) (29) or previously described Gαi2 activation BRET-based biosensors (Fig. 1D) (30,31). Subsequent Gαi-mediated inhibition of forskolin-induced cAMP production was monitored using the cAMP-yellow fluorescent protein (YFP)-Epac-RLuc (CAMYEL) BRET-based sensor (Fig. 1E) (32). Gβγ-mediated activation of GIRK channels in response to MOR ligands was measured using a membrane potential-sensitive dye assay in the absence or presence of the irreversible antagonist β-chlornaltrexamine (β-CNA) to reduce receptor reserve (Fig. 1F and fig. S1A).
We also constructed concentration-response curves of the nine opioid ligands using BRET assays to monitor GRK2 and β-arrestin 2 recruitment (Fig. 2, A to C), as well as MOR trafficking to Rab5a-positive endocytic compartments (Fig. 2D). To improve assay signal, we overexpressed GRK2 in the β-arrestin 2 recruitment and MOR trafficking assays (33), because we could not obtain a quantifiable response to all ligands with only endogenous amounts of GRK in these assays (fig. S1, B and C). Together, we constructed concentration-response curves to accurately measure the responses to the nine ligands of interest. These curves were then used to estimate signaling efficacy and bias factors.

Oliceridine, PZM21, and SR-17018 have low intrinsic efficacy for receptor activation and G protein signaling
Signaling efficacy was estimated using the operational model of agonism and quantified as the operational efficacy, τ, a parameter composed of both receptor density and signaling efficiency of the agonist-occupied receptor (Table 1 and see Materials and Methods for detailed description of the analyses). In all G protein assays, oliceridine, PZM21, and SR-17018 had lower intrinsic efficacy than did morphine, itself a partial agonist relative to fentanyl and DAMGO (Fig. 2E). Oxycodone had similar efficacy to morphine, and buprenorphine had very low efficacy. Together, these assays capture different aspects of receptor-G protein coupling and different amounts of signal amplification. In pathways with very limited signal amplification due to direct proximity to receptor activation (Nb33 recruitment and mGsi recruitment) or with very limited receptor reserve from partial irreversible antagonism (GIRK activation), partial agonists had lower efficacy and potency than in highly amplified pathways (Gαi2 activation and AC inhibition) (Tables 2 and 3). However, despite these differences, estimates of relative efficacy for activating G proteins were remarkably consistent across assays of receptor and G protein activation. In addition, assays of receptor regulation, including interactions between MOR and GRK2, β-arrestin 2, and Rab5, showed a similar pattern of agonist efficacy with a rank order of maximal effect largely conserved across all assays. Oliceridine, PZM21, SR-17018, and buprenorphine displayed lower efficacy in receptor regulation than did oxycodone and morphine (Fig. 2E and Table 2).
We also monitored the kinetics of receptor-effector coupling in real-time assays of Nb33 and mGsi recruitment. SR-17018 recruited Nb33 and mGsi to the MOR at a slower rate than did morphine, DAMGO, or the other putatively biased compounds oliceridine and PZM21 (fig. S2, A and B). The maximum response to SR-17018 did not occur until 5 min after agonist addition, whereas it occurred within 1 min of stimulation with oliceridine, PZM21, or buprenorphine. Recruitment of Nb33 and mGsi to the MOR by oliceridine, PZM21, and SR-17018 was rapidly reversed after addition of 10 μM of the MOR antagonist naloxone, but that induced by buprenorphine was not. This was due to either the inability of naloxone to compete with the very high affinity of buprenorphine or the extremely slow dissociation rate of this compound, or some combination of the two.
Agonist-induced C-terminal phosphorylation of MOR, quantified by phosphosite-specific antibodies, was in agreement with the different efficacy profiles. DAMGO, fentanyl, and methadone induced multisite phosphorylation of MOR, as expected for high-efficacy opioids (34), whereas morphine, oxycodone, oliceridine, and PZM21 only triggered phosphorylation of Ser375 (Fig. 3A) (34,35). We have previously shown buprenorphine to have a phosphorylation profile restricted to Ser375, typical of low-efficacy agonists (34). The phosphorylation profile of SR-17018 was unusual in that although concentrations up to 10 μM did not produce substantial phosphorylation of residues other than Ser375 up until 30 min of stimulation (Fig. 3B), the MOR phosphorylation pattern induced by SR-17018 resembled that of higher efficacy ligands at the longest incubation time point (Fig. 3A). Moreover, similar to high-efficacy agonists, this multisite phosphorylation depended on GRK2 or GRK3 (GRK2/3) because it was blocked by incubation with the GRK2/3 inhibitor Cmpd101 (fig. S3B). Naloxone blocked the phosphorylation of MOR induced by SR-17018, oliceridine, or PZM21 (fig. S3A). Such unusual behavior for SR-17018 may also be linked to β-arrestin recruitment data obtained using a high temporal resolution Förster resonance energy transfer (FRET) approach, wherein SR-17018 responses showed delayed onset of β-arrestin recruitment (similar to Nb33 and mGsi from fig. S2) and very slow decay upon agonist washout relative to other agonists, although this observation was difficult to consistently reproduce and warrants further investigation (fig. S4).

Efficacy for G protein pathways closely predicts efficacy in receptor regulatory pathways
To investigate the relationship between efficacy values of the test agonists in different pathways, we calculated correlation coefficients (Pearson's r) between log(τ) values (Fig. 4 and fig. S5, A and B). There was robust correlation between the efficacy of test partial agonists in all pathways linked to G protein activation. The log(τ) obtained for Gαi2 activation significantly correlated with the log(τ) estimated for receptor activation by Nb33 recruitment (Fig. 4A), mGsi recruitment (Fig. 4B), cAMP inhibition (Fig. 4C), and GIRK activation (coefficient of determination, ≥0.79 in all cases; Fig. 4D). Similarly, estimated efficacy of test compounds was consistent in receptor regulatory pathways. The log(τ) value of the agonists for β-arrestin 2 recruitment correlated very tightly to that calculated for receptor activation (Nb33) (Fig. 4E), GRK2 recruitment (Fig. 4F), and Rab5-positive endosome trafficking (Fig. 4G). Unexpectedly, efficacy for G protein pathways closely predicted efficacy in receptor regulatory pathways for all agonists, even those proposed to be biased. The log(τ) for β-arrestin recruitment significantly correlated with the log(τ) of G protein coupling (mGsi recruitment) (Fig. 4H) and with the activation of both Gαi2 (Fig. 4I) and GIRK channels (coefficient of determination, ≥0.80; Fig. 4J). A less robust correlation was observed between the efficacy for β-arrestin recruitment and cAMP inhibition, which is likely due to the high amplification and receptor reserve of the cAMP assay that results in poorer relative efficacy estimates (Fig. 4K). Together, these results show that oliceridine, PZM21, and SR-17018 exhibited consistently low intrinsic efficacy regardless of the downstream signaling pathway that was monitored, similar to that observed for buprenorphine and lower than that for morphine and oxycodone. Fentanyl and methadone were high-efficacy agonists, similar to DAMGO, in all assays in which they were studied.
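As an illustration of the type of comparison described above, the following sketch computes Pearson's r and the corresponding coefficient of determination for efficacy estimates from two assays. The log(τ) values are hypothetical placeholders, not the data in Fig. 4, and the function name `pearson_r` is chosen here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log(tau) estimates for five ligands in two assays
# (placeholder numbers, not values from this study).
log_tau_g_protein = [1.5, 0.9, 0.4, 0.1, -0.3]
log_tau_arrestin = [0.8, 0.3, -0.2, -0.6, -0.9]

r = pearson_r(log_tau_g_protein, log_tau_arrestin)
print(f"r = {r:.3f}, coefficient of determination = {r * r:.3f}")
```

A high r between pathways means that the rank order and spacing of efficacies are preserved across assays, which is what would be expected in the absence of bias.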
For each of the compounds tested and across a battery of G protein and regulatory assays, relative efficacy was highly conserved. The close correlation of efficacy across assays suggested that test agonists had similar activity in both G protein and regulatory pathways. To test this with a common, standard measure, we used the operational model of agonism to calculate the bias factors of all the ligands across the different signaling pathways that were assayed (see Materials and Methods). No evidence of statistically significant bias was observed toward G protein signaling compared to β-arrestin 2 recruitment even after overexpression of GRK2 to ensure detectable and quantifiable responses from all ligands. The only exception was fentanyl showing a slight preference for GIRK activation (Fig. 5, fig. S6A, and table S1). Unexpectedly, SR-17018, which was initially described as a ligand with an extremely high G protein bias, showed no statistically significant bias toward or away from any G protein activation measure, with a direction of bias toward Nb33 recruitment over β-arrestin 2 recruitment but low confidence in that estimate. Buprenorphine showed a statistically significant bias away from GIRK activation toward β-arrestin 2 recruitment.
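A bias factor of this kind can be sketched as a double difference of transduction coefficients, log(τ/K_A), with the standard errors of the four estimates combined in quadrature. The sketch below assumes independent errors and uses hypothetical numbers; the function name `delta_delta_log` and the input values are illustrative and do not reproduce the analysis in the Materials and Methods.

```python
import math


def delta_delta_log(test_p1, ref_p1, test_p2, ref_p2):
    """Bias of a test agonist for pathway 1 over pathway 2.

    Each argument is a (log(tau/KA), standard error) pair. The test agonist
    is first normalized to the reference agonist within each pathway, and
    the two normalized values are then subtracted. Errors are assumed to be
    independent and are propagated in quadrature.
    """
    value = (test_p1[0] - ref_p1[0]) - (test_p2[0] - ref_p2[0])
    se = math.sqrt(sum(term[1] ** 2 for term in (test_p1, ref_p1, test_p2, ref_p2)))
    return value, se


# Hypothetical transduction coefficients (illustrative only):
# pathway 1 = a G protein assay, pathway 2 = an arrestin recruitment assay.
ddlog, se = delta_delta_log((7.0, 0.1), (8.0, 0.1), (6.2, 0.1), (7.0, 0.1))
print(f"ddlog(tau/KA) = {ddlog:.2f} +/- {se:.2f} (SE)")
```

A value whose confidence interval includes zero gives no evidence of bias relative to the reference agonist; poorly fitted curves inflate the standard errors and therefore widen that interval.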
These bias profiles were different when measured with only endogenous GRK expression (fig. S6B). Although not statistically significant, oliceridine, PZM21, and buprenorphine showed bias factors in the direction of Nb33 and mGsi recruitment and Gαi2 and GIRK activation over β-arrestin 2 recruitment under these conditions. However, poor curve fits for these extremely partial agonists in the β-arrestin 2 recruitment assay without kinase overexpression reduced the confidence in estimates of transduction coefficient and bias (fig. S6A).

The low intrinsic efficacy of oliceridine, PZM21, SR-17018, and buprenorphine correlates with improved therapeutic windows
Biased agonism has been postulated to underlie the improved therapeutic indices of several opioid ligands. We measured the abilities of fentanyl, morphine, oliceridine, PZM21, SR-17018, and buprenorphine to produce antinociception and respiratory depression in mice. The antinociceptive action of these compounds has been shown to be abolished in MOR knockout animals (16,19,36,37) or to be sensitive to the MOR antagonist naloxone (11). These compounds represent the full range of opioid agonist efficacy we quantified in vitro. Compounds were tested at increasing doses in a hot-plate assay (antinociception) and whole-body plethysmography (WBP) (respiratory depression) for 180 and 240 min, respectively. Compounds were delivered subcutaneously, with the exception of SR-17018, which was given by intraperitoneal injection, as in a previous study (19), due to the viscous vehicle and larger volume required. The tested compounds produced robust antinociception (Fig. 6A), with the exception of SR-17018, which was dose-limited by solubility. The kinetics of the antinociceptive effects were consistent with previously reported pharmacodynamics. Fentanyl and oliceridine had rapid onset of effect, followed by fast decay, aligning with their clinical profiles and animal studies (11). The onset and lifetime of the effect of morphine were intermediate, and although buprenorphine had similar onset to morphine, its effects on nociception were substantially longer-lasting, in line with previous research in preclinical and clinical settings (38). PZM21 and SR-17018 had long-lasting antinociceptive effects, as previously reported (16,18,19).
The primary effect of opioid agonists on mouse respiratory function is a decrease in respiratory frequency. All tested agonists reduced respiratory frequency within the test period, albeit to varying extents (Fig. 6B). Minute volume, a combined measure of breath frequency and depth, was additionally measured over the test period and showed a very similar pattern between agonists (fig. S7). With the exception of buprenorphine, test compounds had very similar kinetics of effect between antinociception and respiratory depression. Buprenorphine did not have a statistically significant effect on respiratory frequency until 230 min after injection at both 1 and 10 mg/kg, whereas all active antinociceptive doses reached peak effect within 30 min. A more highly efficacious active metabolite of buprenorphine, norbuprenorphine, has previously been reported to contribute to respiratory depression after buprenorphine injection in mice (39,40) and may explain this result. As previously published by Hill et al. (18) and in contradiction of the initial report of Manglik et al. (16), PZM21 induced substantial respiratory depression at antinociceptive doses. In contrast to the study of Schmid et al. (19), which measured respiratory depression in restrained animals, SR-17018 induced statistically significant respiratory depression at the highest dose tested, 30 mg/kg, although this effect was minimal compared to other agonists. The initial characterization of SR-17018 used restrained pulse oximetry to estimate respiratory frequency and tested until 60 min after injection. The statistically significant effects of SR-17018 on respiratory frequency were observed in our assay, in which animals were moving freely rather than being restrained, from 60 min onward. Hence, variations in the assay format or time course may explain the differences in the observed effects. In addition, similar to the critical difference between the reports of Hill et al. (18) and Manglik et al. (16), the respiratory frequency of the vehicle-treated group in Schmid et al. (19) dropped substantially over the test period with about 25% change in frequency, whereas the vehicle groups in the present study were more stable, changing by 15%.
Comparison of the respiratory effects of different ligands at equally antinociceptive doses revealed differences in the agonists' relative potency for the two assays. Buprenorphine, oliceridine, and PZM21 produced less respiratory depression than did morphine or fentanyl at equi-antinociceptive doses (Fig. 6C). The peak effect of each dose from the hot plate and WBP data were then plotted in a dose-response curve (Fig. 7A). This allowed calculation of the potency, logED50 or the logarithm of the dose producing 50% of the maximal effect observed in that assay, for each drug in each behavioral assay (Fig. 7B). Peak effect was used to allow the comparison between drugs with distinct kinetic profiles. A preclinical model of therapeutic window was calculated by subtraction of the logED50 of antinociception from that of respiratory depression, with error propagated. Because a curve fit was not possible for SR-17018 due to solubility-limited doses, potency for antinociception was taken as the minimum dose reaching a peak effect of 50% of the maximal possible effect (MPE). In the case of respiratory frequency depression induced by buprenorphine and SR-17018, neither of which reached 50% of the maximum effect of fentanyl, the highest tested dose was used as a lower bound for the logED50. Thus, the therapeutic windows of SR-17018 and buprenorphine were estimated and are shown as lower bounds (Fig. 8, A and B). Regardless, our results show that oliceridine has a larger therapeutic window in this preclinical model than morphine or fentanyl, with buprenorphine having the safest profile. PZM21 also appeared to have an improved therapeutic index but was less accurately quantified, whereas the window of SR-17018 was estimated.
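The therapeutic window calculation described above can be sketched as a difference of logED50 values with the two standard errors combined in quadrature. The sketch assumes independent errors; the function name `therapeutic_window` and the numbers are hypothetical and are not the values plotted in Fig. 7 or Fig. 8.

```python
import math


def therapeutic_window(log_ed50_resp, se_resp, log_ed50_anti, se_anti):
    """Preclinical therapeutic window in log units.

    Computed as logED50 (respiratory depression) minus logED50
    (antinociception); a larger value means that a higher dose is needed
    for the side effect than for the desired effect. Errors are assumed
    independent and are combined in quadrature.
    """
    window = log_ed50_resp - log_ed50_anti
    se = math.sqrt(se_resp ** 2 + se_anti ** 2)
    return window, se


# Hypothetical potencies in log10(mg/kg), for illustration only.
window, se = therapeutic_window(0.5, 0.12, -0.3, 0.09)
print(f"window = {window:.2f} log units ({10 ** window:.1f}-fold), SE = {se:.2f}")
```

When one of the two logED50 values is only a lower bound, as for the respiratory effect of buprenorphine and SR-17018, the resulting window is likewise a lower bound rather than a point estimate.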
Last, our systematic pharmacological characterization in HEK293 cells allowed us to investigate the relationship between in vivo therapeutic window and in vitro efficacy or bias factors. We observed an inverse relationship between efficacy and therapeutic window; high-efficacy compounds, such as fentanyl, had a very narrow safety index, whereas the lowest efficacy agonist, buprenorphine, had a broad therapeutic range (Fig. 8A). Morphine, a drug less efficacious than fentanyl but more efficacious than buprenorphine, showed an intermediate safety profile. This characterization is broadly in line with clinical experience (41) and is supported by a comparison of the signaling profiles and Food and Drug Administration-reported adverse event frequencies of medically used opioids, which found an association between reduced efficacy and lower rates of adverse respiratory events (42). All the so-called biased ligands (oliceridine, PZM21, and SR-17018) fell within this correlation, with efficacy and therapeutic windows falling in the range between morphine and buprenorphine. In contrast, bias factors calculated between any G protein pathway and β-arrestin 2 recruitment did not predict therapeutic window (Fig. 8B and fig. S8B). As noted above, the efficacy of each compound was consistent between tested assays, such that efficacy in most pathways predicted therapeutic window (fig. S8A). Comparison of agonist activity with bias plots clearly shows that all tested compounds have similar relative activity between G protein and arrestin recruitment assays (Fig. 9 and fig. S9).
Figure legend (fragment): Bias factors of all opioids were calculated using DAMGO as a reference (see Materials and Methods). *P < 0.05, **P < 0.01 by one-sample t test. Error was propagated using standard rules, and error bars represent 95% confidence intervals.

DISCUSSION
Biased agonism at the MOR has been the focus of intense research for over a decade. Early observations in β-arrestin 2 knockout mice prompted the hypothesis that this scaffolding protein participated in the signaling mechanisms mediating the on-target side effects of morphine (2,3). This gave rise to the second hypothesis that ligands able to activate G proteins without β-arrestin recruitment represented a strategy to provide safer analgesia (43,44). Oliceridine, PZM21, and SR-17018 represent the most prominent examples of opioid ligands with reported G protein bias, although only oliceridine has been clinically tested. However, the underlying hypothesis has recently been challenged. First, respiratory depression induced through the MOR has been shown to be at least partially mediated by receptor coupling to GIRK channels through the activation of G proteins (7), with neurons in several regions of the brainstem respiratory network hyperpolarized by activation of this classical, β-arrestin-independent MOR signaling pathway (8). This contrasts with the absence of robust physiological evidence for a β-arrestin signal from the MOR affecting respiratory function (45). Second, opioid side effect profile is not improved in a knock-in mouse expressing a phosphorylation-deficient, G protein-biased MOR (5). Third, several laboratories have been unable to repeat the primary result of reduced morphine respiratory depression in the β-arrestin 2 knockout (6).
Here, we found that efficacy robustly correlated across five assays of G protein signaling and across multiple assays of receptor regulation. The rank order of efficacy described here in heterologous expression signaling assays is consistent with other cellular and ex vivo studies performed with these ligands, including AtT20 cells and locus coeruleus neurons (46,47). The efficacy of all test agonists, including both existing analgesics and new compounds described as biased, for β-arrestin 2 recruitment was closely predicted by accurate G protein efficacy estimates. This suggests that MOR activation by these agonists is similar for all tested assays, and operational analysis found no evidence of statistically significant agonist bias. All three putatively biased agonists had consistently low efficacy compared to morphine. Although previous descriptions of biased MOR agonists by Manglik et al. (16) and Schmid et al. (19) did not quantify efficacy explicitly, it is possible to reanalyze the cell-based assays used in these studies, which show low receptor reserve and use DAMGO as a full reference agonist, to generate efficacy estimates (table S2) (23). The efficacy of oliceridine, PZM21, and SR-17018 compared to morphine estimated in this manner aligns with our characterization of these ligands as low-efficacy agonists, as well as with other studies (18,25,26).
The relative potencies obtained from our concentration-response curves are also in agreement with previous reports (Table 3) (11,16,19,24,48). This is relevant because we designed each assay to quantitatively differentiate between responses to all test ligands, including very weak partial agonists such as buprenorphine. In assays with minimal amplification (recruitment of Nb33, mGsi, GRK2, and β-arrestin 2), which may be assumed to detect a direct, one-to-one interaction between receptor and the recruited protein, Emax of an agonist directly reflects efficacy (23). This is the case upon comparison of agonist Emax values across these low-amplification assays. The maximal effect of each agonist was consistent between assays of recruitment (Table 2), as would be expected. Thus, with a lack of amplification, Emax values reflect efficacy directly, and comparison across assays shows no evidence of agonist bias.
The most accepted method for detection and quantification of biased agonism is comparison of "transduction coefficients" based on the Black and Leff operational model of agonism (49). However, these analytical methods are not robust for assays with substantial confounding factors. The kinetics of ligand binding and signal transduction (22), receptor and effector localization (50), as well as limited assay range in systems with high amplification or extremely partial agonists, can all substantially alter conclusions regarding agonist activity. Using experimental approaches that allow for the measurement of responses in real time, we captured the peak effect of each drug for each signal transduction pathway. This also allowed us to minimize the influence of receptor desensitization (35). Our approach also ensured that the assays used were not limited by narrow signal windows. This was achieved using irreversible receptor inactivation for GIRK channel activation and exploiting the linear receptor-effector relationship with an excess of reporter in the case of both Nb33 and mGsi recruitment assays. In doing so, we captured the true range of MOR agonist efficacy. Similarly, we adjusted our experimental conditions to minimize the "floor" effect of ligands with minimal agonist activity by overexpression of GRK2 to ensure a detection window that fitted all ligands (33).
When GRK2 was overexpressed, we did not observe bias for any of the compounds previously characterized as G protein-biased (Fig. 5). Similarly, when examined under endogenous GRK expression, oliceridine, PZM21, and buprenorphine showed no significant G protein bias (fig. S6B). Moreover, as expected from the correlations between the efficacies of the ligands across different pathways (Fig. 4), no significant bias was detected when using an alternative, efficacy-driven approach for bias determination (fig. S6C) (25). Quantification of ligands that display the pronounced partial agonism of these compounds in any low-amplification pathway is confounded by poor curve fit to low responses (51). No statistically significant bias of oliceridine and PZM21 was previously observed with operational analysis (16). For SR-17018, which was previously described to have an extremely high bias factor toward GTPγS and AC inhibition over β-arrestin 2 recruitment, we did not detect bias toward any of the G protein pathways studied. The previous study compared activity in end point assays between membranes of Chinese hamster ovary (CHO) cells (GTPγS), whole CHO cells (cAMP accumulation), whole U2OS cells (β-arrestin 2 recruitment), and membranes from mouse brain (GTPγS). As described above, the current study used real-time assays on a consistent cellular background. Given the notably unique kinetics of SR-17018 across assays, capturing peak effect rather than a predefined end point may partially explain discrepancies. In addition, Schmid et al. (19) constrained the parameters of operational analysis to permit a curve fit to extremely low activity compounds in the β-arrestin 2 recruitment assay. As shown extensively by Stahl et al. (51), operational curve fitting to extremely low responses produces large variability in parameter estimates.
GRK2 overexpression, matching the heterologous overexpression of MOR and β-arrestin 2, improves the signal-to-noise ratio of the assay and therefore the statistical confidence. Poor quantification of curve position and plateau due to assay parameters is likely partly responsible for differences between our conclusions and those of some previous studies. As discussed theoretically by Conibear and Kelly (20), intrinsic system bias may give rise to substantial differences between assays and apparent agonist bias, particularly for low-efficacy agonists compared between highly amplified and linear signaling pathways. Low-efficacy agonists, such as buprenorphine and the putatively biased compounds, will produce strong responses in highly amplified G protein assays while having little activity in no-amplification, protein-protein interaction assays such as β-arrestin recruitment (52,53). Buprenorphine itself has been previously proposed to be a G protein-biased ligand (17,54), although, here and in previous studies, it has consistently been observed to be an extremely partial agonist (24). This reinforces how varying signal amplification across assays can confound interpretation, most profoundly in the case of partial agonists. Equimolar comparison of each agonist's activity in G protein activation and β-arrestin 2 recruitment illustrates the differences in signal amplification between assays (Fig. 9 and fig. S9) (49,55). The location of a test agonist on this plot compared to a reference agonist is used to identify biased ligands. Regardless of kinase overexpression, all tested agonists produce similar levels of β-arrestin 2 recruitment at a given amount of G protein activation. The substantial difference in amplification between G protein activation and β-arrestin 2 recruitment under endogenous GRK expression illustrates how the window for detection of G protein-biased agonism under these conditions is limited.
Our selection of opioids recapitulates a large range of efficacy of existing opioid analgesics, from low-efficacy buprenorphine to high-efficacy fentanyl, in a wider range of functional responses (24). Buprenorphine is an opioid with unique pharmacology, having extremely high affinity with low partial agonism at MOR, antagonism of κ-opioid receptors (KORs), and interactions with the nociceptin/opioid-like receptor 1 (ORL1) (56). It has been well characterized as having a "ceiling" effect on respiration in both rodent and human studies, whereby increasing doses well past the therapeutic range do not additionally depress breathing (57). It has been implicated in lethal overdose much less frequently than other opioids (58). Buprenorphine's low intrinsic efficacy is routinely assumed to underlie this plateau of effect on human respiration, and this partial agonism did not appear to limit the clinical analgesic effect in a review of clinical trials (59). A review from a consensus group concluded that buprenorphine produced equivalent analgesia to full opioid agonists with reduced likelihood of severe respiratory depression (60). A confounding factor in the study of this agonist, in addition to its lack of selectivity, is the production of active metabolites such as norbuprenorphine, which has lower affinity but higher efficacy than the parent compound (61). As noted above, this metabolite has been implicated in preclinical studies of buprenorphine respiratory depression and, in the rare cases of mortality, has been detected in blood plasma in amounts similar to that of buprenorphine (62). The accumulation of active metabolites is a potential explanation for the delayed effect of buprenorphine on respiratory depression in the present study (Fig. 6B). In any case, given that buprenorphine's antinociceptive effect in mice is abolished by MOR knockout (37), its superior profile here can be understood through its partial agonism at that receptor.
Other opioids with very low intrinsic efficacy, including dezocine and nalbuphine, have also been shown to have ceiling effects on respiration in human studies (63,64). A meta-analysis of randomized controlled trials concluded that nalbuphine produced equivalent analgesia to morphine (65). By contrast, the high-efficacy analgesic fentanyl has a very narrow window of therapeutic effect versus respiratory depression (66).
In this context, we have identified that the therapeutic window of putatively biased MOR agonists has a potential alternative mechanism. Although still inducing similar levels of respiratory depression to morphine at high doses, oliceridine had a slightly but statistically significantly improved window that is in line with previous preclinical and clinical work (11,67). Similarly, PZM21, although not as accurately quantified because of lower potency and solubility concerns, appeared to have an equivalent increase in window, as did the estimated profile of SR-17018. Given the challenges to the arrestin hypothesis detailed above, the low intrinsic efficacy of these compounds is a more plausible explanation for these results.
Intrinsic efficacy, as estimated in any pathway, robustly predicted the rank order of the test compounds in an in vivo model of separation between antinociception and respiratory depression, arguably the most clinically important opioid-induced side effect (Fig. 8A and fig. S8A). It will be of interest in future to determine whether other adverse effects, such as constipation, tolerance, and physical dependence, exhibit comparable correlations. Buprenorphine's therapeutic utility is limited by its poor bioavailability, active metabolites, and concerns around its extremely high MOR affinity, including reversibility of respiratory depression and complex pharmacodynamic and kinetic relationships. The new MOR ligands studied here may have advantages in these and other aspects that make them superior platforms for further drug development. The contribution of off-target effects, most critically interactions with KOR, δ-opioid receptor, and ORL1, to this preclinical model was not directly examined here. The test agonists all have higher potency for MOR signaling over other targets, and MOR knockout abolishes the antinociceptive effects of fentanyl, morphine, buprenorphine, PZM21, and SR-17018 (16,19,37), whereas oliceridine's effects are naloxone sensitive (11). Our data from the preclinical therapeutic window model therefore most likely reflect primarily MOR-mediated effects of these agonists, although off-target effects could contribute to opioid effects, particularly at high doses. In addition, distinct pharmacokinetics have been proposed to contribute to differences in opioid respiratory depression (66), although, here, we showed that, for most agonists, the kinetics of antinociception and effects on respiratory frequency were consistent. The phosphorylation pattern, or "barcode," of the carboxy tail of the MOR is also indicative of the ability of opioid ligands to promote β-arrestin recruitment and receptor internalization.
It is widely accepted that high-efficacy ligands trigger robust hierarchical and sequential multisite receptor phosphorylation, whereas low-efficacy ligands only induce phosphorylation of Ser375 in mice (Ser377 in humans). Such differential phosphorylation has previously been described for DAMGO, fentanyl, oxycodone, morphine, and buprenorphine (34,35). Our data show that the phosphorylation induced by oliceridine and PZM21, even at 10 μM for 30 min, is limited to Ser375, supporting their classification as low-efficacy ligands. Although the phosphorylation of Ser375 by oliceridine had previously been reported (11), its potential phosphorylation barcode had not been previously assayed. The phosphorylation pattern induced by SR-17018 during the first 20 min of incubation agrees with that of a low-efficacy agonist. However, after 30 min, the MOR phosphorylation induced by this ligand resembled that of the high-efficacy ligands, both in terms of the pattern and in terms of its sensitivity to GRK2 inhibitors. Thus, although the pharmacological characterization of SR-17018 in vitro and in vivo suggests that its actions could be explained by its low intrinsic activity, these latter observations suggest that an unappreciated mechanism of action of SR-17018 may still contribute to in vivo effects. Unfortunately, because of solubility concerns at high doses, the therapeutic window of this compound had to be estimated; thus, further studies on the signaling mechanisms and in vivo effects of SR-17018 are required for the full understanding of this compound's behavior.
Together, our results draw attention to alternative mechanisms that influence the in vivo safety profiles of opioid ligands with regard to analgesia and respiratory depression. Whereas biased agonism is one hypothesis that may rationalize opioid actions, pharmacological parameters such as low G protein efficacy may also plausibly underlie the favorable therapeutic window of new opioids and should be taken into account. Given that the existing extremely low-efficacy agonist, buprenorphine, has been shown to produce reasonable analgesia with reduced side effects and overdose liability but is limited by a complex pharmacological profile, there is clear scope for future drug development to optimize efficacy at this target.

Drugs
Oliceridine (TRV130) hydrochloride was supplied by AdooQ Bioscience, CA, USA as distributed by Sapphire Bioscience, Australia. Morphine hydrochloride was supplied by GlaxoSmithKline, Australia or by Hameln Inc., Hameln, Germany. Buprenorphine was supplied by the National Measurement Institute, Department of Industry, Innovation and Science, Australia. PZM21 was supplied by MedKoo Biosciences. SR-17018 was synthesized as previously described (19). Oxycodone hydrochloride was supplied by Mundipharma, levomethadone hydrochloride was supplied by Sanofi-Aventis, and fentanyl citrate was supplied by Rotexmedica. Cmpd101 was from Hello Bio. All other reagents were from Sigma-Aldrich.
For solubility in vehicles compatible with in vivo use, buprenorphine, PZM21, and SR-17018 were converted to hydrochloride salt form (see below). Mass spectrometry and nuclear magnetic resonance (NMR) analysis confirmed that the structures of buprenorphine, PZM21, and SR-17018 used in animal assays were as previously published.

Bioluminescence resonance energy transfer
Cells were replated into poly-d-lysine-coated 96-well plates 24 hours after transfection and allowed to adhere overnight. BRET experiments were performed 48 hours after transfection. Furimazine (Promega) was used for Nanoluciferase (NLuc)-tagged constructs and Coelenterazine h (NanoLight Technologies, AZ) was used for Renilla luciferase (RLuc8)-tagged constructs at a final concentration of 5 μM. Luciferase substrate was added immediately before dual fluorescence/luminescence measurement in a PHERAstar Omega plate reader (BMG LABTECH). The BRET signal was calculated as the ratio of light emitted at 530 nm by YFP or Venus (optic module, 535 ± 30) over the light emitted at 430 nm by NLuc or RLuc8 (optic module, 475 ± 30). Concentration-response curves were constructed from the 10-min time point after agonist addition for all pathways except Rab5-Venus, for which concentration-response curves were constructed 30 min after stimulation. To measure agonist-induced adenylyl cyclase (AC) inhibition, 10 μM forskolin was co-added with agonists to induce cAMP production. BRET signal from vehicle-treated cells was subtracted.
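The ratio and vehicle subtraction described above can be illustrated with a minimal sketch. The function names and the example counts are hypothetical, and the sketch assumes that vehicle subtraction is applied to the ratio rather than to the raw emission counts.

```python
def bret_ratio(acceptor_emission, donor_emission):
    """Acceptor (YFP/Venus) emission divided by donor (NLuc/RLuc8) emission."""
    if donor_emission <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_emission / donor_emission


def vehicle_subtracted_bret(agonist_well, vehicle_well):
    """Ligand-induced BRET: ratio in the agonist well minus ratio in the vehicle well.

    Each well is a tuple (acceptor_emission, donor_emission).
    Assumption: subtraction is done on ratios, not on raw counts.
    """
    return bret_ratio(*agonist_well) - bret_ratio(*vehicle_well)


# Hypothetical raw counts, for illustration only.
signal = vehicle_subtracted_bret((1200.0, 10000.0), (1000.0, 10000.0))
```

In this illustration the agonist well has a ratio of 0.12 and the vehicle well 0.10, so the ligand-induced signal is about 0.02.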

Western blotting
HEK293 cells stably expressing hemagglutinin (HA)-MOR were seeded onto poly-l-lysine-coated 60-mm dishes and grown to 90% confluence. After cells were either exposed or not exposed to different agonists, cells were lysed in radioimmunoprecipitation assay buffer [50 mM tris-HCl (pH 7.4), 150 mM NaCl, 5 mM EDTA, 1% NP-40, 0.5% sodium deoxycholate, and 0.1% SDS] containing protease and phosphatase inhibitors (cOmplete mini and PhosSTOP, Roche Diagnostics). Pierce HA epitope tag antibody agarose beads (Thermo Fisher Scientific) were used to enrich glycosylated proteins. To elute proteins from the beads, the samples were incubated in SDS sample buffer for 25 min at 43°C. Supernatants were separated from the beads, loaded on 8% SDS-polyacrylamide gels, and subsequently immunoblotted onto nitrocellulose. After blocking, membranes were incubated with anti-pT370, anti-pS375, anti-pT376, or anti-pT379 antibody, followed by detection using a chemiluminescence detection system. Blots were subsequently stripped and incubated again with the phosphorylation-independent antibody anti-HA to confirm equal loading of the gels. Protein bands on Western blots were exposed to x-ray film. Films exposed in the linear range were then densitized using ImageJ 1.37v.

Data analysis
GraphPad Prism software (v. 8.0) was used for data and statistical analyses, which are specifically described in the figure legends. Concentration-response curves were fitted assuming a Hill slope of unity using a three-parameter equation. For operational analysis, Em is the maximal possible response of the system, Basal is the basal level of response, KA represents the equilibrium dissociation constant of the agonist (A), τ is an index of the signaling efficacy of the agonist, and n is the slope of the transducer function. The analysis assumes that the transduction machinery used for a given cellular pathway is the same for all agonists, such that the basal, Em, and n are shared between agonists. Within each individual experiment, basal activity was constrained to 0, and Em was constrained to 100. The value of n was constrained to 1 as protein-protein interaction assays of Nb33, mGsi, GRK, and β-arrestin 2 recruitment, as well as Rab5 trafficking, are linear under operational model assumptions. As noted above, test fitting of concentration-response curves to a variable Hill slope, that is, n ≠ 1, did not provide evidence for operational analysis with a variable value of n in the remaining assays. Within each individual experiment, τ and log(τ/KA) values were calculated using DAMGO as the reference agonist. In the instances where low-efficacy agonists become full agonists (i.e., in G protein activation and AC inhibition assays), logKA was assumed to be equal to the logKA value of each agonist obtained in the mGsi recruitment assay. Functional affinity of each agonist for the receptor state producing an mGsi interaction likely reflects affinity for the active, G protein-coupling receptor state. In the case of the GIRK membrane-potential assay, log(τ/KA) and τ estimates were generated by simultaneous curve fitting of the untreated and β-CNA-treated concentration-response curves. Biased agonism was quantified as previously described (22,70).
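The parameters listed above (Em, Basal, KA, τ, and n) are those of the Black and Leff operational model of agonism. As a reference sketch, one standard way of writing that model in terms of the transduction coefficient τ/KA is shown below. This is a textbook form assumed here for illustration and is not necessarily the exact parameterization that was entered into the fitting software.

```latex
Y = \mathrm{Basal} + \frac{E_m - \mathrm{Basal}}
    {1 + \left( \dfrac{[A]/K_A + 1}{(\tau/K_A)\,[A]} \right)^{n}}
```

Under the constraints stated above (Basal = 0, Em = 100, n = 1), this form reduces to Y = 100 τ[A] / (KA + (1 + τ)[A]), so that the maximal agonist response is 100 τ/(1 + τ) and the half-maximal concentration is KA/(1 + τ).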
To exclude the impact of cell-dependent and assay-dependent effects on the observed agonism at each pathway, the log(τ/KA) value of a reference agonist, in this case, DAMGO, was subtracted from the log(τ/KA) value of the agonists of interest to give a Δlog(τ/KA) value with error propagated according to standard rules. The bias was then calculated for each agonist at two different signaling pathways by subtracting the Δlog(τ/KA) of one pathway from that of the second pathway to give a ΔΔlog(τ/KA) value, with error again propagated. Degrees of freedom of the resulting variable, for calculation of 95% confidence intervals as shown in figures and for statistical testing, were estimated by the Welch-Satterthwaite equation due to unequal variance of the fitted parameters between assays. A lack of biased agonism will result in values of ΔΔlog(τ/KA) not statistically significantly different from 0. When fold changes in bias are described, these were calculated by converting values of ΔΔlog(τ/KA) to the corresponding antilog value.
Biased agonism was statistically tested by a one-sample t test comparing a given ΔΔlog(τ/KA) value to 0, the subtracted value of the reference agonist DAMGO. P values of this test were multiplicity corrected using the Holm-Sidak method within each set of comparisons. One-sample statistical tests to 0 performed for Fig. 5 (all agonists compared between β-arrestin 2 recruitment with GRK2 overexpression and Nb33/mGsi/GPA/cAMP/GIRK assays for a total of 38 comparisons) and performed for fig. S6B (all agonists compared between β-arrestin 2 recruitment with endogenous GRK and Nb33/mGsi/GPA/cAMP/GIRK assays for an additional, separate 38 comparisons) were multiplicity corrected within those sets of analyses.
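The arithmetic behind this bias calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: the `Estimate` container, the function names, and the sign convention (pathway 1 minus pathway 2) are hypothetical, and the four fitted values are treated as independent. The P value itself would come from a t distribution with the returned degrees of freedom (for example, `scipy.stats.t.sf`), which is left out to keep the sketch dependency-free.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    """A fitted log(tau/KA) value with its standard error and degrees of freedom."""
    value: float
    se: float
    df: float


def combine(terms):
    """Signed sum of independent estimates given as (sign, Estimate) pairs.

    Variances add; degrees of freedom follow the Welch-Satterthwaite equation.
    """
    value = sum(sign * e.value for sign, e in terms)
    var = sum(e.se ** 2 for _, e in terms)
    df = var ** 2 / sum(e.se ** 4 / e.df for _, e in terms)
    return Estimate(value, math.sqrt(var), df)


def bias(test_p1, ref_p1, test_p2, ref_p2):
    """Delta-delta log(tau/KA): (test - ref) in pathway 1 minus (test - ref) in pathway 2.

    Assumption: the direction of subtraction is a convention chosen for this sketch.
    """
    return combine([(+1, test_p1), (-1, ref_p1), (-1, test_p2), (+1, ref_p2)])


def t_statistic(estimate):
    """One-sample t statistic for the null hypothesis that the value equals 0."""
    return estimate.value / estimate.se


def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = 1.0 - (1.0 - pvalues[i]) ** (m - rank)
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical fitted values, for illustration only.
example = bias(Estimate(-0.5, 0.1, 10), Estimate(0.0, 0.1, 10),
               Estimate(-0.8, 0.2, 6), Estimate(0.0, 0.2, 6))
fold_bias = 10 ** example.value
```

Because variances add, combining all four terms in one step gives the same standard error and Welch-Satterthwaite degrees of freedom as the two-step subtraction described in the text.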
Correlations were computed using Pearson's correlation analysis to assess whether the τ values of each agonist in a particular pathway vary together with the τ values of the same agonist in a different pathway. A correlation was reported as significant when P < 0.05, and the fraction of the variance that is "shared" between the two variables is reported as R².

Membrane potential assay
MOR-mediated activation of potassium channels was assayed using a membrane-potential dye from Molecular Devices (FLIPR Membrane Potential Assay kit) as previously described (71). On the day before the experiment, HEK293-GIRK4-MOR cells were plated into clear-bottomed, black-walled 96-well plates (Corning) precoated with poly-d-lysine for cell adherence. Cells were plated in L-15 media (Gibco, Thermo Fisher Scientific) supplemented with 1% (v/v) FBS. The following day, cell medium was aspirated, and plates were treated with either 200 nM β-CNA hydrochloride (Sigma-Aldrich) or Hepes-buffered salt solution (HBSS) vehicle control for 20 min. Testing of each compound in untreated and β-CNA-treated conditions was carried out in parallel. All experiments were carried out in duplicate. Cells were washed twice with HBSS after vehicle or β-CNA treatment and incubated for 1 hour in 50% L-15 media and 50% FLIPR dye reconstituted in HBSS in a final volume of 180 μl per well. All test compounds were serially diluted in a vehicle of 1% (v/v) dimethyl sulfoxide (DMSO) and bovine serum albumin (BSA; 0.1 mg/ml) in HBSS at 10× concentrations in a 96-well compound plate (Corning) such that final assay concentrations of solvent were 0.1% DMSO and BSA (0.01 mg/ml). Plates were read in a FlexStation 3 (Molecular Devices). Two-minute baselines were taken for each well before compound addition. Then, 20 μl of 10× concentrated compound was added and read for 3 min. Peak effect was calculated by taking the minimum raw fluorescence value of the postaddition measurement period, expressed as a percentage of preaddition baseline and vehicle corrected, and normalized to the maximal response to DAMGO without β-CNA treatment.
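The peak-effect calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: the function names are hypothetical, the preaddition baseline is taken as the mean of the baseline reads, and vehicle correction is assumed to be a subtraction of the vehicle well's value.

```python
from statistics import mean


def percent_of_baseline(baseline_reads, post_reads):
    """Minimum postaddition fluorescence as a percentage of the preaddition baseline.

    Assumption: the baseline is summarized as the mean of the baseline reads.
    """
    return 100.0 * min(post_reads) / mean(baseline_reads)


def peak_effect(compound_well, vehicle_well, damgo_max_change):
    """Vehicle-corrected peak change, normalized to the maximal DAMGO response.

    Each well is a tuple (baseline_reads, post_reads). damgo_max_change is the
    vehicle-corrected change for a maximal DAMGO concentration without beta-CNA,
    in the same units (percentage points of baseline).
    Assumption: vehicle correction is a simple subtraction.
    """
    change = percent_of_baseline(*compound_well) - percent_of_baseline(*vehicle_well)
    return 100.0 * change / damgo_max_change


# Hypothetical fluorescence reads, for illustration only.
effect = peak_effect(([100.0, 100.0], [90.0, 80.0, 85.0]),
                     ([100.0, 100.0], [99.0, 98.0]),
                     damgo_max_change=-36.0)
```

In this illustration the compound lowers fluorescence to 80% of baseline against 98% for vehicle, a change of -18 percentage points, which is 50% of the assumed maximal DAMGO change of -36.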

Animals
All experiments involving animals were approved by the University of Sydney Animal Ethics Committee (protocol number K00/12-2011/3/5650). Experiments were performed under the guidelines of the Australian code of practice for the care and use of animals for scientific purposes (National Health and Medical Research Council, Australia, 7th Edition). Great care was taken to minimize animal suffering during these experiments and to reduce the number of animals used. Male C57BL/6J mice were ordered from the Animal Resource Centre (Perth, Western Australia) and were between 6 and 9 weeks of age on the day of experimentation. Animals were housed no more than six to an individually ventilated cage in a temperature- and humidity-controlled room with 12-hour reverse day-night cycle lighting (lights on at 20:00). Mice had free access to food and water and were allowed to acclimatize to the facility for at least 10 days before the experimental day. All animal experiments were performed on mice between 9:00 and 19:00 hours in their dark, active period under a red light. Handling and acclimatization of animals to the testing room occurred on at least 4 days before experimentation.
Doses in milligrams per kilogram were calculated for the active component of the drug (free base). Morphine hydrochloride, fentanyl citrate, PZM21 hydrochloride, and oliceridine hydrochloride were dissolved in saline for injection. Buprenorphine hydrochloride was dissolved in 20% polyethylene glycol 400 (PEG 400) in saline. SR-17018 hydrochloride was dissolved in 1% (v/v) DMSO, 10% (v/v) Tween 80, and hydrochloric acid, before being diluted to a final concentration in saline and pH adjusted back to 7 with sodium hydroxide. Saline, 20% PEG 400, and 1% DMSO/10% Tween 80 vehicles did not differentially affect either the hot-plate or whole-body plethysmography assays. All injections were performed subcutaneously at a volume of 200 μl, with the exception of SR-17018 and the 1% DMSO/10% Tween 80 vehicle, which were administered intraperitoneally at a volume of 400 μl.

Hot-plate assay
Antinociception was tested using a 54°C hot plate with latency to response timed via foot switch. Mice were placed onto the hot plate within a plexiglass cylinder and observed at baseline and at 5, 15, 30, 60, 90, 120, and 180 min after injection. Hot-plate response was defined by hindpaw lift, flick, flutter or attending, or, rarely, a jump. Mice were rapidly removed from the hot plate after response or at cutoff. A maximum cutoff of 20 s was set to limit potential tissue damage. Experimenters were blind to drug or vehicle treatment group via coded syringe labeling. Responses have been expressed as a percentage of maximum possible effect, by subtracting each animal's baseline latency and normalizing to the cutoff latency.
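The normalization to percentage of maximum possible effect can be sketched as below. The function name is hypothetical, and the sketch assumes the conventional form in which the baseline-subtracted latency is divided by the baseline-subtracted cutoff; the text does not spell out the denominator.

```python
def percent_mpe(latency_s, baseline_s, cutoff_s=20.0):
    """Percentage of maximum possible effect for one hot-plate measurement.

    Assumption: %MPE = 100 * (latency - baseline) / (cutoff - baseline),
    with latencies capped at the cutoff (20 s in this protocol).
    """
    if baseline_s >= cutoff_s:
        raise ValueError("baseline latency must be shorter than the cutoff")
    latency = min(latency_s, cutoff_s)
    return 100.0 * (latency - baseline_s) / (cutoff_s - baseline_s)


# Hypothetical latencies, for illustration only.
half_effect = percent_mpe(latency_s=14.0, baseline_s=8.0)
```

With an 8-s baseline and the 20-s cutoff, a 14-s latency corresponds to 50% of the maximum possible effect, and any animal reaching the cutoff scores 100%.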

Whole-body plethysmography
Respiratory side effects were tested using a commercial WBP system (Buxco FinePointe system, DSI Instruments). Animals were acclimatized to one of two parallel WBP chambers on the day before the experiment for 10 to 15 min and, again, on the morning of the experiment. While in WBP chambers, animals were shielded from view from other animals and the experimenter. Chamber air was refreshed from room air by the integrated system at a constant rate. All analysis of respiratory waveforms was completed by integrated FinePointe software. With the exception of fentanyl citrate, the respiratory effects of test compounds were examined in a rotating protocol in which 10-min measurements were taken at a preinjection baseline, followed by 25, 65, 105, and 230 min after injection. Animals settled in the WBP chambers over the first 5-min period, and the second 5-min period was taken for analysis. Four animals were rotated through a single chamber in this manner over the testing period. Because of the extremely rapid kinetics of fentanyl citrate, an alternative rapid rotating protocol was used. Mice were returned to the chamber immediately after injection for an initial 15-min measurement, after which 10-min measurements were taken at 35, 65, 95, 125, and 215 min after injection, with two animals rotated through each chamber. For analysis, each animal's respiratory frequency was normalized to the predrug baseline as 100%.

In vivo dose-response analysis
The peak response of each animal was taken for each compound for dose-response curve fitting. For comparison between hot-plate and WBP assays, only time points captured in both assays were included in analysis. Plots were fitted to a three-parameter logistic function, with basal constrained to 0. In fitting the hot-plate assay, curve maximum was constrained to be less than the maximum possible effect. In WBP, ED50 values were calculated at 50% of the maximum response to fentanyl. Therapeutic window was calculated between the two assays by subtraction of log(ED50) values to produce Δlog(ED50). Error was propagated by standard rules, and degrees of freedom for statistical testing were summed under the assumption of consistent variance. In the case of compounds that did not reach 50% of defined maximum response in the WBP assay, SR-17018 and buprenorphine, the therapeutic window was calculated by assuming that the ED50 must be greater than the highest dose tested. The 95% confidence interval of the Δlog(ED50) calculated for these compounds is therefore asymmetrical, with uncertainty in one direction unbounded. The maximum dose of PZM21, SR-17018, and buprenorphine in WBP was compared to the corresponding vehicle at each time point using multiple t tests, with multiple comparisons adjusted for using the Holm-Sidak method.
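The therapeutic window calculation can be illustrated with a short sketch. The function name, the example numbers, and the direction of subtraction (respiratory depression minus antinociception, so that a larger value means a wider window) are assumptions made for this illustration.

```python
import math


def therapeutic_window(log_ed50_wbp, se_wbp, df_wbp, log_ed50_hp, se_hp, df_hp):
    """Delta log(ED50) between whole-body plethysmography (WBP) and hot plate (HP).

    Returns (delta, standard_error, degrees_of_freedom).
    Standard errors are combined in quadrature and degrees of freedom are summed,
    which assumes consistent variance between the two fits.
    """
    delta = log_ed50_wbp - log_ed50_hp
    se = math.hypot(se_wbp, se_hp)
    df = df_wbp + df_hp
    return delta, se, df


# Hypothetical fitted values, for illustration only.
delta, se, df = therapeutic_window(1.0, 0.3, 20, 0.0, 0.4, 18)
fold_separation = 10 ** delta
```

Here a Δlog(ED50) of 1.0 corresponds to a tenfold separation between the two ED50 values, with a combined standard error of 0.5 log units on 38 degrees of freedom.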

Conversion of free-base buprenorphine, SR-17018, and PZM21 to their monohydrochloride salt
All solvents and reagents were purchased from commercial sources and used as supplied. NMR spectra were recorded on a Bruker Avance DRX 300 at 300-MHz 1H NMR frequency, and chemical shifts are expressed as parts per million. All resonances are reported as chemical shift (δ) and are referenced to the solvent residual peak. Multiplicities are reported as follows: s (singlet), br (broad), d (doublet), dd (doublet of doublets), t (triplet), q (quartet), and m (multiplet). Coupling constants (J) are reported in hertz.
Low- and high-resolution mass spectra were obtained through electrospray ionization (ESI). Low-resolution mass spectra were performed on a Finnigan LCQ mass spectrometer. High-resolution mass spectra were performed on a Bruker 7 T Apex Qe Fourier Transform Ion Cyclotron resonance mass spectrometer equipped with an Apollo II ESI/APCI/MALDI Dual source.
High-performance liquid chromatography (HPLC) was performed on the Waters Alliance 2695 apparatus equipped with Waters 2996 photodiode array detector, set at 254 nm. Separation using a SunFire™ C18 column (5 μm, 2.1 mm by 150 mm) was achieved using water (solvent A) and acetonitrile (solvent B) at a flow rate of 0.2 ml/min. The method consisted of 0% B to 100% B over 30 min. HPLC data are recorded as percentage purity and retention time in minutes. All compounds showed >95% analytical purity.
A solution of free base drug (1.00 mmol) in diethyl ether (10 ml, 0.1 M concentration) was treated with a 4 M solution of HCl in 1,4-dioxane (1.0 ml, 4.0 mmol) and stirred for 1 hour, during which a white precipitate formed. The solvent was removed under reduced pressure, and the solid obtained was dried to constant mass under high vacuum.

FRET measurements
To measure the interaction between MOR and β-arrestin 2, HEK293T cells were transfected with 0.8 μg of MOR-YFP, 0.4 μg of human GRK2, and 0.8 μg of β-arrestin 2-mTurquoise. On the next day, cells were seeded on round 25-mm poly-d-lysine-coated coverslips, and 48 hours after transfection, FRET was measured as previously described (72), except that a light-emitting diode (LED) excitation system (pE-2; CoolLED) was used for all experiments. FRET traces were not corrected for bleaching effects.