The impact of macrocycle conformation on the taxadiene-forming carbocation cascade: Insight gained from sobralene, a recently discovered verticillene isomer

ABSTRACT: DFT calculations have been performed on the carbocation intermediates that connect the biosynthetic pathways leading to the sand fly pheromone sobralene and to taxadiene. Establishing the conformation of the macrocyclic carbocation intermediate required to produce the cis-C8,C9 alkene bond in sobralene has identified new conformations of the verticillyl carbocation intermediates on the taxadiene biosynthetic pathway. These “sobralene-like” carbocation conformations provide an exothermic pathway to taxadiene, and are validated by comparison to closely related structures (X-ray, NMR).


INTRODUCTION
Sobralene 1 is a recently discovered sex-aggregation pheromone produced by populations of the sand fly Lutzomyia longipalpis from Sobral, Brazil, 1 which is the main carrier of the protist parasite Leishmania infantum, the causative agent of visceral leishmaniasis (VL). Visceral leishmaniasis is a tropical disease transmitted to humans through the bite of the female sand fly (Lutzomyia longipalpis), and it is fatal if untreated. Ninety-five percent of new VL cases (estimated at 200-400 thousand per year) occur in just ten countries (Bangladesh, Brazil, China, Ethiopia, India, Kenya, Nepal, Somalia, South Sudan and Sudan), and VL is the second largest parasitic killer in the world after malaria, resulting in 20,000-40,000 fatalities per year. 2,3 Sobralene 1 has the potential to be used in pheromone attractant traps as a vector control measure. However, it has only been isolated in very small quantities from the sand flies, and methods need to be developed for its large-scale production. Inspired by this challenge, we initiated studies exploring sobralene's biosynthetic origin as a first step towards producing 1 using well-known enzyme overexpression techniques. This approach has been successfully employed for the production of closely related diterpenes such as taxadiene 3 4-10 and a collection of verticillenes 11,12 (e.g. verticillene 2), but the production of sobralene 1 has yet to be explored.
The synthase responsible for transforming geranylgeranyl pyrophosphate (GGPP) (5) into sobralene 1 is currently not known. However, the fact that taxadiene 3 (a biosynthetic precursor to the anti-cancer drug Taxol 4 13) was co-isolated with 1 from sand flies 1 provides strong evidence for a close biosynthetic relationship between 1 and 3, and suggests that sobralene synthase is similar to taxadiene synthase (TXS) 14 (Figure 1). Taxadiene synthase mediates a pathway from GGPP 5 to taxadiene 3 via a series of carbocation-mediated cyclisations, i.e. macrocyclisation (A→B), bridged bicycle formation (B→C), transannulation (D→E) and termination by loss of a proton from carbocation E (Scheme 1). It is likely that sobralene 1 has its origins in the same carbocation D that sits on the taxadiene biosynthetic pathway, with stereoselective loss of a proton from C9 in D producing the required cis-C8,C9 alkene bond in 1. Motivated by the recent discovery of sobralene 1 and its biosynthetic connection to taxadiene 3, we decided to investigate the cyclisation cascade shown in Scheme 1 in more detail using computational methods, paying particular attention to the role of carbocation D.
Although the carbocation cascade leading to sobralene 1 has not been studied, the pathway to taxadiene 3 from GGPP, outlined in Scheme 1, has been examined computationally, and a variety of methodologies have been explored. Oikawa et al. 15 [...] where all the carbocations A to E were examined, along with the transition states connecting them. A key result of this study was that the direct conversion of C into D was unfavourable in comparison to the two-step intramolecular proton transfer sequence C→F→D, thereby supporting the involvement of carbocation F in the taxadiene synthesis pathway. However, their most intriguing finding was that the lowest energy point on the calculated pathway was the carbocation intermediate C, and that the tricyclic carbocation intermediate E (the precursor to taxadiene 3) was significantly higher in energy (+3.8 kcal/mol). A later QM/MM study by Major et al. 17 re-examined the C→F→D proton transfer process within a model of TXS, and these authors concluded that the one-step (C→D) and the two-step (C→F→D) processes were energetically similar. In contemporaneous studies, Thiel et al. 18 and Major et al. 19 used molecular dynamics simulations to study the interactions of the intermediate carbocations with TXS, which showed that pyrophosphate (PPi) plays a crucial role in favouring the formation of taxadiene, and subsequent QM/MM studies provided energy profiles for the whole process. 19,20
The absence of a protein structure for sobralene synthase precluded QM/MM calculations and MD simulations for our studies, so we selected a QM (DFT) approach. By identifying a conformation of the carbocation intermediate D required to install the cis-C8,C9 alkene in sobralene 1, we have discovered a significantly more energetically favourable cyclisation cascade (C→D→E) on the taxadiene pathway than that previously disclosed. 16 In this paper we report the details of our study.

COMPUTATIONAL METHODS
Geometry optimisations were performed at either the B3LYP/6-31+G(d,p) or the B3LYP/6-31G(d) 25-28 level of theory using Q-Chem as deployed in Spartan 10 (Windows), and single point MPWB1K/6-31+G(d,p) 29 energies were then calculated for the B3LYP/6-31+G(d,p) and B3LYP/6-31G(d) optimised structures. MPWB1K/6-31+G(d,p) was selected because this hybrid functional has been shown to perform well in energy calculations for carbocations, and because it allows a direct comparison to the energies reported by Tantillo and Hong in their previous study of the taxadiene carbocation cyclisation cascade. 16a Zero-point vibrational energies were calculated for all structures. The absence of imaginary frequencies was used to characterise structures as minima on their potential energy surfaces, and transition states were confirmed by the presence of a single imaginary frequency consistent with motion along the reaction coordinate.
IRC-like calculations, as described by Thiel et al., 20 were performed to confirm that starting materials and products were connected via the transition states identified. The Freezing String Method (FSM) 30,31 was used to locate transition states in cases where the standard approach failed. All representations of calculated structures were generated using CYLview. 32
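The frequency criterion described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this work: the function name and the frequency lists are hypothetical, and it assumes the common output convention in which imaginary modes are printed as negative wavenumbers. It also does not check that the imaginary mode follows the reaction coordinate, which still requires inspection of the mode or an IRC-like calculation.

```python
from typing import Sequence


def classify_stationary_point(frequencies_cm1: Sequence[float]) -> str:
    """Classify a stationary point by counting imaginary vibrational modes.

    Assumption: imaginary frequencies are reported as negative wavenumbers.
    """
    n_imaginary = sum(1 for f in frequencies_cm1 if f < 0.0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "transition state"
    return f"higher-order saddle point ({n_imaginary} imaginary modes)"


# Hypothetical frequency lists (cm^-1), for illustration only
print(classify_stationary_point([45.2, 101.7, 312.0]))
print(classify_stationary_point([-412.3, 58.9, 140.4]))
```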

RESULTS AND DISCUSSION
We first examined carbocation D because this intermediate directly links sobralene 1 and taxadiene 3 (Scheme 1). Since the formation of sobralene 1 has not been examined before, we investigated the conformation of D required to accommodate the cis-C8,C9 alkene in the macrocyclic ring of 1. Molecular mechanics calculations on sobralene 1 provided a good starting conformation for the corresponding carbocation sob D from which to conduct QM calculations (the superscript 'sob' is used here to identify the sobralene-derived conformer of D). Subsequent geometry optimisation using B3LYP/6-31+G(d,p) identified sob D as a local minimum, and the MPWB1K/6-31+G(d,p) single point energy was calculated for this structure (Figure 2). Examination of the sob D structure shows that elimination of the proton Ha on C9 is stereoelectronically favourable, and that this elimination would naturally lead to the cis-C8,C9 alkene found in sobralene 1. As mentioned earlier, the structure of carbocation D has been calculated previously by Tantillo and Hong 16 during their investigation of the taxadiene biosynthetic cascade. In contrast to our work, they identified an alternative, verticillene-like conformation for D, vert D (the superscript 'vert' is used here to distinguish this conformation from the sobralene-like sob D conformer). Crucially, the conformation of vert D precludes the direct formation of sobralene 1, because elimination of a proton from C9 in vert D would give the incorrect trans-C8,C9 stereoisomer of sobralene, and the alternative elimination of Hb from C7 in vert D would give the trans-C7,C8 alkene found in verticillene 2 (Figure 2). We therefore propose that the synthase responsible for forming sobralene 1 must carefully control the conformation of carbocation D, favouring the sob D conformer prior to elimination of the C9 Ha proton. Recent experiments show that sobralene 1 isomerises to verticillene 2 upon treatment with mild acid via cation D (i.e. 1 → sob D → 2), 21 thus demonstrating that 2 is more thermodynamically stable than 1.
This finding is consistent with our calculations, which show that sob D is slightly higher in energy (2.3 kcal/mol) than vert D. Without the structure of the sobralene synthase, we cannot assess whether specific active site residues facilitate the desired deprotonation at C9 in carbocation D to give sobralene 1, but it seems likely that the synthase is responsible for guiding carbocation D to adopt the sob D conformation during the deprotonation that forms sobralene 1. Further inspection of the calculated sob D structure showed that the C3-C8 distance was much shorter (3.14 Å) than that previously reported for the vert D conformer (3.47 Å) 16a (Figure 2). Whilst not relevant to the formation of sobralene 1, this shortened C3-C8 distance could indicate that transannulation on the taxadiene pathway (i.e. D→E, Figure 3) might be more facile in the sobralene-like than in the verticillene-like conformation (vide infra).
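Interatomic separations such as the C3-C8 distance can be measured directly from Cartesian coordinates. The following minimal sketch assumes the coordinates are stored in the standard XYZ text format (atom count, comment line, then one atom per line); the helper names and the three-atom fragment are hypothetical and do not correspond to any calculated structure in this work.

```python
import math


def parse_xyz(text):
    """Parse XYZ-format text into a list of (element, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # first line: number of atoms
    atoms = []
    for line in lines[2:2 + n_atoms]:  # second line is a comment and is skipped
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def atom_distance(atoms, i, j):
    """Distance in Å between atoms i and j (0-based indices)."""
    return math.dist(atoms[i][1], atoms[j][1])


# Hypothetical fragment used only to exercise the functions
EXAMPLE_XYZ = """\
3
hypothetical fragment, not a calculated structure
C 0.000 0.000 0.000
C 1.540 0.000 0.000
C 1.540 3.140 0.000
"""

atoms = parse_xyz(EXAMPLE_XYZ)
print(f"d(atom 1, atom 2) = {atom_distance(atoms, 1, 2):.2f} Å")
```

Applied to two conformers of the same cation, the same atom indices give directly comparable distances, provided the atom ordering is identical in both files.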
Having found the sob D conformation, we next explored whether any other key carbocation intermediates on the taxadiene cascade (Scheme 1) could similarly adopt sobralene-like conformations.
Following the same methodology, we examined the carbocations E (Figure 3) and C (Figure 5), and we were able to locate sobralene-like conformations for both. For clarity, the new sobralene-like conformers were named sob E and sob C, whilst the previously calculated verticillene-like conformers were named vert E and vert C respectively.
Comparison of our newly calculated sobralene-like conformer sob E with the previously reported verticillene-like conformer vert E 16 revealed a very significant (and somewhat unexpected) energy difference, with the sobralene-like conformer (sob E) being 9.2 kcal/mol lower in energy than the verticillene-like equivalent (vert E) (Figure 3). This important result means that the sobralene-like conformer sob E is the lowest energy point on the taxadiene carbocation cascade, in contrast to the verticillene-like conformers, where vert C is the lowest energy point. By utilising the sob E conformation, the carbocation cascade to taxadiene is therefore calculated to be energetically downhill overall. Both sob E and vert E adopt similar conformations in their A- and C-rings, with the major difference being between their 8-membered B-rings. As expected for the lowest energy conformer, sob E adopts a boat-chair (BC) conformation, whereas vert E adopts a higher energy chair-chair (CC) conformation. 22 The boat-chair conformation of sob E allows the C17 and C19 methyl groups to be much further apart (4.80 Å) than in vert E (3.98 Å), and the C3-C8 bond length is shorter in sob E (1.60 Å) than in vert E (1.73 Å) (Figure 3).
Supporting evidence for the relevance and importance of the sob E conformation on the taxadiene biosynthetic pathway comes from examining experimental data measured on closely related neutral structures. Firstly, in their early pioneering work, Coates et al. measured n.O.e. data on taxadiene 3 itself, and they showed that irradiation of the C19 methyl group (which is attached to C8) gave enhancements to both methylene protons on C9, and also to the -proton on C2 (Figure 4A). 23 On the basis of these data, Coates et al. concluded that taxadiene 3 adopts a sobralene-like, in preference to the alternative verticillene-like, conformation in solution (Figure 4A). In addition, during the first total synthesis of taxadiene 3, Williams et al. 24 measured an X-ray crystal structure of their key synthetic intermediate 9, which contains a ketone at C4 (Figure 4B). The conformation observed in this crystal structure is in good agreement with the calculated structure of sob E.
Having found sobralene-like conformations of carbocations D and E, we next calculated the corresponding sobralene-like conformer of carbocation C (Figure 5). Examination of the structure of sob C shows that the proton on C11 (i.e. the one transferred intramolecularly to either the C3,C4 alkene or the C7,C8 alkene) is closer to both C3 (2.26 vs 2.35 Å) and C7 (2.55 vs 2.66 Å) in sob C than in vert C, thus suggesting that the intramolecular proton transfer to form either of the carbocations D or F (see Scheme 1) might be more facile in the sobralene-like conformation than in the verticillene-like conformation. A similar conformation for carbocation C has also been identified by Tantillo and Gutta 16b during their preliminary study of intramolecular proton transfer during taxadiene biosynthesis, and our data are in close agreement with theirs. In order to establish the barrier heights for the proton transfers, and to fully explore the sob D to sob E cyclisation, we next calculated the pathway from carbocation sob C to sob E including transition state structures. To expedite these calculations for the nine species (i.e. five minima and four transition states) along the carbocation cascade, we explored the use of B3LYP/6-31G(d) for geometry optimisations and frequency calculations, followed by MPWB1K/6-31+G(d,p) for single point energies. Using carbocations C, D and E, for which we already had B3LYP/6-31+G(d,p) geometries for comparison, we quickly found that the MPWB1K/6-31+G(d,p)//B3LYP/6-31G(d) energies gave good agreement with the more costly MPWB1K/6-31+G(d,p)//B3LYP/6-31+G(d,p) alternatives (Table S1). The electronic energies were within 1.0 kcal/mol, giving confidence in the B3LYP/6-31G(d) geometries, and scaling of the zero-point vibrational energies by a factor of 0.993 allowed direct comparison to the values obtained using the larger 6-31+G(d,p) basis set. This approach resulted in good agreement, so MPWB1K/6-31+G(d,p)//B3LYP/6-31G(d) was used to calculate the sobralene-like carbocation cascade (Figure 6). Using this method we were able to find the sobralene-like conformers of all nine species in the cascade, and for comparison we have plotted their MPWB1K/6-31+G(d,p) energies (in green) alongside the previously reported verticillene-like equivalents (in blue) 16a (Figure 6).
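The way a single point energy and a scaled zero-point energy combine into a relative energy can be sketched as follows. This is an illustrative reading of the composite approach described above, not the authors' own scripts: the function names are hypothetical, the energies in the example are placeholder numbers rather than values from this work, and only the 0.993 scale factor is taken from the text.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def zpe_corrected_energy(e_single_point, zpe, zpe_scale=1.0):
    """Single point electronic energy plus scaled zero-point energy (hartree)."""
    return e_single_point + zpe_scale * zpe


def relative_energy_kcal(e_species, e_reference):
    """Energy of a species relative to a reference, converted to kcal/mol."""
    return (e_species - e_reference) * HARTREE_TO_KCAL


# Placeholder values in hartree, NOT taken from the calculations in this work
reference = zpe_corrected_energy(-780.000000, 0.500000, zpe_scale=0.993)
species = zpe_corrected_energy(-780.010000, 0.501000, zpe_scale=0.993)

print(f"Relative energy: {relative_energy_kcal(species, reference):.2f} kcal/mol")
```

Keeping the scale factor as an explicit argument makes it straightforward to compare ZPEs obtained with the two basis sets on the same footing.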

[Figure 6: sobralene-like and verticillene-like 16a taxadiene carbocation cascades (using GG+, cation A, at 0.00 kcal/mol as reference).]
Our calculations show that formation of sob D from sob C directly, via proton transfer from C11 to C7, is unfavourable. Instead, intramolecular proton transfer from C11 to C3 within sob C results in the formation of sob F. A second intramolecular proton transfer from C3 to C7 within sob F next produces carbocation sob D', which, despite containing the desired carbocation at C8, cannot take part in the transannulation step as it sits in a non-productive conformation.
We could locate a transition state for the direct sob C → sob D' interconversion (-18.8 kcal/mol, not plotted in Figure 6), but this was considerably higher in energy than the two-step pathway involving carbocation sob F (Figure 6). These findings are consistent with those of Tantillo and Gutta, who also reported similar conformations for carbocations F and D. 16b As expected for a simple conformational change, the barrier to formation of the productive conformer sob D from sob D' is low (1.1 kcal/mol), and the subsequent transannulation step (sob D → sob E) proceeds with a very low activation barrier (0.9 kcal/mol) to give the carbocation sob E that contains the fully formed taxane ring system. In general, and with the notable exception of sob E, the sobralene-like conformers are slightly higher in energy than their verticillene-like equivalents, but it is also apparent that the transition-state barriers are consistently lower than on the verticillene-like pathway.
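To give a feel for how small the 1.1 and 0.9 kcal/mol barriers are, they can be converted into order-of-magnitude rate constants with the Eyring equation. This is a rough sketch under stated assumptions that are not part of the original analysis: the calculated barriers are treated as free energies of activation, the transmission coefficient is taken as 1, the temperature is 298.15 K, and no enzyme or solvent effects are included. Barriers this low are near the limit where transition state theory is meaningful, so the numbers are indicative only.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal, temperature=298.15):
    """Eyring rate constant (s^-1) for a unimolecular step.

    Assumes the barrier can be treated as a free energy of activation
    and that the transmission coefficient equals 1.
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


# Barriers quoted in the text (kcal/mol)
for label, barrier in [("sob D' -> sob D", 1.1), ("sob D -> sob E", 0.9)]:
    print(f"{label}: k ~ {eyring_rate(barrier):.2e} s^-1")
```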
As discussed above, it is significant that the sobralene-like pathway is energetically favourable going from sob C to sob E (-11.0 kcal/mol), whereas the verticillene-like pathway is uphill going from vert C to vert E (+3.8 kcal/mol). Since the cascade from sob C to sob E would be spontaneous, it is possible that taxadiene synthase (TXS) carefully controls the conformations of the carbocations involved to ensure production of the desired product (taxadiene 3) resulting from sob E. It is interesting to speculate that recent site-directed mutagenesis studies on taxadiene synthase by Brück et al. might provide evidence of this conformational control, as the V584M and V584L TXS mutants both favour formation of verticillene-like structures (viz. 2) instead of taxadiene 3. 11 Thus, it is possible that the longer amino acid side chains (Met and Leu) interact with the carbocation C (or one of the other downstream carbocations) to make the active site unable to accommodate the sobralene-like conformation that is favourable for taxadiene 3 production. In this situation, the next lowest point of the cascade, vert C, would then be favoured, leading to verticillene 2 production.
The isolation of taxadiene 3, in addition to sobralene 1, from sand flies 1 provides a further piece of tentative evidence for conformational control by the sobralene-forming synthase, as our study shows that both sobralene 1 and taxadiene 3 could be produced from the same conformation of carbocation D (i.e. sob D, Figure 2). As mentioned briefly above, in addition to imparting conformational control, it is likely that the yet-to-be-identified sobralene synthase is capable of facilitating the deprotonation reaction at C9 of carbocation sob D. The presence of a suitably positioned basic residue in the active site would direct the flux of carbocation D towards sobralene 1, rather than taxadiene 3, production. Due to the urgent need to treat and/or prevent visceral leishmaniasis (VL), genome sequencing is being conducted on the sand fly carrier (Lutzomyia longipalpis), and the identity of the sobralene synthase is likely to be revealed by these studies in due course. However, in the absence of genome sequence data, we feel that, based upon our study and the clear biosynthetic link between sobralene 1 and taxadiene 3, it should be possible to produce new mutants of taxadiene synthase that are capable of overproducing sobralene 1 in a host organism. This metabolic engineering approach could provide improved yields of the sand fly pheromone (sobralene 1) for use in attractant traps or other applications, and we have initiated our own studies in this area based upon our successful overproduction of taxadiene 3 in tomato plants. 7

The Journal of Organic Chemistry

CONCLUSIONS
Sobralene 1 is a newly discovered isomer of verticillene 2 and we propose that it has a biosynthetic connection to taxadiene 3 via the verticillyl carbocation intermediate D.
Quantum chemical calculations have shown that D can adopt a conformation, sob D, that allows a stereoelectronically favoured deprotonation leading to the cis-C8,C9 alkene bond in sobralene, and it is likely that the same sob D conformation is adopted in the active site of sobralene synthase. Furthermore, the conformation sob D permits an energetically favourable transannulation reaction, which leads to the sobralene-like conformer (sob E) of the penultimate carbocation precursor E to taxadiene 3. Importantly, the calculated conformations of sob D and sob E are in very good agreement both with NMR data collected on taxadiene 3 and with X-ray data measured on an advanced synthetic precursor to taxadiene (9), thus supporting their significance on the taxadiene cascade. Significantly, our combined results provide an overall exothermic pathway to taxadiene 3 that proceeds via lower activation barriers than those previously reported. It is reasonable to conclude that taxadiene synthase favours taxadiene production by guiding carbocation D to adopt a sobralene-like conformation, thereby facilitating the transannulation (sob D → sob E) reaction; subsequent deprotonation from C5 in sob E then provides taxadiene 3. The previously determined gas-phase verticillene-like conformations of the carbocations have been used in QM/MM studies of the taxadiene cascade that include taxadiene synthase, and we feel that the sobralene-like conformations identified in this study should also be considered during any future computational studies of the pathway. Although the synthase responsible for the formation of sobralene in the sand fly Lutzomyia longipalpis has not yet been identified, the isolation of taxadiene 3 alongside sobralene 1 from L. longipalpis indicates a close relationship between the biosynthetic pathways to sobralene and taxadiene, and suggests that mutants of taxadiene synthase could be engineered 4-11 to allow for the overproduction of sobralene 1 in a suitable host organism.

ASSOCIATED CONTENT
Supporting Information. Details of all calculated structures (x,y,z coordinates, energies, zero-point vibrational energies and imaginary frequencies for transition states). This information is available free of charge via the Internet at http://pubs.acs.org/.