Bayesian mapping of the striatal microcircuit reveals robust asymmetries in the probabilities and distances of connections

The striatum’s complex microcircuit is made by connections within and between its D1- and D2-receptor expressing projection neurons and at least five species of interneuron. Precise knowledge of this circuit is likely essential to understanding the striatum’s functional roles and its dysfunction in a wide range of movement and cognitive disorders. We introduce here a Bayesian approach to mapping neuron connectivity using intracellular recording data, which lets us simultaneously evaluate the probability of connection between neuron types, the strength of evidence for it, and its dependence on distance. Using it to synthesise a complete map of the rodent striatum, we find strong evidence for two asymmetries: a selective asymmetry of projection neuron connections, with D2 neurons connecting twice as densely to other projection neurons as D1 neurons do, but neither subtype preferentially connecting to another; and a length-scale asymmetry, with interneuron connection probabilities remaining non-negligible at more than twice the distance of projection neuron connections. We further show our Bayesian approach can evaluate evidence for wiring changes, using data from the developing striatum and a mouse model of Huntington’s disease. By quantifying the uncertainty in our knowledge of the microcircuit, our approach reveals a wide range of potential striatal wiring diagrams consistent with current data.


INTRODUCTION
As the input of the basal ganglia circuit, the striatum has been ascribed key computational roles in action [...] Figure 1A and B), in accordance with frequentist inference about a proportion. As they are estimates, they come with a level of uncertainty about the true proportion that depends on sample size, which here is the number of pairs that were tested. In a frequentist approach, this uncertainty would usually be given by a confidence interval. However, typically, intracellular recording studies do not report any estimate for the uncertainty [...] connections is about twice as likely as D1 to D2 connections. When we do compute confidence intervals, such as the Wilson confidence interval for binomial proportions (Brown et al., 2001) that we add ourselves in Figure 1A and B, we find that, given the relatively small sample sizes, the confidence intervals overlap quite significantly.
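As an illustration of the interval used here, the Wilson score interval can be computed directly from the number of connected pairs k and the number of tested pairs n. The sketch below is a minimal Python version of the standard formula; the function name `wilson_interval` and the pair counts are hypothetical and are not taken from Table 1.

```python
import math


def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a binomial proportion k/n.

    z is the standard normal quantile; 1.959964 gives a 95% interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts (illustrative only): 5/40 versus 10/40 connected pairs.
lo_a, hi_a = wilson_interval(5, 40)
lo_b, hi_b = wilson_interval(10, 40)
print(f"5/40  -> [{lo_a:.3f}, {hi_a:.3f}]")
print(f"10/40 -> [{lo_b:.3f}, {hi_b:.3f}]")
print("intervals overlap:", lo_b < hi_a)
```

With counts of this size the two intervals overlap even though the point estimates differ twofold, which is the situation described above for the SPN data.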

Bayesian inference of connection probabilities
As we have just explained, frequentist inference gives us a single point estimate p̂ for the probability of connection, normally surrounded by a confidence interval which may be too large to be of any practical use and also, because it is flat, may give the illusion that the true value of p might be anywhere within this interval with equal probability. By contrast, Bayesian inference is more informative because it gives us a full probability density function f_p(p), called the posterior, telling us exactly how likely every possible value of p actually is, given the collected data. In this way, even when confidence intervals overlap as is the case for practically all the SPN to SPN connections here (Figure 1A and B), which in a frequentist interpretation would lead us to dismiss the difference as non-significant without insight as to whether this is due to insufficient data or a true non-difference ( [...]

$$P(X = k \mid p) = \binom{n}{k}\, p^{k} (1 - p)^{n - k} \qquad (1)$$

where k is the number of connected pairs and n the total number of tested pairs of that type (see Table 1 for a full summary of the experimental data). In this way, the binomial distribution provides a likelihood for the data given p. According to Bayes' theorem, the posterior distribution can be determined by:

$$f_{\mathrm{posterior}}(p) \propto P(X = k \mid p)\, f_{\mathrm{prior}}(p) \qquad (2)$$

given a prior f_prior(p), which is a probability distribution describing our initial beliefs about the possible value of p. Finding a posterior for the success rate of a binomial distribution is a well known problem in Bayesian inference and the prior distribution used in this case is a beta distribution:

$$f_{\mathrm{prior}}(p) = \frac{p^{a-1} (1 - p)^{b-1}}{B(a, b)} \qquad (3)$$

with a and b the parameters determining the shape of the prior, and B(a, b) the so-called Beta function.

The main advantage of this type of prior is that it is the so-called conjugate prior of binomial distributions, which means the posterior that results from combining this prior with a likelihood in the form of a binomial distribution simply turns out to be a new beta distribution with updated parameters (sparing us the trouble of renormalising the right hand side of equation 2 to get a proper probability density function):

$$f_{\mathrm{posterior}}(p) = \frac{p^{a+k-1} (1 - p)^{b+n-k-1}}{B(a + k,\; b + n - k)} \qquad (4)$$

[...]

Figure 1. Probability of lateral connections between SPNs estimated using either frequentist or Bayesian methods. A-B Frequentist estimates of the probabilities of connection computed from intracellular recording data, and our computed 95% Wilson confidence intervals. C-D Posterior probability density functions for the probability of connection using a Bayesian approach. Coloured bars underneath the plot represent the 95% credibility intervals corresponding to each probability density function. Inset: shape of the prior, a uniform distribution. E-F Posterior probability density functions using the Jeffreys prior. G-H Posterior probability density functions using a prior based on previous literature with mean equal to 0.12 and variance equal to 0.005.
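The conjugate update described above amounts to adding the observed counts to the prior parameters. The following is a minimal sketch of that arithmetic, assuming hypothetical pair counts; the helper names `beta_posterior`, `beta_map` and `beta_mean` are illustrative and do not come from the original analysis.

```python
def beta_posterior(k, n, a=1.0, b=1.0):
    """Parameters (a', b') of the beta posterior after observing
    k connected pairs out of n tested pairs, given a Beta(a, b) prior."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return a + k, b + (n - k)


def beta_map(a, b):
    """Mode of Beta(a, b), i.e. the maximum a posteriori (MAP) estimate."""
    if a > 1 and b > 1:
        return (a - 1) / (a + b - 2)
    if a <= 1 < b:
        return 0.0  # density is highest at p = 0
    if b <= 1 < a:
        return 1.0  # density is highest at p = 1
    raise ValueError("mode is not unique when both a <= 1 and b <= 1")


def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)


# Hypothetical data (illustrative only): 5 connected pairs out of 40 tested.
a_post, b_post = beta_posterior(k=5, n=40)  # uniform prior, a = b = 1
print("posterior: Beta(%g, %g)" % (a_post, b_post))
print("MAP  =", beta_map(a_post, b_post))
print("mean =", round(beta_mean(a_post, b_post), 4))
```

Under the uniform prior the MAP equals the frequentist estimate k/n, so in this setting the Bayesian approach adds the full shape of the posterior rather than a different best estimate.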

[...] depicted in Figure 1C (light blue curve). Depending on our assumptions, different values of a and b can be used to give the prior a desired shape. We begin with the common choice of the uniform distribution in which p could be anywhere between 0 and 1 with equal probability, achieved by setting a = b = 1 as in the example just given. [...] Figure 1D).

The fact that we used the same prior for all pairs of neuron types reflects our initial belief that there is no difference in the probability of connection between pairs. To overcome this belief requires [...] of connection that is dependent on the subtype of the presynaptic neuron, with no or little effect of the postsynaptic target subtype, something we will explore more thoroughly later.

One of the main advantages of Bayesian inference is that it forces researchers to be explicit about their priors and gives them the opportunity to choose appropriate ones. In order to illustrate this, we applied three further priors to the experimental data. Firstly, the so-called non-informative Jeffreys prior sets a = b = 1/2. An intuitive way of understanding this prior is to picture ourselves at the very beginning of the experiment, waiting for the result of the very first paired stimulation and recording. This test will either be successful or not, meaning that the shape of the prior should give most and equal weight to these two outcomes (inset of Figure 1E). Figures 1E and F show the posteriors that result from using this prior and we can see how they are practically identical to the posteriors obtained with a uniform prior. This was also the case when using the Haldane prior for which a and b equal 0 (not shown).
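To see how little these weak priors can move the posterior, one can compare 95% credibility intervals under the uniform, Jeffreys and Haldane priors for the same data. The sketch below is illustrative only: the counts are hypothetical, the helper names are ours, and the interval is the equal-tailed one obtained by numerical integration, which is an assumption since the text does not state which type of credibility interval was used.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at 0 < p < 1, computed in log space."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_norm)


def credible_interval(a, b, mass=0.95, steps=20000):
    """Equal-tailed credibility interval from a midpoint-rule CDF on (0, 1)."""
    h = 1.0 / steps
    tail = (1.0 - mass) / 2.0
    cdf, lower, upper = 0.0, None, None
    for i in range(steps):
        p = (i + 0.5) * h
        cdf += beta_pdf(p, a, b) * h
        if lower is None and cdf >= tail:
            lower = p
        if cdf >= 1.0 - tail:
            upper = p
            break
    if upper is None:  # guard against accumulated rounding error
        upper = 1.0 - 0.5 * h
    return lower, upper


# Hypothetical data (illustrative only): 5 connected pairs out of 40 tested.
k, n = 5, 40
priors = {"uniform": (1.0, 1.0), "Jeffreys": (0.5, 0.5), "Haldane": (0.0, 0.0)}
intervals = {}
for name, (a, b) in priors.items():
    # Conjugate update; the Haldane posterior is proper only if 0 < k < n.
    intervals[name] = credible_interval(a + k, b + n - k)
    print(f"{name:8s} 95% interval: [{intervals[name][0]:.3f}, {intervals[name][1]:.3f}]")
```

The three priors differ by at most one pseudo-observation in each of a and b, so their influence shrinks as the number of tested pairs grows.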

Our third prior is based on prior data, for Bayesian inference also provides us with a principled [...] distinguishing SPN subtypes and concluded that lateral connections occurred at a rate of about 0.12.

Using this information, we can design a beta distribution with a mean of 0.12 and an arbitrary variance of [...]

We previously observed that D1 neurons seem to make fewer connections than D2 neurons without [...] with the bounds of the integral such that x + k lies between 0 and 1. We can then calculate the probability that ∆_(D1→D1)−(D1→D2) is smaller than 0 by integrating this distribution between -1 and 0 (or the probability that it is greater than 0 by integrating the distribution between 0 and 1). By contrast, the frequentist strategy would be to compute a p-value giving the probability of getting an experimental result at least as extreme as the one observed assuming the null hypothesis of no difference in connection probabilities (i.e. ∆ = 0). The Bayesian approach instead allows us to calculate the probability that ∆ is less than (or greater than) 0 given the experimental results. Thus whereas the p-value tells us how surprising the actual data are if we accept the null hypothesis, the Bayesian approach can quantify precisely how unlikely the null hypothesis actually is. [...]

[...] Figure 3A. To address this concern, instead of focusing on a raw probability of connection p that is agnostic to distance, we can try to extract information about the rate of decrease in connectivity as a function of distance in these two studies and see whether it is consistent between them. To do this, we start by positing that this decrease obeys a simple exponential decay function:

$$P(\text{connection} \mid r) = e^{-\beta r}$$

with β the decay parameter of unknown value, and r (for radius) the distance separating the two neurons. This is obviously a simplistic model, but its advantage for us is its dependence on a single parameter β, which, as we will demonstrate, can be linked to the binomial success rate p that we have been concerned with so far (and we note that any model with more than one parameter could not have its parameters uniquely estimated from the single estimate of p).
Ideally, estimating this β parameter would require knowledge of the exact distance between every recorded pair of neurons, from which we could directly fit the model, but with simple assumptions about the sampling method used by experimenters, we can find an alternative way of converting values of β into p. Since the distance between each sampled pair of neurons in an experiment is in fact unknown to us, we shall consider it as a random variable. We can now express p as a function of β as [...] which is the product of the probability of experimenters selecting a neuron at distance r from another [...] A simple model for f_equi would be that, given a certain volume surrounding a central neuron, every neuron in that volume is equally likely to be sampled (Figure 3B). With this assumption, we obtain the solution (see Methods): [...]

Figure 3. [...] In the case of equiprobable sampling, the probability of choosing neurons further away increases as the infinitesimal volume corresponding to that distance increases as a linear function of r. C The probability of finding a connected pair of neurons depends on two different processes. Firstly, the process of connection, modelled by the probability of connection between two neurons given the distance between them, which we postulate decays exponentially; secondly, the process of sampling neurons in the experiment, modelled as the probability of selecting another neuron at a given distance from a starting neuron. We explore here two different scenarios for the sampling process: an equiprobable scenario in which neurons within a determined volume are selected randomly, and a nearest-neighbour scenario in which the selected neuron is whichever is the closest within the maximum distance set by the experimenters. The overall rate of connection reported by the experimenters then corresponds to the integral (shaded areas) of the product of these two probability models. Hence, differences in sampling processes can cause different rates of connection, even if the probability of connection given distance is the same.
[...] the probability of connection between two types of neuron, we can now transform these into posteriors for β, f_β(β), through parameter substitution using equation 8 (see Methods).
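The link between p and β can be made concrete with a short numerical sketch. It assumes, as stated in the text, an exponential decay exp(-βr) for the probability of connection and a sampling density that grows linearly with r up to the experimenters' maximum distance, which after normalisation gives f_equi(r) = 2r / r_max². The closed form below follows from integrating the product of these two terms and may be written differently from the expression in the Methods; the names `p_from_beta`, `beta_from_p` and `r_max`, and the value of β used in the example, are illustrative assumptions.

```python
import math


def p_from_beta(beta, r_max):
    """Expected connection rate under equiprobable sampling within r_max.

    Integral over [0, r_max] of (2 r / r_max**2) * exp(-beta * r), which is
    2 * (1 - exp(-x) * (1 + x)) / x**2 with x = beta * r_max.
    """
    x = beta * r_max
    if x < 1e-4:
        return 1.0 - 2.0 * x / 3.0  # series expansion avoids cancellation
    return 2.0 * (1.0 - math.exp(-x) * (1.0 + x)) / x**2


def beta_from_p(p, r_max, iterations=200):
    """Invert p_from_beta by bisection (p decreases monotonically with beta)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while p_from_beta(hi, r_max) > p:  # widen the bracket if necessary
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if p_from_beta(mid, r_max) > p:
            lo = mid  # beta too small: predicted rate still too high
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative decay rate (per micrometre) and the two maximum distances
# mentioned in the text (50 and 100 micrometres).
beta_example = 0.05
for r_max in (50.0, 100.0):
    p = p_from_beta(beta_example, r_max)
    print(f"r_max = {r_max:5.1f} um  p = {p:.4f}  "
          f"recovered beta = {beta_from_p(p, r_max):.4f}")
```

Because the mapping is monotonically decreasing, the quantiles of a posterior for p map directly onto quantiles of the posterior for β in reverse order, so the bounds of a credibility interval for p can be converted with `beta_from_p`. The sketch also shows why a single decay rate yields a lower overall connection rate when pairs are sampled out to a larger maximum distance.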

Probability of connection decreases faster for D1 than for D2 neurons
We apply this method to the posteriors for the probabilities of connection collapsed according to the subtype of the presynaptic SPN and obtain the posteriors for the decay rate β shown in Figure 4A and B [...] Table 3) are also shown in the inset of Figure 4A, and it is evident that they are extremely close to one another. As for D2 neurons (Figure 4B), the decay rate is smaller, as expected given that we have already shown that the overall probability of connection is higher for these neurons, ranging between 0.02 and 0.07 µm⁻¹. [...]

Table 3. MAPs and 95% credibility intervals (in µm⁻¹) of the posterior curves for β.
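To give decay rates of this size a more tangible scale, they can be converted into the distance at which the probability of connection has fallen to 50%. Assuming the single-parameter exponential model exp(-βr), that distance is ln(2)/β. The short sketch below applies this to the two ends of the range quoted above for D2 neurons; the function name `half_distance` is ours.

```python
import math


def half_distance(beta):
    """Distance at which exp(-beta * r) equals 0.5, in the inverse units of beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return math.log(2.0) / beta


# Ends of the range of decay rates quoted in the text for D2 neurons (per um).
for beta in (0.02, 0.07):
    print(f"beta = {beta:.2f} per um  ->  50% distance = {half_distance(beta):.1f} um")
```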
If we compare the estimates between the two datasets more critically, there is a clear bias for the [...] (Figure 4A and B). Indeed, if we refer to the insets in Figure 4A and [...] We next consider explanations for this.

Biased neuron sampling could explain differences between datasets
One potential explanation is that the sampling of neuron pairs was more complex than the equiprobable sampling model we first assumed. In this section, we explore this possibility by considering a model [...] will assume that they always patched the closest neuron within the maximum distance they set themselves.

This means that we are looking for the density function for the nearest neighbour, f_NN(r), which has a solution of the form: [...] with k a normalising constant (see Methods for derivation). Normally, k is defined so that integrating f_NN between 0 and infinity is equal to 1, i.e. the nearest neighbour must be somewhere in that interval. In our particular case however, experimenters set themselves a maximum distance of either 50 or 100 µm, meaning that the closest neuron must be closer than this distance (if there was no neuron closer than this, experimenters would simply look for another pair). In other words, k is such that: [...]

[...] 6C and E; Table 3) thus reconciling the two studies. In line with the already discussed overall smaller rate of connection to D2 neurons, the probability of connection drops much faster as distance increases for connections to D2 neurons (dropping to 50% after about 100 µm, see Figure 6F) than for connections to D1 neurons (50% connection rates occurring at a distance of at least 200 µm, see Figure 6D). Consequently, there are at least two length-scales in the striatal microcircuit, with connections between SPNs falling to 50% probability within a few tens of micrometers (Figure 4A,B), but connections to SPNs from FS interneurons falling to 50% probability at a hundred micrometers or more (Figure 6D,F).

[...] We turn now to the connections that FS interneurons make on other interneurons of the striatum. To [...]

We see from these data that the probability of connection from FS interneurons to PLTS interneurons is low (p_MAP = 0.10) but uncertainty regarding these connections is quite large (95% credibility interval [...] shown in Figure 7C. It remains to be seen whether there is an asymmetry in connection probability to D1 and D2 SPNs as the experimenters did not make this distinction when testing PLTS → SPN connections.

Furthermore, it has been reported that PLTS interneurons prefer to make connections to faraway SPNs (Straub et al., 2016). This means that the low local rate of connection is not the full extent of PLTS interneuron connectivity; it also prevents us from using an exponential decay model of probability to extract a decay rate as we did for the SPNs and FS interneurons. [...] slightly more conservative -0.2 to 0.15 using the prior based on previous literature (Figure 8F).

[...] We add here the 95% Wilson confidence intervals. D-F Posterior probability density functions for the probability of connections between SPNs at each developmental stage. Coloured bars underneath the plot represent the 95% credibility intervals. A uniform prior as in Figure 1C is used. Inset: Density function for the difference in probability of connection for pairs with a D2 presynaptic neuron. G-H Density functions for the difference in connection probabilities for each pair of neuron types between consecutive stages of postnatal development.

Figure 10. Map of the striatum microcircuitry based on the MAP estimates for p and, when a maximum intersomatic distance was available, the decay rate β assuming equiprobable sampling. Line thickness is indicative of the relative probability of these connections. Connections between and within SPN subtypes are assumed to be the same for a given presynaptic subtype, as established in the main text, and the two different estimates for p correspond to the two different maximum distances used in Taverna [...]

In this paper, we presented novel methods based on Bayesian inference for analysing connectivity using intracellular recording data, and applied them to reconstruct the microcircuit of the striatum. We drew on [...] and some of these are broad; for example, the connection probability from NGF to Ach interneurons has a 95% credible interval twice as wide as its best (MAP) estimate (Table 1) [...] Table 1 and our confidence intervals for the decay parameter β in Table 3. [...] 1 mm, and which make infrequent bouquets of terminals (Kawaguchi, 1993). More detailed knowledge of these long-distance connections would allow for a more complete map of striatal connectivity.

Finally, we introduced a new analytical method to determine the distance-dependence of the probability of connection between two neuron types, which depended on simple models for the distribution of neurons [...] theory that accounts for any of these interneuron features, and so their functional role is unclear.

A Bayesian approach to microcircuit mapping can be used anywhere in the brain
We showed a range of advantages that our Bayesian approach has over more traditional frequentist approaches. A first qualitative advantage of Bayesian inference is that it replaces a single point estimate [...] The methods we present here can easily be applied to any other brain regions where paired recording [...] are these Bayesian methods easily applicable to intracellular recording data from any brain region, but also may be a rare case where it is easier to be Bayesian than frequentist.

[...] should reflect the fact that the average connection rate of 0.12 masks the potential existence of four distinct connection rates for each pair. We were unable to find a principled way of deriving this desired variance and, for this reason, different values of variance were tested before settling for 0.005, which gives the corresponding beta distribution a shape that makes such a prior sufficiently informative to be interesting without being completely insensitive to the addition of new data. Setting µ = 0.12 and v = 0.005, we find a = 2.56 and b = 18.12.

Conversion of p into β
By definition of a probability density function, f_equi(r) must be such that the probability that r is found between two arbitrary values r_1 and r_1 + ∆r is:

$$P(r_1 < r < r_1 + \Delta r) = \int_{r_1}^{r_1 + \Delta r} f_{\mathrm{equi}}(r)\, dr \qquad (15)$$

This probability distribution for distance depends on how the experimenters sampled their pairs. A simple assumption would be that given a certain volume surrounding a central neuron, the probability of testing any given neuron is equiprobable for all neurons, hence why we chose to call the density function f_equi. So for a given distance r_1 from a central neuron, the probability of selecting a neuron within the volume bounded by r_1 and r_1 + ∆r is also equal to the ratio of the expected number of neurons found within it over the expected number of neurons in the total volume:

$$P(r_1 < r < r_1 + \Delta r) = \frac{N\, V(r_1)}{N\, V_{\mathrm{tot}}} \qquad (16)$$

with V(r_1) the subvolume bounded by r_1 and r_1 + ∆r, V_tot the total volume and N the density of SPNs of whichever given type experimenters are currently trying to sample. Note that N cancels out in the fraction, which implies that the probability distribution for distance is, counter-intuitively perhaps, [...] focus, and the subvolume is a hollow cylinder, as depicted in Figure 3B:

$$V(r_1) = h\pi\left((r_1 + \Delta r)^2 - r_1^2\right) = 2\pi r_1 h\, \Delta r + \pi h\, \Delta r^2$$