Occupational Choice with Endogenous Spillovers


We study a model that integrates productive and socializing efforts with occupational choice and endogenous spillovers. We show that more talented individuals work harder and contribute more to externalities, but also have incentives to segregate. Average socializing increases the productivity of the occupation. The size of an occupation grows with its synergies. Individuals underinvest in productive and socializing effort, and sort themselves inefficiently into occupations. We derive the optimal subsidy for sorting into different occupations. Finally, we derive a rule to identify overpopulated sectors and establish the connection between inequality of talents, socializing, productive efforts and occupation size.


Introduction
Many productive processes are mediated by social interactions. The accumulation of human capital (Moretti, 2004), innovation (Cassiman and Veugelers, 2002), and crime (Glaeser, Sacerdote, and Scheinkman, 2003), are examples of activities carried out by individuals whose actions are affected by the activities and abilities of others with whom they establish connections. Since social interactions have productive consequences, economic agents naturally devote a considerable effort to developing them. From the perspective of an individual, socializing in production activities involves two different but interconnected decisions: first, selecting with whom to interact, and then choosing the strength of these interactions, together with their productive effort. However, the literature has explored these two dimensions of socializing separately. 1 We develop a framework to study the joint determination of both dimensions of socializing in the context of occupational choice. We show how spillovers emerge endogenously from individual productive efforts and socializing decisions, investigate the associated effects on welfare and derive novel implications for policy and empirical research.
Formally, we study a model in which individuals are endowed with different (occupation-specific) abilities and socializing is multidimensional. First, each individual decides which occupation to join. Once individuals sort themselves into occupations, they choose their productive effort and the intensity of their social interactions. Socializing allows individuals to benefit from the endogenous spillovers emerging from the productive efforts of those in the same occupation. Despite the complexity imposed by these features, our framework allows for a comprehensive equilibrium and welfare analysis in a simple and intuitive way. Our analysis generates new insights and empirical implications for occupational choice with great ease. Furthermore, as we briefly discuss in the Conclusion, our framework is compelling since it is sufficiently flexible to study many social and economic phenomena.
Embedding endogenous spillovers in a model of occupational choice is important for several reasons. First, because these spillovers exist: empirically, the importance of social connections for entrepreneurs, 2 for professionals 3 and even for the unemployed 4 has been widely established. Second, because spillovers matter: Guiso and Schivardi (2011) find that spillovers rather than heterogeneous entry costs are the explanation for differences in entrepreneurial activities across Italian regions. 5 Finally, spillovers are likely to be endogenous: if spillovers are beneficial (damaging), rational individuals will look for ways to enhance (reduce) them. The scarce existing literature introducing spillovers into occupational choice takes them as exogenous. 6 Analyzing endogenous spillovers leads to important insights. Our framework allows us to characterize the optimal policy to achieve efficiency in occupational choice. We provide the explicit form of the first-best allocation for the social planner, which can be achieved in equilibrium by combining a linear tax with a particular subsidy for any type of distribution of talents. This result shows that the optimal policy needs to have two dimensions to correct both the within-sector inefficiency caused by the externality of spillovers and the misallocations across occupations. This finding rationalizes why many measures implemented by governments to correct inefficient occupational choices, in particular measures to boost entrepreneurship, have failed. 7 There are other examples where a policy is designed with the purpose of affecting the interaction environment of individuals. As Carrell, Sacerdote, and West (2013) show, doing this without considering the incentives for interaction and the fact that group formation is endogenous can lead to counterproductive effects of the intervention.
One of the most novel contributions of our analysis is to study the implications of spillovers for allocations of individuals across sectors. We assume that individual abilities are given according to a Pareto distribution, since wages and income, at least at the top of the distribution, are well described by a Pareto. 8 With a Pareto distribution of talents our equilibrium is unique, which is a useful feature of our framework. We find that the type of allocative inefficiency (overpopulation or underpopulation of one occupation) depends on the Pareto shape parameter and the strength of the synergies. Our model provides two rules of thumb to identify potentially over- or underpopulated occupational sectors. The first rule applies to a situation with a low dispersion in the Pareto distribution, implying that each sector is characterized by few superstars. In such a situation, the sector with stronger synergies is going to be underpopulated. On the contrary, when synergies are weak in all occupational sectors, the sector with stronger synergies is overpopulated. These results are important because they provide guidance for both policy and empirical work. We also explore how increasing inequality in talent affects allocative inefficiency. In our framework, increasing the dispersion of a distribution enhances socializing and productive efforts. Interestingly, this effect is not confined to the occupation where inequality increases, but also takes place in the other occupation. Distributional spillovers across occupations imply that more inequality leads to a better selection of types in both occupations, inducing higher productive and socializing intensities, thereby connecting two phenomena that are generally considered as independent from each other. This is another novel result with useful implications for policy and empirical analysis. 9
We also provide a secondary set of results concerning individual decisions about productive and socializing effort within a given occupation for general distributions of talents. We show that more talented individuals not only work harder but also generate more spillovers. 10 Furthermore, our model predicts that on average individuals in more productive occupations work harder and socialize more. 11 But average socializing and, hence, learning spillovers are also increasing in network synergies. As a consequence, occupations with weaker synergies should experience lower interactions and fewer spillovers. 12 Insofar as synergies capture institutional and technological aspects of socializing, we can provide an explanation for the intensity of spillovers varying across geographical regions (e.g. Bottazzi and Peri, 2003) and over time (e.g. Jaffe, 1986). We complete our characterization of individual decisions by showing how the benefits of socializing are greater for highly productive workers, a feature that rationalizes the existence of fraternities and elite societies (e.g. Popov and Bernhardt, 2012).
This paper is organized as follows. We begin by discussing our contribution to the literature (Section 2). In Section 3, we spell out the model. Section 4 contains the equilibrium analysis and the general results valid for any occupation-specific ability distribution. In Section 5, we study the case of a Pareto distribution of talents. We conclude in Section 6. Most proofs are gathered in an online Appendix.
2 See for example, Guiso and Schivardi (2011); Guiso, Pistaferri, and Schivardi (2015); Hoanga and Antoncic (2003).
3 See for example West, Barron, Dowsett, and Newton (1999) for the medical and Ogus (2002) for the legal profession.
9 Inequality spilling over across occupations is a relatively unstudied possibility. In a recent paper, Clemens, Gottlieb, Hémous, and Olsen (2016) show that higher inequality in one occupation spills over into other occupations through consumption demand across occupations, yielding further increases in inequality.
10 This result is consistent, for example, with Azoulay, Zivin, and Wang (2010), who show that researchers collaborating with a superstar scientist experience a significant decline in their productivity (quality adjusted publication rate) after the unexpected death of their superstar collaborators. Similarly, Waldinger (2010) finds that the expulsion of high quality Jewish scientists from Nazi Germany significantly harmed the students they left behind.
11 The connection between occupation productivity and individual socializing effort is in line with Currarini, Jackson, and Pin (2009) and consistent with observations provided by Albornoz, Cabrales, Hauk, and Warnes (2017).
12 This result is observed by Nix (2015) for the case of Sweden.

Contribution to the literature
Our model contributes to several aspects of the literature on occupational choice. This literature generally builds upon the seminal contribution by Lucas (1978). In the model of Lucas (1978), as well as in several follow-up papers, ability has a single dimension, which implies the counterfactual prediction that all entrepreneurs should earn more income than every employee. The literature has accounted for low and high income in both sectors by adding a second dimension of ability à la Roy (1951). 13 We follow this approach and allow for occupation-specific abilities. As a consequence, occupational choices are determined by comparative rather than absolute advantage. In this context, Rothschild and Scheuer (2012) and Scheuer (2014) study the optimal design of redistributive income taxes. We also study optimal policy instruments, but our concern is efficiency, not redistribution. A fundamental contribution of our approach is introducing endogenous spillovers. The few papers studying the effect of spillovers in occupational choice take them as exogenously given. In Guiso and Schivardi (2011), exogenous spillovers affect occupational choices by shifting productivity. In Cicala, Fryer Jr, and Spenkuch (2016) and Chandra and Staiger (2007), exogenous spillovers change relative benefits from different activities. We complement this literature by providing a framework where individual efforts affect the level of spillovers they enjoy and derive its policy implications.
There is plenty of evidence of excessive or insufficient number of participants in specific occupations. Many countries make it a priority to spur entrepreneurship. Shakhnov (2014) finds that financial markets are overcrowded with respect to entrepreneurship and that the model matches US data well. Khabibulina and Hefti (2015) find a negative correlation of relative wages in the financial sector with respect to the manufacturing sector in the U.S. states from 1977 to 2011. Lopez-Martin (2015) obtains similar results for the allocation of workers between the formal and informal sectors. Our paper provides an explanation for these phenomena and shows that overpopulation/underpopulation can emerge in a model without much structure. More generally, our results have concrete implications for economic growth, as misallocation of talent and resources is viewed as a major force of cross country GDP and productivity differences (e.g. Murphy, Shleifer, and Vishny, 1991; Restuccia and Rogerson, 2013; Hsieh and Klenow, 2009).
There is a very large research effort to understand the effect of social relations on occupational decisions and outcomes (e.g. Granovetter, 1995; Calvo-Armengol and Jackson, 2004; Bentolila, Michelacci, and Suarez, 2010, to mention some of many contributions). The main goal of this literature is to clarify how previous social connections affect future employment decisions. In our analysis, occupational choice is driven by future socializing, not past connections. In this sense, our paper offers a new direction to explore the relationship between socializing and productive decisions.
13 Early examples are Sedlacek (1985, 1990) and Jovanovic (1994).

A model of occupational choice
We consider an economy with a continuum of heterogeneous individuals. Occupational choice is modeled as a two-stage game. In the first stage, individuals simultaneously choose their occupation. They can either be employed in occupation M or in occupation F. For illustrative purposes we will often refer to occupation F as entrepreneurs and to occupation M as employees, but our results also apply to different occupation sectors. 14 In the second stage, all agents within the same occupation simultaneously decide their direct productive effort $k_i^n$ and their socializing effort $s_i^n$.

The pay-offs
Each individual i has an occupation-specific individual productivity parameter $b_i^n$ for $n \in \{M, F\}$, which is randomly and independently drawn for each occupation. 15 The payoff within a particular occupation n is the sum of two components, a private component $P_i^n$, and a synergistic component $S_i^n$ derived from social interactions. The private component $P_i^n$ has a linear-quadratic cost-benefit structure and is given by where $d_n$ is an occupation-specific parameter and is multiplicative in individual ability in occupation n.
The synergistic component, $S_i^n$, captures that socializing is required to take advantage of the externalities generated within each occupation, which are due to the complementarity in productive efforts. 16 The synergistic returns are given by where $\mathcal{N}_i$ denotes the occupational group to which individual i belongs; the parameter a captures the overall strength of synergies, $\mathbf{s}^n$ is the profile of all socializing efforts within the occupation (and which we assume has no effect between occupations) and $g_{ij}^n(s_i^n, s_j^n)$ is the link intensity of individual i and j, defined as follows: 17 Each occupational group is composed of a continuum of individuals $\mathcal{N}^n \subset \mathbb{R}$ for $n \in \{M, F\}$, where the measure of the set $\mathcal{N}^n$ is $N^n$. The payoff of individual i in an occupational group n is the combination of i's private returns and its synergistic component: We interpret the positive part of this individual payoff as the value of individual output and denote it by $y_i^n$. To be more precise:
15 For the time being we make no specific assumptions on how these abilities are distributed, which also implies that the distribution of talents across occupations might follow any correlation structure or be independent. The specific case of a Pareto distribution is studied in Section 5. Results for the uniform distribution can be found in the working paper version.
16 Of course, socializing could also have a negative effect, say, because of revealing secrets to competitors. Assuming complementarity implies that we focus on situations where the benefits of socializing are larger than the costs.
17 We provide a micro-foundation for this functional form in online Appendix A.1.

Discussion of the main assumptions
Our set-up can be thought of as describing occupational choice in a competitive frictionless labor market with endogenous spillovers. There is a bounded set of workers in the market, namely $\mathcal{N}^M \cup \mathcal{N}^F$. Their productivity is perfectly observed. There are no matching frictions. Firms have a linear technology and there is free entry from an unbounded set of them, and there are no fixed costs. In this environment, the positive component of individual payoffs $y_i^n$ corresponds to the value of the output produced by this worker in equilibrium. In our model, socializing within each occupation is undirected, but it is directed across occupations. 18 This means that within occupational groups the agents only choose the amount of interaction $s_i$, but not the identity of the individuals with whom they interact. However, individuals choose the occupational group where they socialize. Many real world examples fit this way of socializing: entrepreneurs and employees go to conferences or business fairs, they join professional associations and go to their meetings, or simply share social activities or events. Synergistic effort is mostly generic within the conference, fair or social gathering, but clearly individuals carefully choose the socializing spaces they attend and the associated socializing intensity.
18 Undirected socializing and the requirement of socializing to enjoy externalities are features shared with Cabrales, Calvó-Armengol, and Zenou (2011). However, we propose a different functional form for the benefits from synergistic returns. We will show that using our synergistic component $S_i^n$ leads to a game with a unique symmetric equilibrium within a network, while the game in Cabrales, Calvó-Armengol, and Zenou (2011) has multiple equilibria. Equilibrium uniqueness in socializing and productive efforts facilitates our analysis of directed occupational choice.
We assume that productive efforts are complementary. In particular, we model synergistic returns as multiplicative in individual productivity parameters and in the square root of productive efforts additively separable by pairs. 19 Adopting this specific functional form implies that synergistic returns are symmetric in pairwise productive efforts and that the synergistic returns exhibit constant returns to scale to overall productive efforts.
We restrict individuals to belong to one single group only. This assumption is consistent with a number of potential applications: most people are either entrepreneurs or employees. They tend to have only one profession to which they dedicate themselves; academics generally do not work simultaneously in very distinct fields; top athletes generally only excel in one sport; and in spite of "Ingres' violin" the same thing generally holds for artists. 20 It can also be justified formally within the model in a variety of ways, for example by adding a sufficiently large fixed cost to join a group, which could arise from training costs. We also assume no specific capital requirements to become an entrepreneur. This could be due to the absence of capital market imperfections or justified by simply assuming that entry costs are similar across occupations. This way, occupational choices are not associated with initial wealth and we can focus on social interactions and productive decisions. 21
Individual ability in our model is always multiplied by the occupation-specific parameter $d_n$, so that the "effective" individual ability of individual i in occupation n is captured by $d_n b_i^n$. This is a purely technical assumption which amounts to a normalization of the distribution of "effective" abilities $d_n b_i^n$. It allows us to discuss the comparative statics of a change in the mean of the ability distribution while fixing the distribution of $b_i^n$. Obviously, a shift that increases $d_n$ is one specific way to introduce a first-order stochastically dominating shift in "effective" abilities $d_n b_i^n$.
19 Complementarity in productive returns in Cabrales, Calvó-Armengol, and Zenou (2011) is generated by synergistic returns being multiplicative in productive efforts and additively separable by pairs.
20 The term "Ingres' violin" comes from the French neoclassical artist Jean Auguste Dominique Ingres, who, while famous for his paintings, was also incredibly talented, though less well known, for his skill on the violin.
21 See Evans and Jovanovic (1989) for the seminal contribution on the analysis of the effect of liquidity constraints on entrepreneurial choice.
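The stochastic-dominance remark about the scale parameter can be checked numerically. The following is a minimal sketch, assuming (purely for illustration) that raw ability b is Pareto on [1, ∞) with shape alpha, so that effective ability d·b has CDF 1 − (x/d)^(−alpha) for x ≥ d. The function names, the grid and the parameter values are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def effective_ability_cdf(x: float, d: float, alpha: float) -> float:
    """CDF of effective ability d*b, where b is Pareto on [1, inf) with shape alpha.

    The Pareto assumption is an illustrative choice; the argument in the text
    holds for any distribution of b.
    """
    if d <= 0 or alpha <= 0:
        raise ValueError("d and alpha must be positive")
    if x < d:
        return 0.0  # effective ability never falls below the scale d
    return 1.0 - (x / d) ** (-alpha)


def dominates(d_high: float, d_low: float, alpha: float, grid: Iterable[float]) -> bool:
    """True if the d_high distribution first-order dominates the d_low one on the grid,
    i.e. its CDF is weakly lower at every grid point."""
    return all(
        effective_ability_cdf(x, d_high, alpha) <= effective_ability_cdf(x, d_low, alpha)
        for x in grid
    )


# Hypothetical evaluation grid and parameter values.
grid = [0.5 + 0.25 * i for i in range(80)]
print(dominates(1.5, 1.0, 2.0, grid))
```

Raising d moves the whole distribution of effective abilities to the right without changing the distribution of b, which is why the comparative statics in d can be read as a shift in mean effective ability.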

The equilibrium and general results
We solve the game by backward induction. We compare the individual optimum with the social optimum in which a social planner maximizes the sum of individual utilities. We first solve for the optimal efforts within an occupational group and then let individuals sort themselves (or be sorted by a social planner) into occupations.

Choice of production and socializing efforts
For each individual, we have to find the optimal productive and socializing effort within each occupation (we suppress the superindex referring to the occupation when there is no ambiguity). For the individual choice problem (the decentralized problem), this is the choice of $k_i$ and $s_i$ that maximizes (2). The social planner, on the other hand, chooses $k_i^{sp}$ and $s_i^{sp}$ to maximize the sum of individual utilities given by Denote by $\overline{b^2} = \int_{j \in \mathcal{N}_i} b_j^2 \, dj$. We first define some functions that are going to be useful in the description of the equilibrium values. and To avoid unbounded equilibrium choices we assume: 22 which guarantees that $k$ and $k^{sp}$ are always well defined. We can now derive the equilibrium decisions in terms of productive and socializing efforts, which we state as follows:
Proposition 1. Under assumption 1, both the individual choice problem and the social planner choice problem have a unique (interior) solution which for each individual is equal to her own productivity multiplied by a function that is identical for all individuals in the group. 23 That is for the individual choice problem, and for the social planner, respectively.
Proof. See online Appendix A.2.
Proposition 1 has important empirical consequences. Since individual productivity $b_i$ is complementary to effort, it follows that
Empirical Implication 1. More talented individuals work harder.
The correlation between talent and effort has been observed in education, a sector for which we have good data on both ability and effort. 24 But these individual features also translate to the group level, which allows us to make intergroup comparisons as well. In particular, highly talented individuals generate greater externalities on their fellows. Evidence consistent with this result is observed in the academic world. For example, the sudden absence of extremely productive researchers provides a natural test for our prediction. Azoulay, Zivin, and Wang (2010) find that researchers collaborating with a superstar scientist experience a lasting and significant decline in their quality adjusted publication rate after the unexpected death of their superstar collaborator. A result similar in spirit is provided by Waldinger (2010), who shows that the expulsion of high quality Jewish scientists from Nazi Germany had a negative effect on the productivity of the Ph.D. students left behind.
Proposition 1 also shows that average socializing is increasing in average group productivity $\bar{b} = \int_{j \in \mathcal{N}_i} b_j \, dj$. 25 Therefore,
Empirical Implication 2. Individuals within more productive occupational groups socialize more on average.
This empirical implication of our model is consistent with evidence presented in Currarini, Jackson, and Pin (2009) showing that the number of interactions within friendship groups is increasing in their size. Albornoz, Cabrales, Hauk, and Warnes (2017) provide further empirical evidence for this prediction based on the analysis of co-authorships within economics fields. Academic life is clearly an example of a situation in which an individual's productive outcomes are affected by the abilities and activities of other researchers involved in the same production process. Hence socializing decisions become key productive choices. Moreover, academics choose their field of research: their group. Using data scraped from the IDEAS-RePEc website, Albornoz, Cabrales, Hauk, and Warnes (2017) establish that economic researchers who work in more productive fields tend to have more co-authors.
Proposition 1 also reveals that average socializing and hence learning spillovers are increasing in network synergies a. Thus,
Empirical Implication 3. Occupations with fewer synergies should experience lower interactions and fewer spillovers.
This is indeed found by Nix (2015) for the case of Sweden. After constructing a ranking of interactions with peers using Swedish data on workers, their peers, and their firms from 1985-2012, Nix (2015) compares it to estimated learning spillovers per occupation and finds a strong correlation between those two measures.
Insofar as synergies capture institutional and technological aspects of socializing, we can also provide an explanation for the intensity of spillovers varying across geographical regions (e.g. Bottazzi and Peri, 2003) and over time (e.g. Jaffe, 1986).
Using the optimal efforts derived in Proposition 1, we can calculate the associated individual utilities.
Proposition 2. Equilibrium individual utilities are in the individual choice problem and for the social planner solution.
Proof. See online Appendix A.2.
From Proposition 2 we observe that while all individuals benefit from being in a more productive occupational group, 26 higher types benefit even more from a given level of within-occupation externalities. 27 Since productivity is independent of occupational group size for a given average spillover, 28 it follows that
Empirical Implication 4. High types have an incentive to segregate from low types if possible.
We certainly observe a tendency for high-skilled employees or entrepreneurs to create elite societies. Good examples are the Freemasons or the Rotary club (Yanagida, 1992; Burt, 2003), where access is restrictive and whose objective seems to be mainly to socialize among like-minded high-skilled individuals. 29 These examples are particularly interesting because they are often secretive, i.e., they are not created for the purpose of signaling such quality to the external world. 30
From Proposition 1, it is easy to see that individuals fail to internalize the positive externality of their investment decisions on the other members of their occupational group. Therefore, the individual utility resulting from the decentralized solution (9) is lower than the individual utility resulting from the social planner solution (10). In other words,
Empirical Implication 5. Individuals underinvest in both productive and socializing effort ($k^{sp} > k$ and $s^{sp} > s$).
(2007)). Also, fraternities in college serve the purpose of segregation, are mainly for networking and have a positive effect on future income. Marmaros and Sacerdote (2002) report that fraternity membership is positively associated with networking and with finding a high paying job directly out of college. Routon and Walker (2014) confirm that fraternity membership increases the probability of a recent graduate obtaining a job. Mara, Davis, and Schmidt (2016) find that fraternity membership increases expected future income by roughly 30%.
This underinvestment is especially severe in professional activities where learning spillovers are important for productivity 31 and provides a rationale for subsidizing these activities. Entrepreneurship has emerged as a key issue in the policy arena in the last few decades. 32 For instance, the European Commission launched the "Small Business Act for Europe" in June 2008, which explicitly recognizes the central role of innovative small and medium-size enterprises (SMEs) in the EU economy and sets out a comprehensive policy framework for the EU and its member states. In this document, the Commission proposes that member states should create an environment that rewards entrepreneurship, specifically mentioning taxation in this context. Since entrepreneurial effort in particular, and effort within an occupation in general, is suboptimal in the presence of spillovers, we now turn to the determination of an optimal subsidy within each occupation.
Proposition 3. A subsidy that achieves efficient effort within an occupation (taking as given the selection into occupations) is given by:
Proof. See online Appendix A.3.
This subsidy, which is based on observable individual output and productive effort, alters the original utility in a way that induces socially optimal levels of effort. 33 However, it takes as given the selection into occupations. For this reason, it is only part of an optimal policy. Individuals choose their occupation, and these individual choices might not be efficient. We now analyze the optimal individual occupational choice and then return to the issue of taxation to induce efficiency.
31 This is clear in the high-tech industry. To cite one example, Pirolo and Presutti (2007) analyze the metropolitan high-tech cluster in Rome and show that social interactions are the most significant determinant of the innovation process and relationships based on knowledge sharing are the most important ones.
32 The Economist on 14th March 2009 published a special report on entrepreneurship with the title "Global Heroes".
33 Of course, it also assumes that the distribution of talent and other common parameters are known. But these are things that can in principle be estimated from aggregate data and observable output.

Choice of occupation
Having found the second-stage utilities, we can now solve the first-stage in which individuals sort themselves into either employees (group M ) or entrepreneurs (group F ). When deciding which occupational group to join, individuals take the occupation choices of others as given. They choose the occupation that grants them the maximal utility given the optimal within occupation investment choices, which could result from the decentralized or the centralized solution derived in the previous subsection.
We show that independently of whether productive or socializing efforts within the occupation are chosen individually (decentralized solution) or by the social planner, the solution is characterized by a cutoff value C, such that individuals for whom the ratio In other words, comparative advantage determines the choice of occupation in a particularly simple way. Naturally, C is an endogenous function of all the parameters in the model, and in general, it need not be unique. We denote the slope of the dividing line by $C^P$ if effort choices in the occupational groups are decentralized and by $C^E$ if the social planner implements efficient effort choices within the occupations.
Proposition 4. For any underlying distribution of abilities, if assumption 1 is satisfied, both $C^P$ and $C^E$ exist and are decreasing in $a_M$ and $d_M$ and increasing in $a_F$ and $d_F$. 34
Proof. See online Appendix A.4.
When C decreases more people become employees (join the M -group). Similarly, an increase in C implies that more people become entrepreneurs (join the F -group). Thus, according to Proposition 4, an increase in the power of synergies, or a specific first order stochastic dominance shift in the distribution of final abilities, will lead to more people joining the affected occupation.
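To make the link between the cutoff and relative sector size concrete, the sketch below computes the share of individuals who join the F-group for a given cutoff. It rests on two assumptions that are not stated in the passage above and are introduced here only for illustration: (i) an individual joins F when b_M < C · b_F, an orientation chosen to match the statement that a higher C means more entrepreneurs, and (ii) talents are independent Pareto draws on [1, ∞) with shapes alpha_M and alpha_F, the case studied in Section 5. Under these assumptions the share has a closed form, which the code compares with a simulation. All function and parameter names are hypothetical.

```python
import random


def share_in_F(C: float, alpha_M: float, alpha_F: float) -> float:
    """Closed-form P(b_M < C * b_F) for independent Pareto talents on [1, inf).

    Assumed sorting rule (illustrative): join F when b_M < C * b_F.
    """
    if C <= 0 or alpha_M <= 0 or alpha_F <= 0:
        raise ValueError("C and both shape parameters must be positive")
    total = alpha_M + alpha_F
    if C >= 1.0:
        return 1.0 - (alpha_F / total) * C ** (-alpha_M)
    return (alpha_M / total) * C ** alpha_F


def simulated_share_in_F(
    C: float, alpha_M: float, alpha_F: float, n: int = 200_000, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same share, using the standard library generator."""
    rng = random.Random(seed)
    joined = 0
    for _ in range(n):
        b_M = rng.paretovariate(alpha_M)  # Pareto draw on [1, inf)
        b_F = rng.paretovariate(alpha_F)
        if b_M < C * b_F:
            joined += 1
    return joined / n


# Hypothetical parameter values: the F-share rises with the cutoff C.
for C in (0.5, 1.0, 2.0):
    print(C, share_in_F(C, 2.0, 3.0), simulated_share_in_F(C, 2.0, 3.0))
```

Under the assumed rule, the share is continuous and strictly increasing in C, so any comparative static that raises C (for instance stronger synergies in F, by Proposition 4) enlarges the F-group.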
Higher within-occupation synergies a can be caused by the introduction of new or improved communication technology facilities. Shifts in d could be technological changes that affect the productivity of every individual in a given occupation. Or they could be due to institutional features.
The effect of communication technologies (changes in a) on productivities has been widely acknowledged. 35 To our knowledge, there is no study linking the relative sizes of economic sectors with their differential adoption of communication technologies. This paper provides a clear prediction for this linkage.
Empirical Implication 6. The differential adoption of communication technologies in different sectors should be accompanied by an increase in the relative size of the sector after the technology is adopted.
This prediction can be tested in future research and exhibits the nice feature of being independent of the underlying distribution of abilities. Similarly, our model delivers clear and testable predictions for a shift in d. This is especially relevant if we interpret our model as choosing to work in the formal or informal sector. In some institutional settings very large (or very small) firms are extremely regulated, while in others there are too many loopholes for politically connected firms. A looser control of informal activities induces a high d in the informal sector. The d in the formal sector would suffer from high taxation. Therefore,
Empirical Implication 7. More people will work in the informal sector at the expense of working in the formal sector the looser the controls of informal activities and the higher formal sector taxation.
Lopez-Martin (2015) finds plentiful evidence consistent with this implication.
The above results only indicate how the relative sizes of the occupational sectors change with the underlying parameters; they do not tell us whether the equilibrium outcomes are efficient. However, independently of the direction of the inefficiency, we can show that every relative sector size can be achieved using a linear tax/subsidy, no matter the underlying talent distributions, and hence the planner can also achieve the socially optimal sorting into occupations. Furthermore:
Proposition 5. The first-best allocation for the social planner, including the socially optimal C, can be achieved in equilibrium using a linear tax/subsidy on output t, plus a subsidy equal to t y
Proof. See online Appendix A.5.
Proposition 5 establishes that a first-best allocation can be achieved by combining a linear tax with a particular subsidy for any distribution of talents. However, since there might be multiple equilibria in the occupational choice stage, it abstracts from equilibrium selection issues. Also, in order to determine the correct linear tax or subsidy, which depends on whether an unregulated occupational sector is too big or too small, we need to make specific assumptions about the underlying talent distributions. In what follows we focus on the Pareto distribution, which is empirically relevant and, as we will show, leads to a unique equilibrium in our model. 36

The case of a Pareto distribution of talent
The empirical relevance of the Pareto distribution for describing variations of wages and income across individuals has been well established. 37 Since its shape parameter is an inverse measure of the spread of talent, the Pareto distribution also captures the empirical distribution of talents. Assuming that individual talents in each occupation follow a Pareto distribution and that talent is occupation-specific leads to a unique equilibrium in our model. 38
Proposition 6. If abilities are distributed independently and follow a Pareto law in [1, ∞) with shape parameter α j for j ∈ {M, F }, both C P , defined by (28), and C E , defined by (30), exist and are unique.
Proof. See online Appendix A.6.
36 The working paper version also includes results for the uniform distribution.
37 For example, Mandelbrot (1960); Guvenen, Karahan, Ozkan, and Song (2015).
38 Notice that we assume uncorrelated talents for expositional simplicity. As discussed below, our main results in this section are robust to the correlation structure.
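As an illustration of how the dividing line b M i = Cb F i translates into occupation sizes under Pareto talents, the following is a minimal sketch and not part of the paper's proofs. It takes C as given rather than solving for the equilibrium C P or C E, assumes independent draws and C ≥ 1, and uses the standard Pareto facts P(b > x) = x^(−α) for x ≥ 1 and E[b^(−s)] = α/(α + s). The function names are hypothetical.

```python
import random


def share_in_m(c: float, alpha_m: float, alpha_f: float) -> float:
    """Closed-form P(b_M > c * b_F) for independent Pareto draws on [1, inf), c >= 1."""
    if c < 1:
        raise ValueError("this closed form is derived for c >= 1 only")
    # P(b_M > x) = x**(-alpha_m) for x >= 1, and
    # E[b_F**(-alpha_m)] = alpha_f / (alpha_f + alpha_m).
    return c ** (-alpha_m) * alpha_f / (alpha_f + alpha_m)


def simulated_share_in_m(c, alpha_m, alpha_f, draws=200_000, seed=0):
    """Monte Carlo estimate of the same share, used only as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        b_m = rng.paretovariate(alpha_m)  # Pareto on [1, inf) with shape alpha_m
        b_f = rng.paretovariate(alpha_f)  # Pareto on [1, inf) with shape alpha_f
        if b_m > c * b_f:  # individual sorts into occupation M
            hits += 1
    return hits / draws


if __name__ == "__main__":
    for c in (1.0, 1.5, 2.0):
        print(c, share_in_m(c, 3.0, 2.5), simulated_share_in_m(c, 3.0, 2.5))
```

In this sketch the share of the M -group falls as C rises, which matches the statement in the text that an increase in C moves people into the F -group.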
Consequently, there is also a unique first-best solution, which requires an intervention at both margins, i.e. inducing optimal effort within each occupation as well as the optimal occupational choice. Without an intervention on occupational sorting, one sector will be overpopulated while the other will be underpopulated. There is plenty of evidence of excessive or insufficient size of specific occupations. Shakhnov (2014) shows, with a model that matches U.S. data well, that financial markets are too large relative to the entrepreneurship sector. Khabibulina and Hefti (2015) find a negative correlation of relative wages in the financial sector with respect to the manufacturing sector for U.S. states from 1977 to 2011. A similar, though somewhat less robust, result applies to relative sector sizes as measured by the labor force. Our paper provides an explanation for these phenomena and shows that productive and informational spillovers are prime candidate mechanisms for overpopulation or underpopulation to emerge in economic sectors.
The following results provide some insights on the direction of overpopulation when abilities follow a Pareto distribution and investments in productive and socializing effort are optimally determined by a social planner. 39
Proposition 7. Let abilities be independently distributed, and assume they follow a Pareto law in [1, ∞) for n ∈ {M, F } with a common shape parameter α. Without loss of generality assume that sector F has higher overall strength of synergies, i.e. a F d F > a M d M . Then social welfare may increase by adding (i.e. ∂w(C)/∂C > 0 at C = C E ) or by decreasing (i.e. ∂w(C)/∂C < 0 at C = C E ) the number of workers in occupation F . In particular,
• The sector with the overall higher strength of synergies (the F-sector) is underpopulated for distributions with relatively low dispersion. 40
• When overall synergies are sufficiently small in both occupations, the F-sector is overpopulated for sufficiently low synergies a F . 41
Proof. See online Appendix A.7.
39 Proposition 7 established that intervening only locally within a group leads to suboptimal choice of occupations. This poses the question whether an uncoordinated intervention can be worse than no intervention. In the working paper version, we provide an example of such a result when the talent distribution is uniform and in the presence of a special type of congestion costs. In this setup the equilibrium is also unique.
40 The exact technical condition is the following: for fixed values of a F , a M , d F and d M , there is a value of α high enough such that ∂w(C)/∂C > 0 at C = C E .
If the overall strength of synergies is higher in the F -sector, then reallocating M -types that are close to indifferent to occupation F leads to lower welfare in occupation F , since the average type in occupation F decreases. At the same time, welfare in occupation M increases because the average type in occupation M increases. 42 The overall effect on social welfare is therefore ambiguous. Notice that Proposition 7 establishes that occupation F is underpopulated for distributions with relatively low dispersion (high values of α). Notice as well that a low dispersion in a Pareto distribution implies that the number of superstars is very small. 43 Thus, if both sectors have a low number of very able individuals, welfare can increase by enlarging the sector with the larger impact of synergies.
Empirical Implication 8. If the number of very able individuals in each sector is small, the sector with the larger impact of synergies will be too small.
Proposition 7 also establishes a second rule of thumb:
Empirical Implication 9. The size of the occupation with higher overall synergies is sub-optimally large when synergies are sufficiently small in both occupations. 44
To the extent that the dispersion of talents and the strength of spillovers within a particular occupation are observable, Empirical Implications 8 and 9 provide potentially useful rules of thumb to detect local underpopulation or overpopulation of different occupational sectors. Verifying these rules is left as an empirical challenge for future work.
41 The exact condition is the following: for fixed values of d F , d M and for (a M 2 d M 2 )/(a F 2 d F 2 ) < 1, there is an a F low enough such that the F-sector is overpopulated.
42 This is true because the average type in occupation F decreases with C, while the average type in occupation M increases with C.
43 Because high α implies low dispersion, so the tails of the distribution are thin.
44 To see this, notice that occupation F is overpopulated when a F is very low. Since F has higher overall synergies (a M 2 d M 2 < a F 2 d F 2 ), this also implies a sufficiently low a M .

Inequality and effort choices
The Pareto distribution is also appropriate to study the link between inequality of abilities and productive and socializing efforts. Since, as we explained already, the shape parameter α j is an (inverse) measure of the spread of talent, we can simply associate a general increase of inequality with a reduction of α j . One difficulty with the Pareto distribution, though, is that reducing α j increases both the mean and the dispersion. To circumvent this problem, we look at the effect of a "neutralized" reduction in α j that keeps the unconditional mean of the Pareto distribution constant. 45 This way, we focus exclusively on the effect of changes in the dispersion of talent, which we associate with inequality.
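The difficulty described above can be checked with the textbook moments of a Pareto law on [1, ∞): the mean is α/(α − 1) for α > 1 and the second moment is α/(α − 2) for α > 2. The snippet below is only a sketch of these standard formulas, with hypothetical function names; it does not implement the compensating change in a used in Proposition 8.

```python
def pareto_mean(alpha: float) -> float:
    """Mean of a Pareto law on [1, inf) with shape alpha (finite only if alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return alpha / (alpha - 1)


def pareto_second_moment(alpha: float) -> float:
    """E[b**2] for a Pareto law on [1, inf) with shape alpha (finite only if alpha > 2)."""
    if alpha <= 2:
        raise ValueError("the second moment is infinite for alpha <= 2")
    return alpha / (alpha - 2)


def pareto_variance(alpha: float) -> float:
    """Variance, computed as E[b**2] - (E[b])**2."""
    return pareto_second_moment(alpha) - pareto_mean(alpha) ** 2


if __name__ == "__main__":
    # Lowering alpha (more inequality) raises the mean and the dispersion together.
    for alpha in (4.0, 3.0, 2.5):
        print(alpha, pareto_mean(alpha), pareto_second_moment(alpha), pareto_variance(alpha))
```

Because both moments rise as α falls, a comparison across values of α mixes a level effect with a dispersion effect unless the level effect is offset, which is the role of the neutralization.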
Proposition 8. Suppose abilities are distributed independently and follow a Pareto law in [1, ∞) with shape parameter α j for j ∈ {M, F }. Suppose as well that the shape parameter α j of one of the occupations decreases and that a is reduced to exactly compensate for the increase in the unconditional mean of squared types. 46 Then, if we hold C E or C P constant, both b M 2 and b F 2 increase, and thus productive and socializing effort increase in both occupations.
Proposition 8 simply states that if the dispersion of talents in one occupation increases, both occupations receive a better selection of types, and hence productive and socializing effort increase in both occupations. The basic intuition is that the tail of one of the distributions is now heavier, and comparative advantage forces a selection mostly from the tails. 47 Clearly, the effect of inequality in talent on socializing and productive efforts emerges from the existence of spillovers within occupations. This is a novel empirical implication of our model that stands as a challenge for future empirical work.
45 More specifically, as α j falls we impose an equivalent change in a to reduce effort as much as necessary to fix the unconditional mean of the Pareto distribution; which is
47 For a more analytical explanation, note that the expression for b F 2 can be written as: Observe that if α M decreases, the amount of mass on the tail of the distribution increases. In this way, the weight given to larger values of b F is increased by a (now larger) factor The effect of a decrease of α F is more direct, as it increases f F b F for larger values of b F . But of course, we are compensating for the direct increase by reducing a. But the key difference in the conditional expectation is that the F Cb F ∞ 1 f b F F Cb F db F term, now unchanged, gives more weight to changes that occur for higher values of b F .
Empirical Implication 10. An increase in inequality in talent leads to higher production and more socialization in all occupations.
Admittedly, assuming that productivity parameters are independent across occupations requires some degree of heroism. A natural question is whether our results hinge on this assumption. We find it reassuring that our Propositions 6 and 8 for the Pareto distribution are robust to the following correlation structure: with probability p the two values of b j i for j ∈ {F, M } are independent of one another; with probability (1 − p) they coincide, namely b M i = b F i , and they are distributed with shape parameter α F . Indeed, both Propositions 6 and 8 are proved under this assumption, which includes the independence assumption for p = 1. 48
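The correlation structure just described is easy to simulate. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the dividing-line slope C is taken as given with C > 1, and the coinciding draw uses shape α F as in the text. It illustrates that a coinciding draw always satisfies b M i < Cb F i and therefore lands in occupation F.

```python
import random


def draw_abilities(n, p, alpha_m, alpha_f, seed=0):
    """Draw n ability pairs (b_M, b_F) on [1, inf).

    With probability p the two abilities are independent Pareto draws;
    with probability 1 - p they coincide and follow shape alpha_f.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        b_f = rng.paretovariate(alpha_f)
        if rng.random() < p:
            b_m = rng.paretovariate(alpha_m)  # independent draw
        else:
            b_m = b_f  # coinciding draw
        pairs.append((b_m, b_f))
    return pairs


def share_in_f(pairs, c):
    """Share of individuals below the dividing line b_M = c * b_F (occupation F)."""
    return sum(1 for b_m, b_f in pairs if b_m < c * b_f) / len(pairs)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, share_in_f(draw_abilities(50_000, p, 3.0, 2.5), c=1.5))
```

Setting p = 1 recovers the independent case, while p = 0 puts every individual in occupation F whenever C > 1.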

Conclusion
In this paper, we study a model that integrates productive and socializing efforts with occupational choice. Socializing allows for capturing informational spillovers between individuals. We show that the existence of spillovers has several interesting implications. It causes more talented individuals to work harder, generating bigger positive externalities within their occupation, but it also gives them incentives to segregate. We also show that average socializing increases in average group productivity and in network synergies. Moreover, any increase in within-occupation synergies or improvement in final abilities for an occupation causes more people to choose this occupation, no matter how abilities in the different occupations are distributed. This result provides testable implications on how sector sizes should vary, for example, after the introduction of new communication technologies, which may be adopted differentially across sectors. Another interesting implication of endogenous spillovers is that higher inequality of abilities in one occupation implies more socialization and productive effort in both occupations. This would not happen in a world without spillovers within occupations.
Our framework can be applied to a range of different contexts in which individuals choose which group to belong to and then decide how much to invest in productive and socializing efforts. Education choices share many features with the case we studied in this paper. We intend to use this model in future work to study the demand and supply for different subjects and skills (e.g. high-level Science, Technology, Engineering, and Math). Interestingly, the case of education choices requires endogenizing ability. In , we take a first step in this direction, where parents can initially invest in their children's abilities. We show that parental educational investment can mitigate or reinforce distributional inefficiencies.
Our model could also be useful to study residential choice, as the benefits of living in a community often depend on the social interactions within it. The choice of leisure activities is another potentially fruitful avenue of application of our ideas. A more intriguing area for the development of this kind of model concerns aspects more connected to an individual's identity. The national, religious, or ethnic identification of a person is sometimes a matter of choice, and is connected to the decisions of others. For example, whether a person feels she is European, British or Welsh, and to which degree, could be influenced by her efforts and those of others in pursuit of their own identity. We think that our contribution is an important step towards understanding the determinants and effects of socializing.
One possible avenue for further research would be to explore the dynamic implications of our model. The agents' choices in our framework are static, but the work on homophily shows that some fruitful insights can be obtained from dynamic models of group formation. For example, Bramoullé, Currarini, Jackson, Pin, and Rogers (2012) show that it is only for young individuals that homophily-based contact search biases the type distribution of contacts. 49 Hence, in the long term groups need not be type-biased. We could extend our model to allow for participation in more than one occupation over time and thus ascertain whether biases in occupational choice persist over time. Clearly, another extension would be to allow some spillovers between groups and partial participation of agents in several of them. We could also allow for horizontal preferences over occupations which are not necessarily related to individual productivity, and for correlated productivities across occupations.
49 Another example of the interaction of homophily and dynamics is Golub and Jackson (2012), which shows that homophily induces a lower speed of social learning (the opinions of others like me are likely to be similar to my own).

A Online appendices
A.1 A microfoundation for the $g^n_{ij}$ function
Lemma 1. Suppose that, for all $s \neq 0$, the link intensity satisfies the following assumptions: 50
(A1) Symmetry: $g^n_{ij}(s^n_i, s^n_j) = g^n_{ji}(s^n_j, s^n_i)$, for all i, j, n;
(A2) The total interaction intensity of individual i in group n exhibits constant returns to scale to overall inputs in socializing efforts and symmetry:
(A3) Anonymous socializing: $g^n_{ij}(s^n_i, s^n_j)/(s^n_j)^{1/2} = g^n_{ki}(s^n_k, s^n_i)/(s^n_k)^{1/2}$, for all i, j, k.
Lemma 2. Then, the link intensity is given by
$$g^n_{ij}(s^n_i, s^n_j) = \frac{1}{N^n}\,(s^n_i)^{1/2}(s^n_j)^{1/2}.$$
Proof of Lemma 1: Fix s. Combining (A1) and (A3) gives
$$(s^n_k)^{1/2}\, g^n_{ij}(s^n_i, s^n_j) = (s^n_j)^{1/2}\, g^n_{ik}(s^n_i, s^n_k).$$
Integrating across all j's and using (A2) gives
$$g^n_{ik}(s^n_i, s^n_k) = \frac{1}{N^n}\,(s^n_i)^{1/2}(s^n_k)^{1/2}.$$
Notice that given (A2) and a level of socializing effort for all members of the group, the total socializing of an individual in a group, $\int_{j \in N_i} g^n_{ij}(s^n_i, s^n_j)\,dj$, is independent of the size of the group. In other words, individuals will not have more contacts in larger occupational groups if everyone in the same occupation chooses the same $s^n_i$ independently of size. One could easily accommodate other assumptions, where socializing is either easier or more difficult in larger groups, by using $1/(N^n)^{\beta}$ for some β different from 1.
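The size-independence property can be illustrated with a small discrete approximation. This is a sketch only: the paper works with a continuum of individuals, whereas the code below sums over a finite list of members and uses the number of members as the group size. The function names and the parameter beta (the exponent on group size mentioned in the text) are labelled here for illustration.

```python
import math


def link_intensity(s_i: float, s_j: float, group_size: int, beta: float = 1.0) -> float:
    """Link intensity sqrt(s_i * s_j) / group_size**beta (beta = 1 is the baseline case)."""
    return math.sqrt(s_i * s_j) / group_size ** beta


def total_socializing(s_i: float, others: list, beta: float = 1.0) -> float:
    """Sum of link intensities of one individual with every member of the group."""
    n = len(others)
    return sum(link_intensity(s_i, s_j, n, beta) for s_j in others)


if __name__ == "__main__":
    # Everyone chooses the same socializing effort s = 2.0.
    for n in (10, 100, 1000):
        print(n, total_socializing(2.0, [2.0] * n), total_socializing(2.0, [2.0] * n, beta=0.5))
```

With beta = 1 the total stays at the common effort level whatever the group size, while with beta below 1 it grows with the group, which corresponds to socializing being easier in larger groups.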

A.2 Proof of Propositions 1 and 2
The FOC for the decentralized problem are while the FOC for the social planner simplify to We first prove that k i /s i = k j /s j for all i and j. We divide (14) by (15) to get where bold face letters denote vectors and Rearranging (18) gives from which it is immediate that for some K (.) with a unique solution. To see the uniqueness notice that, letting (19) can be written as The left-hand side of (20) is a convex function taking the value 0 when x i = 0, and the right-hand side is linear and takes the positive value d a 2 K (b, k, sp) when x i = 0. Hence there is a single crossing point in the positive orthant. Hence Thus it is clear we can write An analogous proof establishes that also for the centralized problem It remains to determine the common optimal group parameters k sp s sp for the centralized problem. Suppressing the dependence on the vectors, we get two simultaneous equations with two unknowns, namely for the decentralized problem and for the centralized problem. The optimal investments follow immediately from solving this system of linear equations. Assuming ad 2 b 2 2 < 1 guarantees positive investment levels.
50 While Cabrales, Calvó-Armengol, and Zenou (2011) also model symmetric and anonymous socializing, which is the key for generic socializing, they assume that link intensity satisfies aggregate constant returns to scale.
Introducing the optimal investment levels into the utility functions gives us for the decentralized solution and for the centralized solution.

A.3 Proof of Proposition 3
Let the subsidy be k sp then the resulting utility with this subsidy is This leads to first order conditions letting k i = b i k sp and s i = b i s sp , we get Clearly the system (25), (26) is the same as (21), (22), and thus (23), (24) also solves the same system as (16), (17), and the result follows.

A.4 Proof of Proposition 4
We first establish existence and a useful technical result for the rest of the proposition.
Lemma 3. For any underlying distribution of abilities, if assumption 1 is satisfied, there exist mappings f (C) and g(C) such that a zero of the mappings f (C) and g(C) is an equilibrium of, respectively, the decentralized and centralized problems. Furthermore, an equilibrium always exists, and in any stable equilibrium, ∂f (C)/∂C < 0 and ∂g(C)/∂C < 0.
Proof. Under the decentralized solution, individuals choose to become an employee (group M ) if and only if If the dividing line exists, its slope is defined when the expressions on either side of the inequality in (27) are equal. In other words, the dividing line is defined by the following expression: Hence C P is the fixed point of the mapping where the right-hand side of (28) depends on C P through b M 2 and b F 2 , which are defined by equations (12) and (13) respectively. Put differently, C P is implicitly defined by a zero of the mapping If s sp and k sp are induced by the social planner (say via subsidies), people would choose to become an employee (group M ) if and only if u sp and the dividing line, should it exist, would solve and is implicitly defined by a zero of the mapping Define so that g (C, ·) = g A (C) − C 2 . Then, given that we assume that sup C ad 2 b 2 2 < 1, g(0, ·) > 0. Note also that the assumption sup C ad 2 b 2 2 < 1 means that b 2 is bounded above, so the numerator of the function g A (C) is bounded above by 1 Similarly, the denominator of g A (C) is bounded below by This means that for all C which implies that if we define C g as we have that for all C > C g , g (C, ·) < 0, and thus by the intermediate value theorem there exists a value C * ∈ [0, C g ] such that g (C * , ·) = 0.
Similarly, let Then, given that we assume that sup C ad 2 b 2 2 < 1, f (0, ·) > 0. The assumption sup C ad 2 b 2 2 < 1 means that b 2 is bounded above, so the numerator of the function f A (C) is bounded above by Similarly, the denominator of f A (C) is bounded below by This means that for all C we have that for all C > C f , f (C, ·) < 0, and thus by the intermediate value theorem there exists a value C * ∈ [0, C f ] such that f (C * , ·) = 0.
For stability, note that if g (C) > 0 we would have and thus an individual with b M i = Cb F i would not be indifferent between groups M and F but would prefer to move to group F , so that C would tend to increase. This leads us to postulate a natural tâtonnement-like adjustment dynamic where R (.) is an increasing function that is positive if and only if g (C, ·) is positive. It is then easy to see that in any stable equilibrium C * * , g (C, ·) has to be decreasing at C * * , as otherwise a small increase or decrease from C * * would push the dynamics away from the equilibrium. An analogous argument proves the result for f (C, ·).
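The existence and stability arguments can be mimicked numerically. The sketch below is illustrative only: the function g_a is a hypothetical stand-in (positive, bounded and decreasing), not the paper's g A (C), and the adjustment rule uses a simple linear R (.) with a fixed step. It locates a zero of g (C) = g A (C) − C 2 by bisection between a point where g is positive and one where it is negative, checks that g is decreasing there, and runs a discrete version of the adjustment dynamic.

```python
def g_a(c: float) -> float:
    """Hypothetical stand-in for g_A(C): positive, bounded above, decreasing in C."""
    return 1.0 + 1.0 / (1.0 + c)


def g(c: float) -> float:
    """Excess-incentive mapping g(C) = g_A(C) - C**2; its zeros are candidate equilibria."""
    return g_a(c) - c ** 2


def find_zero(func, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection: requires func(lo) > 0 > func(hi), as in the existence argument."""
    if func(lo) <= 0 or func(hi) >= 0:
        raise ValueError("need func(lo) > 0 > func(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def is_stable(func, c_star: float, eps: float = 1e-6) -> bool:
    """Stability check: func must be decreasing around the zero."""
    return func(c_star + eps) < func(c_star - eps)


def tatonnement(func, c0: float, step: float = 0.1, iters: int = 200) -> float:
    """Discrete adjustment C <- C + step * func(C), a linear choice of R(.)."""
    c = c0
    for _ in range(iters):
        c += step * func(c)
    return c


if __name__ == "__main__":
    c_star = find_zero(g, 0.0, 2.0)
    print(c_star, is_stable(g, c_star), tatonnement(g, 0.5))
```

For this stand-in the adjustment path started at C = 0.5 settles at the same point that bisection finds, which is what the stability condition predicts when g is decreasing at the zero.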
By Lemma 3, ∂f (C)/∂C < 0 and ∂g(C)/∂C < 0; hence, to establish comparative static results, one only needs to check the sign of the derivatives of the functions defining C P and C E with respect to the underlying parameters a n and d n for n ∈ {M, F }. So, using the implicit function theorem, we only need to check how the functions f (.) and g (.) vary directly with a M , a F , d M and d F to calculate how C changes with those underlying parameters. We start by looking at changes in a M ∂f (C, ·) If the synergies of the M -group become more important, C decreases, thus more people join the M -group. We now show that the opposite happens when synergies in the F -group increase.
Now we look at changes in d M .
If d M increases, fewer people join the F -group.
Finally we want to understand how the dividing line is affected by changes in d F .

A.5 Proof of Proposition 5
We first establish Lemma 4. Any C ∈ [0, ∞) can be obtained in equilibrium using a linear tax/subsidy on output.
Proof. We characterize the optimal choices under a linear tax/subsidy on output. The FOC for the decentralized problem are We first prove that k i /s i = k j /s j for all i and j. We divide (33) by (34) to get where bold face letters denote vectors and Rearranging (35) gives from which it is immediate that for some K (.) with a unique solution. To see the uniqueness notice that, letting the left-hand side of (37) is a convex function taking the value 0 when x i = 0, and the right-hand side is linear and takes the positive value d a 2 K (b, k, sp) when x i = 0. Hence there is a single crossing point in the positive orthant. Hence Thus it is clear we can write We now determine the common optimal group parameters.
Suppressing the dependence on the vectors, we get two simultaneous equations with two unknowns, namely Assuming ad 2 tb 2 2 < 1 guarantees positive investment levels. Introducing the optimal investment levels into the utility functions gives us for the decentralized solution.
Similarly, for the centralized solution the first order conditions are and, using analogous arguments as before, we have that using k i = b i k sp and s i = b i s sp the expressions (39) and (40) can be written as This leads, after some manipulations, as before, to From expression (38) we get that the equation that defines C P implicitly is and from expression (41) we get that the equation that defines C E implicitly is From expressions (43) and (42) we get that lim t F /t M →0 C E = lim t F /t M →0 C P = 0 and lim t M /t F →0 C E = lim t M /t F →0 C P = ∞. This, plus continuity of C E and C P as functions of t F and t M , establishes that one can obtain any value of C E and C P between 0 and ∞ by appropriately varying t F /t M .
Having established Lemma 4, we now proceed with the remainder of the proof of Proposition 5. Given the tax and subsidy scheme proposed we can write the utility of the agent as The FOC for the decentralized problem for all i are: Letting k i = b i k sp and s i = b i s sp , we have Notice that the expressions (44) and (45) are identical to (39) and (40). Hence we will have that The remainder of the proof follows from Lemma 4.

A.6 Proof of Proposition 6
We assume now that returns b follow a Pareto distribution with shape parameter α j for j ∈ {M, F }. We will prove a slightly more general statement of the proposition, allowing for the following correlation structure. With probability p the two values of b j i for j ∈ {F, M } are independent of one another. With probability (1 − p), b M i = b F i and they are distributed with shape parameter α F . We will derive the results under the assumption that the C that defines the dividing line b M i = Cb F i is such that C ≥ 1. 51 Existence follows from Proposition 3. We now calculate b F 2 and b M 2 for C ≥ 1. (Note that in the correlated part, b M i = b F i implies that b M i < Cb F i , and thus if a player gets a correlated draw she forms part of the F network.)
51 If C < 1, the same results hold with the names of the networks interchanged.
We first prove uniqueness of C E defined by Note that the LHS of (50) is increasing in C E so all we need to show is the RHS is decreasing in C E so that a unique equilibrium exists. Clearly the numerator of the RHS is decreasing in C E because b M 2 is increasing in C E . Since b F 2 is decreasing in C E , the denominator of the RHS is increasing in C E . And thus the result follows. We now prove uniqueness of C P which is defined by Again note that the LHS of (51) is increasing in C P so all we need to show is the RHS is decreasing in C P so that a unique equilibrium exists. It is again easy As a result RHS of (51) is decreasing in C P and the result follows.

A.7 Proof of Proposition 7
We will prove the proposition for α F = α M . We will first show that Note also that if a M 2 d M 2 = a F 2 d F 2 the solution of (50) is at C E = 1. An increase of a M 2 d M 2 with respect to a F 2 d F 2 displaces the RHS to the left so that the new equilibrium entails C E < 1.
We will now show that for C E > 1 there might be too few (∂w(C)/∂C > 0 at C = C E ) or too many (∂w(C)/∂C < 0 at C = C E ) people in the F group compared to the social optimum. 52 The F group will be underpopulated if and only if We will check how a decentralized group choice deviates from the efficient group choice C sp implemented by a social planner who maximizes social welfare. We study the case where the social planner also implements the socially optimal investments in productive and socializing effort.
The social planner would choose C to maximize social welfare with socially optimal investments in productive and socializing efforts where social welfare is given by Note that $((\alpha - 1)(2C^{\alpha} - 1) + 1)^2 < \alpha^2 (2C^{\alpha} - 1)^2$, since that expression is equivalent to $(\alpha - 1)(2C^{\alpha} - 1) + 1 < \alpha (2C^{\alpha} - 1)$, where the last two inequalities hold since C > 1, noting that in that case $2C^{\alpha} - 1 > C^{\alpha}$. Thus equation (54) establishes (53). The next two lemmas establish that overpopulation can occur in both sectors and depends on the underlying parameters. Lemma 6 shows the existence of parameter values such that ∂w(C)/∂C < 0 at C = C E , while Lemma 7 shows the existence of parameter values such that ∂w(C)/∂C > 0 at C = C E .
Lemma 7. Let (a M 2 d M 2 )/(a F 2 d F 2 ) = r < 1. For a fixed a F 2 and r such that C E exists, there is an α high enough that which is true for example if r > 1/2. Proposition 7 follows immediately from these lemmas.

A.8 Proof of Proposition 8
We will prove this proposition for a slightly more general case, using the same correlation structure as in the proof of Proposition 6. With probability p the two values of b j i for j ∈ {F, M } are independent of one another. With probability (1 − p), b M i = b F i and they are distributed with shape parameter α F . With this correlation structure b F 2 and b M 2 have been calculated by (47) and (48) respectively, as We now normalize b M 2 by the expected second moment α M /(α M − 2). Hence Clearly, this is decreasing in α F and α M . We normalize b F 2 by the expected second moment α F /(α F − 2). Hence We will now show that is increasing in α F and given that ∂b F 2 NORM /∂α F < 0 is equivalent to expression (56) the result follows. Then Clearly this is true as α F / (α F + α M ) 2 , α F / (α F + α M ) and 1/ C α M − α F α F +α M all increase in α F .