Dynamic Persuasion With Outside Information

A principal seeks to persuade an agent to accept an offer of uncertain value before a deadline expires. The principal can generate information, but exerts no control over exogenous outside information. The combined effect of the deadline and outside information creates incentives for the principal to keep uncertainty high in the first periods so as to persuade the agent close to the deadline. We characterize the equilibrium, compare it to the single-player decision problem in which exogenous outside information is the agent's only source of information, and examine the welfare implications of our analysis.

1 Introduction

Each period t, information that the agent expects to obtain (from future signals and experiments) determines a cutoff belief above which the agent chooses to accept; we call it the agent's period-t threshold of acceptance. The closer the deadline, the smaller the amount of useful information the agent expects to obtain in the future. Hence, the threshold of acceptance decreases over time. This in turn creates incentives for the principal to keep uncertainty high in the first periods with a view to persuading the agent closer to the deadline. We call this mechanism the deadline effect. Yet, in order to try persuading the agent close to the deadline, the principal must let the agent observe exogenous signals. The caveat is that favorable signal realizations could lead the belief to "overshoot" the agent's period-1 threshold of acceptance. The greater the overshooting, the fewer mistakes the principal can induce the agent to make in the periods ahead, thus incentivizing the principal to try persuading the agent in the first period. We call this mechanism the overshooting effect. Which one of the deadline and overshooting effects dominates the other pins down the principal's choice of experiment in the first period and, via this choice, whether the agent sometimes waits before making his final decision.
We examine two types of signals: "perfect good news" and "perfect bad news". Under perfect good news (respectively, bad news) the state in which accepting the offer is optimal (resp. suboptimal) for the agent is perfectly revealed with positive probability each period. With perfect good news, or if the deadline is sufficiently far into the future, the overshooting effect is the dominant force. The principal then generates a sufficient amount of information in the first period to induce the agent to make an immediate final decision. However, under perfect bad news, if signal accuracy is intermediate and the deadline is not too far into the future, the deadline effect then becomes the dominant force. In this case the agent might wait up to T periods before making his final decision.
Welfare hinges on the equilibrium strategy of the principal. In one regime, the principal generates information so as to induce the agent to act in the first period; we say in this case that the principal is aggressive. In the other regime, the principal generates less information, and in the first periods seeks to sustain uncertainty so as to persuade the agent closer to the deadline; we say in this case that the principal is conservative. Pareto efficiency obtains if and only if the principal is aggressive. Furthermore, as long as no regime switch occurs, the agent's equilibrium expected payoff as well as the quality of the final decision are monotonically increasing in the amount of exogenous outside information, be it in the form of more accurate outside information, or a deadline further away in time (allowing the agent to observe more exogenous signals). However, any regime switch from aggressive to conservative causes the agent's welfare and the quality of the final decision to drop, and vice versa.
Our analysis reveals a rich interplay between inside information and exogenous outside information, that contrasts sharply with settings in which exogenous outside information is the agent's only source of information (as in Wald (1947), for example). For instance, in our setting, extending the deadline can accelerate the agent's final decision. The reason is that pushing the deadline further away in time increases the amount of information generated by the principal.
The rest of the paper is organized as follows. The related literature is discussed below. The model is presented in Section 2. The core of the analysis is in Section 3. Several extensions of the model are examined in Section 4. Section 5 concludes.
Related Literature. We contribute to the literature on Bayesian persuasion by introducing outside information in the canonical framework of Kamenica and Gentzkow (2011), that is, by relaxing the assumption that the sender (or principal) fully controls the flow of information to the receiver (or agent). This approach connects our work to two active strands of research.
A first strand of research examines the case in which multiple senders compete to persuade the agent. This includes Gentzkow and Kamenica (2016), Li and Norman (2018) and Board and Lu (2018). The models and applications are different from ours: we study situations in which a single principal designs multiple experiments over time whereas these papers examine situations in which multiple principals design one experiment each.
The second strand of research focuses like we do on the dynamic persuasion of an agent, and begins with Au (2015) and Honryo (2018). The contemporaneous work of Orlov, Skrzypacz and Zryumov (2020) is the study most related to ours. In their model, an evolving state affects the principal's and the agent's payoff from exercising an option. This process is exogenous, and creates an incentive for the agent to wait. However, the payoffs also depend on a second state. The principal controls the flow of information concerning the second state, but the evolution of the first state is publicly observable. The environment is stationary; in particular, there is no deadline by which the agent must act. In our model, in the absence of a deadline, the agent's threshold of acceptance is the same in all periods. This means, in turn, that the principal generates information inducing the agent to make a final decision in the first period.
In equilibrium, the agent therefore never waits. In Orlov et al. (2020), by contrast, waiting can be socially optimal, since the principal is unable to generate information about one of the two states. The key tradeoffs in the two papers are thus different. While several other papers examine the dynamic persuasion of an agent, including Henry and Ottaviani (2019), Che, Kim and Mierendorff (2020), Ely and Szydlowski (2020), Smolin (2020), and Zhao, Renou and Tomala (2020), the focus in all of them is different from ours since in these models the principal fully controls the flow of information.
A few additional papers are related to specific aspects of our work. Our finding that the agent's equilibrium expected payoff is a non-monotonic function of signal accuracy is linked in spirit to a related result in Kolotilin (2018). Gratton, Holden and Kolotilin (2017) examine the problem of a principal deciding when to start a public flow of information about her type; theirs is one of the very few papers which, like ours, analyze the role of deadlines in contexts of persuasion. The decision problem of the agent naturally links our analysis to the literature on experimentation starting with Rothschild (1974), Bolton and Harris (1999), and Keller, Rady and Cripps (2005). However, whereas we study the interplay of inside and outside information, there is no inside information in that literature.

2 Model
A principal ("she") and an agent ("he") interact over T ≥ 2 periods. We refer to the final period as the deadline of our game. The agent has to choose between accepting and rejecting an offer that the principal would like him to accept. By rejecting, the agent secures an (undiscounted) payoff V_R > 0; accepting yields him V_ω, where ω ∈ {G, B} represents an unknown state of the world. To make the model interesting, V_G > V_R > V_B. In order to learn about the realized state, the agent can postpone making his final decision (accept or reject) until t = T. Both outside and inside information are observed over time: the former is exogenous whereas the latter is strategically generated by the principal. All information being public, the players share common beliefs about the state. The (evolving) probability assigned to ω = G will be referred to as the belief. The payoff of the principal is 0 in case of rejection, and is normalized to 1 in case of acceptance. Both players discount time at rate δ ∈ (0, 1).

Figure 1
Timing. The state of the world is drawn by nature according to P(ω = G) = p_1. We suppose for expository purposes that the agent initially leans towards rejection, that is, p_1 < b, where b denotes the belief at which the agent is indifferent between accepting and rejecting. The principal designs an experiment inducing the end-of-period-1 belief q_1. The agent then chooses between accept, reject and wait. If the agent makes a final decision, payoffs are realized; if the agent waits, the exogenous signal s_1 is observed, inducing the beginning-of-period-2 belief p_2. This sequence repeats until the agent makes a final decision, with the caveat that, at t = T, the agent has to make a final decision. Figure 1 summarizes the timeline; the broken arrow between the second and third nodes indicates that the game may terminate at the second node.
Inside Information. The principal's experiment in period t is a probability distribution τ_t ∈ Δ([0, 1]) governing the end-of-period-t belief q_t; the only constraint imposed on each experiment is Bayes plausibility: E_{τ_t}[q_t] = p_t. The support M_t of τ_t therefore uniquely determines this experiment as long as |M_t| ≤ 2. It will be convenient, whenever possible, to use M_t in order to represent τ_t.
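As a concrete illustration, Bayes plausibility alone pins down the probabilities that a binary experiment assigns to its two posteriors. The sketch below (illustrative Python; the function name `binary_experiment` is ours) computes them:

```python
def binary_experiment(p, lo, hi):
    """Distribution over posteriors {lo, hi} whose mean equals the prior p.

    Bayes plausibility E[q] = p forces the weight on the high posterior
    to be (p - lo) / (hi - lo)."""
    assert lo <= p <= hi and lo < hi
    w_hi = (p - lo) / (hi - lo)
    return {lo: 1.0 - w_hi, hi: w_hi}

# Example: prior 0.3, support M_t = {0, 0.6}. The high posterior is
# reached with probability 0.5, so E[q] = 0.5 * 0.6 = 0.3 = p.
tau = binary_experiment(0.3, 0.0, 0.6)
```

An experiment with support {0, b} thus succeeds (reaches the high posterior) with probability p/b, which is the logic behind the aggressive experiments studied below.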
Outside Information. The signal in period t, denoted s_t, is drawn from the conditional probability distribution π(· | ω) over {g, b}. The signal-generating process is assumed i.i.d. across time periods. As is common in the literature on strategic experimentation, we focus for tractability on conclusive signal-generating processes. Under perfect bad news, π(b | B) = γ and π(g | G) = 1. In this case s_t = b informs players that ω = B, whereas the belief drifts upwards as long as s_t = g. By contrast, under perfect good news, π(b | B) = 1 and π(g | G) = γ. The signal realization g then informs players that ω = G, whereas the belief drifts downwards as long as s_t = b. The parameter γ ∈ [0, 1] capturing the informativeness of the signal-generating process will be referred to as the signal accuracy.
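The Bayesian updating implied by the two conclusive signal structures can be sketched as follows (illustrative Python; the function name `posterior` is ours):

```python
def posterior(p, s, gamma, news):
    """One-step Bayesian update of the belief p after signal s.

    news == "bad":  pi(b|B) = gamma, pi(g|G) = 1 (perfect bad news);
    news == "good": pi(b|B) = 1, pi(g|G) = gamma (perfect good news)."""
    if news == "bad":
        if s == "b":                                  # conclusive: omega = B
            return 0.0
        return p / (p + (1 - p) * (1 - gamma))        # upward drift after g
    else:
        if s == "g":                                  # conclusive: omega = G
            return 1.0
        return p * (1 - gamma) / (p * (1 - gamma) + (1 - p))  # downward drift after b
```

For example, with p = 0.5 and γ = 0.5, a g realization under perfect bad news lifts the belief to 2/3, whereas a b realization under perfect good news lowers it to 1/3.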

Strategies and Equilibrium.
A t-history consists of experiments, end-of-period beliefs and signal realizations for the first t − 1 periods, that is, {τ_k, q_k, s_k}_{k=1}^{t−1}; an augmented t-history contains in addition the experiment τ_t and the belief q_t. A strategy for the principal maps each t-history to an experiment τ_t. A strategy for the agent maps each augmented t-history to a decision in {accept, reject, wait} for t < T, and to a decision in {accept, reject} for t = T. The equilibrium concept is Perfect Bayesian Equilibrium (PBE): the player at each decision node maximizes her/his expected payoff conditional on (a) the other player's strategy and (b) the belief obtained using Bayes' rule.

3 Analysis
Subsection 3.1 characterizes the equilibrium of our game. A general discussion of the main theorem is provided in Subsection 3.2, and a sketch of its proof is presented in Subsection 3.3. All omitted proofs of this section are in Appendices A, B and C.

3.1 Main Result
As usual in models of Bayesian persuasion, equilibrium multiplicity arises from the fact that, for a subset of beliefs, several experiments ultimately induce identical outcomes. We thus focus throughout the paper on PBE such that: (i) whenever the principal is indifferent between experiments ordered according to Blackwell's criterion, she chooses the least informative experiment; (ii) whenever indifferent, the agent makes the decision preferred by the principal. These refinements simplify the exposition, but are inessential for our results. The first deals with the kind of multiplicity mentioned above;[6] the second rules out inconsequential multiplicity off the equilibrium path.[7] Henceforth, PBE satisfying (i) and (ii) will be referred to as equilibria for short.

Proposition 1. There exists a unique equilibrium.

[6] For instance, imagine that in a given period the agent accepts for q_t in an interval [x, y]. Then, for p_t ∈ (x, y), the principal is indifferent between designing the uninformative experiment or M_t = {x, y}. In this case, we assume that the principal chooses the uninformative experiment.
[7] For instance, imagine that, in a given period, at q_t = x the agent is indifferent between rejecting and waiting, but that irrespective of whether the agent does one or the other, any period-t experiment with x in its support is strictly dominated for the principal by some other experiment. Then what the agent does at q_t = x is inconsequential, as q_t = x never occurs on the equilibrium path. In this case, we assume that the agent waits at q_t = x.
We henceforth refer to the threshold b at which the agent is indifferent between accepting and rejecting as the static threshold of acceptance. Note that b is independent of the signal-generating process. At t = T, the agent accepts if q_T ≥ b and rejects otherwise. At t < T, however, information which the agent expects to obtain in the periods ahead (from the experiments and from the signals) determines an interval of beliefs at which the agent chooses to wait.
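The static threshold solves b·V_G + (1 − b)·V_B = V_R; a one-line sketch (illustrative Python):

```python
def static_threshold(VG, VR, VB):
    """Belief b at which accepting and rejecting give the same payoff:
    b*VG + (1-b)*VB = VR  =>  b = (VR - VB) / (VG - VB)."""
    return (VR - VB) / (VG - VB)

# With V_G = 2, V_R = 1, V_B = 0 (the values used to draw Figure 3), b = 0.5.
```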
Lemma 1. Each period, cutoffs 0 < a_t ≤ b_t < 1 exist such that, in equilibrium, the agent rejects if q_t < a_t, waits if a_t ≤ q_t < b_t, and accepts if q_t ≥ b_t (at t = T, a_T = b_T = b).

We henceforth refer to the cutoff b_t as the agent's period-t threshold of acceptance. As information which the agent expects to obtain can only increase his incentive to wait, b_t ≥ b regardless of the period t. One shows more generally that the agent's threshold of acceptance decreases with t.
Lemma 2. The agent's period-t threshold of acceptance b t decreases with t.
We turn next to the principal. Each period the principal can either try to persuade the agent immediately or aim to keep uncertainty high (i.e. aim for q_t ∈ [a_t, b_t)) so as to try persuading the agent in a future period. The optimal choice of the principal is illustrated in Figure 2. In both panels the gray solid curve represents the principal's equilibrium continuation payoffs given the end-of-period-t belief q_t. The black dashed curve depicts the concavification of the former curve (Aumann, Maschler and Stearns, 1995), and represents the principal's equilibrium continuation payoffs given the beginning-of-period-t belief p_t. The case in which the principal optimally tries to persuade the agent in period t is depicted in Panel I. In this case M_t = {0, b_t} if p_t < b_t and M_t = {p_t} otherwise, and we say that the principal is aggressive in period t. The case in which the principal optimally keeps uncertainty high so as to try persuading the agent in a future period is depicted in Panel II. In this case a_t < b_t and the principal designs an experiment with a_t in its support, so that q_t = a_t occurs with positive probability and the agent waits. We then say that the principal is conservative in period t. The experiments described in the previous sentences are the only experiments ever designed by the principal in equilibrium.
Lemma 3. Each period, in equilibrium, either the principal is aggressive, or the principal is conservative.
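The concavification step can be illustrated numerically. The sketch below (plain Python, grid-based, our own construction) computes the upper concave envelope of a payoff function on a belief grid; for a step payoff that jumps from 0 to 1 at a threshold b, the envelope is linear below b, which is exactly the value of splitting p_t between the posteriors 0 and b:

```python
def concavify(xs, fs):
    """Upper concave envelope of the points (xs[i], fs[i]) evaluated on the
    grid xs: at each grid point, take the best chord spanning it."""
    n = len(xs)
    cav = []
    for x in xs:
        best = max(f for xi, f in zip(xs, fs) if xi == x)
        for i in range(n):
            for j in range(i + 1, n):
                if xs[i] <= x <= xs[j]:
                    t = (x - xs[i]) / (xs[j] - xs[i])
                    best = max(best, (1 - t) * fs[i] + t * fs[j])
        cav.append(best)
    return cav

# Step payoff: the principal earns 1 iff q >= 0.5 (threshold b = 0.5).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
fs = [1.0 if x >= 0.5 else 0.0 for x in xs]
cav = concavify(xs, fs)   # at p = 0.25 the envelope equals 0.25 / 0.5 = 0.5
```

The gap between `fs` and `cav` below the threshold is the value the principal extracts by designing an informative experiment rather than staying put.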
The following theorem is the central result of our analysis.
The perfect bad news case is illustrated in Figure 3.[9] For example, each parameter pair (δ, γ) that belongs to the vertically dashed region of the figure is such that in equilibrium the principal is conservative at t = 1 if either T = 2 or T = 3, whereas the principal is aggressive whenever T ≥ 4. The rest of this section is organized as follows. We discuss below the key tension at the heart of our model and how this tension explains Theorem 1. In Subsection 3.2, we link Theorem 1 to the welfare properties of the equilibrium. We also examine the impact of information supplied by the principal's experiments, by contrasting our model and results with the benchmark setting in which exogenous outside information is the agent's only source of information. A sketch of the proof of Theorem 1 is provided in the final subsection.

[9] The figure is drawn for V_G = 2, V_R = 1 and V_B = 0. The code is available on the authors' website.
Information which the agent expects to obtain determines each period the agent's threshold of acceptance b_t. The lower b_t, the more mistakes the principal can induce the agent to make. Thus, if b_1 > b_T, the principal is incentivized to maintain enough uncertainty in the first periods in order to try persuading the agent at t = T. We refer to this as the deadline effect. The caveat is the following: to persuade the agent in period T, the principal must let the agent observe T − 1 exogenous signals. However, if g signal realizations are sufficiently conclusive, letting the agent observe T − 1 exogenous signals may lead the belief to "overshoot" the agent's period-1 threshold of acceptance, as illustrated in Figure 4.[10] We refer to this as the overshooting effect. The greater the overshooting, the fewer (future) mistakes the principal can induce the agent to make. For the principal, the overshooting effect thus creates countervailing incentives relative to the deadline effect.

[10] In Panel I, to prevent the overshooting, the principal must generate information inducing the agent to accept in the first period with positive probability. In Panel II, the principal has more freedom, but to avert overshooting the principal must generate information inducing the agent to accept with positive probability before t = T.
Which effect dominates the other pins down the principal's choice of experiment at t = 1.[11] Roughly, with perfect good news the overshooting effect dominates the deadline effect because s_1 = g then induces p_2 = 1.[12] By contrast, with perfect bad news, the deadline effect can dominate the overshooting effect (provided γ and T are sufficiently small, so as to avoid the scenarios illustrated in, respectively, Panel I and Panel II of Figure 4). To see that neither γ nor δ can be too small for this mechanism to work, observe that γ ≈ 0 and δ ≈ 0 both imply b_1 ≈ b = b_T, in which case the deadline effect becomes vanishingly small: for small γ, this is because the agent does not expect to obtain much information by waiting; for small δ, this is because the agent does not value future information much.
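The overshooting at the heart of this tradeoff is easy to quantify. Under perfect bad news, after k consecutive g realizations the belief drifts up deterministically; the sketch below (illustrative Python, with numbers of our own choosing) computes the drifted belief and shows that for accurate signals it can far exceed a period-1 threshold:

```python
def drifted_belief(p1, gamma, k):
    """Belief after k consecutive g realizations under perfect bad news
    (pi(b|B) = gamma, pi(g|G) = 1): p_k = p1 / (p1 + (1-p1)*(1-gamma)^k)."""
    return p1 / (p1 + (1 - p1) * (1 - gamma) ** k)

# With p1 = 0.4 and gamma = 0.9, two g signals push the belief above 0.98:
# if the period-1 threshold were, say, 0.7, the belief would overshoot it
# by a wide margin, shrinking the principal's future scope for persuasion.
```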

3.2 Discussion
Pareto efficiency. When is the equilibrium Pareto efficient, and when is it not? In our model, Pareto efficiency obtains if and only if the agent (a) accepts with probability 1 conditional on state G and (b) makes his final decision at t = 1 with probability 1.[13] By Lemmata 1 and 3, condition (a) is always satisfied in equilibrium.[14] The key question then is whether in equilibrium the agent's final decision occurs at t = 1 with probability 1. Notice that if in equilibrium the principal is aggressive at t = 1 then M_1 = {0, b_1}, and so (b) holds in this case (by Lemma 1). On the other hand, if at t = 1 the principal is conservative in equilibrium, then q_1 = a_1 with positive probability, and so in this case (b) does not hold. We conclude from Lemma 3 and the previous remarks that, in equilibrium, Pareto efficiency obtains if and only if the principal is aggressive at t = 1. Theorem 1 thus pins down the conditions under which the equilibrium satisfies Pareto efficiency.

[11] To be sure, the principal's time discounting provides her with an additional incentive to try persuading the agent immediately. However, the overshooting effect alone can provide sufficient incentives for the principal to be aggressive at t = 1.

[12] Moreover, we show in Appendix B, Proposition B.1, that the deadline effect is weaker in the perfect good news case than in the perfect bad news case, in the sense that the difference b_1 − b_T is smaller in the former case than in the latter.

[13] See Proposition C.1 in Appendix C.

[14] Lemma 1 ensures that in equilibrium the agent rejects in period t if and only if q_t ∈ [0, a_t). Lemma 3 ensures that in equilibrium q_t ∉ (0, a_t). Thus if the agent rejects in period t, it must be the case that q_t = 0.
Comparison with the single-player setting. Our main theorem also offers interesting contrasts with the corresponding single-player setting in which exogenous signals are the agent's only source of information (as in Wald (1947)). First, in the single-player setting, if the agent waits given a certain amount of outside information then the agent also waits for all greater amounts of outside information (greater T , greater γ, or both). In our model on the other hand, pushing the deadline further away in time can increase the amount of information generated by the principal, and thereby cause the agent to make his final decision earlier on. By the same token, increasing γ may accelerate the agent's final decision with probability 1.
For example, at point A in Figure 3, in equilibrium the agent sometimes waits if T = 2, but never waits if T = 3; similarly, if T = 2, the agent sometimes waits at point A but never waits at point B, albeit γ_B > γ_A (and δ_B = δ_A). Second, whereas in the single-player setting increasing the amount of outside information always improves the expected quality of the agent's final decision and raises the agent's expected payoff, in our model increasing γ may increase the probability of type II errors and lower the agent's expected payoff. The reason is as follows. By switching from aggressive to conservative, the principal causes delay in the agent's decision to accept. Since the principal discounts time, she must then be compensated by a higher probability of acceptance. But we pointed out earlier that in equilibrium the agent accepts with probability 1 conditional on state G. So the higher probability of acceptance must come from state B. In consequence, any change of parameters leading the principal to switch from aggressive to conservative can induce a higher probability of type II errors and a reduction of the agent's expected payoff. In Figure 3 for example, in equilibrium the expected quality of the agent's final decision and the agent's expected payoff are higher at point C than at point A, though γ_A > γ_C (and δ_A = δ_C).
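The single-player benchmark is simple enough to compute directly. The sketch below (illustrative Python, with parameter values of our own choosing) runs backward induction for an agent who observes only perfect-bad-news signals, and recovers by bisection the acceptance thresholds b_t, which decrease toward the static threshold at the deadline:

```python
VG, VR, VB = 2.0, 1.0, 0.0      # payoffs as in Figure 3
T, delta, gamma = 4, 0.9, 0.6   # illustrative horizon, discounting, accuracy

def accept(p):
    return p * VG + (1 - p) * VB

def value(t, p):
    """Agent's optimal continuation value in period t at belief p."""
    stay = max(accept(p), VR)
    if t == T:
        return stay
    prob_b = (1 - p) * gamma                   # conclusive bad signal
    p_next = p / (p + (1 - p) * (1 - gamma))   # upward drift after g
    wait = delta * (prob_b * VR + (1 - prob_b) * value(t + 1, p_next))
    return max(stay, wait)

def threshold(t):
    """Smallest belief at which accepting is optimal in period t (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if accept(mid) >= value(t, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

With these parameters the thresholds decrease over time: b_4 equals the static threshold 0.5, while b_3 (and a fortiori b_2, b_1) strictly exceeds it, reflecting the option value of waiting for outside information.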

3.3 Sketch of the Proof of Theorem 1
We present here the main steps of the proof of Theorem 1. Readers uninterested in the technical details may skip this subsection.
In equilibrium the principal ensures that the agent makes no type I error. However, the principal would like to maximize the number of type II errors. The first part of Theorem 1 is founded upon the observation that with perfect good news, in equilibrium, making the agent wait induces him to base his final decision on more information (in Blackwell's sense) than if the principal were aggressive and optimally triggered the agent's final decision at t = 1. The principal therefore chooses to be aggressive in the first period.
Lemma 4. With perfect good news, in equilibrium the principal is aggressive at t = 1.
Proof: Consider a period t < T such that, in equilibrium, the principal is aggressive in period t + 1. Notice that the latter requirement is satisfied if t = T − 1. Observe as well that given q_t = b_t, the belief p_{t+1} induced by s_t = b has to be strictly smaller than b_{t+1}; if this were not the case then, by Lemma 1, at q_t = b_t the agent would prefer accepting to waiting, contradicting the definition of b_t. Straightforward algebra then establishes

b_t(1 − γ) / (b_t(1 − γ) + (1 − b_t)) < b_{t+1}.   (1)

Next, we claim that for all z ∈ [a_t, b_t), in equilibrium, given p_t = z the principal is strictly better off designing the experiment M_t = {0, b_t} than the uninformative experiment. This in turn will imply that, in equilibrium, the principal is aggressive in period t and, by induction, also at t = 1.
We now prove the previous claim. Let X denote the random variable representing the belief at which the agent makes his final decision given p_t = z and the equilibrium strategies in the continuation game, assuming that the principal designs the experiment M_t = {0, b_t}. Let Y denote the corresponding random variable assuming that the principal designs the uninformative experiment. One shows, using (1), that Y is a mean-preserving spread of X.[16] Let φ denote the piecewise linear function with a kink at min{b_t, b_{t+1}} such that φ(0) = 0 and φ(min{b_t, b_{t+1}}) = φ(1) = 1. Given the equilibrium strategies in the continuation game, the principal's expected payoff from designing the experiment M_t = {0, b_t} can be written as E[φ(X)]. On the other hand, as δ < 1, her expected payoff from designing the uninformative experiment is bounded from above by δE[φ(Y)] ≤ δE[φ(X)] < E[φ(X)]. This concludes the proof of the claim which, in turn, by the arguments laid out in the second paragraph, concludes the proof.

[16] Since supp(X) = {0, b_t} and supp(Y) = {0, b_{t+1}, 1}, we only need to show that P(X = 0) < P(Y = 0); this follows from (1).
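The payoff comparison at the end of the proof can be illustrated numerically. The sketch below (illustrative Python, with numbers of our own choosing) builds a concave φ with kink at min{b_t, b_{t+1}}, a binary X on {0, b_t}, and a mean-preserving spread Y on {0, b_{t+1}, 1} with the same mean, and checks that E[φ(X)] exceeds E[φ(Y)]:

```python
def phi(q, kink):
    """Piecewise linear and concave: phi(0) = 0, phi(kink) = phi(1) = 1."""
    return min(q / kink, 1.0)

# Illustrative numbers (ours): b_t = 0.6, b_{t+1} = 0.5, prior z = 0.3.
bt, bt1, z = 0.6, 0.5, 0.3
X = {0.0: 1 - z / bt, bt: z / bt}            # supp(X) = {0, b_t}, mean z
pY1 = 0.15                                   # mass Y places on belief 1
pYb = (z - pY1) / bt1                        # mass on b_{t+1}, so E[Y] = z
Y = {0.0: 1 - pYb - pY1, bt1: pYb, 1.0: pY1} # supp(Y) = {0, b_{t+1}, 1}

kink = min(bt, bt1)
EphiX = sum(p * phi(q, kink) for q, p in X.items())
EphiY = sum(p * phi(q, kink) for q, p in Y.items())
```

Here P(Y = 0) > P(X = 0), mirroring footnote 16, and since φ is concave the spread Y yields the lower expected value, before even accounting for the discount factor δ.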
Lemma 4 establishes the first part of Theorem 1. In the rest of this subsection, the focus is on the perfect bad news case. We start by showing that information generated by the principal is such that, at q_t = b_t, any benefit accruing to the agent from waiting must come from information generated by the following period's exogenous signal.[17] Therefore, the agent's threshold of acceptance is the same at all t < T. This, in turn, implies (by Lemma 2) that either the agent's threshold of acceptance equals the static threshold of acceptance in all periods, or the agent exhibits two thresholds of acceptance: a high threshold prior to the deadline, which then drops to the static threshold of acceptance at t = T.
Lemma 5 shows that the agent's threshold of acceptance is the same at all t < T . Then, suppose that in some period t < T − 1 the principal knows that she will try to persuade the agent in period t + 1 (i.e. she will be aggressive in period t + 1). Since next period's threshold of acceptance is the same as this period's threshold of acceptance, in equilibrium the principal has to try persuading the agent this period. We therefore obtain the following result.
Lemma 6. Let t < T − 1. With perfect bad news, in equilibrium if the principal is aggressive in period t + 1, then the principal is also aggressive in period t.
We infer from Lemma 6 that if, in a game of given length, the principal is aggressive in period 1 in equilibrium, then the same must be true in all longer games. Building on Lemma 5, we show in addition that, for sufficiently long games, the principal has to be aggressive in period 1. We thus obtain the following result.
We next record the conditions under which, in equilibrium, the principal is aggressive in period T − 1.[18]

[17] The qualification "at q_t = b_t" is essential here. At more pessimistic beliefs, the agent usually strictly benefits from information generated by the principal's experiments and signals two or more periods ahead.

[18] Note that, by Lemma 4, T̄(γ, δ) = 1 in the perfect good news case.
Lemmata 7 and 8 together yield the second half of Theorem 1.

4.1 Frequent Signals
Our framework is founded upon the assumption that exogenous signals are observed at discrete points in time. This assumption is not without loss of generality. In our setting, to observe any signal the agent must incur the cost of waiting a discrete amount of time. This, in turn, ensures that the agent's threshold of acceptance is always strictly below 1 (no matter the signal accuracy). To take advantage of this wedge, for γ close to 1, the principal chooses to be aggressive, thereby inducing the agent to make his final decision in the first period. In our discrete time setting, a natural question concerns the impact of the frequency at which exogenous outside information is observed. In this subsection we recast our model by letting Δ_n = 1/2^{n−1} capture the period length, and refer to n ∈ ℕ* as the signal frequency; T_n will denote the total number of periods until the deadline. The game length (in units of time) is thus L := T_n Δ_n. The signal-generating process is such that π(b | B) = 1 − e^{−λΔ_n}, where λ ≥ 0, and π(g | G) = 1 (we focus on the perfect bad news case; it is easy to show that with perfect good news the principal is aggressive at t = 1 regardless of the signal frequency). The per-period discount factor is e^{−rΔ_n}, where r > 0.
Keeping n fixed, the baseline model (Section 2) is obtained by setting γ_n = 1 − e^{−λΔ_n} and δ_n = e^{−rΔ_n}. Relabelling appropriately, the analysis in Section 3 shows that, irrespective of the signal frequency n, in equilibrium the agent's final decision is made in period 1 with probability 1 if and only if one of the following conditions holds: the game is sufficiently long (L_n > L̄_n), signals are sufficiently inaccurate (λ_n < λ̲_n), signals are sufficiently accurate (λ_n > λ̄_n), or players are sufficiently impatient (r_n > r̄_n).
However, a question arises regarding the model's behavior in the limit as n tends to infinity, since γ_n then tends to 0 while δ_n tends to 1. The first effect pushes the principal to be aggressive at t = 1 (Theorem 1), while the second pushes the principal to be conservative (Lemma C.3). The question then is whether the dichotomy between aggressive and conservative regimes that our analysis uncovered continues to exist at very high frequency: if lim_{n→∞} L̄_n = 0 (respectively, lim_{n→∞} L̄_n = ∞), then at very high frequency the principal is aggressive (respectively, conservative) at t = 1 irrespective of the game length, and the dichotomy disappears.
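The frequent-signals parameterization is easy to verify numerically. The sketch below (illustrative Python; the function name `freq_params` is ours) computes (γ_n, δ_n) and confirms that, as the signal frequency grows, the per-period signal probability γ_n vanishes while the per-period discount factor δ_n approaches 1:

```python
import math

def freq_params(lam, r, n):
    """Per-period parameters at signal frequency n: period length
    Delta_n = 1 / 2^(n-1), gamma_n = 1 - exp(-lam * Delta_n),
    delta_n = exp(-r * Delta_n)."""
    dt = 1.0 / 2 ** (n - 1)
    return 1.0 - math.exp(-lam * dt), math.exp(-r * dt)

# e.g. with lam = r = 1: at n = 1, (gamma, delta) ~ (0.632, 0.368);
# at n = 5, gamma has shrunk below 0.07 while delta exceeds 0.93.
```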
We show in the next proposition (proved in Online Appendix 1) that, provided the signals are sufficiently informative, the aforementioned dichotomy continues to exist at very high frequency. A sufficient condition is λ > ϕ(r); this condition requires the signals to be informative enough that the agent prefers waiting at low signal frequency (n = 1) when p_1 = b.
Proposition 2. With perfect good news, the agent's final decision is made in period 1 with probability 1. Suppose λ > ϕ(r). Then, there exist N and 0 < L̲ < L† ≤ ∞ such that, with perfect bad news, for all n > N: (i) the agent's final decision is made in period 1 with probability 1 if L > L†; (ii) the agent's final decision is made in period 1 with probability strictly less than 1 if L < L̲.

4.2 Different Discount Factors
Here we let the discount factors of the two players differ. We denote by δ_A ∈ (0, 1) the agent's discount factor and by δ_P ∈ (0, 1) the principal's. Our baseline model corresponds to δ_P = δ_A. The findings listed in Theorem 1 hold qualitatively unchanged with different discount factors, as recorded in the following proposition.
Proposition 3. In equilibrium, with perfect good news the principal is aggressive at t = 1.
The proof is in Online Appendix 2. In our baseline model, whenever players are sufficiently impatient the principal is aggressive at t = 1. Proposition 3 shows that for this result to hold, it is enough that one of the players be sufficiently impatient. If the principal is sufficiently impatient then she is aggressive regardless of the period-1 threshold of acceptance b 1 . If instead the agent is sufficiently impatient, then he does not wait, regardless of the information generated by the principal's experiment at t = 1 (a 1 = b 1 ). This, in turn, results in the principal being aggressive at t = 1.

4.3 Costly Experiments
In this section we extend the baseline model by assuming that the principal incurs a cost C > 0 for each new experiment. With costly experiments, the principal's payoff (expressed in period-1 units) from acceptance in period t can be written as

δ^{t−1} − C(1 − δ^t)/(1 − δ) = −C/(1 − δ) + δ^{t−1}(1 + Cδ/(1 − δ)).

Similarly, the principal's payoff from rejection in period t becomes

−C(1 − δ^t)/(1 − δ) = −C/(1 − δ) + δ^{t−1} · Cδ/(1 − δ).

The game with costly experiments may thus be viewed as a modified version of the baseline model in which the principal's (undiscounted) payoff is U_R := Cδ/(1 − δ) in case of rejection and U_A := 1 + U_R in case of acceptance. Intuitively, costly experiments add an extra incentive for the principal to generate information provoking the agent's final decision early on, since the principal now prefers early rejection over late rejection (U_R > 0). One shows that Theorem 1 holds unchanged, except perhaps for the exact values of the cutoffs in the statement of the theorem.
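The algebra behind this reformulation can be checked numerically. The sketch below (illustrative Python) compares the direct payoff computation with the modified-game representation built from U_R = Cδ/(1 − δ) and U_A = 1 + U_R; the two agree once the common constant −C/(1 − δ) is included:

```python
def direct_payoffs(delta, C, t):
    """Direct computation: the principal pays C in each of periods 1..t
    and, upon acceptance, receives 1 in period t (period-1 units)."""
    cost = C * (1 - delta ** t) / (1 - delta)   # sum_{k=1}^{t} delta^(k-1) * C
    return delta ** (t - 1) - cost, -cost       # (acceptance, rejection)

def modified_payoffs(delta, C, t):
    """Modified baseline game: constant -C/(1-delta) plus discounted U_A, U_R."""
    UR = C * delta / (1 - delta)
    UA = 1 + UR
    const = -C / (1 - delta)
    return const + delta ** (t - 1) * UA, const + delta ** (t - 1) * UR
```

Since U_R > 0, the rejection term δ^{t−1}U_R shrinks as t grows, which is exactly the stated preference for early over late rejection.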

Concluding Remarks
We develop a model of a principal seeking to persuade an agent to accept an offer before a deadline. Whether accepting the offer is optimal for the agent depends on an unknown state of the world. The agent can wait in order to accumulate information. That information might come from the principal (inside information) and/or from exogenous signals over which the principal exerts no control (outside information). The combination of this outside information and the deadline by which the agent needs to act yields a non-stationary environment in which the agent's threshold of acceptance evolves over time, providing in turn incentives for the principal to keep uncertainty high in the first periods so as to persuade the agent close to the deadline. 21 We characterize the conditions under which in equilibrium the agent makes his final decision in the first period and those in which the agent sometimes waits until the deadline, link these results to the welfare properties of the model, and contrast our analysis with the setting in which exogenous outside information is the agent's only source of information.

Appendix A: Proof of Proposition 1
In this appendix we prove equilibrium existence and uniqueness (Proposition 1). We start with a very general result that will be used repeatedly in this and the next appendices.
Proof of Proposition A.1: Consider an arbitrary signal-generating process, with realizations indexed by i, and p_i(q) := qγ_{Gi}/(qγ_{Gi} + (1 − q)γ_{Bi}). To shorten notation, in what follows we use p_i to refer to p_i(q_t).
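The posterior map p_i(q) is standard Bayesian updating. The sketch below illustrates it; the perfect-bad-news parameterization and all numbers are illustrative assumptions.

```python
# Sketch of the posterior map p_i(q) from the proof of Proposition A.1:
# gamma_Gi (resp. gamma_Bi) is the probability of realization i in state G
# (resp. B). The perfect-bad-news parameterization and the numbers below
# are illustrative assumptions.

def posterior(q, gamma_Gi, gamma_Bi):
    """Bayesian update of belief q after observing realization i."""
    return q * gamma_Gi / (q * gamma_Gi + (1 - q) * gamma_Bi)

gamma = 0.3   # per-period probability that state B is revealed
q = 0.4       # current belief that the state is G

# Perfect bad news: signal b occurs only in state B (probability gamma);
# signal g occurs with probability 1 in G and 1 - gamma in B.
q_good = posterior(q, 1.0, 1 - gamma)
q_bad = posterior(q, 0.0, gamma)

assert q_bad == 0.0   # bad news perfectly reveals state B
assert q_good > q     # absence of bad news drifts the belief upward
```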
The continuation game starting in period T with beginning-of-period-T belief p T is identical to the static Bayesian persuasion game of Kamenica and Gentzkow (2011). We summarize some of their main results in the following lemma.
Lemma A.1. In equilibrium, at t = T, the agent accepts if q_T ≥ b and rejects otherwise. The principal designs the experiment M_T = {0, b} for p_T ∈ (0, b), and the uninformative experiment otherwise. The agent does not benefit from the period-T experiment, hence his equilibrium continuation payoff is convex in p_T. 22

Lemma A.2. Let t < T. Suppose that functions ĝ_{t+1}(p_{t+1}) and f̂_{t+1}(p_{t+1}) uniquely determine the agent's (resp. the principal's) equilibrium continuation payoffs in period t + 1. If ĝ_{t+1} is convex, then: 1. in equilibrium, the principal's period-t experiment and the agent's period-t decision are both uniquely determined; the former is a function of p_t only and the latter is a function of q_t only; 2. functions ĝ_t(p_t) and f̂_t(p_t) uniquely determine the equilibrium continuation payoffs in period t, and ĝ_t is convex.
Then the agent's equilibrium continuation payoff given q_t can be written as in (5). Since ĝ_{t+1} is convex by assumption, Proposition A.1 shows that g̃_t is convex as well. Then (5), (6) and convexity of g̃_t give unique a_t and b_t, with a_t ≤ b ≤ b_t, such that, in equilibrium, the agent rejects if q_t < a_t, waits if q_t ∈ (a_t, b_t), and accepts if q_t > b_t. Moreover, since in equilibrium whenever indifferent the agent makes the decision preferred by the principal, the agent waits if q_t = a_t < b_t and accepts if q_t = b_t. Standard arguments yield f̂_t = cav f_t. Furthermore, since in equilibrium whenever indifferent the principal picks the least informative experiment, the principal's experiment in period t is uniquely determined by the belief p_t at the beginning of period t. Lastly, letting τ_t(p_t) denote the principal's equilibrium experiment given p_t yields ĝ_t(p_t) = E_{τ_t(p_t)}[g_t(q_t) | p_t]. Finally, since g̃_t is convex, (5) shows that g_t is convex as well which, together with the properties of τ_t(·) implied by (8) and the fact that ĝ_t is given by (9), establishes that ĝ_t is convex.

22 The equilibrium period-T experiment generates information that has no value for the agent, since rejecting is an optimal choice for q_T = 0 as well as for q_T = b. The agent's equilibrium continuation payoffs at the beginning of period T are thus given by the convex function ĝ_T(p_T) = max{V_R, p_T V_G + (1 − p_T)V_B}.
Proof of Proposition 1: The proposition follows from Lemmata A.1 and A.2.
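The period-T experiment of Lemma A.1 follows the standard concavification logic of Kamenica and Gentzkow (2011): below the threshold b, the principal splits the belief between 0 and b. The sketch below (all numbers are illustrative assumptions) computes the split and checks Bayes-plausibility.

```python
# Illustration of the period-T experiment in Lemma A.1, following the
# Kamenica-Gentzkow concavification logic. All numbers are illustrative
# assumptions.

def period_T_split(p_T, b):
    """Posterior distribution induced by the principal's period-T experiment."""
    if p_T >= b:
        # The agent already accepts: the uninformative experiment is optimal.
        return [(p_T, 1.0)]
    # Split the belief between the threshold b and 0; the probability of
    # posterior b is pinned down by Bayes-plausibility (posteriors must
    # average back to the prior).
    prob_b = p_T / b
    return [(b, prob_b), (0.0, 1.0 - prob_b)]

b, p_T = 0.5, 0.2
split = period_T_split(p_T, b)
mean = sum(post * prob for post, prob in split)
assert abs(mean - p_T) < 1e-9               # Bayes-plausibility
assert abs(split[0][1] - p_T / b) < 1e-9    # acceptance probability p_T / b
```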

Appendix B: Proof of Theorem 1
In this appendix we prove the steps leading to Theorem 1, including all lemmas of Section 3 except for Lemma 4, whose proof was kept in the text. The order in which we prove the results is as follows: Lemmata 1, 5, 6, 7, 2, 3 and 8.

Proof: The result follows from the arguments in the text above the statement of Lemma 2.

Lemma B.2. Each period, in equilibrium, either (i) a_t = b_t and M_t = {0, b_t} for p_t ∈ (0, b_t);

(ii) or a_t < b_t and: M_t = {0, a_t} for p_t ∈ (0, a_t) and there exists c_t ∈ [a_t, b_t).

Proof: Recall (8). If a_t = b_t (so that the set of beliefs for which in equilibrium the agent waits in period t is empty) then in equilibrium M_t = {0, b_t} for all p_t ∈ (0, b_t). So assume a_t < b_t.
Observe that: (A) f̂_t(·) (defined by (4)) is concave. (A) follows from Proposition A.1 while (B) is obtained from (7). In view of (A)-(B), either (i) in the statement of the lemma holds, or (ii) does.

Proof of Lemma 5:
If a_{T−1} = b_{T−1} = b_T, the claim of the lemma is straightforward. 23 Assume therefore that a_{T−1} < b_{T−1} and suppose q_{T−1} = b_{T−1}, so that, by definition, in equilibrium the agent is indifferent between waiting and accepting. The agent's expected payoff from accepting is
On the other hand, using Lemmata 1 and B.1, the agent's expected payoff from waiting can be written as in (10). Next, consider t < T − 1 such that b_{t+1} = b_{T−1}. Suppose q_t = b_t, so that, by definition, in equilibrium the agent is indifferent between waiting and accepting. The agent's expected payoff from accepting is the same as above. On the other hand, q_t = b_t ≥ b_{T−1} = b_{t+1}. Hence, conditional on s_t = g, the agent optimally accepts in the next period. It ensues that b_t solves (10) and, therefore, that b_t = b_{T−1}. A recursive argument then yields b_t = b_{T−1} for all t < T.

Proof of Lemma 6: Suppose that in equilibrium the principal is aggressive in period 1 < t + 1 < T. If a_t = b_t the statement of the lemma is straightforward. Assume therefore that a_t < b_t. By virtue of Lemma B.2, in order to establish that the principal is also aggressive in period t it is enough to show that, at p_t = a_t, the principal strictly prefers the experiment M_t = {0, b_t} over the uninformative experiment. On one hand, the principal's expected payoff from designing M_t = {0, b_t} is a_t/b_t. On the other hand, her expected payoff from designing the uninformative experiment is given by δE_{s_t}[f̂_{t+1}(p_{t+1}) | q_t = a_t]. The next sequence of inequalities therefore concludes the proof: the first inequality follows from noting that f̂_{t+1} is concave; the equality follows from the assumption that the principal is aggressive in period t + 1, and the second inequality is due to Lemma 5.
Proof of Lemma 7: Note that, in view of Lemma 6, it is enough to show that in equilibrium the principal is aggressive at t = 1 when T is sufficiently large. Next, Lemma 5 shows that any benefit to the principal from being conservative at t = 1 must come from persuading the agent to accept at t = T when ω = B. These benefits are thus bounded from above by δ^{T−1}, which tends to 0 as T → ∞. On the other hand, as b_1 < 1, the corresponding loss to the principal is bounded away from zero, since by being aggressive at t = 1 the principal obtains acceptance with strictly positive probability conditional on ω = B. We conclude that, for T sufficiently large, in equilibrium the principal is aggressive at t = 1.

Proof of Lemma 2:
In the perfect bad news case, the result follows from Lemma 5. In the perfect good news case, the result is easily obtained by induction using Lemma 4.

Lemma B.3. Consider a period t < T and let a_t^+ denote the beginning-of-period-(t + 1) belief given q_t = a_t and s_t = g. Then a_t^+ > a_{t+1}.
Proof: The result is trivial if a_t = b_t, so suppose a_t < b_t. Assume by way of contradiction that a_t^+ ≤ a_{t+1}. By definition of a_t, in equilibrium, at q_t = a_t the agent is indifferent between waiting and rejecting; the corresponding remark applies to period t + 1. Therefore a_t^+ ≤ a_{t+1} implies that, by waiting at q_t = a_t, the agent's expected continuation payoff is as if the agent rejected with probability 1 in period t + 1. As V_R > 0, rejecting in period t thus yields the agent a strictly higher expected continuation payoff than waiting. This contradicts the definition of a_t.

Lemma B.4. Consider a period t < T − 1 such that, in equilibrium, in period t the principal is not aggressive. Let c_t^+ denote the beginning-of-period-(t + 1) belief given q_t = c_t and s_t = g. With perfect bad news, c_t^+ < b_{t+1}.
Proof: Suppose by way of contradiction that c_t^+ ≥ b_{t+1}. Then, using Lemma 5, given p_t = c_t, in equilibrium the experiment M_t = {0, b_t} gives the principal a strictly larger expected continuation payoff than the uninformative experiment. This cannot be, since by definition of c_t (Lemma B.2) the uninformative experiment has to be optimal at p_t = c_t.

Lemma B.5. Consider a period t < T − 1 such that, in equilibrium, in period t the principal is not aggressive. With perfect bad news, c_{t+1} = a_{t+1} implies c_t = a_t.
Proof: Assume that the conditions in the statement of the lemma hold. Recall to begin that c_t ≥ a_t, by definition. Suppose by way of contradiction that c_t > a_t. Our goal will be to show that, given p_t = c_t, in equilibrium the experiment M_t = {a_t, b_t} yields the principal a strictly larger expected continuation payoff than the uninformative experiment, contradicting the definition of c_t.
As a preliminary step, notice that, since in equilibrium the principal is not aggressive in period t + 1 (Lemma 6), at p_{t+1} = a_{t+1} the principal must weakly prefer the uninformative experiment; denote the corresponding inequality by (11). Next, Lemmata B.3 and B.4 together imply c_t^+ ∈ (a_{t+1}, b_{t+1}). Thus, using Lemma B.2, at p_t = c_t, in equilibrium the principal's expected continuation payoff from the uninformative experiment can be expressed as δE[f̂_{t+1}(X)], where X is a random variable with mean c_t and support {0, a_{t+1}, b_{t+1}}, and P(X = 0) = (1 − c_t)γ. Call this remark A. Now, at p_t = c_t, reasoning similarly as above and using b_{t+1} = b_t (Lemma 5) establishes that the principal's expected continuation payoff from designing the experiment M_t = {a_t, b_t} can be bounded from below (strictly) by δE[f̂_{t+1}(Y)], where Y is a random variable with mean c_t and support {0, a_{t+1}, b_{t+1}}, and P(Y = 0) = ((b_t − c_t)/(b_t − a_t))(1 − a_t)γ. Call this remark B. The last step of the proof is as follows. First, straightforward algebra shows 1 − c_t > ((b_t − c_t)/(b_t − a_t))(1 − a_t). Hence, P(X = 0) > P(Y = 0). As X and Y have the same mean and are both supported on {0, a_{t+1}, b_{t+1}}, we conclude that X is a mean-preserving spread of Y. Inequality (11) then implies δE[f̂_{t+1}(X)] ≤ δE[f̂_{t+1}(Y)]. Hence, combining remarks A and B, given p_t = c_t, in equilibrium the experiment M_t = {a_t, b_t} yields the principal a strictly larger expected continuation payoff than the uninformative experiment, contradicting the definition of c_t.
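The mean-preserving-spread step above can be checked numerically. In the sketch below all parameter values are illustrative assumptions, and an arbitrary concave function stands in for the continuation payoff.

```python
# Numerical companion to the proof of Lemma B.5 (all values illustrative):
# X and Y share the mean c_t and the support {0, a_{t+1}, b_{t+1}}, but X
# puts more mass on 0, making X a mean-preserving spread of Y; for a
# concave f this forces E[f(X)] <= E[f(Y)].
import math

def three_point(mean, a, b, p0):
    """Distribution on {0, a, b} with the given mean and P(=0) = p0."""
    # Solve p_a * a + p_b * b = mean subject to p_a + p_b = 1 - p0.
    p_b = (mean - (1 - p0) * a) / (b - a)
    p_a = 1 - p0 - p_b
    return [(0.0, p0), (a, p_a), (b, p_b)]

a_next, b_next, c_t, gamma, a_t, b_t = 0.3, 0.7, 0.5, 0.4, 0.4, 0.8
pX0 = (1 - c_t) * gamma                                 # P(X = 0) from the proof
pY0 = (b_t - c_t) / (b_t - a_t) * (1 - a_t) * gamma     # P(Y = 0) from the proof
assert pX0 > pY0                                        # as shown in the proof

f = lambda x: math.sqrt(x)                              # an arbitrary concave function
EX = sum(f(v) * p for v, p in three_point(c_t, a_next, b_next, pX0))
EY = sum(f(v) * p for v, p in three_point(c_t, a_next, b_next, pY0))
assert EX <= EY                                         # MPS + concavity
```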
Lemma B.6. With perfect bad news, in any period such that in equilibrium the principal is not aggressive, c_t = a_t.
Proof: Consider t such that, in equilibrium, the principal is not aggressive in period t. Then, by Lemma 6, the principal is not aggressive in period T − 1. Moreover, using Lemma B.3, for any q_{T−1} ∈ [a_{T−1}, b_{T−1}), the agent waits and: (i) accepts following s_{T−1} = g, (ii) rejects following s_{T−1} = b. Thus f_{T−1} is linear in q_{T−1} over the belief interval [a_{T−1}, b_{T−1}). It ensues that c_{T−1} = a_{T−1}. Reasoning by induction using Lemma B.5 then establishes c_t = a_t.
Proof of Lemma 3: By the definitions of aggressive and conservative, Lemma 3 follows from Lemma B.2 if we can show that, in equilibrium, each period either (a) the principal is aggressive or (b) c_t = a_t. Lemma 4 shows that (a) always holds in the perfect good news case. In the perfect bad news case, by Lemma B.6, either (a) holds or (b) does.

Lemma B.7. In equilibrium, conditional on ω = G the agent accepts with probability 1.
Proof: Lemma 1 ensures that in equilibrium the agent rejects in period t if and only if q_t ∈ [0, a_t). Lemma 3 ensures that in equilibrium q_t ∉ (0, a_t). Thus if the agent rejects in period t, it must be the case that q_t = 0. Since, conditional on ω = G, Bayesian updating reaches q_t = 0 with probability 0, the agent accepts with probability 1.
Lemma B.8. b_{T−1} > b if and only if γ > γ̃(δ).

Proof: The arguments in the proof of Lemma A.2 show that b_{T−1} > b if and only if, given q_{T−1} = b, the agent strictly prefers waiting over rejecting which, upon rearrangement, yields γ > γ̃(δ). Solving (10) gives (12).
Proof of Lemma 8: By Lemmata 5 and B.8, if γ ≤ γ̃(δ) then in equilibrium a_t = b_t = b in every period, and so the principal is aggressive in period T − 1. Since γ̃(δ) > 1 for δ sufficiently small, we conclude that, in equilibrium, for δ small enough the principal is aggressive in period T − 1 regardless of γ. Suppose next that γ > γ̃(δ). Then at q_{T−1} = a_{T−1}, in equilibrium the agent is indifferent between rejecting and waiting. Hence, after rearrangement, we obtain (13). Now, using Lemma B.2, the necessary and sufficient condition for the principal to not be aggressive in period T − 1 is f_{T−1}(a_{T−1}) ≥ a_{T−1}/b_{T−1}. Noting that f_{T−1}(a_{T−1}) = δ[a_{T−1} + (1 − γ)(1 − a_{T−1})] and substituting for a_{T−1} and b_{T−1} using (12) and (13), the former inequality becomes (14). One checks that for δ = 1 the quadratic equation in γ obtained from (14) has roots γ = 0 and γ = 1. On the other hand, for δ < 1, (14) is violated when either γ = 1 or γ = γ̃(δ). So (14) holds for all values of γ in between the roots of the corresponding quadratic equation. Letting γ̲(δ) and γ̄(δ) denote the real roots, the previous remarks yield γ̃(δ) < γ̲(δ) ≤ γ̄(δ) < 1 and show that these roots only exist for δ > δ̲, where δ̲ > 0 is defined implicitly by γ̲(δ̲) = γ̄(δ̲). Noting that, by Lemma 3, whenever the principal is not aggressive she is conservative concludes the proof.
Let b_t^G (resp. b_t^B) denote the period-t threshold of acceptance under perfect good news (resp. perfect bad news).
Proof: First, notice that, under perfect bad news, b_{T−1}^B satisfies the following fixed-point property: at q_{T−1} = b_{T−1}^B the agent is indifferent between (a) accepting and (b) making his final decision next period given that next period the principal designs the period-T experiment. Applying (15)-(17), one finds that Z̃ and Z give the same expected payoff to the principal. On the other hand, the agent's expected payoff under Z̃ can be written as the agent's expected payoff under Z plus the additional term (1 − p_1)V_R(1 − δ^{t−1})Z(accept, t | B).
As (1 − p_1)V_R(1 − δ^{t−1})Z(accept, t | B) > 0, we find that the agent's expected payoff under Z̃ is greater than it is under Z. Thus, Z̃ and Z give the same expected payoff to the principal, but the agent's expected payoff is strictly greater under Z̃ than it is under Z, contradicting the initial assumption that Z ∈ Z*. We conclude that (18) holds. Combining (15)-(18) shows that Z* ⊆ Z†. We next show that Z† ⊆ Z*. Let Z ∈ Z†. If Z ∉ Z*, we can find Z′ which Pareto dominates Z. Either Z′ ∈ Z† or, by the first part of the proof, we can find Z″ ∈ Z† which Pareto dominates Z′, in which case Z″ Pareto dominates Z, by transitivity. Hence, assume without loss of generality that Z′ ∈ Z†. Since both Z and Z′ belong to Z† we have Z′(x, t | ω) = Z(x, t | ω) unless t = 1 and ω = B. But then either Z′(accept, 1 | B) < Z(accept, 1 | B), in which case the principal strictly prefers Z over Z′, or Z′(accept, 1 | B) > Z(accept, 1 | B), in which case the agent strictly prefers Z over Z′. Therefore, Z′ does not Pareto dominate Z, contradicting the definition of Z′. This shows that Z ∈ Z*.

Comparison with the Single-Player Setting
We show here that increasing γ may increase the probability of type II errors and lower the agent's expected payoff. We start with the following useful lemma.
Lemma C.1. Either a_t = b_t = b in all periods or the interval of beliefs at which the agent waits is constant for the first x ≤ T − 1 periods, and then strictly decreasing over the remaining periods.
Proof: In the perfect good news case, the result follows from Lemma 4. Below, we focus on the perfect bad news case. If in equilibrium the principal is aggressive at t = T − 1 then the result is a consequence of Lemmata 5, 6 and A.1. Therefore, suppose henceforth that the principal is conservative at t = T − 1.
Notice to begin with that by Lemma 5 all we need to show is that the sequence a_t increases with t. Reasoning as in Lemma B.4 establishes that a_{T−1}^+ ∈ (a_{T−1}, b_{T−1}). Since ĝ_{T−1} > ĝ_T over the belief interval (a_{T−1}, b_{T−1}), we obtain, using definition (3) and the arguments in the proof of Lemma A.2, that a_{T−2} < a_{T−1}. Now, if the principal is aggressive in period T − 2 then Lemmata 5 and 6 immediately give a_1 = · · · = a_{T−3} < a_{T−2} < a_{T−1}. So suppose that the principal is conservative in period T − 2. Since a_{T−2} < a_{T−1} and b_{T−2} = b_{T−1}, Lemmata B.2 and B.6 establish that ĝ_{T−2} > ĝ_{T−1} over the belief interval (a_{T−2}, b_{T−2}). Moreover, a_{T−2}^+ ∈ (a_{T−2}, b_{T−2}). Hence, the arguments in the proof of Lemma A.2 yield a_{T−3} < a_{T−2}. Pursuing the recursion completes the proof.
Section 3 revealed the existence of two possible equilibrium regimes. In one regime, the principal is aggressive at t = 1, and triggers the agent's final decision in the first period. In the other regime, the principal is conservative, and seeks to sustain uncertainty until t = T. We next establish that, as long as no regime switch occurs, increasing the amount of exogenous outside information weakly increases the welfare of the agent. Let γ ≥ γ′ and mark with a prime the objects of the game with parameter γ′. Then ĝ_{T−1}(·) ≥ ĝ′_{T−1}(·). If T = 2 then Lemma C.1 finishes the proof. Otherwise, the first inequality follows from convexity of ĝ_{T−1}(·) and the fact that, since γ ≥ γ′, s_{T−2} is Blackwell-more-informative than s′_{T−2}. The second inequality follows from the previously established inequality ĝ_{T−1}(·) ≥ ĝ′_{T−1}(·). Hence, δg̃_{T−2}(a′_{T−2}) ≥ δg̃′_{T−2}(a′_{T−2}) = V_R which, in turn, implies a_{T−2} ≤ a′_{T−2} and, reasoning as above, ĝ_{T−2}(·) ≥ ĝ′_{T−2}(·). If T = 3 then Lemma C.1 finishes the proof. Otherwise, we can repeat the last step.
Proof: Fix γ ∈ (0, 1). First, notice that (19) holds for all t < T. Next, let each element of the sequence {x_t}_{t=1}^{T−1} be defined implicitly by (20). Thus, x_1 < x_2 < · · · < x_{T−1} < b. Moreover notice that, for all t < T, a_t ≤ x_t: otherwise, we could find a δ sufficiently close to 1 such that given q_t = a_t the agent would strictly prefer waiting until the deadline over rejection (contradicting the definition of a_t). Let 1 − x_1 > ε > 0. Applying (20), we can find δ̲ < 1 such that δ > δ̲ implies (21). Combining (19)-(21) then yields, for δ sufficiently large, (22). If in equilibrium the principal were aggressive in period 1 we would have f̂_1(x_1 + ε) = (x_1 + ε)/b_1.
Proposition C.2. With perfect good news, the agent's equilibrium expected payoff is monotonically increasing in T and γ. With perfect bad news, the agent's equilibrium expected payoff is monotonically increasing in T and, if players are sufficiently impatient, also monotonically increasing in γ; however, if players are patient enough, the agent's equilibrium expected payoff is non-monotonic in γ.
Proof: For the perfect good news case, the result follows from Lemma C.2. We prove the result for the perfect bad news case. We start with three observations:
• Observation 1: if in equilibrium at t = 1 the principal is aggressive then ĝ_1 is piecewise linear with a kink at b_1;
• Observation 2: if in equilibrium at t = 1 the principal is conservative then ĝ_1 is piecewise linear with kinks at a_1 and b_1;
• Observation 3: b_1 is both non-decreasing and continuous in γ.
Observations 1 and 2 immediately follow from the definitions of a_1, b_1, and the experiments designed by the principal when she is aggressive and conservative. Observation 3 follows from Lemmata 5 and B.8. Now, let T′ > T″. We want to show that the agent's equilibrium expected payoff is at least as large in the game of length T = T′ as in the game of length T = T″. If in equilibrium the principal is aggressive at t = 1 given T = T′ and given T = T″, the result follows from Lemma C.2, and similarly if in equilibrium the principal is conservative at t = 1 given both game lengths. Hence, by Lemma 6, the only case left to consider is when in equilibrium the principal is aggressive at t = 1 given T = T′, but conservative at t = 1 given T = T″. In the latter case, the result follows from Observations 1-2 combined with Lemma 5.
Next, let γ′ > γ″. We first want to show that, if players are sufficiently impatient, then the agent's equilibrium expected payoff is at least as large under γ = γ′ as under γ = γ″. We know from Lemmata 6 and 8 that, for δ < δ̲, in equilibrium the principal is aggressive at t = 1 regardless of γ. So the result follows from Observations 1 and 3.
Finally, we want to show that, if players are patient enough, then the agent's equilibrium expected payoff is non-monotonic in γ. This result follows from Lemmata B.8 and C.3, combined with Observations 1-2, which show that an equilibrium switch from aggressive at t = 1 to conservative at t = 1 triggers a drop in the agent's equilibrium expected payoff.
The rest of this appendix considers a hypothetical planner with payoffs W_aG from acceptance in state G, W_rG < W_aG from rejection in state G, W_rB from rejection in state B, and W_aB < W_rB from acceptance in state B. We are interested in this planner's equilibrium expected payoff, Q. For concreteness, we henceforth refer to Q as the (equilibrium) quality of the agent's final decision. 24 The planner's welfare differs from the agent's in two ways: first, while the planner cares about errors made by the agent, the planner is indifferent about the timing of said errors; second, the planner and the agent may weigh type I and type II errors differently. Notwithstanding these differences, the effect of exogenous outside information on Q mirrors its effect on the welfare of the agent (Proposition C.2).

Proposition C.3. With perfect good news, the quality of the agent's final decision is monotonically increasing in T and γ. With perfect bad news, Q is monotonically increasing in T and, if players are sufficiently impatient, also monotonically increasing in γ. However, if players are patient enough, then Q is non-monotonic in γ.

Proof:
We focus as usual on the perfect bad news case (the perfect good news case being similar and easier). Let X denote the random variable representing the belief at which in equilibrium the agent makes his final decision. Let φ : [0, 1] → R denote the piecewise linear function with a kink at b such that φ(0) = W_rB, φ(b) = W_aB + b(W_aG − W_aB) and φ(1) = W_aG. Then: (c) if in equilibrium the principal is aggressive at t = 1 then supp(X) ⊆ {0, b_1}; (d) if in equilibrium the principal is conservative at t = 1 then supp(X) = {0, a_{T−1}^+, b_1}, where a_{T−1}^+ > b denotes the beginning-of-period-T belief given q_{T−1} = a_{T−1} and s_{T−1} = g. We are now ready to prove the various parts of the proposition. First, we know from Lemmata 6 and 8 that, for δ < δ̲, in equilibrium the principal is aggressive at t = 1 regardless of γ. Hence, suppose δ < δ̲. Let γ′ > γ″. Then, by Lemmata 5 and B.8, b′_1 ≥ b″_1. That Q′ ≥ Q″ now follows from remarks (a), (b), (c) and (e) above.
Next, if players are patient enough, Lemmata B.8 and C.3 establish that, starting from γ = 0 and increasing γ, in equilibrium, at t = 1 the principal is aggressive at first but then switches to being conservative. Since b_1 is continuous in γ, remarks (a)-(e) establish that this equilibrium switch induces a drop in Q.
Lastly, let T′ > T″. If in equilibrium the principal is aggressive at t = 1 given T = T′ and given T = T″ then, by Lemma 5, Q′ = Q″. If in equilibrium the principal is aggressive at t = 1 given T = T′ but conservative at t = 1 given T = T″ then, by Lemma 5 and remarks (a)-(e), Q′ > Q″. By Lemma 6, the last case remaining is when in equilibrium the principal is conservative at t = 1 given T = T′ and given T = T″. A simple recursive argument based on Proposition A.1 then establishes Q′ > Q″.
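The decision-quality computation above can be illustrated numerically. In the sketch below the planner payoffs and beliefs are illustrative assumptions; quality is computed for an aggressive period-1 principal, whose final-decision belief is supported on {0, b_1}.

```python
# Sketch of the quality computation in the proof of Proposition C.3: phi is
# piecewise linear with a kink at b, and under an aggressive period-1
# principal the final-decision belief X has support {0, b_1} with
# P(X = b_1) = p_1 / b_1. All numbers are illustrative assumptions.

W_rB, W_aB, W_aG, b = 1.0, 0.0, 1.0, 0.5

def phi(x):
    """Piecewise linear planner payoff with a kink at b."""
    if x <= b:
        # Linear segment from phi(0) = W_rB to phi(b) = W_aB + b(W_aG - W_aB).
        return W_rB + (x / b) * (W_aB + b * (W_aG - W_aB) - W_rB)
    # Linear segment from phi(b) to phi(1) = W_aG.
    return W_aB + x * (W_aG - W_aB)

def quality(p_1, b_1):
    """Q = E[phi(X)] for X supported on {0, b_1} with mean p_1."""
    prob_accept = p_1 / b_1
    return (1 - prob_accept) * phi(0.0) + prob_accept * phi(b_1)

# Raising b_1 (e.g. via a higher gamma, Lemma B.8) weakly improves quality.
assert quality(0.3, 0.8) > quality(0.3, 0.6)
```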
Then (OA.1), (OA.2) and convexity of g̃_t give unique a_t and b_t, with a_t ≤ b ≤ b_t, such that, in equilibrium, the agent rejects if q_t < a_t, waits if q_t ∈ (a_t, b_t), and accepts if q_t > b_t. Moreover, since in equilibrium whenever indifferent the agent makes the decision preferred by the principal, it ensues that the agent waits if q_t = a_t < b_t and accepts if q_t = b_t. Standard arguments yield f̂_t = cav f_t. Since in equilibrium whenever indifferent the principal picks the least informative experiment, the principal's experiment in period t is uniquely determined by the belief p_t at the beginning of period t. Lastly, letting τ_t(p_t) denote the principal's equilibrium experiment given p_t yields ĝ_t(p_t) = E_{τ_t(p_t)}[g_t(q_t) | p_t].
Finally, since g̃_t is convex, (OA.1) shows that g_t is as well. Since ĝ_t(p_t) = E_{τ_t(p_t)}[g_t(q_t) | p_t], convexity of g_t together with the properties of τ_t(·) establish that ĝ_t is convex.
Proposition OA.1. There exists a unique equilibrium.
Lemma OA.3. Each period, cutoffs 0 < a_t ≤ b_t < 1 exist such that, in equilibrium, the agent rejects if q_t < a_t, waits if q_t ∈ [a_t, b_t), and accepts if q_t ≥ b_t.
Proof: For t = T, the result follows from Lemma OA.1. For t < T, the result was shown within the proof of Lemma OA.2.

The agent's expected payoff from waiting can be written as (OA.4). Next, consider t < T − 1 such that b_{t+1} = b_{T−1}, and q_t = b_t, so that, by definition, the agent is indifferent between waiting and accepting. The agent's expected payoff from accepting is unchanged. On the other hand, using Lemma OA.4, q_t = b_t ≥ b_{T−1} = b_{t+1}. Hence, conditional on s_t = g, the agent optimally accepts in the next period. It ensues that b_t solves (OA.4) and, therefore, that b_t = b_{T−1}. A recursive argument then yields b_t = b_{T−1} for all t < T.
Then γ > γ̃(δ_A) if and only if b_{T−1} > b, and either condition implies (OA.5).

Lemma OA.9. Let t < T − 1. In equilibrium, if the principal is aggressive in period t + 1, then the principal is also aggressive in period t.
Proof: Suppose that in equilibrium the principal is aggressive in period 1 < t + 1 < T. If a_t = b_t the statement of the lemma is straightforward. Assume therefore that a_t < b_t. By virtue of Lemma OA.5, in order to establish that the principal is also aggressive in period t it is enough to show that when p_t = a_t the principal strictly prefers the experiment M_t = {0, b_t} over the uninformative experiment. On one hand, the principal's expected payoff from designing M_t = {0, b_t} is a_t/b_t. On the other hand, her expected payoff from designing the uninformative experiment is given by δ_P E_{s_t}[f̂_{t+1}(p_{t+1}) | q_t = a_t]. The next sequence of inequalities therefore concludes the proof: the first inequality follows from noting that f̂_{t+1} is concave (which we show formally in the appendix); the equality follows from the assumption that the principal is aggressive in period t + 1, and the second inequality is due to Lemma OA.7.
Proof: Note that in view of Lemma OA.9 it is enough to show that, for T sufficiently large, in equilibrium the principal is aggressive at t = 1. Next, part (ii) of Lemma OA.7 shows that any benefit to the principal from not being aggressive at t = 1 must come from persuading the agent to accept at t = T when ω = B. So these benefits are bounded from above by δ_P^{T−1}, which tends to 0 as T → ∞. On the other hand, as b_1 < 1, the corresponding loss to the principal is bounded away from zero since by being aggressive at t = 1 the principal obtains acceptance with strictly positive probability conditional on ω = B. We conclude that, for T sufficiently large, in equilibrium the principal is aggressive at t = 1.

Lemma OA.11. In equilibrium, each period either the principal is aggressive, or the principal is conservative.
Proof: The proof follows the same steps as the proof of Lemma 3 of the main text.
Proof: We saw in the proof of Lemma OA.8 that γ ≤ γ̃(δ_A) implies that in equilibrium the agent never waits. So whenever γ ≤ γ̃(δ_A), in equilibrium the principal has to be aggressive in period T − 1. In particular, since γ̃(δ_A) > 1 for δ_A sufficiently small, we find that for δ_A small enough the principal is aggressive in period T − 1 irrespective of γ and of δ_P.
Suppose next that γ > γ̃(δ_A). Then for q_{T−1} = a_{T−1}, in equilibrium the agent is indifferent between waiting and rejection. The agent's expected payoff from rejection is given by V_R. His expected payoff from waiting is on the other hand given by δ_A[a_{T−1}V_G + (1 − a_{T−1})(γV_R + (1 − γ)V_B)], where we deduced from Lemma OA.1 that s_{T−1} = g induces p_T > b_T = b. We therefore obtain (OA.6). Now, using Lemma OA.5, the necessary and sufficient condition for the principal not to be aggressive in period T − 1 in equilibrium is f_{T−1}(a_{T−1}) ≥ a_{T−1}/b_{T−1}. 25 Noting that f_{T−1}(a_{T−1}) = δ_P[a_{T−1} + (1 − γ)(1 − a_{T−1})] and substituting for a_{T−1} and b_{T−1} using (OA.5) and (OA.6), the former inequality becomes (OA.7). One checks that if (OA.7) holds for some δ_P, it must hold for any larger δ_P: either the right-hand side is positive, and therefore decreasing in δ_P, or it is negative, but the left-hand side is always positive, 26 so in this case the inequality is always satisfied. Moreover, for δ_A = 1 the quadratic equation in γ obtained from (OA.7) has roots γ = 0 and γ = 1. On the other hand, for δ_A < 1, (OA.7) is violated whenever either γ = 1, or γ = γ̃(δ_A). So (OA.7) holds for all values of γ in between the roots of the corresponding quadratic equation. Letting γ̲(δ_A, δ_P) and γ̄(δ_A, δ_P) denote the real roots, the previous remarks yield γ̃(δ_A) < γ̲(δ_A, δ_P) ≤ γ̄(δ_A, δ_P) < 1 and show that these roots only exist for δ_A > δ̲_A and δ_P > δ̲_P(δ_A), where (i) δ̲_A is defined implicitly by γ̲(δ̲_A, 1) = γ̄(δ̲_A, 1) and (ii) δ̲_P(δ_A) is defined implicitly for δ_A > δ̲_A by γ̲(δ_A, δ̲_P(δ_A)) = γ̄(δ_A, δ̲_P(δ_A)). Noting that, by Lemma OA.11, whenever the principal is not aggressive she is conservative concludes the proof.