Conjugate duality in stochastic controls with delay

Abstract. In this paper we use the method of conjugate duality to investigate a class of stochastic optimal control problems where state systems are described by stochastic differential equations with delay. For this, we first analyse a stochastic convex problem with delay and derive the expression for the corresponding dual problem. This enables us to obtain the relationship between the optimalities for the two problems. Then, by linking stochastic optimal control problems with delay with a particular type of stochastic convex problem, the result for the latter leads to sufficient maximum principles for the former.


Introduction
There are many real-world problems providing applications for stochastic optimal control formulations. Examples include the quadratic loss minimization problem in portfolio optimization, and the consumption and investment problem in economics. It is well known that Markovian optimal control problems can be solved by using either the method of dynamic programming or the stochastic maximum principle, the two methods having been developed separately and independently. In particular, the stochastic maximum principle typically involves a so-called Hamiltonian (function), and a corresponding system of adjoint stochastic differential equations; the optimal control can be expressed in terms of the maximum of the Hamiltonian, analogous to deterministic cases which were originally studied by Pontryagin. We refer the reader to [21,Chapter 3] for the general theory of the (Markovian) stochastic maximum principle.
Often, there is a need to extend these Markovian models to allow for time-lag or time delay effects. See, for example, [8] for delayed models in estimating the volatility of the price of a financial security. Also, although the efficient-market hypothesis states that current prices of assets reveal all the necessary information from the market, investors often take the historic performance of assets into consideration and use past information in modelling wealth processes. For clarity, we first investigate the stochastic convex problem with discrete delay: for given convex functions L and l, minimize

E[∫_0^T L(t, X(t), X(t − δ), Ẋ(t), H_X(t)) dt] + E[l(X(T))],
where X ranges through a certain family of Itô processes, Ẋ and H_X denote, respectively, the drift and diffusion coefficients of X, and δ ∈ (0, T) is a given deterministic length of delay. We assume that X(t) = x_0(t) for t ∈ [−δ, 0] for a given deterministic continuous function x_0. Note that, equivalently, we could maximize if L and l were concave, for example by replacing L and l with −L and −l. We investigate the corresponding dual problem and the conditions for optimality of the above problem. As noted in [20], the dependence on X(t − δ) in the convex problem results in a dependence on future values in its 'dual' process. Unlike the deterministic case, 'time' cannot be reversed in the stochastic case. The novelty of our approach in overcoming this difficulty lies in the use of conditional expectations in the characterization of dual processes and the use of the martingale representation theorem to identify them as solutions to BSDEs. We then consider stochastic optimal control problems with just discrete delay. We connect stochastic control problems with delay to stochastic convex problems. This allows us to use the conditions for optimality of the convex problems to prove sufficient maximum principles for stochastic control problems with delay. In particular, we derive the Hamiltonian and the associated adjoint equations, which are anticipated BSDEs, and express the sufficient maximum principles in terms of them. Finally, with fairly straightforward modifications, we extend our results for both the stochastic convex and control problems to allow the model to include both discrete and exponential moving average delays. Although it is not included in this paper, the approach that we take can easily be extended further to include a Lévy jump measure or regime switching in stochastic convex problems with delay. This can then be used to obtain stochastic maximum principles for the corresponding control problems.
To be able to use the results on stochastic convex problems with delay, we require some extra conditions on the functions involved. Some of these conditions are stronger than those used in the stochastic calculus approach in the literature. Apart from these technical conditions, if only a discrete delay is involved, our result on the sufficient maximum principle is similar to those in [4] and [12] when their models are restricted to ours. Note that some apparent differences in the signs of some functions involved are a consequence of our problem being one of minimization and those in these papers being maximization. However, if both types of delay are involved, our result improves those in [14] and in [13], when the model in the latter is jump-free. Moreover, our approach of using the conjugate duality method unifies the Hamiltonian and the associated adjoint equations involved in the maximum principles for control problems with either just discrete delay or with both discrete and exponential moving average delays: those for the former are a special case of those for the latter.
The remainder of the paper is organized as follows. In Section 2 we describe the setting for the stochastic convex problem with (discrete) delay. In Section 3 we use conditional expectations to characterize dual processes and the martingale representation theorem to link them with the solutions of BSDEs. This enables us to derive the corresponding dual problem and, using the method of conjugate duality, obtain conditions for optimality. In Section 4 we concentrate on stochastic optimal control problems with discrete delay. We show how they can be reformulated as the convex problems described in Section 2. Then the application of the conditions for optimality obtained in Section 3 leads to sufficient maximum principles for the stochastic control problem with discrete delay. We also give an example to show how the results in the previous section can be used to obtain the optimal control. In Section 5, by modifying our previous arguments, we extend our results to stochastic control problems with both discrete and exponential moving average delays.

A stochastic convex problem with discrete delay
Let (Ω, F, P) be a complete probability space and T ∈ (0, ∞) be a fixed time horizon. For a fixed positive integer m, write B(t) = B(ω, t) for a standard m-dimensional Brownian motion and {F(t)}_{t∈[0,T]} for the filtration generated by B, such that the usual conditions hold (see [9, Definition 2.25]).
In addition to m, we also fix an integer n > 0 and introduce the following four functional spaces, where we have suppressed ω for notational simplicity.
The four spaces are: the space of R^n-valued, F(T)-measurable random variables for which the norm is finite; two spaces of F(t)-progressively measurable, R^n-valued stochastic processes X for which the respective norms are finite; and the space of F(t)-progressively measurable, R^{n×m}-valued stochastic processes H for which the norm is finite, where elements of R^{n×m} are represented by n × m matrices. In what follows, we simply write the above functional spaces as L^2, L^{2,∞}_F, L^{2,1}_F, and L^{2,2}_F, respectively, and, as above, suppress ω in functions and stochastic processes for notational simplicity, unless it is needed for clarity.
Write X = L^{2,1}_F × L^{2,2}_F, let δ ∈ (0, T) be fixed, and let x_0 ∈ C([−δ, 0]; R^n) be a given initial deterministic continuous function. We identify (Ẋ, H_X) ∈ X with the continuous F(t)-adapted stochastic process X : Ω × [−δ, T] → R^n defined by (2.1). Here the representation of X by (Ẋ, H_X) ∈ X is unique up to indistinguishability (see [9, Definition 1.3]). Note that, since it is continuous, X is F(t)-progressively measurable. Moreover, we define the delayed stochastic process X_δ associated with X by X_δ(t) = X(t − δ), t ∈ [0, T].

Proposition 2.1. For X defined by (2.1), we have X_δ ∈ L^{2,∞}_F and X(T) ∈ L^2.

Proof. By Doob's maximal inequality (see [9, p. 14]), the definition of X implies that X ∈ L^{2,∞}_F when restricted to [0, T]. Then, by noting that X_δ(t) = X(t − δ) and that |X(T)|^2 ≤ sup_{0≤t≤T} |X(t)|^2, the required results follow.
Although the domain of X defined by (2.1) is [−δ, T] for fixed ω ∈ Ω, for simplicity we shall in the following regard X as being in L^{2,∞}_F, since its path on [−δ, 0] is fixed.
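To make the identification (2.1) concrete, here is a small simulation sketch; the particular choices x_0 ≡ 1, Ẋ ≡ 0.5, and H_X ≡ 0.2 are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Build X from (Xdot, H_X) as in (2.1):
#   X(t) = x0(0) + ∫_0^t Xdot(s) ds + ∫_0^t H_X(s) dB(s) for t in [0, T],
#   X(t) = x0(t) for t in [-δ, 0],
# and read off the delayed process X_δ(t) = X(t − δ).
T, delta, dt, n_paths = 1.0, 0.25, 0.001, 5000
rng = np.random.default_rng(0)
n = int(round(T / dt))
lag = int(round(delta / dt))

Xdot, H_X = 0.5, 0.2                                # constant drift and diffusion
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n))     # Brownian increments
X = 1.0 + np.cumsum(Xdot * dt + H_X * dB, axis=1)   # X on (0, T]
X = np.hstack([np.ones((n_paths, 1)), X])           # prepend X(0) = x0(0) = 1

# X_δ(t) = X(t − δ); for t < δ this is the initial segment x0 ≡ 1
X_delayed = np.hstack([np.ones((n_paths, lag)), X[:, : n + 1 - lag]])

mean_XT = X[:, -1].mean()    # E[X(T)] = x0(0) + Xdot·T = 1.5 for these choices
```

The delayed path is literally a shift of the same array, which is why X_δ inherits the integrability of X, as in Proposition 2.1.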
Define the functionals I_L and J_l, the latter on L^2, by

I_L(X, Y, Z, H) = E[∫_0^T L(t, X(t), Y(t), Z(t), H(t)) dt],  J_l(ξ) = E[l(ξ)].

To ensure that L and l are measurable, that I_L and J_l are strictly greater than −∞, not identically ∞, and convex, and to be able to apply the conjugate duality method to I_L and J_l, we make the following assumptions throughout this paper.

Assumption 2.1. (i)
We assume that L and l are not identically infinite; when they are finite, L is a lower semicontinuous convex function on R^n × R^n × R^n × R^{n×m} for any (ω, t) ∈ Ω × [0, T], and l is a lower semicontinuous convex function on R^n for any ω ∈ Ω.
Note that, in the presence of (i), condition (ii) is equivalent to L and l being 'normal convex integrands', a concept introduced in [16] (see also [17, p. 180]).

Assumption 2.2. For any (x, y, z) ∈ (R^n)^3 and h ∈ R^{n×m}, L(ω, t, x, y, z, h) is bounded below by an integrable random function of (ω, t), and l is bounded below by an integrable random variable, where we abbreviate almost surely to a.s.

Assumption 2.3. (i) There exist X ∈ L^2 and an R-valued F(T)-measurable random variable ϑ_1 satisfying E[|ϑ_1|] < ∞ such that L evaluated along X is bounded above by ϑ_1, dP ⊗ dt-a.s. (ii) There exist X ∈ L^2 and an R-valued F(T)-measurable random variable ϑ_2 satisfying E[|ϑ_2|] < ∞ such that l(ω, X) ≤ ϑ_2, dP-a.s.

Proposition 2.2. Under Assumptions 2.1–2.3, I_L and J_l are convex, strictly greater than −∞, and not identically ∞.

The proof of Proposition 2.2 is essentially the same as the proof for the deterministic case of [17, Proposition 1]. Hence, we omit it here. Now, for given L, l, x_0, δ, and for X defined by (2.1), we define the primal function Φ of X in terms of I_L and J_l by

Φ(X) = I_L(X, X_δ, Ẋ, H_X) + J_l(X(T)). (2.2)
It follows directly from Proposition 2.2 that Φ > −∞ and that Φ is convex. For such a function Φ, we define, in a similar fashion to delay-free convex problems, the stochastic convex problem with discrete delay as follows.
Definition 2.1. The stochastic convex problem with discrete delay associated with L and l is to find X̄ ∈ X realizing

inf_{X∈X} Φ(X), (2.3)

where X is identified with (Ẋ, H_X) using (2.1). We refer to the function Φ and problem (2.3) as the primal function and primal problem, respectively. Any X ∈ X such that Φ(X) < ∞ will be called a feasible solution of this primal problem. Moreover, any feasible solution X̄ that achieves the infimum in (2.3) will be called an optimal solution to the primal problem.
Note that, if Φ is identically infinite, no X ∈ X will be regarded as an optimal solution. Note also that our setting and the definitions of the primal function and problem bear a similarity to those studied in [1]. However, the extra delayed variable X_δ introduced in the primal function and problem is a function of X, and so the methods and results of [1] cannot be applied directly to our problem.
Moreover, similarly to the corresponding deterministic convex problem with delay studied in [20, p. 172], we define a family of perturbed functions F_{θ,ξ,η} of Φ on X, parameterized by (θ, ξ, η), via (2.4). Compared with the perturbed functions used for the delay-free deterministic convex problem in [17, Section 7] and for the Markovian convex problem in [1, Definition III-1], the function F here depends on an extra parameter η to take account of the delayed variable X_δ in I_L. Accordingly, a family of perturbed optimization problems parameterized by (θ, ξ, η) is to find X̄ ∈ X realizing inf_{X∈X} F_{θ,ξ,η}(X).
This results in the corresponding optimal value function φ of (θ, ξ, η), defined by (2.5). In particular, the relationship between F and Φ yields φ(0, 0, 0) = inf_{X∈X} Φ(X). Clearly, F_{θ,ξ,η} is a composition of Φ with a certain affine mapping. Thus, F is greater than −∞ and is convex, which implies the convexity of φ (Proposition 2.3).

The dual problem and conditions for optimality
We now apply a duality approach of convex analysis to obtain the dual problem corresponding to the primal problem given by Definition 2.1, and to relate the optimality of (2.3) to minimizers of the corresponding dual problem.

Pairings and conjugate convex functions
The fundamental notion for applying the conjugate duality method is the concept of paired linear spaces, or simply paired spaces, associated with a particular duality pairing, or simply pairing, which is an R-valued bilinear form defined on the paired spaces. Following the convention described in [19, p. 13], when we say that two linear spaces are paired spaces, then a pairing has been specified and these two spaces are respectively equipped with compatible topologies (see [19]) with respect to that pairing.
Throughout this paper, we pair the Euclidean space R^n with itself via the Euclidean inner product. To derive the dual problem to (2.3), we pair L^2 with itself via the pairing (3.1), and pair the process spaces via the pairings (3.2) and (3.3). Since Φ is defined in terms of the functions L and l, to derive its dual we let, for any fixed (ω, t) ∈ Ω × [0, T], L* and l* be the usual conjugate convex functions of L and l with respect to the pairing given by the Euclidean inner product. A similar argument to that used for [17, Theorem 2] shows that, since L and l satisfy Assumptions 2.1–2.3, L* and l* also satisfy the corresponding Assumptions 2.1–2.3. Moreover, since all four spaces defined in Section 2 are decomposable (see [16, p. 532]), by Proposition 2.2, the conjugate duality given by [16, Theorem 2] can be generalized directly to relate I_{L*} and J_{l*} to I_L and J_l as follows, where I_{L*} and J_{l*} are defined similarly to I_L and J_l, respectively.
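Numerically, the conjugate operation underlying L* and l* is a pointwise Legendre–Fenchel transform; the quadratic f below is a stand-in for illustration (not the paper's L), chosen because f*(p) = p²/4 is known in closed form:

```python
import numpy as np

# Convex conjugate on a grid: f*(p) = sup_x { p·x − f(x) }.
# For f(x) = x², the conjugate is f*(p) = p²/4, attained at x = p/2, and
# biconjugacy gives f** = f for convex lower semicontinuous f.
xs = np.linspace(-10.0, 10.0, 200001)

def conjugate(f, p, grid):
    """Grid approximation of the conjugate of f at the point p."""
    return float(np.max(p * grid - f(grid)))

f = lambda x: x ** 2
ps = np.linspace(-3.0, 3.0, 7)
err = max(abs(conjugate(f, p, xs) - p ** 2 / 4) for p in ps)

# Recover f(x) from the conjugate: f**(x) = sup_p { x·p − f*(p) }
f_star = lambda p: p ** 2 / 4
x_val = 1.5
biconj = conjugate(f_star, x_val, xs)     # should equal f(1.5) = 2.25
```

The same transform, applied for each fixed (ω, t) and paired through the integral functionals, is what produces I_{L*} and J_{l*} from I_L and J_l.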

Proposition 3.1. I_L and I_{L*} are the conjugate convex functions of each other with respect to the pairing induced directly from (3.2) and (3.3). Similarly, J_l and J_{l*} are the conjugate convex functions of each other with respect to the pairing (3.1).
Noting that φ defined by (2.5) is convex by Proposition 2.3, we take the conjugate convex function φ* of φ with respect to the pairing induced from (3.1) and (3.3), as in (3.4). Any solution to the resulting dual optimization problem is then related to the optimality of our primal problem (2.3). To see this, setting (θ, ξ, η) = (0, 0, 0) on the right-hand side of (3.4), we obtain inequality (3.6). If there exist a dual element and X̄ ∈ X such that the equality in (3.6) holds, then X̄ achieves the infimum in (2.3); i.e. X̄ is an optimal solution to the primal problem (2.3).

The dual problem
For the Markovian convex problem studied in [1], the corresponding φ* was expressed in terms of the corresponding I_{L*} and J_{l*} in a similar manner to that for the corresponding primal function in terms of I_L and J_l. Unfortunately, the introduction of the extra parameter η* in (3.4), to pair with η in (2.5) due to the delayed variable X_δ, means this is no longer the case; a phenomenon already clear in the deterministic convex problem with delay studied in [20].
To find an expression for φ*, we write P = L^2 × L^{2,1}_F and, for (P_T, Ṗ) ∈ P, define the continuous F(t)-adapted stochastic process P by (3.7). Clearly, P(0) is a constant. By the martingale representation theorem, there exists a unique H_P ∈ L^{2,2}_F satisfying (3.8). Moreover, by Doob's maximal inequality, it follows from (3.8) that, if (P_T, Ṗ) ∈ P, then P ∈ L^{2,∞}_F. As for X ∈ X, we shall identify P with (P_T, Ṗ) ∈ P using (3.7). However, unlike for X, the identification (3.7) is implicit, and it results in the explicit identification of P with (P_T, Ṗ, H_P). Moreover, this explicit identification shows that P is the solution of a stochastic differential equation with a terminal, rather than an initial, condition; i.e. P is the solution to a BSDE. Note that the corresponding P in the deterministic convex problem with delay studied in [20, Proposition 3.1], which follows an ordinary differential equation with a terminal condition, can be equivalently expressed as the solution of an ordinary differential equation with a fixed initial condition, in a similar manner to that for X in the corresponding primal problem described in [20, p. 167]. The identification of P here by a BSDE is not equivalent to the identification of X given by (2.1). The process P ∈ P defined in this way plays an important role in our derivation of the expression for φ*, given in the following theorem, which generalizes the result of [20, Proposition 3.1] for the deterministic convex problem with delay.
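Spelled out, the identification just described can be sketched as follows; this is a reconstruction consistent with the references to (3.7) and (3.8) in the text (the paper's exact displays are not reproduced here), assuming, as the notation suggests, that Ṗ denotes the drift of P:

```latex
% Implicit identification of P with (P_T, \dot P) \in \mathcal{P}, cf. (3.7):
P(t) = \mathbb{E}\Big[\, P_T - \int_t^T \dot P(s)\,\mathrm{d}s \;\Big|\; \mathcal{F}(t) \Big],
\qquad t \in [0, T].
% The martingale representation theorem then yields a unique H_P, cf. (3.8),
% so that P solves the BSDE
\mathrm{d}P(t) = \dot P(t)\,\mathrm{d}t + H_P(t)\,\mathrm{d}B(t), \qquad P(T) = P_T.
```

In particular, the terminal datum P_T and the drift Ṗ are prescribed while H_P is produced by the representation theorem, which is why the identification (3.7) is implicit.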

Theorem 3.1. Suppose that Assumptions 2.1–2.3 hold. For any given (θ*, ξ*, η*), define Ṗ by (3.9), where 1_A denotes the indicator function of the set A, and identify P by (3.7) with (P_T, Ṗ) ∈ P. Then φ* admits the expression given below, where H_P is specified by (3.8).

Proof. First, by Jensen's inequality and Fubini's theorem, the integrability of η* ensures that Ṗ defined by (3.9) is in L^{2,1}_F. Using (2.5) and F defined by (2.4), we can rewrite φ* given by (3.4) as (3.12). To simplify this, we use the relationship between X and X_δ to rewrite the final term on the right-hand side of (3.12) as in (3.13). On the other hand, using (3.8) for P and applying the Itô formula to ⟨P(t), X(t)⟩, we obtain (3.14), recalling that P(0) is a constant. Similarly, applying the Itô formula to ⟨P(t), x_0(0)⟩, we have (3.15). Then, replacing P_T and Ṗ in (3.14) and (3.15) by their definitions in (3.9), these two equations lead to (3.16), the left-hand side of which is equal to the first term on the right-hand side of the second equality in (3.13). Finally, we substitute (3.13) into (3.12) and use (3.16) and Proposition 3.1 to obtain the required expression.
Although the relationship we obtained between Φ and φ* bears some similarity to that between the corresponding functions obtained in [20] for the deterministic convex problem with delay, our proof is different from that of [20]. In particular, we need to deal with the issue of an anticipated (or time-advanced) variable.
Defining the dual function Ψ on P × L^{2,1}_F via (3.10), we can rewrite Ψ(P, Q) as (3.17). By using Proposition 3.1 and noting Proposition 2.2, we see that Ψ is strictly greater than −∞ and is convex. The dual problem to (2.3) is to find (P̄, Q̄) ∈ P × L^{2,1}_F realizing the infimum of Ψ, as in (3.18). Similarly to the primal problem defined by Definition 2.1, any (P, Q) ∈ P × L^{2,1}_F such that Ψ(P, Q) < ∞ will be called a feasible solution of the dual problem, and we shall call a feasible solution (P̄, Q̄) which achieves the infimum in (3.18) an optimal solution to the dual problem.
Unlike the classical convex problem, although we call Ψ the dual of Φ, the space P × L^{2,1}_F on which Ψ is defined is not the paired space, with respect to the pairing defined in Section 2, of the space X on which Φ is defined; this is because the convex problems we study also depend on X_δ. The reason that Ψ is called the dual of Φ will become clear in the next subsection.
If there is no delay in the model, corresponding to δ = 0, X_δ is identical to X, and so there exists a function L̃ with L(t, x, y, z, h) = L̃(t, x, z, h). Then the optimal value function φ, corresponding to L̃ and l, depends only on (θ, ξ). Hence, Theorem 3.1 yields that P = (P_T, Ṗ) ∈ P is identified with (θ*, ξ*), so that Ψ(P) = φ*(θ*, ξ*). Applying the same technique as that in (3.15) to the last two terms on the right-hand side of this expression, we obtain

Ψ(P) = I_{L̃*}(Ṗ, P, H_P) + J_{l*}(−P_T) + ⟨P(0), x_0(0)⟩,

recovering the dual function given by [1, Definition II-1] with fixed initial value P(0).

Relationship between the optimalities for the primal and dual problems
The following relationship between the primal function and its dual function is a direct consequence of (3.5) and Theorem 3.1; see (3.19). We now use stochastic calculus to obtain the relationship between the optimal solutions of the primal problem and those of its dual problem, as follows. This result generalizes [1, Theorem IV-2] for Markovian convex problems. In particular, the third equivalent condition given below provides the crucial basis, in the next section, for deriving the Hamiltonian and the associated adjoint equation for stochastic optimal control problems with discrete delay.

Theorem 3.2. Suppose that Assumptions 2.1–2.3 hold. For X̄ ∈ X and (P̄, Q̄) ∈ P × L^{2,1}_F, the following conditions are equivalent.

(ii) X̄ and (P̄, Q̄) are the respective optimal solutions to the primal problem (2.3) and its dual problem (3.18), and the equality in (3.19) is attained.
Note that if ∂L and ∂l denote the subdifferential sets of L and l, conditions (3.21) and (3.22) are, respectively, equivalent to
(see [17, p. 207]), where A_1 is the process defined by the left-hand side of (3.21) and A_2 is the random variable defined by the left-hand side of (3.22). Since, for fixed (ω, t) ∈ Ω × [0, T], L* and l* are the conjugate convex functions of L and l, respectively, A_1 and A_2 are nonnegative. Then (3.23) implies that A_1(t) = 0, dP ⊗ dt-a.s., and A_2 = 0, dP-a.s., so that both (3.21) and (3.22) hold. This completes the proof.
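The nonnegativity of A_1 and A_2 used here is an instance of the Fenchel–Young inequality, with equality exactly on the subdifferential; a quick numerical check on an illustrative conjugate pair (f(x) = x², again not the paper's L):

```python
import numpy as np

# Fenchel–Young: f(x) + f*(p) − x·p ≥ 0, with equality iff p ∈ ∂f(x).
# For f(x) = x² we have f*(p) = p²/4 and ∂f(x) = {2x}; the gap is (x − p/2)².
f = lambda x: x ** 2
f_star = lambda p: p ** 2 / 4

rng = np.random.default_rng(1)
xs, ps = rng.normal(size=1000), rng.normal(size=1000)
gap = f(xs) + f_star(ps) - xs * ps        # pointwise Fenchel–Young gaps
min_gap = float(gap.min())                # never below 0

x = 0.7
zero_gap = f(x) + f_star(2 * x) - x * (2 * x)   # p = 2x ∈ ∂f(x) ⇒ gap = 0
```

The complementary-slackness step of the proof is exactly this dichotomy, applied ω-by-ω and t-by-t to (L, L*) and (l, l*).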

A stochastic optimal control problem with discrete delay
Having obtained the conditions for optimality of the stochastic convex problem with delay, we now turn our attention to the stochastic optimal control problem with discrete delay.
Let U ⊂ R^r be a convex set, where r > 0 is a given integer; let b : [0, T] × R^n × R^n × U → R^n and σ : [0, T] × R^n × R^n × U → R^{n×m} be two given measurable functions; and let the continuous F(t)-adapted state process X : Ω × [−δ, T] → R^n be described by the controlled SDDE (4.1), where x_0, X_δ, and δ are as defined in Section 2 and u : Ω × [0, T] → U is an F(t)-adapted control process. For given continuous functions G : [0, T] × R^n × R^n × U → R and g : R^n → R, the cost functional J associated with controlled SDDE (4.1) is defined by

J(u) = E[∫_0^T G(t, X(t), X_δ(t), u(t)) dt + g(X(T))].
Let U denote the space of admissible controls u for which controlled SDDE (4.1) admits a unique strong solution and J(u) is well defined. The stochastic optimal control problem with discrete delay is to find ū ∈ U realizing inf_{u∈U} J(u). (4.2) We shall call ū an optimal control.
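A Monte Carlo sketch of the cost functional J for a toy instance of (4.1); the coefficients, noise level, and costs G and g below are illustrative assumptions (nothing here is taken from the paper), comparing the zero control with a delayed linear feedback:

```python
import numpy as np

# Estimate J(u) = E[∫_0^T G(t, X(t), X_δ(t), u(t)) dt + g(X(T))] by Euler–Maruyama
# for dX(t) = (X(t−δ) + u(t)) dt + 0.2 dB(t), x0 ≡ 1 on [−δ, 0],
# with G = X(t)² + 0.1 u(t)² and g(x) = x².
def J(policy, n_paths=4000, T=2.0, delta=0.5, dt=0.005, seed=42):
    rng = np.random.default_rng(seed)
    n, lag = int(round(T / dt)), int(round(delta / dt))
    X = np.ones((n_paths, lag + n + 1))     # column i ↔ time (i − lag)·dt
    cost = np.zeros(n_paths)
    for i in range(n):
        x, x_del = X[:, lag + i], X[:, i]   # X(t) and X(t − δ)
        u = policy(x, x_del)
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        X[:, lag + i + 1] = x + (x_del + u) * dt + 0.2 * dB
        cost += (x ** 2 + 0.1 * u ** 2) * dt
    return float(np.mean(cost + X[:, -1] ** 2))

cost_zero = J(lambda x, x_del: np.zeros_like(x))   # u ≡ 0: delayed drift makes X grow
cost_fb = J(lambda x, x_del: -x_del)               # u(t) = −X(t−δ) cancels the delayed drift
```

The feedback in x_del is admissible because it is F(t)-adapted; the point of the maximum principles below is to characterize the ū that actually attains the infimum.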
Note that this optimal control problem is a special case of the stochastic optimal control problems considered in [4] and [12], where the models also included the discrete delayed control u_δ.

Reformulation of the problem
To use the results for the stochastic convex problem with delay, obtained in the previous section, to study control problem (4.2), we link problem (4.2) with a particular convex problem (2.3) as follows. For (ω, t, x, y, z, h), define the set

C(ω, t, x, y, z, h) = {u ∈ U : b(t, x, y, u) = z, σ(t, x, y, u) = h}. (4.3)

Using C, take the functions L and l, respectively, in the primal function (2.2) to be

L(ω, t, x, y, z, h) = inf_{u∈C(ω,t,x,y,z,h)} G(t, x, y, u), with the convention inf ∅ = ∞, (4.4)

and l(x) = g(x). (4.5) With L and l so defined, control problem (4.2) becomes a particular stochastic convex problem (2.3).
If r = n and if b and σ are both affine functions of (x, y, u), the corresponding C defined above contains a single element, determined by n(1 + m) linear equations, whenever it is not empty. The expression for the corresponding L then simplifies. Moreover, under appropriate assumptions on the coefficients of these affine functions and on G and g, including the convexity of G and g, it can be checked that the corresponding problem (2.3) satisfies the required Assumptions 2.1–2.3. In the following example, we demonstrate that this connection makes it possible to express an optimal control ū of (4.2) in terms of solutions to the corresponding dual problem.

Example 4.1. For simplicity, we set n = m = 1. Suppose that U = R and that b(t, x, y, u) and σ(t, x, y, u) in (4.1) are given by (4.6). The corresponding primal problem is to minimize Φ(X) subject to (4.7), where X is identified with (Ẋ, H_X) ∈ X via (2.1). For P identified with (P_T, Ṗ) ∈ P via (3.7), since l(x) = g(x), the conjugate l* can be computed explicitly. Similarly, (4.4) for L yields

L*(Ṗ(t) − E[Q(t + δ)1_{[0,T−δ]}(t) | F(t)], Q(t), P(t), H_P(t)), (4.8)

expressed as a supremum, where H_P is specified by P via (3.8) and P_T = −2a_3 X(T) by (3.22). To find an explicit expression for L* in (4.8), we take the derivatives, with respect to x and y, respectively, of the function within the first bracket on the right-hand side of (4.8). We find that the corresponding derivatives are 0 if and only if

Ṗ(t) = E[Q(t + δ)1_{[0,T−δ]}(t) | F(t)] − a_1(t)P(t) − a_2(t)H_P(t). (4.9)
Similarly, taking the derivative, with respect to u, of the function within the second bracket on the right-hand side of (4.8), we see that the corresponding derivative is 0 if and only if

u(t) = {c_1(t)P(t) + c_2(t)H_P(t)}/c_3(t). (4.10)
Now, if (u, X, P, Q) is such that u satisfies (4.10), X is identified with (Ẋ, H_X) defined by (4.7), and P is identified with (−2a_3 X(T), Ṗ), where (Ṗ, Q) satisfies (4.9), then it can be verified that the two equalities in Theorem 3.2(iii) hold for such (u, X, P, Q). Thus, by Theorem 3.2, u is an optimal control for the control problem corresponding to (4.6).
Note that, if we replace g in Example 4.1 by g(x) = a_3 x, the above argument and derivation can be repeated, except that l*(−P_T) becomes 0. The modification to the result is that P_T = −a_3 rather than −2a_3 X(T). Since P_T is now a constant, the corresponding H_P is 0 and P is deterministic (see [4]). Thus, the corresponding optimal u is also deterministic and given by u = c_1(t)P(t)/c_3(t).
For more general b, σ, G, and g, to ensure that the set C is not empty and that the link of stochastic control problem (4.2) to stochastic convex problem (2.3) enables us to apply Theorem 3.2, we make the following assumptions.

Hypothesis 4.1. Condition (4.11) holds.

Hypothesis 4.2. g is a convex function of x. Moreover, there exist constants c_2 ∈ R and c_3 > 0 such that the corresponding bounds on G and g hold.

We now show that, under these two hypotheses, L and l defined by (4.4) and (4.5) satisfy Assumptions 2.1–2.3, except for the convexity requirement for L.
It is straightforward to verify that, under these hypotheses, L and l so defined are lower semicontinuous and are not identically infinite. Moreover, the argument for the Markovian control problems of [1, p. 393] can be generalized to show that the conditions of Assumption 2.1(ii) for L and l are satisfied. Thus, except for the required convexity of L, all conditions in Assumption 2.1 are satisfied by L and l. We now show, in the following proposition, that the remaining two assumptions are also satisfied.

Proposition 4.1. Under Hypotheses 4.1 and 4.2, L and l defined by (4.4) and (4.5) satisfy Assumptions 2.2 and 2.3.

Proof. By Hypothesis 4.2, G and g are bounded below, which implies that L and l are bounded below. Hence, L and l satisfy Assumption 2.2.
Turning to the convexity of L, which is not guaranteed by Hypotheses 4.1 and 4.2 but is required for Assumption 2.1, the following proposition gives a sufficient condition for it to hold. Define the Hamiltonian H : [0, T] × R^n × R^n × U × R^n × R^{n×m} → R by

H(t, x, y, u, p, h_p) = ⟨b(t, x, y, u), p⟩ + tr(σ(t, x, y, u)ᵀ h_p) − G(t, x, y, u). (4.12)

Proposition 4.2. If H is concave with respect to (x, y, u), then L defined by (4.4) is convex with respect to (x, y, z, h).

Proof. By (4.3) and (4.4), L can be written as

L(t, x, y, z, h) = inf_{u∈U} sup_{(p,h_p)} { ⟨(z, h), (p, h_p)⟩ − H(t, x, y, u, p, h_p) } (4.13)
= inf_{u∈U} sup_{(p,h_p)} { G(t, x, y, u) + ⟨(z, h) − (b(t, x, y, u), σ(t, x, y, u)), (p, h_p)⟩ }. (4.14)

Since H is linear in (p, h_p), the function ⟨(z, h), (p, h_p)⟩ − H(t, x, y, u, p, h_p) is convex in (u, p, h_p) by the assumption. Then the order of the supremum and the infimum on the right-hand side of (4.13) can be exchanged (see [18, Corollary 37.2.2]), so that

L(t, x, y, z, h) = sup_{(p,h_p)} { ⟨(z, h), (p, h_p)⟩ − Ĥ(t, x, y, p, h_p) }, (4.15)

where Ĥ(t, x, y, p, h_p) = sup_{u∈U} H(t, x, y, u, p, h_p). Since U is a convex set, it is easy to check that Ĥ is concave in (x, y) and convex in (p, h_p). Therefore, (4.15) implies that L is convex in (x, y, z, h), as required.
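The two routes to L in this argument, the direct formula (4.4) and the biconjugate representation (4.15), can be compared numerically in a scalar toy case with b(u) = u, constant σ, and G(u) = u² (illustrative choices only, not the paper's coefficients), for which L(z) = z² and Ĥ(p) = sup_u {up − u²} = p²/4:

```python
import numpy as np

us = np.linspace(-20.0, 20.0, 400001)     # grid over the control variable
G = lambda u: u ** 2

# Ĥ(p) = sup_u { u·p − G(u) }; closed form p²/4 for this G, verified on the grid
H_hat = lambda p: float(np.max(us * p - G(us)))
check = max(abs(H_hat(p) - p ** 2 / 4) for p in (-3.0, 0.0, 1.7))

# Dual route, cf. (4.15): L(z) = sup_p { z·p − Ĥ(p) }
ps = np.linspace(-20.0, 20.0, 400001)
L_dual = lambda z: float(np.max(z * ps - ps ** 2 / 4))

# Direct route, cf. (4.4) with b(u) = u: L(z) = inf { G(u) : u = z } = z²
gap = max(abs(L_dual(z) - z ** 2) for z in (-2.0, -0.5, 0.0, 1.0, 3.0))
```

That the two routes agree here reflects the exchange of infimum and supremum justified in the proof; when H fails to be concave, the dual route only gives the convex envelope of the direct one.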
To end this subsection, we use an example to demonstrate that there are indeed stochastic control problems where at least one of b and σ is not an affine function of (x, y, u), but which can be reformulated as stochastic convex problems of the type studied in the previous sections. In this example, L is clearly a convex function of (x, y, z, h). Hence, by Proposition 4.1, as well as the discussion prior to it, the stochastic control problem associated with the b, σ, G, and g defined here is transformed into a stochastic convex problem of the type studied in the previous sections.

Stochastic maximum principles
We now use Theorem 3.2, in particular conditions (3.21) and (3.22), to derive the sufficient conditions for optimality, as well as the expressions for the Hamiltonian and associated adjoint equation, for problem (4.2).
For control problem (4.2), define the processes (P, H_P) ∈ L^{2,∞}_F × L^{2,2}_F by the anticipated BSDE (4.16), where H is defined by (4.12) and where we have used the shorthand notation

(∂H/∂x)(t) = (∂H/∂x)(t, X(t), X_δ(t), u(t), P(t), H_P(t)),

and similarly for the partial derivative (∂H/∂y)(t + δ), assuming the necessary differentiability of H. Note that, if δ = 0, so that there is no delay in the model, H defined by (4.12) is independent of y, the variable corresponding to X_δ. The corresponding H and (4.16) are then termed the (stochastic) Hamiltonian (function) and the adjoint equation, owing to their link with the deterministic cases (see [21, Chapter 3]). We adopt these terms for our model, and the following result justifies this usage.

Theorem 4.1. Suppose that Hypotheses 4.1 and 4.2 hold and that L defined by (4.4) is convex with respect to (x, y, z, h). In addition, assume that U is compact, that the functions b, σ, and G are continuously differentiable with respect to (x, y), and that g is continuously differentiable with respect to x. Suppose that X̄ ∈ X and (P̄, Q̄) ∈ P × L^{2,1}_F together satisfy (3.21) and (3.22) with L and l defined by (4.4) and (4.5), respectively. Then there necessarily exists a ū ∈ U realizing (4.2). Moreover, (i) X̄ is the unique strong solution of controlled SDDE (4.1) with u in the functions b and σ replaced by ū; (ii) (P̄, H_P̄) is a solution of the adjoint equation (4.16) with (X, X_δ, u) replaced by (X̄, X̄_δ, ū), where H_P̄ is specified by P̄ via (3.8); (iii) condition (4.17) holds, dP ⊗ dt-a.s.
Proof. Given that control problem (4.2) has been reformulated as the corresponding primal problem (2.3), with L defined by (4.4) being convex, Assumptions 2.1–2.3 are satisfied by the reformulated problem (2.3). Moreover, under the given conditions, it follows from Theorem 3.2(ii) that X̄ is a solution of the corresponding primal problem (2.3).
On the other hand, using (4.4) for L and the definition of conjugate functions, L* in (4.18) can also be expressed, in terms of b, σ, and G, as a supremum over (x, y) of an expression involving H(t, x, y, ū(t), P̄(t), H_P̄(t)). Since b, σ, and G are differentiable with respect to (x, y), taking the derivatives with respect to x and y of the function within the bracket on the right-hand side of this expression, the fact that the maximum is attained at (X̄(t), X̄_δ(t)), dP ⊗ dt-a.s., implies that Ṗ satisfies (4.21) and (4.22), where H̄(t) = H(t, X̄(t), X̄_δ(t), ū(t), P̄(t), H_P̄(t)). Replacing Q̄ in (4.21) using (4.22) yields (4.23). Since l* is the conjugate convex function of l and since l = g, the above, together with the definitions of conjugate functions, implies a representation of l*(−P̄_T) as a supremum over x. Taking the derivative, with respect to x, of the function within the bracket on the right-hand side of this representation, we see that P̄_T must satisfy

P̄_T = −(∂g/∂x)(X̄(T)), dP-a.s. (4.24)

Now, since P̄ = (P̄_T, Ṗ) ∈ P, using (3.8), (4.23), and (4.24) shows that (ii) holds.
Note that, rather than postulating them, the proof of the above theorem uses the techniques of conjugate duality to derive the Hamiltonian H and the associated adjoint equation for problem (4.2). If δ = 0, the Hamiltonian H is independent of y, the variable corresponding to the delay, and adjoint equation (4.16) then reduces to a classical BSDE of the type studied in [21, Chapter 3].
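Gathering the stationarity conditions derived in the proof (in particular (4.24)), the adjoint equation (4.16) can be sketched as the following anticipated BSDE; this is a reconstruction consistent with the example display (4.9) and with the sign convention P̄_T = −(∂g/∂x)(X̄(T)) of the minimization problem, not a verbatim reproduction of (4.16):

```latex
\mathrm{d}\bar P(t) = -\Big( \frac{\partial H}{\partial x}(t)
    + \mathbb{E}\Big[ \frac{\partial H}{\partial y}(t+\delta)\,
      \mathbf{1}_{[0,\,T-\delta]}(t) \,\Big|\, \mathcal{F}(t) \Big] \Big)\,\mathrm{d}t
    + H_{\bar P}(t)\,\mathrm{d}B(t), \qquad t \in [0, T],
\qquad
\bar P(T) = -\frac{\partial g}{\partial x}\big(\bar X(T)\big),
```

where the partial derivatives of H are evaluated along (X̄, X̄_δ, ū, P̄, H_P̄) as in the shorthand above; the conditional expectation of the time-advanced term ∂H/∂y(t + δ) is what makes the equation an anticipated BSDE.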
Recall that, by Proposition 4.2, the concavity condition on the Hamiltonian H implies the required convexity of L. Under such a concavity condition on H, the proof of Theorem 4.1 can be modified to give the following sufficient maximum principle.

Theorem 4.2.
In addition to Hypotheses 4.1 and 4.2, assume further that the functions b, σ, and G are continuously differentiable with respect to (x, y), that g is continuously differentiable with respect to x, and that H(t, x, y, u, p, h) is concave with respect to (x, y, u). Let ū ∈ U, let X̄ be the solution to controlled SDDE (4.1) associated with ū, and let (P̄, H_P̄) be the solution to adjoint equation (4.16) associated with (ū, X̄). If (ū, X̄, P̄) satisfies (4.17), then ū is an optimal solution for control problem (4.2).
Comparing with [4], [12], and [14], the above sufficient stochastic maximum principle is proved using the method of conjugate duality, for which we require Hypotheses 4.1 and 4.2. Otherwise, the other conditions set in the theorem are similar to those required in [4, Theorem 3.2] and the result is similar to those of [4], [12], and [14] when their models are restricted to ours.

The inclusion of exponential moving average delay
The methods and results obtained in the preceding sections can be extended to include an exponential moving average delay, in addition to the discrete delay X_δ, in the model. That is, the continuous F(t)-adapted state process X is described by the controlled SDDE

dX(t) = b(t, X(t), X_a(t), X_δ(t), u(t)) dt + σ(t, X(t), X_a(t), X_δ(t), u(t)) dB(t), t ∈ (0, T], (5.1)

where x_0, X_δ, δ, and u are defined as before and X_a denotes the exponential moving average delay of X. The functions G and g may also depend, respectively, on X_a and X_a(T), and the associated optimal control problem is to find ū ∈ U realizing inf_{u∈U} J_a(u). Note that this type of stochastic control problem with delay was studied in [13], where the authors obtained a sufficient condition for the maximum principle using methods of stochastic calculus.
As in [7], we introduce the state process V: Ω × [0, T] → R^n defined by

dV(t) = (X(t) − λV(t) − e^{−λδ} X_δ(t)) dt, t ∈ (0, T], V(0) = ∫_{−δ}^{0} e^{λs} x_0(s) ds. (5.3)
Then V(t) = X_a(t), and so the combined SDDE for W = (X, V), given by (5.1) with X_a replaced by V together with (5.3), is equivalent to the original controlled SDDE (5.1) for X. In terms of this new combined SDDE, the stochastic optimal control problem associated with (5.1) becomes a stochastic optimal control problem with discrete delay whose drift and diffusion coefficients are independent of V_δ. To derive the adjoint equations and the stochastic maximum principle for the stochastic optimal control problem associated with (5.1), and to improve the results of [13] and [14], we modify our previous conjugate duality approach to extend it to W = (X, V). For this, in addition to X ∈ X, we identify (V̇, H_V) ∈ X with the continuous F(t)-adapted stochastic process V: Ω × [0, T] → R^n, in a similar fashion to the identification of X with (Ẋ, H_X) ∈ X. At the same time, take L_a and l_a to be modifications of L and l in Section 2, so that they depend also on (V, V̇, H_V) and on V(T), respectively. Then the corresponding stochastic convex problem with discrete delay is to find (X̄, V̄) ∈ X × X realizing

inf_{(X,V) ∈ X×X} Φ_a(X, V), (5.4)

where Φ_a(X, V) = I_{L_a}(X, V, X_δ, Ẋ, V̇, H_X, H_V) + J_{l_a}(X(T), V(T)). Adapting the arguments in Section 3, in addition to P = (P_T, Ṗ) ∈ P, we require another continuous F(t)-adapted stochastic process P_a to pair with V ∈ X, where P_a: Ω × [0, T] → R^n is identified with (P_{a,T}, Ṗ_a) ∈ P in the same sense that P is identified with (P_T, Ṗ) using (3.7). Assuming that L_a and l_a satisfy the appropriately modified Assumptions 2.1-2.3, the argument for the proof of Theorem 3.1 can be used to obtain the dual problem to (5.4), namely to realise

inf_{(P, P_a, Q) ∈ P×P×L^{2,1}_F} Ψ_a(P, P_a, Q), (5.5)

where Ψ_a(P, P_a, Q) is defined analogously to Ψ in Section 3, with integrand depending on (Ṗ, Ṗ_a, Q, P, P_a, H_P, H_{P_a}), and where H_{P_a} ∈ L^{2,2}_F is obtained by applying the martingale representation theorem to P_a ∈ P, as for H_P obtained from P via (3.8).
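As a sanity check on the identity V(t) = X_a(t), the following sketch numerically verifies, on a deterministic sample path, that the process V driven by dV(t) = (X(t) − λV(t) − e^{−λδ} X(t − δ)) dt, started from V(0) = ∫_{−δ}^{0} e^{λs} x_0(s) ds, reproduces the exponential moving average X_a(t) = ∫_{t−δ}^{t} e^{λ(s−t)} X(s) ds. The path X(t) = cos t and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

# Check that V solving  dV/dt = X(t) - lam*V(t) - exp(-lam*delta)*X(t - delta)
# reproduces the moving average  X_a(t) = \int_{t-delta}^t e^{lam*(s-t)} X(s) ds.
# Illustrative choices (not from the paper): X(t) = cos t, lam = 0.5, delta = 1.
lam, delta, T = 0.5, 1.0, 2.0
X = math.cos  # deterministic sample path, defined on [-delta, T]

def moving_average(t, n=20000):
    """Midpoint-rule quadrature for X_a(t)."""
    h = delta / n
    return sum(math.exp(lam * ((t - delta + (i + 0.5) * h) - t))
               * X(t - delta + (i + 0.5) * h) * h
               for i in range(n))

# Forward Euler for V on [0, T], started from V(0) = X_a(0).
n_steps = 20000
dt = T / n_steps
V0 = moving_average(0.0)  # equals \int_{-delta}^0 e^{lam*s} x_0(s) ds
V = V0
for i in range(n_steps):
    t = i * dt
    V += (X(t) - lam * V - math.exp(-lam * delta) * X(t - delta)) * dt

print(abs(V - moving_average(T)))  # small: V(T) matches X_a(T)
```

The Euler and quadrature errors here are of order dt and h^2 respectively, so the discrepancy at t = T is far below the tolerance of the check; any path X in place of cos t gives the same agreement.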
Since the combined SDDE is independent of V_δ, the inclusion of P_a in Ψ_a does not result in the dependence of Ψ_a on an additional Q_a, as was the case for the inclusion of Q in Ψ. The expression for Ψ_a then enables us to modify the proof of Theorem 3.2 to obtain the following equivalent conditions for optimality of this new stochastic convex problem.
Theorem 5.1. For any given (X̄, V̄) ∈ X × X and (P̄, P̄_a, Q̄) ∈ P × P × L^{2,1}_F, the following three statements are equivalent.
Returning to the optimal control problem (5.2), by adapting the technique used in the proof of Theorem 4.1, we see that Theorem 5.1 implies the following extension of Theorem 4.1, which gives a sufficient condition for the optimality of (5.2). It involves the Hamiltonian H_a of problem (5.2), defined by

H_a(t, x, y, z, u, p, r, h_p, h_r) = ⟨b(t, x, y, z, u), p⟩ + ⟨x − λy − e^{−λδ} z, r⟩ + ⟨σ(t, x, y, z, u), h_p⟩ − G(t, x, y, z, u),

and the associated adjoint equations.

Theorem 5.2.
Under the correspondingly modified conditions of Theorem 4.1, suppose that (X̄, V̄) ∈ X × X and (P̄, P̄_a, Q̄) ∈ P × P × L^{2,1}_F together satisfy the two equalities given in Theorem 5.1(iii), with L_a and l_a defined using G and g in a manner similar to that specified in Section 4. Then there is a ū ∈ U realising (5.2). Moreover, (i) X̄ is the unique strong solution of the controlled SDDE (5.1) with u in the functions b and σ replaced by ū; (ii) (P̄, H_P̄) and (P̄_a, H_{P̄_a}) are solutions of the adjoint equations (5.6) and (5.7), with (X, X_a, X_δ, u) replaced by (X̄, X̄_a, X̄_δ, ū), where H_P̄ and H_{P̄_a} are, respectively, specified by P̄ and P̄_a via (3.8); (iii) dP ⊗ dt-a.s.,

H_a(t, X̄(t), X̄_a(t), X̄_δ(t), ū(t), P̄(t), P̄_a(t), H_P̄(t), H_{P̄_a}(t)) = max_{u ∈ U} H_a(t, X̄(t), X̄_a(t), X̄_δ(t), u, P̄(t), P̄_a(t), H_P̄(t), H_{P̄_a}(t)). (5.8)

Note that the adjoint equations derived here differ from those defined in [13]: instead of adjoint equations for a triple of stochastic processes as in [13], we have adjoint equations for a pair of stochastic processes. In addition, instead of a classical controlled BSDE as in [13], one of the adjoint equations here is described by an anticipated BSDE. Note also that the Hamiltonian and the adjoint equations here both differ from those defined in [14].
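The displays for (5.6) and (5.7) are not reproduced above. As a hedged sketch only, an adjoint pair consistent with the Hamiltonian H_a and with the remark that one equation is anticipated would take the following shape, with all partial derivatives of H_a evaluated along (t, X̄(t), X̄_a(t), X̄_δ(t), ū(t), P̄(t), P̄_a(t), H_P̄(t), H_{P̄_a}(t)); the precise signs and terminal conditions depend on the conventions of (3.8) and Section 4 and are assumptions here:

```latex
% Sketch of a plausible adjoint pair (assumed shape; cf. (5.6)-(5.7)):
d\bar P(t)   = -\Bigl(\partial_x H_a(t)
               + \mathbb{E}\bigl[\partial_z H_a(t+\delta)\,\big|\,\mathcal F(t)\bigr]
                 \mathbf 1_{[0,\,T-\delta]}(t)\Bigr)\,dt
               + H_{\bar P}(t)\,dB(t),
d\bar P_a(t) = -\,\partial_y H_a(t)\,dt + H_{\bar P_a}(t)\,dB(t).
```

The conditional expectation of the time-shifted ∂_z H_a term is what makes the first equation an anticipated BSDE, while the second equation, paired with V, involves no delayed argument, in line with the combined SDDE being independent of V_δ.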
Similarly, we can generalize Theorem 4.2 to obtain the following sufficient stochastic maximum principle for control problem (5.2). In particular, it requires weaker assumptions than those of [13, Theorem 2.2] and [14, Theorem 3.1], and is therefore a generalization of both results.

Theorem 5.3. In addition to the modified Hypotheses 4.1 and 4.2, we assume further that the functions b, σ, and G are continuously differentiable with respect to (x, y, z), that g is continuously differentiable with respect to (x, y), and that H_a(t, x, y, z, u, p, r, h_p, h_r) is concave with respect to (x, y, z, u). Let ū ∈ U, let X̄ be the solution to the controlled SDDE (5.1) associated with ū, and let (P̄, H_P̄) and (P̄_a, H_{P̄_a}) be the solutions to the adjoint equations (5.6) and (5.7) associated with (ū, X̄). If (ū, X̄, P̄, P̄_a) satisfies (5.8), then ū is an optimal solution for control problem (5.2).
We note that, if (5.1) is independent of X a , then the Hamiltonian and the associated adjoint equations involved in the maximum principles for control problem (5.2) coincide with those obtained in Section 4 for the corresponding control problem with just discrete delay. Hence, our results in Section 4 become a special case of those for the optimal control problems with both discrete and exponential moving average delays.
Finally, we complete the paper by considering the following simple control problem with both discrete and exponential moving average delays. Note that it usually cannot be solved using the results of either [14] or [13]: for the former, g needs to be independent of y, and, for the latter, the parameters need to satisfy constraints ensuring that one of the adjoint processes there is identically 0.
Similarly to Example 4.1, it can be verified that this control problem can be reformulated as a particular convex problem for which the corresponding Assumptions 2.1-2.3 are satisfied. The Hamiltonian for this problem, H_a(t, x, y, z, u, p, r, h_p, h_r), is obtained by specializing the general expression above to the coefficients of the example.
By taking the derivative of H_a with respect to u, we find an optimal control for the problem, where (P̄, H_P̄), together with (P̄_a, H_{P̄_a}), is the solution of the paired adjoint equations. It can be verified that the pair of adjoint equations in this example admits a unique solution. In particular, since P̄(T) and P̄_a(T) are both constants, H_P̄(t) = H_{P̄_a}(t) ≡ 0. Hence, this delayed control problem has a deterministic solution.
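Since H_a in this example is concave and differentiable in u, the maximization (5.8) reduces to a first-order condition; schematically (assuming the maximum is attained in the interior of U):

```latex
% First-order condition behind the example (interior maximum assumed):
\partial_u H_a\bigl(t,\bar X(t),\bar X_a(t),\bar X_\delta(t),\bar u(t),
                    \bar P(t),\bar P_a(t),H_{\bar P}(t),H_{\bar P_a}(t)\bigr) = 0 .
```

With H_P̄ ≡ H_{P̄_a} ≡ 0, this condition involves only the deterministic processes P̄ and P̄_a, which is why the resulting optimal control is deterministic.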