Sparse and Switching Infinite Horizon Optimal Control with Nonconvex Penalizations

A class of infinite horizon optimal control problems involving mixed quasi-norms of $L^p$-type cost functionals for the controls is discussed. These functionals enhance sparsity and switching properties of the optimal controls. The existence of optimal controls and their structural properties are analyzed on the basis of first order optimality conditions. A dynamic programming approach is used for numerical realization.


Introduction
In this work we continue the investigations of infinite horizon optimal control problems with nonconvex cost functionals which we started in [21]. We focus on optimal control of nonlinear dynamical systems which are affine in the control. The input control is a vector-valued function u = (u_1, …, u_m) in the space L^∞(0,∞;R^m), subject to control constraints. The focus rests on the part of the cost functional which involves the control. It is given by
$$ \|u(t)\|_p^q = \Big( \sum_{i=1}^m |u_i(t)|^p \Big)^{q/p}, \tag{1.1} $$
where 0 < p < 1 and p ≤ q ≤ 1. This functional is nonsmooth and nonconvex, leading to a challenging optimal control problem with interesting properties of the optimal control laws, in particular sparsity and switching. The terminology "sparse" does not appear to be rigorously defined in the literature; generally it describes the property of the optimal control being identically zero on nontrivial subsets of the temporal domain. Here, by sparsity we refer to the situation in which the whole vector u(t) is zero. Switching control is related to coordinate-wise sparsity and describes the property u_i(t) u_j(t) = 0 for i, j ∈ {1,…,m}, i ≠ j, t ≥ 0, which is equivalent to saying that at most one coordinate of u(t) is nonzero at t. While the use of the control penalty (1.1) does not guarantee sparsity or switching properties, it enhances them. This is illustrated in Figure 1, where unit balls for different q/p ratios are shown: for fixed q, decreasing p (column-wise in the sub-figure) makes one direction dominant over the other. To further illustrate the effect of (1.1), let us consider the case p = 1/2 and q = 1. Then the running cost for the control is given by
$$ \|u\|_{1/2} = \Big( \sum_{i=1}^m |u_i|^{1/2} \Big)^2 = \sum_{i=1}^m |u_i| + 2 \sum_{i<j} |u_i|^{1/2} |u_j|^{1/2}, $$
where the L^1-penalization on the u_i supports sparsity in the control and the product penalization enhances switching phenomena.
More generally, if q/p = j ∈ N is an integer, then the running cost is a combination of an L^q-penalization on each control coordinate u_i, and it further contains weighted summands of (up to) j-tuples of fractional powers of |u_i|, with the powers in each tuple summing to q. Fixing q and decreasing p, we expect that the control cost (1.1) increases the switching nature of the optimal controls, since the weights on the tuples increase relative to those on the singletons. Moreover, decreasing q, we expect that the subdomain over which the optimal control vanishes (in all coordinates) increases. These properties will be illustrated by numerical experiments.
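For integer j = q/p this combinatorial structure is just the multinomial expansion of the running cost ‖u‖_p^q = (∑_{i=1}^m |u_i|^p)^{q/p} from (1.1):
$$ \|u\|_p^q = \Big( \sum_{i=1}^m |u_i|^p \Big)^{j} = \sum_{\alpha \in \mathbb{N}_0^m,\ |\alpha| = j} \binom{j}{\alpha_1, \dots, \alpha_m} \prod_{i=1}^m |u_i|^{p\,\alpha_i}, $$
where each multi-index α with a single nonzero entry α_i = j contributes the pure penalization |u_i|^{pj} = |u_i|^q, while every mixed α contributes a weighted product of fractional powers of several coordinates whose exponents sum to q, penalizing their simultaneous activity.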
The case p = q with 0 < p ≤ 1 has been studied in [21]. Existence and sparsity properties of optimal controls were analyzed for this case, and these properties were observed in numerical simulations for 0 < p = q < 1. In the present work the analysis is carried out for more general nonconvex problems with the control cost (1.1). Concerning the existence of optimal solutions, which is not guaranteed in general, we follow the ideas from [21]: we reformulate the problem in infinite-dimensional sequence spaces by discretizing the controls, and we extend an important result on weakly sequentially continuous mappings from [19] to obtain the existence result for our purposes.
The analysis of the sparsity and switching structure is based on the optimality conditions. For this purpose we derive the necessary first order optimality conditions for the original problem, which follow from general results available in the literature. We also derive sufficient optimality conditions for the reformulated problems. Subsequently, we investigate the sparsity and switching properties of the optimal controls under box constraints. Finally, by using dynamic programming techniques, optimal control laws are approximated globally in the state space for linear and nonlinear dynamical systems.
Let us mention previous related work on sparse and switching control. Closed-loop infinite horizon sparse optimal control problems with L^p (0 < p ≤ 1) functionals were analyzed in [21]. Open-loop, finite horizon L^1 sparse optimal control for dynamical systems has been studied in e.g. [15,26,3,9]. Open-loop, finite horizon sparse optimal control for partial differential equations was studied in e.g. [18,7,23].
The Hamilton-Jacobi-Bellman equation for impulse and switching controls was discussed in [5,28]. The synthesis of sparse feedback laws via dynamic programming has been studied in [13,20,1]. In the context of partial differential equations, optimal control of systems switching among different modes was analyzed in [16,17], problems with convex switching-enhancing functionals were investigated in [11], and problems with nonconvex switching penalizations in [12]. In [29], switching controls based on functionals suggested by controllability considerations were investigated. Mixed (quasi-)norms as in (1.1) with p ≠ q have been used earlier, though typically in convex situations with p ≥ 1, q ≥ 1. These investigations were carried out in the context of machine learning, regression analysis, and mathematical imaging, with the goal of achieving group sparsity or structured parsimony; see e.g. [5,14,22,27,30] and the references given there.
The structure of the paper is the following. The short section 2 contains the precise problem formulation. Existence of optimal controls, which are discretized in time, is obtained in section 3. The sparsity and switching structure of the optimal controls is analyzed on the basis of the optimality conditions for the time-continuous as well as the time discrete problems in sections 4 and 5, respectively, and section 6 contains numerical results.

Optimal control problem
Let U ⊂ R^m be a closed set and let f_i : R^d → R^d be continuous functions for i = 0,…,m. We consider the following control system: given x ∈ R^d,
$$ \dot y(t) = f_0(y(t)) + \sum_{i=1}^m f_i(y(t))\, u_i(t), \quad t > 0, \qquad y(0) = x. \tag{2.2} $$
Here y(t) ∈ R^d is the state variable and u(t) = (u_1(t),…,u_m(t)) ∈ R^m is the input control. Given p ∈ ]0,1[, we set for the vector u = (u_1,…,u_m) ∈ R^m
$$ \|u\|_p = \Big( \sum_{i=1}^m |u_i|^p \Big)^{1/p}. $$
Let q ∈ [p,1], λ > 0, γ > 0 and y_d ∈ R^d. For any x ∈ R^d, consider the cost functional
$$ J(x,u) = \int_0^\infty e^{-\lambda t} \Big( \|y(t) - y_d\|_2^2 + \gamma\, \|u(t)\|_p^q \Big)\, dt, \tag{2.3} $$
where (y,u) satisfies the state equation (2.2), and the infinite horizon optimal control problem
$$ \min \big\{ J(x,u) : u \in L^\infty(0,\infty;\mathbb{R}^m),\ u(t) \in U \text{ for a.e. } t > 0 \big\}. \tag{2.4} $$
In (2.3), λ is called the discount factor, γ is the weight of the control cost, and ‖·‖_2 is the Euclidean norm in R^d. The following assumptions are made.
(H1) The control set U is compact and convex.
(H2) There exists L > 0 such that
$$ \| f_i(y_1) - f_i(y_2) \|_2 \le L\, \| y_1 - y_2 \|_2 \quad \text{for all } y_1, y_2 \in \mathbb{R}^d,\ i = 0,\dots,m. $$
Let us mention that the cost functional J is convex in the state variable and nonconvex in the control. The case q = p has been discussed in [21].
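As a small numerical illustration (ours, not part of the paper's formulation), the running cost ‖u‖_p^q and its expansion for p = 1/2, q = 1 can be checked directly; the function name and the test vector are our own:

```python
import numpy as np

def mixed_quasi_norm(u, p, q):
    """Control running cost ||u||_p^q = (sum_i |u_i|^p)^(q/p) for 0 < p <= q <= 1."""
    return float(np.sum(np.abs(u) ** p) ** (q / p))

# For p = 1/2, q = 1 and m = 2 the cost splits into an L^1 term plus a cross
# term that penalizes simultaneous activity of both coordinates:
u = np.array([0.25, 0.09])
lhs = mixed_quasi_norm(u, p=0.5, q=1.0)
rhs = abs(u[0]) + abs(u[1]) + 2 * np.sqrt(abs(u[0]) * abs(u[1]))
print(lhs, rhs)  # both approximately 0.64
```

The cross term 2|u_1|^{1/2}|u_2|^{1/2} is what makes activating a second coordinate disproportionately expensive, favoring switching.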

Time-discretized model
Since the cost functional J is not convex in u, existence of optimal controls for problem (2.4) does not hold in general. We therefore analyze existence for a time-discretized approximation of (2.4). We introduce a temporal grid (t_k)_{k∈N} with 0 = t_1 < t_2 < ⋯ and t_k → ∞, and denote I_k = [t_k, t_{k+1}[ for k ∈ N. The control is then restricted to the following set of piecewise constant functions:
$$ U_\Delta = \big\{ u \in L^\infty(0,\infty;\mathbb{R}^m) : u|_{I_k} \equiv u_k \in U \ \text{for all } k \in \mathbb{N} \big\}. $$
Consider the optimal control problem
$$ \min \big\{ J(x,u) : u \in U_\Delta \big\}, \tag{3.5} $$
where y solves (2.2). A direct computation shows that, for u ∈ U_Δ,
$$ \int_0^\infty e^{-\lambda t}\, \|u(t)\|_p^q \, dt = \sum_{k=1}^\infty b_k\, \|u_k\|_p^q, \qquad b_k := \int_{I_k} e^{-\lambda t}\, dt. $$
For any r > 0, the infinite dimensional sequence space ℓ^r = {u ∈ ℓ^∞ : ∑_{k=1}^∞ |u_k|^r < ∞} is endowed with
$$ \|u\|_{\ell^r} = \Big( \sum_{k=1}^\infty |u_k|^r \Big)^{1/r}. $$
For convenience we recall that the spaces ℓ^r with 1 < r < ∞ are reflexive Banach spaces and that ℓ^{r_1} ⊂ ℓ^{r_2} if 1 ≤ r_1 < r_2 ≤ ∞. To investigate the existence of optimal controls, we follow the idea introduced in [19] by defining the reparametrization ψ : ℓ^{q/p} → ℓ^q with
$$ \psi(z)_k = |z_k|^{1/p}\, \mathrm{sgn}(z_k), \qquad z = (z_1, z_2, \dots) \in \ell^{q/p}, \quad k = 1, 2, \dots $$
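As a sanity check on the discretized control cost (our own illustration, assuming a uniform grid t_k = (k−1)Δ and the weights b_k = ∫_{I_k} e^{−λt} dt appearing in the interval-wise optimality conditions later on), the b_k admit a closed form and sum, up to truncation, to the total discount mass 1/λ:

```python
import numpy as np

lam, Delta, K = 0.2, 0.1, 5000            # discount factor, step size, grid truncation
t = Delta * np.arange(K + 1)              # assumed uniform grid t_k = (k-1)*Delta
b = (np.exp(-lam * t[:-1]) - np.exp(-lam * t[1:])) / lam  # b_k = int_{I_k} e^{-lam*t} dt

# midpoint-rule quadrature check of the closed form for b_1
N = 10000
mid = (np.arange(N) + 0.5) * (Delta / N)
quad = np.sum(np.exp(-lam * mid)) * (Delta / N)
print(b[0], quad)           # nearly identical
print(b.sum(), 1.0 / lam)   # the weights telescope to (1 - e^{-lam*t_K})/lam ~ 1/lam
```
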
Using the fact that ψ is an isomorphism, (3.5) is equivalent to the reparametrized problem
$$ \min \big\{ J(x, \psi(w)) : \psi(w) \in U_\Delta \big\}, \qquad \psi(w) := (\psi(w_1), \dots, \psi(w_m)), \tag{3.6} $$
where y(·) satisfies the state equation with control ψ(w), to which we refer as (3.7). To obtain existence for problem (3.5), the following lemma is needed, which gives an important property of ψ. The idea of the proof is inspired by [19, Lemma 2.1].
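A quick numerical check (ours, not from [19]) of the reparametrization: applying ψ coordinate-wise turns the ℓ^{q/p}-quasi-norm of z into the ℓ^q-quasi-norm of ψ(z), which is the mechanism behind the equivalence of (3.5) and (3.6):

```python
import numpy as np

def psi(z, p):
    """Reparametrization psi(z)_k = |z_k|^{1/p} * sgn(z_k)."""
    return np.sign(z) * np.abs(z) ** (1.0 / p)

p, q = 0.5, 1.0
z = np.array([0.4, -0.1, 0.25, 0.0, -0.06])
lhs = np.sum(np.abs(psi(z, p)) ** q)     # ||psi(z)||_{l^q}^q
rhs = np.sum(np.abs(z) ** (q / p))       # ||z||_{l^{q/p}}^{q/p}
print(lhs, rhs)                          # identical by construction
```
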
Lemma 3.1. Let q > p and let β denote the conjugate exponent of q/p. The mapping ψ : ℓ^{q/p} → ℓ^β is weakly (sequentially) continuous, i.e. z_n ⇀ z̄ weakly in ℓ^{q/p} implies that ψ(z_n) ⇀ ψ(z̄) weakly in ℓ^β.
Proof. Let r = 1/p + 1 and let r* denote the conjugate exponent of r, given by r* = p + 1. Since β = q/(q−p) and (q−p)(1+p) = q + p(q − 1 − p) < q, using q ≤ 1, it follows that r* < β. For any z ∈ ℓ^{q/p} we have
$$ (\psi(z), z)_{\ell^{r^*},\,\ell^r} = \sum_{k=1}^\infty |z_k|^{\frac1p + 1} = \|z\|_{\ell^r}^r, \qquad \|\psi(z)\|_{\ell^{r^*}}^{r^*} = \sum_{k=1}^\infty |z_k|^{\frac{1+p}{p}} = \|z\|_{\ell^r}^r. $$
The above computations imply that (ψ(z), z)_{ℓ^{r*},ℓ^r} = ‖ψ(z)‖_{ℓ^{r*}} ‖z‖_{ℓ^r} and ‖ψ(z)‖_{ℓ^{r*}}^{r*} = ‖z‖_{ℓ^r}^r, which means that ψ is the duality mapping from ℓ^r to ℓ^{r*} and is weakly sequentially continuous. If z_n ⇀ z̄ weakly in ℓ^{q/p}, then z_n ⇀ z̄ weakly in ℓ^r since 1 < q/p < r. Therefore ψ(z_n) ⇀ ψ(z̄) weakly in ℓ^{r*}. Using that r* < β, this implies that ψ(z_n) ⇀ ψ(z̄) weakly in ℓ^β.
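The duality-mapping identities in the proof can be verified numerically for a finite vector (our own sketch; p = 1/2, so r = 3 and r* = 3/2):

```python
import numpy as np

p = 0.5
r = 1.0 / p + 1.0        # r = 3
rs = p + 1.0             # conjugate exponent r* = 1.5, since 1/r + 1/r* = 1
z = np.array([0.7, -0.2, 0.05, -0.4])

psi_z = np.sign(z) * np.abs(z) ** (1.0 / p)              # duality mapping candidate
pairing = float(psi_z @ z)                               # (psi(z), z)
norm_r = np.sum(np.abs(z) ** r) ** (1.0 / r)             # ||z||_{l^r}
norm_rs = np.sum(np.abs(psi_z) ** rs) ** (1.0 / rs)      # ||psi(z)||_{l^{r*}}
print(pairing, norm_rs * norm_r)   # equality in Hoelder's inequality
print(norm_rs ** rs, norm_r ** r)  # both equal ||z||_r^r
```
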
Theorem 3.2. Assume (H1) and (H2). Then problem (3.5) admits a minimizer ū ∈ U_Δ.

Proof. The case q = p has been dealt with in [21]. Therefore we focus on the case q > p. Let w^n = (w^n_1, …, w^n_m) be a minimizing sequence for problem (3.6) and set u^n = ψ(w^n) = (ψ(w^n_1), …, ψ(w^n_m)). Note that b_k and u^n_{i,k} are uniformly bounded with respect to k, i, n, and therefore the w^n_{i,k} are uniformly bounded. For each i ∈ {1,…,m} one verifies that w^n_i is bounded in ℓ^{q/p}. On the other hand, noting that ℓ^{q/p} ⊂ ℓ^{β/p} since β = q/(q−p) > q, we deduce that a subsequence of {w^n_i}_{n=1}^∞ converges weakly to some w̄_i in ℓ^{q/p} (see [10, pp. 73]). By the same reason, a subsequence of {ψ(w^n_i)}_{n=1}^∞ converges weakly to some ξ_i in ℓ^β. From Lemma 3.1 one deduces that ξ_i = ψ(w̄_i). Let y^n be the solution to (3.7) with control w^n. Then on each interval I_k we can deduce by the Arzelà-Ascoli theorem that there exists ȳ_k : I_k → R^d such that y^n → ȳ_k uniformly in I_k as n → ∞.
For ȳ : [0,∞) → R^d defined by ȳ|_{I_k} = ȳ_k for k ∈ N, it follows that for any T > 0 the state equation passes to the limit on [0,T]. Therefore ȳ is the solution to (3.7) corresponding to w̄ := (w̄_1, …, w̄_m); here we use that the dynamics is affine in ψ(w_i), i = 1,…,m. Using the fact that y^n → ȳ pointwise in [0,∞) and w^n_{i,k} → w̄_{i,k} for any i = 1,…,m, k ∈ N, we obtain by Fatou's lemma that J(x, ψ(w̄)) ≤ lim inf_{n→∞} J(x, ψ(w^n)), which implies that w̄ is a minimizer for problem (3.6). Hence a minimizer ū ∈ U_Δ for problem (3.5) is given by ū = ψ(w̄).

Sparsity and switching properties: the time-continuous problem
For the time-continuous problem (2.4), the necessary optimality conditions are known from the literature and are next recalled for convenience.
Lemma 4.1. For each x ∈ R^d, if ū is a locally optimal control for problem (2.4) and ȳ is the associated optimal trajectory, then there exists an adjoint state φ : [0,∞) → R^d, solving the adjoint equation associated to (ȳ, ū), such that, for t ∈ ]0,∞[ a.e.,
$$ \Big\langle \varphi(t), \sum_{i=1}^m f_i(\bar y(t))\,\bar u_i(t) \Big\rangle + \gamma e^{-\lambda t}\, \|\bar u(t)\|_p^q \;\le\; \Big\langle \varphi(t), \sum_{i=1}^m f_i(\bar y(t))\, u_i \Big\rangle + \gamma e^{-\lambda t}\, \|u\|_p^q \tag{4.8} $$
for all u ∈ U.
According to (4.8), for t ∈ ]0,∞[ a.e. we look for the minimizer over U of the function
$$ G_t(u) = \sum_{i=1}^m \alpha_i(t)\, u_i + \gamma_t\, \|u\|_p^q, \qquad \alpha_i(t) := \langle f_i(\bar y(t)), \varphi(t)\rangle, \quad \gamma_t := \gamma e^{-\lambda t}. $$
Assume that the set of control constraints U has the form of box constraints
$$ U_\infty = \big\{ u \in \mathbb{R}^m : |u_i| \le \rho_i,\ i = 1,\dots,m \big\}, \tag{4.9} $$
where ρ_i > 0. In this case the optimality condition can be used to derive the following structural properties of a minimizer.
Theorem 4.2. Let ū be an optimal control for problem (2.4) with U_∞ given in (4.9), let ȳ be the associated optimal trajectory and φ the associated adjoint state. For t ∈ ]0,∞[ a.e., we define the index sets I^−(t), I^0(t) and I^+(t). Then the following properties hold: (i) For t ∈ ]0,∞[ a.e. and i ∈ I^−(t), ū_i(t) = 0. (ii) For t ∈ ]0,∞[ a.e. and i ∈ I^0(t), the coordinate ū_i(t) is as described below. (iii) For t ∈ ]0,∞[ a.e. and i ∈ I^+(t), we have ū_i(t) ∈ {0, −ρ_i sgn(α_i(t))}.

Let us briefly comment on the sparsity and switching properties which follow from Theorem 4.2. For the coordinates in the index set I^−(t), the controllers are zero; we refer to these as the sparse control coordinates at time t. If I^+(t) = ∅, then i ∈ I^0(t) ∪ I^−(t) for all i = 1,…,m, and hence ū is switching or sparse at time t. If I^+(t) ≠ ∅, then the coordinates in I^0(t) behave like those in I^−(t): they are 0. The coordinates of the optimal control with indices in I^+(t) are not completely determined by (iii); they are either active or zero, and in the latter case they join the set of sparse control coordinates. By contrast, in the case p = q treated in [21, Proposition 5.2], the coordinates in case (iii) are necessarily active. Thus p < q enhances additional sparsity compared to p = q. Finally, as a consequence of the box constraints, the optimal control is of bang-off-bang type, except for case (ii) with q = 1.
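The pointwise minimization behind these structural properties can be explored numerically by brute force over the box (our own illustration; the values of α_i(t), γ_t, p, q below are assumed for the sake of the example):

```python
import numpy as np
from itertools import product

def G(u, alpha, gamma_t, p, q):
    """Pointwise running cost G_t(u) = sum_i alpha_i*u_i + gamma_t*||u||_p^q."""
    return float(alpha @ u + gamma_t * np.sum(np.abs(u) ** p) ** (q / p))

def minimize_on_box(alpha, gamma_t, p, q, rho=1.0, n=201):
    grid = np.linspace(-rho, rho, n)            # contains -rho, 0 and rho
    best = min(product(grid, repeat=2),
               key=lambda u: G(np.asarray(u), alpha, gamma_t, p, q))
    return np.asarray(best)

# Both adjoint coefficients favor positive controls, yet for p = 1/2, q = 1 the
# cross term makes simultaneous activity expensive: the minimizer switches.
u_star = minimize_on_box(np.array([-0.8, -0.5]), gamma_t=0.6, p=0.5, q=1.0)
print(u_star)  # one coordinate at the bound rho, the other exactly off
```

For these data the computed minimizer activates only the first coordinate, at the bound ρ, while the second is off: a bang-off-bang, switching point.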
Proof. We shall use that, by Lemma 4.1, ū(t) minimizes G_t in U_∞ for a.e. t ∈ (0,∞). For convenience of notation, let us set ϕ_{t,i} := α_i(t) for i = 1,…,m, and decompose G_t = G_1 + G_2 with G_1(u) = ∑_{i=1}^m (ϕ_{t,i} u_i + γ_t |u_i|^q) and G_2(u) = γ_t (‖u‖_p^q − ∑_{i=1}^m |u_i|^q) ≥ 0. In Step 1 below we verify (i) and (ii). The claims in (iii) are proved in Step 2.
Otherwise, if ũ = (0,…,0) and u ≠ ũ, a direct comparison of the values of G_1 yields the claim. We then deduce the asserted form of ū(t). The proof for the case ϕ_{t,i} ≤ 0 for i = 1,…,m is thus concluded; the other cases, in which the ϕ_{t,i} have different signs, can be treated analogously. Now we consider the case q = p. In this situation G_2 ≡ 0 and G_t ≡ G_1. The minimizers of G_1 have been analyzed in the previous arguments, and we therefore arrive at the conclusion.
Step 2: proof of (iii). We turn to the behavior of the coordinates with indices in I^+(t); in particular, in this case I^+(t) ≠ ∅, and consequently, by (i) and (ii), the coordinates with indices outside I^+(t) vanish. Therefore the minimization reduces to the coordinates in I^+(t), where, to simplify notation, we use for τ = 1,…,ℓ, with ℓ the cardinality of I^+(t), the rescaled variables w_τ and coefficients ψ_τ introduced in (4.12). Let w̄ be the minimizer, and let us start by considering the case ψ_τ < 0 for all τ = 1,…,ℓ.
For the other cases, in which ψ_τ is positive for some τ ∈ {1,…,ℓ}, ψ_τ and w_τ can be replaced by −ψ_τ and −w_τ in (4.11). Following the same arguments as before, we obtain −w̄_τ ∈ {0,1}. Therefore we conclude that w̄_τ ∈ {0, −sgn(ψ_τ)} for τ = 1,…,ℓ, with the additional information that |w̄_1| = 1. The definition of w_τ and ψ_τ in (4.12) translates this into the corresponding statement for ū(t), with the additional information that max_{i∈I^+(t)} |ū_i(t)|/ρ_i ≠ 0. This completes the proof of (iii).

In Theorem 4.2 the study was carried out for the case of box constraints. Next we briefly consider the problem under Euclidean norm constraints. In this case, due to the coupling of the coordinates which is inherent to the Euclidean norm, it is more difficult to obtain explicit information on the structure of the minimizers than it was for box constraints.
We define, for ρ > 0,
$$ U_2 = \big\{ u \in \mathbb{R}^m : \|u\|_2 \le \rho \big\}. \tag{4.17} $$

Theorem 4.3. Let ū be an optimal control for problem (2.4) with U given in (4.17), let ȳ be the associated optimal trajectory, and φ its associated adjoint state. Let I^−(t), I^0(t) and I^+(t) be defined as in Theorem 4.2. If for some t ∈ ]0,∞[ the cardinality of I^+(t) is less than or equal to one, then (i), (ii), and (iii) of that theorem remain valid. Otherwise we have ‖ū(t)‖_2 = ρ.

Proof.
Step 1. From (4.8) we know that for t ∈ ]0,∞[ a.e., ū(t) is the minimizer over U_2 of the function
$$ \bar G_t(u) = \sum_{i=1}^m \alpha_i(t)\, u_i + \gamma_t\, \|u\|_p^q, $$
where α_i(t) = ⟨f_i(ȳ(t)), φ(t)⟩. At first we note that U_2 is a subset of U_∞ if ρ_i = ρ for all i, and hence min_{u∈U_∞} Ḡ_t(u) ≤ min_{u∈U_2} Ḡ_t(u). Moreover, if a minimizer of Ḡ_t over U_∞ is contained in U_2, then this minimizer is also a minimizer of Ḡ_t over U_2. Following this observation, let ū(t) be a minimizer of Ḡ_t over U_∞ such that the cardinality of I^+(t) is at most one. Then by Theorem 4.2 all components of ū(t) are 0 except for at most one. In case the cardinality of I^+(t) equals one, there is one nontrivial coordinate of the control at time t, whose absolute value then equals ρ.
Step 2. Now we turn to the general case (assuming that I^+(t) is nonempty) and prove that the optimal control is necessarily active. Since I^+(t) is nonempty, there exists at least one index τ such that γ_t − |α_τ(t)| ρ^{1−q} < 0. Setting the τ-th coordinate equal to −ρ sgn(α_τ(t)) and all other coordinates to zero, we obtain a feasible control u with
$$ \bar G_t(u) = \gamma_t\, \rho^q - |\alpha_\tau(t)|\, \rho = \rho \big( \gamma_t\, \rho^{q-1} - |\alpha_\tau(t)| \big) < 0, $$
which implies that at least one coordinate of ū(t) is nontrivial and Ḡ_t(ū(t)) < 0. Let ℓ̃ denote the number of nontrivial coordinates of ū(t) and, without loss of generality, assume that these are the first ℓ̃ coordinates of ū(t).
If some of the coordinates of α are such that α_i(t) ≥ 0, then necessarily ū_i(t) ≤ 0 and, adapting Ω accordingly, it can again be verified that ∑_{i=1}^{ℓ̃} ū_i^2(t) = ρ^2.

Sparsity and switching properties: the time-discretized problem

In this section we consider the following linear dynamical system: for x ∈ R^d,
$$ \dot y(t) = A y(t) + B u(t), \quad t > 0, \qquad y(0) = x, \tag{5.19} $$
where A ∈ R^{d×d} and B ∈ R^{d×m}. Let us recall the optimal control problem: given x ∈ R^d, consider
$$ \min \big\{ J_\Delta(x,u) : u \in U_\Delta \big\}, \qquad J_\Delta(x,u) = \int_0^\infty e^{-\lambda t} \Big( \|y(t) - y_d\|_2^2 + \gamma\, \|u(t)\|_p^q \Big)\, dt, $$
where (y,u) satisfies (5.19). To investigate the optimality conditions satisfied by the optimal controls, we first introduce the adjoint equation (5.20) associated to (y,u) satisfying (5.19); its solution ϕ is called the adjoint state of y. Since the controls in U_Δ are piecewise constant functions, we first consider the optimal control on each time interval I_k, k ∈ N.
Proposition 5.1. Let ũ ∈ U_Δ satisfy the following: for any k ∈ N, ũ(·) ≡ ũ_k in I_k and
$$ \tilde u_k \in \arg\min_{u \in U} \Big\{ \int_{I_k} \langle \tilde\varphi(t), B u \rangle\, dt + \gamma\, b_k\, \|u\|_p^q \Big\}, \tag{5.21} $$
where ỹ is the corresponding trajectory and φ̃ is the adjoint state associated to (ỹ, ũ). Further, for arbitrary ω ∈ R^m such that ω + ũ_k ∈ U, we define the perturbed control
$$ u_\omega(t) = \tilde u(t) + \omega\, \chi_{I_k}(t), \qquad t \ge 0. $$
Then it holds that J_Δ(x, u_ω) ≥ J_Δ(x, ũ).
Proof. Assumption (5.21) implies an inequality between the control costs of ũ_k and of ũ_k + ω on I_k. By the definition of u_ω, (5.22) is equivalent to the corresponding inequality for u_ω and ũ on all of [0,∞). For almost all t > 0 we differentiate ⟨φ̃(t), y_ω(t) − ỹ(t)⟩; noting that lim_{t→∞} φ̃(t) = 0 and y_ω(0) − ỹ(0) = 0, integration over [0,∞) leads to (5.24). To compute the left-hand side of (5.24), we expand ‖y_ω(t) − y_d‖_2^2 around ỹ(t) for every t > 0. We then deduce that J_Δ(x, u_ω) ≥ J_Δ(x, ũ), which ends the proof.
Proposition 5.1 provides a way to construct optimal controls on each I_k, and this procedure can be naturally extended to construct globally optimal controls.

Theorem 5.2. Let ū ∈ U_Δ satisfy the following: for any k ∈ N and t ∈ I_k,
$$ \bar u(t) \equiv \bar u_k \in \arg\min_{u = (u_1,\dots,u_m) \in U} \Big\{ \int_{I_k} \langle \bar\varphi(t), B u \rangle\, dt + \gamma\, b_k\, \|u\|_p^q \Big\}, \tag{5.26} $$
where ȳ is the corresponding trajectory and φ̄ is the adjoint state associated to (ȳ, ū). Then ū ∈ U_Δ is a minimizer of problem (3.5), i.e. J_Δ(x, ū) = min_{u ∈ U_Δ} J_Δ(x, u).
Proof. One constructs a sequence (u^n) ⊂ U_Δ by successive interval-wise minimization; therefore u^n → ū pointwise in [0,∞[. Let y^n be the trajectory associated with u^n. By the same argument as in Theorem 3.2, we deduce that y^n converges pointwise to ȳ. Assumption (5.26) and Proposition 5.1 imply that the costs J_Δ(x, u^n) are nonincreasing. By Fatou's lemma, J_Δ(x, ū) ≤ lim inf_{n→∞} J_Δ(x, u^n), and we conclude that ū is a minimizer of problem (3.5).
Based on the optimality condition (5.26), results on sparsity and switching properties similar to those of Theorem 4.2 can be deduced by the same arguments as in its proof.
Theorem 5.3. Under the same assumptions and with the notation of Theorem 5.2, for each k ∈ N we define index sets I_k^−, I_k^0 and I_k^+ in analogy to those of Theorem 4.2. The following properties hold: (i) For k ∈ N, t ∈ I_k and i ∈ I_k^−, ū_i(t) = 0.

Numerical experiments
In this section we present numerical experiments for the computation of optimal control laws for the problem constrained to the nonlinear dynamical system (2.2). For the realization of globally optimal control laws we proceed as in [21], i.e. by following a dynamic programming approach. The value function V(x) := inf_u J(x,u) associated to this infinite horizon optimal control problem satisfies the first order Hamilton-Jacobi-Bellman equation
$$ \lambda V(x) + \sup_{u \in U} \big\{ -\langle \nabla V(x), f(x,u) \rangle - \|x - y_d\|_2^2 - \gamma \|u\|_p^q \big\} = 0, \qquad x \in \mathbb{R}^d, $$
which leads to the optimal feedback map
$$ u^*(x) \in \arg\min_{u \in U} \big\{ \langle \nabla V(x), f(x,u) \rangle + \gamma \|u\|_p^q \big\}. \tag{6.27} $$
The solution of the Hamilton-Jacobi-Bellman equation and the optimal feedback map are numerically approximated by a first-order semi-Lagrangian scheme with policy iteration, as discussed in [2]. The well-posedness of this numerical scheme is guaranteed under boundedness and continuity assumptions on the dynamics f(x,u) and the cost. Convergence of the controls, however, is only guaranteed for convex running costs. Nevertheless, the results we report indicate that the semi-Lagrangian scheme converges to optimal controls exhibiting the expected sparsity and switching properties. This scheme has also been applied to the solution of sparse optimal feedback control problems in [1,13]. In the case p = q = 1 the minimization operation in (6.27) can be realized by means of semismooth Newton methods as in [20].
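A minimal one-dimensional sketch of a semi-Lagrangian value iteration in this spirit (our own toy setup, not the scheme of [2]: dynamics ẏ = u, running cost (x − y_d)² + γ|u|^q, and a discretized control set):

```python
import numpy as np

lam, gamma, q, dt = 0.2, 1.0, 0.5, 0.1
xs = np.linspace(-1.0, 1.0, 201)      # state grid on Omega = [-1, 1]
us = np.linspace(-1.0, 1.0, 41)       # discretized control set U
yd = 0.0                              # steer the state to the origin

def interp(V, x):
    """Piecewise linear interpolation of V, clipped to the state grid."""
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

V = np.zeros_like(xs)
for _ in range(2000):                 # value iteration on the SL fixed point
    cand = np.stack([dt * ((xs - yd) ** 2 + gamma * abs(u) ** q)
                     + np.exp(-lam * dt) * interp(V, xs + dt * u) for u in us])
    V_new = cand.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

u_star = us[cand.argmin(axis=0)]      # feedback law on the grid
print(u_star[100])                    # control is off at the target: sparsity
```

On this toy problem the computed feedback is zero on a band around the target and points toward the origin outside of it, mirroring the sparsity and switching bands discussed for the experiments.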
For other values of p and q, the minimizer in (6.27) is computed by discretizing the control set U_∞ into a finite number of values and evaluating the Hamiltonian pointwise.

Eikonal dynamics
We begin by considering eikonal-type dynamics for planar motion of the form ẏ(s) = u(s), where |u_i(s)| ≤ 0.5 for i = 1, 2. The state space is Ω = [−1,1]², the discount factor is λ = 0.2, and γ = 1. The goal is to drive the state to the origin, and therefore y_d = (0,0). The optimal control fields in the state space for different (p, q) values are shown in Figure 2. We observe the following: a) The case p = q = 1 was already reported in [21]. There exists a switching band of width γλ, where the optimal control points unidirectionally towards the origin, and ū = 0 for ‖x‖_∞ ≤ γλ.
b,c) Departing from p = q = 1 and reducing the value of p, a switching region with only one active control component arises. It increases as the ratio q/p increases. Note that for q = 1, the region where ū = 0 remains unchanged.
d) The switching and the sparsity regions are larger for p = q = 0.2 than for p = q = 1. Only in the particular case ρ = 1 would these regions remain the same.

Figure 2: Eikonal dynamics, optimal control fields for different control penalizations ‖u‖_p^q.
e,f) Increasing the q/p ratio by departing from smaller values of q generates a larger switching region, leading to a fully switching controller for a ratio of q/p sufficiently large. Note that increasing q/p for q = 1 also leads to a decrease of the sparsity region.

Nonlinear dynamics of a double-well potential
We now address the synthesis of optimal controllers for nonlinear dynamics. We consider a system corresponding to a single one-dimensional particle moving in a double-well potential, subject to a controlled damping and a direct external forcing. In the absence of control action (u_1 = u_2 = 0), the damped particle has two stable equilibrium positions, namely x = ±1, v = 0 (we write (x, v) for the state (x_1, x_2)), each with its corresponding basin of attraction. Here our goal is to steer the particle to the equilibrium y_d = (1, 0). We consider a set of initial conditions in Ω = [−2, 2]², and set γ = 0.1, ρ = 1, and λ = 0.01. Optimal controls are shown in Figure 3. We observe: a,b,c) Reducing the value of p with q = 1 decreases the region where the control u_1 is active.
d,e,f) Reducing q does not affect the sparsity pattern of u_2. The linear control action via u_2 is more relevant for the stabilization goal than the bilinear control term u_1 v; as expected, the latter becomes insignificant as v becomes small.
g,h,i) Overall, the reduction of p has a significant effect on the increase of the switching region.

In order to investigate a setting with a richer interplay between the control variables and the switching structure, we consider a modified version of the double-well control system in which u_2 now enters in a bilinear fashion. The optimal controllers are significantly different compared to the previous setting, as shown in Figure 4. We note that: a,b,c) The sparsity region of u_1 increases as the ratio q/p increases.
d,e, f) The sparsity region of u 2 also increases as q/p increases.
g,h,i) Overall, the switching pattern of the two control variables becomes dominant as the ratio q/p becomes large. Only a reduced region of the state space requires the simultaneous action of both control variables.

Concluding remarks

In this paper we have studied infinite horizon optimal control problems with a control cost of the form ‖u‖_p^q, where 0 < p ≤ q ≤ 1, leading to a nonconvex, nonsmooth optimization problem. From the analysis of the associated optimality conditions, we have shown that such control penalizations induce not only sparsity but also a switching structure in the optimal control field. The switching pattern is determined by the different parameters of the control problem, most notably by the value of q and the ratio q/p. By means of dynamic programming techniques, we have shown numerically that, for an increased q/p ratio, the optimal control has a dominant switching pattern, tending to minimize a counting ‖·‖_0 measure over an enlarged region of the state space. We believe that an important direction for future research is a thorough study of the interplay between the underlying dynamical structure of the control system and the switching pattern. More concretely, it would be desirable to know whether the sparse/switching control benefits from the basin of attraction of a given equilibrium point, or whether the inclusion of ‖·‖_p^q norms could lead to minimum time-type controllers.