TENSOR DECOMPOSITION METHODS FOR HIGH-DIMENSIONAL HAMILTON-JACOBI-BELLMAN EQUATIONS

A tensor decomposition approach for the solution of high-dimensional, fully nonlinear Hamilton-Jacobi-Bellman equations arising in optimal feedback control of nonlinear dynamics is presented. The method combines a tensor train approximation for the value function together with a Newton-like iterative method for the solution of the resulting nonlinear system. The tensor approximation leads to a polynomial scaling with respect to the dimension, partially circumventing the curse of dimensionality. A convergence analysis for the linear-quadratic optimal control problem is presented. For nonlinear dynamics, the effectiveness of the high-dimensional control synthesis method is assessed in the optimal feedback stabilization of the Allen-Cahn and Fokker-Planck equations with a hundred variables.

1. Introduction. Richard Bellman first coined the expression "curse of dimensionality" when referring to the overwhelming computational complexity associated with the solution of multi-stage decision processes through dynamic programming, nowadays known as Bellman's equation. More than 60 years down the road, the curse of dimensionality has become ubiquitous in different fields such as numerical analysis, compressed sensing and statistical machine learning. However, it is in the computation of optimal feedback policies for the control of dynamical systems where its meaning continues to be most evident. Here, the curse of dimensionality arises since the synthesis of optimal feedback laws by dynamic programming techniques demands the solution of a Hamilton-Jacobi-Bellman (HJB) fully nonlinear Partial Differential Equation (PDE) cast over the state space of the dynamics. This intrinsic relation between the dimensions of the state space of the control system and the domain of the HJB PDE generates computational challenges of formidable complexity even for relatively simple dynamical systems. Much of the research in control revolves around circumventing this barrier through different trade-offs between dimensionality, performance, and optimality of the control design. Prominent examples of the research landscape shaped by the curse of dimensionality include model order reduction, model predictive control, suboptimal feedback design, reinforcement learning and distributed control. However, the effective computational solution of dynamic programming equations of arbitrarily high dimensions through deterministic methods remains an open quest with fundamental implications in optimal control design.
In this paper, we present a computational approach based on tensor decomposition techniques for the solution of high-dimensional HJB PDEs arising in optimal feedback control of systems governed by partial differential equations. We show that for evolution equations arising from the semi-discretization of PDEs, our technique scales at a rate which is at most polynomial (of largest observed order 4) in the dimension. This scaling allows the computation of accurate feedback laws for nonlinear dynamics with over 100 dimensions. We assess our design over a class of challenging problems, including the optimal feedback stabilization of nonlinear parabolic PDEs such as the Allen-Cahn and Fokker-Planck equations in one and two spatial dimensions, and in the presence of control constraints.
1.1. The Numerical Approximation of HJB PDEs. Since the seminal work by Crandall and Lions [18], the approximation of HJB equations through computational PDE methods has been addressed by a range of discretization strategies, most notably finite differences, level-set methods and semi-Lagrangian schemes [24]. The aforementioned techniques have proven to overcome the difficulties associated with the fully nonlinear character of the HJB PDEs [49]. However, they are inherently grid-based schemes suffering from the curse of dimensionality. That is, for a multidimensional ansatz defined from a tensor product of 1d objects, the total number of degrees of freedom of the discretization grows exponentially with respect to the dimension of the HJB PDE. In practice, this makes the problem computationally intractable for dimensions larger than 4. This is a fundamental limitation in the context of nonlinear optimal feedback control, where the dimension of the associated HJB PDE is determined by the dimension of the state space of the control system. A partial remedy to this difficulty is the coupling of grid-based discretizations for low-dimensional HJB PDEs with model reduction techniques to lower the dimension of the control system [43,5]. This approach has been successfully applied to dynamics with strong dissipative properties, but its overall performance relies on a good state space sampling and deteriorates for dynamics including transport, delays, or highly nonlinear phenomena [37]. While the rigorous design of numerical methods for the solution of very high-dimensional HJB PDEs remains largely an open problem, encouraging results have been obtained over the last years. A non-exhaustive list includes the use of machine learning techniques [59,22,31,35], approximate dynamic programming in the context of reinforcement learning [11,56], causality-free methods and convex optimization [40,17], max-plus algebra methods [47,1], polynomial approximation [39,38], tree structure
algorithms [4], and sparse grids [14,25]. A very recent stream of works [22,59,35,55,36] has explored the use of machine learning techniques to approximate high-dimensional nonlinear PDEs. The work [59] proposes the so-called Deep Galerkin Method, combining a deep neural network ansatz for the solution together with a PDE residual minimization. In [22,31,35], the authors focus on the class of time-dependent HJB equations arising in stochastic control, where a pointwise evaluation of the solution can be realized through a representation formula involving the solution of a backward stochastic differential equation. These latter approaches can be linked to causality-free methods, with deterministic counterparts explored in [40,48,17,66]. All the aforementioned methods have been assessed in high-dimensional settings and have relevant features which make them appealing for specific instances of the feedback design problem. However, to the best of our knowledge, the methodology proposed in this work is the first PDE-based computational method to go beyond 100 dimensions for the stationary HJB PDE associated to deterministic infinite horizon nonlinear optimal control. While a benchmarking against data-driven methods goes beyond the scope of this paper, we compare our method with the sub-optimal feedback law obtained from solving a linearised control problem, which is a technique widely used in the control literature and which scales well for high-dimensional settings.
1.2. A tensor calculus framework for nonlinear HJB. In this work we propose a numerical method for the solution of HJB equations based on tensor decomposition techniques [12,42,30], which have proven to be successful in tackling the curse of dimensionality in the context of numerical analysis of PDEs [41,19]. The use of low-rank structures such as the tensor-train (TT) format [50] to represent high-dimensional objects allows the solution of linear high-dimensional problems by generalizing standard numerical linear algebra techniques to multi-index arrays of coefficients (tensors) and the multivariate functions they approximate. This approach has been recently explored in [34] for solving a class of finite-horizon stochastic control problems where the associated time-dependent HJB PDE can be transformed into a linear equation [62]. A fixed-point iteration algorithm using tensor approximations with a Markov chain discretization was proposed in [27,6]. A model-free learning of a TT representation of the value function from Monte Carlo realisations was proposed in [53]. However, all Monte-Carlo-based methods may suffer from slow convergence. Here, we extend the tensor calculus framework to approximate the solution of fully nonlinear, first-order, stationary HJB PDEs arising from deterministic infinite horizon control, using a fast deterministic Newton-type policy iteration and a spectral discretization that ensures fast algebraic (in some examples nearly exponential) convergence, given the high regularity of value functions arising in the considered class of feedback control problems for semi-linear PDEs. Furthermore, through the selection of suitable control penalties [45], our method allows the inclusion of control constraints in the design.

Contributions of this work.
We develop a computational approach based on tensor decompositions for the solution of Hamilton-Jacobi-Bellman PDEs arising in the computation of optimal actions in feedback form. Our method scales at most polynomially with the state space dimension for a wide class of systems governed by high-dimensional nonlinear evolution equations, including dynamics originating from the Allen-Cahn and Fokker-Planck equations. The mitigation of the curse of dimensionality allows the accurate synthesis of optimal feedback maps for nonlinear dynamics with over 100 dimensions at a moderate computational cost. The method can effectively incorporate control constraints in the design.
The rest of the paper is structured as follows. In Section 2, we formulate the design problem of computing optimal feedback controllers for nonlinear dynamics via the HJB equation, together with an abstract iterative method for its approximation. In Section 3 we develop all the building blocks underpinning a tensor decomposition framework for the solution of high-dimensional HJB equations. Finally, in Section 4 we apply the proposed methodology to the computation of optimal feedback laws for the Allen-Cahn and Fokker-Planck equations.
2. The HJB PDE in optimal feedback control. We study the following infinite horizon optimal control problem:

  min_{u∈U} J(u; x) := ∫_0^∞ ℓ(y(t)) + γ|u(t)|² dt,   (2.1)

subject to the nonlinear dynamical constraint

  ẏ(t) = f(y(t)) + g(y(t))u(t),   y(0) = x,   (2.2)

where the state y(t) = (y_1(t), …, y_d(t)) ∈ R^d, the control u ∈ U ≡ L^∞([0, +∞[; U), with U a compact set of R, the running cost ℓ(y) : R^d → R_0^+, and the control penalization γ > 0. We assume the state cost ℓ(y) and the system dynamics f(y) : R^d → R^d and g(y) : R^d → R^d to be C¹(R^d). Without loss of generality, the origin y = 0 is an equilibrium of the uncontrolled dynamics and ℓ(0) = 0. The control problem (2.1) corresponds to the design of a globally asymptotically stabilizing control signal u(t), which can be solved by dynamic programming techniques. Defining the value function

  V(x) := inf_{u∈U} J(u; x),   (2.3)

we characterize the solution of the infinite horizon control problem as the unique viscosity solution of the HJB PDE

  min_{u∈U} {∇V(x)^⊤(f(x) + g(x)u) + ℓ(x) + γ|u|²} = 0.   (2.4)

In the unconstrained case U ≡ R, the minimizer u* is expressed in a feedback form

  u*(x) = −(1/(2γ)) g(x)^⊤∇V(x),   (2.5)

which after inserting in (2.4) leads to the HJB equation

  ∇V(x)^⊤f(x) − (1/(4γ))(∇V(x)^⊤g(x))² + ℓ(x) = 0.   (2.6)

This derivation can be extended to controls in R^m with m > 1. Moreover, it can be modified to enforce box constraints in the control action [45], by replacing the control penalty γ|u|² by

  W(u) := 2γ ∫_0^u P^{−1}(s) ds,   (2.7)

where P : R → R is an odd, bounded, integrable, bijective C¹ function. The optimal feedback is given by

  u*(x) = −P((1/(2γ)) g(x)^⊤∇V(x)),   (2.8)

where we can impose lower and upper bound constraints by choosing penalties of the type P(x) = u_max · tanh(x/u_max). Note that such a choice for the control penalty generates a signal which effectively remains inside [−u_max, u_max]; however, it differs from the optimal signal that would be obtained by casting a hard control constraint and replacing (2.8) by the projection

  u*(x) = proj_{[−u_max, u_max]}(−(1/(2γ)) g(x)^⊤∇V(x)).   (2.9)

2.1. An iterative approach for solving nonlinear HJB PDEs. The construction of a numerical scheme for (2.4) begins by dealing with the quadratic nonlinearity in the gradient. We apply the Continuous Policy Iteration developed in [8], a variant of the well-known policy iteration algorithm in dynamic
programming [9,54,3]. Conceptually speaking, given an initial guess u_0(x) for the optimal feedback control, we insert it into (2.4), which then becomes a linear PDE for V(x), whose solution dictates the update of the feedback control via (2.5). Algorithm 2.1 summarizes the main steps of the procedure. The algorithm is equivalent to the application of a Newton-type method for V(x) directly over (2.6). To guarantee the convergence of the policy iteration when solving (2.4) over a subdomain Ω ⊂ R^d, we require an initial feedback map u_0(x) such that J(u_0(x), x) < ∞ over Ω.
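As a minimal illustration of this iteration (a hypothetical scalar toy problem, not the paper's Matlab code), the policy iteration can be carried out in closed form for a one-dimensional linear-quadratic instance of (2.1), where each step reduces to elementary algebra:

```python
import numpy as np

# Scalar LQ instance: dynamics y' = a*y + b*u, cost integral of q*y^2 + gamma*u^2,
# value function V(y) = pi_s * y^2. Illustrative values only.
a, b, q, gamma = 1.0, 1.0, 1.0, 0.1

K = -5.0                          # initial stabilising feedback u = K*y (a + b*K < 0)
for s in range(50):
    # Linearised HJB with the frozen policy u = K*y:
    # 2*pi_s*(a + b*K) + q + gamma*K^2 = 0
    pi_s = -(q + gamma * K**2) / (2.0 * (a + b * K))
    # Feedback update, cf. (2.5): u = -(1/(2*gamma)) * b * V'(y) = -(b*pi_s/gamma) * y
    K_new = -b * pi_s / gamma
    if abs(K_new - K) < 1e-14:
        break
    K = K_new

# The iteration is a Newton method converging to the stabilising root of the
# scalar Riccati equation 2*a*pi - (b^2/gamma)*pi^2 + q = 0.
pi_exact = gamma / b**2 * (a + np.sqrt(a**2 + q * b**2 / gamma))
print(abs(pi_s - pi_exact))
```

The quadratic convergence of this Newton-type interpretation is visible numerically: a handful of iterations reach machine precision.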

Algorithm 2.1 Continuous policy iteration
1: Choose an admissible initial feedback u_0(x), tolerance tol > 0, set s := 1.
2: while error > tol do
3:   Solve the linearized HJB equation ∇V_s(x)^⊤(f(x) + g(x)u_{s−1}(x)) + ℓ(x) + γ|u_{s−1}(x)|² = 0 and update u_s(x) via (2.5).
4:   Set error = ‖V_s − V_{s−1}‖, s := s + 1.
5: end while
6: return (V_s, u_s) ≈ (V*, u*)

3. A tensor decomposition framework for high-dimensional HJB equations. We employ the Galerkin spectral element approximation similarly to [39], except that we construct the basis functions from the Legendre polynomials of bounded maximal individual degree n − 1,

  Φ_i(x) = φ_{i_1}(x_1) ⋯ φ_{i_d}(x_d),   (3.1)

where φ_{i_k}(x_k) are the univariate Legendre polynomials of degree i_k ≤ n − 1, and the multi-indices i = (i_1, …, i_d), j = (j_1, …, j_d). In the s-th iteration of Alg. 2.1, we seek the value function in the form

  V_s(x) = Σ_j v(j) Φ_j(x),   (3.2)

by making the Galerkin residual of the linearized HJB orthogonal to {Φ_i}. This requires solving a system of n^d Galerkin equations in n^d unknowns v(j),

  Σ_j ⟨Φ_i, (f + g u_s) · ∇Φ_j⟩ v(j) = −⟨Φ_i, ℓ + γu_s²⟩,   (3.3)

where ⟨·, ·⟩ is an inner product in L²[−a, a]^d with an appropriately chosen domain size a > 0. Given the tensor product structure of (3.2), its accuracy can be analyzed with univariate polynomial approximation theory [63], with an algebraic error decay rate O(n^{−p}) for V(x) ∈ C^p(Ω), becoming faster than any polynomial rate for V ∈ C^∞(R^d) (see Remark 3.4 below).
3.1. Compressed Tensor Train representation. The coefficients in (3.2) are enumerated by d independent indices, so v can be treated as a d-dimensional tensor.
Throughout the paper, we approximate such tensors by the so-called Tensor Train (TT) decomposition [50],

  v(i_1, …, i_d) = Σ_{α_0=1}^{r_0} ⋯ Σ_{α_d=1}^{r_d} v^{(1)}_{α_0,α_1}(i_1) v^{(2)}_{α_1,α_2}(i_2) ⋯ v^{(d)}_{α_{d−1},α_d}(i_d).   (3.4)

The smaller (3-dimensional) tensors v^{(k)} on the right hand side are called TT blocks, and the new summation ranges r_0, …, r_d are called TT ranks. Notice that when r_0 = r_1 = ⋯ = r_d = 1, the complete separation of variables is attained, as the tensor becomes a product of univariate factors. In general we consider r_1, …, r_{d−1} to be larger than 1. In the case of two variables, the TT decomposition becomes just a dyadic factorisation of a low-rank matrix. The optimal low-rank matrix approximation can be computed using the singular value decomposition (SVD), and a general TT decomposition can be computed for any tensor by d − 1 SVDs with quasi-optimal TT ranks for the given accuracy ε [50]. For convenience we can introduce the maximal TT rank r := max_{k=0,…,d} r_k. Counting the number of unknowns in the TT blocks in (3.4), one can conclude that the TT decomposition needs at most dnr² unknowns. For numerical efficiency, we assume that r can be taken much smaller than the original cardinality n^d for a desired approximation accuracy. This assumption is fulfilled for many relevant cases, as we show next.
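The sequential-SVD construction mentioned above can be sketched in a few lines; the following is a didactic implementation of the TT-SVD idea of [50] (not the code used in the paper), applied to a tensor whose exact TT ranks are known to be 2:

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose a d-dimensional array into TT blocks of shape (r_{k-1}, n_k, r_k)
    by d-1 sequential truncated SVDs, cf. (3.4)."""
    shape = tensor.shape
    d = len(shape)
    norm = np.linalg.norm(tensor)
    blocks, r_prev = [], 1
    C = tensor.reshape(shape[0], -1)
    for k in range(d - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(S > eps * norm)))   # truncated TT rank r_k
        blocks.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = S[:r, None] * Vt[:r]                  # carry the remainder to the right
        r_prev = r
    blocks.append(C.reshape(r_prev, shape[-1], 1))
    return blocks

def tt_full(blocks):
    """Contract the TT blocks back to the full tensor."""
    res = blocks[0]
    for b in blocks[1:]:
        res = np.tensordot(res, b, axes=(res.ndim - 1, 0))
    return res.reshape([b.shape[1] for b in blocks])

# The function x1 + x2 + x3 + x4 sampled on a grid has exact TT ranks 2.
x = np.linspace(0.0, 1.0, 5)
T = (x[:, None, None, None] + x[None, :, None, None]
     + x[None, None, :, None] + x[None, None, None, :])
blocks = tt_svd(T)
print([b.shape[2] for b in blocks])               # TT ranks (last boundary rank is 1)
print(np.linalg.norm(tt_full(blocks) - T))
```

Note how the storage drops from n^d = 625 entries to a handful of TT block entries, while the reconstruction is exact up to the truncation threshold.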
For theoretical analysis, it is convenient to combine (3.4) with (3.2) and to consider the functional TT format [51], to work directly with the TT ranks of a function. Sharp TT rank bounds are usually hard to derive though: the SVD might reveal an optimal decomposition that is difficult to express analytically. It was proven for certain (such as rational) classes of functions [64,58,28], and extensively tested numerically in wider scenarios, that smooth (e.g. analytic) functions exhibit a logarithmic growth of TT ranks, r ∼ log^p(1/ε), to achieve an error ε. As a rationale for using the TT format for the HJB equations, here we show that quadratic value functions admit TT approximations with a similar polylogarithmic convergence rate.

Theorem 3.1. Assume that ℓ(y) = ½ y^⊤Qy, where Q is a symmetric positive definite matrix with the Cholesky decomposition Q = D^⊤D, that the linear system ẏ(t) = Ay(t) + Bu is stabilisable, such that there exists a solution Π ∈ R^{d×d} of the Riccati equation

  A^⊤Π + ΠA − (1/γ)ΠBB^⊤Π + Q = 0,

that the ranks of the off-diagonal blocks of A are bounded by M, and that rank(B) ≤ r_b. Then for any ε ∈ (0, 1) the value function V(x) = x^⊤Πx admits a TT approximation Ṽ(x) with TT ranks growing only polylogarithmically in 1/ε (cf. (3.10) below) and an approximation error proportional to ε.

Proof. For the upper bound we employ [20, Thm 4.2]: for the second order polynomial V(x) = x^⊤Πx with a symmetric matrix Π there exists an exact TT decomposition with the TT ranks governed by the ranks of the off-diagonal blocks of Π,

  r_k(V) ≤ rank(Π_{1:k, k+1:d}) + 2.   (3.5)

This gives an obvious bound min(k, d − k) + 2.
However, there might exist an approximate TT decomposition of lower TT ranks. First, we notice that the Riccati equation can be rewritten as a Lyapunov equation. In fact, using the stabilised matrix [61] A_π := A − (1/γ)BB^⊤Π, and also the Cholesky factor of Q, we obtain

  A_D^⊤Π̂ + Π̂A_D = −(1/γ)(Π̂DB)(Π̂DB)^⊤ − I,   A_D := DA_πD^{−1},   Π̂ := D^{−⊤}ΠD^{−1},   (3.6)

where I is a d × d identity matrix. The left hand side is constructed from the stable matrix A_π. Using the Kronecker product, we can write the Lyapunov equation as a large linear system,

  A vec(Π̂) = −(1/γ) vec((Π̂DB)(Π̂DB)^⊤) − vec(I),   A := I ⊗ A_D^⊤ + A_D^⊤ ⊗ I,   (3.7)

where vec(·) stacks all columns of a matrix into a vector. Now we can use [28, Thm. 9]: there exists an approximate inverse Ã^{−1} of the Kronecker product form, given by a sum of 2R + 1 Kronecker products of matrix exponentials of A_D^⊤, with an approximation error that decays exponentially in √R, with a proportionality constant C > 0 independent of the other parameters in (3.7). Multiplying the approximation Ã^{−1} with the low-rank first term in the right hand side of (3.6), we obtain an incomplete solution of the form vec(Π̃) = Σ_{i=1}^{(2R+1)r_b} p_i ⊗ q_i, and hence the rank of Π̃ is bounded by (2R + 1)r_b. The negative identity matrix in (3.6) yields the second term Π̆ in the ultimate approximate solution Π̂ ≈ Π̃ + Π̆. Since the first term Π̃ is a low-rank matrix, all its off-diagonal blocks (3.5) have low ranks of at most (2R + 1)r_b too. However, Π̆ is a full-rank matrix and we need to investigate its off-diagonal, also called quasi-separable [23], ranks directly. First, we recall [28, Lemma 16] that each matrix exponential in (3.8) can be approximated by a sum of 2k_e + 1 resolvents, with an error (3.9) that decays exponentially in k_e. Since the quasi-separable rank of A_z := zI + (2t_j/λ_max)(A_π + A_π^⊤) coincides with that of A_π, which is M + r_b, and on the other hand it coincides with the quasi-separable rank of the inverse matrix A_z^{−1} [23,29], we can conclude that the approximate quasi-separable rank of exp(−(2t_j/λ_max)(A_π + A_π^⊤)) is bounded by (2k_e + 1)(M + r_b), and the quasi-separable rank of Π̆ in (3.8) is bounded by (2R + 1)(2k_e + 1)(M + r_b).
The approximate value function is constructed as Ṽ(x) = x^⊤D^⊤(Π̃ + Π̆)Dx, and by (3.5) we can estimate its TT rank as

  r_k(Ṽ) ≤ (2R + 1)r_b + (2R + 1)(2k_e + 1)(M + r_b) + 2.   (3.10)

For the error estimate, we have |V(x) − Ṽ(x)| ≤ ‖Π̂ − Π̃ − Π̆‖₂‖Dx‖₂². From (3.7) we obtain a contribution to this error decaying exponentially in √R, for some constant Ĉ > 0, while (3.9) together with [28, Lemma 5] gives a contribution decaying exponentially in k_e, for Č > 0 being some other constant. Choosing R and k_e logarithmically in 1/ε makes both contributions smaller than ε; plugging these bounds into (3.10), we obtain the first estimate of the TT rank.
Remark 3.2. In many cases one can take Q, and hence D, to be diagonal matrices, for example, if the 2-norm of the state vector (corresponding to the L²-norm of the state function) is used in the cost functional. In this case the ranks of the off-diagonal blocks of A_D coincide with those of A.
Remark 3.3. Although Thm. 3.1 is formulated only for linear systems, the remarkable proportionality between the TT ranks of the value function and the off-diagonal ranks of the linearised system matrix seems to hold for some semi-linear systems as well. In particular, we observe from Fig. 4.2 that the TT ranks of the value function for discretised one-dimensional PDEs grow very mildly with the number of variables, which can be also attributed to the growth of the ratios λ_min/λ_max, µ/λ_max. However, when the system is produced from a two-dimensional PDE, the TT ranks grow proportionally to the number of degrees of freedom introduced in each spatial direction (see Fig. 4.7). Thus, the ranks of the actuator matrix and of the off-diagonal blocks of the Jacobian matrix can give a useful hint whether the TT approach might be efficient for the HJB equation of a particular dynamical system of interest.
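The rank behaviour behind Theorem 3.1 and Remark 3.3 can be probed numerically: for a 1D diffusion-like system matrix with a rank-one actuator, the off-diagonal blocks of the Riccati solution Π have rapidly decaying singular values. A small sketch (with an assumed tridiagonal A and rank-one B, not one of the paper's benchmarks):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

d, gamma = 50, 1.0
# Tridiagonal (1D diffusion-like) stable system matrix and a rank-one actuator.
A = (np.diag(-2.0 * np.ones(d)) + np.diag(np.ones(d - 1), 1)
     + np.diag(np.ones(d - 1), -1))
B = np.zeros((d, 1)); B[0, 0] = 1.0

# Stabilising solution of A'P + PA - (1/gamma) P B B' P + Q = 0 with Q = I.
Pi = solve_continuous_are(A, B, np.eye(d), gamma * np.eye(1))

# Singular values of the middle off-diagonal block of Pi decay fast, so its
# quasi-separable rank is small compared with the block size d/2.
s = np.linalg.svd(Pi[: d // 2, d // 2 :], compute_uv=False)
print(s[20] / s[0])
```

The rapid singular-value decay of the off-diagonal block, relative to its full size of 25, is exactly the structure that (3.5) converts into small TT ranks of the quadratic value function.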
Remark 3.4. Alternatively, fast convergence of the TT approximation can be related to the smoothness of the original function [64,58]. For example, it was verified in [15] that the cost functional for the bilinear optimal control problem for the Fokker-Planck equation we present in our numerical results belongs to C^∞ in a neighborhood of the origin. In [44], sufficient conditions for C¹-regularity of the value function on bounded sets are given.
3.2. TT decomposition of the system functions and HJB equation. To take advantage of the TT decomposition of the value function, it is necessary to find also compatible representations of the stiffness matrix and right hand side of the Galerkin HJB equations (3.3). For example, matrices of size n^d × n^d, such as that on the left hand side of (3.3), can be represented in a slightly different matrix TT format, where we separate pairs of row and column indices,

  A(i, j) = Σ_{β_0,…,β_d} A^{(1)}_{β_0,β_1}(i_1, j_1) A^{(2)}_{β_1,β_2}(i_2, j_2) ⋯ A^{(d)}_{β_{d−1},β_d}(i_d, j_d),   β_k = 1, …, R_k.   (3.11)

For complexity estimates we define also the upper bound R := max_{k=0,…,d} R_k. In particular, we need to construct linear parts of the stiffness matrix,

  A_{f_p}(i, j) = ⟨Φ_i, f_p ∂Φ_j/∂x_p⟩_{L²(Ω)},   (3.12)

where f_p(x) is the p-th component of the drift f(x) = (f_1(x), …, f_d(x)), p = 1, …, d.
The integral in (3.12) can be approximated by a tensorised Gauss-Legendre quadrature, e.g.

  A_{f_p}(i, j) ≈ Σ_{k_1,…,k_d=1}^{m} w_{k_1} ⋯ w_{k_d} Φ_i(x_{k_1}, …, x_{k_d}) f_p(x_{k_1}, …, x_{k_d}) ∂_{x_p}Φ_j(x_{k_1}, …, x_{k_d}),

where x_k, w_k, k = 1, …, m, are quadrature nodes on (−a, a) and weights, respectively. For efficiency we always take m = O(n), e.g. m = 2n. Suppose that a TT decomposition of f_p(x_{k_1}, …, x_{k_d}) is given,

  f_p(x_{k_1}, …, x_{k_d}) = Σ_{β_0,…,β_d} f^{(1)}_{β_0,β_1}(x_{k_1}) ⋯ f^{(d)}_{β_{d−1},β_d}(x_{k_d}).   (3.13)

Plugging (3.13) into (3.12) and distributing the summations, we can compute directly the TT blocks of a TT decomposition of the stiffness matrix part, where

  A^{(p)}_{β_{p−1},β_p}(i_p, j_p) = Σ_{k=1}^{m} w_k φ_{i_p}(x_k) f^{(p)}_{β_{p−1},β_p}(x_k) φ'_{j_p}(x_k)   (3.14)

for q = p, and

  A^{(q)}_{β_{q−1},β_q}(i_q, j_q) = Σ_{k=1}^{m} w_k φ_{i_q}(x_k) f^{(q)}_{β_{q−1},β_q}(x_k) φ_{j_q}(x_k)   (3.15)

for q ≠ p. Summing all different components A_{f_p} over p = 1, …, d, one obtains the complete linear part ⟨Φ_i, f · ∇Φ_j⟩_{L²(Ω)}. This summation can be performed in the TT format directly, followed by a TT rank truncation [50], or using the iterative Alternating Linear Scheme approximation (see Sec. 3.3 and [33]). In our numerical calculations we use the latter approach, which requires O(d²n²R²r²) floating point operations. Similarly we can compute the right hand side entries b(i) = −⟨Φ_i(x), ℓ(x) + γu_s(x)²⟩_{L²(Ω)}, as well as nonlinear parts of the stiffness matrix, provided that tensors of nodal values g(x_{k_1}, …, x_{k_d}) and u_s(x_{k_1}, …, x_{k_d}) are approximated by TT decompositions, similarly to (3.13).
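As a sanity check of the univariate building blocks in (3.14)-(3.15), the sketch below assembles one Legendre Galerkin factor with an m = 2n point Gauss-Legendre rule; since the integrands here are polynomials of degree below 2m, the quadrature reproduces the Legendre orthogonality relation exactly (a didactic example, not the paper's assembly code):

```python
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

a, n, m = 3.0, 6, 12            # domain half-width, basis size, quadrature size m = 2n
xq, wq = leggauss(m)            # Gauss-Legendre nodes and weights on (-1, 1)
x, w = a * xq, a * wq           # rescaled to (-a, a)

# Univariate Galerkin factor: M[i, j] = sum_k w_k * phi_i(x_k) * phi_j(x_k),
# with phi_i(x) = P_i(x / a) the scaled Legendre polynomials.
P = np.stack([Legendre.basis(i)(x / a) for i in range(n)])
M = (P * w) @ P.T

# An m-point Gauss rule is exact for polynomial degree <= 2m - 1, so M matches
# the exact orthogonality relation: integral of phi_i phi_j = 2a/(2i+1) delta_ij.
M_exact = np.diag([2.0 * a / (2 * i + 1) for i in range(n)])
print(np.abs(M - M_exact).max())
```

The same pattern, with the weight w_k replaced by w_k f^{(q)}(x_k) and one factor differentiated, produces the blocks (3.14)-(3.15).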
The system functions are approximated in the TT format (3.13) using another iterative procedure, the so-called TT-Cross algorithm [52]. Here we only recall the main idea of the TT-Cross method to illustrate the key operations needed. Any exact TT decomposition, e.g. in the form (3.13), can be recovered from samples of the original tensor by an interpolation formula [57]

  f_p(i_1, …, i_d) = f_p(i_1, I^{>1}) [f_p(I^{≤1}, I^{>1})]^{−1} f_p(I^{≤1}, i_2, I^{>2}) [f_p(I^{≤2}, I^{>2})]^{−1} ⋯ f_p(I^{≤d−1}, i_d),   (3.16)

where I^{≤k} and I^{>k} are left, respectively, right index sets chosen such that the intersection matrices f_p(I^{≤k}, I^{>k}) are invertible. For uniformity of notation, we let I^{≤0} = I^{>d} = ∅. Note that the inverse intersection matrices can be multiplied with the adjacent three-dimensional factors to obtain the TT blocks of (3.13), e.g.

  f^{(k)}(i_k) = f_p(I^{≤k−1}, i_k, I^{>k}) [f_p(I^{≤k}, I^{>k})]^{−1}.   (3.17)
For numerical stability, the inversion is computed via the QR decomposition [52] or the incremental LU decomposition [57].
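In the matrix case (d = 2), formula (3.16) reduces to the classical skeleton (cross) decomposition: a rank-r matrix is recovered exactly from r columns, r rows, and the inverse of their intersection. A minimal sketch, with generically invertible index sets chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 60, 50, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # exact rank r

# Skeleton interpolation, (3.16) for d = 2: any r rows I and r columns J with an
# invertible intersection M[I, J] recover the whole matrix exactly.
I = np.arange(r)
J = np.arange(r)
M_cross = M[:, J] @ np.linalg.solve(M[np.ix_(I, J)], M[I, :])
print(np.abs(M - M_cross).max())
```

Only m·r + r·n + r² entries of M are touched, which is the reason TT-Cross needs just O(dnr²) samples of a function instead of all n^d values.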
In practice, one seeks an approximate decomposition of the form (3.16). In this case it becomes important to find indices that not only give invertible intersection matrices, but deliver a small approximation error. The TT-Cross algorithm optimises the index positions iteratively. In the first step, assume that the right sets I^{>k} are given (for example at random). One can compute the first factor F^{(1)}(i_1, β_1) = f_p(i_1, I^{>1}_{β_1}), which can be seen as an m × R_1 matrix. The smallest approximation error among sampling rank-R_1 approximations is given by such i_1 ∈ I^{≤1} ⊂ {1, …, m} that select the submatrix of maximal volume, i.e. maximal |det F^{(1)}(I^{≤1}, :)|. This set can be found by the maxvol algorithm [26] in O(mR_1²) operations, similarly to the LU decomposition with pivoting.
Assume now that we have the left index set I^{≤k−1} and the right set I^{>k}. We can compute R_{k−1}mR_k elements of the tensor and arrange them as an R_{k−1}m × R_k matrix F^{(k)}((α_{k−1}, i_k), β_k) = f_p(I^{≤k−1}_{α_{k−1}}, i_k, I^{>k}_{β_k}). Now we can apply the maxvol algorithm to F^{(k)} to derive the next index set I^{≤k} as a subset of the union of I^{≤k−1} and i_k. This recursive procedure continues until the last TT block f_p(I^{≤d−1}, i_d) is computed. Moreover, we can reverse it in a similar fashion and carry out several TT-Cross iterations, as shown in Algorithm 3.1. This allows one to optimise all index sets and, consequently, the approximations (3.16), (3.13), even if the initial guess was inaccurate.
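The maxvol search itself can be sketched compactly: starting from rows selected by pivoted QR, it swaps one row at a time as long as some coefficient in the basis of the chosen rows exceeds 1 (up to a tolerance), each swap multiplying |det| of the submatrix by that coefficient. A simplified variant of [26] (recomputing the coefficient matrix in full at each step, unlike the O(mR²) rank-one updates of the original):

```python
import numpy as np
from scipy.linalg import qr

def maxvol(F, tol=1.05, max_iter=500):
    """Pick r rows of the m-by-r matrix F whose submatrix has quasi-maximal |det|."""
    _, _, piv = qr(F.T, pivoting=True)     # pivoted QR gives a nonsingular start
    idx = piv[: F.shape[1]].copy()
    for _ in range(max_iter):
        C = F @ np.linalg.inv(F[idx])      # coefficients of all rows in the chosen basis
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= tol:            # all coefficients <= tol: quasi-maximal volume
            break
        idx[j] = i                         # swapping in row i multiplies |det| by |C[i,j]|
    return idx

rng = np.random.default_rng(1)
F = rng.standard_normal((200, 5))
idx = maxvol(F)
print(np.max(np.abs(F @ np.linalg.inv(F[idx]))))   # bounded by tol at convergence
```

Since each swap increases |det| by a factor of at least tol and the determinant is bounded over finitely many submatrices, the loop terminates; the returned rows are the sampling set used for the intersection matrices in (3.16).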
Algorithm 3.1 TT-Cross [52] iteration for the TT approximation (3.13)
1: for k = 1, …, d − 1 (forward sweep) do
2:   Sample F^{(k)}((α_{k−1}, i_k), β_k) = f_p(I^{≤k−1}_{α_{k−1}}, i_k, I^{>k}_{β_k}).
3:   Apply maxvol algorithm to F^{(k)} to obtain I^{≤k}.
4: end for
5: for k = d, …, 2 (backward sweep) do
6:   Sample F^{(k)}(α_{k−1}, (i_k, β_k)) = f_p(I^{≤k−1}_{α_{k−1}}, i_k, I^{>k}_{β_k}).
7:   Apply maxvol algorithm to (F^{(k)})^⊤ to obtain I^{>k−1}.
8: end for

Remark 3.5. The result of TT-Cross might still depend on the heuristically chosen initial indices. Therefore, we distinguish the stopping threshold (called δ from now on) and the actual approximation error ε in the rest of the paper.
Having computed TT approximations to all components f_p, we construct the matrix TT blocks (3.14)-(3.15). Similarly, we apply the TT-Cross algorithm to construct all components of gu_s, assemble the corresponding parts of the stiffness matrix (3.3) in the TT format, and sum them together.
Now the control signal u ≈ B̃v can be constructed simply as a sum of products of a TT matrix (3.11) and a TT tensor (3.4), again with O(d²n²R²r²) complexity [50].
In the constrained control case, the first step is the same, followed by approximating the pointwise constraint function

  ũ(j_1, …, j_d) = ũ_max tanh((B̃v)(j_1, …, j_d)/ũ_max)   (3.18)

in the TT format, using again the TT-Cross method. Since the TT approximation may over- or undershoot the exact limits by the relative approximation error ε ≤ δ, we seek a slightly tighter bound ũ_max = (1 − δ)u_max.
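A quick numerical check of the saturation (3.18) and the tightened bound ũ_max = (1 − δ)u_max (illustrative values only):

```python
import numpy as np

u_max, delta = 2.0, 1e-6
u_tilde_max = (1.0 - delta) * u_max          # tightened bound absorbing the TT error

x = np.linspace(-100.0, 100.0, 10001)        # raw feedback values, standing in for Bv
u = u_tilde_max * np.tanh(x / u_tilde_max)   # smooth saturation, cf. (3.18)

print(np.abs(u).max())                       # stays strictly inside [-u_max, u_max]
print(u[5001] - x[5001])                     # near zero the map is almost the identity
```

Because tanh is smooth and close to the identity near the origin, the penalty leaves small controls essentially untouched while guaranteeing the box constraint for arbitrarily large arguments.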

3.3. A shifted iterative tensor algorithm for solving (3.3). The policy update solves (3.3) at every iteration by taking the previous iterate of the value tensor v, constructing the control signal, the stiffness matrix and the right hand side, and finally by solving the linear system for the new value tensor approximation. The latter step implies using only iterative methods that can preserve the TT structure of all data. One of the most robust techniques used nowadays is the Alternating Linear Scheme (ALS) [33] and its enhanced version, the Alternating Minimal Energy (AMEn) algorithm [21]. In this section we recall the main idea of those, and also propose a shifted version to make the AMEn method applicable to the stationary HJB PDE.
The ALS is a linear projection method, similarly to the Krylov techniques, but in contrast to the latter it projects the equations onto bases constructed from the TT decomposition of the solution itself. Notice that the TT decomposition (3.4) is linear with respect to the elements of each particular TT block, e.g. v^{(k)}. Given (3.4), let us define a partial TT decomposition where v^{(k)} is replaced by the identity matrix,

  V_{≠k}((i_1, …, i_d), (α_{k−1}, i'_k, α_k)) = [Σ_{α_1,…,α_{k−2}} v^{(1)}_{α_0,α_1}(i_1) ⋯ v^{(k−1)}_{α_{k−2},α_{k−1}}(i_{k−1})] δ_{i_k,i'_k} [Σ_{α_{k+1},…,α_{d−1}} v^{(k+1)}_{α_k,α_{k+1}}(i_{k+1}) ⋯ v^{(d)}_{α_{d−1},α_d}(i_d)].   (3.19)

Clearly, the original TT decomposition (3.4) can be produced from V_{≠k} by just multiplying it with the k-th TT block. Specifically, we introduce the vector form

  v(k) := vec(v^{(k)}) ∈ R^{r_{k−1}nr_k},   (3.20)

and treat V_{≠k} as an n^d × (r_{k−1}nr_k) matrix. One can check that ṽ = V_{≠k}v(k) holds for any k = 1, …, d, where ṽ ∈ R^{n^d} collects the entries of (3.4). Now, assuming that the entire stiffness matrix and the right hand side in (3.3) are assembled in the TT formats (3.11) and (3.4), the ALS method iterates over k = 1, …, d (hence the name alternating), seeking one TT block at a time by making the residual orthogonal to V_{≠k},

  (V_{≠k}^⊤ A V_{≠k}) v(k) = V_{≠k}^⊤ b,   (3.21)

which requires O(dn²r⁴) operations [33]. Here we assume also that R = O(r), which is the case for the quadratic HJB equation. Since the reduced matrix in (3.21) inherits the product structure of the matrix TT decomposition (3.11), we can solve (3.21) iteratively using a fast matrix-vector product with the same cost of O(dn²r⁴).
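The identity ṽ = V_{≠k}v(k) underlying (3.21) can be verified directly for a small random TT tensor, building V_{≠k} explicitly as a Kronecker product (purely didactic: practical TT codes never form V_{≠k} as a dense matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 4, 3
# Random 3D TT decomposition (3.4): blocks of shape (r_{k-1}, n, r_k), r_0 = r_3 = 1.
v1 = rng.standard_normal((1, n, r))
v2 = rng.standard_normal((r, n, r))
v3 = rng.standard_normal((r, n, 1))

# Full vector of the TT tensor, entries ordered as (i1, i2, i3).
t = np.tensordot(np.tensordot(v1, v2, axes=(2, 0)), v3, axes=(3, 0))
v_full = t.reshape(-1)

# Frame matrix V_{!=2}: block 2 replaced by identity, columns indexed (a1, i2, a2).
left = v1.reshape(n, r)        # interface to the left of block 2
right = v3.reshape(r, n)       # interface to the right of block 2
V_ne2 = np.kron(left, np.kron(np.eye(n), right.T))

print(np.linalg.norm(v_full - V_ne2 @ v2.reshape(-1)))
```

With this identity in hand, substituting ṽ = V_{≠k}v(k) into Aṽ = b and testing against the columns of V_{≠k} gives exactly the reduced system (3.21).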
In the AMEn method [21], the TT blocks are also enriched with auxiliary vectors, such as approximate residuals, which gives a mechanism for increasing TT ranks and adapting them to the desired accuracy. Consider an auxiliary TT decomposition z ≈ b − Aṽ of the residual; projecting it onto the first k − 1 TT blocks of the solution yields a block z^{(k)}(i_k) with ρ_k columns (3.22). After solving (3.21) and updating (3.20), we expand v^{(k)} and v^{(k+1)}, increasing their sizes (the corresponding TT ranks) by ρ_k,

  v^{(k)}(i_k) ← [v^{(k)}(i_k)  z^{(k)}(i_k)],   v^{(k+1)}(i_{k+1}) ← [v^{(k+1)}(i_{k+1}); 0].   (3.23)

Note that this step does not perturb the whole solution tensor ṽ, due to the zero block in v^{(k+1)}, but the subspace of columns of V_{≠k+1} in the next step is enriched due to the residual block z^{(k)}. To reduce the TT rank, we can truncate v^{(k)} using the SVD. Finally, we can orthogonalise TT blocks using QR decompositions [50] such that

  Σ_{i_q} v^{(q)}(i_q)^⊤ v^{(q)}(i_q) = I for q < k,   Σ_{i_q} v^{(q)}(i_q) v^{(q)}(i_q)^⊤ = I for q > k.

This makes the projection matrix V_{≠k} orthogonal as well, which improves the numerical stability of the algorithm. If the matrix A was symmetric positive definite, the projected system (3.21) could be rigorously related to the energy optimization problem and the nonlinear block Gauss-Seidel method. In our problem (3.3) this is not the case: A is nonsymmetric and degenerate due to the gradient operator, which annihilates any constant component in the solution. In this case, the degenerate reduced matrix in (3.21) can prevent convergence.
However, A is compatible with the right hand side under the condition ℓ(0) = u_s(0) = 0. Moreover, the eigenvalues of A are located in the right half of the complex plane for a suitable choice of the domain size a and polynomial order n. Here we propose a modification of AMEn to resolve both issues by solving shifted systems, mimicking the implicit Euler time propagation. We introduce a shift µ > 0, and solve

  (A + µI)v = b + µv̄,

where v̄ is the previous iterate of v. In practice, we combine this shifted AMEn solver and the policy updates into a single iteration, as shown in Alg. 3.2.

Algorithm 3.2 Policy update with shifted AMEn linear solver
1: Choose initial value tensor v, shift µ > 0, stopping threshold δ > 0, previous iterate v̄ = 0, shift reduction factor 0 < q < 1.
2: while ‖v − v̄‖ > δ‖v‖ do {policy iteration}
3:   Update the control u_s from v via (2.5), or via (3.18) in the constrained case, using TT-Cross.
4:   Assemble the stiffness matrix A and the right hand side b in the TT format (Sec. 3.2).
5:   Set v̄ := v and reduce the shift, µ := qµ.
6:   for k = 1, …, d do {shifted AMEn sweep}
7:     Orthogonalise the TT blocks and form the projection V_{≠k}.
8:     Assemble and solve (V_{≠k}^⊤(A + µI)V_{≠k}) v(k) = V_{≠k}^⊤(b + µv̄).
9:     Update v^{(k)} via (3.20) and truncate r_k using SVD up to accuracy δ.
10:    Enrich v^{(k)}, v^{(k+1)} with the residual block z^{(k)}.
11:  end for
12: end while
If Re λ(A) ≥ 0, then the spectral radius of the transition matrix µ(A + µI)^{−1} is less than 1 for any µ > 0. On the other hand, if µ > −v^⊤Av for any v with ‖v‖₂ = 1, then the reduced matrix V_{≠k}^⊤AV_{≠k} + µI (remember that V_{≠k}^⊤V_{≠k} = I) is invertible. This gives the freedom to choose µ such that the method remains stable and converges fast enough. In practice we need to ensure µ > −v^⊤Av only for those v that belong to span V_{≠k}. It turns out that as the solution converges, we can decrease µ geometrically (in particular, we multiply it by a factor q = 0.98 in each iteration), which ensures faster convergence near the end of the process.
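The contraction argument can be checked on a tiny nonsymmetric system with eigenvalues in the right half-plane (an illustrative 2×2 example, not one of the paper's operators):

```python
import numpy as np

A = np.array([[2.0, -5.0], [1.0, 0.5]])   # eigenvalues 1.25 +/- 2.11i, Re > 0
b = np.array([1.0, 2.0])
mu0 = 10.0

# Spectral radius of the transition matrix mu*(A + mu*I)^{-1} is below 1.
rho = np.max(np.abs(np.linalg.eigvals(mu0 * np.linalg.inv(A + mu0 * np.eye(2)))))
print(rho)

v, mu, q = np.zeros(2), mu0, 0.98
for _ in range(2000):
    v = np.linalg.solve(A + mu * np.eye(2), b + mu * v)  # shifted step of Alg. 3.2
    mu *= q                                              # geometric shift reduction

print(np.linalg.norm(A @ v - b))   # the fixed point solves the unshifted system
```

As µ shrinks geometrically, the contraction factor µ/|λ + µ| tends to zero, which is the "faster convergence near the end of the process" observed above.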

4. Optimal feedback control of nonlinear PDEs. High-dimensional nonlinear control systems naturally arise when the dynamics of the system are governed by partial differential equations. From a dynamical perspective, PDEs correspond to abstract, infinite-dimensional systems, and therefore the HJB synthesis is understood over an infinite-dimensional state space. Computationally, the treatment is based on the so-called method of lines [65]. Given an evolutionary PDE, we discretize the space dependence either by finite differences/elements or spectral methods, leading to a large-scale dynamical system with as many state variables as the space discretization dictates. We perform the HJB synthesis over this finite but high-dimensional system, as summarized in Fig. 4.1. The accuracy of such a representation and its implications for the control design vary depending on the class of PDEs under consideration. Strongly dissipative PDEs can be accurately represented with few degrees of freedom in space, while convection-dominated PDEs might require a much more complex state space representation. Therefore, the performance study of our framework with respect to the dimension parameter is central to our analysis. It benefits from the fact that, unlike in the nonlinear ODE world, the taxonomy of physically meaningful nonlinearities in time-dependent PDEs is well-delimited. We focus on nonlinear reaction PDEs, where we take the Allen-Cahn equation as a reference model due to its rich equilibrium structure, and on nonlinear convection in the Fokker-Planck equation, where the control action enters as a bilinear term. The semi-discretization in space of these nonlinearities leads to well-structured finite-dimensional realizations [39] which allow a systematic analysis of the scaling of our methodology with respect to the dimension.
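The method-of-lines reduction can be sketched for the simplest dissipative example, the heat equation (a toy setup with homogeneous Dirichlet conditions, unrelated to the specific benchmarks below):

```python
import numpy as np

# Method of lines: semi-discretising x_t = sigma * x_xx on (-1, 1) in space turns
# the PDE into a d-dimensional ODE system x'(t) = A x(t), to which finite-dimensional
# control machinery (LQR, HJB) can then be applied.
d, sigma = 50, 0.2
xi = np.linspace(-1.0, 1.0, d + 2)[1:-1]       # interior grid points
h = xi[1] - xi[0]
A = sigma / h**2 * (np.diag(-2.0 * np.ones(d))
                    + np.diag(np.ones(d - 1), 1) + np.diag(np.ones(d - 1), -1))

x0 = np.cos(np.pi * xi / 2.0)                  # smooth initial state
x, dt = x0.copy(), 0.01
for _ in range(1000):                          # implicit Euler up to t = 10
    x = np.linalg.solve(np.eye(d) - dt * A, x)

print(np.linalg.norm(x) / np.linalg.norm(x0))  # uncontrolled dissipative decay
```

Here the uncontrolled dynamics are already stable and decay on their own; the interesting cases below are unstable (Allen-Cahn) or bilinear (Fokker-Planck), where a feedback law is actually required.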
A Matlab implementation of Algorithm 3.2 and the numerical examples benchmarked below are available at https://github.com/dolgov/TT-HJB. All tests were carried out in Matlab 2017b on one core of an Intel Xeon E5-2640 v4 CPU at 2.4 GHz.

4.1. The Allen-Cahn equation. We consider the following nonlinear diffusion-reaction equation [39]:

in [−1, 1] × R_+ with Neumann boundary conditions. We set σ = 0.2, and the scalar control signal u(t) acts through the indicator χ_ω of the subdomain ω = [−0.5, 0.2]. This equation has x = 0 as an unstable equilibrium and x = ±1 as stable equilibria. The cost is chosen such that the control objective consists in stabilizing the unstable equilibrium; for concreteness we take γ = 0.1. Equation (4.1) is discretized by a Chebyshev pseudospectral collocation method [63] using d points ξ_k = −cos(πk/(d + 1)), k = 1, . . ., d. The discrete state is collected into a vector of nodal values, leading to the system (4.3), where "∘" is the coordinatewise Hadamard product, A is the pseudospectral differentiation matrix corresponding to the Laplace operator, and B is a vector corresponding to the pseudospectral discretisation of the indicator function χ_ω(ξ). The HJB equation solver is applied directly to (4.3), restricting the domain of the value function to (−3, 3)^d, which is sufficient to accommodate typical initial states. We compare our design against the linear-quadratic regulator (LQR) feedback law [60, Chapter 8] computed for the dynamics in (4.3) linearised around the origin, A_L = A + I.
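The pseudospectral differentiation matrix can be assembled with the classical formula. The sketch below builds the standard matrix on the Gauss-Lobatto points x_j = cos(jπ/N), following Trefethen's well-known cheb construction; the paper instead works with the interior points ξ_k = −cos(πk/(d + 1)) and folds the Neumann conditions into the matrix, which this illustrative sketch does not reproduce.

```python
import numpy as np

def cheb(N):
    """Chebyshev differentiation matrix D and Gauss-Lobatto nodes
    x_j = cos(j*pi/N), j = 0..N (cf. Trefethen, Spectral Methods in MATLAB)."""
    if N == 0:
        return np.zeros((1, 1)), np.array([1.0])
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
    X = np.tile(x, (N + 1, 1)).T
    dX = X - X.T
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))     # negative-sum trick for the diagonal
    return D, x

D, x = cheb(8)
# Spectral differentiation is exact on low-degree polynomials:
print(np.max(np.abs(D @ x**2 - 2 * x)))   # ~ machine precision
# Applying D twice gives a dense approximation of the Laplace operator:
A = D @ D
```

Note that, unlike the sparse finite-difference matrices, the pseudospectral matrix A is dense, which is the price paid for spectral accuracy with few collocation points.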
Algorithmic performance for the one-dimensional Allen-Cahn equation. We first investigate the performance of Algorithm 3.2 with respect to the dimension of the dynamical system. We fix n = 5, δ = 10^{-3}, and the initial shift in Alg. 3.2 to µ = 50. In Fig. 4.2 we observe that the maximal TT rank grows linearly with the dimension. The O(dn^2 r^4) complexity of the AMEn method then leads to a total cost bound of order d^5. However, the effective cost is closer to O(d^4) (Fig. 4.2, right), which can be attributed to a non-uniform distribution of TT ranks along the decomposition. This is a significant reduction compared to the exponential cost n^d of the full Cartesian ansatz. However, the method can become slow in very high dimensions, mainly due to the increase in the number of policy iterations shown in Fig. 4.2 (left), which results from a larger condition number of the linearised system.
A possible remedy is to construct the value function for a lower-dimensional discretisation of the PDE, and to interpolate the state of a system with finer discretisation onto this lower-dimensional spatial grid. The resulting error, proportional to the discretisation error of the lower-dimensional grid, may still be much smaller than the error resulting from, e.g., the linearisation of the system. Fig. 4.3 shows the controlled dynamics for the initial state x(ξ, 0) = 2 + cos(2πξ) cos(πξ); the right panel shows the corresponding control signals. Since the linearized system is unstable, the LQR acts very aggressively during the transient phase. The HJB synthesis is able to detect the stabilizing effect of the nonlinearity and produces a control at much lower cost. We observe differences in the control signals and total costs for the HJB laws depending on the dimension of the dynamical system, justifying the need for accurate high-dimensional HJB solvers.
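The remedy only requires restricting the fine-grid state to the coarse grid before each feedback evaluation. A minimal sketch with hypothetical names (linear interpolation via np.interp on uniform grids stands in for the barycentric interpolation one would use with Chebyshev nodes):

```python
import numpy as np

def restrict_then_evaluate(feedback_coarse, xi_fine, xi_coarse):
    """Wrap a feedback law synthesised on a coarse spatial grid so that it
    accepts fine-grid states: interpolate the state onto the coarse nodes,
    then evaluate the coarse-grid law."""
    def feedback(x_fine):
        x_coarse = np.interp(xi_coarse, xi_fine, x_fine)
        return feedback_coarse(x_coarse)
    return feedback

# Hypothetical example: a linear feedback u = -k^T x on 7 coarse nodes.
xi_fine = np.linspace(-1.0, 1.0, 41)
xi_coarse = np.linspace(-1.0, 1.0, 7)
k = np.ones(7)
F = restrict_then_evaluate(lambda xc: -k @ xc, xi_fine, xi_coarse)
u = F(np.cos(np.pi * xi_fine))    # feedback value for a fine-grid state
```

The wrapper lets a single value function, computed once in a moderate dimension, control simulations at any finer spatial resolution.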

Now we analyse the performance of the TT-HJB scheme depending on the number of Legendre polynomials n in each variable (Fig. 4.4) and on the stopping threshold δ in Alg. 3.2 (Fig. 4.5). Due to the nonlinearity in (4.3), the value function is far from a quadratic polynomial, which is reflected in Fig. 4.4 by the linear growth of TT ranks with n and a relatively slow algebraic decay of the error. Nevertheless, even an order-4 approximation can give a substantially better control signal than the LQR approximation, see Fig. 4.3. From Fig. 4.5 we see that the number of iterations and the TT ranks depend logarithmically on the TT approximation error, which is a more optimistic result than that predicted by Theorem 3.1, even though the problem is nonlinear. The errors in the total cost start higher than the threshold δ, but eventually the two error indicators are of the same order.
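For concreteness, evaluating a value function stored as a TT ansatz with a Legendre basis in each variable reduces to d small matrix products, one per variable. A minimal standalone sketch (illustrative only, not the paper's implementation; cores G_k are assumed to have shape (r_{k−1}, n, r_k)):

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def tt_legendre_eval(cores, x):
    """Evaluate V(x) = sum over all indices of
    G_1[j_1] ... G_d[j_d] * P_{j_1}(x_1) ... P_{j_d}(x_d),
    where cores[k] has shape (r_{k-1}, n, r_k) and P_j are Legendre
    polynomials.  Cost per point: O(d n r^2) instead of O(n^d)."""
    w = np.ones((1, 1))
    for k, G in enumerate(cores):
        n = G.shape[1]
        phi = legvander(np.atleast_1d(x[k]), n - 1)[0]   # P_0..P_{n-1} at x_k
        w = w @ np.einsum('ijk,j->ik', G, phi)           # contract core k
    return w[0, 0]

# Rank-1 toy tensor: V(x1, x2) = P_1(x1) * P_2(x2).
G1 = np.zeros((1, 3, 1)); G1[0, 1, 0] = 1.0
G2 = np.zeros((1, 3, 1)); G2[0, 2, 0] = 1.0
v = tt_legendre_eval([G1, G2], [0.5, 0.5])
# P_1(0.5) * P_2(0.5) = 0.5 * (3*0.25 - 1)/2 = -0.0625
```

The linear-in-d evaluation cost is what makes the TT-Legendre ansatz usable inside a closed-loop simulation, where the feedback must be evaluated at every time step.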
Allen-Cahn problem with control constraints. As discussed in Section 2, the proposed framework allows us to enforce control constraints through a suitable choice of the control penalties in (2.8).

Allen-Cahn problem with two-dimensional space. We study the extension of the problem (4.1) to two spatial dimensions, with the state depending on two space coordinates, x(ξ, t) = x(ξ_1, ξ_2, t). We replace the second derivative with the Laplace operator, and the control is applied on the subdomain ω = [−0.5, 0.2]^2. We use the Cartesian product of the same Chebyshev grids in each direction, with homogeneous Neumann conditions on the boundary of Ω = [−1, 1]^2. The CPU times and TT ranks are shown in Fig. 4.7 (left). Although Theorem 3.1 is not immediately applicable to a nonlinear system, we still observe a linear growth of the TT ranks with the number of Chebyshev points in each direction. The values of the TT ranks are larger than those in the one-dimensional case, leading to increased computing times. Nevertheless, the performance of the high-dimensional HJB controller is satisfactory. Figures 4.7 (right) and 4.8 show the response of the system to the initial state x(ξ, 0) = 2 + cos(2πξ_1) cos(πξ_2). We can see again that the HJB-controlled state is stabilized while the LQR synthesis fails.
where the computational domain is set to Ω = (−6, 6). This equation models the density of particles controlled by a laser-induced electric force with potential G(x) + u(t)H(x) [32], with G the ground potential and H the control potential. We refer to [7] for a recent survey further illustrating the importance of PDF-based methods for the control of the Fokker-Planck equation, including the control of agent-based dynamics [2]. This system has 0 as an eigenvalue with associated eigenstate x_∞ = exp(−(log ν + G/ν)), see e.g. [16]. Henceforth the eigenstates are considered normalized in L^2(Ω). It is known that x_∞ is stable, but the convergence to this steady state, whose rate is given by the second eigenvalue and depends on ν and G, can be extremely slow, see for instance [46, p. 251]. Thus, to speed up convergence in the transient phase, control is important. To obtain a suitable stabilization problem we introduce the shifted state y = x − x_∞, which satisfies the shifted equation (4.5). The control objective now consists in driving y to zero. To compute the controller we further introduce a positive, i.e. destabilising, shift by adding σy to the right-hand side of (4.5). If this controller is applied to the unshifted equation, it accelerates the convergence of y to 0 and hence the convergence of x to x_∞.
Considering the variational form of (4.5), one observes that the control has no effect on a subspace of codimension one. For this reason we introduce Y_P = {v ∈ L^2(Ω) : ∫_Ω v dξ = 0}, and denote by P ∈ L(L^2(Ω), Y_P) the projection onto Y_P along x_∞, which is given by Py = y − (∫_Ω y dξ) x_∞. Subsequently we apply P to (4.5) with initial datum Px(0). For the details we refer to [15].
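Numerically, P is just a rank-one correction of the identity. A standalone NumPy sketch using trapezoidal quadrature on a uniform grid (with a Gaussian as a stand-in steady state, not the paper's x_∞, and assuming x_∞ is rescaled so that ∫_Ω x_∞ dξ = 1, which makes P idempotent):

```python
import numpy as np

def make_projection(xi, x_inf):
    """Discrete version of  P y = y - (int_Omega y dxi) * x_inf  on a
    uniform grid, using trapezoidal quadrature weights.  Rescales x_inf
    so that its integral is 1, which makes P a projection (P^2 = P)
    onto the zero-mean subspace Y_P along x_inf."""
    h = xi[1] - xi[0]
    w = np.full(xi.size, h)
    w[0] = w[-1] = h / 2.0                 # trapezoid weights
    x_inf = x_inf / (w @ x_inf)            # normalise the kernel direction
    def P(y):
        return y - (w @ y) * x_inf
    return P, w

xi = np.linspace(-6.0, 6.0, 401)
x_inf = np.exp(-xi**2 / 2.0)     # stand-in steady state (not the paper's)
P, w = make_projection(xi, x_inf)
Py = P(np.cos(xi) + 0.3)         # project an arbitrary state onto Y_P
```

By construction the projected state has zero discrete integral, so the mass constraint of the Fokker-Planck dynamics is preserved exactly at the discrete level.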
The Fokker-Planck equation (4.5) is discretized using a finite difference scheme with D intervals. To allow for a further reduction of the dimension, a balanced-truncation-based model order reduction adapted to bilinear systems [10] is used to reduce the system to dimension d.
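The reduction step can be outlined with standard square-root balanced truncation for a linear system; the paper uses a generalisation to bilinear systems [10], which replaces the Lyapunov equations below with generalised ones, so the following standalone SciPy sketch is indicative only (the toy system is our own):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

def balanced_truncation(A, B, C, d):
    """Square-root balanced truncation of a stable linear system (A, B, C)
    to dimension d.  Returns the reduced matrices and the Hankel singular
    values, which indicate how much of the input-output map is kept."""
    P = solve_continuous_lyapunov(A, -B @ B.T)     # controllability Gramian
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # observability Gramian
    Lp = cholesky(P, lower=True)                   # P = Lp Lp^T
    Lq = cholesky(Q, lower=True)                   # Q = Lq Lq^T
    U, s, Vt = svd(Lq.T @ Lp)                      # Hankel singular values s
    sq = np.sqrt(s[:d])
    T = Lp @ Vt[:d].T / sq                         # right projector
    W = Lq @ U[:, :d] / sq                         # left projector, W^T T = I
    return W.T @ A @ T, W.T @ B, C @ T, s

# Toy stable SISO system with well-conditioned Gramians.
A = -np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.ones((6, 1))
C = np.ones((1, 6))
Ar, Br, Cr, hsv = balanced_truncation(A, B, C, d=3)
```

Truncating where the Hankel singular values have decayed keeps the reduced dimension d, and hence the dimension of the HJB equation, as small as the desired accuracy allows.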
For the numerical results we fix γ = 10^{-2}, ν = 1, σ = 0.2, and the potentials G(ξ) and H(ξ) are chosen to reproduce the setting in [16], as shown in Fig. 4.9: the control potential H(ξ) is defined piecewise, with the disjoint intervals joined by a Hermite interpolant. Since both the original system size D and the reduced dimension d are approximation parameters, we need to set them to values that deliver the desired accuracy in the model outcomes, such as the total cost. Note also that, in contrast to the linear case, the generalized balanced truncation method for bilinear systems does not come with an a priori error bound [10].
In Fig. 4.10 (left) we study the total cost of the LQR-stabilised system. Here we initialise the Fokker-Planck system with the density function of the uniform distribution on [−6, 6]. We can deduce that an absolute error of about 10^{-3} in the total cost is attained. It is known [16] that the free system exhibits a very slow convergence to equilibrium when started from a right-sided distribution, since the particles must flow through a region of low probability. In Fig. 4.12 we show the components of the running cost for the original unshifted system, both uncontrolled and controlled with the HJB and LQR signals obtained for the shifted system. We see that the free system converges at a slow rate ‖x‖^2 ∼ exp(−0.29t), while the controller computed for the destabilised system accelerates this rate by almost a factor of 2. Note that when the HJB controller is computed for the original system (σ = 0), it accelerates the convergence only slightly, so the shift is important to achieve the speedup. However, larger shifts make the HJB equation more difficult to solve. In particular, for larger shifts σ and larger state domain sizes a, the stiffness matrix in (3.3) might become indefinite, and the policy iteration fails to converge. The domain size should be large enough to contain the trajectory; e.g., for the right-sided initial state a domain size of a = 20 is necessary to avoid excessive extrapolation of the Legendre polynomials. This poses certain limitations on the range of possible applications of the TT-HJB approach. Nevertheless, when the policy iteration converges, the HJB regulator can deliver a lower cost than the LQR.

5. Conclusion. We presented a numerical method for the solution of high-dimensional HJB PDEs arising in optimal feedback control for nonlinear dynamical systems. Our algorithm combines a continuous policy iteration with a tensor-train ansatz for the value function. An important matter of investigation is the identification of a class of optimal control problems where the value function can be accurately represented with a low-rank tensor train structure. For the class of optimal control problems explored in this work, consisting of systems governed by nonlinear parabolic PDEs, we consistently showed that the maximum TT rank in the value function approximation scales linearly with the dimension. This allows us to circumvent the curse of dimensionality to a great extent, solving HJB PDEs with more than 100 dimensions. Control constraints are effectively enforced through penalties, despite larger TT ranks of the value function. The applications of the proposed methodology are extensive. Here we explored the synthesis of feedback control laws for high-dimensional dynamics arising from the semi-discretisation of nonlinear PDEs. However, high-dimensional dynamics also play a crucial role in aerospace engineering [13], networks and agent-based models [2]. Finally, our methodology, based on spectral approximation and tensor calculus, opens possibilities for a new error analysis.

Fig. 4.1. Flowchart illustrating the proposed approach to optimal feedback control of PDEs. The optimal feedback synthesis for the abstract PDE dynamics leads to an infinite-dimensional HJB equation. By using a pseudospectral discretization in space, we obtain a finite-dimensional approximation of the original PDE dynamics, for which the optimal feedback synthesis is realized by solving a finite-dimensional HJB PDE. At this level, we apply our tensor decomposition-based HJB solver, which ultimately leads to a finite-dimensional optimal feedback law approximating the optimal law for the abstract dynamics.

Fig. 4.2. Allen-Cahn problem (4.1). Left: numbers of policy iterations and maximal TT ranks. Right: differences in total running cost and CPU times for different spatial dimensions d. Numbers above points in the right plot denote d. Other settings: n = 5 and δ = 10^{-3}.

Fig. 4.4. Allen-Cahn problem (4.1) with δ = 10^{-3} and d = 14. Left: numbers of policy iterations and maximal TT ranks. Right: differences in total running cost and CPU times for different univariate polynomial degrees n. Numbers above points in the right plot denote n.

Fig. 4.5. Allen-Cahn problem (4.1) with n = 5 and d = 14. Left: numbers of policy iterations and maximal TT ranks. Right: differences in total running cost and CPU times for different TT approximation thresholds δ. Numbers above points in the right plot denote δ.

Figure 4.6 (left) shows the total CPU times and TT ranks of the constrained feedback law. Figure 4.6 (right) presents the control signals for three bound parameters. As P(x) becomes steeper for more severe control constraints, the TT ranks increase, leading to longer computing times. Nevertheless, Alg. 3.2 remains effective for a wide range of constraints, adjusting the value function accordingly.

4.2. The Fokker-Planck equation. We compute optimal feedback regulators for the stabilised bilinearly controlled Fokker-Planck equation

Fig. 4.10. Errors in the total cost in the Fokker-Planck model with the uniform initial state x(ξ, 0) = 1/12 for different numbers of discretisation points (left) and different dimensions of the reduced model (right).
Efficient practical implementation exploits the fact that (3.19) mimics the original TT decomposition (3.4) in the sense that the same original indices i_1, ..., i_d are separated. Therefore, given (3.11), the reduced matrix V_{≠k}^T A V_{≠k} can be computed block by block in a total of O(dn^2 r^4) operations. Notice that the reduced system (3.21) is of size r_{k−1} n r_k, i.e. much smaller than the original system (3.3). Once we solve (3.21), we populate the TT block v^{(k)} with the elements of its solution through (3.20), and use the updated v^{(k)} to construct (3.19) in the next step k → k + 1 or k → k − 1.