An abstract analysis of optimal goal-oriented adaptivity

We provide an abstract framework for optimal goal-oriented adaptivity for finite element methods and boundary element methods in the spirit of [Carstensen et al., Comput. Math. Appl. 67 (2014)]. We prove that this framework covers standard discretizations of general second-order linear elliptic PDEs and hence generalizes the available results [Mommer & Stevenson, SIAM J. Numer. Anal. 47 (2009); Becker et al., SIAM J. Numer. Anal. 49 (2011)] beyond the Poisson equation.

1. Introduction

1.1. State of the art & contributions. This work considers the simultaneous adaptive control of two error estimators η_{u,⋆} and η_{z,⋆} which satisfy certain abstract axioms from Section 2.4 below. The estimator product η_{u,⋆} η_{z,⋆} is designed to control the error in goal-oriented adaptivity and allows us to prove optimal error decay for the goal functional. This is discussed in Section 1.2 and demonstrated in Sections 4-6 for various model problems. We analyze three adaptive mesh-refinement algorithms (Algorithms A, B, C) which allow optimal convergence rates for the estimator product in the sense that each possible algebraic convergence rate r > 0 will be achieved, i.e., η_{u,ℓ} η_{z,ℓ} ≲ (#T_ℓ − #T_0)^{−r} for all ℓ ∈ N, without any a priori knowledge of the optimal rate r_opt. Here, the triangulations (T_ℓ)_{ℓ∈N_0} are generated by the respective adaptive algorithm, starting from some given initial triangulation T_0.
While standard adaptivity aims to approximate some unknown exact solution u at optimal rate in the energy norm (see, e.g., [18,23,42] for adaptive FEM, [21,22,24,26] for adaptive BEM, and [15] for a recent overview of available results), goal-oriented adaptivity aims to approximate, at optimal rate, only the functional value g(u) (also called the quantity of interest in the literature). Goal-oriented adaptivity is usually more important in practice than standard adaptivity. It has therefore attracted much interest also in the mathematical literature; see, e.g., [6,8,9,19,28,31,40] for some prominent contributions. However, as far as convergence and quasi-optimality of goal-oriented adaptivity are concerned, the only earlier results are the two works [7,37], which are concerned with the Poisson model problem, and the work [29], which considers general second-order linear elliptic PDEs but is concerned with convergence only. Moreover, the analytical arguments of [7,37] are tailored to the Poisson equation and do not directly transfer to the more general setting of [29]. The quasi-optimality analysis for goal-oriented adaptivity has also been named as an open problem in the recent work [13]. In view of this, the contributions and advances of the present work can be summarized as follows:
• We give an abstract analysis for optimal goal-oriented adaptivity which applies to general (non-symmetric) second-order linear elliptic PDEs in the spirit of [23] and which even extends the problem class of [29].
• The analysis avoids any (discrete) efficiency estimate and thus allows for simple newest vertex bisection, while [7,37] require local bisec5-refinement in the spirit of [42].
• Unlike [29], our analysis does not enforce the condition that the initial triangulation T_0 is sufficiently fine, since we do not exploit the regularity of the dual solution to prove some crucial quasi-Galerkin orthogonality.
• Finally, our analysis does not only cover the finite element method (FEM), but also applies to the boundary element method (BEM).
Related recent work includes [20], where the goal is the evaluation of point errors in symmetric BEM computations. Although we shall verify the mentioned estimator axioms only for standard FEM and BEM discretizations, we expect that they can also be verified for discretizations in the frame of isogeometric analysis; see, e.g., [34] for some goal-oriented adaptive IGAFEM.
1.2. Goal-oriented adaptivity in the frame of the Lax-Milgram lemma. The following introduction covers the main application of the abstract theory that we have in mind. Let X be a Hilbert space with norm ‖•‖_X, and let a(•, •) : X × X → R be a continuous and elliptic bilinear form on X. For a given linear and continuous functional f ∈ X*, let u ∈ X be the unique solution of

a(u, v) = f(v) for all v ∈ X.    (1)

Let g ∈ X* be the so-called goal functional, i.e., we aim to approximate g(u) at optimal rate. To this end, suppose that associated with each triangulation T_⋆ of some problem-related domain Ω ⊂ R^d, there is a finite-dimensional subspace X_⋆ ⊆ X. Let U_⋆ ∈ X_⋆ be the unique Galerkin approximation of u which solves

a(U_⋆, V_⋆) = f(V_⋆) for all V_⋆ ∈ X_⋆.    (2)

Furthermore, let z ∈ X be the unique solution to the so-called dual problem

a(v, z) = g(v) for all v ∈ X.    (3)

Then, for any Z_⋆ ∈ X_⋆, the Galerkin orthogonality for the primal problem yields

|g(u) − g(U_⋆)| = |a(u − U_⋆, z)| = |a(u − U_⋆, z − Z_⋆)| ≲ ‖u − U_⋆‖_X ‖z − Z_⋆‖_X.    (4)

Here and throughout, ≲ abbreviates ≤ up to some generic multiplicative factor C > 0 which is clear from the context; e.g., the hidden constant in (4) is the continuity bound of a(•, •). Suppose that we compute the unique Galerkin approximation Z_⋆ ∈ X_⋆ of the dual solution z ∈ X, i.e.,

a(V_⋆, Z_⋆) = g(V_⋆) for all V_⋆ ∈ X_⋆,    (5)

and that the Galerkin errors on the right-hand side of (4) can be controlled by computable a posteriori error estimators, i.e.,

‖u − U_⋆‖_X ≤ C_rel η_{u,⋆} and ‖z − Z_⋆‖_X ≤ C_rel η_{z,⋆}.    (6)

Under these assumptions, we are altogether led to

|g(u) − g(U_⋆)| ≲ η_{u,⋆} η_{z,⋆}.    (7)

Overall, we thus aim for some adaptive algorithm which drives the computable upper bound on the right-hand side of (7) to zero with optimal rate.

Remark 1. Using the residual R_{u,⋆}(v) := f(v) − a(U_⋆, v) ∈ X*, one can improve the hidden constants in (4), since |g(u) − g(U_⋆)| = |R_{u,⋆}(z − Z_⋆)| ≤ ‖R_{u,⋆}‖_{X*} ‖z − Z_⋆‖_X. Some a posteriori error estimators (such as, e.g., estimators based on flux equilibration [44]) allow for the reliability estimate ‖R_{u,⋆}‖_{X*} ≤ C_rel η_{u,⋆} even with known constant C_rel = 1. The same arguments apply to the dual residual R_{z,⋆}(v) := g(v) − a(v, Z_⋆).

1.3. Outline. In Section 2, we propose three algorithms which are analyzed in the sections below, define the used mesh-refinement strategy, and outline the main result. Moreover, we provide the abstract framework in terms of four axioms for the estimators. Section 3 proves optimal convergence rates for each adaptive algorithm. In Section 4, we apply the abstract theory to conforming goal-oriented FEM for second-order elliptic PDEs. Section 5 covers goal-oriented FEM for the evaluation of some weighted boundary flux, whereas Section 6 applies the abstract theory to goal-oriented adaptivity for BEM. The final Section 7 discusses our results and points at extensions and open questions.
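To make the duality argument concrete, the following self-contained toy computation (a 1D analogue under our own assumptions, not the paper's setting) solves −u'' = 1 on (0,1) with P1 FEM and uses the hypothetical goal functional g(v) = ∫₀¹ v dx. The goal error then decays with twice the rate of each energy error, in line with (4) and (7).

```python
import numpy as np

def solve_p1_poisson(n):
    """P1 FEM for -u'' = 1 on (0,1) with u(0) = u(1) = 0 on a uniform
    mesh with n elements; returns all nodal values (boundary included)."""
    h = 1.0 / n
    A = (np.diag(np.full(n - 1, 2.0 / h))
         + np.diag(np.full(n - 2, -1.0 / h), 1)
         + np.diag(np.full(n - 2, -1.0 / h), -1))
    b = np.full(n - 1, h)               # load vector for f = 1
    return np.concatenate(([0.0], np.linalg.solve(A, b), [0.0]))

def goal(U):
    """g(v) = int_0^1 v dx, evaluated exactly for a P1 function
    (composite trapezoidal rule on the nodal values)."""
    h = 1.0 / (len(U) - 1)
    return h * (0.5 * U[0] + U[1:-1].sum() + 0.5 * U[-1])

g_exact = 1.0 / 12.0                    # g(u) for u(x) = x(1 - x)/2
errors = [abs(g_exact - goal(solve_p1_poisson(n))) for n in (8, 16, 32)]
rates = [np.log2(errors[i] / errors[i + 1]) for i in range(2)]
```

Here the observed goal-error rate is O(h²), although each energy error only decays like O(h); this doubling of the rate is precisely what the estimator product η_{u,⋆} η_{z,⋆} captures.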

2. Adaptive Algorithms for the Estimator Product
We consider adaptive algorithms which allow us to drive the estimator product to zero with optimal rate. This includes, in particular, the problem class from Section 1.2, but also covers adaptive BEM for the approximation of point errors; see the recent own work [20]. We suppose that each admissible triangulation T_⋆ (see Section 2.2 below) allows for the computation of the error estimators η_{u,⋆} and η_{z,⋆}, where the local contributions are (at least heuristically) linked to the elements T ∈ T_⋆, cf. (8). To abbreviate notation, we shall write

η_{w,⋆}(U_⋆) := ( Σ_{T ∈ U_⋆} η_{w,⋆}(T)² )^{1/2}

for w ∈ {u, z} and all U_⋆ ⊆ T_⋆; in particular, η_{w,⋆} := η_{w,⋆}(T_⋆).
2.1. Adaptive algorithms. We consider two adaptive algorithms which have been proposed and analyzed in [37] (Algorithm A) and [7] (Algorithm C) for goal-oriented adaptive FEM for the Poisson problem, and propose a slight modification of the algorithm from [37] (Algorithm B), which is also related to that of [29]. Note that all algorithms differ only in the marking strategy: Algorithms A-B employ a separate Dörfler marking in steps (iii)-(iv), whereas Algorithm C employs a combined Dörfler marking in step (iv).
Algorithm A. Input: Initial triangulation T_0, marking parameter 0 < θ ≤ 1, and marking constant C_mark ≥ 1. For all ℓ = 0, 1, 2, ..., iterate the following steps (i)-(vi):
(i) Compute the error estimator η_{u,ℓ}.
(ii) Compute the error estimator η_{z,ℓ}.
(iii) Determine a set M_{u,ℓ} ⊆ T_ℓ of up to the multiplicative factor C_mark minimal cardinality such that θ η²_{u,ℓ} ≤ η_{u,ℓ}(M_{u,ℓ})².    (9)
(iv) Determine a set M_{z,ℓ} ⊆ T_ℓ of up to the multiplicative factor C_mark minimal cardinality such that θ η²_{z,ℓ} ≤ η_{z,ℓ}(M_{z,ℓ})².    (10)
(v) Choose M_ℓ ∈ {M_{u,ℓ}, M_{z,ℓ}} to be the set of minimal cardinality.
(vi) Let T_{ℓ+1} := refine(T_ℓ, M_ℓ) be the coarsest refinement of T_ℓ such that all marked elements T ∈ M_ℓ have been refined.
Output: Sequence of successively refined triangulations T_ℓ and corresponding error estimators η_{u,ℓ}, η_{z,ℓ} for all ℓ ∈ N_0.
Remark 2. In the frame of Section 1.2, the computation of η_{u,ℓ} in step (i) and of η_{z,ℓ} in step (ii) usually requires solving the primal problem (2) and the dual problem (5) to obtain the approximations U_ℓ resp. Z_ℓ.

Remark 3. To construct a set M_{u,ℓ} with minimal cardinality (i.e., C_mark = 1) which satisfies the Dörfler marking (9), one needs to sort the refinement indicators. This results in logarithmic-linear costs. For C_mark = 2, the algorithmic construction of some set M_{u,ℓ} which satisfies (9) is possible in linear complexity [42].
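For illustration, the Dörfler marking with minimal cardinality (C_mark = 1) can be realized by the following greedy sketch; the NumPy representation of the squared refinement indicators as a flat array is our own assumption, not the paper's notation.

```python
import numpy as np

def doerfler_marking(eta_sq, theta):
    """Minimal-cardinality Doerfler marking (C_mark = 1): return element
    indices M with theta * sum(eta_sq) <= sum(eta_sq[M]).  Sorting the
    indicators yields the logarithmic-linear cost mentioned in Remark 3."""
    order = np.argsort(eta_sq)[::-1]                 # decreasing indicators
    cum = np.cumsum(eta_sq[order])
    k = int(np.searchsorted(cum, theta * eta_sq.sum())) + 1
    return order[:k]
```

Taking the largest indicators first guarantees that no smaller set can satisfy the marking criterion, which is exactly the minimality required in step (iii).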
Next, we propose a modified version of Algorithm A which allows for more aggressive marking in step (v), i.e., fewer adaptive steps. A similar but non-optimal algorithm has been proposed and tested in [29].
Algorithm B. Input: Initial triangulation T_0, marking parameter 0 < θ ≤ 1, and marking constants C_mark, C′_mark ≥ 1. For all ℓ = 0, 1, 2, ..., iterate the following steps (i)-(vi):
(i) Compute the error estimator η_{u,ℓ}.
(ii) Compute the error estimator η_{z,ℓ}.
(iii) Determine a set M_{u,ℓ} ⊆ T_ℓ of up to the multiplicative factor C_mark minimal cardinality such that θ η²_{u,ℓ} ≤ η_{u,ℓ}(M_{u,ℓ})².    (11)
(iv) Determine a set M_{z,ℓ} ⊆ T_ℓ of up to the multiplicative factor C_mark minimal cardinality such that θ η²_{z,ℓ} ≤ η_{z,ℓ}(M_{z,ℓ})².    (12)
(v) Let M̃_ℓ ∈ {M_{u,ℓ}, M_{z,ℓ}} be the set of minimal cardinality, and choose M_ℓ ⊇ M̃_ℓ with #M_ℓ ≤ C′_mark #M̃_ℓ.
(vi) Let T_{ℓ+1} := refine(T_ℓ, M_ℓ) be the coarsest refinement of T_ℓ such that all marked elements T ∈ M_ℓ have been refined.
Output: Sequence of successively refined triangulations T_ℓ and corresponding error estimators η_{u,ℓ}, η_{z,ℓ} for all ℓ ∈ N_0.

Remark 4. In our numerical experiments below, we choose M_ℓ as follows: Having picked M̃_ℓ to be the minimal set amongst M_{u,ℓ} and M_{z,ℓ}, we enlarge M̃_ℓ by adding the largest #M̃_ℓ elements of the other set; e.g., if #M_{u,ℓ} ≤ #M_{z,ℓ}, then M_ℓ consists of M_{u,ℓ} plus the #M_{u,ℓ} largest contributions of M_{z,ℓ}. This yields C′_mark = 2.

Remark 5. In [29], the authors consider Algorithm B, but define M_ℓ := M_{u,ℓ} ∪ M_{z,ℓ} in step (v). While this also leads to linear convergence in the sense of Theorem 16, [29] only proves the suboptimal convergence rate min{s, t} instead of the optimal rate s + t in Theorem 20; see [29, Section 4]. We note that the strategy of [29] leads to linear convergence η_{u,ℓ+n} ≤ C q^n η_{u,ℓ} and η_{z,ℓ+n} ≤ C q^n η_{z,ℓ} for either estimator and all ℓ, n ∈ N_0, where C > 0 and 0 < q < 1 are independent constants, while the optimal strategies considered in this work only enforce linear convergence η_{u,ℓ+n} η_{z,ℓ+n} ≤ C q^n η_{u,ℓ} η_{z,ℓ} for the estimator product.
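The enlargement strategy of Remark 4 can be sketched as follows; the helper below is hypothetical (index arrays stand in for the sets M_{u,ℓ}, M_{z,ℓ}) and assumes the greedy separate Dörfler marking with C_mark = 1.

```python
import numpy as np

def mark_algorithm_b(eta_u_sq, eta_z_sq, theta):
    """Marking following Remark 4: separate Doerfler marking for both
    estimators, then the smaller marked set is enlarged by the equally
    many largest contributions of the other set (so that C'_mark = 2)."""
    def doerfler(eta_sq):
        order = np.argsort(eta_sq)[::-1]
        k = int(np.searchsorted(np.cumsum(eta_sq[order]),
                                theta * eta_sq.sum())) + 1
        return order[:k]                 # sorted by decreasing indicator

    m_u = doerfler(np.asarray(eta_u_sq, dtype=float))
    m_z = doerfler(np.asarray(eta_z_sq, dtype=float))
    small, other = (m_u, m_z) if len(m_u) <= len(m_z) else (m_z, m_u)
    # 'other' is sorted decreasingly, so its leading entries are the
    # largest contributions of the other marked set
    return np.union1d(small, other[:len(small)])
```

By construction, the returned set contains one of the minimal Dörfler sets and has at most twice its cardinality, matching C′_mark = 2.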
Finally, the following algorithm has been proposed in [7].
Algorithm C. Input: Initial triangulation T_0, marking parameter 0 < θ ≤ 1, and marking constant C_mark ≥ 1. For all ℓ = 0, 1, 2, ..., iterate the following steps (i)-(v):
(i) Compute the error estimator η_{u,ℓ}.
(ii) Compute the error estimator η_{z,ℓ}.
(iii) Define the combined refinement indicators ρ_ℓ(T)² := η_{u,ℓ}(T)² + η_{z,ℓ}(T)² for all T ∈ T_ℓ.
(iv) Determine a set M_ℓ ⊆ T_ℓ of up to the multiplicative factor C_mark minimal cardinality such that θ ρ²_ℓ ≤ ρ_ℓ(M_ℓ)².    (13)
(v) Let T_{ℓ+1} := refine(T_ℓ, M_ℓ) be the coarsest refinement of T_ℓ such that all marked elements T ∈ M_ℓ have been refined.
Output: Sequence of successively refined triangulations T_ℓ and corresponding error estimators η_{u,ℓ}, η_{z,ℓ} for all ℓ ∈ N_0.

Remark 6. Building on the results of [37], it is claimed and empirically investigated in [7] that the combined Dörfler marking (13) requires fewer adaptive steps to reach a prescribed accuracy. We note that this has only been proved rigorously in [7] for the Poisson problem with polynomial data, while general data have to satisfy a certain saturation assumption for the related data-oscillation terms; see [7, eq. (4.4)] and [7, Theorem 4.1]. We note that this assumption also restricts the quasi-optimality analysis of [7] which, in the spirit of [37], relies on the (constrained) contraction of the total error.

2.2. Mesh-refinement.
We suppose that the mesh-refinement is a deterministic and fixed strategy, e.g., newest vertex bisection [43]. Unlike [7,37], we do not require the interior node property guaranteed by bisec5-refinement of marked elements [39]. For each triangulation T and set of marked elements M ⊆ T, we let T′ := refine(T, M) be the coarsest triangulation such that all elements T ∈ M have been refined. This may, in particular, include the preservation of conformity or bounded shape regularity. We write T′ ∈ refine(T) if there exist finitely many triangulations T^(0), ..., T^(n) and sets M^(j) ⊆ T^(j) such that T = T^(0), T′ = T^(n), and T^(j) = refine(T^(j−1), M^(j−1)) for all j = 1, ..., n, where we formally allow n = 0, i.e., T = T^(0) ∈ refine(T). To abbreviate notation, let T := refine(T_0), where T_0 is the given initial triangulation of Algorithms A-C.

2.3. Main result.
Our main result requires the following abstract approximation class for the error estimators η_{u,ℓ} and η_{z,ℓ}. Let T_N := {T ∈ T : #T − #T_0 ≤ N} denote the (finite) set of all refinements of T_0 which have at most N elements more than T_0. For s > 0 and w ∈ {u, z}, we write w ∈ A_s if

‖w‖_{A_s} := sup_{N ∈ N_0} (N + 1)^s min_{T_⋆ ∈ T_N} η_{w,⋆} < ∞,

where η_{w,⋆} is the error estimator associated with the optimal triangulation T_⋆ ∈ T_N. In explicit terms, ‖w‖_{A_s} < ∞ means that an algebraic convergence rate O(N^{−s}) for the error estimator is possible if optimal triangulations are chosen.
For either algorithm, our abstract main result is twofold: First, we prove linear convergence (Section 3.1): For each 0 < q < 1, there exists some n such that for all ℓ ∈ N, the reduction of η_{u,ℓ} η_{z,ℓ} by the factor q requires at most n steps of the adaptive loop, i.e., η_{u,ℓ+n} η_{z,ℓ+n} ≤ q η_{u,ℓ} η_{z,ℓ}. Second, we prove optimal convergence behavior (Sections 3.3-3.5): With respect to the number of elements N ≃ #T_ℓ − #T_0, the product η_{u,ℓ} η_{z,ℓ} decays with order O(N^{−(s+t)}) for each possible algebraic rate s + t > 0, i.e., whenever ‖u‖_{A_s} + ‖z‖_{A_t} < ∞. This means that the adaptive algorithms will asymptotically realize each possible algebraic decay.
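Writing ‖u‖_{A_s}, ‖z‖_{A_t} for the approximation-class quantities of Section 2.3, and with constants C_lin, q_lin, C_opt as provided by the analysis of Section 3, the two parts of the main result can be summarized in formulas as follows (a compact restatement, not a verbatim quote of the theorems below):

```latex
% linear convergence of the estimator product (Theorem 16)
\eta_{u,\ell+n}\,\eta_{z,\ell+n}
  \le C_{\mathrm{lin}}\, q_{\mathrm{lin}}^{\,n}\,
      \eta_{u,\ell}\,\eta_{z,\ell}
  \qquad\text{for all } \ell, n \in \mathbb{N}_0,

% rate optimality of the estimator product (Theorem 17)
\eta_{u,\ell}\,\eta_{z,\ell}
  \le C_{\mathrm{opt}}\,
      \bigl(\#\mathcal{T}_\ell - \#\mathcal{T}_0\bigr)^{-(s+t)}
      \,\|u\|_{\mathbb{A}_s}\,\|z\|_{\mathbb{A}_t}
  \qquad\text{for all } \ell \in \mathbb{N}.
```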
Remark 7. Since our analysis works with the estimator instead of the error, it avoids the use of any (discrete) efficiency bound. Compared to [7,37], this allows us to use simple newest vertex bisection instead of bisec5-refinement for marked elements. Lemma 22 below states that for standard FEM, our approximation class coincides with that of [7,18,37], which is defined through the so-called total error (i.e., error plus data oscillations).
2.4. Axioms of Adaptivity. In the following, let dl_w(•, •) : T × T → R_{≥0} denote a distance function on the set of admissible triangulations which satisfies a quasi-triangle inequality with some uniform constant C_dist > 0; see also Remark 11 below. The convergence and optimality analysis of Algorithm A is done in the frame of the following four axioms of adaptivity [15], where axiom (A3) is slightly relaxed when compared to [15]:

(A1) Stability on non-refined elements: There exists a constant C_stb > 0 such that for each triangulation T_ℓ ∈ T and all refinements T_⋆ ∈ refine(T_ℓ), the corresponding error estimators satisfy

|η_{w,⋆}(T_⋆ ∩ T_ℓ) − η_{w,ℓ}(T_⋆ ∩ T_ℓ)| ≤ C_stb dl_w(T_⋆, T_ℓ).

(A2) Reduction on refined elements: There exist constants 0 < q_red < 1 and C_red > 0 such that for each triangulation T_ℓ ∈ T and all refinements T_⋆ ∈ refine(T_ℓ), the corresponding error estimators satisfy

η_{w,⋆}(T_⋆ \ T_ℓ)² ≤ q_red η_{w,ℓ}(T_ℓ \ T_⋆)² + C_red dl_w(T_⋆, T_ℓ)².

(A3) General quasi-orthogonality: Let (T_{ℓ_n})_{n∈N_0} denote a sequence of successive refinements, i.e., T_{ℓ_{n+1}} ∈ refine(T_{ℓ_n}). Then, for all ε > 0, there exists some constant C_orth(ε) > 0 such that for all n ≤ N for which T_{ℓ_n}, ..., T_{ℓ_N} are well-defined, it holds

Σ_{j=n}^{N−1} ( dl_w(T_{ℓ_{j+1}}, T_{ℓ_j})² − ε η²_{w,ℓ_j} ) ≤ C_orth(ε) η²_{w,ℓ_n}.

(A4) Discrete reliability: There exists a constant C_rel > 0 such that for each triangulation T_ℓ ∈ T and all refinements T_⋆ ∈ refine(T_ℓ), it holds

dl_w(T_⋆, T_ℓ) ≤ C_rel η_{w,ℓ}(R_w(T_ℓ, T_⋆)),

where the set R_w(T_ℓ, T_⋆) ⊆ T_ℓ satisfies T_ℓ \ T_⋆ ⊆ R_w(T_ℓ, T_⋆) as well as #R_w(T_ℓ, T_⋆) ≤ C_rel #(T_ℓ \ T_⋆), and hence consists essentially of the refined elements only.

We recall some elementary observations of [15].

Lemma 8 (quasi-monotonicity of estimator, [15, Lemma 3.5]). There exists a constant C_mon > 0, which depends only on stability (A1), reduction (A2), and discrete reliability (A4), such that for all T_ℓ ∈ T and all refinements T_⋆ ∈ refine(T_ℓ), it holds η²_{w,⋆} ≤ C_mon η²_{w,ℓ}.

Lemma 9 (optimality of Dörfler marking, [15, Proposition 4.12]). Suppose stability (A1) and discrete reliability (A4). For all 0 < θ < θ_⋆ := (1 + C²_stb C²_rel)^{−1}, there exists some 0 < κ_⋆ < 1 such that for all T_ℓ ∈ T and all refinements T_⋆ ∈ refine(T_ℓ), it holds

η²_{w,⋆} ≤ κ_⋆ η²_{w,ℓ}  ⟹  θ η²_{w,ℓ} ≤ η_{w,ℓ}(R_w(T_ℓ, T_⋆))²,

where R_w(T_ℓ, T_⋆) is the set of refined elements from (A4).
Remark 11. (i) In the setting of Section 1.2, let w ∈ {u, z} with W_⋆ ∈ {U_⋆, Z_⋆} being the corresponding Galerkin solution for T_⋆ ∈ T. The abstract distance is then defined by dl_w(T_⋆, T_ℓ) := ‖W_⋆ − W_ℓ‖_X.
(ii) Suppose that the bilinear form a(•, •) is additionally symmetric, and let |||v||| := a(v, v)^{1/2} denote the induced energy norm, which is an equivalent norm on X. Then, nestedness X_n ⊆ X_m ⊆ X_k of the discrete spaces for all k ≥ m ≥ n implies the Galerkin orthogonality

|||w − W_n|||² = |||w − W_m|||² + |||W_m − W_n|||².

This and (A4) imply

Σ_{j=n}^{N−1} dl_w(T_{ℓ_{j+1}}, T_{ℓ_j})² ≲ |||w − W_{ℓ_n}|||² ≲ η²_{w,ℓ_n},

and hence the quasi-orthogonality (A3) with ε = 0.

2.5. Generalized linear convergence.
The following estimator reduction is first found in [18] and, e.g., proved along the lines of [15, Lemma 4.7]. Since we need a slightly stronger result than that of [15], which covers arbitrary refinements T_⋆ ∈ refine(T_{ℓ+1}) instead of T_{ℓ+1} only, we include the proof for the convenience of the reader.
The following result generalizes [15, Proposition 4.10] to the present setting. We note that (A4) enters only through the quasi-monotonicity of the estimator (Lemma 8).

3. Optimal Convergence of Adaptive Algorithms
Throughout this section, we suppose that the error estimators η_{u,ℓ} and η_{z,ℓ} satisfy the respective assumptions (A1)-(A4) of Section 2.4. Without loss of generality, we suppose that they satisfy the axioms (A1)-(A4) with the same constants.
Remark 14. The axioms (A1)-(A4) are designed to cover weighted-residual error estimators in the frame of FEM and BEM. However, as is shown in [15, Section 8] for optimal adaptivity for the energy error, it is sufficient that, for w ∈ {u, z}, the error estimator η̃_{w,ℓ} used in the adaptive algorithm is locally equivalent to some error estimator η_{w,ℓ} which satisfies (A1)-(A4), i.e., η̃_{w,ℓ}(T) ≲ η_{w,ℓ}(ω_ℓ(T)) and η_{w,ℓ}(T) ≲ η̃_{w,ℓ}(ω_ℓ(T)) for all T ∈ T_ℓ, where ω_ℓ(T) denotes a patch of T. Then, the convergence (Theorem 16) as well as the optimality results (Theorems 17, 20, and 21) remain valid. We leave the details to the reader, but note that such arguments cover averaging-based error estimators, hierarchical error estimators, as well as estimators based on equilibrated fluxes; see [15,33].

3.1. Linear convergence of Algorithms A, B, and C. The following result gives a criterion for linear convergence which is satisfied for either adaptive algorithm (Algorithms A, B, and C). As a consequence, linear convergence is independent of C_mark, and we may formally also choose C_mark = ∞. Finally, linear convergence (22) relies on discrete reliability (A4) only to ensure quasi-monotonicity of the estimator (Lemma 8). In the frame of the Lax-Milgram lemma from Section 1.2, the quasi-monotonicity already follows from classical reliability (6); see [15, Lemma 3.6].
Proof for Algorithm B. Analogously, the set M_ℓ satisfies in each step of Algorithm B either (11) or (12). Since M̃_ℓ ⊆ M_ℓ ⊆ T_ℓ \ T_{ℓ+1}, Proposition 15 applies and concludes the proof for Algorithm B.
In particular, this shows that Proposition 15 applies and concludes the proof.

3.2. Fine properties of mesh-refinement.
The following Theorem 17 states optimal convergence behavior. Unlike linear convergence, the proof of optimal convergence rates is more strongly tailored to the mesh-refinement used. First, we suppose that each refined element has at least two sons, i.e.,

#(T \ T′) + #T ≤ #T′ for all T ∈ T and T′ ∈ refine(T).    (23)

Second, we require the mesh-closure estimate

#T_ℓ − #T_0 ≤ C_mesh Σ_{j=0}^{ℓ−1} #M_j for all ℓ ∈ N,    (24)

where C_mesh > 0 depends only on T_0. This has first been proved for 2D newest vertex bisection in [11] and has later been generalized to arbitrary dimension d ≥ 2 in [43].
While both works require an additional admissibility assumption on T_0, this has at least been proved unnecessary for 2D in [32]. Finally, it has been proved in [18,42] that newest vertex bisection ensures the overlay estimate, i.e., for all triangulations T, T′ ∈ T, there exists a common refinement T ⊕ T′ ∈ refine(T) ∩ refine(T′) which satisfies

#(T ⊕ T′) ≤ #T + #T′ − #T_0.    (25)

Although not used explicitly, we note that for newest vertex bisection, the triangulation T ⊕ T′ is, in fact, the overlay of T and T′. For 1D bisection (e.g., for 2D BEM computations in Section 6), the algorithm from [2] satisfies (23)-(25) and guarantees that the local mesh-ratio is uniformly bounded. For meshes with first-order hanging nodes, (23)-(25) are analyzed in [12], while T-spline meshes for isogeometric analysis are considered in [38].
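For 1D bisection, the overlay estimate (25) is easy to check by hand. The following sketch (a hypothetical representation of 1D meshes as sorted breakpoint lists, not the paper's data structures) builds two refinements of a common T_0 and verifies the bound #(T ⊕ T′) ≤ #T + #T′ − #T_0.

```python
def bisect(mesh, marked):
    """1D bisection: mesh is a sorted list of breakpoints of [0,1];
    every marked element (interval index) is split at its midpoint."""
    new = list(mesh)
    for i in marked:
        new.append((mesh[i] + mesh[i + 1]) / 2)
    return sorted(new)

def overlay(t_a, t_b):
    """Overlay mesh: coarsest common refinement = union of breakpoints."""
    return sorted(set(t_a) | set(t_b))

def n_elements(mesh):
    return len(mesh) - 1

t0 = [0.0, 0.5, 1.0]                      # initial mesh with 2 elements
t_a = bisect(t0, [0])                     # refine the left element
t_b = bisect(bisect(t0, [1]), [1])        # refine on the right, twice
t_ab = overlay(t_a, t_b)
# overlay estimate (25): #(T (+) T') <= #T + #T' - #T0
assert n_elements(t_ab) <= n_elements(t_a) + n_elements(t_b) - n_elements(t0)
```

The bound holds because breakpoints of T_0 are shared by both meshes, so only the additionally created breakpoints of T and T′ can contribute to the overlay.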
3.3. Optimal convergence rates for Algorithm A. Our proofs of the following theorems (Theorems 17, 20, and 21) follow the ideas of [37] as worked out in [20]. We include them here for the sake of completeness and a self-contained presentation.
Then, (26) implies η_{u,ℓ} η_{z,ℓ} ≲ (#T_ℓ − #T_0)^{−(s+t)} for all 0 < s < s_max, 0 < t < t_max, and all ℓ ∈ N, where the hidden constants additionally depend on s_max − s > 0 resp. t_max − t > 0.
The heart of the proof of Theorem 17 is the following lemma.
Proof. Adopt the notation of Lemma 9. For ε := C_mon^{−1} κ_⋆ η_{u,ℓ} η_{z,ℓ}, the quasi-monotonicity of the estimators (Lemma 8) yields ε ≤ κ_⋆ η_{u,0} η_{z,0} < ‖u‖_{A_s} ‖z‖_{A_t} < ∞. Choose the minimal N ∈ N_0 such that ‖u‖_{A_s} ‖z‖_{A_t} ≤ ε (N + 1)^{s+t}. Then, Lemma 8, the definition of the approximation classes, and the choice of N give η_{u,⋆} η_{z,⋆} ≤ C_mon ε = κ_⋆ η_{u,ℓ} η_{z,ℓ}, and Lemma 9 hence proves (28). It remains to derive (27). First, note that (29) holds, since refined elements are refined into at least two sons (23). Second, minimality of N yields (30). Combining (29)-(30), we conclude (27).

Proof of Theorem 17. According to (28) of Lemma 19 and the marking strategy in Algorithm A, it holds for all j ∈ N_0 that

#M_j ≲ (‖u‖_{A_s} ‖z‖_{A_t})^{1/(s+t)} (η_{u,j} η_{z,j})^{−1/(s+t)}.    (31)

With the mesh-closure estimate (24) and estimate (27) of Lemma 19, we obtain #T_ℓ − #T_0 ≲ (‖u‖_{A_s} ‖z‖_{A_t})^{1/(s+t)} Σ_{j=0}^{ℓ−1} (η_{u,j} η_{z,j})^{−1/(s+t)}. Linear convergence (22) implies η_{u,ℓ} η_{z,ℓ} ≤ C_lin q_lin^{ℓ−j} η_{u,j} η_{z,j} for all 0 ≤ j ≤ ℓ and hence (η_{u,j} η_{z,j})^{−1/(s+t)} ≤ (C_lin q_lin^{ℓ−j})^{1/(s+t)} (η_{u,ℓ} η_{z,ℓ})^{−1/(s+t)}. With 0 < q := q_lin^{1/(s+t)} < 1, the geometric series applies and yields #T_ℓ − #T_0 ≲ (‖u‖_{A_s} ‖z‖_{A_t})^{1/(s+t)} (η_{u,ℓ} η_{z,ℓ})^{−1/(s+t)}. Rearranging this estimate, we conclude (26).

Proof of Theorem 20. We note that Lemma 19 is not affected by the marking strategy and hence remains valid. To conclude the proof, we only need to show that estimate (31) also remains true. Since M_j in Algorithm B is a set of minimal cardinality up to the factor C′_mark C_mark which satisfies either (9) or (10), estimate (31) holds with different constants.

Proof of Theorem 21. We only need to show that (31) remains valid. Note that 0 < 2θ < θ_⋆, so that estimate (28) of Lemma 19 applies. Either for R_j := R_u(T_j, T_⋆) or for R_j := R_z(T_j, T_⋆), this implies the combined Dörfler marking (13). According to the marking strategy in Algorithm C, we obtain #M_j ≤ C_mark #R_j, which is (31). Therefore, the claim follows with C_opt = max{C_lin C_2, C_mesh C_mark C_1}.

4. Goal-Oriented Adaptive FEM for Second-Order Linear Elliptic PDEs
In this section, we extend the ideas from [23] and prove that our abstract framework for convergence and optimality of goal-oriented AFEM applies, in particular, to general second-order linear elliptic PDEs.
To formulate the residual error estimators in (35)-(36) below, we additionally require that div f_2 and div g_2 exist in L²(Ω) elementwise on the initial mesh T_0 and that the edge jumps satisfy [f_2 · n], [g_2 · n] ∈ L²(∂T) for all T ∈ T_0. (These assumptions are, for instance, satisfied if f_2, g_2 are T_0-piecewise constant.) Note that L is non-symmetric. We suppose that the induced bilinear form is continuous and H¹_0(Ω)-elliptic and hence fits in the frame of Section 1.2.

4.2. Discretization. For a given regular triangulation T_⋆ of Ω and a polynomial degree p ≥ 1, define P^p(T_⋆) := {V ∈ L²(Ω) : V|_T is a polynomial of degree ≤ p for all T ∈ T_⋆}. We consider X_⋆ := S^p_0(T_⋆) := P^p(T_⋆) ∩ H¹_0(Ω) and let U_⋆, Z_⋆ ∈ X_⋆ be the unique FEM solutions of (2) resp. (5). The residual error estimator η_{u,⋆} for the primal problem is given in (35), and the residual error estimator η_{z,⋆} for the discrete dual problem (5) reads analogously in (36). The error estimators satisfy reliability (6); see, e.g., [1,44]. The abstract analysis of Section 1.2 thus results in the goal-error estimate (7), and we aim for optimal convergence of the right-hand side. Moreover, efficiency and the Céa lemma prove that the estimator-based approximation class A_s from Section 2.3 coincides with the approximation class based on the total error used, e.g., in [7,18,37]. The following result is proved in [23, Lemma 5.1] for f_2 = 0 = g_2, but holds verbatim in the present case.
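The displayed formulas (35)-(36) are not reproduced above. For orientation, the weighted-residual estimator for this problem class typically has the following structure, where the operator form Lu = −div(A∇u) + b·∇u + cu and the data splitting f(v) = ∫_Ω (f_1 v − f_2 · ∇v) dx are assumptions of this sketch rather than the paper's verbatim definitions:

```latex
\eta_{u,\star}(T)^2
  = h_T^2 \bigl\| f_1 + \operatorname{div}\,(f_2 + A\nabla U_\star)
                  - b\cdot\nabla U_\star - c\,U_\star \bigr\|_{L^2(T)}^2
  + h_T \bigl\| [\, (f_2 + A\nabla U_\star)\cdot n \,]
         \bigr\|_{L^2(\partial T\cap\Omega)}^2 .
```

This structure explains the extra requirements on f_2, g_2 stated above: the element residual needs div f_2 elementwise in L²(Ω), and the jump term needs [f_2 · n] ∈ L²(∂T).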
Lemma 22. Let w ∈ {u, z}. Then, w ∈ A_s holds if and only if the corresponding supremum built from the total error (i.e., error plus data oscillations) is finite. Here, Π^q_T : L²(T) → P^q(T) denotes the L²-orthogonal projection onto polynomials of degree q, and Π^q_{∂T} : L²(∂T) → P^q(S_{∂T}) denotes the L²-orthogonal projection onto (discontinuous) piecewise polynomials of degree q on the faces of T.

4.4. Verification of axioms.
With newest vertex bisection from [43] as mesh-refinement strategy, the assumptions of Section 3.2 are satisfied. It remains to verify the axioms (A1)-(A4), where dl_w(T_⋆, T_ℓ) := ‖W_⋆ − W_ℓ‖_{H¹(Ω)} and W_ℓ resp. W_⋆ are the corresponding FEM approximations of w ∈ {u, z}.
Lemma 24. In the setting of Theorem 23 and for Algorithms A-C, the Galerkin approximations U_ℓ and Z_ℓ converge in the sense of

U_ℓ → U_∞ and Z_ℓ → Z_∞ in H¹_0(Ω) as ℓ → ∞    (38)

for certain U_∞, Z_∞ ∈ H¹_0(Ω). Moreover, there holds at least U_∞ = u or Z_∞ = z.

Proof. Note that adaptive mesh-refinement guarantees nestedness X_ℓ ⊆ X_⋆ for all T_ℓ ∈ T and T_⋆ ∈ refine(T_ℓ). As in [15, Section 3.6] or [5, Lemma 6.1], the Céa lemma thus implies the a priori convergence (38). For w ∈ {u, z}, let (ℓ_{w,n})_{n∈N} denote the subsequences which satisfy θ η²_{w,ℓ_{w,n}} ≤ η_{w,ℓ_{w,n}}(M_{w,ℓ_{w,n}})² for all n ∈ N. There holds #{ℓ_{w,n} : n ∈ N} = ∞ for at least one w ∈ {u, z}. While this is obvious for Algorithm A and Algorithm B, it follows for Algorithm C from the proof of Theorem 16. For this particular w, the estimator reduction from Lemma 12 reads

η²_{w,ℓ_{w,n+1}} ≤ q_est η²_{w,ℓ_{w,n}} + C_est dl_w(T_{ℓ_{w,n+1}}, T_{ℓ_{w,n}})² for all n ∈ N.

The a priori convergence (38) implies dl_w(T_{ℓ_{w,n+1}}, T_{ℓ_{w,n}})² → 0 as n → ∞. Elementary calculus thus yields lim_{n→∞} η_{w,ℓ_{w,n}} = 0; see, e.g., [15, Corollary 4.8].

Proof of Theorem 23, (A3). Recall the sequence T_{ℓ_n} from (A3). With the a priori convergence of Lemma 24, the proof of [23, Lemma 3.5] applies and shows the required weak convergence of the residuals to zero.
With this, [23, Proposition 3.6] applies for the primal as well as the dual problem and shows that, given any δ > 0, there exists j_δ ∈ N such that all j ≥ j_δ satisfy (39). The discrete reliability (A4) and the convergence (38) provide the reliability estimate (40). With (39)-(40), the quasi-monotonicity from Lemma 8 (since (A1), (A2), and (A4) have already been verified) implies the quasi-orthogonality (A3), where we choose δ = 1 − 1/(1 + ε C_rel^{−2}). Another application of the reliability (40) concludes the proof.

4.5. Numerical experiment I: Goal-oriented FEM for the Poisson equation. We consider a numerical example proposed in [37, Example 7.3] for the Laplace operator in 2D, while a non-symmetric second-order elliptic operator is considered in Section 5.5. The goal of this first experiment is to verify the optimal convergence of Algorithms A-C as predicted by theory, and to compare the various algorithms as well as standard AFEM (i.e., non-goal-oriented adaptive FEM, where M_ℓ := M_{u,ℓ} resp. M_ℓ := M_{z,ℓ} in Algorithm A); see, e.g., [15,18,23,42].
Figure 2 (left) shows the typical convergence behavior for the estimators η_{u,ℓ} and η_{z,ℓ}, the estimator product η_{u,ℓ} η_{z,ℓ}, and the goal error |g(u) − g(U_ℓ)|, where we used Algorithms A-C with θ = 0.5. Similar results are obtained for other choices of 0 < θ < 1 (not displayed). The estimator product η_{u,ℓ} η_{z,ℓ} shows the optimal convergence rate of O(N^{−3}) as predicted by theory for p = 3 in 2D.

Figure 2. Example from Section 4.5: Over the numbers of elements #T_ℓ, we plot the estimators η_{u,ℓ} and η_{z,ℓ}, the estimator product η_{u,ℓ} η_{z,ℓ}, as well as the goal error |g(u) − g(U_ℓ)| as output of Algorithms A-C with θ = 0.5 (left), resp. the estimator product for various θ ∈ {0.1, ..., 0.9} as well as for θ = 1.0, which corresponds to uniform refinement (right).
Figure 2 (right) shows that all Algorithms A-C yield the optimal rate of convergence O(N^{−3}) for a large range of values of θ, including θ = 0.9. Uniform refinement corresponds to θ = 1.0 and shows a suboptimal rate of O(N^{−1}).

Figure 3 shows the numerical results for standard AFEM, which are based on adaptivity for either the primal or the dual problem. In both cases, theory predicts the optimal convergence behavior O(N^{−3/2}) for the related error estimator, at least if the adaptivity parameter θ is sufficiently small; see, e.g., [18,23,15]. For all θ ∈ {0.1, ..., 0.9}, we observe the optimal rate O(N^{−3/2}) for the error estimator which drives the adaptive process. However, for the estimator product η_{u,⋆} η_{z,⋆}, these strategies result in a suboptimal convergence rate O(N^{−2}).

Figure 3. Example from Section 4.5: Output of standard (non-goal-oriented) AFEM algorithms with θ = 0.5, where adaptive mesh-refinement is steered only by the primal error estimator η_{u,ℓ} (top) resp. the dual error estimator η_{z,ℓ} (bottom). Over the numbers of elements #T_ℓ, we plot the estimators η_{u,ℓ} and η_{z,ℓ}, the estimator product η_{u,ℓ} η_{z,ℓ}, as well as the goal error |g(u) − g(U_ℓ)| (left), resp. the estimator product for various θ ∈ {0.1, ..., 0.9} as well as for θ = 1.0, which corresponds to uniform refinement (right).
In adaptive computations, the overall runtime depends on the entire history of adaptively generated meshes. To better compare the various algorithms, Figure 5 shows the cumulative number of elements N_cum which is necessary to reach a prescribed accuracy η_{u,ℓ} η_{z,ℓ} ≤ tol, versus θ ∈ {0.1, ..., 0.9}. The definition of N_cum reflects the total amount of work in the complete adaptive process. Altogether, we compare five adaptive strategies: Besides Algorithms A-C, we consider standard AFEM based on the primal error estimator and standard AFEM based on the dual error estimator. For example, for a tolerance tol = 10^{−5} and p = 3, Figure 5 (left) shows that N_cum is smallest for Algorithms B-C with θ = 0.8. Furthermore, we see that the goal-oriented Algorithms A-C are superior to standard AFEM. Amongst the goal-oriented algorithms, Algorithms B-C are superior to Algorithm A, since they combine primal and dual refinement, while Algorithm A only performs one-sided refinement per iteration step. Furthermore, Algorithm B is at least competitive with and sometimes even superior to Algorithm C. As visible in Figure 5 (right), for tol = 10^{−4} and p = 2, N_cum is smallest for Algorithm B with θ = 0.6.
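The precise definition of N_cum is not displayed above; a natural reading, which the following hypothetical helper implements, is the sum of all mesh sizes up to the first iteration whose estimator product meets the tolerance.

```python
def cumulative_elements(mesh_sizes, estimator_products, tol):
    """Total number of elements generated until the estimator product
    first satisfies eta_u * eta_z <= tol; returns None if the tolerance
    is never reached within the recorded history."""
    total = 0
    for n_elem, product in zip(mesh_sizes, estimator_products):
        total += n_elem
        if product <= tol:
            return total
    return None
```

Counting all intermediate meshes (rather than only the final one) is what makes this a meaningful proxy for the total work of the complete adaptive process.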

5. Goal-Oriented Adaptive FEM for Flux Evaluation
where u is the solution to (32). For smooth u, the goal value g(u) can be rewritten via integration by parts for all z ∈ H¹(Ω) with z|_Γ = Λ. Since the right-hand side is well-defined for u ∈ H¹_0(Ω), this is a valid generalization of the flux [28, Section 7]. Let z be the unique solution of the corresponding inhomogeneous Dirichlet problem.

5.2. Discretization. For a given regular triangulation T_⋆ of Ω and a polynomial degree p ≥ 1, let P^p(T_⋆) be defined as in Section 4.2. Consider S^p(T_⋆) := P^p(T_⋆) ∩ H¹(Ω) and S^p_0(T_⋆) := P^p(T_⋆) ∩ H¹_0(Ω). Let U_⋆ be the unique FEM solution of the homogeneous Dirichlet problem (2). We suppose that the boundary data satisfies Λ ∈ S^p(T_0|_Γ), i.e., it belongs to the discrete trace space on the initial triangulation T_0. To approximate N_z(u) from (43), we let Z_⋆ be the unique FEM solution of the discretized dual problem and define the discrete goal quantity N_{z,⋆}(U_⋆).

Lemma 25. There holds |N_z(u) − N_{z,⋆}(U_⋆)| ≤ C_flux ‖u − U_⋆‖_{H¹(Ω)} ‖z − Z_⋆‖_{H¹(Ω)}, where U_⋆ denotes the FEM approximation of u from (2) and C_flux > 0 depends only on a(•, •).
where we used the definition of z and Z ⋆ .

5.3. Residual error estimator.
The residual error estimator for the primal problem remains the same as in (35). Since the inhomogeneous boundary data satisfies Λ ∈ S^p(T_0|_Γ), also the dual estimator η_{z,⋆} remains the same as in (36) with g_1 = 0 and g_2 = 0. Lemma 25, together with the reliability of η_{w,⋆} for w ∈ {u, z} (see, e.g., [3, Proposition 3] for the inhomogeneous Dirichlet problem for z), implies

|N_z(u) − N_{z,⋆}(U_⋆)| ≲ η_{u,⋆} η_{z,⋆}.    (48)

Hence, the problem fits into the abstract framework of Section 2. We aim for optimal convergence of the right-hand side of (48).

5.4. Verification of axioms.
With newest vertex bisection from [43] as mesh-refinement strategy, the assumptions of Section 3.2 are satisfied. It remains to verify the axioms (A1)-(A4), where the distance dl_w is defined as for the FEM setting of Section 4.

Theorem 26. Consider the model problem of Section 5.1. Then, the conforming discretization (44) of Section 5.2 with the residual error estimators (46)-(47) from Section 5.3 satisfies stability (A1), reduction (A2) with q_red = 2^{−1/d}, quasi-orthogonality (A3), and discrete reliability (A4). In particular, the Algorithms A-C are linearly convergent with optimal rates in the sense of Theorems 16, 17, 20, and 21 for the upper bound in (48).
Proof. For the primal problem, (A1)-(A4) follow as in Theorem 23. For the dual problem, the axioms (A1)-(A2) follow from Theorem 23, since the estimator did not change. The discrete reliability (A4) is proved in [3] for general W ∈ H 1 (Γ). In our particular situation, the proof simplifies vastly and even shows R z (T ℓ , T ⋆ ) = T ℓ \T ⋆ . To see the quasi-orthogonality (A3), choose a discrete extension

Then, there holds

Since Z 0 ⋆ is the solution to a homogeneous Dirichlet problem, the proof of (A3) follows analogously to that of Section 4.

5.5. Numerical experiment II:
Flux-oriented adaptive FEM for convection-diffusion. We consider a numerical experiment similar to [36, Section 5.3] for some convection-diffusion problem in 2D. The goal of this experiment is to verify the optimal convergence of Algorithms A-C for the flux quantity of interest (43) and, moreover, to illustrate this for a non-symmetric second-order elliptic operator, which is covered by our theory.
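All three algorithms follow the standard SOLVE-ESTIMATE-MARK-REFINE loop and differ only in the marking step. The following schematic sketch uses a hypothetical interface (all function names and data representations are placeholders, not the paper's notation); the loop is driven by the estimator product η u,ℓ η z,ℓ :

```python
def goal_oriented_afem(solve, estimate, mark, refine, mesh, tol, max_steps=100):
    """Generic SOLVE -> ESTIMATE -> MARK -> REFINE loop driven by the
    estimator product eta_{u,l} * eta_{z,l} (hypothetical interface)."""
    for _ in range(max_steps):
        U, Z = solve(mesh)                        # primal and dual Galerkin solutions
        eta_u, eta_z = estimate(mesh, U, Z)       # lists of squared local indicators
        prod = (sum(eta_u) * sum(eta_z)) ** 0.5   # eta_{u,l} * eta_{z,l}
        if prod <= tol:
            break
        mesh = refine(mesh, mark(eta_u, eta_z))   # marking step distinguishes A, B, C
    return mesh, U, Z
```

The marking strategy passed as `mark` is where Algorithms A, B, and C differ; everything else is shared.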
We set f (v) = 0 and consider non-homogeneous Dirichlet data on ∂Ω for the primal problem, a pulse, defined by the continuous piecewise linear function

Note that u Dir trivially extends to some discrete function u Dir ∈ S 1 (T 0 ) if T 0 is chosen appropriately. Therefore, we can rewrite the problem into a homogeneous Dirichlet problem. To that end, write u = u 0 + u Dir with u 0 ∈ H 1 0 (Ω) and solve a(u 0 , v) = f (v) − a(u Dir , v) for all v ∈ H 1 0 (Ω). Note that the additional term on the right-hand side is of the form div λ + λ for some T 0 -elementwise constant λ and some λ ∈ L 2 (Ω). A direct computation shows that the weighted-residual error estimator with respect to u 0 coincides with η u,ℓ . Arguing as in the proof of Theorem 26, we see that the estimator satisfies the axioms (A1)-(A4). Altogether, the problem thus fits in the frame of our analysis.
The primal solution corresponds to the clockwise convection-diffusion of this pulse. We choose the boundary weight function Λ : ∂Ω → R as a shifted version of the above pulse:

The dual solution corresponds to the counter-clockwise convection-diffusion of this pulse.
For small ν, the (primal and dual) pulses are transported from ∂Ω into Ω and eventually back to ∂Ω, where a boundary layer develops. See Figure 6 (left) for an illustration of the supports of the primal and dual Dirichlet data, and the primal and dual convective fields.
All discrete approximations are computed with lowest-order finite elements of degree p = 1. The uniform initial triangulation T 0 shown in Figure 6 (right) ensures that the (primal and dual) Dirichlet data belong to the discrete trace space S 1 (T 0 | Γ ).
As shown in Figure 7, Algorithms A-C yield optimal convergence rates for the flux quantity of interest. For ν = 10 −3 and a large range of values θ ∈ {0.1, . . . , 0.9}, we observe the optimal convergence rate O(N −1 ), while uniform mesh-refinement appears to be slightly suboptimal.
To compare the overall performance of the different algorithms, Figure 8 plots, over the marking parameters θ ∈ {0.1, . . . , 0.9}, the cumulative number of elements N cum necessary to reach the prescribed accuracy η u,ℓ η z,ℓ ≤ 10 −4 ; see (42) for the definition and interpretation of N cum . For Algorithms A-C, we observe that N cum is smallest for relatively large values θ ≥ 0.5, with Algorithm A being less efficient than Algorithms B and C. Overall, Algorithm B with θ = 0.6 seems to be the best choice.
Figure 9 shows several approximations and meshes obtained with Algorithm B. Since ν = 10 −3 is relatively small, both the primal and the dual solution exhibit significant boundary layers. These layers, as well as the weak singularities coming from the kinks in the Dirichlet data, are well captured by the adaptive algorithm.
Figure 10 illustrates the effect of varying ν ∈ {10 −3 , 10 −4 , 10 −5 }. The optimal convergence rate of the estimator product is observed for all indicated values of ν; however, the pre-asymptotic regime is longer for smaller values of ν. This is to be expected, as the hidden constant in (48) depends on the reliability constants of the estimators, which in turn depend on ν.
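The convergence rates reported in these experiments can be read off as least-squares slopes in the log-log plane. A small helper for this, assuming a model η ≈ C N −r (illustrative, not part of the paper's algorithms):

```python
import math

def empirical_rate(num_elems, estimator_vals):
    """Least-squares slope of log(eta) vs. log(N) for eta ~ C * N^{-r};
    returns the empirical rate r."""
    xs = [math.log(n) for n in num_elems]
    ys = [math.log(e) for e in estimator_vals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

Applied to the tail of a run (discarding the pre-asymptotic levels), this recovers, e.g., the rate r = 1 behind the observed O(N −1 ) decay.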
6.3. Residual error estimator. The residual error estimators from [16] for the discrete primal problem (2) and the discrete dual problem (5) read

The error estimators satisfy reliability (6); see, e.g., [16]. The abstract analysis of Section 1.2 thus results in

and we aim for optimal convergence of the right-hand side.
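The estimate obtained from the abstract analysis bounds the goal error by the estimator product; in the notation used throughout, it is of the form (a sketch, consistent with the observed rates (#T ℓ ) −3/2 for each estimator and (#T ℓ ) −3 for the goal error):

```latex
\bigl| g(u) - g(U_\ell) \bigr| \;\lesssim\; \eta_{u,\ell}\,\eta_{z,\ell} .
```

This is the duality-based bound that motivates driving adaptivity by the product of the primal and dual estimators.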
We define the initial mesh T 0 as shown in Figure 11. As weight function Λ ∈ S 1 (T 0 ), we consider the hat function defined by Λ(z 0 ) = 1 and Λ(z) = 0 for all other nodes z of T 0 (the node z 0 is indicated in Figure 11).
For the lowest-order case p = 0 and θ = 0.5 in Algorithms A-C, Figure 12 shows the convergence rates of the error estimators η u , η z , their product η u η z , and the error in the goal functional |g(u) − g(U ℓ )|. Moreover, we compare the convergence rate of the error in the goal functional for different values of θ ∈ {0.1, . . . , 0.9}. For each choice of θ and all adaptive algorithms, we observe the optimal convergence rate (#T ℓ ) −3/2 for the respective error estimators as well as (#T ℓ ) −3 for the error in the goal functional.
For different values of θ ∈ {0.1, . . . , 0.9}, Figure 13 plots the cumulative number of elements N cum := ∑_{k=0}^{ℓ} #T k necessary to reach a given error tolerance 10 −6 . We observe that for all three algorithms a large θ ≈ 0.8 seems to be optimal. Moreover, Algorithms B-C show comparable performance, which is clearly superior to that of Algorithm A in the whole range of θ.

6.6. Numerical experiment with non-conforming weight function. We consider the same setting as in Section 6.5, with the only difference that Λ is the characteristic function of a boundary piece Γ Λ ⊊ Γ, i.e., Λ(x) = 1 on Γ Λ and Λ(x) = 0 on Γ \ Γ Λ . We choose Γ Λ ⊊ Γ as the part of Γ which is marked in red in Figure 11. This implies that the goal functional takes the form

Consequently, this example is not covered by the theory of the previous sections. This is also reflected by the numerical results if the adaptive algorithms are naively employed; see Figure 14, where we do not observe convergence at all.
To account for the fact that Λ ∉ H 1 (Γ), we approximate Λ in each adaptive step by the continuous function Λ ℓ ∈ S 1 (T ℓ ) defined by Λ ℓ (z) = 1 for all nodes z of T ℓ with z ∈ Γ Λ and Λ ℓ (z) = 0 for all other nodes. Convergence Λ ℓ → Λ is ensured by marking, in each adaptive step, the two elements on which Λ ℓ is not constant. Since there clearly holds ∥Λ ℓ ∥ H 1/2 (Γ) → ∞ as ℓ → ∞, we need to rescale the error estimators for the primal and the dual problem, respectively. Given ε > 0, define the rescaled estimators η ε u,ℓ (T ) and η ε z,ℓ (T ) as in (54).

Since a thorough analysis is beyond the scope of this paper, we only provide a heuristic motivation for this rescaling: With Λ ℓ ≈ Λ, the error in the goal functional is estimated by

Since sup ℓ∈N ∥Λ ℓ ∥ H 1/2−ε (Γ) < ∞, the last estimate is even rigorous and follows from appropriate Poincaré inequalities; see, e.g., [14,16].
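The nodal definition of Λ ℓ can be sketched as follows; the node representation is purely illustrative (nodes could be arc-length parameters or vertex indices), and the values between nodes are interpolated linearly by the S 1 basis:

```python
def approximate_weight(boundary_nodes, nodes_in_gamma_lambda):
    """Nodal values of the continuous piecewise-linear Lambda_l:
    1 at boundary nodes inside Gamma_Lambda, 0 at all other nodes.
    The S^1 basis interpolates linearly in between, so Lambda_l is
    a continuous approximation of the discontinuous indicator."""
    return {z: (1.0 if z in nodes_in_gamma_lambda else 0.0)
            for z in boundary_nodes}
```

As the mesh is refined near the endpoints of Γ Λ , Λ ℓ converges to the characteristic function, at the price of a blow-up of its H 1/2 (Γ) norm.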
For ε = 0.3, θ = 0.5, and lowest-order BEM with p = 0, Figure 16 shows the convergence rates of the error estimators η u , η z , their product η u η z , and the error in the goal functional |g(u) − g(U ℓ )|. Moreover, we compare the convergence rates of the error in the goal functional for different values of θ ∈ {0.1, . . . , 0.9}. Except for θ = 0.9 and Algorithm C, we observe, for each choice of θ and all adaptive algorithms, the optimal convergence rate (#T ℓ ) −3 for the error in the goal functional as well as for the estimator product.
Figure 17 plots the cumulative number of elements N cum = ∑_{k=0}^{ℓ} #T k necessary to reach a given error tolerance 10 −6 for different values of θ ∈ {0.1, . . . , 0.9}. We observe that for Algorithms A and C a large θ ≈ 0.8 seems to be optimal, whereas Algorithm B shows optimal behavior for 0.3 ≤ θ ≤ 0.7. Overall, Algorithm B seems to be the best choice in this experiment.

7.
Conclusions & Open Questions

7.1. Analytical results. We have derived an abstract framework to prove convergence with optimal algebraic rates for goal-oriented adaptivity for finite element methods and boundary element methods. While the analysis of the prior works [7,37] was tailored to the Poisson model problem resp. symmetric boundary integral formulations [20], our approach, which is inspired by [15], is a priori independent of the model problem and covers general linear second-order elliptic PDEs and fixed-order elements in the frame of the Lax-Milgram lemma. Following [18], our argument avoids discrete efficiency and hence the interior node property of the mesh-refinement required in [7,37]. Following [23], our argument uses the concept of a general quasi-orthogonality, which allows us to work beyond symmetric problems and, unlike [36,29], to avoid any assumption on the initial mesh T 0 . As first observed in [3] and later used in [23,15], the convergence and quasi-optimality analysis relies essentially only on reliability of the error estimator (see axioms (A1)-(A4)), while efficiency is only used to characterize the estimator-based approximation classes in terms of the so-called total error, i.e., error plus data oscillations (see Lemma 22). In addition to the algorithm from [37] (Algorithm A), we gave a thorough analysis of the algorithm from [7] (Algorithm C) without additional assumptions on the given data. Moreover, we proposed a variant of the algorithms from [37] and [29] (Algorithm B). All three algorithms are proved to be linearly convergent with optimal algebraic rates (see Theorems 16, 17, 20, and 21), where theory guarantees linear convergence for all marking parameters 0 < θ ≤ 1, while optimal convergence rates are guaranteed for 0 < θ < θ ⋆ (Algorithms A-B) resp. 0 < θ < θ ⋆ /2 (Algorithm C) for some a priori bound 0 < θ ⋆ < 1 which depends on the given problem.

7.2. Empirical results.
To underline our analysis, we considered three different problems: First (Section 4.5), we computed an example from [37] which considers finite elements for the Poisson model problem with right-hand side f = div f and goal functional g = div g for piecewise constant vector fields f , g : Ω → R d . For essentially all choices of the adaptivity parameter 0.1 ≤ θ ≤ 0.9, we observed optimal convergence behavior of the goal-oriented adaptive algorithms, while standard adaptivity leads to a reduced order of convergence. Second (Section 5.5), we modified an example from [36] with a non-symmetric operator, where the goal is the evaluation of the flux for some finite element computation. All goal-oriented adaptive algorithms are robust with respect to the choice of 0.1 ≤ θ ≤ 0.9. Finally (Section 6.5), we considered an example in the frame of the boundary element method, where the goal was some local flux evaluation. Again, all goal-oriented adaptive algorithms are robust with respect to the choice of 0.1 ≤ θ ≤ 0.9 and lead to optimal convergence behavior. Throughout, our observation was that the new algorithm (Algorithm B) leads to the best results with respect to the cumulative sum of elements (42), which seems to be an appropriate measure for the overall computational performance to reach a prescribed accuracy. Although we did not observe that Algorithm C leads to suboptimal convergence rates for large θ, where Algorithms A and B still are optimal, we note that this has been observed in [20] for the point evaluation goal functional.

Figure 1 .
Figure 1. Example from Section 4.5: The initial mesh T 0 (left) and the triangles T f (bottom left) and T g (top right) indicated in gray. An approximation to the primal solution (middle) and dual solution (right) on a uniform mesh with 256 elements, where the singularities of both are clearly visible.

Figure 4 .
Figure 4. Example from Section 4.5: Meshes generated by Algorithms A, B, and C as well as standard (non-goal-oriented) AFEM driven by the primal error estimator resp. the dual error estimator (from left to right) for θ = 0.5.

Figure 6 .
Figure 6. Example from Section 5.5: The left figure shows the geometry of the domain Ω, the support of the primal Dirichlet data (blue), the direction of the primal convective field (blue arrow), the support of the dual Dirichlet data (red), and the direction of the dual convective field (red arrow). The right figure shows the initial triangulation T 0 so that the inhomogeneous Dirichlet data belong to the discrete trace space S 1 (T 0 | Γ ).

Figure 9 .
Figure 9. Example from Section 5.5: Primal approximations U ℓ (top), dual approximations Z ℓ (middle), and adaptively generated meshes T ℓ (bottom) for ℓ ∈ {6, 12, 18} (from left to right) as output of Algorithm B for θ = 0.6 and ν = 10 −3 . Although we use a non-stabilized Galerkin scheme, initial oscillations in unresolved boundary layers are picked up immediately by the adaptive algorithm for both the primal and the dual solution.

Figure 10 .
Figure 10. Example from Section 5.5: To study the robustness of the goal-oriented algorithm with respect to the diffusion coefficient ν ∈ {10 −3 , 10 −4 , 10 −5 } (left, from top to bottom), we plot the estimators η u,ℓ and η z,ℓ , the estimator product η u,ℓ η z,ℓ , as well as the goal error |N z (u) − N z,ℓ (U ℓ )| as output of Algorithm B with θ = 0.6 over the number of elements #T ℓ (left). We show some related discrete meshes with more than 20,000 elements (right).

Figure 11 .
Figure 11. Example from Section 6.5 with conforming weight: Domain Ω with initial triangulation T 0 (upper left), exact solution u plotted over the boundary (upper right), exact primal and dual solution plotted over the arc-length, where s = 1 corresponds to the reentrant corner and s = 0.25 corresponds to z 0 (lower left), and adaptive mesh with #T 20 = 279 elements generated by Algorithm B with θ = 0.5 (lower right).

Figure 13.
Figure 13. Example from Section 6.5 with conforming weight: Over different values of θ, we plot the cumulative number of elements N cum := ∑_{k=0}^{ℓ} #T k necessary for Algorithms A-C to achieve an error accuracy |g(u) − g(U ℓ )| < 10 −6 .

Figure 14.
Figure 15.
Figure 14. Example from Section 6.6 with non-conforming weight: Counterexample showing that rescaling is necessary. We plot the output of Algorithm A for θ = 0.5 without rescaling of the estimators, i.e., η u = η ε u and η z = η ε z with ε = 0. We do not observe convergence at all.

Figure 17.
Figure 18.
Figure 17. Example from Section 6.6 with non-conforming weight: Over different values of θ, we plot the cumulative number of elements N cum := ∑_{k=0}^{ℓ} #T k necessary for Algorithms A-C to achieve an error accuracy |g(u) − g(U ℓ )| < 10 −6 . We use the rescaled estimators (54) with ε = 0.3.
(A3) Quasi-orthogonality: Let (T ℓn ) be the (possibly finite) subsequence of triangulations T ℓ generated by Algorithm A, B, or C which satisfy the Dörfler marking on the refined elements, i.e.,
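Dörfler marking, as used here, selects a set of elements that carries at least a θ-fraction of the total squared estimator. A minimal sketch (greedy variant with full sorting, which yields a minimal marked set; names are illustrative):

```python
def doerfler_mark(indicators, theta):
    """Return indices of a minimal set M of elements with
    sum_{T in M} eta_T^2 >= theta * sum_{T} eta_T^2,
    given the local indicators eta_T and a parameter 0 < theta <= 1."""
    total = sum(eta * eta for eta in indicators)
    order = sorted(range(len(indicators)),
                   key=lambda i: indicators[i], reverse=True)
    marked, acc = [], 0.0
    for i in order:
        marked.append(i)
        acc += indicators[i] ** 2
        if acc >= theta * total:
            break
    return marked
```

Small θ marks few elements per step (many cheap steps); large θ approaches uniform refinement, which matches the trade-off observed in the experiments over θ ∈ {0.1, . . . , 0.9}.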