Fast and Exact Newton and Bidirectional Fitting of Active Appearance Models

Active appearance models (AAMs) are generative models of shape and appearance that have proven very attractive for their ability to handle wide changes in illumination, pose, and occlusion when trained in the wild, while not requiring large training data sets, unlike regression-based or deep learning methods. The problem of fitting an AAM is usually formulated as a non-linear least squares one and the main way of solving it is a standard Gauss–Newton algorithm. In this paper, we extend AAMs in two ways: we first extend the Gauss–Newton framework by formulating a bidirectional fitting method that deforms both the image and the template to fit a new instance. We then formulate a second-order method by deriving an efficient Newton method for AAM fitting. We derive both methods in a unified framework for two types of AAMs, holistic and part-based, and additionally show how to exploit the structure in the problem to derive fast yet exact solutions. We perform a thorough evaluation of all algorithms on three challenging and recently annotated in-the-wild data sets, and investigate fitting accuracy, convergence properties, and the influence of noise in the initialization. We compare our proposed methods to other algorithms and show that they yield state-of-the-art results, out-performing other methods while having superior convergence properties.


I. INTRODUCTION
ACTIVE APPEARANCE MODELS (AAMs) are generative models of shape and appearance widely used and studied in the field of Computer Vision, especially for facial landmark detection. First introduced by [1], AAMs formulate the problem of landmark detection as a non-linear sum of squares minimization. A linear model of both shape and appearance is built in a strongly supervised way and that model is aligned to a new instance to localize landmarks. Fitting an AAM to a new image is then done by reconstructing that object, i.e. reconstructing its appearance and deforming either the image in the forward framework or the template in the inverse framework so that the difference between the two is as small as possible. The deformation is modelled via a motion model, typically a piecewise affine warping, that warps the appearance from a given image to the mean shape. Finding the correct parameters for the affine warping is equivalent to localizing the landmarks on the face.
There are two main approaches to solving the AAM problem: regression-based approaches, the goal of which is to learn a function that maps the appearance features directly to the desired target, as in the original AAM [1], [2], [3], and optimisation-based approaches, which solve it analytically. In this paper, we focus solely on optimisation-based methods, which have been shown to produce state-of-the-art results [4], [5]. In that case, the problem of fitting an Active Appearance Model is formulated as a non-linear least-squares one and is iteratively solved in a Lucas-Kanade fashion. Prior work has focused exclusively on Gauss-Newton methods in either the inverse or the forward framework.
The Lucas-Kanade algorithm was introduced in [6] for image alignment and an appearance-based version was introduced by Hager and Belhumeur [7]. It was first applied to AAM fitting by Matthews and Baker in [8], where they notably introduce the simultaneous inverse compositional (SIC) framework for fitting algorithms to solve the AAM problem. As its name indicates, it works in the inverse compositional framework in the sense that, at each iteration, it deforms the template to align it to the image and composes the inverse of the resulting warp update with the current image warp estimate. However, although SIC is robust and exact and has gained significant interest following the work of [8], its computational cost remained prohibitive for most applications [9], [10].
For that reason, the project-out inverse compositional (POIC) algorithm, introduced in [8], has long been the preferred method for person-specific AAMs. In contrast to SIC, POIC is a very fast yet approximate algorithm which has been shown to generalise poorly in the case of large appearance variations.
Besides SIC and POIC, fast versions of exact Gauss-Newton algorithms (both inverse and forward) were recently introduced in [11]. The proposed methods capitalize on results from optimization theory to provide solutions that are both exact and computationally efficient, making them prime choices.
Most recently, the authors of [4] introduced a part-based model, coined GN-DPM, which is built in the same way as the Active Appearance Model but replaces the holistic appearance model by a more flexible, local, patch-based one. This method has been shown to produce state-of-the-art results [12], [4], [5], even outperforming regression-based methods such as SDM [3] and its variants [13] while being more robust and more computationally efficient thanks to a sparse formulation.
Active Appearance Models and, more recently, GN-DPM are therefore widely used in practice, mainly owing to their ability to handle challenging pose, illumination and occlusion conditions when trained in the wild. In addition, their generative nature makes it easy to build an instance of the model even with very few training images, which is extremely useful for person-specific modelling, such as in a tracking context [14].
In this work, we depart from the de facto standard approach to AAM fitting using Gauss-Newton optimisation and make several contributions:
• For the first time (to the best of our knowledge), we propose two novel, fast and exact optimisation frameworks for AAM fitting. The first algorithm is a Bidirectional fitting approach which elegantly combines both inverse and forward formulations. The second algorithm is a fast yet exact second-order method based on Newton optimisation.
• Naive derivations of these methods result in computationally heavy algorithms that are, in practice, prohibitive for most applications. We show how to address this problem by exploiting the structure in the AAM problem to derive fast and exact solutions.
• We derive these methods for both holistic and part-based Active Appearance Models in a unified framework and extend them to handle robust features.
• We provide comprehensive experiments on three different datasets recently annotated in-the-wild, investigating both fitting accuracy and convergence properties.
• We investigate their robustness to noise in the initialisation.
• We provide a comparison with the state of the art.
A preliminary version of the Newton and Bidirectional methods was previously formulated in [15] and [16] respectively for the simple case of intensity-based holistic Active Appearance Models.
In the rest of the paper, Sec. II rigorously introduces sparse and part-based Active Appearance Models, Sec. III quickly reviews prior work, while Sec. IV introduces a unified objective function for fitting the models. Sec. V details the derivation of the Bidirectional method to solve that problem. Sec. VI shows how the fast versions of SIC and Forward can be derived for the weighted case as special cases of the Bidirectional problem and Sec. VII details the derivation of the Newton algorithm. The experimental setting, implementation details, results and their analysis are presented in Sec. VIII.

II. BUILDING THE ACTIVE APPEARANCE MODELS
Active Appearance Models, be they holistic or part-based, are generative models defined by a shape model, an appearance model and a motion model:
• Shape model: A linear model of shape, shared by both holistic and part-based AAMs.
• Appearance model: A linear model of appearance (also known as texture model) defined in some reference canonical frame that depends on the motion model used. This appearance model is holistic for AAMs and part-based for GN-DPM / part-based AAMs.
• Motion model: This is a function that warps the pixels from the image frame to the reference frame and can be a piecewise affine warp for holistic AAMs [8] or a simple translation for part-based AAMs [4].
In this work we unify the formulations for both holistic and part-based AAMs and derive the solutions for all main optimisation methods.

A. Shape model
We assume that we have a dataset of D training images represented as functions of their pixels, $(I_k(x, y))_{k=1,\dots,D}$, for which the coordinates $(x, y)^T$ of u landmarks have been annotated (typically manually). For a given object, the set of these u coordinates $(x_1, y_1, \dots, x_u, y_u)^T \in \mathbb{R}^{2u}$ defines the shape of that object. The shape model is obtained by first aligning the training shapes by applying a generalised Procrustes analysis, which removes similarity transformations (translation, scaling and rotation). PCA is applied to these similarity-free shapes and the n − 4 resulting eigenvectors with the highest associated eigenvalues are kept to obtain the shape model defined by the mean shape $s_0$ and these eigenvectors. Since this model has been built on similarity-free shapes, it is unable to model scaling, translation and rotation. We address that by appending four similarity eigenvectors and re-orthonormalising the whole set of vectors. Finally, we stack these n shape eigenvectors as the columns of the matrix $S \in \mathbb{R}^{2u \times n}$. Instances of this shape model are then expressed as:
$$s = s_0 + S p \qquad (1)$$
with $p = (p_1, \dots, p_n)^T \in \mathbb{R}^n$ containing the shape parameters.
The shape model is built in the same way (as described above) for both AAMs and GN-DPMs. We now detail the appearance model, which is built slightly differently for the two methods. However, for both methods, the end result is a linear model of appearance, similar to the shape one, which can be summarised by a mean appearance and a set of appearance eigenvectors, allowing unified notations and abstracting away the difference between the two models.
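As a minimal sketch of the linear shape model described above, the following NumPy code builds a mean shape and an orthonormal basis by PCA and then generates and projects shape instances. It assumes the training shapes have already been aligned; the Procrustes alignment and the four appended similarity eigenvectors are deliberately omitted, and the function names (`build_shape_model`, `shape_instance`, `project_shape`) are hypothetical.

```python
import numpy as np

def build_shape_model(shapes, n_components):
    """PCA on aligned shapes.

    shapes: (D, 2u) array, one flattened shape (x1, y1, ..., xu, yu) per row.
    Returns the mean shape s0 (2u,) and the basis S (2u, n_components).
    """
    s0 = shapes.mean(axis=0)
    centred = shapes - s0
    # Rows of vt are the PCA eigenvectors, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    S = vt[:n_components].T
    return s0, S

def shape_instance(s0, S, p):
    """Generate a shape from parameters p: s = s0 + S p."""
    return s0 + S @ p

def project_shape(s0, S, s):
    """Recover the parameters of a shape; valid because S has orthonormal columns."""
    return S.T @ (s - s0)
```

Because the columns of `S` are orthonormal, projecting a generated instance returns the parameters that produced it, which is a convenient sanity check when building such a model.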

B. Holistic Active Appearance Models
Holistic Active Appearance Models usually use a Piecewise Affine Warping as their motion model W. A piecewise affine warp is defined as follows: first, both the shape and the mean shape are triangulated (e.g. using a Delaunay triangulation). Each triangle in the target shape, together with its corresponding triangle in the mean shape, defines an affine transformation. The collection of all affine transformations defined by all triangle pairs defines the piecewise affine warp. The appearance model is then obtained by warping each training image to the mean shape $s_0$, which forms the base mesh, and we denote V the set of the N pixels inside that mesh, $V = (v_l)_{l=1,\dots,N} = ((x_l, y_l)^T)_{l=1,\dots,N}$. We then apply PCA on these flattened shape-free images to obtain the appearance model, of which we again keep only the first m eigenvectors with the highest associated eigenvalues. The resulting appearance model is described by the mean appearance $A_0 \in \mathbb{R}^N$ and the appearance eigenvectors stacked as the columns of an appearance matrix $A \in \mathbb{R}^{N \times m}$. Note that the appearance eigenvectors can also be considered as functions $A_i(x, y)$, $i \in \{1, \dots, m\}$, of the pixel locations $v = (x, y)^T \in V$. Instances of this appearance model can be expressed as:
$$A_0 + A c = A_0 + \sum_{i=1}^{m} c_i A_i \qquad (2)$$
with $c = (c_1, \dots, c_m)^T \in \mathbb{R}^m$ containing the appearance parameters.
Let $v = (x, y) \in V$ and $s = (x_1, y_1, \dots, x_u, y_u)$. The derivative of $W(v, p)$ with respect to the shape parameter p depends on the shape vertices.
For more detail on how to compute the derivatives in the case of a piecewise affine warp, please refer to [11].
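The per-triangle mapping behind a piecewise affine warp can be illustrated with barycentric coordinates. The sketch below maps a single point from a triangle of the mean shape to the corresponding triangle of a target shape; it is only an illustration of the principle (triangle lookup, image sampling and the derivatives discussed in [11] are not covered), and the function names are hypothetical.

```python
import numpy as np

def barycentric(v, tri):
    """Barycentric coordinates of point v (2,) with respect to triangle tri (3, 2)."""
    a, b, c = tri
    # Solve v - a = beta * (b - a) + gamma * (c - a).
    T = np.column_stack((b - a, c - a))
    beta, gamma = np.linalg.solve(T, v - a)
    return np.array([1.0 - beta - gamma, beta, gamma])

def inside(weights, eps=1e-12):
    """A point lies in the triangle when all barycentric coordinates are non-negative."""
    return bool(np.all(weights >= -eps))

def warp_point(v, tri_mean, tri_target):
    """Map v from a mean-shape triangle to the corresponding target-shape triangle."""
    weights = barycentric(v, tri_mean)
    # The same barycentric weights, applied to the target vertices,
    # give the affine image of v.
    return weights @ tri_target
```

Applying `warp_point` to every pixel of the mean shape, using the triangle that contains it, yields the sampling locations in the image for the whole mesh.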

C. Part-Based Active Appearance Model
Part-based Active Appearance Models, on the other hand, use a translational motion model W. First, similarities are removed from the training images by warping them to a reference frame. Then, around each landmark, a patch of size $N_s \times N_s$ is extracted. The resulting u patches are concatenated and flattened to form a warped image of size $u \times N_s^2$. The appearance model is then obtained in the same way as described for holistic AAMs by applying PCA on that set of warped images, and again the appearance space is described by the mean appearance $A_0 \in \mathbb{R}^N$ and the appearance eigenvectors stacked as the columns of an appearance matrix $A \in \mathbb{R}^{N \times m}$, with an instance of that model given by (2). As previously done for holistic AAMs, we denote V the set of the $N = N_s \times N_s$ pixels $v = (x, y)^T$ inside the patches, $V = (v_l)_{l=1,\dots,N} = ((x_l, y_l)^T)_{l=1,\dots,N}$. As the motion model is now a translational one, its derivative is simpler than that of a piecewise affine warping:
$$\frac{\partial W(v, p)}{\partial p} = \sum_{k=1}^{u} \delta_k(v)\, S_k \qquad (3)$$
with $v = (x, y) \in V$, where $\delta_k(v) = 1$ if v is in the patch extracted around $s_k$ and 0 otherwise, and $S_k$ is the 2 × n matrix of parameters of the k-th landmark.
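For the translational motion model, the derivative of the warp at a pixel reduces to the 2 × n block of the shape basis belonging to the landmark whose patch contains that pixel. The sketch below gathers these blocks for all patch pixels; it assumes that the rows of S are ordered as $(x_1, y_1, \dots, x_u, y_u)$ and that a hypothetical index array `patch_index` records, for each pixel, the landmark of its patch.

```python
import numpy as np

def translation_jacobian(S, patch_index):
    """Warp Jacobian for a translational motion model.

    S: (2u, n) shape basis, rows ordered (x1, y1, ..., xu, yu).
    patch_index: (N,) integer array; patch_index[l] = k when pixel l belongs
                 to the patch extracted around landmark k.
    Returns an (N, 2, n) array whose l-th slice is S_k, the 2 x n block of
    landmark k = patch_index[l].
    """
    u = S.shape[0] // 2
    blocks = S.reshape(u, 2, -1)   # blocks[k] is the 2 x n matrix S_k
    return blocks[patch_index]
```

All pixels of the same patch share the same Jacobian, so in practice only the u blocks need to be stored; the gather above is shown for clarity.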

III. BACKGROUND WORK
The problem of fitting an Active Appearance Model is traditionally expressed as a non-linear least squares problem:
$$\arg\min_{p, c} \sum_{v \in V} \Big[ I(W(v, p)) - A_0(v) - \sum_{i=1}^{m} c_i A_i(v) \Big]^2 \qquad (4)$$
This problem has been previously solved using a Gauss-Newton method, either in the inverse framework or in the forward framework.

A. Inverse Framework
The Simultaneous Inverse Compositional algorithm solves (4) by linearising the model around a parameter p = 0 and computing at each iteration an optimal update ∆p.
The resulting optimisation problem is: where for all $i \in \{0, \dots, m\}$, $J_{A_i}$ is the matrix of derivatives of $A_i$ with respect to p, with $J_{A_i} \in \mathbb{R}^{1 \times n}$. All the terms will be introduced in more detail in the next section. Typically, (5) is solved over a single parameter $p_c \in \mathbb{R}^{n+m}$ that combines both shape and appearance parameters. The shape parameter is then updated in an inverse compositional way, $p \leftarrow p \circ \Delta p^{-1}$. This results in a complexity of $O((n + m)^2 N)$, which is prohibitive for most applications [8].
Fast-SIC adopts a smarter approach in solving the same problem by capitalizing on optimization theory [17]. The result is an algorithm of complexity $O(nmN + n^2 N)$, which is much less than the $O((n + m)^2 N)$ of the original SIC. We build on [17] to derive fast and exact solutions for our Newton and bidirectional AAM fitting algorithms.

B. Forward Framework
In the forward framework, the image rather than the template is linearized by re-writing the problem as: where $J_I$ is the matrix of derivatives of $I(W(v_l, q))$ with respect to q, with $J_I \in \mathbb{R}^{1 \times n}$.
Again, at each iteration, optimal updates ∆q and ∆c are obtained for the shape and the texture parameter, respectively.
The shape parameter is then updated in a forward additive way, $q \leftarrow q + \Delta q$.
Fast-Forward works in a similar way to Fast-SIC by also capitalizing on (14) to solve problem (6). Again, one can show that solving the above optimization problem has a cost of $O(nmN + n^2 N)$ [11].

IV. UNIFIED OBJECTIVE FUNCTION
To formulate our Newton and Bidirectional methods, we first introduce here a unified framework in which we derive all methods by formulating the problem of fitting an Active Appearance Model as a more general weighted non-linear least squares problem. For this purpose we introduce a parameter q used to deform the image, not the template. Note that all the calculations are done in the coordinate frame of the mean shape.
The goal is then to solve the following optimization problem: where and W is a weight matrix, i.e. a diagonal matrix whose diagonal elements $W_{ll}$, $l \in \{1, \dots, N\}$, define the weights associated with each pixel. In this work we set $W_{ll} \in \{0, 1\}$ for all $l \in \{1, \dots, N\}$, therefore allowing for sparsity. In particular, we define a sparse grid over V by considering only every K-th pixel (in practice K = 2 or K = 4). This drastically reduces the computational cost, as it divides the number of features of the appearance model by 2 or 4, and results in computationally much more efficient algorithms, with virtually no decrease in performance.
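The binary weighting can be sketched as follows: a 0/1 weight vector keeps every K-th pixel of the flattened reference frame, and any per-pixel quantity is then restricted to the active rows. This is an illustrative reading of the sparse grid in which the step is applied to the flattened pixel index; the function names are hypothetical.

```python
import numpy as np

def sparse_weights(n_pixels, step):
    """Diagonal of W: 1 for every step-th pixel, 0 elsewhere."""
    w = np.zeros(n_pixels)
    w[::step] = 1.0
    return w

def select_active(features, w):
    """Keep only the rows (pixels) whose weight is non-zero."""
    return features[w > 0]

def weighted_sq_norm(x, w):
    """Weighted squared norm x^T W x for a diagonal W stored as the vector w."""
    return float(x @ (w * x))
```

With binary weights, the weighted norm of a residual equals the plain squared norm of its active entries, which is why the inactive pixels never need to be stored or differentiated.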
We now provide the derivatives needed to compute the forward, inverse and bidirectional algorithms. For all $v \in V$, the derivatives of g with respect to its different parameters are given by: • $A_{i,x}$ and $A_{i,y}$ are the x and y gradients of In addition, to derive the Newton method, we will need the second order derivatives that are given by: , p) contains the second order derivatives of $A_i(W(v, p))$: where $A_{i,xx}$ and $A_{i,xy}$ are the x and y gradients of $A_{i,x}(W(v, p))$ and $A_{i,yx}$ and $A_{i,yy}$ are the x and y gradients of $A_{i,y}(W(v, p))$.
Finally, since the second order derivative of the motion model W is null, the second order derivative of g with respect to p simplifies to

A. Vectorised form
We vectorise the calculations over all the pixels by rewriting: We denote N the number of pixels v ∈ V in the mean shape coordinate frame.
We can then write f as: We also stack the first order derivatives for each pixel into a vector form to obtain the following terms: Minimising f is usually done using the Gauss-Newton method. The main idea is to linearise, using a first order Taylor expansion, either the template around p = 0 as or the image as The former is called the inverse framework while the latter is called the forward framework. Note that the template is already linear with respect to the appearance parameter c.

By abuse of notation we will denote $T[p]$, $A_0[p]$, $A[p]$ and $I[q]$ by simply $T$, $A_0$, $A$ and $I$, respectively.

B. Robust descriptors
We present the results of holistic AAM and part-based AAM using robust features, which prove more robust to changes in illumination and occlusion [5], [18]. This is easily done using vector notation by flattening, for each pixel, the descriptor vector and considering each of its items as an additional pixel. Assuming a dense descriptor: that maps each pixel of an image to a descriptor of size $N_p$, we can rewrite I as the vectorized flattened feature-image $[\Psi(I(v))]_{v \in V}$. In this work we used SIFT features [19], which were shown to perform best [5]. In particular, we used a compact representation with $N_p = 8$ where each feature is extracted from an eight by eight window as in [4]. Therefore, in the rest of the paper we will simply use the term AAM to refer to SIFT-AAM.

C. Parameters update
The appearance parameter update is straightforward and, at each iteration, given an update $\Delta c$, the texture parameter is updated as $c \leftarrow c + \Delta c$. In the forward case, at each iteration, an update $\Delta p$ is computed and the shape parameter is updated as $p \leftarrow p + \Delta p$. In the inverse case, an update $\Delta q$ is estimated by deforming the template rather than the image and the update is done in an inverse compositional way as $p \leftarrow p \circ \Delta q^{-1}$. Note that in the case of a part-based AAM (or GN-DPM), the composition update is equivalent to a simple addition [4] and $p \circ \Delta q^{-1} = p - \Delta q$.

V. FAST BIDIRECTIONAL ALGORITHM
We formulate here a bidirectional Gauss-Newton algorithm for AAM fitting that combines forward and inverse approaches and works by deforming both the image and the template at each iteration. Both template (11) and image (12) are linearised and the optimization is done jointly over $\Delta q$, $\Delta p$ and $\Delta c$: $\arg\min_{\Delta q, \Delta p, \Delta c}$ The problem is solved by capitalizing on optimization theory [17] and using: Therefore, (13) is first optimised with respect to $\Delta c$, which yields using the projection operator $P = W - WA(A^T W A)^{-1} A^T$, where, as specified earlier, we write $\|x\|^2_P$ to denote the weighted $\ell_2$-norm $x^T P x$. We go on by optimizing (16) with respect to $\Delta q$. This gives where the projected-out Jacobian and Hessian matrices are given by $G_q = P J_q \in \mathbb{R}^{N \times n}$ and $H_q = G_q^T G_q \in \mathbb{R}^{n \times n}$, respectively.
Next, we plug (17) into (16) to get the following optimization problem $\arg\min$ where $R = P(E - Q)$ and $Q = G_q H_q^{-1} G_q^T$. The final step is to optimize (18) with respect to $\Delta p$. This gives: where the projected-out Jacobian and Hessian matrices are given by $G_p = R J_p \in \mathbb{R}^{N \times n}$ and $H_p = G_p^T G_p \in \mathbb{R}^{n \times n}$, respectively.
Finally, the shape and appearance parameters are updated as $q \leftarrow q \circ \Delta p^{-1} + \Delta q$ and $c \leftarrow c + \Delta c$.
The overall complexity per iteration for computing these updates is $O(nmN + n^2 N)$.
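The sequential elimination described in this section can be sketched in NumPy. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the linearised objective $\|r + J_q \Delta q - J_p \Delta p - A \Delta c\|^2_W$ with residual $r = I - T$, binary weights, and the symmetric projector $P = W - WA(A^T W A)^{-1} A^T W$ (with a trailing W, chosen here so that P is symmetric and idempotent). The name `bidirectional_step` and the argument layout are hypothetical.

```python
import numpy as np

def bidirectional_step(r, J_q, J_p, A, w):
    """One joint Gauss-Newton update for (dq, dp, dc).

    r: (N,) residual I - T;  J_q, J_p: (N, n) image and template Jacobians;
    A: (N, m) appearance basis;  w: (N,) binary diagonal of W.
    """
    WA = w[:, None] * A
    K = np.linalg.inv(A.T @ WA)            # (A^T W A)^{-1}, only m x m

    def P(X):
        # Apply P = W - W A K A^T W without forming an N x N matrix.
        WX = w[:, None] * X if X.ndim == 2 else w * X
        return WX - WA @ (K @ (A.T @ WX))

    G_q = P(J_q)                           # projected-out image Jacobian
    H_q = G_q.T @ G_q

    def R(X):
        # Apply R = P (E - Q) with Q = G_q H_q^{-1} G_q^T.
        # Uses P G_q = G_q, which holds because P is idempotent for binary w.
        return P(X) - G_q @ np.linalg.solve(H_q, G_q.T @ X)

    G_p = R(J_p)                           # projected-out template Jacobian
    H_p = G_p.T @ G_p

    dp = np.linalg.solve(H_p, G_p.T @ r)
    dq = -np.linalg.solve(H_q, G_q.T @ (r - J_p @ dp))
    dc = K @ (WA.T @ (r + J_q @ dq - J_p @ dp))
    return dq, dp, dc
```

Only n × n and m × m systems are solved and all products with P and R cost $O(nmN + n^2 N)$, which matches the complexity stated above; under these assumptions the three updates coincide with those of a direct weighted least-squares solve over the stacked unknowns.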

VI. WEIGHTED FAST-SIC AND FAST-FORWARD
Having introduced our new bidirectional algorithm, we note that Fast-SIC and Fast-Forward are simply special cases of (13) obtained by ignoring some of the terms.
FAST-SIC: With the unified notations, SIC can be obtained by ignoring the parameter $\Delta q$ and solving the following simplified problem: $\arg\min_{\Delta p, \Delta c}$ By using the same strategy as for Bidirectional we obtain the following update rules: And for the shape parameter: with $G_p = P J_p$ and $H_p = G_p^T G_p$.
FAST-Forward: Similarly, we rewrite (13) by ignoring the terms in $\Delta p$ and solve the following simplified problem: $\arg\min_{\Delta q, \Delta c}$ At each iteration, the optimal $\Delta c$ is given by The update for the shape parameters is: with $G_q = P J_q$ and $H_q = G_q^T G_q$.

VII. FAST NEWTON ALGORITHM
Newton differs from the previous Gauss-Newton based algorithms in that it performs a Taylor expansion to the second order of the whole objective function f rather than simply a first order expansion on Φ (in other words, it approximates the objective function with a quadratic function rather than approximating Φ with a linear one).
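The difference between the two Hessians can be made concrete on a toy least-squares problem. The sketch below is not the AAM objective: it uses a hypothetical two-parameter residual $r_i(x) = x_0 e^{x_1 t_i} - y_i$ purely to show that the Newton Hessian of $\frac{1}{2}\sum_i r_i^2$ is the Gauss-Newton term $J^T J$ plus a term weighting the second derivatives of each residual by the residual itself.

```python
import numpy as np

def residual(x, t, y):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x, t):
    e = np.exp(x[1] * t)
    return np.column_stack((e, x[0] * t * e))

def gradient(x, t, y):
    # Gradient of f(x) = 0.5 * sum_i r_i(x)^2.
    return jacobian(x, t).T @ residual(x, t, y)

def gauss_newton_hessian(x, t):
    # First-order approximation: only J^T J is kept.
    J = jacobian(x, t)
    return J.T @ J

def newton_hessian(x, t, y):
    # Exact Hessian: J^T J + sum_i r_i * (Hessian of r_i).
    r = residual(x, t, y)
    e = np.exp(x[1] * t)
    h01 = np.sum(r * t * e)                # d2 r_i / dx0 dx1 = t e
    h11 = np.sum(r * x[0] * t ** 2 * e)    # d2 r_i / dx1^2  = x0 t^2 e
    second = np.array([[0.0, h01], [h01, h11]])
    return gauss_newton_hessian(x, t) + second

def step(x, t, y, exact=True):
    H = newton_hessian(x, t, y) if exact else gauss_newton_hessian(x, t)
    return x - np.linalg.solve(H, gradient(x, t, y))
```

When the residuals vanish the two Hessians coincide, which is consistent with Newton behaving like Gauss-Newton in the vicinity of a good solution while using extra curvature information away from it.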
The Newton update rules for minimising f can be obtained by solving: We detail here the derivation of the Newton update rules for the inverse framework. The first order derivatives of f are easily derived from those of Φ: Since the Newton method uses the exact term for the Hessian and not only the Gauss-Newton approximation, we also need the second order derivatives of Φ and f .
We introduce the terms: From these we easily get the Hessians of f: Note that for all $l \in \{1, \dots, N\}$, $W_{ll} \, \frac{\partial^2 g(v_l, p, q, c)}{\partial c \, \partial p}$ can be precomputed, leaving only a dot product to compute at each iteration. The cost of computing $H^f_{cp}$ is therefore $O(mnN)$. The computational cost of $H^f_{pp}$ is simply $O(n^2 N)$. We can now solve the original optimisation problem (26) and, using the Schur complement, the following update rules are obtained: The Gauss-Newton method can be derived from the Newton formulation by simplifying the second order terms and approximating the Hessian as
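The use of the Schur complement can be sketched on a generic block Newton system. The code below solves the system with blocks $H_{cc}$, $H_{cp}$, $H_{pp}$ and gradients $g_c$, $g_p$ by eliminating the appearance update first; the block names and the elimination order are assumptions made for illustration and are not taken from the derivation above.

```python
import numpy as np

def schur_newton_step(H_cc, H_cp, H_pp, g_c, g_p):
    """Solve [[H_cc, H_cp], [H_cp^T, H_pp]] [dc; dp] = -[g_c; g_p].

    The appearance block H_cc (m x m) is eliminated first, so that only an
    n x n system in the shape update dp remains.
    """
    Hcc_inv_Hcp = np.linalg.solve(H_cc, H_cp)
    Hcc_inv_gc = np.linalg.solve(H_cc, g_c)
    # Schur complement of H_cc in the full Hessian.
    S = H_pp - H_cp.T @ Hcc_inv_Hcp
    dp = -np.linalg.solve(S, g_p - H_cp.T @ Hcc_inv_gc)
    # Back-substitution for the appearance update.
    dc = -(Hcc_inv_gc + Hcc_inv_Hcp @ dp)
    return dc, dp
```

The result is identical to solving the full (n + m) × (n + m) system directly, but the two unknown blocks are obtained from smaller systems.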

VIII. EXPERIMENTAL RESULTS
In this section we provide a comprehensive comparison of holistic Active Appearance Models and Part-Based Active Appearance Models for all four fitting algorithms: Fast-Forward (Forward), Fast-SIC (SIC), Fast-Newton (Newton) and Fast-Bidirectional (Bidir). We test the methods on three challenging datasets recently annotated with 68 landmarks in the same configuration as the Multi-PIE dataset: LFPW [20], Helen [21] and AFW [22]. We compare our Part-Based AAM with other state-of-the-art methods and with all competitors of the recently held 300 Faces In-The-Wild Challenge [23]. Comparison with the state-of-the-art. Our bidirectional part-based AAM largely outperforms both SDM (intraface) [3] and Chehra [13].

A. Experimental setting
We conducted two different sets of experiments. First, we compare all fitting algorithms presented in the paper for both Part-Based and Holistic AAM on three challenging datasets. In each case we initialised the algorithm using the bounding box from the face detector [22]. To make the experiments more realistic, and in order to empirically evaluate the robustness of each method, we added some random translation and scaling to the initialisation, defined by a standard deviation $\sigma_{noise}$, following the same protocol as in [24]. We tested two scenarios: adding a small ($\sigma_{noise} = 1.5$) and a larger ($\sigma_{noise} = 3$) amount of random noise to these initialisations. Note that the noise is different for each image but the same for each method to allow for a fair comparison.
Second, we compare against available state-of-the-art regression methods on these three datasets (Fig. 7) and, on the 300 Faces In-The-Wild challenge, against all the competitors from both industry and academia, for both 51 and 68 points (Figures 8 and 9).
Throughout the paper, the performance is measured in terms of the well-established normalised point-to-point error introduced in [22] and defined as the RMS error normalised by the face size (pt-pt-error) (Figs 3, 5, 7, 8, 9). We also evaluate the convergence speed of the AAM fitting algorithms by measuring the averaged normalised point-to-point error over the whole dataset, at each iteration (Figs 4, 6). We report results in performance and convergence for both Part-Based and Holistic AAM for small and large noise in the initialisation.
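A face-size-normalised point-to-point error can be sketched as follows. This is one common reading of the metric and rests on two assumptions that should be checked against [22]: the per-landmark Euclidean distances are averaged, and the face size is taken as the mean of the width and height of the ground-truth landmark bounding box. The function name is hypothetical.

```python
import numpy as np

def pt_pt_error(pred, gt):
    """Normalised point-to-point error between two (u, 2) landmark arrays.

    Assumption: face size = mean of the width and height of the bounding box
    of the ground-truth landmarks.
    """
    dists = np.linalg.norm(pred - gt, axis=1)        # per-landmark distance
    width, height = gt.max(axis=0) - gt.min(axis=0)  # ground-truth extent
    face_size = (width + height) / 2.0
    return float(dists.mean() / face_size)
```

Averaging this value over a dataset at each iteration gives the kind of convergence curve described above, while the fraction of images below a given error threshold gives a cumulative error curve.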

B. General observations
Our Bidirectional Part-Based AAM matches or out-performs other state-of-the-art methods. It also largely out-performs classical AAMs, especially for accurately capturing the boundaries. Reconstructing these boundaries seems to be the hardest part for all methods and especially for the Forward algorithm. Bidirectional performs better than the other fitting methods while having superior convergence properties and being more robust to noise. The difference between SIC and Bidirectional is observed for both holistic and part-based AAMs and is especially large for holistic AAMs. Newton performs similarly to SIC, with better convergence properties in the case of part-based AAMs. As expected, it performs very well in the vicinity of the solution, with a slight decrease in performance as the initialisations become noisier. However, given enough iterations, all methods seem to converge to more similar solutions.

C. Implementation details
Holistic AAM: To increase performance, we used a multi-resolution approach with two levels. The lower level has m = 50 appearance vectors and n = 11 shape vectors while the higher level has m = 400 appearance vectors and n = 25 shape vectors. We used a step of 2, effectively dividing the number of features by two.
Part-based AAM: We again used a pyramid of two levels, with m = 70 appearance vectors and n = 15 shape vectors in the lower level. The higher level has m = 200 appearance vectors and n = 25 shape vectors. We also used a step of 4, effectively dividing the number of features by four. We found that the Part-Based AAM worked as well with a larger step (here a step of 4) while the holistic model requires a smaller step (here a step of 2).
These parameters were obtained by performing a randomised grid search over a small set of parameters and a small validation set. In all cases, both holistic AAM and part-based AAM were trained using the training sets of LFPW [20] and Helen [21]. Note that we never compute derivatives for all pixels $v \in V$ but only for the subset of pixels $\{v_l : l \in \{1, \dots, N\}, W_{ll} = 1\}$, i.e. we only store the points for which the corresponding weight is not null. That makes all our algorithms computationally much more efficient. On a standard desktop configuration, initializing the method with [22] takes on average 20 seconds per image, due to the nature of the algorithm, while an iteration of the Holistic and Part-Based AAM takes less than a second.

D. Small noise
We present results for 68 facial landmarks, which include boundary points. This is particularly interesting as the boundary points are significantly harder to accurately detect and sometimes ill-defined, in particular for challenging cases such as those with large poses.
For Part-based AAM, we notice that in all cases, Forward performs slightly worse than other methods (Fig 3). We believe this is due to the fact that, unlike for other methods, the gradients are extracted directly from the image and not reconstructed with a learned linear model. Therefore, the original boundaries can potentially be far off the correct solution and the corresponding gradients can be very different from the gradients learned from actual faces. Bidirectional consistently outperforms or matches the performance of other methods on LFPW (Fig 3a), Helen (Fig 3c) and AFW (Fig 3e), with a slight advantage for small errors. In terms of convergence (Fig 4), there is a clear hierarchy, with Bidirectional and Newton both converging much faster in all cases. Forward converges much slower in comparison.
Similar observations can be made for Holistic AAM, for which the relative performance of the methods is very similar although the overall fitting accuracy is slightly worse. The advantage of Bidirectional is even more noticeable in the case of holistic AAM, where it performs better than all other methods and its convergence advantage is even clearer. Newton still performs as well as SIC but this time does not match the convergence speed of the Bidirectional method.

E. Large noise
We noticed that, for small amounts of noise in the initialisation, Bidirectional and Newton clearly behave best, with SIC following closely and Forward performing worst, on all datasets, for both performance and convergence. However, when increasing the noise (Fig 5), the performance of all methods decreases, but Bidirectional still converges much faster than SIC and Newton while out-performing them (Fig 5a) or at least matching their performance (Figs 5c, 5e). As theoretically expected, the performance of the Newton method slightly deteriorates, making it significantly slower than Bidirectional, but still faster than SIC and Forward. Similar observations can be made for holistic AAMs, with an even more impressive convergence speed for Bidirectional, which now clearly out-performs all other methods in both fitting accuracy and convergence speed.
Finally, Fig 10 shows some representative examples of images taken from AFW along with the initialisation used and the fitted results obtained using this initialisation for each method.

F. Comparison with the state-of-the-art
We provide a comparison of our part-based AAM with state-of-the-art methods. Fig 7 shows a comparison of our method with SDM [3] and Chehra [13] on LFPW, Helen and AFW. The comparison was done in the same setting as the previous experiments, using the same bounding-box initialisations for all methods and $\sigma_{noise} = 3$. Results are for the 49 interior points since these are the only landmarks returned by SDM and Chehra. Our method performs significantly better than both methods on all three datasets.
We also compare our methods to the recently published 300 Faces In-The-Wild challenge [23], on both outdoor and indoor images, for 51 and 68 points (Figs 8 and 9). We used the same performance metric as in the previous experiments and obtained the performance curves with that metric for the other methods directly from the organisers of the competition [23]. In order to handle the very large pose present in some of the challenge images, we trained three part-based AAMs, one for approximately frontal poses and two for extreme poses. We used the DPM head detector of [25], [22] to estimate the pose and initialise one of three pose-specific part-based AAMs. As can be seen from Figures 8 and 9, our part-based AAM performs remarkably well. Its performance is on par with that of Deng et al. [26], without employing any complicated multiple initialisation scheme. The work in [26] used a multi-view, multi-scale and multi-component cascade shape regression model using multi-scale HOG. In other words, the performance of the work in [26] is due less to the suitability of the proposed model to the task of facial landmark detection than to complex engineering of the algorithm used, which could also be applied to our formulation, but this falls beyond the scope of this paper. On the other hand, the work in [27] outperforms our method in the case of very small errors. However, the opposite is the case for any error larger than 0.02. This is to be expected as the work in [27] is a submission from industry (Megvii company) using cascaded Deep Convolutional Neural Networks trained on undisclosed datasets. In [28], a coarse-to-fine approach with a near-frontal DPM is used and learned using structured output SVM, while [29] used a commercial face detector to initialise a structured output SVM-based method that fits a 3D shape model. Finally, [30] used a cascade of regressors modified to use an $\ell_{2,1}$ norm and multiple initialisations. Our method outperforms all these methods while using three models trained on less than 1000 images and using the output of the DPM for initialisation.

Fig. 10. Example of fitting results from the AFW dataset [22] obtained with a DPM. From left to right: initialisation (black), Bidirectional (blue), SIC (yellow), Newton (magenta) and Forward (green). As illustrated, our SIFT-DPM performs remarkably well even for in-the-wild images presenting challenging conditions of illumination, pose and occlusion, even with bad initialisations. Although Forward generally has a tendency to not reconstruct the boundary as well (row 3), it sometimes captures information missed by SIC, while sometimes both SIC and Forward fail to accurately locate the boundary pixels (row 4). Bidirectional advantageously combines the two approaches, allowing it to correctly locate landmarks when SIC or Forward fail (rows 3 and 4). Finally, Newton uses the Hessian to avoid local minima and in some cases converges to a better solution (rows 3 and 4).

IX. CONCLUSION
We proposed a unified framework for solving both holistic and part-based Active Appearance Models, in which we formulated new Bidirectional and Newton methods. We showed how to exploit the structure of the problem in order to derive exact and computationally efficient algorithms and extended them to handle robust features. We provided a comprehensive study of the performance and convergence of all fitting algorithms for both models on three highly challenging datasets and additionally provided a comparison with other methods on these and on the recently published 300 Faces In-The-Wild Challenge database. Our Fast Bidirectional and Fast Newton part-based AAMs out-perform or match the performance of other state-of-the-art methods such as regression-based ones, while having superior convergence properties compared to existing AAM fitting algorithms. Going forward, we are planning to extend the same Bidirectional and Newton fitting strategies to the work of [24].