Principal nested shape space analysis of molecular dynamics data

Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from in a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean sub-space based PCA (Jung, Dryden and Marron, 2012, Biometrika). Sub-spaces of successively lower dimension are fitted to the data in a backwards manner, with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D sub-shape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.


Introduction
There are many notions of shape, and one of the most common is that the shape of an object is obtained by removing location, rotation and scale (Kendall, 1984). Analyzing the shapes of objects measured at sets of labelled landmarks is of interest in a wide variety of disciplines (Dryden and Mardia, 2016), and in many applications the dataset is large, either in sample size or in dimension, or in both. For example, in molecular dynamics simulations the three dimensional co-ordinates of a molecule of hundreds or thousands of atoms may be available for millions of observations. It is of interest to summarize the main features of shape variability, such as describing the main modes of variability and clustering the data into several states. A key aspect is to project the data into a low dimensional space which retains the main features of the signal in the data.
The most common approaches to dimension reduction in shape analysis involve projecting the data into a tangent space to the mean shape and then carrying out principal components analysis (PCA) in this Euclidean space (Kent, 1994;Cootes et al., 1994;Dryden and Mardia, 2016, Section 7.7). This approach has been successful in many applications over the past couple of decades, although if the data are very dispersed then the method can be problematic as the data may lie far from in a flat manifold. In addition, linear variation (corresponding to geodesics in the manifold) may not be the most appropriate or efficient summary of the variability. There have been several advances in dimension reduction on manifolds, motivated by shape analysis applications. Fletcher et al. (2004) developed principal geodesic analysis to the manifold setting by finding principal directions and variances in the tangent space and projecting back to the manifold using the exponential map. The above approaches are forwards fitting, in that the lower dimension representations are fitted before the higher dimensional ones. Huckemann and Ziezold (2006) proposed a method of PCA for Riemannian manifolds based on geodesics of the intrinsic metric, and Huckemann et al. (2010) developed an algorithmic method to perform intrinsic PCA on quotient spaces based on generalized geodesics. Kenobi et al. (2010) proposed two intrinsic methods of fitting minimal geodesics through shape data, and an extension to polynomial fitting. These intrinsic methods differ in their formulation from the tangent space methods, in that tangent space PCA involves maximising the explained variance for each linear component in the tangent space, whereas the intrinsic geodesic methods involves minimizing the unexplained variance in order to fit a geodesic subspace. The intrinsic methods are also forwards fitting in the sense that the first principal geodesic is fitted before the second principal geodesic etc., although there is a different notion of a mean which is the point that minimizes the variance of the projected data in the principal geodesic. Panaretos et al. (2014) introduce principal flow which is a curve passing through the mean of the data, where a particle moves along the principal flow in a path of maximal variation, up to smoothness constraints.
A very different backwards approach to subspace fitting was considered by Jung et al. (2012) who introduced a technique called Principal Nested Spheres (PNS) for sequentially decomposing a unit sphere into subspheres of successively lower dimension. This backwards approach to dimension reduction is the approach that we consider, which requires special adaptation to be applicable to three dimensional shapes and to large datasets, and these are our main methodological contributions.
The main motivation for this work comes from biomedical sciences, where it is of great interest to study the changes in shape of a protein, as its shape is an important component of a protein's function. Studies of this type are often approached through molecular dynamics simulations, which are large deterministic simulations using Newtonian mechanics to model the movement of a protein in a box of water molecules (e.g. Salomon-Ferrer et al., 2013). We consider a dataset of 100 independent simulation runs in the study of the small alanine pentapeptide (Ala 5 ) which consists of k = 29 atoms in R 3 at 10, 000 equal picosecond (10 −12 s) time intervals. Further details on this particular peptide are given by Margulis et al. (2002). The 100 runs all start with fairly similar (but different) configurations and the subsequent simulated configurations vary considerably over the 10000 time points (10 nanoseconds). Figure 1 shows example configurations at times near the beginning and end of the sequence for run 1, after removing the effect of rotation, translation and scaling using generalized Procrustes analysis (Gower, 1975;Goodall, 1991). As seen in the figures, the shape changes quite a lot with bending and straightening over the course of time.
Shape analysis can be considered an example of Object Oriented Data Analysis (Wang and Marron, 2007;Marron and Alonso, 2014), and the key initial question to ask is: "What are the data objects?" We have several choices available in our application, for example the data objects could be (i) the shapes of each individual molecule of 29 atoms in 3 dimensions or (ii) individual runs of 29 atoms in 3 dimensions observed at all 10,000 time points. The two respective choices of object space are (i) Σ 29 3 and (ii) (Σ 29 3 ) 10000 , where Σ k m is the shape space of k points in m dimensions (Kendall, 1984). For the application in this paper we will consider (i) for our data objects, and so the full sample size is n = 10, 000 × 100 = 1, 000, 000 in our case, whereas in (ii) the sample size would be n = 100.
It is of interest to describe the shape variability using a low dimensional representation and to examine if there are preferred states, i.e. clusters of shapes which are more commonly formed by the dynamic peptide. Also, the patterns of temporal transitions between states are of interest.
2. Shape PCA, principal nested spheres and principal nested shape spaces

Shape PCA
Consider k × m configuration matricesX i , i = 1, . . . , n, where n is the number of configurations. Shape PCA is similar to classical PCA (Jolliffe, 2002), except the shapes must first be projected into the Procrustes tangent space centred at the overall estimated mean shape of the data. Technical details of the Procrustes tangent space are given in Appendix 1. The observations are optimally aligned (by translating, scaling and rotating) using generalized Procrustes analysis (Gower, 1975;Goodall, 1991) and the full Procrustes sample mean is obtained as an estimate of the overall population mean shape (Dryden and Mardia, 2016, Equation (6.11)). The procedure is described in detail in Chapter 7 of Dryden and Mardia (2016) and is implemented in the shapes package in R (Dryden, 2018). Let T i , i = 1, . . . , n, be the Procrustes tangent projection of centred, scaled and rotated configurations ofX i at their Procrustes mean (Dryden and Mardia, 2016, Equation (4.33)), which are obtained using the command procGPA with option tangentcoords="partial" from the R package shapes. The principal components are the eigenvectors of the sample covariance matrix of T i , i = 1, . . . , n, and these eigenvectors and the PC scores are given in the output of procGPA. The method is equivalent to carrying out PCA using the extrinsic Frobenius distance between matrices in the pre-shape space after removing rotations using Procrustes registration.

Principal nested spheres
The analysis of the principal nested spheres for a given data set in a unit sphere S d is introduced in Jung et al. (2012). The main idea is that a high dimensional unit sphere is decomposed into successively lower dimensional subspheres using backwards fitting, and at each level the Euclidean PNS scores are obtained from the residuals. The method can capture non-geodesic variation, so if the data lie on a curved submanifold of the sphere, then the resulting variation after fitting the PNS can be linearly represented (Jung et al., 2012). PNS is an iterative At each step k, the method captures the variation of the data set in a lower dimensional subsphere and provides the best (d − k)-dimensional approximation U d−k to the data. Since U i is a great subsphere of S d if and only if U i is a unit sphere, the method extends the existing methods of finding principal geodesics for the data. The main ingredients of the method can be summarised as follows.
Any subsphere of S i of dimension i − 1 can be characterised by v ∈ S i and r ∈ (0, π/2] as where ρ(x, y) is the spherical distance between x, y ∈ S i given by ρ(x, y) = cos −1 ( x, y R i+1 ). For a data set {x 1 , . . . , x n } ⊂ S d , the best fitting (d − 1)-dimensional subsphere is defined to beÛ are the signed residuals on the sphere, whereˆ i,d−1 = i,d−1 (v 1 ,r 1 ). SinceÛ d−1 has radius sin(r 1 ), denote by x p i the projection of x i ontoÛ d−1 divided by sin(r 1 ), so that {x p 1 , . . . , x p n } ⊂ 1 sin(r1)Û d−1 . Since 1 sin(r1)Û d−1 is isometric with the standard unit sphere S d−1 , applying the above procedure to {x p 1 , . . . , x p n } we get the best fitting (d − 2)-dimensional (sub-)sphere to the data to bê where (v 2 ,r 2 ) is determined by (2.2) using {x p 1 , . . . , x p n } with d replaced by d − 1 and S d replaced by the (d − 1)-dimensional unit sphere 1 sin(r)Û d−1 , and where A d−1 (v 2 ,r 2 ) is defined on 1 sin(r)Û d−1 in a similar fashion to that in (2.1). Moreover, the corresponding residuals are The resulting fitting sequence {Û d−k | k = 1, . . . , d − 1} obtained by repeating this procedure is the so called principal nested spheres for the given data set {x 1 , . . . , x n }. Writing E(0) for the spherical coordinates of the final projection points inÛ 1 of the data, with respect to their Fréchet mean in that sphere (assuming it is unique), the resulting combined residuals give a representation of the data, where the ith column comprises the coordinates of x i in terms of the principal nested spheres. This representation is called the PNS coordinates of the data and can be used to interpret the structure of the data.

Principal nested shape-spaces
Principal nested spheres analysis for a data set on a sphere uses the particular structure of the sphere. In principle, this can be generalised to a sequence of fitting nested sub-manifolds for a data set contained in a manifold. However, it is generally not straightforward. For the analysis of shape variation of a given set of configurations, we propose a method of applying the technique of PNS to obtain principal nested sub-shape spaces (PNSS) for shapes of m-dimensional configurations.
We wish to describe the shape variability of the peptide data, and so this requires removing information about translation, scale and rotation. For a given configurationX in R m with k > m labelled landmarks that are not all identical, the pre-shape X of the configuration is obtained fromX by removing the effects of translation and scaling. The pre-shape X can be represented by a (k − 1) × m matrix of unit norm (Kendall, 1984) and it is known that the pre-shape sphere S k m consisting of all pre-shapes of such configurations is the entire sphere S m(k−1)−1 . Then, the shape [X] ofX is the equivalence class of X under rotation, i.e. [X] = {XR | R ∈ SO(m)}.
In order to remove rotations we consider the quotient space of the pre-shape sphere S k m with respect to rotations SO(m), and this shape space is complicated for m ≥ 3 being non-homogenous and containing singularities (Le and Kendall, 1993). However, practical statistical analysis can be carried out by identifying the tangent space T X0 (S k m ) to the pre-shape sphere at a point X 0 as comprising two orthogonal real sub-spaces: the vertical and horizontal tangent spaces. The vertical tangent space V X0 contains the rotation information and the horizontal tangent space H X0 contains the shape information, and the latter is often called the Procrustes tangent space (Kent and Mardia, 2001). Working in this horizontal tangent space forms the heart of most practical statistical analyses of landmark shapes.
To develop the method of PNSS we need to make some further identifications. We consider the subset of the pre-shape sphere S k m which is orthogonal to V X0 , and denote it by S X0 which a great sphere of dimension (k−1)m−m(m−1)/2−1. In fact S X0 is the image of H X0 using the exponential map onto the sphere. If we now have a datapoint X = X 0 on the pre-shape sphere then we can obtain the Procrustes fit as the solution to minimizing the great circle distance ρ(XR, X 0 ) over rotations R. The Procrustes fit is denoted by S X = XR X with the fitted rotation matrix R X , and for all X ∈ S X0 the Procrustes fits S X lie within a half-sphere of S X0 , because the distance ρ(S X , X 0 ) ≤ π/2. The Procrustes fit will be unique if the rank of X is at least m − 1 and [X] is outside the cut-locus of [X 0 ]. In the case of uniqueness, which will often be the case for practical data, we can identify the subset of the sphere with a bijective map with the subset of shape space is non-singular and not in the cut locus of [X 0 ]}.
The subsets B X0 and B [X0] are diffeomorphic to each other. The practical implication of this identification is that any sub-manifolds in B X0 are mapped to sub-manifolds in B [X0] .
Definition: Given a sequence of principal nested spheres on S X0 (as defined in Section 2.2), the intersection of the sequence of principal nested spheres with B X0 maps to a sequence of sub-manifolds in B [X0] , which we define as a sequence of Principal Nested Shape Spaces. The final point in the sequence, with dimension 0, corresponds to the PNSS mean shape.
We now consider the practical implementation of principal nested shape spaces for the peptide shape data in m = 3 dimensions. First of all an overall Procrustes meanX is obtained using generalized Procrustes analysis. We use the function procGPA in the shapes package in R (Dryden, 2018). For our dataset each Procrustes fit is unique, as will generally be the case for most practical datasets. We then apply PNS to the Procrustes fitted data on SX , and the resulting PNS subspaces intersections with BX will be have a diffeomorphic mapping to the PNSS subspaces in B [X] , and the PNS scores are equal to the PNSS scores.
PNSS uses the relationship between the pre-shape sphere S k m and the shape space Σ k m to construct a nested sequence of sub-shape-spaces in a given shape space from a sequence of nested subspheres, as constructed in the previous section, in the corresponding pre-shape sphere. Despite not being strictly analogous to the concept of principal nested spheres, this approach does offer an extrinsic method to apply that concept. The m = 2 dimensional case was discussed by Jung et al. (2012), where complex arithmetic leads to a simple adaptation of PNS to planar shapes, where the shape space is the complex projective space CP k−2 , which is a homogeneous space (Kendall, 1984).
A full technical construction of PNSS is given in Appendix 1.

Approximate PNSS using PC scores
When a dataset is large, the numerical computation required for the principal nested spheres analysis, and so for the principal nested sub-shape-spaces analysis, usually tends to be slow. In fact the time required to compute PNSS on large datasets can be extensive. Much of the time is spent fitting the higher dimensional spheres, which are taking account of small amounts of noise. An alternative approximate approach is to first carry out PCA and retain the first p PC scores, where these scores capture a high percentage of the variability. If the Procrustes fits of the original data configurations to their mean shapeX are close to a great subsphere of a lower dimension than the dimension of BX , the speed of computation can be much improved by first using the technique of principal component analysis to determine this great subsphere and then using the projections to this subsphere as the input for the principal nested sub-shape-spaces analysis.
The technical details of the method of approximate PNSS using PC scores are given in Appendix 2.
A practical implementation of PNSS using PC scores is given in the shapes package in R using the command pns4pc (Dryden, 2018), and this code is used in our peptide application. Output from the R command pns4pc includes the PNSS scores, where the first PNSS score is a circular variable and the remaining PNSS scores are Euclidean. In our application we shall see that the clusters of peptide states are much more apparent in the PNSS scores compared to the PC scores. The percentage of shape variability explained by the PNSS scores is helpful to explore the efficiency and effectiveness of the method, and we will see that the first few PNSS scores explain much more shape variability than the first few PC scores.

Molecular dynamics data
The data set consists of a total of 100 runs, with each run consisting of k = 29 landmarks in m = 3 dimensional space from a small peptide (Ala 5 ) evaluated at consecutive n = 10000 times at picosecond intervals. As the time interval between consecutive configurations is very small, we first thin the data in time to give 100 configurations at equally spaced times from each run. The thinned times are t 1 = 1, t 2 = 102, t 3 = 203, . . . , t 99 = 9899, t 100 = 10000. We consider temporal sequences of each run and register using generalized Procrustes analysis (Gower, 1975;Goodall, 1991).
The starting configuration for each peptide is quite similar at the start of each run with the Riemannian shape distance ρ between pairs of runs primarily less than 0.3 radians, where 0 ≤ ρ ≤ π 2 . The variability in shape between runs is much larger at the end (Riemannian distances up to about 1.1), as see in Figure  2.

Shape PCA and PNSS
We first run shape PCA on the thinned 100 configurations from each of 100 runs (10000 configurations in total), Much of the data variation can be much (a) Histograms q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q explained by first several principal components as shown in Figure 3. We choose the first ten shape principal components for the computation of PNSS using PC scores since those ten explain 90.1% of the shape variation. The percentages of shape variability captured by the first three individual PC scores are 28.7%, 16.4%, 12.7% and by the PNSS scores are 65%, 9.2%, 3.9%. We carry out the PNSS analysis to sequentially reduce the dimension of the projected data from S 10 to S 1 to S 0 . The percentage variability explained by the first two PNSS scores (74.2%) is much higher than the first two PCA scores (45.1%). The PNSS score 1 is a circular variable, where the most negative score and most positive PNSS1 score are identified with each other (at the cut point of the circle). The higher PNSS scores do not have the extreme points identified with each other.
We calculate PNSS1, PNSS2 and PNSS3 coordinates of all the 1,000,000 configurations through the model constructed using the thinned 10000 configurations. The pairwise 2D plots of first three PNSS and PC scores are in Figure  4. A vertical long cluster forms all over PC2 between small range of PC1. On the other hand, in PNSS1-PNSS2 plot three salient clusters are close to each other and one cluster is positioned on the left side. For this particular peptide there are believed to be four preferred shape states, and these can be clearly seen in the first PNSS plot, whereas it is not evident with PCA. Hence, there is a clear advantage in using PNSS compared to PCA in this dataset.
The relative variation of PNSS1 to PNSS2 is much higher than that of PC1 to PC2. Since first two PNSS components account for high proportion (74.2%) of the shape variability, the subsequent components from PNSS3 are far less important. Both left and right tails in PNSS1 should be considered connected, given this is a circular variable.

Cluster analysis
Given that there are believed to be four preferred states we now consider cluster analysis. The result of four-group clustering is shown in Figure 5, where hierarchical clustering analysis using Ward's method is used (Ward, 1963) but with distances rather than squared distances (ward.D in R). In (a) the cluster analysis was performed on (PC1, PC2, PC3) and in (b) it was performed using the great circle distance on S 10 then displayed in terms of (PNSS1, PNSS2) coordinates. Three PCs in (a) are needed for separating them into four groups, on the other hand two PNSSs in (b) are enough to split them clearly. However when looking carefully at Figure 4 the four clusters do not stand out in the PCA plot even with three PCs, but they are clearly distinct in the PNSS1-PNSS2 plot. Plotting PNSS1 and PNSS2 with the colour scheme from PCA clustering is given in (c), and we see that the result is similar. The most dense part of the blue cluster in (b) is where the starting configurations for each of the runs are located. Ward's method was our preferred clustering method, giving good practical separation into four groups in our application. In (d) we plot the PNSS scores on S 2 using the PNSS clustering. The solid red line is the estimated first principal arc, on the other hand the solid-dotted red line is the estimated second principal arc where the solid line indicates the PNSS2 values. In (e) we plot the same data as in (d) except we use the PCA cluster colours, and the clustering in the plots (d) and (e) are very similar. If we carry out the clustering on other sets of variables such as PNSS1, 2 then the broad pattern is similar, particularly near the main modes for the four clusters.

Structure of PNSS variability
For each PNSS we can obtain an interval on S 10 aŝ µ ± cs j e j , j = 1, . . . , 10, ( 3.1) whereμ is the PNSS mean, c is a constant, s j are standard deviation of jth PNSS scores and e j are the direction of jth principal arc. As seen in Figure 3, s 1 = 0.5675, s 2 = 0.2131, s 3 = 0.1384 and we used c = 1. We display the first three principal arcs in landmark space as shown in Figure  6 and we display both the PNSS mean (corresponding to PNSS component 0) and the Procrustes mean for comparison. The black dots with connected black lines indicate the Procrustes mean. The PNSS mean is given by rainbow coloured points (with the rainbow colour indicating point order). We can see that the PNSS mean and Procrustes mean shapes are visibly different. In addition coloured rainbow arcs have been drawn from the PNSS mean along the PNSS score 1 direction to the ends of the interval (3.1). The grey points indicate the location at c = 1 along the arc, and these are joined by grey lines which give an idea of the structure of variability in the PNSS. This plot shows considerable amounts of twisting along the arcs in PNSS1, and a smaller amount of different bending in PNSS2 and PNSS3.
We see that the first PNSS score demonstrates an articulated type of movement, sweeping out arcs. The PNSS mean is clearly different from the Procrustes mean. The PNSS description of variation is particularly appropriate here as bonds have fixed lengths, and local rotational movement is much more practical than linear movement to and from the Procrustes mean, as in shape PCA.
In order to examine the shapes of the clusters, we plot the PNS mean and Procrustes shapes of each cluster in Figure 7, together with the PNSS 1 arcs for each cluster. The cluster 1 mean has two clear bends in the peptide, clusters 2 and 4 means have a single bend each in different positions, and cluster 3 mean is broadly straight with no bends. These four cluster means are quite different, and they illustrate the wide range of shape configurations that the peptides visit during their temporal sequences. Also, cluster 1 is clearly more variable, with larger PNSS 1 arcs than the other clusters. Also, the PNSS mean and Procrustes mean are more different in cluster 1. (c) Clustering on (PC1, PC2, PC3) and representation in terms of (PNSS1, PNSS2). PNSS1 is a circular variable, with −1.246 and 1.246 being equivalent, as the cut-point of the circle.

Temporal clustering
A graphical view of shape changing pattern within these four groups over time can be seen as in Figure 8(a) for run 55. This run changes shape regularly but some other runs have different patterns, for example run 66 changes very much during the middle times, run 25 does not visit the yellow cluster, and so on. All runs are shown in Figure 8(b). Since all 100 runs behave with different dynamic patterns, we ran cluster analysis on 100 transition matrices within these four locations. To this end we estimate transition matrices for each run, T j , j = 1, . . . , 100, using empirical probabilities and we use the Hellinger distance on transition matrices defined by The overall transition matrix P estimated from all the runs is given in Table 1, indicating the conditional probability from a cluster (rows) to another cluster (columns). The transition jumps are evaluated at each 100 time points on the original scale, i.e. for the thinned data. The overall equilibrium distribution of the chain is given in Table 2, obtained from P n as n → ∞. Overall we see that more time is spent in Cluster 3 and less in Cluster 4. There is a preference to move from Cluster 1 to either Cluster 2 or 4 first, before then transitioning to Cluster 3 as the next most likely move. So, we can see that the movement between the states is asymmetric, and there are preferred orderings of transitions between states.   According to the first panel in Figure 9 showing the average distance between clusters divided by average distance within clusters, we split the transition matrices into four groups (temporal clusters) TC1 to TC4, by hierarchical clustering using Ward's method with the Hellinger distance. Focusing on the finishing locations where runs terminate, we estimated the final probability for each location. In the bottom panel, the labels Cluster 1 to Cluster 4 on x-axis denote the finish location labels which correspond to four groups in Figure 5 (b). Again we see that Cluster 3 is seen as finish location with high probability overall and relatively fewer runs terminate at Cluster 4.
We also estimate the transition matrices with TC1,TC2,TC3 and TC4 and display the equilibrium probabilities in Table 2. Simulation runs in TC1 have a relatively strong pull to Cluster 1, runs in TC2 have a pull towards Cluster 3, runs in TC3 have a pull towards Clusters 3 and 4, and runs in TC4 have a very strong pull towards Cluster 3. This behaviour is also seen in the estimated probability of final locations in Figure 9.
Hence it is of interest that the molecular dynamics simulations do exhibit different behaviours in the runs, with TC1 being particularly different from the  Fig 9. TC1 -TC4 denote temporal cluster labels after clustering transition matrices and Cluster 1 -Cluster 4 on x-axis denote the finishing location labels after clustering on the S 10 data.
other temporal clusters. The analysis is dependent on the choice of temporal thinning. We have chosen to thin to every 100 observations due to the very fine temporal resolution in the original data. There would be some small periods spent in other clusters between the thinned values, but the choice of 100 gives a good compromise between excessive long times spent in a cluster (too little thinning) versus brief/missed visits to clusters (too much thinning).

Discussion
We have developed the method of principal nested shape spaces as an extension of principal nested spheres Jung et al. (2012), and the peptide application clearly demonstrates the utility of the method. The technique has been applied to other datasets, including molecular dynamics simulations of enzyme data. Again useful insights are obtained from the backwards method for the enzymes data that are different from using conventional tangent space shape PCA, which is a forwards method that requires estimation of the mean shape at the outset. The general approach to backwards PCA and principal nested manifold relations is discussed by Damon and Marron (2014). Further related work includes Pennec (2016) who discusses barycentric sub-spaces, where sub-manifolds are generated from weighted means of reference points.
The data are available at: http://www.maths.nottingham.ac.uk/∼ild/pnss the effects of translation and scaling. The pre-shape X is a (k − 1) × m matrix of unit norm, and the pre-shape sphere is denoted S k m . The tangent space T X (S k m ) to S k m at any X ∈ S k m is the subspace of M(k − 1, m) given by where M(k − 1, m) denotes the space of all (k − 1) × m matrices. In terms of the quotient map from the pre-shape sphere S k m to the Kendall shape space Σ k m of configurations in R m of k labelled landmarks, we can decompose T X (S k m ) into the direct sum of two orthogonal subspaces, one of which is tangent to the equivalence class of X. We denote this subspace of T X (S k m ) by V X . Its orthogonal complement H X is the Procrustean, or the horizontal, sub-tangent space at X. It is known that (Kent and Mardia, 2001) and that H X is isometric with the tangent space at [X] to the shape space Σ k m . For any 1 j 1 < j 2 m, write E j1j2 for the m × m skew-symmetric matrix with (j 1 , j 2 ) component 1, (j 2 , j 1 ) component −1 and 0 otherwise. Then, for any pre-shape X 0 ∈ S k m with rank(X 0 ) = m, so that X 0;j1,j2 = X 0 E j1,j2 / X 0 E j1,j2 ∈ S k m . Moreover, {X 0;j1,j2 | 1 j 1 < j 2 m} spans V X0 .
Clearly, X 0 ∈ S X0 and the dimension of S X0 is the same as that of the shape space Σ k m . It can be checked that S X0 is the image of H X0 under the exponential map at X 0 .
For any pre-shape X ∈ S k m \ {X 0 }, we apply the ordinary Procrustes analysis procedure to choose S X = XR X ∈ [X], where R X = argmin R∈SO(m) ρ(X 0 , XR).
• The resulting S X is usually called the Procrustes fit to X 0 of the original configuration; • S X has the same shape as X; • X 0 S X is symmetric; • ρ(X 0 , S X ) π/2; and • the spherical distance distance, ρ(X 0 , S X ), between S X and X 0 is the same as the induced Riemannian shape distance between the shapes of X 0 and X.
For given X 0 and X, the corresponding R X , and hence the corresponding S X , is not necessarily unique. The uniqueness of S X , or R X , is equivalent to the shape [X] being within the non-singular part of Σ k m but outside the cut locus of the shape [X 0 ] (Le, 1991). Hence, the subset B X0 = {S X ∈ S X0 | X ∈ S k m and S X is unique} of S X0 is topologically a ball in S X0 containing X 0 and is bijective, under the quotient map, with the subset is non-singular and not in the cut locus of [X 0 ]} of Σ k m . Note that [X] being non-singular is equivalent to it being shape of a configuration whose pre-shape matrix X has rank at least m − 1. Since the quotient map is differentiable and since B X0 and B [X0] are respectively submanifolds of S k m and Σ k m , it follows that these two sets are diffeomorphic with each other. Thus, any sub-manifold in B X0 ⊂ S X0 is mapped to a sub-manifold in B [X0] ⊂ Σ k m . In particular, the intersection of B X0 with a subsphere of S X0 is mapped to a sub-manifold of Σ k m , so that the intersection of B X0 with a sequence of nested subspheres of S X0 is mapped to a sequence of nested sub-manifolds of Σ k m . However that these maps are not isometric. Nevertheless, the great circle arc through X 0 is mapped to a geodesic in Σ k m through [X 0 ].

Appendix 2
Approximate PNSS using PCA Consider configurationsX i , i = 1, . . . , n in R m of k labelled landmarks with Procrustes mean shapeX. Their pre-shape matrices are X i , i = 1, . . . , n, and their Procrustes fits toX are S i = S Xi , i = 1, . . . , n. Let T i , i = 1, . . . , n, be the Procrustes tangent projection of X i atX (Dryden and Mardia, 2016, (4.33)), i.e. T i = S i − X , S i X = S i − tr(X S i )X ∈ HX ⊂ TX (S k m ), and v j be the unit eigenvector corresponding to the jth largest eigenvalue of the covariance matrix of T i , i = 1, . . . , n, where y = vec(Y ) is the vectorize operation of stacking the columns Y into a single vector y,T = 1 n n i=1 T i and V j = vec −1 m (v j ), where vec −1 m (y) = Y is the inverse vectorise operator forming a matrix Y of m columns. Then, X , V j = 0 and V j ∈ HX . For any 1 p < m(k − 1) − m(m − 1)/2 − 1, the p-dimensional unit sphere SX ;V1,...,Vp in the (p + 1)-dimensional linear subspace spanned bȳ X, V 1 , . . . , V p is a p-dimensional subsphere of SX . A minimal practical requirement is that the first p eigenvalues will be strictly positive. If we choose p such that the first p principal components explain a high proportion of the total variation of the data then, without much loss of information, the use of the projection of S i to SX ;V1,...,Vp as the input for the principal nested sub-shape-spaces analysis will reduce the initial dimension of the sphere and improve the speed of computation.
The projection of S i to SX ;V1,...,Vp can be approximated by using the shape principal component scores, where the shape principal component score for the ith configuration on the jth principal component is given byλ ij = T i , V j , i = 1, . . . , n, j = 1, . . . , m(k − 1) − m(m − 1)/2 − 1. For this, let Then, W i is the image of S i under the inverse exponential map of S k m atX, it lies in HX and, in particular, W i = ρ(X, S i ). By writing the projection of W i in the subspace, spanned by V 1 , . . . , V p , of the tangent space TX (S k m ) is (4.1) The image S * i of U i under the exponential map back in the pre-shape sphere is where, if U i = 0, we take S * i = (1, 0, . . . , 0). When S i is sufficiently close to SX ;V1,...,Vp , S * i gives a good approximation to the projection of S i to SX ;V1,...,Vp .