Distribution on Warp Maps for Alignment of Open and Closed Curves

Alignment of curve data is an integral part of their statistical analysis, and can be achieved using model- or optimization-based approaches. The parameter space is usually the set of monotone, continuous warp maps of a domain. The infinite-dimensional nature of the parameter space encourages sampling-based approaches, which require a distribution on the set of warp maps. Moreover, the distribution should also enable sampling in the presence of important landmark information on the curves which constrains the warp maps. For alignment of closed and open curves in R^d, d = 1, 2, 3, possibly with landmark information, we provide a constructive, point-process based definition of a distribution on the set of warp maps of [0, 1] and the unit circle S that is (1) simple to sample from, and (2) possesses the desiderata for decomposition of the alignment problem with landmark constraints into multiple unconstrained ones. For warp maps on [0, 1], the distribution is related to the Dirichlet process. We demonstrate its utility by using it as a prior distribution on warp maps in a Bayesian model for alignment of two univariate curves, and as a proposal distribution in a stochastic algorithm that optimizes a suitable alignment functional for higher-dimensional curves. Several examples from simulated and real datasets are provided.


Introduction
In contrast to standard multivariate analysis, the concept of phase variation is a unique feature of functional data. For functional data obtained as parametric curves representing geometric objects in high-resolution images, establishing correspondence between points on the curves is an important task. Failure to isolate and quantify variation due to phase or lack of correspondence between points can be detrimental when computing descriptive summaries, and for subsequent inferential tasks. This process of isolation and quantification is referred to as registration or alignment.
There exist several approaches to alignment. One popular approach is continuous monotone alignment, which refers to the alignment of two curves g_i : D → R^d, i = 1, 2, where D is a compact domain, by estimating a homeomorphic self-map γ : D → D, known as a warp map, that best matches g_2 • γ to g_1 (or vice versa). A variational formulation quantifies the matching through a cost or energy functional, and alignment is defined as the determination of an optimal warp map γ* from a class W that minimizes the cost functional.
Alternatively, a statistical model-based formulation of the alignment task seeks to estimate the warp map γ based on a likelihood function defined using some discrepancy measure between g 1 and g 2 • γ, conditional on γ.

Motivation and related work
The focus of this work is on the construction of a probability distribution on W and development of a simple sampling scheme, which would enable quantification of uncertainty on the optimal alignment of open and closed curves using any of the available continuous monotone methods. The need for a distribution enabling stochastic approaches to alignment arises in two contexts: (1) since the parameter space of warp maps is a non-linear function space, deterministic algorithms for solving the variational problem can get stuck in local minima, and (2) in a Bayesian model a prior distribution on W is required. The construction and the desirable features of the distribution are motivated by the following common modeling scenarios that arise in the analysis of biomedical and biological datasets.
Boundaries of data objects such as organs in high-resolution medical images are often represented as parametrized open curves (connected curves that begin and end at different points) or simple closed curves (connected curves that do not cross themselves, and begin and end at the same point). In certain cases, explicit physiological information can constrain the warp maps. In Figure 1, the left panel shows two open curves (univariate functions) g i : [0, 1] → R, i = 1, 2 representing two electrocardiogram (ECG) cycles with corresponding PQRST complexes marked as fixed landmarks (maxima and minima marked on each cycle).
For example, the point Q on g_1 at t_1 is to be registered to the corresponding Q point on g_2 attained at t_2 (and similarly for the feature points P, R, S and T). This introduces constraints on any warp map γ : [0, 1] → [0, 1]: γ(t_1) = t_2 ensures the matching of the Q point across the two functions (additional constraints are used to ensure the matching of the P, R, S and T points as well); between the landmarks, γ is unconstrained. The example in the right panel shows the outlines of two stingrays, represented as embeddings of the circle of unit circumference S in R², or equivalently as planar closed curves g_i : S → R², i = 1, 2. The snout of the stingrays (marked in red) as well as other landmark points (marked in green) are to be matched. This again imposes constraints on a warp map γ : S → S, requiring exact matching of landmarks with unconstrained matching in between.
There are two complementary requirements of the alignment procedure in the above scenarios: (1) uncertainty around the 'best' alignment needs to be captured, and (2) constraints due to landmark points need to be automatically incorporated into the alignment procedure.
Operationally, this demands a probability distribution on W that satisfies the following.
(i) An alignment task with m constraints can be decomposed into m+1 unconstrained ones by performing independent alignment of the curves across m + 1 subsets of the domain D. This requires the initial constrained alignment problem to satisfy the desirable subset invariance condition: if γ * : D → D is the optimal warp map and if γ * B : B → B, B ⊂ D is the optimal map when alignment is performed only on a subset B induced by landmarks, then γ * = γ * B on B ⊂ D (Trouvé and Younes, 2000). In the presence of landmark-induced constraints, a probability distribution π on W is said to satisfy subset invariance if its restriction π B to warp maps of B ⊂ D is a suitably re-scaled version of π, and is independent of π D\B .
(ii) For efficient exploration of W , the distribution should be flexible enough to be centred at any warp map of choice. For example, in the Figure 1 examples, as a first step, one can align the two curves only at the constraint points with a piecewise linear (PL) warp map. Alignment of the remaining regions can be carried out by sampling in the neighborhood of the PL warp map within a stochastic algorithm, or by employing a prior distribution centred at the PL warp map in a Bayesian model.
Literature on alignment has mostly focused on functional data defined on a closed interval [a, b]: for frequentist approaches, see Kneip and Gasser (1992); Gervini and Gasser (2004); Gasser and Kneip (1995); Zhou et al. (2014) and Tang and Müller (2008); for Bayesian approaches, see Telesca and Inoue (2008); Cheng et al. (2016); Claeskens et al. (2010); Lu et al. (2017) and Kurtek (2017). Alignment methods for closed curves with D = S are conspicuous by their absence within the statistics literature; a notable exception is a differential-geometric approach. For a good account of variational strategies for alignment of closed and open curves, we refer the reader to Srivastava and Klassen (2016).
Landmark-constrained alignment, under a geometric, non-stochastic framework, was only recently studied (Strait et al., 2017;Bauer et al., 2017). In those methods, it is not possible to capture or model uncertainty in the optimal alignment. As far as we know, stochastic approaches to curve alignment in the presence of landmarks have not been considered before.

Contributions
In a Bayesian model for alignment, Cheng et al. (2016) define a finite-dimensional Dirichlet distribution on discretized warp maps as a prior distribution; this results in a finite-dimensional specification of the prior instead of a functional one. We first prove that their construction results in a limiting degenerate distribution on W as dimensionality increases (Theorem 1). We further establish degeneracy of the limiting distribution for a class C of finite-dimensional distributions that generalize the Dirichlet (Corollary 1).
Employing point process methods, we propose a modification of their finite-dimensional specification that results in a constructive definition of a non-degenerate distribution on the set of warp maps of [0, 1] and S that satisfies requirements (i) and (ii) (Theorem 2). We show that the distribution is a canonical one in the sense that it remains the limiting distribution for all finite-dimensional specifications from a class C. The distribution is related to the Dirichlet process (Ferguson, 1973) on the set of probability measures on [0, 1] (Remark 2). Our approach provides an explicit link between the self-similarity- and Markov-type properties of the Dirichlet process and requirements (i) and (ii) (Proposition 1). The distribution can be parametrized by a concentration parameter θ > 0 whose value determines how closely the probability mass concentrates around the chosen average warp map. The warp maps sampled from the distribution are discontinuous with probability one; we show that this is unavoidable if one insists on subset invariance in requirement (i). The constructive definition, along with the concentration parameter θ, identifies the finite-dimensional projections (coordinates) of the distribution with a Dirichlet distributed random vector, with parameters that depend intimately on the discretization and the choice of the average warp map. This leads to a simple algorithm for sampling PL warp maps.
Finally, we propose a novel stochastic algorithm based on the proposed distribution for alignment of open and closed curves in R k , k = 1, 2, 3, possibly with landmark constraints.
We elucidate on the importance of requirements (i) and (ii) on the distribution through several simulation and real-data examples. In addition, we employ the proposed distribution in a Bayesian model for alignment, similar to the one used by Cheng et al. (2016).
The rest of this paper is organized as follows. Section 2 examines the algorithm of Cheng et al. (2016) and details its shortcomings. Section 3 modifies the preceding construction, and proposes a theoretically-justified approach for warp maps of [0, 1]; the properties of the constructed distribution are studied in Section 4. Section 5 extends the construction mechanism to a corresponding distribution on warp maps of S. Section 6 presents sample warp maps under different settings, and results from different analyses of real open and closed curve data, possibly with landmark constraints, under a Bayesian model and using a novel stochastic algorithm (Section 6.2.2). Section 7 discusses extensions of the proposed methods.
The Supplementary Material contains proofs of all results, an alternative construction of a distribution on warp maps of S, and detailed descriptions of the datasets used in this work.

Construction using fixed partitions and issues
We first consider the construction of a distribution on the set of warp maps of a closed subinterval of the real line, which, without loss of generality, can be assumed to be [0, 1].
The possibility of landmarks on the observed curve data implies that the class of smooth warp maps is inappropriate. Instead, consider the class given by W I := {γ : [0, 1] → [0, 1], increasing, continuous, γ(0) = 0, γ(1) = 1}. In a recent paper on Bayesian alignment of curves, Cheng et al. (2016) proposed a simple method to obtain a continuous random warp map in W I . The following is a summary of their algorithm.
Algorithm 1. Fixed partition-based sampling of warp maps.
1. Fix a partition T_n = {0 =: t_0 < t_1 < . . . < t_n := 1} of [0, 1].
2. Sample an n-dimensional Dirichlet distributed random vector (p_1, . . . , p_n) with all parameters set to the same value α > 0.
3. Construct a warp map on [0, 1] by linear interpolation of the increments.
The resulting warp maps are continuous and are elements of W_I. The idea behind this approach is based on the fact that for γ ∈ W_I, its increments p_i := γ(t_i) − γ(t_{i−1}) are positive and satisfy Σ_{i=1}^n p_i = 1. Hence, (p_1, . . . , p_n) can be identified with coordinates of the simplex ∆^{n−1} := {(p_1, . . . , p_n) : p_i ≥ 0, Σ_{i=1}^n p_i = 1}. The parameter space of warp maps generated in this fashion is essentially finite-dimensional, since a warp map is fully determined by its values at the ordered set of points t_i. Cheng et al. (2016) state (without proof) that as n → ∞, this results in a Dirichlet process.
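As a concrete illustration, Algorithm 1 takes only a few lines. The sketch below is our own (numpy assumed; function and variable names are not from the paper):

```python
import numpy as np

def sample_warp_fixed_partition(n, alpha, rng=None):
    """Sketch of Algorithm 1: piecewise-linear warp map from Dirichlet
    increments on a fixed equi-spaced partition of [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: fixed equi-spaced partition 0 = t_0 < t_1 < ... < t_n = 1.
    t = np.linspace(0.0, 1.0, n + 1)
    # Step 2: Dirichlet increments with all parameters equal to alpha.
    p = rng.dirichlet(np.full(n, alpha))
    # Step 3: partial sums give gamma(t_i); linear interpolation in between.
    gamma_knots = np.concatenate(([0.0], np.cumsum(p)))
    return lambda s: np.interp(s, t, gamma_knots)

gamma = sample_warp_fixed_partition(n=50, alpha=1.0)
```

The returned map is continuous, increasing, and fixes 0 and 1, i.e. an element of W_I evaluated by interpolation.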
For a fixed partition T_n, Algorithm 1 recommends simulating a Dirichlet random vector with all n parameters equal to α > 0, or equivalently with parameter vector α(1, 1, . . . , 1). The parameter vector is hence independent of the size n of the partition. The natural way to incorporate information from the partition T_n is to assume that the parameter vector is of the form α*(1/n, . . . , 1/n), where α* = nα. This can be generalized to an arbitrary deterministic partition T = {B_1, . . . , B_k}, k ≥ 1, of [0, 1] by simulating a |T|-dimensional Dirichlet random vector with parameters α(µ(B_1), . . . , µ(B_k)), where α > 0, µ is a finite measure on [0, 1], and |T| denotes the cardinality of the set T. An equi-spaced or uniform partition arises by setting T = T_n with t_i − t_{i−1} = 1/n, i = 1, . . . , n, and taking µ as the Lebesgue measure.
From a practical perspective, Algorithm 1 is appealing since the partition T n can be chosen in various ways; the increments can be sampled from distributions different from the Dirichlet on ∆ n−1 , and linear interpolation results in continuous maps. The two pertinent questions are: (1) What is the corresponding distribution on W I as n → ∞? and (2) can the sampled Dirichlet random vector be identified with the finite-dimensional distributions of a stochastic process? The lack of partition information in the parameter vector has a significant implication on the answers to questions (1) and (2).
The algorithm constructs a warp map by linear interpolation of the increments obtained from a fixed partition T_n. The natural setting for its examination is the space C([0, 1]) of real-valued, continuous functions on [0, 1], with the linearly interpolated process based on the partial sums of the increments:

Y_n(t) := Σ_{i=1}^{⌊nt⌋} p_i + (nt − ⌊nt⌋) p_{⌊nt⌋+1}, t ∈ [0, 1].    (1)

Clearly, Y_n(0) = 0, Y_n(1) = 1, and Y_n is continuous, increasing in (0, 1) and an element of W_I. The case when α = 1 and (p_1, . . . , p_n) is uniform on ∆^{n−1} (for a given partition T_n) is particularly instructive as it captures the key features of the algorithm.
Theorem 1. Suppose (p_1, . . . , p_n), based on a fixed equi-spaced partition T_n, is uniformly distributed on ∆^{n−1}, and let γ_id : [0, 1] → [0, 1], γ_id(t) = t, denote the identity warp map. Then, in C([0, 1]) equipped with the uniform topology, Y_n converges in probability to the identity warp map γ_id. The process √n(Y_n(t) − γ_id(t)) converges in distribution to a standard Brownian Bridge process.
Theorem 1 states that for large n, when α = 1, the sampling algorithm proposed by Cheng et al. (2016) results in a distribution on W I that is degenerate at the identity warp map. The conclusion also remains true with γ id replaced by another deterministic warp map for a fixed non-equi-spaced partition T n (see Supplementary Material). Moreover, the fluctuations away from the identity warp can be captured by the behavior of a standard Brownian Bridge.
This result suggests that the resulting distribution on the class W I , in the limit, is governed only by the value of the Dirichlet scalar concentration parameter α, and concentrates on the identity warp map. See Section 6.1 for numerical illustrations of such degenerate behavior.
In fact, such an uninteresting distribution on W_I resulting from Algorithm 1 is not restricted to the case where the increments are uniformly distributed on ∆^{n−1}; this phenomenon applies to a rather large class of distributions on ∆^{n−1} based on spacings of random variables. Suppose x_1, . . . , x_{n−1} are independent draws from an absolutely continuous distribution function F on [0, 1], with density f and quantile function Q on (0, 1). Extend the definition of Q to [0, 1] by setting 0 =: Q(0) = lim_{t→0} Q(t) and 1 =: Q(1) = lim_{t→1} Q(t). Set x_{0:n} := 0, x_{n:n} := 1, let 0 < x_{1:n} < x_{2:n} < . . . < x_{n−1:n} < 1 a.s. denote the corresponding order statistics, and let p_i = x_{i:n} − x_{i−1:n} for i = 1, . . . , n be the spacings. Since Σ_{i=1}^n p_i = 1, (p_1, . . . , p_n) is a random vector on ∆^{n−1}. When f is the uniform density on [0, 1], (p_1, . . . , p_n) is a Dirichlet distributed random vector with each parameter equalling one. This class of distributions is hence a natural extension of the one used in Algorithm 1.
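The spacings construction is straightforward to simulate; the helper below is our sketch, where `sampler` draws the n − 1 interior points from any distribution on (0, 1):

```python
import numpy as np

def spacings(n, sampler, rng=None):
    """Simplex vector of spacings of an i.i.d. sample on [0, 1]
    (the setting of Corollary 1); `sampler(rng, m)` draws m points."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.sort(sampler(rng, n - 1))          # order statistics x_{1:n} < ... < x_{n-1:n}
    x = np.concatenate(([0.0], x, [1.0]))     # x_{0:n} := 0, x_{n:n} := 1
    return np.diff(x)                         # p_i = x_{i:n} - x_{i-1:n}

# Uniform draws recover the Dirichlet(1, ..., 1) case of Algorithm 1.
p = spacings(10, lambda rng, m: rng.uniform(size=m))
```

By construction the p_i are nonnegative and sum to one, so the vector lies on ∆^{n−1}.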
Corollary 1. Suppose (p_1, . . . , p_n) ∈ ∆^{n−1}, based on any fixed equi-spaced partition T_n, is the vector of spacings of i.i.d. random variables with a twice differentiable distribution function F and quantile function Q. If f is the corresponding probability density, assume that inf_{0≤x≤1} f(Q(x)) > 0 and sup_{0≤x≤1} |f′(Q(x))| < ∞. Then, Y_n converges in probability to Q in C([0, 1]) equipped with the uniform topology.
The conditions on F in Corollary 1 are not too restrictive and are satisfied by several densities with support on [0, 1]. For example, one can easily check that if F is the distribution function of a non-central Beta (Hodges, 1955) with both shape parameters equal to one and a non-centrality parameter κ > 0, then F is twice differentiable with bounded second derivative, and inf_{0≤x≤1} f(x) is e^{−κ/2}(κ/2 + 1) > 0, which is attained in the limit at zero. As with Theorem 1, the conclusion of Corollary 1 holds for a non-equi-spaced partition as well (see Supplementary Material).

Improved construction via random partitions
While the simplicity of Algorithm 1 is attractive, the degeneracy of the resulting distribution on W_I (as n → ∞) is disconcerting. In this section, we offer a simple modification of the previous approach that salvages the situation. The sampling method in Algorithm 1 depends on the choice of the partition T_n of [0, 1]. We demonstrate that choosing a random partition T_n(H), based on the order statistics of an i.i.d. sample from a distribution H on [0, 1], in conjunction with a point process representation, results in a limit process with sample paths that lie in W_I, centred at a desired warp map. The motivation for using partitions induced by order statistics is the fact that, conditional on their number n, the points of a non-homogeneous Poisson process on [0, 1] with intensity h are distributed as the order statistics of an i.i.d. sample from the density proportional to h. Next, we provide a new algorithm for sampling warp maps in W_I, and study the theoretical properties of the associated distribution.

Algorithm 2. Random partition-based sampling of warp maps.
1. Sample t_1, . . . , t_{n−1} i.i.d. from a distribution H on [0, 1], and form the random partition T_n(H) from the order statistics 0 =: t_{0:n} < t_{1:n} < . . . < t_{n:n} := 1.
2. Sample an n-dimensional Dirichlet distributed random vector of increments (p_1, . . . , p_n).
3. Construct a warp map on [0, 1] by linear interpolation of the increments.
Algorithm 2 is easy to implement, and extends Algorithm 1 to one based on random partitions (and a subsequent change in Dirichlet parameters).
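A minimal sketch of Algorithm 2, assuming H is specified through its quantile function `H_inv`; for simplicity the sketch keeps a symmetric Dirichlet in step 2 (the partition-dependent parameters discussed later are omitted), and all names are ours:

```python
import numpy as np

def sample_warp_random_partition(n, alpha, H_inv=None, rng=None):
    """Sketch of Algorithm 2: random partition T_n(H) from order statistics
    of i.i.d. draws from H, Dirichlet increments, linear interpolation."""
    rng = np.random.default_rng() if rng is None else rng
    H_inv = (lambda u: u) if H_inv is None else H_inv   # default H: uniform
    # Step 1: random partition from order statistics of n-1 draws from H.
    u = np.sort(rng.uniform(size=n - 1))
    t = np.concatenate(([0.0], H_inv(u), [1.0]))
    # Step 2: Dirichlet increments (symmetric, as a simplification).
    p = rng.dirichlet(np.full(n, alpha))
    # Step 3: linear interpolation of the partial sums.
    gamma_knots = np.concatenate(([0.0], np.cumsum(p)))
    return t, (lambda s: np.interp(s, t, gamma_knots))

# Centre the partition at H(t) = t^2 by drawing knots via H^{-1}(u) = sqrt(u).
t, gamma = sample_warp_random_partition(n=20, alpha=1.0, H_inv=np.sqrt)
```

Only the partition changes relative to Algorithm 1, which is what allows the sampled maps to concentrate around a chosen average warp.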

Theoretical support for Algorithm 2
As n → ∞, the limiting distribution associated with Algorithm 2 can be identified in two ways. The first approach is to start with finite-dimensional distributions at chosen time points and posit the existence of a process with the chosen finite-dimensional projections based on Kolmogorov's consistency theorem (see von Renesse and Sturm (2009)). With this approach, however, it is difficult to centre the distribution at a desired warp map. The approach we adopt in this paper is to constructively define a distribution that arises as a limit (as n → ∞) based on a point process formulation using transformed increments.
We first review the Gamma subordinator. A process G(t), t ∈ [0, 1], is a Gamma subordinator if it has independent increments and, for s < t, G(t) − G(s) is Gamma distributed with shape parameter t − s and scale parameter equal to one. It is a Lévy process with Lévy measure λ(dy) = y^{−1}e^{−y}dy and sample paths that are discrete with probability one, and thereby allows for a point process representation. Using the representation of a pure jump Lévy process by Ferguson and Klass (1972), we examine the existence of a limit process with sample paths in W_I with 'finite-dimensional' Dirichlet distributions with appropriate parameters. The following result is formulated for the general class of distributions on ∆^{n−1} induced by spacings of i.i.d. random variables on [0, 1] with density f and distribution function F, of which the Dirichlet with all parameters set to the same value is a special case. Consider a random partition T_n(H) of [0, 1] based on order statistics 0 =: t_{0:n} < t_{1:n} < . . . < t_{n−1:n} < t_{n:n} := 1 from an i.i.d. sample {t_i, i = 1, . . . , n − 1} with absolutely continuous distribution function H on [0, 1]. Independently of the t_i, consider (p_1, . . . , p_n) ∈ ∆^{n−1} obtained as spacings of an i.i.d. sequence x_i from a density f chosen as described in Corollary 1. Let F and Q be the corresponding distribution and quantile functions of the x_i, respectively. Set v_1 = p_1 and v_i = p_1 + . . . + p_i, i = 2, . . . , n, consider the transformed random variables z_{i,n}, and let P_n denote the point process with points (t_{i:n}, z_{i,n}).

Theorem 2. (1) Suppose p_i for each i = 1, . . . , n possesses a unimodal density. Then the sequence P_n converges in the vague topology to the Poisson point process P with intensity measure H(dt) × λ(dy), where λ(dy) = y^{−1}e^{−y}dy.
(2) If P_n converges in the vague topology to P, then the sequence of processes G_n(t) := Σ_i λ^{−1}(z_{i,n}) I(t_{i:n} ≤ t), t ∈ [0, 1], converges weakly to the time-changed Gamma process G • H in the Skorohod J_1 topology. However, the linearly interpolated version of G_n converges to G • H only in the weaker Skorohod M_1 topology.
Evidently, the limit process P corresponds to the Gamma process G • H, and P_n, when normalized, converges weakly to a limit, which we refer to as the Dirichlet process D • H with dH as the base measure, where dH is the Lebesgue–Stieltjes measure corresponding to the distribution function H. The probability measure D • H on the class W_I is thus constructed using only the random partition T_n(H) and the spacings p_i, where t_{n:n} = x_n = 1 and t_{0:n} = x_0 = 0. A few remarks are in order at this stage.
Remark 1. The Ferguson–Klass representation is based on the transformation of a homogeneous Poisson random measure on R_+ × R_+ under the inverse of the tail of a Lévy measure g on R_+. Theorem 2 uses this representation with Lévy measure λ to (1) retain the simplicity of Algorithm 1, (2) ensure that the distribution on W_I satisfies subset invariance, and (3) ensure that the distribution on W_I can be centred at a desired warp map. Proposition 1 in the next section clarifies (2) in view of Lukacs' characterization of the Gamma distribution; in other words, the subset invariance requirement fixes the Lévy measure to be that of the Gamma process (von Renesse and Sturm (2009)).
Importantly, when considering warp maps of S, the relationships to distribution or quantile functions of measures on S are unavailable.
Remark 3. Part (2) of Theorem 2 is striking: linear interpolation of the sample paths of G_n does not affect the limit process. This has an important implication for Algorithm 2 using random partitions, while not giving up continuity of the obtained warp maps. The weaker Skorohod M_1 topology is used since the linear interpolation of G_n implies that we seek convergence to a limit jump process with unmatched jumps in the converging sequence of processes; this cannot be achieved under the usual J_1 topology.

Based on a vector of increments (p_1, . . . , p_n) uniformly distributed on ∆^{n−1}, we consider the 'large jumps' process defined as the real-valued partial sum process Y_n(t) := Σ_{i=1}^{⌊nt⌋} ξ_{i,n}, where ξ_{i,n} := np_i − log n is a triangular array sequence. The definition of ξ_{i,n} is motivated by the fact that the largest jump amongst the p_i is of size O_P(log n/n) when (p_1, . . . , p_n) is uniform on ∆^{n−1} (Devroye, 1981). Jumps of smaller order can also be studied using intermediate spacings amongst the p_i under a different normalizing transformation (see Nagaraja et al. (2015) for details).
Theorem 3. Let Y(t), t ∈ [0, 1], be a real-valued Lévy jump process with Lévy measure ν(dy) = e^{−y}dy. The sequence Y_n converges weakly to Y in D([0, 1]) equipped with the Skorohod J_1 topology.

Theorem 3 is a functional limit theorem for the process Y_n, and describes the probabilistic behavior of Algorithm 2 in the generation of warp maps which contain regions of high warping.
The centering term in Theorem 3 cannot be dispensed with, although other equivalent terms can be chosen. The arrivals of the large increments are governed by a finite activity Lévy process since ∫_0^∞ e^{−u}du < ∞, which implies that large jumps occur infrequently; Y is hence a compound Poisson process. This ensures that under Algorithm 2, once an H that generates the random partition and sets the average warp map is chosen, we are unlikely to sample warp maps from D • H that contain far more large jumps relative to those in the average warp. Thus, Algorithm 2 offers automatic regularization toward the average warp map.

Landmark constraints and a global concentration parameter
In practice, in the presence of landmarks, one decomposes the unconstrained registration problem into multiple sub-problems corresponding to intervals formed due to the landmark constraints. For example, in the case of m = 2 landmarks, suppose that the landmark locations on the domain of two open curves g_1 and g_2 are at t_i, t_k ∈ [0, 1], and at t_j, t_l ∈ [0, 1], respectively, with 1 ≤ i < k ≤ n − 1 and 1 ≤ j < l ≤ n − 1. This induces three intervals of interest on each curve: [0, t_i], [t_i, t_k] and [t_k, 1] for g_1, and [0, t_j], [t_j, t_l] and [t_l, 1] for g_2. The points t_i and t_k can now be exactly matched to t_j and t_l using a PL warp map, resulting in two new points t*_i and t*_k. This leads to three classes of warp maps: W_1, W_2 and W_3, comprising warp maps of [0, t*_i], [t*_i, t*_k] and [t*_k, 1], respectively. The original registration problem on [0, 1] has thus been decomposed into three similar ones on the three sub-intervals; such relationships hold for all warp maps in the classes W_1, W_2 and W_3.
The above decomposition requires a property of warp maps called subset invariance. The corresponding requirement on a probability measure on W_I is the following. For an interval [a, b] ⊂ [0, 1], consider the map S([a, b]) that restricts a warp map to [a, b] and re-scales it to a warp map of [a, b]. A parametrized probability measure P_θ, θ ∈ Θ, on W_I is said to satisfy subset invariance if its push-forward under the map S([a, b]), S([a, b])_# P_θ, is P_{θ(b−a)}. To facilitate subset invariance, and to ensure that the three sub-problems borrow strength, we introduce a concentration parameter θ > 0, which allows the distribution to interpolate between the indicator (step) function and the average warp map. Under the notation employed in the preceding section, consider the parametrized Lévy measure λ_θ(y) = θ ∫_y^∞ e^{−t}t^{−1}dt. The limit process P in Theorem 2 is then a Gamma process with Lévy measure λ_θ, and D_θ(t) = G(θt)/G(θ), t ∈ [0, 1], is the corresponding Dirichlet process. Its salient properties are collected in the following result.

Proposition 1. (1) Concentration around the mean: for the partition based on H, E_θ(γ_t) = H(t) and Var_θ(γ_t) = (1/(1 + θ)) H(t)(1 − H(t)), t ∈ [0, 1], θ > 0.
(2) Subset invariance: for the map S([a, b]) defined above, the push-forward measure S([a, b])_# (D_θ • H) satisfies the subset invariance property.

The proof of (1) follows from direct computation using Equation (4). Proofs of (2) and (3) are easily obtained from the representation of the Dirichlet process as a normalized Gamma process, D_θ(t) = G(θt)/G(θ), and the fact that D_θ(t) is independent of G(θ) for every t ∈ [0, 1] and θ > 0. The independence and self-similarity properties are hence based on Lukacs' characterization of the Gamma distribution: if X_1, . . . , X_n are independent Gamma random variables, then Y = Σ_{i=1}^n X_i and the vector (X_1/Y, . . . , X_n/Y) are independent. For proofs of (3) and (4), we refer the reader to the proof of Proposition 3.14 by von Renesse and Sturm (2009). Properties (1), (4) and (5) ensure that the distribution on W_I is centred at H. The parameter θ behaves like a concentration parameter: varying θ moves mass away from or toward H, and offers a rich class of probability models for W_I. With the introduction of θ > 0, the distribution D_θ • H has finite-dimensional projections interpreted in the following manner: conditioned on a partition T_n(H) = {0 =: t_{0:n} < t_{1:n} < . . . < t_{n−1:n} < t_{n:n} := 1}, the vector of increments (γ(t_{1:n}) − γ(t_{0:n}), . . . , γ(t_{n:n}) − γ(t_{n−1:n})) is Dirichlet distributed with parameters (θ(H(t_{1:n}) − H(t_{0:n})), . . . , θ(H(t_{n:n}) − H(t_{n−1:n}))).

Remark 5. Part (2) of Proposition 1 identifies D_θ • H with the subset invariance property.
In other words, if subset invariance is a requisite property for a distribution on warp maps, with or without landmarks, it is not possible to construct one that does not concentrate on discontinuous warp maps. This stands in contrast to the usual smoothness assumptions associated with warp maps (Ramsay and Silverman, 2005; Claeskens et al., 2010), but is rarely an issue in practice. Intuitively, the decomposition of alignment based on landmarks can be linked with an independent increments property of the stochastic process. Furthermore, under the popular square-root velocity transform that we employ in the applications for alignment (see Section 6.2 for details), Lahiri et al. (2015) proved that the optimal warp map is PL when at least one of the two curves to be aligned is PL (see their Theorem 6). The lack of smoothness of the sample paths of D • H is thus not unrealistic.
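Under this reading of the finite-dimensional projections, a draw from D_θ • H on a fixed grid reduces to a single Dirichlet draw with parameters θ times the H-increments; the helper below is our own sketch:

```python
import numpy as np

def sample_Dtheta_H(theta, H, grid, rng=None):
    """Finite-dimensional draw from D_theta • H on `grid`: increments are
    Dirichlet with parameters theta * (H(t_i) - H(t_{i-1})), so the mean
    warp is H and larger theta concentrates mass around it."""
    rng = np.random.default_rng() if rng is None else rng
    dH = np.diff(H(grid))
    p = rng.dirichlet(theta * dH)
    return np.concatenate(([0.0], np.cumsum(p)))   # gamma at the grid points

grid = np.linspace(0.0, 1.0, 21)
draws = np.stack([sample_Dtheta_H(50.0, lambda t: t, grid) for _ in range(2000)])
```

With H the identity and θ = 50, the empirical mean of the draws hugs the identity warp, matching property (1) of Proposition 1.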

Extension to distributions on warp maps of S
Our aim in this section is to construct a probability measure on the set W_S := {γ : S → S : continuous, orientation preserving} of warp maps of S. From a practical perspective, we wish to develop an easy-to-implement sampling method, similar to Algorithm 2, for generating random warp maps of S based on a suitable discretization. Our approach 'unwraps' S at a specific point c and proceeds to identify W_S with the product space W_I × S through the identification of S with [0, 1]. This amounts to using the probability measure D_θ • H on W_I along with one on S, based on viewing the circle of unit circumference S as the quotient group S = R/Z with the addition operation inherited from R. We thus move from [0, 1] to the circle with unit length S by identifying the endpoints of the interval.
Through the identification of S with [0, 1], every continuous mapping β : R → R induces a continuous mapping of S onto itself such that β(t + j) = β(t) + j for all t ∈ R, where j is an integer (β is unique up to addition of an integer, and β(t) − t is periodic with period j). If β is monotone increasing and j = +1, we say that the induced map on S is orientation-preserving. A random warp map of S is then obtained by sampling an unwrapping point c on S and a warp map γ from D_θ • H on [0, 1], and wrapping γ back onto S starting at c; this leads to γ_s. Figure 2 offers an illustration of this approach for γ(t) = t², with c = 0.94, leading to t_c = 0.24. Proposition 2 formalizes this for a random γ_s generated in this fashion.
Proposition 2. Conditional on c from µ, for each γ_s from D_θ • H, the following holds with probability one: (1) γ_s(0) = c. The Supplementary Material contains an 'intrinsic' construction that circumvents the need to unwrap S. But the construction provided above is easier to implement in practice, and is thus the only one used in subsequent applications.
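The unwrapping construction can be sketched as follows, under the assumption that the induced circle warp is γ_s(s) = (c + γ(s)) mod 1, which satisfies γ_s(0) = c as in Proposition 2; names are ours:

```python
import numpy as np

def circle_warp(gamma, c, s):
    """Warp of the unit-circumference circle S obtained by unwrapping at
    seed c in [0, 1): gamma is a warp map of [0, 1], and points s in [0, 1)
    parametrize S."""
    s = np.asarray(s, dtype=float)
    return (c + gamma(np.mod(s, 1.0))) % 1.0

gamma = lambda t: t ** 2          # example warp of [0, 1], as in Figure 2
vals = circle_warp(gamma, c=0.94, s=np.array([0.0, 0.25, 0.5]))
```

Because γ(0) = 0, the seed c is the image of the unwrapping point, and the map is orientation-preserving whenever γ is increasing.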

Numerical illustrations
The degeneracy phenomenon in Algorithm 1 can arise due to two reasons.

Illustrations on real data
For detailed descriptions of the datasets used throughout this section, please refer to the Supplementary Material. We illustrate the utility of the proposed distribution and associated sampling scheme in two pairwise alignment tasks: (1) Bayesian alignment of univariate functions with and without landmark constraints, and (2) unconstrained alignment of univariate functions and higher-dimensional open and closed curves using a novel Simulated Annealing-based algorithm (see Robert and Casella (2005) for details). While the distribution D_θ • H for sampling warp maps can be used with any alignment method, we use the framework based on the square-root velocity function (SRVF) representation of curves in R^d, d ≥ 1: f → q := ḟ/|ḟ|^{1/2}, where ḟ is the derivative of f and |·| is the Euclidean norm in R^d.
We use this representation due to its many nice properties for the registration problem (see Kurtek et al. (2012); Lahiri et al. (2015)).
Under this representation, the warping of a function f → f • γ is given by q → (q • γ)√γ̇.
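A finite-difference sketch of the SRVF and the warping action (our own implementation, with a small floor on |ḟ| to avoid division by zero):

```python
import numpy as np

def srvf(f, t):
    """SRVF q = f' / |f'|^{1/2}, computed by finite differences; f is a curve
    sampled on the grid t, with shape (len(t), d)."""
    df = np.gradient(f, t, axis=0)
    norm = np.linalg.norm(df, axis=-1, keepdims=True)
    return df / np.sqrt(np.maximum(norm, 1e-12))

def warp_srvf(q, gamma_vals, t):
    """Warping action on an SRVF: q -> (q ∘ gamma) sqrt(gamma'), on grid t."""
    dgamma = np.gradient(gamma_vals, t)
    q_warp = np.stack([np.interp(gamma_vals, t, q[:, k])
                       for k in range(q.shape[1])], axis=1)
    return q_warp * np.sqrt(np.maximum(dgamma, 0.0))[:, None]
```

A useful sanity check is that the action preserves the L² norm of q, which is the reason warping acts by isometries in this framework.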

Bayesian alignment of curves
Suppose we have two functions g_i : [0, 1] → R, i = 1, 2. A Bayesian registration model can be defined using their SRVF representations. The alignment problem then centres around an R-valued, square-integrable, separable stochastic process X(t) := q_1(t) − q_2(t) for t ∈ [0, 1] with law P and density p = dP/dµ with respect to a σ-finite measure µ on L²([0, 1]). For a fixed γ ∈ W_I, assume that the law P_γ of the process X_γ(t) = q_1(t) − q_2(γ(t))√γ̇(t) is absolutely continuous with respect to P with density p_γ (see Theorem 6.4.5 in Bogachev (1998), and Lahiri et al. (2015)). Thus, in this context, the finite-dimensional restriction of the probability measure D_θ • H is well-suited for defining priors on warp maps.
The Bayesian model employed here is very similar to the ones presented in Cheng et al. (2016) and Kurtek (2017). In short, the likelihood is a zero-mean multivariate Gaussian distribution with a diagonal covariance matrix. We select a conjugate, vague Gamma prior for the likelihood precision and analytically integrate it out of the posterior. While Cheng et al. (2016) sampled from the posterior distribution using Markov chain Monte Carlo (MCMC), we use a simple sampling importance resampling (SIR) algorithm, with the importance function set to the prior distribution, as in Kurtek (2017). This may not be the most efficient approach to sampling from the posterior, but it provides a fast approximation and works well in the settings we considered. We assess the utility of the proposed prior distribution on W_I in two settings: (1) unconstrained, and (2) landmark-constrained alignment.

In order to sample from the distribution D_θ • H on W_I via the easy-to-implement Algorithm 2, we require three specifications: (1) the choice of partition that determines the average warp map H, (2) n, which controls the size of the partition, and (3) θ, which controls the spread around the average warp. We set H(t) = t with the uniform partition, which ensures regularization toward the identity warping. We resample all functions with 100 points and choose n = 20, θ = 10. This gives the prior flexibility to explore extreme warpings (small θ) while also ensuring that the resulting warp maps are fairly regular (a smaller partition prevents many small jumps). In the case of landmark-constrained alignment, owing to the properties of D_θ • H in Proposition 1, we re-scale θ and n proportionally to the length of each function segment and consider each sub-problem independently.

As with the Bayesian model, we base our stochastic alignment algorithm on the SRVF representation of curves. The energy functional that we seek to optimize is E(γ) = ‖q_1 − (q_2 • γ)√γ̇‖², for γ in W_I or W_S.
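Algorithm 2 itself is not reproduced in this section, but the Dirichlet connection suggests a simple finite-dimensional sketch: draw the warp's increments over an n-point partition from a Dirichlet distribution whose parameters are θ times the increments of the average warp H, then interpolate. This is our own illustrative parametrization, assumed for concreteness; the exact construction in the paper's Algorithm 2 may differ.

```python
import numpy as np

def sample_warp(n=20, theta=10.0, H=None, rng=None):
    """Sketch of sampling a warp map of [0, 1] around an average warp H.

    Increments over a uniform n-point partition are Dirichlet distributed
    with parameters theta * (increments of H), so larger theta concentrates
    samples around H (assumed parametrization, for illustration only).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.linspace(0.0, 1.0, n + 1)                  # partition of [0, 1]
    H_vals = s if H is None else H(s)                 # default: identity warp
    alpha = theta * np.diff(H_vals)                   # Dirichlet parameters
    incr = rng.dirichlet(alpha)                       # nonnegative, sums to 1
    knots = np.concatenate([[0.0], np.cumsum(incr)])  # monotone, ends at 1
    return lambda t: np.interp(t, s, knots)           # piecewise-linear warp
```

By construction the sample fixes the endpoints and is nondecreasing, so it is a valid warp map of [0, 1].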
When the domain of q_1 and q_2 is [0, 1], a solution to this optimization problem can be obtained using a Dynamic Programming (DP) algorithm (Robinson, 2012). The resulting solution is deterministic and depends on the fineness of the discretization and the size of the neighborhood search. In the case of closed curves, one has to either resort to a gradient descent algorithm, which has the obvious limitation of getting stuck in a local solution, or a DP approach with an additional seed (the point at which S is unwrapped to [0, 1]) search, which only gives an approximate solution. The energy functional E is a natural choice for curve registration because (1) the L^2 distance in the energy corresponds to an elastic metric on the space of curves, and (2)

The algorithm is fairly robust to the choices of n, θ, T and c. Numerical illustrations investigating robustness are presented in the Supplementary Material. In the first step of the algorithm, we propose a candidate warp map that is a linear combination of a random warping, sampled from the distribution D_θ • H centred at the previously accepted warp map, and the identity warping; while not strictly necessary, this ensures extra regularization toward the identity warping. The key here is that efficient sampling from D_θ • H centred at an arbitrary warp map is easily enabled through Algorithm 2. Furthermore, the relative values of the concentration parameter θ (neighborhood size) and the temperature T control the dynamics of the algorithm.
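The annealing loop just described can be sketched as follows for univariate SRVFs. This is a minimal illustration, not the paper's Algorithm 4: the proposal mechanism `sample_warp` (a random warp centred at the current one) is supplied by the caller, the mixing weight `mix`, initial temperature `T` and cooling factor `c` are illustrative choices, and a uniform grid is assumed.

```python
import numpy as np

def align_sa(q1, q2, t, sample_warp, n_iter=500, T=0.1, c=0.995,
             mix=0.5, rng=None):
    """Simulated Annealing over warp maps for E(g) = ||q1 - (q2 o g) sqrt(g')||^2.

    sample_warp(center, rng) must return a monotone warp (sampled on grid t)
    centred at `center`; the candidate mixes that draw with the identity.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = t[1] - t[0]                                  # uniform grid assumed

    def warped(g):                                    # (q2 o g) * sqrt(g')
        dg = np.gradient(g, t)
        return np.interp(g, t, q2) * np.sqrt(np.maximum(dg, 0.0))

    def energy(g):
        return np.sum((q1 - warped(g)) ** 2) * dt

    gamma = t.copy()                                  # start at the identity
    E = energy(gamma)
    best_gamma, best_E = gamma, E
    for _ in range(n_iter):
        draw = sample_warp(gamma, rng)                # warp centred at gamma
        cand = mix * draw + (1.0 - mix) * t           # pull toward identity
        E_cand = energy(cand)
        # Metropolis acceptance at the current temperature
        if E_cand < E or rng.uniform() < np.exp((E - E_cand) / T):
            gamma, E = cand, E_cand
            if E < best_E:
                best_gamma, best_E = gamma, E
        T *= c                                        # geometric cooling
    return best_gamma, best_E
```

Returning the best warp visited (rather than the last accepted one) guarantees the result is never worse than the identity warping.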
Next, we briefly comment on how Algorithm 4 is extended to the case of alignment of open and closed curves for the purpose of shape analysis. In the case of shapes, translation, scale and rotation variations are nuisances and have to be removed in addition to alignment via warp maps. Translation is removed automatically through the SRVF representation; scale variation is removed by normalizing all curves to unit length. Rotation variability is accounted for by adding a Procrustes step at each iteration of the algorithm (see Dryden and Mardia (1998) for details on Procrustes alignment). In the case of closed curves, we must additionally propose the seed point, which is used to unwrap S to [0, 1] and eventually sample from D_θ • H; this is accomplished via a random proposal, in a neighborhood of the current seed point, from the von Mises distribution on S.
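A von Mises seed proposal of the kind described above is straightforward with NumPy's generator. The concentration value below is an illustrative choice, not one taken from the paper.

```python
import numpy as np

def propose_seed(current_angle, kappa=20.0, rng=None):
    """Propose a new seed point on the circle near the current one.

    The seed (the angle at which S is unwrapped to [0, 1]) is drawn from a
    von Mises distribution centred at the current seed; larger kappa makes
    the proposal more local (kappa = 20 is an illustrative default).
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.vonmises(mu=current_angle, kappa=kappa) % (2 * np.pi)
```

The modulus maps the draw back to [0, 2π), so the proposal always lands on the circle.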
We begin with six examples that consider univariate functional data. The alignment results are presented in Figure 6. In all cases, the computed warp map accurately aligns features across functions. For example, in panel (2), we consider alignment of two PQRST complexes without imposing landmark constraints on the warp maps. Originally, the PQRST peaks and valleys are not well aligned; this is especially evident in the case of the R peaks (the highest peak in each function). The proposed method is able to align the peaks very well. This is also the case in the more complex example (6), which considers alignment of two signature tangential acceleration functions. These functions contain many peaks and valleys that are not in correspondence before alignment. The proposed method is able to effectively align all of the peaks and valleys via a suitable warp map. Table 1 provides a numerical evaluation of our approach. For each of the six examples, we compare three different distances between the functions: (a) distance before alignment, (b) distance after alignment using DP, and (c) distance after alignment using Simulated Annealing. The proposed method provides comparable performance to DP.
Next, we present several results of registering shapes of 3D open curves. In this case, we use two datasets that were previously considered in Kurtek et al. (2012): (1) simulated spirals, and (2) fibers extracted from diffusion tensor magnetic resonance images (DT-MRIs).
The results are presented in Figure 7. For each example, we show the optimal warp map, the evolution of the energy E(γ) as a function of the number of iterations, and the geodesic path (shortest distance deformation under the L 2 metric on SRVF representations) between the two shapes before and after Simulated Annealing-based alignment. For the simulated spirals, the additional alignment via warp maps results in a much more natural geodesic deformation between them, where the shapes of the individual spirals are better preserved. This is also the case for the DT-MRI fibers, albeit not as clear. The top portion of Table 2 provides a quantitative comparison of DP-based alignment and the proposed method. In three out of the four given examples, the proposed method results in a significantly shorter distance between the considered shapes (the maximum distance on this shape space is π/2).
We close this section with four examples of registering closed curves from the MPEG-7 dataset using Simulated Annealing; these curves represent fairly complex shapes including a cup with a handle and a stingray. Recall that in the case of closed curves, we must optimize over the seed placement (point at which S is unwrapped to [0, 1]) on the curve in addition to the warp map. The results are presented in Figure 8. We provide the same displays as in the open curve examples. As previously, the geodesic paths after alignment represent more natural deformations between the shapes than those before alignment. This is especially evident in the cup example. We present our quantitative assessment in the bottom portion of Table 2. Here, we compare to a DP approach with an additional seed search. The

Discussion
The class of warp maps of [0, 1] can be identified with the set of distribution or quantile functions on [0, 1]. Sample paths of Lévy subordinators normalized to obtain normalized random measures (the Dirichlet process is a special case), and variants thereof (Hjort, 1990; Regazzini et al., 2003; Nieto-Barajas et al., 2004), can be used to define distributions on warp maps along with numerous corresponding sampling schemes (see Griffin (2016) and references therein). However, warp maps of S cannot be identified with distribution functions, and an 'intrinsic' distribution based on arc lengths (as described in the Supplementary Material) is not easily obtainable as laws of normalized random measures.
We have given two methods for pairwise matching of curves: (1) a Bayesian registration model, and (2) a stochastic search algorithm via Simulated Annealing. In many applications, it may be of interest to match multiple curves simultaneously, also termed multiple registration. This is usually accomplished by joint estimation of a template curve and an additional matching step. The proposed Simulated Annealing-based alignment can be easily incorporated into a multiple registration algorithm by replacing the commonly used DP approach.
Multiple alignment via a formal Bayesian model requires a prior on the template curve in addition to the warp maps. Nonetheless, our approach can be readily built into existing Bayesian multiple registration models such as the one presented in Cheng et al. (2016).
Although unexplored in this paper, the proposed distribution on warp maps is well-suited for curve registration with landmarks observed with uncertainty in their placement. Its desirable properties allow us to develop priors centred at piecewise-linear warp maps that match the landmarks exactly. This allows for incorporating prior information into the problem by regularizing the warp maps toward a landmark-induced warping.
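Such a piecewise-linear, landmark-matching warp map is easy to construct by interpolation; the sketch below is our own illustration of this centring map, not code from the paper.

```python
import numpy as np

def landmark_warp(src, dst):
    """Piecewise-linear warp of [0, 1] matching landmarks exactly.

    src, dst : increasing landmark locations in (0, 1) on the two curves.
    The returned map sends src[i] to dst[i] and fixes the endpoints, so a
    prior centred at it regularizes toward a landmark-induced warping.
    """
    xs = np.concatenate([[0.0], np.asarray(src, dtype=float), [1.0]])
    ys = np.concatenate([[0.0], np.asarray(dst, dtype=float), [1.0]])
    return lambda t: np.interp(t, xs, ys)
```

For example, `landmark_warp([0.3, 0.6], [0.25, 0.7])` fixes 0 and 1 while sending 0.3 to 0.25 and 0.6 to 0.7.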