A diffusion process associated with Fréchet means

This paper studies rescaled images, under $\exp^{-1}_{\mu}$, of the sample Fréchet means of i.i.d. random variables $\{X_k\vert k\geq 1\}$ with Fréchet mean $\mu$ on a Riemannian manifold. We show that, with appropriate scaling, these images converge weakly to a diffusion process. As in the Euclidean case, this limiting diffusion is a Brownian motion up to a linear transformation. However, in addition to the covariance structure of $\exp^{-1}_{\mu}(X_1)$, this linear transformation also depends on the global Riemannian structure of the manifold.

1. Introduction. It has become increasingly common in various research areas for statistical analysis to involve data that lie in non-Euclidean spaces. One such example is the statistical analysis of shape; cf. [4] and [7]. Consequently, many statistical concepts and techniques have been generalised and developed to accommodate such data.
Fréchet means of random variables on a metric space, as a generalisation of Euclidean means, have been widely used for statistical analysis of non-Euclidean data. A point $\mu$ in a metric space $M$ with distance function $\rho$ is called a Fréchet mean of a random variable $X$ on $M$ if it satisfies
\[
E[\rho(\mu, X)^2] = \inf_{x \in M} E[\rho(x, X)^2].
\]
Influenced by the structure of the underlying spaces, Fréchet means, unlike their Euclidean counterparts, exhibit many challenging probabilistic and statistical features. Various aspects of Fréchet means have been studied for non-Euclidean spaces, including Riemannian manifolds and certain stratified spaces. Among others, the strong law of large numbers for Fréchet means on general metric spaces was obtained in [11]. The first use of Fréchet means to provide nonparametric statistical inference, such as confidence regions and two-sample tests for discriminating between two distributions, was carried out in [2] and [3], for both extrinsic and intrinsic inference on manifolds. When $M$ is a Riemannian manifold with the distance function induced by its Riemannian metric, central limit theorems for Fréchet means can be found in [3] and [8]. The results in both papers imply that, since manifolds are locally homeomorphic to Euclidean spaces, the limiting distributions for sample Fréchet means on Riemannian manifolds are usually Gaussian, a phenomenon similar to that for Euclidean means.
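As a sanity check (ours, not taken from the text): in the Euclidean case $M = \mathbb{R}^d$ with $\rho(x,y) = |x - y|$, the Fréchet mean reduces to the ordinary expectation.

```latex
% Bias-variance decomposition of the Frechet function on R^d:
\[
  E\,|x - X|^2 \;=\; |x - E[X]|^2 \;+\; E\,|X - E[X]|^2 ,
\]
% so E|x - X|^2 is minimised uniquely at x = E[X], i.e. the
% Frechet mean of X on R^d is its ordinary mean.
```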
In the Euclidean case, the link between sample means of i.i.d. random vectors and random walks leads to the fact that rescaled sample means converge weakly to a Brownian motion, possibly up to a linear transformation associated with the covariance structure of the random vectors. On the other hand, the authors of [1] constructed a stochastic gradient algorithm from a given sequence of i.i.d. random variables on a Riemannian manifold where, under certain conditions, the random sequence resulting from the algorithm converges almost surely to the Fréchet mean $\mu$ of the given random variables. Moreover, it was shown there that, if one rescales the images, under $\exp^{-1}_{\mu}$, of the random walks associated with the algorithm, they converge weakly to an inhomogeneous diffusion process on the tangent space of the manifold at $\mu$. This raises the following questions: if one rescales the images, under $\exp^{-1}_{\mu}$, of the sample Fréchet means of the random variables, will they converge weakly? If they do, do they converge to the same diffusion process as the one given in [1]? If not, what is the limiting diffusion process?

This paper addresses these questions. We show that the rescaled images of the sample Fréchet means of i.i.d. random variables $\{X_k \mid k \ge 1\}$ on a Riemannian manifold converge weakly to a diffusion process which is a Brownian motion up to a linear transformation. Moreover, in addition to the covariance structure of $\exp^{-1}_{\mu}(X_1)$, this linear transformation also depends on the global Riemannian structure of the manifold. For this, we first construct, in the next section, a sequence of simpler inhomogeneous Markov processes, each of which is also a martingale, and study their weak convergence. In addition to their intrinsic interest, the results of that section form the basis for our investigation of the "rescaled" sample Fréchet means in the following section.
In particular, we relate the constructed sequence of processes to the "rescaled" sample Fréchet means in such a way that the result for the latter is a direct consequence of the former.
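Although the paper is purely theoretical, the first-order characterisation of a sample Fréchet mean lends itself to a simple fixed-point computation. The following is a minimal numerical sketch for $M = S^2$ with its round metric; the functions `sphere_exp`, `sphere_log` and `frechet_mean`, and the plain averaging iteration, are our illustrative choices and are not taken from this paper or from the algorithm of [1].

```python
import numpy as np

def sphere_exp(p, v):
    # Exponential map on the unit sphere S^2 at p, applied to a tangent vector v.
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p
    return np.cos(t) * p + np.sin(t) * (v / t)

def sphere_log(p, q):
    # Inverse exponential map exp_p^{-1}(q), defined for q not antipodal to p.
    w = q - np.dot(p, q) * p          # component of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * w / nw

def frechet_mean(points, iters=100):
    # Fixed-point iteration: move mu along the mean of the pulled-back data
    # until the tangent-space mean of exp_mu^{-1}(X_i) vanishes.
    mu = points[0]
    for _ in range(iters):
        g = np.mean([sphere_log(mu, x) for x in points], axis=0)
        mu = sphere_exp(mu, g)
    return mu
```

For data contained in an open hemisphere this iteration converges to the unique sample Fréchet mean; at the fixed point the pulled-back data have mean zero in the tangent space, mirroring the condition that characterises $\mu$ in the text.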
2. An auxiliary weakly convergent sequence of Markov chains. Let $M$ be a complete Riemannian manifold of dimension $d$ with covariant derivative $D$ and Riemannian distance $\rho$, whose sectional curvature is bounded below by $\kappa_0 \le 0$ and above by $\kappa_1 \ge 0$. For any $x \in M$, we denote by $C_x$ the cut locus of $x$. Note that, for any fixed $x_0$, the squared distance function $x \mapsto \rho(x_0, x)^2$ is not $C^2$ on $C_{x_0}$.

For a fixed $y \in M$, consider the vector field on $M \setminus C_y$ defined, at $x \notin C_y$, by $\exp^{-1}_x(y) \in \tau_x(M)$, where $\tau_x(M)$ denotes the tangent space of $M$ at $x$, and then define the linear operator $H_{x,y}$ on the tangent space $\tau_x(M)$ by
\[
H_{x,y}(u) = -D_u \exp^{-1}_{\cdot}(y), \qquad u \in \tau_x(M).
\]
The operator $H_{x,y}$ so defined will play an important role in the following study of the asymptotic behaviour of sample Fréchet means on $M$. Note first that $H_{x,y}$ is closely linked with $\operatorname{Hess}(\frac12 \rho(x,y)^2)$, the Hessian of the function $\frac12 \rho(x,y)^2$, as follows (cf. [6], page 145):
\[
\operatorname{Hess}\bigl(\tfrac12 \rho(x,y)^2\bigr)(u, v) = \bigl\langle H_{x,y}(u), v \bigr\rangle,
\]
for any $x \notin C_y$ and any tangent vectors $u, v \in \tau_x(M)$, and so the assumption on the bounds for the sectional curvature of $M$ implies that, for any unit vector $u \in \tau_x(M)$,
\[
\sqrt{\kappa_1}\,\rho(x,y)\cot\bigl(\sqrt{\kappa_1}\,\rho(x,y)\bigr)
\le \bigl\langle u, H_{x,y}(u) \bigr\rangle
\le \sqrt{-\kappa_0}\,\rho(x,y)\coth\bigl(\sqrt{-\kappa_0}\,\rho(x,y)\bigr), \tag{2}
\]
where we require that, if $\kappa_1 > 0$, then $\sqrt{\kappa_1}\,\rho(x,y) < \pi/2$ for the first inequality to hold; cf. [6], page 203.

In contrast to Euclidean means, there is generally no closed form for Fréchet means. On the other hand, the result of [9] implies that the Euclidean random variable $\exp^{-1}_\mu(X)$ is almost surely defined, where $\mu$ is a Fréchet mean of the random variable $X$ on $M$. Then, since
\[
\operatorname{grad}_1\bigl(\tfrac12 \rho(x,y)^2\bigr) = -\exp^{-1}_x(y),
\]
where $\operatorname{grad}_1$ denotes the gradient operator acting on the first argument of a function on $M \times M$, and since $\mu$ minimises $E[\rho(\cdot, X)^2]$ by the definition of Fréchet means, the Fréchet mean $\mu$ satisfies the condition that
\[
E[\exp^{-1}_\mu(X)] = 0. \tag{3}
\]
Thus $\mu$ is linked to the Euclidean mean in the sense that the origin of $\tau_\mu(M)$, the tangent space of $M$ at $\mu$, is the Euclidean mean of the Euclidean random variable $\exp^{-1}_\mu(X)$.

Let $\{X_k \mid k \ge 1\}$ be a sequence of i.i.d. random variables on $M$ and, for a fixed $x_0 \in M$, assume that $E[\rho(x_0, X_1)^2] < \infty$. This assumption ensures the existence of Fréchet means of $X_1$. For simplicity, in the following, we shall assume that the Fréchet mean $\mu$ of $X_1$ is unique. However, we do not require the support of the probability measure of $X_1$ to be contained in any geodesic ball. Note that the result of [9] ensures that $P(X_1 \in C_\mu) = 0$ under some mild condition on $M$.
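Two constant-curvature examples, which are routine checks rather than results from the text, make $H_{x,y}$ concrete.

```latex
% Write r = rho(x, y).
% (a) Euclidean space R^d: exp_x^{-1}(y) = y - x, so D_u exp^{-1}(y) = -u and
\[
  H_{x,y} = \mathrm{Id}, \qquad \langle u, H_{x,y}(u) \rangle = |u|^2 ,
\]
% consistent with the curvature bounds in (2), since t cot t -> 1 and
% t coth t -> 1 as t -> 0.
% (b) Unit sphere S^d: Hess(r^2/2) = dr (x) dr + r cot r (g - dr (x) dr), so
\[
  H_{x,y}(u) \;=\; u_{\parallel} \;+\; r\cot r \; u_{\perp} ,
\]
% where u_par and u_perp are the components of u parallel and orthogonal
% to exp_x^{-1}(y); H_{x,y} is positive-definite exactly when r < pi/2.
```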
We further assume that the two conditions in (4) hold. These two assumptions ensure that the linear operator $E[H_{\mu, X_1}]$ is well defined and nonsingular. For each fixed $n \ge 1$, consider the time-inhomogeneous Markov chain $\{V^n_k \mid k \ge 0\}$. One may check that $\{V^n_k \mid k \ge 0\}$ is a martingale. We are interested in the asymptotic behaviour of $\{V^n_{[nt]} \mid t \ge 0\}$ as $n$ tends to infinity. Firstly, for this, the following lemma gives an upper bound for the sequence $\{V^n_{[\varepsilon_0 n]} \mid n \ge 1\}$ for $\varepsilon_0 > 0$.

Lemma 1. Suppose that the assumptions (4) hold. Then there is a constant $c_0 > 0$ bounding, for any $\varepsilon_0 > 0$ and $n_0 > 0$, the second moments $E[|V^n_{[\varepsilon_0 n]}|^2]$ for all $n \ge n_0$.

Proof. Under the given conditions, there is a constant $\beta \ge 0$ such that, for any $v \in \tau_\mu(M)$, $\langle v, Bv \rangle \le (\beta + 1)|v|^2$. Thus, using the facts that $E[\exp^{-1}_\mu(X_1)] = 0$ and that $X_{k+1}$ is independent of $V^n_k$, and noting that $\{V^n_k \mid k \ge 0\}$ is a martingale with zero expectation, we obtain by induction a bound on $E[|V^n_k|^2]$ involving the factors $1 + \beta/j^2$, $j \le k$. The required result then follows from the fact that $\prod_{j=1}^{\infty} (1 + \beta/j^2) < \infty$.
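The convergence of the infinite product invoked at the end of the proof is elementary:

```latex
% Using log(1 + x) <= x for x >= 0:
\[
  \log \prod_{j=1}^{\infty} \Bigl( 1 + \frac{\beta}{j^{2}} \Bigr)
  \;\le\; \beta \sum_{j=1}^{\infty} \frac{1}{j^{2}}
  \;=\; \frac{\beta \pi^{2}}{6} \;<\; \infty ,
\]
% so the product is finite, giving the uniform-in-n bound in Lemma 1.
```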
The next lemma gives various bounds on the differences $V^n_{k+1} - V^n_k$ for sufficiently large $n$ and $k$.
Lemma 2. In addition to the assumptions in (4), assume that, for some $\delta > 0$, $E[\rho(\mu, X_1)^{2+\delta}] < \infty$. Then, for any $\varepsilon_0 > 0$ and $r > 0$, there are constants $c_1$, $c_2$ and $c_3$, depending on $\varepsilon_0$ and $r$, such that, when $n$ is sufficiently large, the bounds (5), (6) and (7) hold for $k \ge \varepsilon_0 n$ and for $v \in \tau_\mu(M)$ with $|v| \le r$.

Proof. If $v \ne 0$, it follows from Chebyshev's inequality that the required estimate holds when $n$ is sufficiently large. Note that the assumption that $v \ne 0$ implies that $k \ge 1$. If $v = 0$, a modified argument will show that the above still holds for $k \ge 1$. Hence (5) follows. Similarly, using the definition of $V^n_k$, result (i) also implies that, for any $r > 0$ and some constant $c_2$ depending on $\varepsilon_0$ and $r$, the corresponding estimate holds for $k \ge \varepsilon_0 n$, for sufficiently large $n$ and for all $v \in \tau_\mu(M)$ with $|v| \le r$, so that (6) holds.
To show (7), we note that there are positive constants $a$, $b$ and $c$, independent of $n$ and $k$, such that, given $V^n_k = v$, the conditional increment can be controlled in terms of these constants. Thus, by result (i), the bound (7) holds for some constant $c' > 0$ depending on $|v|$, and the required result follows.
Corollary. Under the assumptions of Lemma 2, for any $\varepsilon_0 > 0$ and $r > 0$, the limits (i)-(iii) hold uniformly in $k \ge \varepsilon_0 n$.

Proof. By (5), the first estimate holds for any $k \ge \varepsilon_0 n$ when $n$ is sufficiently large; thus (i) holds. Noting that $E[V^n_{k+1} \mid V^n_k] = V^n_k$, (ii) follows from (6). Finally, (iii) follows from (7).
The properties of $\{V^n_k \mid k \ge 0\}$ obtained so far enable us to prove the weak convergence of $\{V^n_{[nt]} \mid t \ge 0\}$ as follows.
Proposition. In addition to the assumptions in (4), assume that, for some $\delta > 0$, $E[\rho(\mu, X_1)^{2+\delta}] < \infty$. Then, as $n \to \infty$, the sequence of processes $\{V^n_{[nt]} \mid t \ge 0\}$ converges weakly in $D([0,\infty), \tau_\mu(M))$, the space of right-continuous functions with left limits taking values in the tangent space of $M$ at $\mu$, to $\{V_t \mid t \ge 0\}$, where $V_t$ is the solution of the stochastic differential equation (9) with $V_0 = 0$, $B_t$ a standard Brownian motion in $\mathbb{R}^d$ and $\Gamma$ defined by (8).
Proof. Let $\tilde V^n_k = (k/n, V^n_k)$. Then $\{\tilde V^n_k \mid k \ge 0\}$ is a time-homogeneous Markov chain. For each $n \ge 1$, write $P_n$ for the transition probability distribution associated with $\{\tilde V^n_k \mid k \ge 0\}$, regarded as a kernel on the Borel sets of $(0, \infty) \times \tau_\mu(M)$. For any $\varepsilon_0 > 0$, the result of Lemma 1 implies that $\{\tilde V^n_{[\varepsilon_0 n]} \mid n \ge 1\}$ is tight. Hence, there is a subsequence $\{\tilde V^{n_j}_{[\varepsilon_0 n_j]} \mid j \ge 1\}$ that converges weakly to a random variable $\tilde\xi_{\varepsilon_0} = (\varepsilon_0, \xi_{\varepsilon_0})$. It then follows from Corollary 7.4.2 in [5] (pages 355-356) and the results of the above Corollary that, along this subsequence, the processes $\{\tilde V^{n_j}_{[n_j t]} \mid t \ge \varepsilon_0\}$ converge weakly to $\{(t, V_t) \mid t \ge \varepsilon_0\}$, with the initial condition that $V_{\varepsilon_0}$ has the same distribution as $\xi_{\varepsilon_0}$ and where $V_t$ satisfies the stochastic differential equation (9) (cf. [5]).

To show the required result, it is now sufficient to show that, for any subsequence of $\{V^n_{[nt]} \mid t \ge 0\}$, there is a further subsequence which converges weakly to $\{V_t \mid t \ge 0\}$. Without loss of generality, we may rename the subsequence as $\{V^n_{[nt]} \mid t \ge 0\}$ and apply the above to $\varepsilon_0 = 1/m$. For each $m \ge 1$, this gives a further subsequence converging weakly on $[1/m, \infty)$, and a diagonal argument then yields one converging weakly to $\{V_t \mid t \ge 0\}$, completing the proof.

3. The main result. We now return to the sample Fréchet means of $\{X_k \mid k \ge 1\}$. For this, we denote by $\mu_k$ a sample Fréchet mean of $X_1, \ldots, X_k$ for each $k$, so that $\mu_k$ converges to $\mu$ almost surely (cf. [11]). It follows from (3) that $\mu_k$ satisfies the condition
\[
\sum_{i=1}^{k} \exp^{-1}_{\mu_k}(X_i) = 0. \tag{10}
\]
Thus the origin of $\tau_{\mu_k}(M)$, the tangent space of $M$ at $\mu_k$, is the sample Euclidean mean of the Euclidean random variables $\exp^{-1}_{\mu_k}(X_i)$, $i = 1, \ldots, k$. Nevertheless, although these relations resemble those for Euclidean means, the conditions are generally imposed on different tangent spaces, which makes it difficult to obtain a usable form of the relation between consecutive sample Fréchet means. Moreover, the usual difference "$\mu_k - \mu$" makes no sense here. However, in the context of manifolds, $\exp^{-1}_\mu(\mu_k)$ plays a similar role to $\mu_k - \mu$ in the Euclidean case.
This leads us to consider, for each $n \ge 1$, the re-scaled sequence $\{W^n_k \mid k \ge 1\}$, where $W^n_k = \sqrt{n}\,\exp^{-1}_\mu(\mu_k)$. It is clear from (10), which the sample Fréchet means must satisfy, that $\mu_{k+1}$ cannot generally be expected to be determined by $\mu_k$ and $X_{k+1}$ alone, so that, in particular, $\{\mu_k \mid k \ge 1\}$, and hence $\{W^n_k \mid k \ge 1\}$, is in general not a Markov chain. However, the following result shows that, for sufficiently large $n$ and $k$, the behaviour of $\{W^n_k \mid k \ge 1\}$ is close to that of a Markov chain.
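For orientation, the Euclidean specialisation can be computed directly (our computation, assuming the scaling $W^n_k = \sqrt{n}(\bar X_k - \mu)$, with $\bar X_k$ the sample mean); Donsker's theorem then identifies the limiting diffusion.

```latex
% M = R^d: mu_k = X-bar_k. With S_k = sum_{i <= k} (X_i - mu) and
% Sigma = Cov(X_1), Donsker's theorem gives, for t > 0,
\[
  W^n_{[nt]} \;=\; \frac{n}{[nt]} \cdot \frac{S_{[nt]}}{\sqrt{n}}
  \;\Longrightarrow\; V_t := \frac{1}{t}\, \Sigma^{1/2} B_t ,
\]
% and Ito's formula applied to V_t = Sigma^{1/2} B_t / t yields
\[
  dV_t \;=\; \frac{1}{t}\, \Sigma^{1/2}\, dB_t \;-\; \frac{1}{t}\, V_t \, dt ,
\]
% so t V_t is a Brownian motion up to the linear transformation Sigma^{1/2}.
% On a manifold one expects the analogous transformation to involve the
% global geometry through E[H_{mu, X_1}] as well.
```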
Lemma 3. In addition to the assumptions in (4), assume further conditions expressed in terms of $\Pi_{x,y}$, the parallel transport from $x$ to $y$ along the geodesic between the two points. Then, for any $\varepsilon_0 > 0$, $r > 0$ and $T > 0$, the processes $\{W^n_{[nt]}\}$ and $\{V^n_{[nt]}\}$ are asymptotically close up to the stopping time $\sigma^r_n$, where $\{V^n_k \mid k \ge 0\}$ are the Markov chains defined in the previous section and
\[
\sigma^r_n = \inf\bigl\{ t \ge \varepsilon_0 \bigm| |W^n_{[nt]}| \ge r \text{ or } |W^n_{[nt]-1}| \ge r \bigr\}.
\]
Note that, when $x$ is sufficiently close to $\mu$, the geodesic between the two points is unique, so that the above parallel transport is well defined.
Note also that $\operatorname{Hess}_x(\frac12 \rho(x,y)^2)$ is, as a mapping from $\tau_x(M) \times \tau_x(M)$ to $\mathbb{R}$, smooth with respect to $x$ if $y \notin C_x$ and, by (2), it is positive-definite provided $\sqrt{\kappa_1}\,\rho(x,y) < \pi/2$. Thus the relationship between $H_{x,y}$ and $\operatorname{Hess}_x(\frac12 \rho(x,y)^2)$ ensures that all three assumptions required for Lemma 3 are satisfied if the support of the distribution of $X_1$ is a compact subset of the open ball $\mathrm{ball}(\mu, \pi/(2\sqrt{\kappa_1}))$.
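For instance (our specialisation), on the unit sphere, where one may take $\kappa_1 = 1$:

```latex
% ball(mu, pi/(2 sqrt(kappa_1))) = ball(mu, pi/2) is the open hemisphere
% centred at mu. For r = rho(x, y) < pi/2 and any unit u in tau_x(M),
% the lower bound in (2) gives
\[
  \bigl\langle u, H_{x,y}(u) \bigr\rangle \;\ge\; r \cot r \;>\; 0 ,
\]
% so Hess_x(rho(x, y)^2 / 2) is positive-definite whenever the support of
% the distribution is a compact subset of that hemisphere, as required
% for Lemma 3.
```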