Consensus based optimization via jump-diffusion stochastic differential equations

We introduce a new consensus-based optimization (CBO) method in which the interacting particle system is driven by jump-diffusion stochastic differential equations. We study well-posedness of the particle system as well as of its mean-field limit. The major contributions of this paper are proofs of convergence of the interacting particle system towards the mean-field limit and of convergence of a discretized particle system towards the continuous-time dynamics in the mean-square sense. We also prove convergence of the mean-field jump-diffusion SDEs towards the global minimizer for a large class of objective functions. We demonstrate improved performance of the proposed CBO method over earlier CBO methods in numerical simulations on benchmark objective functions.


Introduction
Large-scale individual-based models have become a well-established modelling tool in modern science and engineering, with applications including pedestrian motion, collective animal behaviour, swarm robotics and molecular dynamics, among many others. Through the iteration of basic interaction forces such as attraction, repulsion, and alignment, these complex systems exhibit rich self-organization behaviour (see e.g. [BFM97, CS07, CFRT10, MT14, BRSW15, ABF+19]).
Over the last decades, individual-based models have also entered the field of global optimization and its many applications in operations research, control, engineering, economics, finance, and machine learning. In many applied problems arising in the aforementioned fields, the objective function to be optimized can be non-convex and/or non-smooth, ruling out the use of traditional continuous/convex optimization techniques. In such scenarios, individual-based metaheuristic models have proven surprisingly effective. Examples include genetic algorithms, ant colony optimization, particle swarm optimization, simulated annealing, etc. (see [HKS89, DB05, Ken10] and references therein). These methods are probabilistic in nature, which sets them apart from other derivative-free algorithms [CSV09]. Unlike many convex optimization methods, metaheuristic algorithms are relatively simple to implement and easily parallelizable. This combination of simplicity and effectiveness has fuelled the application of metaheuristics in complex engineering problems such as shape optimization, scheduling problems, and hyperparameter tuning in machine learning models. However, it is often the case that metaheuristics lack rigorous convergence results, a question which has become an active area of research [GP21, GHPQ21].
In [PTTM17], the authors introduced an optimization algorithm which employs an individual-based model to frame the global minimization problem min_{x ∈ R^d} f(x), where f is a positive function from R^d to R, as a consensus problem. In this model, each particle explores the energy landscape given by f, broadcasting its current value to the rest of the ensemble through a weighted average. This iterated interaction generates trajectories which flock towards a consensus point corresponding to a global minimizer of f, hence the name Consensus Based Optimization (CBO). We refer to [Tot22, GHPQ21] for two recent surveys on CBO. (iii) As illustrated in the numerical experiments, the addition of a jump-diffusion process in the particle system leads to a more effective exploration of the energy landscape. This is particularly relevant when good prior knowledge of the optimal solution for initialization of the CBO is not available.
As was highlighted in [CCTT18, Remark 3.2], it is not straightforward to prove convergence of the interacting particle system towards its mean-field limit, even after proving a uniform-in-N moment bound for the solutions of the SDEs driving the particle system. Convergence results of this type have been proved for special cases of compact manifolds (see [FHPS20] for compact hypersurfaces and [HKK+22] for Stiefel manifolds) and globally Lipschitz continuous objective functions. In those settings, not only is the objective function bounded, but the particles also evolve on a compact set. Under assumptions on the objective function as in our paper, in the diffusion case weak convergence of the empirical measure of a particle system to the law of the corresponding mean-field SDEs has been proved in [GHPQ21, HQ21] by exploiting Prokhorov's theorem. Here we prove convergence of the particle system to the mean-field SDEs in the mean-square sense for a quadratically growing, locally Lipschitz objective function defined on R^d. Furthermore, practical implementation of the particle system corresponding to a CBO model requires a numerical approximation in the mean-square sense. We utilize an explicit Euler scheme to implement the proposed jump-diffusion CBO model. This leads to the question of whether the Euler scheme converges to the CBO model, taking into account that the coefficients of the particle system are not globally Lipschitz and the Lipschitz constants grow exponentially when the objective function is not bounded. At the same time, the coefficients of the particle system have linear growth at infinity. In the case of jump-diffusion SDEs, earlier works either showed convergence of the Euler scheme in the case of globally Lipschitz coefficients [PBL10] or proposed special schemes in the case of non-globally Lipschitz coefficients with super-linear growth, e.g. a tamed Euler scheme [DKS16]. Here we prove mean-square convergence of the Euler scheme and we show that this convergence is uniform in the number of particles N, i.e. the choice of the discretization time-step h is independent of N. Our convergence result also holds for earlier CBO models [PTTM17, CCTT18, CJLZ21].
In Section 2, we first present a review of existing CBO models and then describe our CBO model driven by jump-diffusion SDEs. We also formally introduce the mean-field limit of the new CBO model. In Section 3, we focus on well-posedness of the interacting particle system behind the new CBO model and of its mean-field limit. In Section 4, we discuss convergence of the mean-field limit towards a point in R^d which approximates the global minimum, convergence of the interacting particle system towards the mean-field limit, and convergence of the implementable discretized particle system towards the particle system. We present results of numerical experiments in Section 5 to compare the performance of our model with that of the existing CBO models.
Throughout the paper, C denotes a floating constant which may vary from place to place. We write (x · y) for the dot product of two vectors x, y ∈ R^d. We will omit brackets () wherever this does not lead to any confusion.

CBO models: existing and new
In Section 2.1, we review the existing CBO models. In Section 2.2, we introduce a new CBO model driven by jump-diffusion SDEs and discuss potential advantages of adding jumps to CBO models, which are confirmed by numerical experiments in Section 5. The numerical experiments of Section 5 are conducted using the Euler scheme presented in Section 2.2.
Each particle at time t is assigned an opinion given by the value of the objective function f at its current position. The lower this value is for a particle, the greater the influence of that particle, i.e. the more weight is assigned to that particle at that time, as can be seen in (2.2) defining the instantaneous weighted average. If the value of f at a particle at time t is greater than its value at the instantaneous weighted average, then the regularised Heaviside function forces the particle to drift towards the weighted average. If the opinion of the i-th particle matters more among the interacting particles, i.e. the value of f at the particle is less than its value at the weighted average, then it is not beneficial for the particle to move towards the weighted average. The noise term is added to explore the space R^d and to avoid non-uniform consensus. The noise intensity in the dynamics of the i-th particle at time t takes into account the distance of the particle from the instantaneous weighted average. Over time, as the particles start moving towards a consensus opinion, the coefficients in (2.1) go to zero.
One can observe that the more influential the opinion of a particular particle, the higher the weight assigned to that particle in the instantaneous weighted average (2.2). Based on this logic, in [CCTT18] the authors dropped the regularised Heaviside function in the drift coefficient and the model (2.1) was simplified as follows: with the drift parameter λ, the noise parameter σ and the weighted average x̄ as in (2.1)-(2.2). The major drawback of the consensus-based models (2.1) and (2.3) is that the parameters λ and σ depend on the dimension d. To illustrate this fact, we replace x̄ in (2.3) by a fixed vector v ∈ R^d. Then, using Ito's formula, we have (2.4) As one can notice, for the particles to reach the consensus point whose position vector is v, one needs 2λ > dσ². To overcome this deficiency, the authors of [CJLZ21] proposed the following model, which is based on component-wise noise intensity instead of the isotropic noise used in (2.1) and (2.3): where λ, σ, and x̄ are as in (2.1)-(2.2), and Diag(v) is the diagonal matrix whose diagonal is the vector v ∈ R^d. Now, if we replace x̄ by a fixed vector v and then use Ito's formula for (2.5), we get where (X_i(t) − v)_j denotes the j-th component of (X_i(t) − v). It is clear that in this model there is no dimensional restriction on λ and σ.
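The instantaneous weighted average described above can be sketched numerically. The following is a minimal Python illustration (the function name and the vectorized objective signature are our own, not the paper's implementation): particles are weighted by Gibbs-type weights e^{-α f(x_i)}, so that for large α the average concentrates near the best particle.

```python
import numpy as np

def consensus_point(X, f, alpha):
    """Gibbs-weighted average of the ensemble.

    X     : (N, d) array of particle positions
    f     : objective function, vectorized over rows of X
    alpha : weight parameter; larger alpha favours low values of f
    """
    values = f(X)
    # subtract the minimum before exponentiating for numerical stability;
    # the normalized weights are unchanged by this shift
    w = np.exp(-alpha * (values - values.min()))
    return (w[:, None] * X).sum(axis=0) / w.sum()
```

For α → ∞ the consensus point approaches the position of the best particle, in line with the Laplace principle invoked later in the paper; for α = 0 it reduces to the plain ensemble mean.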
Other CBO models [HJK20, HJK21] are based on interacting particles driven by common noise. Since the same noise drives all the particles, the exploration is not effective. Therefore, these models are not scalable with respect to dimension and do not perform as well as the CBO models (2.1), (2.3), (2.5) and the model introduced in Section 2.2. This fact is demonstrated in the experiments in Section 5.

Jump-diffusion CBO models
Let us consider the following jump-diffusion model: with where N_i(t), i = 1, . . ., N, are N independent Poisson processes with jump intensity λ_J, and J_k^i = (J_k^{i,1}, . . ., J_k^{i,d})^⊤ are i.i.d. d-dimensional random variables denoting the k-th jump of the i-th particle. The distribution of J_k^i is called the jump size distribution. For the sake of convenience, we write J_k^{i,j} for the j-th component of the vector J_k^i. We assume that the components of J_k^i are i.i.d. random variables, each distributed as ξ, where ξ is an R-valued random variable with a given probability density. We also denote the probability density of J_k^i, which is the product of the component densities. Note that E(J_k^i) is a d-dimensional zero vector, since each component is distributed as ξ with mean zero. The Wiener processes W_i(t), the Poisson processes N_i(t), i = 1, . . ., N, and the jump sizes are assumed to be mutually independent (see further theoretical details concerning Lévy-driven SDEs in [App04]). Also, λ(t), σ(t), γ(t) are continuous functions satisfying (2.10) with α > 0. Note that we have omitted the superscripts α and N of the weighted average x̄ in the notation used in (2.7) for simplicity of writing.
We recall the meaning of the jump term, where τ_k^i denotes the time of the k-th jump of the Poisson process N_i(t). Thanks to the assumption E(ξ) = 0, which in turn implies E(J_k^{i,j}) = 0, k = 1, . . ., N_i(t), i = 1, . . ., N, j = 1, . . ., d, the above integral is a martingale and hence (similarly to the Ito integral term in (2.7)) it does not bias the trajectories of X_i(t), i = 1, . . ., N. The jump-diffusion SDEs (2.7) differ from (2.5) in two ways: • The SDEs (2.7) are a consequence of interlacing an Ito diffusion with jumps arriving according to the Poisson processes with jump intensity λ_J.
• We take λ(t) to be a continuous positive non-decreasing function of t converging to a positive limit as t → ∞, σ(t) to be a continuous positive non-increasing function of t converging to a positive limit as t → ∞, and γ(t) to be a continuous non-negative non-increasing function of t converging to a non-negative limit as t → ∞.
Although we analyse the CBO model (2.7) with time-dependent parameters, the decision to take the parameters time-dependent or not is problem specific. Note that the particles driven by the SDEs (2.7) jump at different times with different jump sizes, and the jumps arrive according to Poisson processes with intensity λ_J.
We can also write the jump-diffusion SDEs (2.7) in terms of Poisson random measures [App04] as where N_i(dt, dz), i = 1, . . ., N, represent independent Poisson random measures with intensity measure λ_J ν(dz) dt, and ν(dz) is a Lévy measure which is finite in our case (2.7). Although for simplicity we introduced our model as (2.7), in proving the well-posedness and convergence results we will make use of (2.11). We can formally write the mean-field limit of the model (2.7) as the following McKean-Vlasov SDEs: where N(t) is a Poisson process with intensity λ_J, and the mean-field weighted average is defined through expectations of e^{−α f(X(t))}, (2.13) with ℒ_X(t) := Law(X(t)). We can rewrite the mean-field jump-diffusion SDEs (2.12) in terms of a Poisson random measure as (2.14)
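For concreteness, one explicit Euler step of a jump-diffusion dynamic of the form (2.7) can be sketched as follows. This is a hedged illustration with our own variable names: we take component-wise diffusion as in (2.5), freeze the coefficients λ(t), σ(t), γ(t) over the step, and use standard-normal jump-size components as one admissible zero-mean choice for the distribution of ξ.

```python
import numpy as np

def euler_jump_step(X, x_bar, lam, sigma, gamma, jump_rate, h, rng):
    """One explicit Euler step of jump-diffusion CBO dynamics (sketch).

    X         : (N, d) particle positions
    x_bar     : (d,) consensus point at the current time
    lam, sigma, gamma : drift / diffusion / jump coefficients at this time
    jump_rate : Poisson intensity of the jump clocks
    h         : time step
    """
    N, d = X.shape
    drift = -lam * (X - x_bar) * h
    # component-wise (anisotropic) diffusion, as in model (2.5)
    diffusion = sigma * (X - x_bar) * np.sqrt(h) * rng.standard_normal((N, d))
    # compound Poisson increment over [t, t + h]: each particle has its own
    # Poisson clock; jump sizes are i.i.d. with zero-mean components
    counts = rng.poisson(jump_rate * h, size=N)
    jumps = gamma * np.stack(
        [rng.standard_normal((k, d)).sum(axis=0) for k in counts]
    )
    return X + drift + diffusion + jumps
```

Because the jump sizes have zero mean, the jump term spreads particles without biasing them, mirroring the martingale property noted above.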

Other jump-diffusion CBO models
Although the aim of the paper is to analyse the CBO model (2.11), we discuss three other jump-diffusion CBO models of interest.
Additional Model 1: Writing (2.7) in terms of a Poisson random measure suggests that we can also consider an infinite-activity Lévy process, e.g. an α-stable process, to introduce jumps in the dynamics of the particles. We can write the CBO model as However, numerical approximation of SDEs driven by infinite-activity Lévy processes is computationally more expensive (see e.g. [PBL10, DMT21]), hence it can be detrimental to the overall CBO performance.
Additional Model 2: In the SDEs (2.7), the intensity of the Poisson processes is constant. If we take the jump intensity λ_J(t), i.e. a function of time, then the corresponding SDEs are as follows: where all the notation is as in (2.7) and (2.10) except that here the intensity of the Poisson processes N_i(t) is a time-dependent function λ_J(t). It is assumed that λ_J(t) is a decreasing function such that λ_J(t) → 0 as t → ∞. Also, in comparison with (2.7), there is no γ(t) in the jump component of (2.16). Note that a compound Poisson process with constant jump intensity is a Lévy process, but with time-dependent jump intensity λ_J(t) it is not a Lévy process; rather, it is an additive process. An additive process is a generalization of a Lévy process which satisfies all the conditions of a Lévy process except stationarity of increments [KI99]. The SDEs (2.16) present another jump-diffusion CBO model, driven by an additive process. The analysis of the model (2.16) follows similar arguments, since the jump-diffusion SDEs (2.16) can also be written in terms of a Poisson random measure with time-dependent intensity measure λ_J(t) ν_t(dz) dt, where (ν_t)_{t≥0} is a family of Lévy measures.
Additional Model 3: In the model (2.11), the particles have idiosyncratic noise, which means they are driven by different Wiener processes and different compound Poisson processes. Instead, we can consider a different jump-diffusion model in which the same Poisson noise drives the particle system but the jump sizes still vary independently across particles. This means jumps arrive at the same time for all particles, but the particles jump with different jump sizes. We can write the CBO model as (2.17) We compare the performance of the jump-diffusion CBO models (2.11) and (2.17) in Section 5.
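The distinction between (2.11) and (2.17) can be made concrete in code: in the latter, one Poisson clock is shared by all particles, while jump sizes remain independent across particles. The helper below is a hypothetical sketch (our own naming, standard-normal sizes as an example of a zero-mean distribution) generating the jump increments for one time step under the shared clock.

```python
import numpy as np

def shared_clock_jumps(N, d, jump_rate, h, gamma, rng):
    """Jump increments over one step when a single Poisson clock drives
    all N particles (Additional Model 3): jumps arrive at the same times
    for everyone, but jump sizes are drawn independently per particle."""
    k = rng.poisson(jump_rate * h)  # shared number of jumps in the step
    # (k, N, d) i.i.d. zero-mean jump sizes, summed over the k jump times
    return gamma * rng.standard_normal((k, N, d)).sum(axis=0)
```

In the idiosyncratic model (2.11) each particle would instead draw its own Poisson count, as in the Euler-step sketch of Section 2.2.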

Discussion
Firstly, we discuss the dependence of the parameters λ(t), σ(t), γ(t) and λ_J on the dimension d. The fact that the components of the jump sizes are independent and identically distributed results in the parameters being independent of the dimension d, in the same manner as for the model (2.5).
In the previous CBO models, there were only two terms, namely the drift term and the diffusion term. The drift tries to take the particles towards their instantaneous weighted average. The diffusion term helps in the exploration of the state space, with the aim of finding a state with a better weighted average than the current one. The model (2.7) contains one extra term, which we call the jump term. Jumps intensify the search of the state space and aid in avoiding premature convergence or trapping in local minima. This results in more effective use of the interaction of the particles.
Moreover, the effect of jumps decays with time in (2.7) by virtue of the decreasing γ(t). The reason for considering the model (2.7), where jumps affect only the initial period of time, is that we want the particles to explore more space faster at the beginning of the simulation; as soon as the weighted average of the particles is in a vicinity of the global minimum, we do not want jumps to affect the convergence of the particles towards the consensus point lying in a close neighbourhood of the global minimum. Therefore, the time-dependent parameters and the degeneracy of the coefficients help in exploiting the searched space.
As a consequence, the jump-diffusion noise and the degenerate time-dependent coefficients in the model (2.7) may help in keeping the balance between exploration and exploitation by the interacting particles over time. We will continue this discussion on exploration and exploitation in Section 5, where the proposed CBO method is tested.

Well-posedness results
In Section 3.1, we discuss well-posedness of the interacting particle system (2.11) and prove a moment bound for this system. In Section 3.2, we prove well-posedness and a moment bound for the mean-field limit (2.14) of the particle system (2.11).

Well-posedness of the jump-diffusion particle system
This section is focused on showing existence and uniqueness of the solution of (2.11). We first introduce the notation required in this section.
Let us denote x := (x_1, . . ., x_N). We write ℓ(·) for the Lebesgue measure and, for the sake of convenience, we will use the shorter notation | · | in place of ℓ(·) whenever there is no confusion. We can write the particle system (2.11) using the above notation as In order to show well-posedness of (3.1), we need the following natural assumptions on the objective function f. Let f_min := inf f. (3.2) Assumption 3.2. f : R^d → R is locally Lipschitz continuous, i.e. there exists a positive function L(R). Assumption 3.2 is used for proving local Lipschitz continuity and linear growth of the drift and diffusion coefficients. Lemma 3.1. Under Assumptions 3.1-3.2, the following inequalities hold for any x, y satisfying sup_{i=1,...,N} |x_i|, sup_{i=1,...,N} |y_i| ≤ R and for all i = 1, . . ., N: where Proof. Let us deal with the first inequality above. We have
Using Jensen's inequality, we have Using the Cauchy-Bunyakowsky-Schwartz inequality, we get The second inequality directly follows from where (x)_j means the j-th component of the d-dimensional vector x, and likewise for the other vector appearing above. Therefore, from Lemma 3.1, we can say that there is a positive function K(R) of R > 0 such that where C is some positive constant independent of |x|. Then the proof immediately follows from [GK80, Theorem 1].
In the last step of the proof above, we highlighted that the constant may depend on N. However, for the convergence analysis in later sections we need a bound on the moments, uniform in N, which we prove under the following assumptions, as in [CCTT18].
Assumption 3.3. There exists a positive constant such that Assumption 3.5. There exist constants such that As one can see, we need the stronger Assumption 3.3, as compared to Assumption 3.2, to obtain a moment bound uniform in N. Assumptions 3.4-3.5 ensure that the objective function has quadratic growth at infinity. From [CCTT18, Lemma 3.3], we have the following result under Assumptions 3.1, 3.3-3.5: where the constants b_1 and b_2 are expressed through the constants of Assumption 3.5.
where X_i(t) is from (2.11) and C is a positive constant independent of N.
Proof. Let p be a positive integer. Using Ito's formula, we have First taking the supremum over 0 ≤ t ≤ T and then taking expectation, we get To deal with the second term in (3.5), we use Young's inequality and obtain To ascertain a bound on |x̄(t)|², we first apply Jensen's inequality, which, on applying the elementary inequality (a + b)^p ≤ 2^{p−1}(a^p + b^p), a, b ∈ R_+, and Jensen's inequality, gives As a consequence of the above calculations, we get where C is a positive constant independent of N. Using the Burkholder-Davis-Gundy inequality, we get , which on applying the generalized Young inequality ( where in the last step we have utilized Hölder's inequality. Now, we move on to obtain the estimates required to deal with the fourth and fifth terms in (3.5). Using Young's inequality, we have In the same way, applying Young's inequality, we obtain Following the same procedure based on (3.4) which we used to obtain the bound (3.6), we also get where C is a positive constant independent of N. It is left to deal with the last term in (3.5). Using the Cauchy-Bunyakowsky-Schwartz inequality, we get We have and hence Taking the supremum over {1, . . ., N}, we obtain sup_{i=1,...,N} which gives our targeted result for positive integer values of p by applying Grönwall's lemma (note that we can apply Grönwall's lemma due to (3.3)). We can extend the result to non-integer values of p ≥ 1 using Hölder's inequality.

Well-posedness of mean-field jump-diffusion SDEs
In this section, we first introduce the Wasserstein metric and state Lemma 3.4, which is crucial for establishing well-posedness of the mean-field limit. Then, we prove existence and uniqueness of the solution of the McKean-Vlasov jump-diffusion SDEs (2.12) in Theorem 3.5.
Let D([0, T]; R^d) be the space of R^d-valued càdlàg functions and let P_p(R^d), p ≥ 1, be the space of probability measures on the measurable space (R^d, ℬ(R^d)) with finite p-th moment, equipped with the p-Wasserstein metric. The simple rearrangement, together with Assumption 3.4, gives where C > 0 is a constant. We will also need the following notation: where μ ∈ P_4(R^d).
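For intuition about the Wasserstein metric used here: in one dimension, the p-Wasserstein distance between two empirical measures with equally many atoms reduces to comparing order statistics (monotone rearrangement). The following minimal helper is our own illustration and is not used in the proofs.

```python
import numpy as np

def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two empirical measures on R with
    equally many atoms: the optimal coupling matches sorted samples."""
    xs, ys = np.sort(np.asarray(xs)), np.sort(np.asarray(ys))
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

In higher dimensions no such closed form exists and the distance is defined through an infimum over couplings, as in the text.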
, where C > 0 is a constant independent of the quantities indicated. Proof. Let m ∈ C([0, T]; R^d). Consider the following SDEs: where C is a positive constant, and ℒ_Y(t) represents the law of Y(t).
We define a mapping where Let β ∈ (0, 1). For all t, t + δ ∈ (0, T), Ito's isometry provides where C is a positive constant independent of t. Using Lemma 3.4 and (3.15), we obtain where C is a positive constant independent of t. This implies the Hölder continuity of the map t → m̄(t). Therefore, the compactness of T follows from the compact embedding C^{0,1/2}([0, T]; R^d) ↪ C([0, T]; R^d). Using Ito's isometry, we have where C is a positive constant independent of t. Moreover, we have the following result under Assumptions 3.1, 3.3-3.5 [CCTT18, Lemma 3.3]: where b_1 and b_2 are from (3.4). Consider the set The set is non-empty due to the fact that T is compact (see the remark after Theorem 10.3 in [GT83]). Therefore, for any m in this set, we have the corresponding unique process Y(t) ∈ D([0, T]; R^d) satisfying (3.13), with ℒ_Y(t) representing the law of Y(t), such that the following holds due to (3.17): for all t ∈ [0, T]. Substituting (3.18) in (3.16), we get , which on applying Grönwall's lemma gives where C is independent of t. Due to (3.18) and (3.19), we can claim the boundedness of the set. Therefore, by the Leray-Schauder theorem [GT83, Theorem 10.3] there exists a fixed point of the mapping T. This proves existence of the solution of (2.14).
Let m_1 and m_2 be two fixed points of the mapping T and let us denote the corresponding solutions of (3.13) by Y_1 and Y_2. Using Ito's isometry, we can get Note that the set above is bounded and by definition m_1 and m_2 belong to it. Then, we can apply Lemma 3.4 to ascertain Using the above estimate, Grönwall's lemma and the fact that Y_1(0) = Y_2(0) in (3.20), we get uniqueness of the solution of (2.14).
Proof. Recall that under the assumptions of this theorem, Theorem 3.5 guarantees existence of a strong solution of (2.14). Let p be a positive integer. Let us denote θ_R = inf{t ≥ 0 : |X(t)| ≥ R}. Using Ito's formula, we obtain First taking the supremum over 0 ≤ t ≤ T ∧ θ_R and then taking expectation on both sides, we get To deal with the second term in (3.21), we use Young's inequality and ascertain Using the Burkholder-Davis-Gundy inequality, we have We apply the generalized Young inequality ab ≤ ε a^{p_1}/p_1 + b^{p_2}/(ε^{p_2/p_1} p_2), a, b, ε > 0, 1/p_1 + 1/p_2 = 1, and Hölder's inequality on the right-hand side of (3.23) to get We have the following estimate to use in the fourth term in (3.21): We make use of Minkowski's inequality to get Now, we find an estimate for the last term in (3.21). Using the Cauchy-Bunyakowsky-Schwartz inequality, we obtain Using Doob's optional stopping theorem [App04, Theorem 2.2.1], we get We have the following result under Assumptions 3.1, 3.3-3.5 [CCTT18, Lemma 3.3]: where b_1 and b_2 are from (3.4). Substituting (3.22), (3.24)-(3.28) in (3.21) and using Hölder's inequality, we arrive at the following bound: which on using Grönwall's lemma gives where C is independent of R. Then, letting R → ∞ and applying Fatou's lemma give the desired result.

Convergence results
In Section 4.1, we prove convergence of X(t), the mean-field limit of the particle system (2.11), towards the global minimizer. This convergence proof is based on the Laplace principle. Our approach in Section 4.1 is similar to [CJLZ21, Appendix A]. The main result (Theorem 4.3) of Section 4.1 differs from [CJLZ21] in three respects. First, in our model (2.11), the parameters are time-dependent. Second, we need to treat the jump part of (2.11). Third, the analysis in [CJLZ21] is done for a quadratic loss function, whereas the assumptions that we impose on the objective function here are less restrictive. In Section 4.2, we prove convergence of the interacting particle system (2.11) towards the mean-field limit (2.14) as N → ∞. In Section 4.3, we prove uniform-in-N convergence of the Euler scheme (2.19) to (2.11) as h → 0, where h is the discretization step.

Convergence towards the global minimum
The aim of this section is to show that the non-linear process X(t) driven by the distribution-dependent SDEs (2.12) converges to a point x* which lies in a close vicinity of the global minimizer, which we denote by x_min. To this end, we will first prove that Var(t) := E|X(t) − E(X(t))|² satisfies a differential inequality which, for a particular choice of parameters, implies exponential decay of Var(t) as t → ∞. We also obtain a differential inequality for M(t) := E[e^{−α f(X(t))}].
The approach that we follow in this section is along the lines of [CCTT18, CJLZ21], but with the necessary adjustments for the jump term in (2.12). Lemma 4.1. Under Assumptions 3.1, 3.3-3.5, the following inequality is satisfied by Var(t): Proof. Using Ito's formula, we have Taking expectation on both sides, we get We also have We estimate the term |E(X(t)) − x̄(t)|² using Jensen's inequality as Substituting (4.3) and (4.4) in (4.2) gives the targeted result.
To prove the main result of this section, we need an additional inequality, which is proved under the following assumption. Assumption 4.1.
f ∈ C²(R^d) and there exist three constants c_1, c_2, c_3 > 0 such that the following inequalities are satisfied for sufficiently large |x|: where J is a d-dimensional random vector and ξ is the real-valued random variable introduced in Section 2.2.
We note that the conditions of Assumption 4.1 are straightforward to verify for f(x) = 1 + |x|². This implies that the class of functions satisfying the above assumption is not empty and is consistent with Assumptions 3.1, 3.3-3.5. The most important implication is that the above assumption allows f to have quadratic growth, which is important for several loss functions in machine learning problems.
In [CCTT18], the authors assumed f ∈ C²(R^d), the norm of the Hessian of f bounded by a constant, and the gradient and Laplacian of f satisfying the inequality Δf ≤ c_0 + c_1|∇f|², where c_0 and c_1 are positive constants. Therefore, in Assumption 4.1 we have imposed restrictions on f similar to those of [CCTT18] in terms of regularity, but adapted to our jump-diffusion case with component-wise Wiener noise.
From (4.3) and (4.4), we have This implies which is what we aimed to prove in this lemma.
Our next objective is to show that E(X(t)) converges as t → ∞ to a point x* which is close to x_min, i.e. the point at which f attains its minimum value. Applying Laplace's method (see e.g. [FW12, Chap. 3] and also [PTTM17, CCTT18]), we can calculate the following asymptotics: for any compactly supported probability measure ρ ∈ P(R^d) with x_min ∈ supp(ρ), we have Based on the above asymptotics, we aim to prove that where the function Γ(α) → 0 as α → ∞.
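For the reader's convenience, the Laplace principle referred to above takes the following standard form (with f the objective, ρ the compactly supported measure and α the weight parameter; see e.g. [PTTM17]):

```latex
\lim_{\alpha \to \infty} \left( -\frac{1}{\alpha}
  \log \int_{\mathbb{R}^d} e^{-\alpha f(x)} \, \mathrm{d}\rho(x) \right)
  = \inf_{x \in \operatorname{supp}(\rho)} f(x).
```

This is the mechanism by which the weighted average concentrates near the global minimizer as α grows.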
We also introduce where ξ is introduced in Section 2.2, and c_1, c_2 and c_3 are from Assumption 4.1. The next theorem is the main result of this section. We will be assuming that the quantity introduced above is at most 3/4, which can always be achieved by choosing Var(0) sufficiently small. Theorem 4.3. Let Assumptions 3.1, 3.3-3.5 and 4.1 hold. Let us also assume that ℒ_X(0) is compactly supported and x_min ∈ supp(ℒ_X(0)). If the above quantity is at most 3/4, then Var(t) decays exponentially to zero as t → ∞.
Taking expectation on both sides of (2.14) (recall that E(ξ) = 0), applying Hölder's inequality and using (4.3) gives where C is a positive constant independent of t.

Convergence to the mean-field SDEs
In the previous section, we showed convergence of the non-linear process X(t) from (2.14) towards the global minimizer. However, the CBO method is based on the system (2.7) of finitely many particles. This means there is a missing link in the theoretical analysis, which we fill in this section by showing convergence of the particle system (2.7) to the mean-field limit (2.14) in the mean-square sense as the number of particles tends to infinity. The proof of this result has some ingredients inspired by [MT05] (see also [MT21]), precisely where we partition the sample space (cf. Theorem 4.7). Further, it is clear from the proof that we need stronger moment bound results, as in Lemmas 3.3 and 3.6, compared with [CCTT18, Lemma 3.4]. We first discuss some concepts necessary for later use in this section. We introduce the following notation for the empirical measure of i.i.d. particles driven by the McKean-Vlasov SDEs (2.14): where δ_x is the Dirac measure at x ∈ R^d. We will also need the following notation: (4.12) Using the discrete Jensen inequality, we have which, on rearrangement, gives where we have used Assumption 3.4 for the second inequality. We recall that a random variable ζ(ω) is a.s. finite if there is an increasing sequence {R_n}_{n∈N} with R_n → ∞ as n → ∞ such that If ζ(ω) is an a.s. finite random variable and g is an increasing continuous function on R, then g(ζ(ω)) is an a.s. finite random variable as well. Also, if ζ_1(ω) and ζ_2(ω) are a.s. finite random variables, then ζ_1(ω) ∨ ζ_2(ω) is also an a.s. finite random variable. If ζ(ω) is a.s. finite, then by continuity of probability we have [Shi13]: We know that X_i(t), governed by the McKean-Vlasov SDEs (2.14), are i.i.d. random variables for every t ≥ 0; therefore, using Chebyshev's inequality, we get where we have used Lemma 3.6, ζ_i = |X_i(t)|² − E|X_i(t)|², and C is independent of N. We take ε ∈ (0, 1) and define The Borel-Cantelli lemma implies that the random variable is a.s. finite for all t ∈ [0, T]. Using (4.15) in (4.13) and Lemma 3.6, we get This shows the required convergence as N → ∞. Lemma 4.4. Let Assumptions 3.1, 3.3-3.5 be satisfied. Let E|X(0)|⁴ < ∞ and E|ξ|⁴ < ∞. Then, the following bound holds for all t ∈ [0, T] and sufficiently large N: where x̄_ℰ(t) is from (4.12), x̄(t) is from (2.13), ζ(T, ω) is an a.s. finite ℱ-measurable random variable and ε ∈ (0, 1).
Proof. We have Note that E(ζ_i(t)) is a d-dimensional zero vector and E(ζ_i(t) · ζ_j(t)) = 0 for i ≠ j. Then, using Theorem 3.6, we obtain where C is a positive constant independent of N. As a consequence of the above estimate and using Chebyshev's inequality, we get Therefore, by the Borel-Cantelli lemma there exists an a.s. finite ℱ-measurable random variable ζ_2(T, ω) such that the following bound holds: In the same manner, we can ascertain and Lemma 4.5. Let Assumptions 3.1, 3.3-3.5 be satisfied. Then, the following inequality holds for all t ∈ [0, T]: where the stopping time is from (4.24), x̄(t) is from (2.10), x̄_ℰ(t) is from (4.12), and C > 0 is independent of N and t.
Proof. We have
Using the discrete Jensen inequality, we get where C is a positive constant independent of N. Applying Assumptions 3.3-3.4, the Cauchy-Bunyakowsky-Schwartz inequality and Young's inequality, ab ≤ a²/2 + b²/2, a, b > 0, we obtain On squaring both sides, we ascertain Using Hölder's inequality, we have Therefore, where C > 0 is independent of N and t.
Proof. We have Using Jensen's inequality and squaring both sides, we get where C is a positive constant independent of N. Applying Assumption 3.4, we ascertain Hence, using Theorem 3.6, we obtain , where C is independent of N and t. We have Note that E(ζ_1^i(t) · ζ_1^j(t)) = 0 for i ≠ j; since the relevant stopping time is bounded, E(ζ_1^i(t ∧ θ) · ζ_1^j(t ∧ θ)) = 0 for i ≠ j as well, because of Doob's optional stopping theorem [App04, Theorem 2.2.1]. Using Theorem 3.6, we deduce where C is independent of N. In a similar manner, we can obtain where C is independent of N. Using (4.31) and (4.32), we get the following estimate: , where C is independent of N and t.
We get the following estimate for 1 ( ) by applying Lemma 3.3 and Theorem 3.6: where is a positive constant independent of and . Now, we estimate 2 ( ). We have Using Itô's formula, we have Substituting (4.27) and (4.30) in (4.39), we obtain , where > 0 is independent of and . Taking the supremum over = 1, . . ., , we get sup =1,..., where > 0 and > 0 are constants independent of and . In the above calculations, we have used the facts that < 2 (4.41) The term (4.34) and the choice of provide the following estimate: where > 0 is independent of and , and where ℰ = 1 =1 ( ) . △
Remark 4.3. Theorem 4.7 implies weak convergence of the empirical measure ℰ of the interacting particle system towards ℒ( ), the law of the mean-field limit process ( ) (see [Shi13, Szn91]). △

Convergence of the numerical scheme
To implement the particle system (2.7), we proposed to utilize the Euler scheme introduced in Section 2.2.3. The jump-diffusion SDEs (2.7), governing the interacting particle system, have locally Lipschitz and linearly growing coefficients. Due to the lack of global Lipschitz continuity of the coefficients, it is not straightforward to deduce convergence of the Euler scheme for (2.7). In this section, we go one step further and prove this convergence result uniformly in . To this end, we introduce the function ℎ ( ) = for ≤ < +1 , where +1 − = ℎ for all = 0, . . ., − 1. We write the continuous version of the numerical scheme (2.19) as follows: where ℎ → 0 means that, keeping fixed, the time-step of the uniform partition of [0, ] goes to zero.
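As an illustration, one Euler step of a jump-diffusion CBO particle system can be sketched in Python as follows. This is a minimal sketch, not the paper's exact scheme (2.19): the function names (`consensus_point`, `euler_jump_cbo_step`), the componentwise scaling of the noise by the distance to the consensus point, and the use of the same coefficient for the diffusion and jump parts are all illustrative assumptions.

```python
import numpy as np

def consensus_point(X, f, alpha):
    """Weighted average of the particles with Gibbs-type weights exp(-alpha * f)."""
    fx = np.apply_along_axis(f, 1, X)
    w = np.exp(-alpha * (fx - fx.min()))  # shift by the minimum for numerical stability
    return (w[:, None] * X).sum(axis=0) / w.sum()

def euler_jump_cbo_step(X, f, alpha, lam_drift, sigma, jump_rate, h, rng):
    """One Euler step for an (N, d) array of particles (illustrative sketch).

    Drift pulls particles towards the consensus point, Gaussian noise is
    scaled by the distance to the consensus point, and Poisson-driven jumps
    add extra Gaussian kicks on top of the diffusion.
    """
    N, d = X.shape
    xbar = consensus_point(X, f, alpha)
    diff = X - xbar
    dW = rng.normal(size=(N, d)) * np.sqrt(h)      # Brownian increments
    dN = rng.poisson(jump_rate * h, size=(N, 1))   # Poisson increments per particle
    J = rng.normal(size=(N, d))                    # standard Gaussian jump sizes
    return (X
            - lam_drift * diff * h    # drift towards consensus
            + sigma * diff * dW       # diffusion part
            + sigma * diff * J * dN)  # jump part
```

With the diffusion and jump coefficients set to zero the step reduces to a pure contraction towards the consensus point, which mirrors the exploitation mechanism of plain CBO.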
Let Assumptions 3.1-3.2 hold. Let E| (0)| 2 < ∞ and E| | 2 < ∞; then the particle system (4.44) is well-posed (cf. Theorem 3.2). Moreover, if E| (0)| 2 < ∞ and E| | 2 < ∞ for some ≥ 1, then, due to Lemma 3.3, the following holds: where we cannot say that is independent of ℎ. However, to prove convergence of the numerical scheme we need the moment bound to be uniform in ℎ and , which we prove in the next lemma.
Proof. Let be a positive integer. Using Itô's formula, the Cauchy-Bunyakovsky-Schwarz inequality and Young's inequality, we have First taking the supremum over 0 ≤ ≤ and then the expectation, we obtain where is independent of ℎ and . Using the Burkholder-Davis-Gundy inequality (note that we can apply this inequality due to (4.46)) and the fact that E| | 2 < ∞, we get . Applying Young's inequality and Hölder's inequality, we ascertain Using Jensen's inequality and (3.4), we have where > 0 is independent of ℎ and . Taking the supremum over = 1, . . ., , we get sup =1,..., where > 0 is independent of ℎ and . Using Grönwall's lemma, we obtain the desired result.

Numerical Examples
In this section, we conduct numerical experiments on the Rastrigin and Rosenbrock functions by implementing the models (2.5), (2.7), (2.17) and the model with common noise introduced in [HJK20, HJK21]. We use the Euler scheme for implementation with ℎ = 0.01. We run 100 simulations and quote the success rates. We call a run of particles a success if | ¯ ( ) − min | ≤ 0.25. Defining the success rate in this manner is consistent with earlier CBO papers. where we take = 20. The minimum is located at (0, . . ., 0) ∈ R 20 . In this experiment for the Rastrigin function, the initial search space is [−6, 6] 20 and the final time is = 100. We take = 1, = 5.1 for the CBO, CBOwCWN, JumpCBO and JumpCBOwCPN models. We take ( ) = 1 when ≤ 20 and ( ) = 1− /20 when > 20 for the JumpCBO and JumpCBOwCPN models. Also, is distributed as a standard Gaussian random variable, and we choose the jump intensity of the Poisson process equal to 20. In the case of the Rastrigin function, the performance of the JumpCBO model (2.7), the JumpCBOwCPN model (2.17) and the CBO model (2.5) is comparable. However, the CBOwCWN model of [HJK20, HJK21] does not perform well. As alpha is increased from 20 to 30, the success rates improve considerably. We have taken constant and , and decaying , for the jump-diffusion CBO models. As one can see, jumps have impacted the performance of CBO positively when = 20. Another fact to be noticed is that the performance of the jump-diffusion models with common or independent Poisson processes is very similar. It is also clear from the experiment that the CBOwCWN model of [HJK20, HJK21] does not induce enough noise in the dynamics of the particle system for effective space exploration.
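For reference, the Rastrigin objective and the success criterion described above can be sketched as follows. The textbook form of the Rastrigin function is used here (the paper's equation (5.1) is not reproduced in this excerpt and may be rescaled), and the choice of the sup-norm in the success test is an assumption, since the norm in | ¯ ( ) − min | is not specified.

```python
import numpy as np

def rastrigin(x):
    """Textbook d-dimensional Rastrigin function; global minimum 0 at the origin.
    The paper's (5.1) may use a rescaled variant of this form."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

def success(xbar_final, x_min, tol=0.25):
    """Success criterion used in the experiments: the final consensus point is
    within tol of the global minimizer (sup-norm chosen here as an assumption)."""
    diff = np.asarray(xbar_final, dtype=float) - np.asarray(x_min, dtype=float)
    return bool(np.linalg.norm(diff, ord=np.inf) <= tol)
```

Running 100 independent simulations and counting the fraction of runs for which `success` returns `True` reproduces the success-rate statistic quoted in the tables.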

f(x) = Σ_{i=1}^{d−1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2]/d, (5.2)
where we take d = 5. The minimum is located at (1, . . ., 1) ∈ R 5 . In this experiment for the Rosenbrock function, the initial search space is [−1, 3] 5 and the final time is = 120. We take = 1, = 5 for the CBO as well as CBOwCN models. We take ( ) = 2 − − /100 , ( ) = 4 + − /90 , and ( ) = 1 for ≤ 90 and ( ) = 1− /90 for > 90. Note that (0) = 1 and (0) = 5, which are the same as the parameters and for the CBO and CBOwCN models. Also, is distributed as a standard Gaussian random variable, and we choose the jump intensity of the Poisson process equal to 90. In the case of the Rosenbrock function, there is a significant improvement in finding the global minimum when using the jump-diffusion models (2.7) and (2.17) in comparison with (2.5) and the CBOwCWN model of [HJK20, HJK21]. As is the case with the Rastrigin function, for the Rosenbrock function both jump-diffusion models have similar performance. We note that the Rosenbrock function has quartic growth. We take time-dependent ( ), ( ) and ( ) for the jump-diffusion models so that ( ) is an increasing function, ( ) is a decreasing function, and ( ) is constant for some period of time and then starts decaying exponentially. This experiment illustrates the good balance of exploration and exploitation delivered by the proposed jump-diffusion models. The particles explore the space until = 90, after which they start exploiting the searched space.
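The scaled Rosenbrock objective (5.2) and the time-dependent coefficient schedules described above can be sketched as follows. The interpretation of the garbled fragments as lam(t) = 2 − e^(−t/100), sig(t) = 4 + e^(−t/90), and a jump coefficient equal to 1 up to t = 90 and e^(1 − t/90) afterwards is an assumption, chosen so that lam(0) = 1 and sig(0) = 5 match the stated initial values and the jump coefficient is continuous at t = 90.

```python
import numpy as np

def rosenbrock(x):
    """d-dimensional Rosenbrock function divided by d, as in (5.2);
    global minimum 0 at (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    s = np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)
    return float(s / len(x))

def schedules(t):
    """Illustrative time-dependent coefficients for the jump-diffusion models.
    The symbols and the exact decay form after t = 90 are inferred, not quoted."""
    lam = 2.0 - np.exp(-t / 100.0)   # increasing drift coefficient, lam(0) = 1
    sig = 4.0 + np.exp(-t / 90.0)    # decreasing diffusion coefficient, sig(0) = 5
    # jump coefficient: constant exploration phase, then exponential decay
    jump = 1.0 if t <= 90.0 else float(np.exp(1.0 - t / 90.0))
    return lam, sig, jump
```

The exploration phase (constant jump coefficient up to t = 90) followed by exponential decay matches the qualitative description of the exploration/exploitation trade-off in the experiment.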

Concluding remarks
We have developed a new CBO algorithm based on jump-diffusion SDEs, for which we have studied well-posedness both at the level of the particle system and at the level of its mean-field approximation. The key feature of jump-diffusion CBO is a more effective exploration of the energy landscape, driven by the randomness introduced by both the Wiener and Poisson processes. In practice, this translates into better success rates in finding the global minimizer and into more robust behaviour with respect to the initialization, which can be located far away from the global minimizer. A natural extension of the current work is a systematic study of CBO with constraints in the search space, as recently discussed in [GP21, CTV21, FHPS21, BHK + 22]. This is particularly challenging because of the need to accurately treat boundary conditions for the SDEs (see e.g. [MT21]). Another interesting research direction is the exploration of jump-diffusion processes in the framework of kinetic-type CBO models [BBP22, KHJK22].
Note that ( ) is a deterministic function of ; therefore, the coefficients of the SDEs (3.13) only depend on and . The coefficients are globally Lipschitz continuous and have linear growth in . The existence and uniqueness of a process ∈ D([0, ]; R ) satisfying the SDEs with Lévy noise (3.13) follows from [App04, pp. 311-312]. We also have, for any ∈ [0, ]: