Motif-based mean-field approximation of interacting particles on clustered networks

Interacting particles on graphs are routinely used to study magnetic behaviour in physics, disease spread in epidemiology, and opinion dynamics in social sciences. The literature on mean-field approximations of such systems for large graphs is limited to cluster-free graphs for which standard approximations based on degrees and pairs are often reasonably accurate. Here, we propose a motif-based mean-field approximation that considers higher-order subgraph structures in large clustered graphs. Numerically, our equations agree with stochastic simulations where existing methods fail.

Interacting particles on graphs are routinely used to study magnetic behaviour in physics, disease spread in epidemiology, and opinion dynamics in social sciences. The literature on mean-field approximations of such systems for large graphs typically remains limited to specific dynamics, or assumes cluster-free graphs for which standard approximations based on degrees and pairs are often reasonably accurate. Here, we propose a motif-based mean-field approximation that considers higher-order subgraph structures in large clustered graphs. Numerically, our equations agree with stochastic simulations where existing methods fail.
With applications in as disparate branches of science as statistical physics [1], epidemiology [2][3][4][5][6][7][8][9], chemistry and systems biology [10,11], social science [12,13], and computer science [14][15][16], interacting particles on complex networks constitute an important class of models in the mathematician's and physicist's toolkit [17][18][19][20]. They describe systems where individual entities (particles), endowed with local states, interact with a subset of other entities (neighbors) and transition from one state to another as time evolves. For instance, in epidemiology the local state space consists of immunological statuses, such as susceptible, infected, removed etc. Who interacts with whom defines a graph with the particles as the vertices.
The time evolution of the ensemble of particle states is often described by a continuous-time Markov jump process, for which discrete-time analysis can be insufficient [21]. As the number of particles increases, the exponentially growing combinatorial state space renders exact stochastic analysis prohibitive. To this end, the standard mean-field theoretic approach has been to describe the non-equilibrium dynamics of interacting particle systems via Ordinary Differential Equations (ODEs) for the proportions of particles in each state. Together with control [22], learning-based methods [23,24] and graph limit theory [25][26][27][28], mean-field models can enable analysis of otherwise intractable settings [29][30][31]. More advanced mean-field approximations, such as heterogeneous meanfields [32], pair approximations [33][34][35][36][37] or approximate master equations (AME) [38][39][40][41] and extensions thereof [42][43][44][45], acknowledge the heterogeneity of the particles' behaviors due to the graph structure and incorporate vertices' degrees and edge counts (pairs). Though they provide reasonable accuracy for a number of applications, they are generally not asymptotically exact in that they do not agree with the Functional Law of Large Numbers (FLLN) limits of the corresponding stochastic processes, agreeing only in certain special cases [46,47]. Even for calculations of critical parameter values, standard mean-field * kai.cui@bcs.tu-darmstadt.de † wasiur.khudabukhsh@nottingham.ac.uk ‡ heinz.koeppl@bcs.tu-darmstadt.de approximations are often inaccurate [48]. Nevertheless, their simplicity and intuitiveness have commonly justified mean-field approaches despite their inexactness.
In this paper, we propose a simple and elegant derivation of a general motif-based mean-field approximation for interacting particles on bounded-degree graphs to address two crucial shortcomings of the state-of-the-art: (i) The implicit assumption of cluster-free graphs [49]. In practice, graphs encountered are far from cluster-free and exhibit complex structures [14,[50][51][52] (e.g., neural and transportation networks [53]), which greatly affect e.g. cascades in correlated networks [54]. Here, we go beyond correlation coefficients and account for arbitrary subgraph structures called motifs [55] beyond standard degree and edge-based calculations. (ii) The restriction to special cases (e.g. SIR epidemics, [56]) or dynamics driven by simple neighborhood counts. For instance, infection rates are often assumed to depend only on the number of infected neighbors, while in practice shared connections among neighbors and the shape of the induced neighborhood subgraph are too important to neglect (e.g., simplicial dynamics [14,57]). Though there exist a multitude of works on the analysis of clustered graphs [49,[58][59][60][61], to the best of our knowledge, we provide the first general approximation that takes into account both of these aspects into a single coherent mean-field framework. We now introduce the mathematical model before explaining how our approximation addresses the above two issues.
a. Model A convenient way of generating random graphs is via the Configuration Model (CM) [20,62], which allows specifying either a degree sequence or probability law from which the degrees are sampled. Each vertex is assigned as many half-edges as its degree. We may need to add or drop a parity edge if the degree sequence is not graphical, but its contribution is negligible in large graphs [20,Section 7.6,pp. 239]. The configuration model graph is then constructed by uniformly-at-random matching of all available half-edges. As N , the number of vertices, grows to infinity, the numbers of self-loops and multiple edges have independent Poisson limits whose means depend only on the first two moments of the degree distribution [19,Theorem 3.1.2]. Therefore, their contributions to the limits of various counts scaled by 0, 0, 1, 0). C: Simplicial SIS dynamics [14] as an example of general neighborhood-dependent dynamics. Susceptible vertices are infected at rate τ by infected neighbors, and additionally at rate τ for each shared neighbor.
1/N (standard mean-field scaling) vanish in the limit.
To introduce higher-order structure, we adopt the Extended Configuration Model (ECM) [63] -also known as hyperstub configuration model [56,64]. Denoting vertices and edges of graphs H by V (H) and E(H) respectively, and given M graphical network motifs G (1) , . . . , G (M ) with N 1 , . . . , N M vertices, we construct an ECM on N vertices by specifying higher-order motif participation , and d i,j ∈ N 0 denotes the number of participations (hyperstubs) as the j-th vertex (role) in the motif G (i) (see Figure 1). As in the standard CM, hyperstubs are first generated for each node in accordance with a limiting hyperstub degree distribution P (d). Subsequently, for each possible motif, we iteratively sample hyperstubs of each motif vertex role and add edges wherever the underlying motif has an edge, repeating until no hyperstubs are left.
To describe the dynamics of the interacting system, we will consider time-indexed colored ECM graphs {G t } t≥0 . Each vertex is endowed with a local (finite) state space X . Denote the state of vertex v at time t -interpreted as color -as G t [v], and define the colored neighborhoods N (v) t as colored subgraphs of G t with fixed center vertex v, induced by the set of all vertices participating in motifs with v. Treated as a stochastic process, G t is a Markov jump process with infinitesimal rates λ x→y (N (v) t ), depending on v only via its colored neighborhood configuration, i.e. the rate for vertex v to jump from state x to y ( = x) is given by Note that the rate functions λ x→y depend on the entire subgraph and its coloring up to isomorphism (not only neighbor state counts), and therefore generalize those considered in standard mean-field approximations. To illustrate this, define the neighbor evaluation function (2) for any f : X → R and colored neighborhood N (v) . Then, the simplicial susceptible-infected-susceptible (SIS) model [14], which imposes additional higher-order terms on the infection rates of vertices, can be modeled as for pairwise infection rate τ , triangle (clique) infection rate τ , recovery rate γ and indicator function 1 A . Here, the summation is over all unique triangles ∆ v involving v. This model is more realistic than the standard SIS model when shared acquaintances meet more often (see Figure 1, [50]). In our experiments, we also consider the standard SIS model where τ = 0, which can also be understood as a result of microscopic contact processes [65], for which we similarly imagine higher-order interactions to be of interest. For a susceptible-infected-removed (SIR) model, λ I→S is replaced by jumps to a third terminal state R. Finally, we consider the Ising Glauber dynamics [1] with states {U, D} and for interaction strength J > 0 and temperature T > 0. b. Mean-Field Approximation While the exact colored graphs G t can be evolved through their probability laws or their associated operator semigroup M , an exact analysis is typically prohibitive due to the combinatorial state space. In the limit of large graphs (N → ∞), our aim will thus be to approximate by a system of ODEs M the evolution of certain population fractions, obtained by aggregating the colored graphs via some aggregation function ϕ -e.g. densities of different colors commutes: The goal is to find a system of ODEs M that accurately models the evolution of population fractions ρ, such that aggregating population fractions through ϕ and then applying M is equivalent to first exactly evolving the system and then aggregating.
Since the degrees are bounded, the jumps of G t are also bounded. Therefore, one expects the jumps of various (1/N )-scaled fractions to vanish in the limit because their quadratic variations (e.g., the running sum of squared jump sizes) vanish over finite time horizons. Consequently, even though the scaled proportions are not necessarily Markovian, their large-graph limits have continuous paths and can be described using ODEs by first performing the Doob-Meyer decomposition, which intuitively separates out a stochastic process that captures the mean of the scaled proportions and a zero-mean martingale (a stochastic process that acts like an error process or fluctuations around the mean process), and then invoking the FLLN for martingales [66,67] to claim that the fluctuations around the mean process vanish in the limit.
Denote the set of non-negative integer solutions to the Diophantine equation y 1 + y 2 + . . . + y n = k by Θ(n, k). It is useful to think of k → Θ(n, k) as the equivalence class of a vector in N n 0 whose elements sum up to k (where two vectors are equivalent if their elements have the same sum). For motifs G (i) , consider their set of distinct colorings G (i) and C i ≡ |G (i) | = |X | Ni . For a vertex with hyperstub degree d, the possible counts of each neighboring motif coloring where the vertex participates as the j-th vertex role in a motif G (i) are elements of Θ(C i , d i,j ). Therefore, all colored neighborhoods N (v) will belong to an equivalence class corresponding to a count vector (configuration) z ∈ Z ⊂ × M i=1 × Ni j=1 N Ci 0 under an appropriate equivalence relation ∼, such that z ≡ (z 1,1 , . . . , z 1,N1 , z 2,1 , . . . , z M,1 , . . . , z M,N M ) ∈ Z, z i,j ≡ (z i,j k , . . . , z i,j Ci ), and z i,j k ∈ {0, 1, . . . , d i,j } denotes the number of participations as role j in neighboring motifs G (i) currently in the k-th motif coloring X i,k ∈ X V (G (i) ) .
to y. Moreover, each z ∈ [x, d] determines the colored neighborhood (up to isomorphism) of a center vertex with color x and hyperstub degree d.
Aggregating colored ECMs over equivalence classes from the quotient space G/∼, where G is the space of all colored ECMs, is tantamount to keeping track of proportions ρ t (x, d, z) of vertices in G t with color x ∈ X , hyperstub degree d, and counts of neighboring motif colorings z ∈ [x, d]. Note that although z already contains all information about x, d, for notational convenience we track proportions of (x, d, z). As N → ∞, these proportions can be described by deterministic ODEs, which we shall call the motif-based mean-field (MMF) equations.
This leads us to our main result: The MMF master equations for the limiting proportions ρ t constitute a system M of ODEs in (6) with an accuracy going beyond existing mean field approximations, and are given bẏ where we aggregate rates λ x→y (z) and z i,j kλ →y i,k,v over equivalence classes corresponding to each center vertex configuration z (since z uniquely defines the colored neighborhood up to isomorphism) and each coloring k of neighboring motifs G (i) respectively. Here, we defined unit operators I and influx step operators Λ ←y , Λ ←y i,j,k,v acting on functions f (x, d, z, y), f (x, d, z, y, k) such that we have influx by center vertex jumps from configurations Π →y [z] [Λ ←y f ] (x, d, z, y) = f (y, d, Π →y [z], x) (8) and similarly influx by jumps of all neighboring motifs' vertices that are not the center vertex (v = j) (9) where Ω →y i,v [k] denotes the motif coloring resulting from changing the color of vertex v to y in motif i with coloring k. The jump rates of any neighbors in role v of motif G (i) with coloring k from the corresponding statex = X i,k v to y are approximated by the averaged jump rate over all such colored motif occurrenceŝ since a vertex in configuration z participates z i,v k times in the considered motif coloring. See Figure 2 for a visualization. Finally, sampling i.i.d. initial states from some P 0 : X → [0, 1], the initial conditions are given by where 0 0 ≡ 1. The fractions of vertices in any state x are then given by ρ t (x) = d,z ρ t (x, d, z).
The biggest appeal of the MMF equations (7) is their simplicity and intuitiveness. While they may generally not be asymptotically exact, experimentally we find that they are quite accurate. Note that as a special case, we obtain classical approximations such as AME [38] and thereby coarser approximations [39] for degree distributionsP : N 0 → [0, 1] by considering only the edge motif G (1) , assuming binomial role distributions and aggregating equivalent terms, i.e. P (d 1,1 , d 1,2 ) = P (d 1,1 + d 1,2 ) · d 1,1 +d 1,2  c. Numerical Evaluation For numerical purposes, we generate equations only for P -supported hyperstub degrees d and simulate rescaled proportions ρ t (x, z | d) ≡ ρ t (x, d, z)/P (d). For fast ECM graph generation, we drop leftover hyperstubs (in our experiments, this amounts to less than 0.5% of all generated stubs, leading to only slight inaccuracies) instead of resampling until cardinality constraints are satisfied and allow but ignore self-loops and multi-edges. We use a third-order numerical integrator and compare MMF against the approximate master equations (AME) [38], the heterogeneous pair approximation (HPA) [35], the heterogeneous mean-field approximation (HMF) [32] and exact Gillespie simulations on graphs of size N = 100000. For use by the wider community, Python code is available at [68].
For two given, arbitrary network motifs G (1) , G (2) we consider the three parametrized families of antidiagonal, uniform and diagonal hyperstub degree distributions P a,θ , P u,θ and P d,θ with parameter θ ∈ N: For P a,θ , we put uniform mass 1/(θ + 1) on each case where j d 1,j = k and j d 2,j = θ − k for k = 0, 1, . . . , θ. In each case, we shall assume a uniform distribution over motif roles, resulting in a product of multinomials P a,θ (d) ≡ Ni 1 Ni ), where 1 Ni is the N idimensional one-vector. For P u,θ and P d,θ we similarly put equal probability mass whenever j d 1,j + j d 2,j ≤ θ and j d 1,j = j d 2,j = θ respectively.
On the ECM graphs with edge and triangle motifs (G (1) , G (2) from Figure 1), we find that our approximation matches well with the numerical Gillespie simulation. For the SIS dynamics (3,4) in Figure 3, our approximation outperforms other approximation methods over a range of (hyperstub) degree distributions and dynamics parameters. Similar assertions hold for the Ising Glauber dynamics (5) in Figure 4, where existing mean-field approximations become highly inaccurate near the critical point due to the high clustering of the considered graphs. Furthermore, our approximations remain quite accurate also e.g. for graphs with edge and square motifs (G (1) , G (3) in Figure 1) as seen in Figure 5. For the simplicial version of the SIS dynamics, in Figure 6 we find that the accuracy of our approximations is acceptable, while existing degreebased approximations are unable to handle simplicial dynamics by design. Finally, we verify the accuracy of our proposed framework on the SIR dynamics model in Figure 7 with non-binary states, where the Gillespie simulation for N = 100000 is almost indiscernible from the predicted mean-field proportions, showing the generality of our approach.
d. Discussion We have proposed motif-based meanfield equations for arbitrary neighborhood-dependent jump dynamics on a highly adjustable random graph model, considering both higher-order graph structures and dynamics. Numerical examples show that our approximations are quite accurate. Potential extensions include the consideration of general k-hop neighborhoods with k > 1, control and lumping of equations [69,70] under additional assumptions on motif roles to improve tractability. Finally, for applications, estimating hyperstub degree distributions constitutes another important problem, as an identifiability problem arises from counting larger motifs that include smaller motifs.

ACKNOWLEDGMENTS
This work has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center. HK acknowledges support by the German Research Foundation (DFG) via the Collaborative Research Center (CRC) 1053 -MAKI. WRK received no specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors.