Cluster integrals and virial coeﬃcients for realistic molecular models

We present a concise, general and eﬃcient procedure for calculating the cluster integrals that relate thermodynamic virial coeﬃcients to molecular interactions. The approach encompasses non-pairwise intermolecular potentials generated from quantum chemistry or other sources; a simple extension permits eﬃcient evaluation of temperature and other derivatives of the virial coeﬃcients. We demonstrate with a polarizable model of water. We argue that cluster-integral methods are a potent yet underutilized instrument for the development and application of ﬁrst-principles molecular models and methods.

More than a century of effort has been put toward bridging nanoscopic and macroscopic behavior, with the aim of predicting and understanding thermophysical properties quantitatively from molecular considerations. Such a capability can have immense value: reliable thermophysical models are essential to engineering design, optimization, and control applications [1]; conversely well-structured thermophysical models provide a conduit for macroscopic experiments to inform molecular modeling, guiding the development of ab initio methods and semi-empirical force fields. Only two general approaches are available to bridge these scales rigorously. The first, molecular simulation, is very widely used. It is a type of surrogate for experiment, with many of the good and bad features this entails. It is computationally expensive, and hence cannot be employed on-the-fly as part of a larger calculation. Also, despite the detailed molecular information it can yield, it is still in some respects a blunt instrument-when simulation disagrees with experiment, it is difficult to know what features of the molecular model led to this failure.
The second rigorous nano/macro bridge is much less commonly applied. Cluster integrals reformulate the partition function into computationally tractable pieces. These integrals enter as parameters in models for the thermodynamic behavior, such as the equation of state, in which the integrals appear as the familiar virial coefficients [2,3]. Cluster-integral approaches are (as yet) inapplicable to condensed or ordered phases, hence they apply primarily to vapor and supercritical states, but otherwise they are distinct from all other thermophysical models in being explicit, rigorous, and applicable to arbitrarily complex molecular species.
Moreover, their hierarchical structure, in which 2-, 3-, and generally n-body interactions can be studied independently, coupled with the information obtainable from their temperature dependence, provides a rich source of focused information for assessing the strengths and weaknesses of molecular models.
Cluster-integral methods received considerable attention in the decades following their introduction by Mayer and Mayer in the 1930's and 40's [4,5], but in subsequent years interest waned due to the difficulty in computing coefficients for even the simplest models.
However, the past decade has brought renewed attention, with considerable advances seen in theory, methods and applications [6][7][8][9][10][11][12]. While the approach remains inapplicable to condensed phases, there is an enormous range of non-trivial and technologically important behavior encompassed by the supercritical region to which it can apply [13][14][15][16][17][18][19][20][21]. Considering that first-principles calculations of ideal-gas properties now routinely outperform experiment in accuracy and cost [22][23][24][25][26], it should be self-evident that cluster integrals-which proceed naturally and methodically from the ideal-gas starting point-form the next natural frontier to advance ab initio computational chemistry. Indeed, in a recent study of helium [27], virial coefficients computed from first-principles force fields yielded supercritical property data with accuracy and precision that rivals-and probably exceeds-the best experimental measurements.
Recently, one of the co-authors of this Communication presented a recursive algorithm for calculation of the integrand of the cluster integrals for the virial coefficient [11]. The algorithm drastically reduces the time required to compute the integrand for a given configuration, and consequently it has made possible calculation of virial coefficients to higher order than previously thought possible. Following convention, the treatment was developed and derived for pairwise-additive models. Here, we present a fully general algorithm for calculating arbitrary virial coefficients for non-additive potential energy surfaces. This algorithm provides a better basis for framing the development of cluster integrals-it is not only more general, it is simpler too. The approach is developed in the form of a recursion, and it makes no mention of pair bonds or biconnected graphs.
Non-reliance on pairwise additivity is a crucial feature of this scheme. Although the assumption of pairwise additivity is broadly practiced in molecular simulation, it is wholly incompatible with a framework that purports to provide first-principles properties with accuracy that rivals experiment (see, e.g. [27,28]). Virial coefficients have been computed for non-pairwise potentials previously [29][30][31][32][33][34], but no methods for multibody potentials exploit the recursion developed in [11]. Instead they rely on explicit enumeration of the clusters that sum to yield the coefficient; while progress has been made in economizing the process [35], the approach quickly becomes unwieldy to apply for increasing order of the virial coefficient. Moreover, important related quantities, and in particular temperature derivatives of the integrals, are not handled at all.
Each coefficient appearing in density series for thermodynamic properties (see Supporting Information [36]) is expressible as an integral over configurations of N molecules: where Λ is the de Broglie wavelength. We introduce here do; both f C and f B are defined for a given r N as a complicated sum of products involving functions on subsets of the coordinates, with a number of terms that grows faster than exponentially with N . Here we present a means to accomplish these evaluations in a way that is efficient and easily implemented.
The inversion giving the f C in terms of the f Q is [39] f where the sum is over all partitions λ of the set N , with p the number of blocks λ i in each partition. The first three formulas are: It is useful to discuss these functions in reference to a representation as graphs. Equation (3c) is depicted in Fig. 1 as an example. The faces (and lines) join vertices that represent the labeled coordinates, with color used to indicate the function.
A computationally expedient means to obtain the f C from the f Q is via the recursion: where the sum over S includes all possible labelled subsets of sizes 1 to N − 1, from the set of N molecules, such that every subset S contains a chosen particle (labeled 1), and S * is the complement of S. Since the f Q are known, this expresses f C (N ) entirely in terms of f C 's for sets of size smaller than N , so the recursion can be started from trivially small subsets, for example f C = f Q = 1 (Eq. (3a)) for one particle, and the f C 's then calculated for subsets of increasingly large size up to the required size N . The proof that (4) is consistent with (2) is given in the Supporting Information [36].
Stell [40] and Sherman [41] independently derived the "cluster-star inversion" and showed that f B (N ) could be written as a weighted sum over connected tree graphs formed from f C faces. While in principle this result provides the connection needed to complete the sequence [7,8] 6 , f B, [6−8] FIG. 2: Graphs representing some of the terms that sum to give f B (8). Green regions and lines represent f C faces. Orange number above and left of each graph is its weight as given by (6).
Beneath each graph is shown the functions to which it contributes (with v ranges shown in square brackets, when needed).
properties, their result does not suggest an efficient computational route to realize this connection. We develop this result here, again using recursion.
The cluster-star inversion may be written as: Examples are shown in Fig. 2. Pairs of f C faces may be joined at a single vertex they have in common, which then represents an articulation point, and the tree structure requires that no chain of joined faces can form a ring. More than two f C faces can be joined by the same vertex, and we define the multiplicity p(k) as the number of faces sharing (joined at) vertex k; p(k) = 1 if k is not an articulation point. The weight associated with each graph G formed this way is: Given this result, we propose the following recursion to compute the f B from the f C : Here we have introduced the auxiliary functions f A,v and f B,v to support the recursion. for each node v from information already calculated for smaller subsets, then calculating f B,v for increasing v from f C , which is already known from (4), and the f A 's. The computation time taken to calculate the cluster integrand for N particles scales exponentially with N using this new algorithm. This is close to the theoretical maximum efficiency, since the number of input (sub-)cluster energies itself scales exponentially with N .
Proof that the recursion (7) yields f B , timings for N = 4 to 12, comparison with the direct sum over graphs, and links to an implementation in software, are presented in the Supporting Information [36].
Temperature derivatives of the virial coefficients are needed to describe thermal properties, such as the heat capacity, the Joule-Thomson coefficient, and the speed of sound [12].
They are also useful in providing an accurate representation of the temperature dependence of the coefficients, and to aid in interpolation and extrapolation [42]. The conventional pairwise-additive treatment requires a tedious accounting for the unique irreducible graphs that can be generated from Mayer bonds and derivative bonds, and the complexity increases rapidly with derivative order (second derivatives are needed, for example, for the heat capacity). In contrast, these derivatives are given almost trivially from the recursion formulation, and moreover they remain applicable to the general case involving multibody potentials.
From (1c), where the superscript in parentheses represents the n th derivative with respect to the inverse temperature β. For a given configuration, f  (4) and (7). The sum over products introduces another layer of recursion. For example, from (4): (7b) is treated similarly. This culminates in the f Q derivative, which we write first for the general case of a temperature-dependent potential: and in the more typical case where E is independent of β: Many other derivatives of the virial coefficients are of practical interest, and can be evaluated using an approach analogous to that outlined here for temperature derivatives.
Derivatives with respect to an electric field, for example, are needed to develop virial series for the dielectric constant [43,44]. Alternatively, derivatives with respect to intermolecularpotential parameters can be computed, providing an effective means to adjust molecular models to better match experimental data [42].
As an example, we consider an application to a model for water. The most widely used molecular models for water have been formulated to describe behavior in the liquid phase at ambient conditions, largely due to its biological importance. Almost all such models are pairwise additive with fixed electrostatic features. Unsurprisingly, such models perform poorly when describing the vapor or supercritical phases, and are unable to reproduce the temperature dependence of the lower-order virial coefficients [8,29,45]. Explicit two-and three-body potentials for rigid water molecules have been developed through fits to ab initio data [46] for water dimers and trimers, and these do very well in describing experimental second and third virial coefficients [47].
An alternative approach to non-additive interactions is the use of polarizable electrostatics. The Gaussian-charge polarizable model (GCPM) of Pericaud et al. [48] was developed 8 using data from both ab initio calculation and from experiment (notably, not experimental virial coefficients), so its parameterization implicitly includes nuclear quantum effects and 3-body dispersion, induction, and exchange (for water, it has been shown that these effects contribute at least 50% of the overall 3-body interaction [49], so if they were not already implicit to the model they would need to be handled explicitly). Virial coefficients up to B 5 (but not temperature derivatives) have been computed for GCPM water previously [29][30][31], and while these data are not given with the same precision as obtained here, they can serve to partially validate our methods and calculations.
We computed virial coefficients B N (T ), N = 2, . . . , 6 for this model for several temperatures T from 270 K to 1500 K. At each temperature, we computed the coefficient itself and its first three temperature derivatives. These data were fit to a polynomial form that provides a convenient analytic expression for B N (T ) for all temperatures in the range (and beyond, given that extrapolation using a form fit to many derivatives can be very reliable [42]). Two independent sets of B N values were computed, one based on Mayer sampling [6][7][8] and the other using nested sampling [50], both computing f B for sampled configurations as described above. Details are given in the Supporting Information [36].
We used the coefficients to compute the pressure and the isochoric heat capacity c V as a function of temperature and density. The pressure can be given via the usual virial equation of state, and in addition we examine a "parametric approximant" [51]. This is an equation of state that is formulated to capture the non-analytic behavior known to prevail in the vicinity of a critical point, while also matching the zero-density limiting behavior as quantified by the virial series. The model takes the critical temperature T c and density ρ c as input parameters, and provides a much better and more regular convergence of the virial series; we use "VN " and "AN " respectively to indicate the virial series and the approximant based on virial coefficients up to B N . As input parameters, we use T c = 642.21 K and ρ c = 0.3344 g/cm 3 previously determined [48] for GCPM; the critical pressure is obtained as an output of the approximant, and the value from A6 (24.3 ± 0.4 MPa, with error indicating difference from A5) is in good agreement with molecular simulation for GCPM (24.56 MPa [48]). The estimated GCPM critical temperature, density, and pressure reported in [48] differ from experimental values [52] by −0.7%, +3.8%, and +11.3%, respectively. Figure 3 shows the performance of the VN and AN series for three supercritical isotherms (viz., 650 K, 700 K, and 800 K, which are, respectively, approximately 1%, 10%, and 25% above T c ); data are plotted up to about twice the critical density. Models are compared to data taken via isobaric Monte Carlo simulations of GCPM water, and also to data for real water [52,53]. Uncertainties in the isotherms were determined by propagating the uncertainties in the coefficients, either analytically (for VN ) or via bootstrapping (for AN ).
One of the most valuable features of the series-based approaches is their ability to selfassess their accuracy through comparison of isotherms for successive orders of the series [12].
This feature is demonstrated for the 800 K isotherm, which presents the VEOS estimates for different orders. We see that each VN curve peels away from those at higher order at about the point where it deviates from molecular simulation data. Uncertainties in V6 prevent V5 from being assessed this way, highlighting the need to know confidence limits on the VN isotherms when evaluating series convergence in this manner.
The parametric approximant has a single parameter (labeledã) that is independent of T and series order N , and we adjust this parameter to a value that gives the best overall convergence of the approximant series. Notably,ã is not adjusted via comparison to simulation or experimental data; rather it is given by considerations completely internal to the parametric-approximant model. We observe that the value ofã that yields the best convergence is also the value that results in the best agreement with molecular simulation data, for the given values of T c and ρ c . As seen in Fig. 3, both the convergence of the series and its agreement with simulation data are remarkable. Still, the systematic disagreement at higher density when using simulation-based critical properties-even when the approximant apparently is converged-is interesting, and this behavior is worthy of further study.
Isochoric heat capacity is examined in the Supporting Information [36].
The methods presented in this Communication open a new frontier in computational quantum chemistry, moving beyond single-molecule properties to those that are based in multi-molecular interactions. Cluster-integral treatments not only provide a full spectrum of thermodynamic properties for the vapor and supercritical-fluid phases from intermolecular energies, but they can also support systematic development of intermolecular-energy methods and models via comparison to experimental temperature-dependent virial coefficients. Condensed-phase properties are still out of reach for the conventional virial series, but attention is being given to this issue [10,54,55]. In the meantime, a reliable way to predict properties of just supercritical fluids and their mixtures -ultimately to compute them with accuracy exceeding experiment -would be a major advance for science and  [51] using values of T c and ρ c from [48]; curves are shown for A3 to A6, but are not distinguishable on the plot. Pink lines labeled A6* show the A6 parametric approximant using T c = 647 K and ρ c = 0.37 g/cm 3 . VEOS lines are shown only for the T = 800 K isotherm, and are truncated where they start to diverge; this is representative of behavior at other temperatures. Experimentally derived data are as represented by an accurate correlation [52,53] (black dot-dashed lines), and filled symbols are results of NPT Monte Carlo simulations for GCPM water. Error bars indicate uncertainties at 68% confidence level, and for AEOS are smaller than the line thickness.
engineering. Supercritical water (as studied here) is itself important to diverse areas such as geophysics [13] and gasification of biomass [14], while applications of supercritical fluids more generally encompass a broad range of science and technology, including separations [15], reaction engineering [16], nanotechnology [17], crystallization [18], power generation [19], carbon capture [20], and more [21]. These fields could more rapidly advance with access to predictive (ideally first-principles) methods for computing properties of arbitrary compounds and their mixtures.