. Molecular simulations and visualization: Introduction and overview. Faraday Discussions, 169, 9-22. https:

Here we provide an introduction and overview of current progress in the field of molecular simulation and visualization, touching on the following topics: (1) virtual and augmented reality for immersive molecular simulations; (2) advanced visualization and visual analytic techniques; (3) new developments in high performance computing; and (4) applications and model building.


Introduction
Biology, chemistry and materials science increasingly make extensive use of computational methods. The applications are diverse, spanning a range of timescales and lengthscales, from the cellular level (e.g. in systems biology) all the way down to detailed atomistic simulations of molecular assemblies, materials or small molecules. The 2013 Nobel Prize in Chemistry, awarded to Martin Karplus, Michael Levitt and Arieh Warshel for the development of multiscale models for complex chemical systems, is an indication of the achievements of such simulations. Molecular simulations and visualization offer fertile territory where research in human-computer interaction (HCI) and virtual reality may interact with and provide substantial benefit to computational molecular sciences. One of the principal challenges of this research area is that it is inherently cross-disciplinary and therefore requires deep exchanges beyond the boundaries of each discipline. In what follows, we outline progress in this emerging field, and offer a glimpse of potential directions. The discussion that follows is broken down into four interconnected topics. The first topic focuses on the use of virtual and augmented reality in the context of immersive molecular simulations. Progress here requires (a) advanced visualization and visual analytic
Parallelisation therefore plays an important role in interactive simulation, because it can give substantial increases in the performance of molecular dynamics simulation engines. Stream architectures like GP-GPUs (as described below) provide one way to exploit parallelisation and achieve performance increases [12,15]. Compared to standard single-core environments, data transfer tends to present a significant bottleneck in massively parallel environments, resulting in a number of challenges: e.g., extracting atomic positions without degrading performance in a stream parallelized software package is not trivial. More generally, MD simulations can produce a large quantity of simulation data, and transfer of this data may generate bottlenecks for coupling with visualization and interaction modules. Interactive all-atom simulations have recently been reported for systems up to 2 million atoms [16]. For the exploration of large systems, coarse graining offers another well-balanced alternative to speed up the physical engine driving an immersive simulation experiment [17].
There are a wide variety of potential applications for interactive molecular simulation frameworks, including structural modelling, conformational searching, and interpretation of the mechanisms that drive function in complex biological models. Drug design is another active area in which new interactive simulation strategies are under investigation (including haptics interaction, virtual reality, and 3D printing [18]), in part motivated by the very high cost of bringing new medicines to market. Interactive simulations have been used to facilitate forms of nano-manipulation and even to prototype and design nanorobots [19,20]. Interactive simulation frameworks already offer considerable potential for providing microscopic insight into experiments; however, an even closer linking with experiment is likely to emerge in the near future [23]. It will soon be possible to extend immersive approaches to explore not only simulation data, but experimental information as well (e.g. from NMR spectroscopy [24]) and to subsequently build models from such data under direct human supervision [25]. Another application of interactive molecular simulation involves reconstituting molecular assemblies from cryo-EM data [26,29].
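The data-transfer issue raised above can be illustrated with a minimal sketch. The Python code below integrates a toy system of independent one-dimensional harmonic oscillators with the velocity-Verlet scheme and hands a copy of the positions to a visualization callback only every `stride` steps. All names (`run_md`, `on_frame`, `stride`, `total_energy`) are hypothetical and do not correspond to any particular MD package; the list copy merely stands in for the device-to-host transfer that a stream-parallel engine would have to perform, and the harmonic force is an assumption chosen to keep the example self-contained.

```python
from typing import Callable, List


def run_md(positions: List[float], velocities: List[float], n_steps: int,
           dt: float, stride: int,
           on_frame: Callable[[int, List[float]], None],
           k: float = 1.0, mass: float = 1.0) -> None:
    """Velocity-Verlet integration of independent 1D harmonic oscillators.

    Positions and velocities are updated in place. A copy of the positions
    is passed to `on_frame` only every `stride` steps; in a real engine this
    copy would be the costly transfer to the visualization module.
    """
    forces = [-k * x for x in positions]
    for step in range(1, n_steps + 1):
        # First half-kick of the velocities, then full position update.
        for i in range(len(positions)):
            velocities[i] += 0.5 * dt * forces[i] / mass
            positions[i] += dt * velocities[i]
        # Recompute forces at the new positions, then second half-kick.
        forces = [-k * x for x in positions]
        for i in range(len(positions)):
            velocities[i] += 0.5 * dt * forces[i] / mass
        # Export a frame only on every `stride`-th step.
        if step % stride == 0:
            on_frame(step, list(positions))


def total_energy(positions: List[float], velocities: List[float],
                 k: float = 1.0, mass: float = 1.0) -> float:
    """Kinetic plus potential energy of the toy system."""
    return sum(0.5 * mass * v * v + 0.5 * k * x * x
               for x, v in zip(positions, velocities))


if __name__ == "__main__":
    frames = []
    pos, vel = [1.0, -0.5, 0.25], [0.0, 0.0, 0.0]
    run_md(pos, vel, n_steps=1000, dt=0.01, stride=50,
           on_frame=lambda step, p: frames.append((step, p)))
    print(len(frames), "frames exported;",
          "final energy", total_energy(pos, vel))
```

Increasing `stride` reduces the number of transfers at the cost of a lower visual refresh rate; choosing this trade-off is one of the practical design decisions in coupling a simulation engine to an interactive display.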

Advanced visualization and visual analytics
One of the cornerstones of modern molecular simulation concerns the visual representation of the structure of a molecule and its properties. Visual representation is particularly important in guiding the manner in which scientists think about atomic and molecular structure, which is partly a result of the fact that our human perception requires some form of augmentation in order to 'see' this world [30]. Attempts to conceptualize and visualize molecules reach far back in the history of chemistry. In terms of three-dimensional molecular structures, a notable milestone along this path goes back to the early 1940s when Roger Hayward depicted the arrangement of atomic assemblies in collaboration with Linus Pauling in both a scientifically accurate and aesthetically pleasing way [31]. Since this era (and in particular recently), technical progress has considerably improved our ability to visualize the molecular world. Nowadays, molecular graphics are ubiquitous and every scientist can display the structure of a biomolecule on his/her personal computer [32,33] or tablet device [34]. To ensure that the enormous quantity of information contained in molecular simulation data (interactive or not) furnishes maximum insight into microscopic phenomena, the investigation of new visualization and visual analysis methods is an area of active research. One of the primary focuses of this emerging area concerns the development of new ways to understand and rapidly process the radically expanding deluge of data which molecular simulations are capable of generating. Visualization assists in grasping the complexity of these data and identifying emerging properties.
It is now possible to interactively visualise very large molecular assemblies, and new developments (including the use of GPU programming) are driving performance gains and opening up new possibilities for visualization [35]. Nowadays, millions of atoms and their bonds can be depicted interactively [36-38], with considerable speed-ups in secondary structure representation [39]. Calculating molecular surfaces on-the-fly is more demanding [40-43], but realistic rendering including the effects of lighting and ambient occlusion [44] reaches real-time refresh rates [45]. Impressive progress has been achieved with interactive raytracing of molecular systems on the GPU (http://www.molecular-visualization.com/#!home/mainPage). Using ray-casted instancing, even whole-cell simulation data may be visualized smoothly [46] on stereoscopic displays [47], allowing the reconstruction of 3D cellular complexes built from proteins and DNA molecules [48].
Visual analytics (http://www.visual-analytics.eu) have great potential to aid in understanding the increasing number of simulation datasets, and have been applied in a few cases [53,54]. Simplifying large quantities of complex data like those produced by MD simulations may be achieved by appropriate abstractions [112]. A stimulating recent example is the continuous abstraction of a molecular illustration [58] to yield a continuum of molecular depictions. Another challenge that arises in particular for the visualization of molecular simulations concerns the depiction of molecular flexibility [59]. In fact, chemical reactions themselves are difficult to render for many visualization tools. More generally, the visualization of dynamic molecular interaction networks is a very active field of research [60-62], but is beyond the scope of this short introduction.
The ubiquity of molecular images and associated visualization tools is in part a consequence of the fact that they have benefited from other high-growth economic areas [68]. The availability of such tools has enabled the use of molecular visualization in collaborative structural biology, for example using TV-based [69] or web-based [70] solutions. A similar cross-fertilization is observed for GPUs, originally used in the consumer graphics market and nowadays omnipresent in high performance computing and scientific visualization.

Computing power revolution and new algorithms: GP-GPUs, clouds and more
The field of molecular simulation and visualization intrinsically depends on high-performance computing (HPC) to ensure the underlying calculations can be carried out in real time and on a broad range of hardware including commodity computers. In this context, general-purpose GPUs [71], cloud computing [72], and many-core architectures [73] are finding their way into the molecular simulation community. Multi-core architectures are evolving quickly, with massive parallelism and massive threading available on machines like the 1.3 million thread Blue Waters supercomputer. Bespoke architecture development, like that available on the Anton machine, is similarly allowing researchers to push the boundaries of simulation [74], and new techniques based on cloud-based methods and ultrafast high-performance networking are just around the corner. These and likely future developments in HPC are making massively parallel computations viable, and are stimulating innovation across hardware, software, and hardware/software integration, much of which is aimed at tackling the main challenges of molecular simulation: the size of systems which can be simulated, the time-scales which it is possible to simulate, the ability to sample large regions of molecular phase space, and the rigor of the underlying physics within the models. Lane et al. [75] have touched on many of these aspects in the context of protein folding.
The advent of programmable GPUs [76] using high-level languages like C, in conjunction with the NVIDIA CUDA (Compute Unified Device Architecture) tools, OpenCL and other frameworks, has been instrumental in porting software and developing new algorithms. The rise of GPUs offers another example of how advances in high-growth consumer markets (namely video gaming) have been exploited for the purposes of scientific simulation. As a result of the power of GPUs, and the fact that they are relatively inexpensive, much academic and commercial molecular dynamics (MD) software (e.g., GROMACS, NAMD and AMBER) [77,113] has been GPU-accelerated. The enhancement in speed can vary, in part due to the specific algorithms used, and also as a result of the particular GPU hardware architecture on which the code is run, both of which lead to different scalability and execution time. Adding many-body terms to potentials used in classical simulations is a case where the computational cost has been mitigated through exploitation of new hardware, by the development of a shared-memory force-decomposition algorithm [78]. Calculations using ReaxFF, which is a reactive force field, have been accelerated using GPUs [79]. Aspects of reliability and reproducibility have been studied in the context of error-correcting code [80].
For quantum chemistry, software adoption of GPUs has been slower than for MD simulations, but building on initial work [81-83] many electronic structure packages now have GPU-enabled codes, and there is significant interest in utilizing fast quantum chemical methods, for example, to investigate reaction mechanisms [84]. Recent work has investigated GPU acceleration in a range of contexts, for example: (1) double precision matrix multiply operations within legacy quantum chemistry codes [85]; (2) ONETEP, a linearly-scaling plane wave density functional theory (DFT) code [86]; (3) BigDFT, a hybrid DFT code based on Daubechies wavelets [87]; (4) VASP, GPU-accelerated electronic structure calculations [88]; (5) real-space DFT implementations within the Octopus code [89]; and (6) semiempirical methods [90]. Very recently, Sisto et al. have outlined fragment-based quantum chemical methods which rely on both distributed and shared memory GPU parallelism to carry out very large excited state time-dependent DFT (TDDFT) calculations using the TeraChem software framework [91].
Cloud computing [92-95] is a relatively recent approach to molecular simulation that builds on distributed computing approaches like FightAIDS@home, SETI@home, and Folding@home. Distinct from other high-performance and distributed paradigms, it provides large-scale compute infrastructure on demand. In many respects, cloud-based approaches are still in their infancy, but are attracting growing attention. For applications of molecular simulation and modelling, cloud computing can offer large-scale data and compute capability for a short 'burst' phase. Cloud computing provides another example wherein molecular simulation benefits from exploiting approaches which have applications in other sectors: for example, cloud-based computing has appeal to small and medium biotech start-ups where continuous in-house HPC facilities would be under-utilised. Embarrassingly parallel tasks, like the generation of combinatorial databases, virtual screening of millions of compounds, and the analysis of huge genome datasets, are well suited to existing cloud provisions. A workflow system called AutoDockCloud [96] enables distributed screening on a cloud platform using the molecular docking program AutoDock. For applications with greater demands for inter-processor communications, scalability is a key issue. A plugin [97] for the popular VMD software [98] (a front-end for NAMD [10]) allows one to (1) create a cloud-compute cluster on Amazon EC2; (2) submit a parallel NAMD job; (3) transfer the results back for subsequent post-processing; and (4) shut down and terminate the compute cluster on Amazon EC2. These and other case studies of molecular modelling using cloud computing have been reviewed by Ebejer et al. [72].

Crowd-sourcing and serious games: from docking to protein folding
Molecular simulation, like many areas of computational science, involves a tradeoff between user control and automation. Users usually have a deeper understanding and context for the problem at hand, but limited speed and memory. Computational systems, on the other hand, excel in memory and speed, but are limited when it comes to understanding and context. Even with the tremendous advances in computation discussed in the previous section, it is likely that there will always be a limit to the size and accuracy of models that can be built for a particular system, and therefore some level of human understanding will always be required. It is therefore of fundamental interest to consider radically new approaches to molecular modelling, i.e., utilizing paradigms that do not rely exclusively on ever-faster computational frameworks.
Very recently, there has been a great deal of interest directed at investigating whether human intuition and problem solving skills can be effectively mobilized (usually via the internet) as a new resource for solving research questions [99]. The interest in these solutions is such that participants may be stimulated by the prospect of being remunerated [100]. Success in this area requires that the research approach or proposition is cast in a way that is sufficiently engaging, entertaining, or educational. Along these lines, the Defense Advanced Research Projects Agency (DARPA) recently developed a challenge to see how quickly it is possible to involve a large number of people to fulfil a particular task [101]. Such 'crowd-sourced' research approaches [102] have received increasing attention. One particularly successful example is the Galaxy Zoo [103] project, which transforms a potentially mundane, but difficult computer vision task (classifying images of galaxies) into an attractive challenge. When it comes to solving scientific research problems, collective and intrinsic motivation can marshal large communities of volunteers. This requires a high level of visibility, which social media and modern communication technologies can successfully facilitate. Once a volunteer community is established, strategies and structures must be in place to maintain the ongoing engagement of the community. In many cases, crowd-sourced scientific computing paradigms raise interesting questions related to data analysis and data integrity.
Crowd-sourced approaches to research can generate useful insight owing to user intuition as a solution to cope with complex data and unveil emerging properties: a striking example is the game Foldit [104,105]. This project presents protein folding as a sort of three-dimensional jigsaw puzzle, where players are invited to shake and wiggle the three-dimensional structure of proteins in order to find the most stable conformations. Since May 2008, when the first beta version of this game was released, the project has gathered a large community. In some cases, Foldit players have been able to find optimal structures that automated search strategies failed to sample. Players do not necessarily require significant knowledge of biology to play the game and to find stable protein configurations. It is more a matter of spatial representation in three dimensions, as well as collaboration between players. The first 'levels' of the Foldit game are designed to train the players in order to accomplish increasingly complicated tasks. Interactions among players have led to remarkable results from a biological point of view [106-108] and also led players to collaboratively develop new algorithms to solve a particular problem [109].
Not only do interactive and video game interfaces offer the potential for crowd-sourced research studies, they also offer an engaging medium for scientific education, helping students of all ages learn scientific principles and knowledge. As a consequence, educational games are flourishing. For example, the Spectral Game [110] seeks to teach quite advanced concepts in spectroscopy, specifically proton nuclear magnetic resonance (NMR). In addition to meeting specific targets, educational games and interactive molecular simulation platforms (like the distributed computing projects discussed above [111]) have a more general effect, i.e., they engage the public and thereby increase public awareness and understanding of scientific problems. New channels for engaging the public with scientific ideas are also emerging in less traditional venues, i.e., on the frontiers of aesthetic imagination and scientific visualization. As art moves increasingly toward digital mediums, artists have become fascinated with the glimpse into the invisible atomic world provided by molecular simulations and visualizations, to the extent that it has inspired new forms of artistic expression and aesthetic content [30].
This journal is © The Royal Society of Chemistry 2014

Conclusion
Molecular simulation and visualization represent a vibrant melting pot of many scientific disciplines that both benefits from and drives significant progress across a range of fields. New hardware architectures, new software algorithms, and new technological developments inspire this evolution and herald an exciting era of increasingly sophisticated and perhaps unconventional molecular simulations. The potential for these new simulation frameworks is extremely exciting: they will allow us to obtain unprecedented new research insights, develop new ways for interacting with and imagining the microscopic world, drive progress in HCI and computer science, and ultimately have profound effects beyond the scientific realm within the broader culture.