Project B7: Automated model building and representation learning for multiscale simulations

Project B7 applies machine learning techniques to the multi-scale simulation of soft-matter systems. Multi-scale methods address the problem that the computational cost of high-resolution base-line models grows too quickly for problems at relevant scales. They therefore assume that a coarser-resolution structure emerges from the details, one that can be computed with far fewer operations but still informs us about the relevant behaviour of the system. Machine learning can help to discover such simplified surrogate computations by fitting a restricted computational model (such as a parametrized model, a kernel regressor, or a deep feed-forward network) to example results obtained from a full-resolution simulation.
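As a minimal illustration of fitting a restricted model to reference results, the sketch below fits a two-parameter pair potential to sampled energies by least squares. The Lennard-Jones-style functional form, the names `lj_energy` and `fit_lj`, and the synthetic reference data are assumptions made for this example only; they are not taken from the project, and the reference data here merely stands in for the output of a full-resolution simulation.

```python
from typing import Sequence, Tuple


def lj_energy(r: float, a: float, b: float) -> float:
    """Pair energy of the assumed model U(r) = a / r**12 - b / r**6."""
    return a / r**12 - b / r**6


def fit_lj(distances: Sequence[float],
           energies: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) from reference (distance, energy) samples.

    The model is linear in a and b, so the 2x2 normal equations
    can be solved directly with Cramer's rule.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for r, u in zip(distances, energies):
        f1 = r ** -12      # basis function multiplying a
        f2 = -(r ** -6)    # basis function multiplying b
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        t1 += f1 * u
        t2 += f2 * u
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("need samples at two or more distinct distances")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Synthetic stand-in for reference data (assumption: in practice these
# samples would come from a full-resolution simulation).
true_a, true_b = 4.0, 4.0
distances = [0.95 + 0.05 * i for i in range(32)]
energies = [lj_energy(r, true_a, true_b) for r in distances]

fitted_a, fitted_b = fit_lj(distances, energies)
```

Because the synthetic data is noise-free and generated from the same functional form, the fit recovers the generating parameters up to rounding error. With real reference data, the residual would indicate how much of the fine-grained behaviour the restricted model fails to capture.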

Conceptually, this involves two aspects. The first is to build a coarse-grained (CG) model. Learning a CG model can take the form of just parametrizing a force field or a mapping procedure motivated by physicochemical intuition, but more generic approaches are also possible, such as using generic function approximators (kernels or deep networks) to learn a latent representation of the coarser-level dynamics from data. The latter class of methods shines if the complexity of the emergent structure is high and thus eludes manual modelling. The second aspect is backmapping, i.e., the reconstruction of a high-resolution simulation state corresponding to the coarse-grained state. As coarse-graining comes with a loss of information, this reconstruction step is inherently statistical: the goal is to match the statistics at the fine-grained level rather than to predict every parameter exactly. Backmapping is optional for some applications, but it is an important, fundamental building block of multi-scale modelling, as it captures the statistics of the lost information and permits further analysis and processing of simulation results in their original representation.
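The information loss and the statistical nature of backmapping can be made concrete with a toy sketch. It assumes a centre-of-mass mapping from atoms to beads, which is a common choice but is not stated in the text above. The names `coarse_grain` and `toy_backmap` are hypothetical, and the Gaussian sampler is only a placeholder for a learned generative model, not the project's method.

```python
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(positions: Sequence[Vec], masses: Sequence[float]) -> Vec:
    """Mass-weighted mean position of a group of atoms."""
    total = sum(masses)
    x, y, z = (
        sum(m * p[k] for p, m in zip(positions, masses)) / total
        for k in range(3)
    )
    return (x, y, z)


def coarse_grain(positions: Sequence[Vec], masses: Sequence[float],
                 groups: Sequence[Sequence[int]]) -> List[Vec]:
    """Map atoms to CG beads; groups[i] lists the atom indices of bead i."""
    return [
        center_of_mass([positions[i] for i in g], [masses[i] for i in g])
        for g in groups
    ]


def toy_backmap(bead: Vec, masses: Sequence[float], spread: float,
                rng: random.Random) -> List[Vec]:
    """Draw ONE random atomistic configuration whose centre of mass is `bead`.

    Toy stand-in only: Gaussian displacements, no chemistry, no learning.
    """
    raw = [tuple(rng.gauss(0.0, spread) for _ in range(3)) for _ in masses]
    shift = center_of_mass(raw, masses)  # centre of mass of the raw displacements
    return [
        tuple(bead[k] + r[k] - shift[k] for k in range(3))
        for r in raw
    ]


masses = [1.0, 1.0]
groups = [[0, 1]]  # a single bead containing both atoms

# Two different fine-grained configurations ...
config_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
config_b = [(1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]

# ... collapse onto the same coarse-grained state.
beads_a = coarse_grain(config_a, masses, groups)
beads_b = coarse_grain(config_b, masses, groups)

# Backmapping therefore has to sample: repeated draws give different
# atomistic states that are all consistent with the same bead.
rng = random.Random(0)
sample_1 = toy_backmap(beads_a[0], masses, 0.5, rng)
sample_2 = toy_backmap(beads_a[0], masses, 0.5, rng)
```

Since `config_a` and `config_b` yield identical beads, no deterministic inverse of the mapping exists. A backmapping scheme can at best reproduce the distribution of fine-grained states compatible with a given bead configuration.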

In the second funding period of the TRR, we have addressed both aspects with different approaches. The main results are new schemes to learn coarse-grained force fields from physically inspired kernel-based approximators, a new deep network architecture for predicting geometric quantities in a symmetry-preserving way, and a new algorithm for backmapping based on autoregressive models and generative adversarial networks (GANs). The work has been carried out by an interdisciplinary team of researchers from physics and computer science, which has allowed us to tackle problems from both a physical and algorithmic perspective simultaneously.

In the next funding period, we propose to delve deeper into these two aspects: improving coarse-graining and backmapping through machine learning methods. A major obstacle for both aspects is efficiency. On the coarse-graining side, the complexity of the surrogate computation has to be low in order to stay competitive with standard molecular dynamics (MD) simulations (e.g., pairwise potentials). On the backmapping side, the high demands of the employed generative statistical models in terms of memory and computation time are still a major hurdle towards further improvements in accuracy. It is important to stress that these efficiency limitations are not details to be optimized but fundamental issues. On the coarse-graining side, beating the base-line cost of MD simulations is hard (and far out of reach of kernel- and network-based approaches). On the backmapping side, feed-forward networks already beat the previous state of the art, but the strong growth of memory costs with resolution for generative networks currently prevents further improvements in accuracy.

Correspondingly, we set efficiency and practicality as the main goals for the next funding period. For coarse-graining, we follow the approach of inferring more traditional, pairwise models with coarse-grained (CG) beads (thus being able to provide significant speed-ups over plain MD), while using modern machine learning methods to automate and fine-tune the procedure. In addition, we are targeting a new class of CG models that are both broadly transferable and structure-based. We make use of an interpolation of the chemical space considered, identify representative molecules, run a large number of reference liquid simulations, and perform a large-scale extended-ensemble CG parametrization. This will open the door to using CG models in high-throughput compound screening.
On the backmapping side, the key objective is to make the underlying generative statistical models more scalable such that large systems, potentially with long-range effects, can be taken into account. In addition, we will explore new applications for backmapping, where the coarse-scale information might come from different data sources.

From an organizational standpoint, we will again approach the problem from a physics and an algorithmic perspective. The work on automating the creation of traditional CG models will have a stronger background in computational physics, with computer scientists contributing additional ideas for ML and data analysis algorithms. Addressing the efficiency issues in backmapping requires more algorithmic improvements (scalability of deep generative models is a longstanding problem in machine learning) but also significant input of ideas and background from physics.

Overall, our goal is to contribute to making machine learning techniques practical for speeding up large-scale simulations through efficient, learned multi-scale representations. We believe that research in automated model building and representation learning has the potential to fill an important gap in the broader landscape of multiscale modelling of soft-matter systems, in particular for complex effects and systems that evade analytical modelling.

Toward a structural identification of metastable molecular conformations
Simon Lemcke, Jörn H. Appeldorn, Michael Wand, Thomas Speck
The Journal of Chemical Physics 159 (11), 114105 (2023)

Adversarial reverse mapping of condensed-phase molecular structures: Chemical transferability
Marc Stieffenhofer, Tristan Bereau, Michael Wand
APL Materials 9 (3), 031107 (2021)

Adversarial reverse mapping of equilibrated condensed-phase molecular structures
Marc Stieffenhofer, Michael Wand, Tristan Bereau
Machine Learning: Science and Technology 1, 045014 (2020)

Kernel-Based Machine Learning for Efficient Simulations of Molecular Liquids
Christoph Scherer, René Scheid, Denis Andrienko, Tristan Bereau
Journal of Chemical Theory and Computation 16 (5), 3194-3204 (2020)