Digital Twin for Chemical Science: a case study on water interactions on the Ag(111) surface
From Theory Twin to Digital Twin
The primary systems of interest in DTCS v.01 involve reactive dynamics in homogeneous systems as well as on surfaces that are spectroscopically measurable. Using ambient-pressure X-ray photoelectron spectroscopy (APXPS) as an example, given only the experimental conditions—such as temperature, pressure and the Miller index of the sample surface—constructing a spectrum ab initio, or a Theory Twin, remains a nontrivial challenge. Therefore, specialized electronic structure theory has been developed to capture the orbital relaxation effects upon the ejection of an excited core hole22,23,24,25 in the final state. As the first step in constructing the APXPS Theory Twin, a ΔSCF approach is used to calculate the difference between the final state and the initial state (ground state) for the core electron binding energy (CEBE) of a given chemical species. Here, ‘chemical species’ refers to surface adsorbates that can be spectroscopically resolved26. Obtaining CEBEs from electronic structure theory is frequently the rate-limiting step. To achieve an on-the-fly feedback loop, precomputed CEBEs are assigned to an enumerated list of potentially observable chemical species. After this first step, we arrive at a stick plot that matches the individual chemical species to its specific location on the spectra (Fig. 2a). As the second step in constructing the spectra ab initio, we aim to predict the ratio of the species concentration reflected as the peak height and predict how the ratio changes with respect to time (Fig. 2b). We explicitly propose a CRN connecting the chemical species. This step requires an expert’s judgment and a careful balancing of CRN completeness and representativeness, as recognized and discussed thoroughly in the literature26,27,28,29. The transition state barriers connecting the reactants, intermediates and products are exponentially expanded according to the Arrhenius equation and converted into a rate constant, k, for the individual reactions in the CRN. Temperature dependence of k follows the roto-translational corrections for surface adsobates30,31. Pressure dependence of k is reflected by the prefactor scaling of collision frequency as well as the pressure-dependent entropy term30,31. Lastly, experimentation specializations are modularly integrated to reflect the diversity of measuring instrumentation (Fig. 2c). For example, a full width at half maximum of 1.3 eV and Gaussian broadening were used to mirror a prototypical APXPS measurement (such as Berkeley Lab’s Advanced Light Source beamline BL 9.3.2), which serves as the ‘Physical Twin’ in this study. Although not applicable to the current work, a detailed discussion on depth profile intensity as a function of photon energy is available in our previous publication32, which is more relevant to a solid–liquid interface than to the present solid–gas interface. The concentration-scaled Gaussian peaks are summed to produce the final envelope signal, which can be directly compared with the Physical Twin. Similar computational processes for predicting experimental observables (turnover frequency33, Tafel slope34 and so on) have been previously referred to as in silico, ab initio lab.

This Theory Twin is a generalizable computational chemistry workflow that connects the parameters to be explored in the experiments (for example, temperature, pressure and sample surface’s Miller index) to a simulated spectrum. The first step involves computing the CEBE using electronic structure theory methods. The concentrations as a function of time obtained by solving the CRN are used to compute the peak intensity of the spectra. Finally, the experimental instrumentations are taken into account to assign shapes to the individual peaks and generate the final simulated spectra. Four hypothetical example chemical species i, j, l and m, are represented with purple, blue, black and red colors, respectively. In the left box, the equation computes the CEBE by subtracting the ground-state energy from the excited-state energy. In the middle box, k represents the rate constants, where the subscripts represent a specific rate constant of conversion. For example, kij represents the rate constant converting chemical species i to j. In the middle and right boxes, t1, t2 and t3 represent three different time steps where simulated spectra are constructed as well as measured in experiments, respectively. © The Regents of the University of California, Lawrence Berkeley National Laboratory.
However, it is important to note that, with the highest level of fidelity of an in silico, ab initio lab (for example, with precision enhancement from hybrid functionals35,36, roto-translational corrections for surface adsorbates31, and a superiorly trained chemical intuition), there is no guarantee that one can always predict the experimental observables correctly using the Theory Twin alone. A challenging problem to address is to inversely solve for the CRN, the Gibbs free energy and its associated rate constants. Therefore, the key difference between in silico, ab initio lab and the present Digital Twin approach is the bidirectional interaction between the Theory Twin and the Physical Twin. On-the-fly analysis and updates are in demand, particularly when Scientific User Facilities, such as synchrotron and X-ray free electron laser beamtime, are scarce and tend to cluster in a matter of a couple of hours to days. In the Digital Twin approach, as operationalized by the DTCS v0.1 software platform (Methods), we incorporate the AI acceleration schemes with which we constructed the inverse problem solver powered by our tailored Gaussian process and basin hopping algorithms to find the match between experimental results and theoretical simulations, ensuring a nondegenerate solution.
Application example
A ubiquitous scenario in corrosion, electrocatalysis or battery systems concerns the interaction between water molecules and a metal surface17,30. Being able to address the atomic-scale dynamics upon water initiation on a metal surface can provide insights into the downstream chemical reactions. The Ag/H2O system30 is an optimal example demonstrating the results and capabilities of DTCS. Rate constants were previously computed by density functional theory, and the CRN was experimentally validated, thereby serving as a resource for benchmarking. Leveraging this system as an example, we provide a walkthrough of the forward problem calling out the software modules designed in DTCS v.01 (Methods).
Using module dtcs.spec, we first define a set of oxygen-containing chemical species involved in this system, namely, gaseous water (H2Og), adsorbed water (H2O*), adsorbed oxygen (O*), oxygen gas (O2g), hydroxide (OH*), hydrogen-bonded water (such as O–H2O* and OH–H2O*) and multilayer water (multiH2O* or OH–H2O–H2O*). Each of these chemical species has unique attributes. In the bulk CRN solver, these attributes are reflected by the binding energy location information, whereas in the surface CRN, both the binding energy location and site information can be specified. Following dtcs.spec, we define the translational rules connecting the above chemical species with precomputed rate constants. A complete CRN includes diffusion, adsorption, desorption and reaction steps that can be found in ‘Detailed explanation for the bulk CRN’ section in the Supplementary Information. The user shall ensure the mass balance in both the bulk and surface CRN solvers. In addition, site balance will be enforced when defining the translational rules in the surface CRN solver, so that we can explicitly track the available sites. The definitions of translational rules are followed by the boundary conditions. This can be an estimate of the initial concentration of one or multiple species. If undefined, the code assumes that the species concentration is zero when t = 0. In the present example, our sample is covered by a trace amount of surface oxygen (O*), and we initiate the system with an inlet of H2Og. Figure 3 demonstrates the standardized syntax for the bulk and surface CRNs, and this information can be encoded as an attribute for the system when scaling up DTCS as a central platform for chemical characterization. The writeup of dtcs.spec can serve as a standardized format for reporting mechanisms that can be cross referenced.

From the left to the right, a standardized syntax describing the mechanism as an attribute of a specific chemical system has been established as the input for dtcs.spec. The dynamic concentration profile solutions for the bulk versus surface CRN are summarized under dtcs.sim. Concentration is depicted in unit of mono layers (ML) for bulk CRN and sites for surface CRN. The comparison between the Theory Twin and Physical Twin is shown under dtcs.twin. Together, these three steps (dtcs.spec, dtcs.sim and dtcs.twin) complete the forward problem in DTCS. The magenta, pink and orange boxes represent dtcs.spec, dtcs.sim and dtcs.twin, respectively, using the same color scheme as Fig. 1 and Supplementary Fig. 1. In the left box, blue, green and red represent functions/modules, keywords/constants and labels, respectively. In the middle and right box, purple, blue, black, red and cyan represent the chemical species H2O_multi, H2O*, H2O_OH_hb, OH* combined and O* combined, respectively. Experimental (Exp.) spectra subfigure adapted with permission from ref. 30. © 2019 American Chemical Society.
Source data
With module dtcs.sim, we proceed to compare the results of the bulk CRN and surface CRN solvers. In principle, one should not expect dramatically different results from the two solvers because they are simulating the same experiment. However, the surface CRN results are deemed as a more realistic reflection of the interfacial conditions in the experiments than the bulk CRN results, which are better suited for simulating reactor and combustion experiments. Bulk CRN simulators can also serve as an approximation for an averaged-out surface system. The interfacial reaction system (such as heterogeneous catalysis) is, in principle, not well mixed as described in bulk CRN, where chemical transitions can happen only when two chemical species are at the adjacent sites with a quantum mechanically derived probability. Therefore, the much more sophisticated surface CRN algorithm (Supplementary Fig. 1 and ‘Detailed explanation for the surface CRN’ section in the Supplementary Information) is considered as a realistic reflection of this heterogeneous phenomena. We thus compare the results from the surface CRN with those obtained from the bulk CRN. As a result, shown in Fig. 3 (middle), the bulk CRN solver outputs smooth, definitive dynamic concentration profiles of the chemical species involved. Meanwhile, the surface CRN solver generates a stochastic dynamic concentration profile that can be used to estimate the magnitude of uncertainty due to the randomness in the Markov chain. We also observed that the surface CRN required more time than the bulk CRN to reach a similar level of equilibrium. This observation is to be expected, as surface species will spend more time diffusing around the given interfacial map via random walks before reactions occur.
Lastly, with module dtcs.twin, we complete the steps for the forward problem. We found that both the bulk CRN and the surface CRN solutions are sufficiently comparable to the experimental results, with the surface CRN showing a slightly better match. In both the bulk and the surface CRN results, OH* is the dominant species detected in the Theory Twin and Physical Twin, followed by H2O*, O–H2O* and OH–H2O*, and multiH2O* or OH–H2O–H2O*. We also found that the O* peak is not present in either the bulk or surface CRN spectra, which is consistent with the Physical Twin. This is because the O* + H2Og → OH* + OH* reaction is one of the fastest reactions in the CRN, and all initially present O* is rapidly consumed.
Inverse solution for on-the-fly analysis
Given an observed spectrum or multiple spectra, how can one obtain information regarding the kinetic terms of interest, such as the diffusion constants; adsorption, reaction and desorption rates; and the interaction strength between a chemical species and the surface? Furthermore, because multiple numerically acceptable fits exist for one spectrum, how many different spectral measurements must be conducted to gain unambiguous, physically correct insight into a given reaction system?
The dtcs.optim module facilitates the connection and feedback between the Theory Twin and the Physical Twin, making it a crucial component in constructing a Digital Twin. The dtcs.optim output functions as a real-time AI navigator, guiding the exploration of the phase space. To test the module’s efficacy, we provided DTCS with a series of well-controlled, correlated spectra, as shown in Fig. 4. For all tests, we allowed 1,000 iterations for both Gaussian process and basin hopping to inversely solve for the ground truth. We reported the timestep (in hours, minutes and seconds) for both algorithms in Fig. 4. The middle column shows the top result from both algorithms, while the right column reports the acceptable result with a relative error cutoff of 0.5 (a unitless term, see loss function definition in ‘Detailed explanation for basin hopping’ and ‘Detailed explanation for Gaussian process’ sections in the Supplementary Information) to assess the rate constant sensitivity. In all tests, the ground truth, shown in green, constitutes the quantum mechanics data, which were used to generate the target spectra in the leftmost column.

a–i shows testing results of dtcs.optim with 1, 2 and 6 randomly selected goal spectra shown in the first (a, d and g), second (b, e and h) and third (c, f and i) rows, respectively. The leftmost column (a–c) displays the goal spectra. The middle column (d–f) presents the top results and the runtime for 1,000 iterations using Gaussian process and basin hopping, which successfully resulted in an accuracy convergence between the basin hopping and quantum mechanics datasets. The rightmost column (g–i) illustrates the sensitivity of the basin hopping solution with an imposed error cutoff. The number of points, minima, maxima, center, bounds of box, whiskers of the quartile box plot used in g–i can be found in ‘Quartile box plot statistics’ section in the Supplementary Information.
Source data
In the first test (first row of Fig. 4), we provided DTCS with a randomly selected, equilibrated spectrum obtained at 0.1 torr and 298 K (Fig. 4a). Neither algorithm provided an accurate answer (Fig. 4d). This motivated us to supply dtcs.optim with additional data points, such as the isothermal spectra obtained at 0.03 torr and 0.1 torr shown in the second row (Fig. 4b) and, eventually, the six spectra shown in the third row (Fig. 4c). In all three tests (the three rows in Fig. 4), the CRN retains the same number of reactions as well as connectivity, whereas the pressure variance is built into a proportional change in collision frequency and entropic term obeying the ideal gas law30. The rate constants for reactions within the CRN pertaining to the desorption or adsorption of gas-phase species (such as H2Og → H2O*, H2O* → H2Og, O* + H2Og → OH* + OH* and so on) were expressed as a pressure-dependent corrected term, whereas the rate constants for reactions involving only surface species (such as OH-H2O* → OH* + H2O* and so on) were unchanged.
The differences between Gaussian process or basin hopping and quantum mechanics (Fig. 4d) indicate that experimentalists need to collect additional data to improve accuracy and reduce degeneracy. This on-the-fly feedback offers critical insights for navigating the Physical Twin. The second-row results (Fig. 4e) showed improved accuracy, especially when basin hopping was applied, while the third-row results (Fig. 4f) demonstrated precise agreement between the basin hopping and quantum mechanics datasets. Consequently, we proceeded with the basin hopping dataset in the third column (Fig. 4g–i). Despite the top value showing precise agreement, it is common for certain spectra to have multiple fits in experiments. Thus, understanding degeneracy is an added challenge that can be addressed in the DTCS inverse problem solver with the supply of additional experimental data. We observed a spreading of acceptable values reported for the CRN, indicating a potential degeneracy. Interestingly, this degeneracy can be reduced with additional cross-correlated spectra, leading to a uniquely accurate answer, as shown in the third-row results (Fig. 4i). Taken together, these results indicate a satisfactory stopping condition to conclude the experiment, and the resulting experimentally verified CRN embedded in dtcs.spec can serve as standardized mechanistic insights ready for publication, versioning and flexible dissemination.
link
