# Deep-prior ODEs augment fluorescence imaging with chemical sensors

### Physical model

We set out from the chemical reaction (1), which models the binding process. We assume that the total number of sensors *s*_{tot}(**x**) = *s*(**x**, *t*) + *s*_{b}(**x**, *t*) is constant over time and that the spatial diffusion of the sensor is negligible. Both assumptions are common for fluorescent sensors^{31,67,68}. The temporal evolution of *s*_{b} is then given by

$$\frac{\mathrm{d}s_b(\mathbf{x},t)}{\mathrm{d}t}=-k_b s_b(\mathbf{x},t)+k_f s(\mathbf{x},t)\,c(\mathbf{x},t)^{n_{\mathrm{H}}}.$$

(4)

Substituting *s* = *s*_{tot} − *s*_{b} into (4) and dividing by *s*_{tot} leads to

$$\frac{\mathrm{d}\tilde{s}_b(\mathbf{x},t)}{\mathrm{d}t}=-k_b\tilde{s}_b(\mathbf{x},t)+k_f\left(1-\tilde{s}_b(\mathbf{x},t)\right)c(\mathbf{x},t)^{n_{\mathrm{H}}},$$

(5)

where \(\tilde{s}_b(\mathbf{x},t)=\frac{s_b(\mathbf{x},t)}{s_{\mathrm{tot}}(\mathbf{x})}\) denotes the proportion of bound sensors.

We use *g*(**x**, *t*) to denote the fluorescence received by the imaging setup, which is emitted by the sensors. We model it as

$$g(\mathbf{x},t)=g_0(\mathbf{x})+q_e(\mathbf{x})\,\tilde{s}_b(\mathbf{x},t),$$

(6)

where *g*_{0}(**x**) and *q*_{e}(**x**) are the fluorescent background and a concentration-to-fluorescence factor, respectively^{31}. These variables indirectly account for several unknown factors such as the quantum yield or the small emissions of the unbound sensor.

Together, Eqs. (5) and (6) relate the CSI concentration to the fluorescence measurements. Our physical model is then

$$\begin{array}{c}\mathcal{H}(\mathbf{x},t;c,g_0,q_e)=g(\mathbf{x},t;\tilde{s}_b,g_0,q_e),\\ \text{where}\ \ \tilde{s}_b(\mathbf{x},t;c)\ \ \text{satisfies}\ \ (5).\end{array}$$

(7)

Equation (7) can be understood as the following algorithm. (i) Solve (5) for the proportion \(\tilde{s}_b(\mathbf{x},t;c)=\tilde{s}_b(\mathbf{x},t)\) of bound sensors starting from a given concentration *c*. (ii) Compute the fluorescence \(g(\mathbf{x},t;\tilde{s}_b,g_0,q_e)=g(\mathbf{x},t)\) using (6) given the solution \(\tilde{s}_b\) as well as *g*_{0}(**x**) and *q*_{e}(**x**). The resulting \(\mathcal{H}(\mathbf{x},t;c,g_0,q_e)\) is the spatiotemporal fluorescence predicted by the model given *c*, *g*_{0}, and *q*_{e}.
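The two-step algorithm above can be sketched numerically at a single pixel. The snippet below is a minimal illustration, not the discretization used by DUSK: it integrates (5) with a naive explicit Euler scheme for a given concentration trace and then applies (6). The function names and parameter values are hypothetical.

```python
import numpy as np

def solve_bound_fraction(c, dt, k_f, k_b, n_H, s0=0.0):
    """Step (i): integrate ODE (5) for the bound-sensor fraction along one
    pixel's concentration trace c[t] (explicit Euler, small dt assumed)."""
    s = np.empty_like(c)
    s_prev = s0
    for i, ci in enumerate(c):
        ds = -k_b * s_prev + k_f * (1.0 - s_prev) * ci**n_H
        s_prev = s_prev + dt * ds
        s[i] = s_prev
    return s

def fluorescence(s_b, g0, qe):
    """Step (ii): fluorescence model (6), background plus scaled bound fraction."""
    return g0 + qe * s_b
```

For a constant concentration, the bound fraction converges to the equilibrium \(k_f c^{n_\mathrm{H}}/(k_f c^{n_\mathrm{H}}+k_b)\), which provides a simple sanity check of the integrator.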

### Deep spatiotemporal prior

In the main text, we formulated our inverse problem as

$$\left(c^\star ,g_0^\star ,q_e^\star \right)\in \arg \min_{c,g_0,q_e}\ \mathcal{D}\left(\mathcal{H}(c,g_0,q_e),\,g_m\right)+\mathcal{R}(c,g_0,q_e).$$

(8)

Our alternative regularization strategy consists of reparameterizing the concentration as the output of a neural network *c*(**x**, *t*) = *f*_{θ}(**x**, **z**(*t*)) with parameters **θ** and a time-dependent latent vector **z**(*t*). The minimization problem (8) then becomes

$$\begin{array}{rl}\left(\boldsymbol{\theta}^\star ,\mathbf{z}^\star ,g_0^\star ,q_e^\star \right)\in &\arg \min_{\boldsymbol{\theta},\mathbf{z},g_0,q_e}\ \mathcal{D}\left(\mathcal{H}\left(f_{\boldsymbol{\theta}}(\mathbf{x},\mathbf{z}(t)),g_0,q_e\right),g_m\right)+\mathcal{R}_p(g_0,q_e),\\ c^\star (\mathbf{x},t)=&f_{\boldsymbol{\theta}^\star}(\mathbf{x},\mathbf{z}^\star (t)).\end{array}$$

(9)

Note that we apply the deep prior to the concentration alone because the background and the scaling do not have a temporal component; we only regularize these two over space with \(\mathcal{R}_p\).

The benefits of this method are twofold. First, a spatial prior is enforced by an untrained deep-image prior^{69,70}. Second, we enforce temporal regularity on the sequence by restricting the latent variables to a manifold. A single neural network generates the whole sequence, while the design of the latent vectors encodes the temporal proximity of consecutive frames.

We remark that the resulting CSI concentration is in arbitrary units. This is due to a lack of information such as the conversion efficiency or the quantum yield. If at any point in space or time the concentration can be measured, this information can be readily incorporated into DUSK for automatic calibration. Alternatively, experimental procedures of calibration may be used to recover the exact concentration afterwards^{30,50,51,52,53}.

#### Parametric latent space

We propose a parameterization of deep-image priors that adapts to the fast and slow dynamics that appear in biological signaling. To this end, we represent our latent space with a flexible parametric curve

$$\mathbf{z}(t)=\sum _{k=0}^{K-1}\mathbf{b}_k\,\varphi \left(\frac{t}{\Delta t}-k\right).$$

(10)

Here, the latent vector is parameterized by *K* shifted basis functions *φ*( ⋅ ) with coefficients \(\mathbf{b}_k\in \mathbb{R}^L\)^{71}. We chose a basis of cubic B-splines with step size Δ*t*. The coefficients **b**_{k} are optimized directly in place of **z** when solving (9). The number of basis functions (i.e., knots) determines the flexibility of the curve and, consequently, how much temporal regularity is enforced. This intuitive tradeoff is similar to that of more traditional regularization methods such as total variation^{72}.
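For concreteness, (10) can be evaluated as follows. This is a sketch with our own function names, assuming the standard cardinal cubic B-spline as the basis *φ*:

```python
import numpy as np

def bspline3(x):
    """Cardinal cubic B-spline, supported on [-2, 2]."""
    ax = np.abs(x)
    return np.where(ax < 1, 2/3 - ax**2 + ax**3 / 2,
           np.where(ax < 2, (2 - ax)**3 / 6, 0.0))

def latent_curve(t, b, dt):
    """z(t) per (10): sum_k b_k * phi(t/dt - k); b has shape (K, L)."""
    K = b.shape[0]
    k = np.arange(K)
    w = bspline3(t / dt - k)   # (K,) basis weights at time t
    return w @ b               # (L,) latent vector
```

Because the cubic B-spline satisfies a partition of unity, setting all **b**_{k} equal yields a constant latent curve in the interior of the support, which is a convenient check.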

### Discretization of the physical model

To discretize (5) and (6), we sample \(\tilde{s}_b\), *c*, and *g* on an equispaced spatial grid with *N*_{x} × *N*_{y} points and at *N*_{t} = *T*/Δ*t* time points with time step Δ*t*. Here, we assume that all the sampled functions are compactly supported in the domain \(\Omega \times \left[0,T\right)\) with \(\Omega \subset \mathbb{R}^2\). The samples of \(\tilde{s}_b\), *g*, and *c* are concatenated into the matrices \(\mathbf{S},\mathbf{G},\mathbf{C}\in \mathbb{R}^{N\times N_t}\) with *N* = *N*_{x}*N*_{y}, respectively. Similarly, the samples of *g*_{0} and *q*_{e} are concatenated into the vectors \(\mathbf{g}_0,\mathbf{q}_e\in \mathbb{R}_{\ge 0}^{N}\). To solve (5), we use a backward Euler scheme. We then obtain the discrete forward model \(\mathbf{H}:\mathbb{R}^{N\times N_t}\to \mathbb{R}^{N\times N_t}\) defined as

$$\mathbf{H}_{\mathbf{p}}(\mathbf{C}^{\odot n_{\mathrm{H}}})=\mathbf{g}_0\mathbf{1}_{N_t}^T+\mathrm{diag}(\mathbf{q}_e)\,\mathbf{S},$$

(11)

where \(\mathbf{p}=[\mathbf{g}_0^T,\mathbf{q}_e^T]^T\in \mathbb{R}_{\ge 0}^{2N}\), the symbol \((\cdot )^{\odot n_{\mathrm{H}}}\) denotes the Hadamard power (i.e., each matrix element is raised to the power of *n*_{H}), and the entries of **S** are computed recursively as

$$s_{n,n_t}=\frac{s_{n,n_t-1}+\Delta t\,k_f(c_{n,n_t})^{n_{\mathrm{H}}}}{1+\Delta t\left(k_f(c_{n,n_t})^{n_{\mathrm{H}}}+k_b\right)},$$

(12)

for *n* = 1, …, *N* and *n*_{t} = 1, …, *N*_{t}.
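A direct translation of (11) and (12) into code might look as follows. This is a sketch under the assumption of a zero initial bound fraction; the function names are ours:

```python
import numpy as np

def bound_fraction_backward_euler(C_pow, dt, k_f, k_b):
    """Recursion (12): backward-Euler update of S given C^{Hadamard n_H}.
    Assumes s_{n,0} = 0 (no sensor bound initially)."""
    N, Nt = C_pow.shape
    S = np.zeros((N, Nt))
    s_prev = np.zeros(N)
    for nt in range(Nt):
        rate = k_f * C_pow[:, nt]
        s_prev = (s_prev + dt * rate) / (1.0 + dt * (rate + k_b))
        S[:, nt] = s_prev
    return S

def forward_model(C, p, dt, k_f, k_b, n_H):
    """Discrete model (11): H_p(C^{Hadamard n_H}) = g0 1^T + diag(qe) S,
    with p the concatenation [g0; qe]."""
    N = C.shape[0]
    g0, qe = p[:N], p[N:]
    S = bound_fraction_backward_euler(C**n_H, dt, k_f, k_b)
    return g0[:, None] + qe[:, None] * S
```

Unlike an explicit scheme, the backward-Euler recursion (12) is unconditionally stable, so Δ*t* can be chosen for accuracy rather than stability.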

### Problem formulation in the discrete domain

Now that we are equipped with a discretized physical model, we present our variational framework to recover the concentration distribution from measured fluorescence images. In practice, we may acquire the images at a time step Δ*t*_{M} = *D*Δ*t* (\(D\in \mathbb{N}\)) larger than the one used in (11). To recover the concentration, we aim to solve the minimization problem

$$\begin{array}{rl}(\mathbf{C}^\star ,\mathbf{p}^\star )\in \arg \min _{\mathbf{C}\in \mathbb{R}^{N\times N_t},\,\mathbf{p}\in \mathbb{R}_{\ge 0}^{2N}}&\left\Vert \mathbf{H}_{\mathbf{p}}(\mathbf{C}^{\odot n_{\mathrm{H}}})\mathbf{M}_D-\mathbf{G}\right\Vert _1\\ &+\tau _p\mathcal{R}_{\mathbf{p}}(\mathbf{p})+\mathcal{R}_{\mathbf{C}}(\mathbf{C}),\end{array}$$

(13)

where \(\mathbf{M}_D\in \mathbb{R}^{N_t\times (N_t/D)}\) encodes the downsampling operation and the matrix \(\mathbf{G}\in \mathbb{R}^{N\times (N_t/D)}\) denotes the measurements. The *ℓ*_{1}-norm is the data-fidelity term, which we found robust to the noise present in the measurements. The regularization terms \(\mathcal{R}_{\mathbf{p}}:\mathbb{R}^{2N}\to \mathbb{R}_{\ge 0}\) and \(\mathcal{R}_{\mathbf{C}}:\mathbb{R}^{N\times N_t}\to \mathbb{R}_{\ge 0}\) enforce prior knowledge about the parameters **p** and the concentration, respectively. In this work, \(\mathcal{R}_{\mathbf{p}}\) only regularizes **q**_{e} with the total variation^{72} and a small tradeoff parameter *τ*_{p} > 0.

### Deep spatiotemporal priors in the discrete domain

To mitigate the ill-posedness of (13), we aim to enforce some regularity along space and time on the concentration **C**. To that end, we reconstruct the concentration using a deep spatiotemporal prior^{40,41,42}. In this framework, the concentration at time \(t\in \left[0,T\right)\) is the output **c**(**θ**, *t*) = **f**_{θ}(**z**(*t*)) of a convolutional neural network \(\mathbf{f}_{\boldsymbol{\theta}}:\mathbb{R}^L\to \mathbb{R}^N\) parameterized by \(\boldsymbol{\theta}\in \mathbb{R}^P\). By design, the concentration is sampled on an equispaced spatial grid with *N* points, but the time can be sampled arbitrarily.

To recover the concentration, we then optimize the minimization problem

$$\begin{array}{rl}(\boldsymbol{\theta}^\star ,\mathbf{p}^\star )\in \arg \min _{\boldsymbol{\theta}\in \mathbb{R}^P,\,\mathbf{p}\in \mathbb{R}_{\ge 0}^{2N}}&\left\Vert \mathbf{H}_{\mathbf{p}}\left(\mathbf{C}(\boldsymbol{\theta},\Delta t)^{\odot n_{\mathrm{H}}}\right)\mathbf{M}_D-\mathbf{G}\right\Vert _1\\ &+\tau _p\mathcal{R}_{\mathbf{p}}(\mathbf{p})\end{array}$$

(14)

with

$$\mathbf{C}(\boldsymbol{\theta},\Delta t)=[\mathbf{c}(\boldsymbol{\theta},0),\mathbf{c}(\boldsymbol{\theta},\Delta t),\ldots ,\mathbf{c}(\boldsymbol{\theta},\Delta t(N_t-1))].$$

(15)

We represent our latent space with a parametric curve (see (10)) and optimize the coefficients **b**_{k} along with the parameters **θ**. With a slight abuse of notation, **θ** will now encompass the coefficients \(\{\mathbf{b}_k\in \mathbb{R}^L\}_{k=0}^{K-1}\) as well.

### Architecture and optimization

The architecture of **f**_{θ} is detailed in Table 1. The network layers are applied sequentially from the top to the bottom of the table, starting with an input of shape (1 × 3). It is noteworthy that the sequence of concentration images is represented with an underparameterized neural network. For instance, if (*N*_{x} × *N*_{y} × *N*_{t}) = (96 × 68 × 244) and *K* = 30, *L* = 3, there are 149,925 + 90 parameters to optimize, which amounts to about 10% of the total number of recovered pixels. We do not specify validation or training sets because our framework is training-free, i.e., the network remains untrained.
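A quick arithmetic check of the underparameterization claim, assuming the parameter counts quoted above:

```python
# Pixels recovered vs. parameters optimized for the example in the text.
Nx, Ny, Nt = 96, 68, 244
K, L = 30, 3
n_net = 149_925           # network weights (Table 1)
n_latent = K * L          # spline coefficients b_k -> 90
n_pixels = Nx * Ny * Nt
ratio = (n_net + n_latent) / n_pixels   # roughly 0.094, i.e., about 10%
```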

We optimize all the variables in (14) using the AMSGrad algorithm^{73} with a constant learning rate of 0.01, a tolerance *ϵ*_{tol} = 10^{−12}, and a maximum of 10^{4} iterations. We set the tradeoff parameter *τ*_{p} = 10^{−5}. The number of knots *K* is optimized by grid search, either by maximizing a metric for simulated data or by visual inspection for real data. We enforce \((\mathbf{g}_0)_k > 0\) and \((\mathbf{q}_e)_k > 10^{-6}\) by projecting violating values to the respective bound after each gradient update.

For *k* = 1, …, *N*, we initialize \((\mathbf{q}_e)_k=Q\) and \((\mathbf{g}_0)_k=\min _{l}\,(\mathbf{G})_{k,l}\), i.e., with the minimal value reached during the measurements for each pixel. In simulated data, *Q* was chosen among the values [1, 10, 100] to maximize the regressed signal-to-noise ratio (RSNR, see (18)). In real data, we set *Q* = 1 for all sensors but jGCaMP7f. We observed that the DUSK results on jGCaMP7f did not fit the measurements well enough; this is due to the inability of the neural network to reach the very large concentration values induced by the high Hill coefficient of jGCaMP7f (*n*_{H} = 3.1).
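The initialization described above, together with the projection onto the bound constraints mentioned earlier, can be sketched as follows. The helper names are ours, and **p** is stored as the concatenation of **g**_{0} and **q**_{e}:

```python
import numpy as np

def init_params(G, Q=1.0):
    """Initialize qe to Q and g0 to each pixel's minimum measured value."""
    g0 = G.min(axis=1)                 # per-pixel minimum over time
    qe = np.full(G.shape[0], float(Q))
    return np.concatenate([g0, qe])

def project_params(p, N, eps_qe=1e-6):
    """Project onto the constraints g0 >= 0 and qe >= 1e-6 after a gradient step."""
    p = p.copy()
    p[:N] = np.maximum(p[:N], 0.0)
    p[N:] = np.maximum(p[N:], eps_qe)
    return p
```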

For numerical stability, we chose to include the effect of the Hill coefficient in **f**_{θ} such that \(\mathbf{c}(\boldsymbol{\theta},t)^{n_{\mathrm{H}}}=\mathbf{f}_{\boldsymbol{\theta}}(\mathbf{z}(t))\). Unless specified otherwise, the estimates displayed are the *n*_{H}th root of the output of **f**_{θ}.

In Table 2, we present the kinetic coefficients, which were measured experimentally for the sensors in the datasets from ref. ^{21} that we use. These are the only hyperparameters to set in our framework other than the number of knots for the latent spline.

We set the time window \(\left[0,T\right)\) to avoid cutting the signal of interest (i.e., a burst of APs). In addition, memory cost can limit the duration of the sequence; this was not detrimental to the quality of reconstruction because past values have little impact on future values after some time. Note that the principle of DUSK is not bound to a convolutional architecture with a fixed spatial discretization: alternatives with continuous spatial representations exist^{74}. Similarly, we adopt the backward Euler scheme for temporal discretization for its efficiency and simplicity; other adaptive schemes are possible too.

All experiments were run on a Linux workstation with an Intel Xeon Gold 6226R CPU (2.90 GHz), 4 × 16 GB of RAM, and an NVIDIA RTX A6000 GPU (48 GB).

### Temporal regularity

In this work, the basis functions \(\{\varphi (\cdot -t_k)\}_{k=0}^{K-1}\) in (10) are cubic B-splines. In Supplementary Note 10, we show the effect of our temporal regularization on the recovered concentration distribution. By decreasing the number of knots *K*, the temporal traces over the ROI show fewer fluctuations in both the predicted fluorescence and concentration traces. This behavior parallels the effect of tradeoff parameters in classical spatial regularization.

### Simulation pipeline

Using a space colonization method^{75}, we designed an astrocyte-like sample fully contained in the image domain \(\Omega \subset \mathbb{R}^2\) (see Fig. 2B, bottom). We used a 2D reaction-diffusion equation to simulate the propagation of the CSI, where the diffusion coefficient in the branches (*Ω*_{branches} ⊂ *Ω*) is higher than in the background. The concentration thus propagates rapidly along the branches and more slowly elsewhere. Here, the signal contains two distinct traveling waves with little dispersion: one wave in the branches with higher velocity and amplitude, the other in the background with lower velocity and amplitude. We thus solve the partial differential equation

$$\begin{array}{rl}\frac{\partial c(\mathbf{x},t)}{\partial t}=&\boldsymbol{\nabla}_{\mathbf{x}}\cdot \left(D(\mathbf{x})\boldsymbol{\nabla}_{\mathbf{x}}c(\mathbf{x},t)\right)+k_r\,p(\mathbf{x},t)\,c(\mathbf{x},t)\\ &-k_d(\mathbf{x})\,c(\mathbf{x},t)+s(\mathbf{x},t),\\ \frac{\partial p(\mathbf{x},t)}{\partial t}=&-k_r\,p(\mathbf{x},t)\,c(\mathbf{x},t),\end{array}$$

(16)

where \(\boldsymbol{\nabla}_{\mathbf{x}}\) and \(\boldsymbol{\nabla}_{\mathbf{x}}\cdot\) are the spatial gradient and spatial divergence, respectively. The function \(D:\Omega \to \mathbb{R}_{\ge 0}\) is the spatially varying diffusion coefficient. The quantity \(p:\Omega \times \mathbb{R}_{\ge 0}\to \mathbb{R}_{\ge 0}\) is a precursor in the autocatalytic reaction \(\mathrm{p}+\mathrm{c}\mathop{\to }\limits^{k_r}2\,\mathrm{c}\) with rate *k*_{r}. The chemical species is degraded at a rate \(k_d:\Omega \to \mathbb{R}_{\ge 0}\). The spatially varying decay rate allows us to obtain different traveling waves in the branches and in the background. The source term *s*(**x**, *t*) models a stress signal that is compactly supported in both space and time. More precisely, *s* is nonzero, with value *s*_{0} = 1, only in the first two frames and over a localized area at the center of the image (i.e., in the soma of the astrocyte). We solve (16) with a backward Euler scheme and a finite element solver^{76} for the temporal and spatial discretization, respectively. To ensure numerical stability, we simulate with the time step Δ*t*_{Euler} = 0.5Δ*t* and downsample the computed concentration by two to get **C**_{GT}.
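A much-simplified stand-in for this simulation could look like the sketch below. It uses explicit finite differences instead of the backward Euler and finite element solvers used in the paper, treats *D* as locally constant in the diffusion term, and all names and values are hypothetical:

```python
import numpy as np

def simulate_wave(D, k_d, k_r=1.0, steps=200, dt=0.01, dx=1.0, src=None):
    """Explicit finite-difference sketch of the reaction-diffusion system (16)
    on a square grid with zero (Dirichlet) boundary conditions."""
    ny, nx = D.shape
    c = np.zeros((ny, nx))   # concentration, c(x, 0) = 0
    p = np.ones((ny, nx))    # precursor, p(x, 0) = 1
    for step in range(steps):
        # 5-point Laplacian on the interior (D treated as locally constant)
        lap = np.zeros_like(c)
        lap[1:-1, 1:-1] = (c[2:, 1:-1] + c[:-2, 1:-1] + c[1:-1, 2:]
                           + c[1:-1, :-2] - 4 * c[1:-1, 1:-1]) / dx**2
        # source active only in the first two frames, as in the text
        s = src if (src is not None and step < 2) else 0.0
        c = c + dt * (D * lap + k_r * p * c - k_d * c + s)
        p = p + dt * (-k_r * p * c)
        c[[0, -1], :] = 0.0
        c[:, [0, -1]] = 0.0  # enforce c = 0 on the boundary
    return c, p
```

With a small enough time step, the concentration stays nonnegative and the precursor is monotonically consumed where the wave passes.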

The fluorescence images are then simulated using (11) with constant parameters \((\mathbf{g}_0)_k=0.25\) and \((\mathbf{q}_e)_k=10\) for *k* ∈ [1, …, *N*]. The sensor kinetics *k*_{f}, *k*_{b} correspond to those of jGCaMP8s (Table 2), but the Hill coefficient is set to *n*_{H} = 1. The recorded measurement images \(\mathbf{G}\in \mathbb{R}^{N\times (N_t/D)}\) are then generated according to

$$\mathbf{G}=\mathbf{H}_{\mathbf{p}}\left(\mathbf{C}_{\mathrm{GT}}^{\odot n_{\mathrm{H}}}\right)+\mathbf{N},$$

(17)

where each (*k*, *l*)th entry of \(\mathbf{N}\in \mathbb{R}^{N\times (N_t/D)}\) is a realization of a signal-dependent Gaussian random variable \(\mathcal{N}(0,\sigma _{k,l}^2)\). We emulate Poisson noise by setting \(\sigma _{k,l}^2=\left(\mathbf{H}_{\mathbf{p}}\left(\mathbf{C}_{\mathrm{GT}}^{\odot n_{\mathrm{H}}}\right)\right)_{k,l}/B\) with photon budget *B* = 25.
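The signal-dependent noise of (17) can be emulated as follows; the function name is ours:

```python
import numpy as np

def add_shot_noise(F, B=25, rng=None):
    """Signal-dependent Gaussian noise emulating Poisson statistics:
    each entry of F gets variance F_{k,l} / B for a photon budget B."""
    rng = np.random.default_rng(rng)
    return F + rng.normal(0.0, np.sqrt(F / B))
```

For a constant fluorescence level *F*, the empirical standard deviation of the noisy images should approach \(\sqrt{F/B}\).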

The size of the astrocyte-like sample is 40 × 40 μm^{2}, discretized with (128 × 128) pixels (corresponding to a pixel length of ~0.3 μm). We acquired 128 frames at an acquisition rate of 200 Hz (frame period of 5 ms). We set the diffusion coefficient to *D*(**x**) = 3.91 × 10^{−4} μm^{2}/s for **x** ∈ *Ω*_{branches}, and to *D*(**x**) = 9.77 × 10^{−7} μm^{2}/s otherwise. The reaction rate is set to *k*_{r} = 1, and the decay rate is set to *k*_{d}(**x**) = 0.03 for **x** ∈ *Ω*_{branches}, and to *k*_{d}(**x**) = 0.3 otherwise. We set the initial conditions to *c*(**x**, 0) = 0 and *p*(**x**, 0) = 1 for **x** ∈ *Ω*_{branches}, and to *p*(**x**, 0) = 0.75 otherwise. We choose Dirichlet boundary conditions (*c* = 0 on ∂*Ω*). The speed of the traveling wave then reaches about 31.25 μm/s, which is consistent with values measured in astrocytes^{77,78}.

### Quantitative evaluation of the performance

To measure the performance of reconstruction, we use the regressed signal-to-noise ratio (RSNR) over the whole image sequence,

$$\mathrm{RSNR}(\mathbf{C},\hat{\mathbf{C}})=\max _{a,b\in \mathbb{R}}\,20\log _{10}\left(\frac{\Vert \mathbf{C}\Vert _F}{\Vert a\hat{\mathbf{C}}+b\mathbf{1}_{N,N_t}-\mathbf{C}\Vert _F}\right),$$

(18)

where **C** and \(\hat{\mathbf{C}}\) are the ground-truth and reconstructed concentrations, respectively. The RSNR adapts the standard signal-to-noise ratio—which compares the magnitude of the reconstruction error to the magnitude of the ground truth—to account for possible shifts and scalings of the signal. It is measured on a logarithmic scale; larger values indicate better performance.
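A possible implementation of (18) solves the inner maximization over (*a*, *b*) as a linear least-squares fit; the function name is ours:

```python
import numpy as np

def rsnr(C, C_hat):
    """Regressed SNR (18): best affine fit a * C_hat + b to C, then SNR in dB."""
    x, y = C_hat.ravel(), C.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    a, b = coef
    err = a * x + b - y
    return 20 * np.log10(np.linalg.norm(y) / np.linalg.norm(err))
```

By construction, the metric is invariant to any affine rescaling of the reconstruction, which matches the arbitrary-units ambiguity of the recovered concentration.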

To quantify the detectability of a peak (i.e., fluorescence or calcium rise) within a spatial ROI, we compute a sensitivity index defined as^{79}

$$d^{\prime} (\mu _{\mathrm{s}},\mu _{\mathrm{bg}},\sigma _{\mathrm{s}},\sigma _{\mathrm{bg}})=\frac{| \mu _{\mathrm{s}}-\mu _{\mathrm{bg}}| }{\sqrt{\frac{\sigma _{\mathrm{s}}^2+\sigma _{\mathrm{bg}}^2}{2}}},$$

(19)

where *μ*_{s} and *μ*_{bg} (resp. *σ*_{s} and *σ*_{bg}) denote the median values (resp. median absolute deviations) at the peak occurrence and in the background (e.g., before the rise).
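A direct implementation of (19); the helper name is ours:

```python
import numpy as np

def sensitivity_index(signal, background):
    """d' per (19): medians and median absolute deviations (MAD)
    of the peak window and the background window."""
    mad = lambda v: np.median(np.abs(v - np.median(v)))
    mu_s, mu_bg = np.median(signal), np.median(background)
    s_s, s_bg = mad(signal), mad(background)
    return abs(mu_s - mu_bg) / np.sqrt((s_s**2 + s_bg**2) / 2)
```

Using medians and MADs instead of means and standard deviations makes the index robust to the occasional outliers in noisy fluorescence traces.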

### Remarks about the physical model

Although originally intended to be an integer, the Hill coefficient *n*_{H} is now generally understood as an empirical indicator of cooperative binding. In this interpretation, the coefficient often takes non-integer values, which can be indicative of additional (hidden) sequential reactions^{34} or dependent binding sites^{31}. The coefficients *k*_{f}, *k*_{b}, and *n*_{H} in (1)—on which Hill’s equation is based—are the ones that are usually measured experimentally when designing (or using) new sensors. For example, the values in Table 2 were measured for the same GCaMP8 dataset that we use in this article.

All this suggests that the behavior of some chemical sensors can be approximated by the phenomenological reaction model (1) with the three experimental parameters. We tested the quality of our reconstruction using mixing and unmixing experiments, where the behavior of the underlying concentration is “known” (see Supplementary Note 3). In addition, we offer further comments on the suitability of model (1) in Supplementary Notes 7 and 8. When available, more complex chemical models can be plugged into the framework seamlessly by replacing (5) with another set of ODEs. This might yield better estimates but requires (i) working out the model and (ii) devising ways to measure what are often (too) many parameters (see Supplementary Note 7).

While the fluorescence of most sensors (e.g., GCaMP) increases upon binding to the corresponding CSI as per (6), a few sensors see their fluorescence decrease instead. More specifically, the light they emit is proportional to the concentration of the unbound sensor. Such sensors are readily compatible with our framework: one simply replaces \(\tilde{s}_b\) with *s* in (6). One example is the H_{2}O_{2} sensors used to study signaling in plants, which we explored in ref. ^{33}.

### Statistics and reproducibility

We applied our computational framework to experimental video data of calcium sensors collected for ref. ^{21} and available at ref. ^{80}. The videos that we analyzed were randomly selected from the 1.4 TB dataset for jGCaMP8s, jGCaMP8m, and jGCaMP7f. No other data were excluded. Sample sizes are specified in the manuscript for each experiment. We did not use blinding or randomization because there were no experimental groups.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
