Reviewer #1 (Public review):
This work derives a general theory of optimal gain modulation in neural populations. It demonstrates that population homeostasis is a consequence of optimal modulation for information maximization with noisy neurons. The developed theory is then applied to the distributed distributional code (DDC) model of the primary visual cortex to demonstrate that homeostatic DDCs can account for stimulus-specific adaptation.
What I consider to be the most important contribution of this work is the unification of efficient information transmission in neural populations with population homeostasis. The former is an established theoretical framework, and the latter is a well-known empirical phenomenon - the relationship between them has never been fully clarified. I consider this work to be an interesting and relevant step in that direction.
The theory proposed in the paper is rigorous and the analysis is thorough. The manuscript begins with a general mathematical setting to identify normative solutions to the problem of information maximization. It then gradually builds towards approximate solutions, their neural implementation and biological plausibility, applications of the theory to a specific model of neural computation (the DDC), and finally comparisons to experimental data from V1. Connecting these different levels of abstraction is a clear strength of this work.
Overall I find this contribution interesting and assess it positively. At the same time, I have three major points of criticism, which I believe the authors should address. I list them below, followed by a number of more specific comments and feedback.
Major comments:
(1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz that is described as "independent of the computational goal" (e.g. p. 17). While information theory is indeed agnostic about what quantity is being encoded (e.g., whether the system is the sensory periphery or the hippocampus), the goal of the studied system is to *transmit* the largest number of bits about the input in the presence of noise. In my view, this does not make the proposed framework "independent of the computational goal". Furthermore, the derived theory is then applied to a DDC model, which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information-transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that the authors have good answers to these questions - but they should be explained much better.
(2) Clarity of writing for an interdisciplinary audience. I do not believe that, in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of the study.
(3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits several closely related results that, in my view, should be discussed in relation to the current work. In particular:
A number of recent studies propose normative criteria for gain modulation in populations:
- Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems.
- Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.
- Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology.
- Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications.
In addition, the Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:
- Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology.
- Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize LP reconstruction error. Neural Computation, 28(12).
More detailed comments and feedback:
(1) I believe this work offers an opportunity to address an important question about novelty responses in the cortex (e.g., Homann et al., 2021, PNAS): do they encode novelty per se, or are they inefficient responses of a not-yet-adapted population? This may be worth speculating about.
(2) Clustering in populations: in efficient coding studies, tuning curve distributions typically emerge as a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed tuning curves for each cluster; how should this be interpreted in light of efficient coding theory? This links to a more general aspect of the work: it does not specify how to find optimal tuning curves, only how to modulate them (a point already addressed in the discussion).
(3) Figure 8: where do the units of Hz come from? As I understand it, the simulations have no physical units.
(4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?
(5) Page 6 - "We did this in such a way that, for all ν, the correlation matrices, ρ(ν), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex." This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?
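To make the quoted assumption concrete for readers outside the subfield, here is a minimal sketch of my reading of that construction (NumPy, not the authors' code; the population size N and the random eigenbasis are arbitrary illustrative choices):

```python
# Minimal sketch (not the authors' code): build a correlation matrix whose
# parent covariance matrix has a 1/n power-law eigenspectrum, i.e. the
# k-th ranked eigenvalue is proportional to 1/k.
import numpy as np

rng = np.random.default_rng(0)
N = 100                                              # number of neurons (arbitrary)

eigvals = 1.0 / np.arange(1, N + 1)                  # ranked eigenvalues ~ 1/rank
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))     # random orthonormal eigenbasis
cov = Q @ np.diag(eigvals) @ Q.T                     # covariance with the desired spectrum

sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)                        # normalize to a correlation matrix

# Note: normalizing to unit variances changes the eigenvalues, so the
# correlation matrix itself need not follow an exact 1/n law.
print(np.linalg.eigvalsh(cov)[::-1][:5])             # check: leading eigenvalues ~ 1, 1/2, 1/3, ...
```

If something like this is indeed what was done, it would help to state how sensitive the homeostasis results are to the choice of this particular spectrum, given that it is motivated by V1 data specifically.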