Reviewer #1 (Public Review):
In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity.
The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling.
Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation.
Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm.
Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models.
There are also several related models that have not been mentioned and should be discussed. In 663 ff. the authors discuss the contributions of their model which they claim are novel, but in Kappel et al (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning) similar elements seem to exist as well, and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al, Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these.