Reviewer #2 (Public Review):
The authors study M1 cortical recordings in two non-human primates performing straight delayed center-out reaches to one of 8 peripheral targets. They build a model of the data with the goal of investigating the interplay of inferred external inputs and recurrent synaptic connectivity, and their contributions to the encoding of preferred movement direction during the movement preparation and execution epochs. The model assumes neurons encode movement direction via a cosine tuning that can differ between the preparation and execution epochs. As a result, each neuron in the model is described by four main properties: its preferred direction in the cosine tuning during the preparation (denoted θ_A) and execution (denoted θ_B) epochs, and the strength of its encoding of movement direction during the preparation (denoted η_A) and execution (denoted η_B) epochs. The authors assume that the neural activity is generated by a recurrent network that can receive different inputs during the preparation and execution epochs; in the model, these inputs can be either internal to the network or external. The authors fit the model to real data by optimizing a loss that combines, via a hyperparameter α, the reconstruction of the cosine tunings with a cost that discourages/encourages the use of external inputs to explain the data, and they study the solutions obtained for various values of α. The authors conclude that during the preparatory epoch, external inputs appear to be more important for reproducing the neurons' cosine tunings to movement direction, whereas during movement execution external inputs appear to be untuned to movement direction, with movement direction instead encoded in direction-specific recurrent connections within the network.
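To make the assumed encoding concrete, here is a minimal sketch of my reading of the tuning model; the untuned baseline r0 and the exact additive form are my assumptions, not necessarily the authors' parameterization:

```python
import numpy as np

def tuned_rate(theta, epoch, r0, eta_A, theta_A, eta_B, theta_B):
    """Cosine-tuned rate of one model neuron for movement direction theta.

    r0             : untuned baseline rate (assumed here for illustration)
    eta_A, theta_A : participation strength and preferred direction, map A (preparation)
    eta_B, theta_B : participation strength and preferred direction, map B (execution)
    """
    if epoch == "preparation":
        return r0 + eta_A * np.cos(theta - theta_A)
    return r0 + eta_B * np.cos(theta - theta_B)
```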
Major:
1) Fundamentally, without simultaneously recording the activity of upstream regions, it should not be possible to rule out that the seemingly recurrent contributions to the M1 activity are actually due to external inputs to M1. I think the discussion should acknowledge that the external inputs inferred here depend on the assumptions of the model and provide hypotheses to be validated in future experiments that actually record from upstream regions. To convey with an example why I think simultaneous recordings from upstream regions are critical for confirming these conclusions, consider two alternative scenarios: I) The recorded M1 neurons have recurrent connections that generate a pattern of activity which, based on the modeling, appears to be recurrently generated. II) The exact same activity is recorded from the same M1 neurons, but these neurons have no recurrent connections themselves and are instead driven via purely feed-forward connections from some upstream region; that upstream region has recurrent connections and generates the recurrent-like activity that is later echoed in M1. These two scenarios can produce exactly the same M1 data (see the toy sketch below), so they should not be distinguishable purely on the basis of M1 data. To distinguish them, one would need to record simultaneously from upstream regions to see whether the recurrent-like patterns observed in M1 were already generated in an upstream region. I think acknowledging this major limitation, and discussing the need to eventually confirm the conclusions of this modeling study with actual simultaneous recordings from upstream regions, is critical.
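The point is generic and easy to demonstrate. In this toy example (entirely mine, not the authors' model), a recurrent M1 and a feed-forward M1 that merely relays a recurrent upstream area produce identical recordings:

```python
import numpy as np

def simulate_recurrent(W, x0, T):
    """Iterate simple recurrent dynamics x <- tanh(W x) for T steps."""
    x, traj = x0.copy(), []
    for _ in range(T):
        x = np.tanh(W @ x)
        traj.append(x.copy())
    return np.array(traj)

rng = np.random.default_rng(0)
N, T = 50, 200
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))  # recurrent weights
x0 = rng.normal(size=N)

# Scenario I: the recorded M1 population is itself recurrently connected.
m1_I = simulate_recurrent(W, x0, T)

# Scenario II: an unrecorded upstream area runs the same recurrent dynamics,
# and M1 relays it through feed-forward weights (identity, for simplicity).
upstream = simulate_recurrent(W, x0, T)
F = np.eye(N)
m1_II = upstream @ F.T

assert np.allclose(m1_I, m1_II)  # the two recordings are indistinguishable
```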
2) The ring network model used in this work implicitly relies on the assumption that cosine tuning models are good representations of the recorded M1 neuronal activity. However, this assumption is not quantitatively validated on the data. Given that all conclusions depend on it, it would be important to provide some goodness-of-fit measure for the cosine tuning models, quantifying how well the neurons' directional preferences are explained by cosine tunings. For example, reporting a histogram of the cosine-tuning fit error over all neurons in Fig. 2 would be helpful (currently example fits are shown only for a few neurons in Fig. 2(a), (b) and Fig. S6(b)). This would help quantitatively justify the modeling choice.
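One standard way to produce such a measure (this is my suggestion, not a procedure from the paper) is to fit each neuron's trial-averaged rates with b0 + b1 cos θ + b2 sin θ, which is the same model as r0 + η cos(θ − θ_pref), and report the distribution of per-neuron R²:

```python
import numpy as np

def cosine_tuning_r2(rates, thetas):
    """R^2 of a least-squares cosine-tuning fit for one neuron.

    rates  : (n_trials,) trial-averaged rate in one epoch
    thetas : (n_trials,) target direction per trial, in radians

    The fit b0 + b1*cos(theta) + b2*sin(theta) is equivalent to
    r0 + eta*cos(theta - theta_pref) with eta = hypot(b1, b2).
    """
    X = np.column_stack([np.ones_like(thetas), np.cos(thetas), np.sin(thetas)])
    beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
    resid = rates - X @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((rates - rates.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# One R^2 per neuron per epoch, then histogram:
# r2_prep = [cosine_tuning_r2(r, thetas) for r in rates_prep]
```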
3) The authors explain that the two-cylinder model they use has "distinct but correlated" maps A and B during preparation and movement. This is hard to see in the formulation. It would be helpful if the authors could expand in the Results on what they mean by "correlation" between the maps and which part of the model enforces this correlation.
4) The authors note that a key innovation in the present model formulation is the addition of participation-strength parameters (η_A, η_B) to prior two-cylinder models, representing the degree of each neuron's participation in the encoding of the circular variable in either map. The authors state that this is critical for explaining the cosine tunings well: "We have discussed how the presence of this dimension is key to having tuning curves whose shape resembles the one computed from data, and decreases the level of orthogonality between the subspaces dedicated to the preparatory and movement-related activity". However, I am not sure where this is discussed. To show that an additional parameter is necessary to explain the data well, one would need to compare the fit to data of the model with that parameter against a model without it; I do not think such a comparison is provided in the paper. It is important to include one, to quantitatively demonstrate the benefit of the novel element of the model (one possible form of such a comparison is sketched below).
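For instance, one could refit a reduced model with the participation strengths clamped to a constant across neurons and compare the two fits with a complexity-penalized criterion. The helper below is a generic sketch; fit_full and fit_reduced are placeholders for the authors' fitting code, not functions from the paper:

```python
import numpy as np

def aic(n_params, residuals):
    """Akaike information criterion under a Gaussian error model."""
    n = residuals.size
    return 2 * n_params + n * np.log(np.sum(residuals ** 2) / n)

# Hypothetical usage, with free vs. clamped (eta_A, eta_B):
# aic_full    = aic(k_full,    tuning_data - fit_full(tuning_data))
# aic_reduced = aic(k_reduced, tuning_data - fit_reduced(tuning_data))
```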
5) The model parameters are fitted by minimizing a total cost that is a weighted combination of two costs, E_tot = α E_rec + E_ext, with the hyperparameter α determining how the two costs are traded off. The selection of α is key in determining how much the model relies on external inputs to explain the cosine tunings in the data. As such, the conclusions of the paper rely on a clear justification of the selection of α and a clear discussion of its effect; otherwise, all conclusions could be arbitrary confounds of this selection and thus unreliable. Most importantly, I think a quantitative fit-to-data measure should be reported for the different scenarios to allow comparison between them (see also comment 2). For example, when arguing that α should be "chosen so that the two terms have equal magnitude after minimization", this would be convincing if that selection resulted in a better fit to the neural data than other values of α (a sweep of the kind sketched below). If all such selections of α fit the neural data similarly well, then how can the authors argue that some are more appropriate than others? This is critical, since small changes in α can lead to completely different conclusions (Fig. 6; see my next two comments).
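Concretely, I have in mind something like the following protocol; fit_model and score_fit are placeholders for the authors' optimizer and a held-out fit-to-data metric, not functions from the paper:

```python
import numpy as np

def alpha_sweep(fit_model, score_fit, alphas, train_data, test_data):
    """Refit the model at each alpha; score each fit on held-out data.

    fit_model(alpha, data) -> fitted parameters   (placeholder)
    score_fit(params, data) -> fit-to-data error  (placeholder)
    """
    return [score_fit(fit_model(a, train_data), test_data) for a in alphas]

# Sweeping several orders of magnitude also covers the extreme regimes
# raised in comment 8 (nearly unpenalized vs. heavily penalized inputs):
alphas = np.logspace(-2, 2, 9)
# scores = alpha_sweep(fit_model, score_fit, alphas, train_data, test_data)
```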
6) The authors seem to select α based on the following: "The hyperparameter α was chosen so that the two terms have equal magnitude after minimization (see Fig. S4 for details)". Why is this the appropriate choice? The authors explain that it places the model close to the "bifurcation surface", but why is that the appropriate regime? Does it result in a better fit to the neural data than other choices of α? It is critical to clarify and justify this, as again all conclusions hinge on this choice.
7) Fig. 6 shows example solutions for two close values of α, and how even slight changes in the selection of α can change the conclusions. In Fig. 6(d-f), α is chosen following the default approach, such that the two terms E_rec and E_ext have equal magnitude; here, as the authors note, tuned external inputs are zero during movement execution. In contrast, in Fig. 6(g-i), α is chosen so that the E_rec term has a "slightly larger weight" than the E_ext term, so that there is less penalty for using large external inputs. This leads to a different conclusion, whereby "a small input tuned to θ_B is present during movement execution". Does one value of α fit the neural data better? Otherwise, how do the authors justify key conclusions such as the following, which seems to be based on the first choice of α shown in Fig. 6(d-f): "...observed patterns of covariance are shaped by external inputs that are tuned to neurons' preferred directions during movement preparation, and they are dominated by strong direction-specific recurrent connectivity during movement execution".
8) It would be informative to see the extreme cases of very large and very small α. For example, if α is very large, so that external inputs are practically unpenalized, would the model rely purely on external inputs (rather than recurrent inputs) to explain the tuning curves? This would realize the hypothetical scenario mentioned in my first comment. Would it result in a worse fit to the neural data?
9) The authors argue in the discussion that "the addition of an external input strength minimization constraint breaks the degeneracy of the space of solutions, leading to a solution where synaptic couplings depend on the tuning properties of the pre- and post-synaptic neurons, in such a way that in the absence of a tuned input, neural activity is localized in map B". In other words, the E_ext term apparently reduces the "degeneracy" of the solution. This was not clear to me, and I am not sure where it is explained. It is also related to α: as α grows very large, the E_ext term is effectively removed, so the authors seem to be saying that the solution becomes degenerate in that limit. This should be clarified.
10) How do the authors justify setting Φ_A = Φ_B in equation (5)? In other words, how is the last assumption in the following sentence justified: "To model the data, we assumed that the neurons are responding both to recurrent inputs and to fluctuating external inputs that can be either homogeneous or tuned to θ_A, θ_B, with a peak at constant location Φ_A = Φ_B ≡ Φ". Does this mean that the preferred direction for a given neuron is the same during the preparation and movement epochs? If so, how is this consistent with the only moderate correlation between the preferred directions of the two epochs shown in Fig. 2(c), which is reported to have a circular correlation coefficient of 0.4?
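For reference, the statistic I have in mind is the circular correlation between the two sets of preferred directions across neurons; the specific estimator the authors used is not given here, so the Jammalamadaka-SenGupta form below is an assumption on my part:

```python
import numpy as np

def circ_corr(a, b):
    """Jammalamadaka-SenGupta circular correlation between two angle sets,
    e.g. preferred directions theta_A and theta_B across neurons (radians)."""
    a_bar = np.angle(np.mean(np.exp(1j * a)))  # circular mean of a
    b_bar = np.angle(np.mean(np.exp(1j * b)))  # circular mean of b
    num = np.sum(np.sin(a - a_bar) * np.sin(b - b_bar))
    den = np.sqrt(np.sum(np.sin(a - a_bar) ** 2) * np.sum(np.sin(b - b_bar) ** 2))
    return num / den
```

A value of 0.4 on this scale indicates maps that are related but far from identical, which is exactly the tension with the Φ_A = Φ_B assumption that this comment points at.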