Reviewer #1 (Public Review):
Summary:
In this paper, the authors had 2 aims:
(1) Measure macaques' aversion to sand and see if its removal is intentional, as sand likely produces an unpleasant sensation and causes tooth damage.
(2) Test whether monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this is related to individual traits such as sex and rank, and to behavioral technique.
They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.
The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine-grained silicates and that removing it via brushing or washing is intentional.
They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.
High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.
This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.
Strengths:
The field experiment seemed well-designed, and the quantification of the physical and mineral properties of quartz particles (Feret diameter and particle circularity, relative to human detection thresholds) seemed sound, at least to a reviewer who is not an expert in sand. The *Rank Determination* and *Measuring Sand* sections were clear.
In achieving Aim 1, the authors validated a commonly assumed but unmeasured function of macaque and primate behavior, a key finding for research on primate food processing and cultural transmission.
I commend their approach in developing a quantitative model to generate predictions to compare to empirical data for their second aim.
This is something others should strive for.
I really appreciated the historical context of this paper in the introduction, and found it very enjoyable and easy to read.
I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.
Weaknesses:
Most of the weaknesses in this paper lie in statistical methods, visualization, and a missing connection to the marginal value theorem and optimal foraging theory.
I think all of these weaknesses are solvable.
The data and code were not submitted. Therefore I was unable to better understand the simulation or to provide useful feedback on the stats, the connection between the two, and its relevance to the broader community.
(1) Statistics:
(a) AIC and outcome distributions
The use of AIC for hierarchical models and for models with different outcome distributions raised several concerns.
The authors appear to use AIC to help inform which model to use for their primary analyses in Tables S1 and S2. It is unclear which of these models are analyzed in Tables S3 and S4.
AIC should not be used on hierarchical models, and something like WAIC (or DIC which has other caveats) would be more appropriate.
Also, using information criteria on mixture models such as the negative binomial (aka Gamma-Poisson) should be done with extreme caution, or not at all, as the values are highly sensitive to the data structure.
Some researchers also say that information criteria should not be used to compare models with different outcome distributions - although this might be slightly less of a concern as all of your models are essentially variations on a Poisson GLM.
Discussion on this can be found in McElreath Statistical Rethinking (Section 12.1.3) and Gelman et al. BDA3 (Chapter 7).
Choosing an outcome distribution based on your understanding of the data-generating process is a better approach than relying on AIC, especially in this context, where AIC can be misleading.
(b) Zeros
I also had some concerns about how zeros were treated in the models.
In lines 217-218, they mentioned that "if a monkey consumed a cucumber slice without brushing or washing it, the zero-second duration was included in both GLMMs."
This zero implies no processing and should not be treated as a length 0 duration of processing.
This suggests to me that a zero-inflated Poisson or zero-inflated negative binomial would be the best choice for modelling the data, as it is essentially a 2-step process:<br />
(i) Do they process the cucumber at all?<br />
(ii) If so, do they wash or brush, and how is this predicted by rank and treatment?
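To make the two-step structure concrete, here is a minimal Python sketch of a zero-inflated Poisson. It is an illustration only: the parameter values, the function names (`zip_pmf`, `simulate_zip`, and so on), and the treatment of durations as whole seconds are assumptions for this example and are not taken from the manuscript. It omits the predictors (rank, treatment) and random effects that a real GLMM would include.

```python
import math
import random


def zip_pmf(y, pi_zero, lam):
    """P(Y = y) under a zero-inflated Poisson.

    pi_zero: probability of a structural zero (no processing at all).
    lam: Poisson mean of the processing duration when processing occurs.
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from either step of the process.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_log_likelihood(data, pi_zero, lam):
    """Log-likelihood of the observed counts; pi_zero = 0 gives a plain Poisson."""
    return sum(math.log(zip_pmf(y, pi_zero, lam)) for y in data)


def poisson_draw(lam, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_zip(n, pi_zero, lam, rng):
    """Step (i): is the item processed at all? Step (ii): if so, for how long?"""
    return [0 if rng.random() < pi_zero else poisson_draw(lam, rng)
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values: 40% of slices eaten unprocessed, mean duration 5 s.
    data = simulate_zip(500, pi_zero=0.4, lam=5.0, rng=rng)
    mean = sum(data) / len(data)
    print("ZIP log-likelihood:    ", zip_log_likelihood(data, 0.4, 5.0))
    print("Poisson log-likelihood:", zip_log_likelihood(data, 0.0, mean))
```

When a large share of the zeros are structural, a plain Poisson fitted to the same data has to explain them with a single mean, which is the mismatch the zero-inflated formulation is meant to avoid.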
(2) Absence of Links to Foraging Theory
Optimal cleaning time model: the optimality model, including how it was programmed, was not well described. A better description and documentation of this model, along with code (Mathematica, judging from the plot?), is needed.
There seems to be much conceptual and theoretical overlap with foraging theory models that were not well described, namely the *marginal value theorem (Charnov (1976), Krebs et al. (1974)) and its subsequent advances* (see https://doi.org/10.1016/j.jaa.2016.03.002 and https://doi.org/10.1086/283929 for examples).
In the suggestions, I attached the R code where I replicated their model to show that it is *mathematically identical to the marginal value theorem*. This was not mentioned at all in the text or citations.
This literature has been well studied since the 1970s, and there is a history of studies that compare behavior to an optimality model and find instances where animals conform to or diverge from its predictions (https://doi.org/10.1146/annurev.es.15.110184.002515). This link should be highlighted, and interpreting the results in that theoretical context will make them more broadly applicable to behavioral ecologists.
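For readers unfamiliar with the marginal value theorem, the following Python sketch shows its core logic under stated assumptions. It is not the authors' model or the replication code mentioned above: the saturating gain function `t / (t + h)`, the reading of `h` as a half-saturation constant, and the reading of `c` as a fixed time cost per food item are all assumptions made for illustration. Under those assumptions the rate-maximizing cleaning time has the closed form `sqrt(h * c)`, which the grid search recovers numerically.

```python
import math


def gain(t, h):
    """Assumed fraction of sand removed after cleaning for t seconds."""
    return t / (t + h)


def marginal_gain(t, h):
    """Derivative of gain with respect to t."""
    return h / (t + h) ** 2


def rate(t, h, c):
    """Long-run gain per unit time: gain over cleaning time plus fixed cost c."""
    return gain(t, h) / (t + c)


def optimal_time_closed_form(h, c):
    """Setting d(rate)/dt = 0 for the assumed gain function gives t* = sqrt(h*c)."""
    return math.sqrt(h * c)


def optimal_time_numeric(h, c, t_max=200.0, steps=200_000):
    """Grid search for the cleaning time that maximizes the long-run rate."""
    best_t, best_r = 0.0, 0.0
    for i in range(1, steps + 1):
        t = t_max * i / steps
        r = rate(t, h, c)
        if r > best_r:
            best_t, best_r = t, r
    return best_t


if __name__ == "__main__":
    h, c = 4.0, 9.0  # hypothetical values, in seconds
    t_star = optimal_time_closed_form(h, c)
    print("closed form:", t_star)
    print("grid search:", optimal_time_numeric(h, c))
    # Marginal value condition: instantaneous gain rate equals long-run average rate.
    print("marginal gain:", marginal_gain(t_star, h), "average rate:", rate(t_star, h, c))
```

The final line checks the defining condition of the theorem: at the optimum, the marginal gain from cleaning a little longer equals the average rate obtained over the whole cycle.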
The data were subset to include only instances where there were < 3 monkeys present to avoid confounds of rank, but it is important to note that optimal behavior might vary by individual and can change in a social context depending on rank (see https://doi.org/10.1016/j.tree.2022.06.010). Discussing this, and exploring it further in the data, would strengthen the overall contribution of this manuscript to the field. I understand that the researchers may wish to avoid it in this paper, as it is a complex topic, albeit one that this dataset is uniquely suited to address.
(3) Interpretation and validity of model relative to data
In lines 92-102, they present summary statistics (I think) showing that time spent brushing and washing is consistent with washing or brushing to remove sand.
In the **mitigating tooth wear** section (line 73) and corresponding Figure S1 showing surface sand removed, more detail about how these numbers were acquired, and statistical modelling, is needed.
This is important as uncertainty and measurement error around these metrics are key to the central finding and interpretation of Aim 2 in this paper.
It appears that the researchers simulated the monkeys' brushing and washing behaviors (similar to https://doi.org/10.1007/s10071-009-0230-3).
How many researchers simulated monkey behavior and how many times?
What are the repeat points in Figure S1?
What is the number of trials or number of people?
This effect appears stronger for washing than brushing as well - if so, why?
More information about these data and their uncertainty is important, as it is key to the second central claim of this paper.
The estimates of removing between 76% +/- 7 and 93% +/- 4 of sand (visualized in Figure S1) are statistical estimates.
I would find the argument more convincing if the conclusion that food is processed for too long, i.e. beyond the point of diminishing returns, still held after propagating the uncertainty in handling times, sand removal rates, and the corresponding half-saturation constants.<br />
It is very possible that after accounting for uncertainty and variation in handling time and removal rates, the second result may not hold true.
I was not able to convince myself of this via reanalysis as the description of the data in the text was not enough to simulate it myself.
Essentially, this would imply that in Figure 3 the predicted value would have some variation around it (informed by boundary conditions of time being positive, and percents having floors and ceilings) and that a range of predicted cleaning times (optimal give-up times) would be plotted in Figure 3.
This could be accomplished with a Bayesian approach, or by simply plotting multiple predictions given some confidence interval around c and h.
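The simpler of these two options can be sketched as a Monte Carlo propagation. The Python example below is a rough illustration under stated assumptions, not a reanalysis: the optimum `sqrt(h * c)` follows from an assumed saturating gain function, the normal distributions and the means and standard deviations for `h` and `c` are hypothetical placeholders, and the function names are invented for this example. Draws that violate the positivity constraint on time are rejected.

```python
import math
import random
import statistics


def optimal_time(h, c):
    """Optimal cleaning time under an assumed gain of t / (t + h) and fixed cost c."""
    return math.sqrt(h * c)


def propagate(h_mean, h_sd, c_mean, c_sd, n_draws=10_000, seed=1):
    """Return (2.5%, median, 97.5%) of the optimal time across parameter draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_draws:
        h = rng.gauss(h_mean, h_sd)
        c = rng.gauss(c_mean, c_sd)
        if h > 0 and c > 0:  # boundary condition: times must be positive
            draws.append(optimal_time(h, c))
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return lower, statistics.median(draws), upper


if __name__ == "__main__":
    # Hypothetical uncertainty in h and c, in seconds.
    lower, median, upper = propagate(h_mean=4.0, h_sd=1.0, c_mean=9.0, c_sd=2.0)
    print(f"optimal cleaning time: median {median:.2f} s, "
          f"95% interval {lower:.2f} to {upper:.2f} s")
```

Observed cleaning times could then be compared against the interval rather than a single predicted value, which would show whether the over-cleaning result survives the uncertainty in the inputs.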