On 2020-06-09 12:13:18, user Alex Mielke wrote:
Review<br />
While the idea behind this preprint – using modelling approaches to determine what an appropriate null model for behavioural variation between chimpanzee/orang-utan communities would be – is nice, the simulations fail in several ways to replicate the generative process that underlies the original Whiten et al 1999 paper. It is simply an insufficient null model, because it ignores any information about ape behaviour except the number of cultural traits reported in that paper. The model simply does not approximate the system that it claims to simulate. It also feels like a strawman is attacked here by singling out one paper and ignoring everything else that is known about these species and their behaviour in the wild. It is telling that, apart from the Whiten et al 1999 and van Schaik et al 2001 papers, almost no other papers on tool use in wild apes are cited. In the following, I will detail where the simulations fail to convince. This is often due to a complete lack of explanation as to why certain choices were made, making it impossible to replicate them if one were to start from scratch. There also seems to be a lack of humility in terms of what simulations can and cannot do.<br />
I will focus my criticism largely on the modelling approach and less on the concept of ‘socially-mediated reinnovation’, even though there is certainly enough to be said on that topic. The subheading for each paragraph summarises the argument made. Mainly, this boils down to the following: the simulations ignore what we know about chimpanzee/orang-utan behaviour in general, but also about the Whiten et al paper specifically. The simulations ignore that apes show hundreds of behaviours and exist in thousands of communities, and that Whiten et al randomly picked subsets of both – picking 7 other communities might have led to 70 or 90 cultural behaviours. By pre-selecting 64 behaviours as their only choice, the authors predetermine the conclusion of their simulations. They also ignore the fact that not all the behaviours in Whiten et al are the same in form, function, complexity, usage etc. The concept of ‘innovation’ seems so broad as to be meaningless, because individuals can ‘reinnovate’ behaviours they have already used in the past, and seem never to be exposed to the behaviours of others until they innovate, at which point they suddenly take social information into account. Innovation rates seem excessively high. Innovation as defined here is meaningless for social behaviours. In the simulations, this leads to the fact that the rule that defines ‘innovation’ is essentially the same as one that would define ‘copying’. The simulations cover the entire range of possible results if one tweaks the parameters right; the authors then focus only on the simulation that contained the observed value from Whiten et al and claim that the parameters for that simulation were biologically the most meaningful, without giving any indication as to how this was decided.
Material and Methods<br />
Highly unnatural demographics and ignorance of the consequences of failed learning and of known learning biases towards specific group members<br />
First off, it is hard to see how the description of an oranzee's life here follows ‘realistic demographic features’, as is stated by the authors. Citing the Hill et al 2001 paper here is somewhat odd, because even in the best field site in that study, half of individuals died before 25 years of age. In most field sites, risk of death is highest in the first year of life, and continues to be high before individuals reach adulthood; in many sites, few individuals – males especially – reach 25 years of age. We also have a good idea about inter-birth intervals in these species. This is non-trivial because individual survival will depend on mastering socially learned skills. Presumably, copying does not evolve mysteriously out of the blue – it is useful when the costs of failing to perform a task correctly are high. The model, and I would say the theory underlying it, ignores that an ape who fails to learn a skill correctly faces immense costs. Also, the weird age distribution of communities means that the number of individuals from whom an individual can learn is skewed: there is ample evidence that younger primates learn more, and that they use older individuals and higher-ranking individuals as models (e.g. Kendal et al 2015, Horner et al 2010). Individuals will obviously learn more from their mother than from any other group member, especially early on – that effect alone renders the simulations meaningless, because an individual will have all the skills it needs by age 8 and then just apply them. So, if learning probability in the simulations is based on the frequency with which a behaviour is observed in the population, treating all potential models evenly and not weighting the impact of potential models by their age (e.g. removing infants and juveniles) biases the results. Any theory and model that is based solely on the frequency of behaviours in the population fails to account for all of these well-known effects.
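The difference between the two model-choice rules discussed above can be made concrete with a toy sketch. Everything here – the community composition, the age weights, the function names – is illustrative and not taken from the preprint; the point is only that weighting potential models (e.g. by age) can reverse which behaviour a naive learner is most likely to acquire, compared to raw population frequency.

```python
# Hypothetical sketch: adoption probabilities under two model-choice rules,
# uniform frequency vs. weighted by an attribute of the model (here, age).
# All numbers are made up for illustration.

def adoption_probs(models, weight_fn):
    """Return P(adopt behaviour) for each behaviour, weighting each
    potential model by weight_fn(model)."""
    totals = {}
    for m in models:
        totals[m["behaviour"]] = totals.get(m["behaviour"], 0.0) + weight_fn(m)
    z = sum(totals.values())
    return {b: w / z for b, w in totals.items()}

# A toy community: many infants show behaviour A, a few adults behaviour B.
community = (
    [{"behaviour": "A", "age": 2} for _ in range(10)] +
    [{"behaviour": "B", "age": 25} for _ in range(5)]
)

uniform = adoption_probs(community, lambda m: 1.0)          # frequency only
age_biased = adoption_probs(community, lambda m: m["age"])  # weight by age

print(uniform)     # A favoured: {'A': 0.667, 'B': 0.333}
print(age_biased)  # B favoured: {'A': 0.138, 'B': 0.862}
```

Under pure frequency, the infants' behaviour dominates; under age-weighting, the adults' behaviour does – which is why treating all potential models evenly is not a neutral choice.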
Faulty assumptions of base likelihoods of behaviours and ignorance to the generative process underlying Whiten et al<br />
I think the most fundamental mistake encoded in the simulations is that they completely fail to understand the process that generated the Whiten et al 1999 results, and rather set up a process that is designed to create exactly the same number of behaviours just to make a point. I could make a random model with each individual having a 30% chance of showing each of 65 behaviours, and there would obviously be some solutions that could look similar to the Whiten et al results, but that would not mean that the model captures any underlying processes at all. Simulation studies are only useful in as much as they can actually represent the probabilities underlying the original study, especially in this case where the simulation is specifically designed to invalidate one existing study. Whiten et al 1999 did not select 65 behaviours out of 65; they selected 65 behaviours out of the several hundred observed chimpanzee behaviours in each site (Nishida et al 2010). They never claimed that these were the only behaviours in which variation could occur, and in fact adding more field sites since then has brought to light many other variants of existing behaviours, as well as entirely new behaviours. Non-tool-use behavioural variation is not even included. Also, that study used a very small subset of randomly selected field sites. Everything else aside, sampling out of 65 behaviours means that the number of ‘customary’ etc behaviours is bounded within a certain range, which becomes quite obvious in the supplementary material, where even random selection leads to results similar to the Whiten et al 1999 study. This does not seem to make the authors suspicious. Essentially, any model would have to explain not why there is variation in 7 groups across 65 behaviours, but how likely it is that a random selection of 7 groups (out of thousands) with hundreds of different behaviours shows the patterns observed here.
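The point about the bounded statistic can be illustrated with a toy simulation. The presence probability below is made up, and the counting rule is a crude stand-in for the Whiten et al classification – the sketch only shows that once the behaviour list is fixed at 65 a priori, even fully random presence/absence across 7 groups produces a ‘variable trait’ count that is mechanically confined to [0, 65] and clusters in a range determined by the presence probability, not by anything about ape behaviour.

```python
import random

# Illustrative sketch (probabilities are invented, not estimated from data):
# with a pre-selected list of 65 behaviours and 7 groups, count how many
# behaviours are present in some but not all groups under pure chance.

random.seed(1)
N_BEHAVIOURS, N_GROUPS, P_PRESENT = 65, 7, 0.5

def n_variable_traits():
    """Count behaviours present in at least one but not all groups."""
    count = 0
    for _ in range(N_BEHAVIOURS):
        present = sum(random.random() < P_PRESENT for _ in range(N_GROUPS))
        if 0 < present < N_GROUPS:
            count += 1
    return count

counts = [n_variable_traits() for _ in range(1000)]
# The count is bounded by the list length and clusters near
# 65 * (1 - 2 * 0.5**7), i.e. it is fixed by the sampling design.
print(min(counts), max(counts))
```

Whatever the true process, the statistic cannot escape the range dictated by the pre-selected list – which is why ‘the simulation reproduces the observed number’ is weak evidence on its own.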
For example, while all the communities in the Whiten et al study drag branches during displays, there is no a priori reason to believe that this is a chimpanzee universal. Also, just because the Whiten et al study does not include group-specific gesture use does not mean it does not exist. The impact of this decision becomes obvious once the genetic parameter is included: if the 65 variable behaviours are a subset of several hundred genetically or environmentally fixed behaviours, then the genetic and environmental parameters would have fundamentally different functions in the simulations.
Ignoring that most problems in the wild can be solved in hundreds of different ways<br />
The other problem with this generative process is one that is also apparent in all experimental studies in captivity testing whether apes learn tool use socially: usually, in those experiments, apes have a limited number of different ways of performing an action – often 2 options. However, this is not the case in wild populations. There are usually a large number of options with equal success likelihood that are NOT used; the more detailed the analysis, the more possible options there are. This becomes apparent, for example, in the use of bark pieces of different sizes for termite fishing in neighbouring communities in Gombe (Pascual-Garrido 2019). Chimpanzees in all sites could use a whole lot of tools to fish for termites, comb their hair etc: sticks of different sizes, bark of different sizes, parts of leaves, full leaves, their fingers etc. There are hundreds of different ways to groom someone. Many of these options are not used by any of the groups in the Whiten et al 1999 paper, without good reason, which again marks the a priori reduction to 64 behaviours as highly artificial. Again, the generative process for the original paper includes a random selection of field sites that happened to result in 65 solutions. Adding an 8th field site would have added, e.g., 10 more behaviours. By ignoring this, the base likelihood of each behaviour in the simulations is off, and the result of the simulations is more or less decided before any model is run.
Reinnovation is meaningless for social behaviours and embedded behaviours in sequences<br />
Next point: for social behaviours, re-innovation is a rather pointless concept. A display is not successful if nobody in the group understands what the displayer wants to express – even though individuals could incorporate a fantastical number of potential elements into their displays. Play elements that nobody else knows will not lead to successful play. Hand-clasp grooming does not work if only one individual does it. Courting a female by building a ground nest, as some chimpanzee males do, only works if the female gets the idea. This is as if I were to re-invent the handshake – what is the point if nobody understands its meaning? This is completely putting aside that the 65 behaviours in Whiten et al largely ignore social traditions and communication, and focus heavily on tool use, which was the best studied at the time. Apes probably have in excess of 100 different play elements in each group (Nishida et al 2010; Petru et al 2009), and it can easily be expected that innovation and social transmission occur in this context (Perry 2011). One non-tool-use example in Whiten et al 1999, rain dancing, cannot conceivably be reinnovated by one individual – what would that even look like, given that it is a coordinated action of several individuals with no discernible physical function? Many of the described behaviours in Whiten et al 1999 are not simple behaviours that occur in a vacuum, but action sequences with several elements that have to be performed in exactly the right order and are embedded in sequential behaviour patterns; for example, leaf clipping. The generating process of the simulation ignores this and reduces behaviours to independent, on/off instances that fulfil their function outside of a wider context.
Artificial number of states and artificial assumptions about the use of behaviours<br />
The lack of detail in the Whiten et al study also plays an important role. For example, even though ‘drag branch’ is a common behaviour in all field sites, detailed analysis will likely find that there are different ways of dragging a branch, as has been found for other behaviours on the list (e.g. digging for honey, Estienne et al 2017). But for the simulations here, this means that achieving the wanted ‘state’ of an individual is directly bound to doing a behaviour exactly like the partner. Also, the basic assumption of the simulations (that there are 8 different grooming/play/courtship behaviours that all have the same outcome) is thoroughly misleading, because this is again not the case for the generating process underlying Whiten et al: the 64 behaviours on that list largely fulfil different functions, so instead of simulating 8 categories leading to 64 behaviours, the simulations would have to address 64 ‘states’ that need to be fulfilled. Just because many of them are used to acquire food does not mean that they all serve the same function. Categorising the behaviours as done in the simulations also ignores that even though the form of behaviours might be similar, function can differ drastically. For example, two behaviours that would fall under ‘display’ are branch dragging and drumming. Both are used in displays, but at different times and with different messaging functions, and drumming is also used in some field sites for long-distance communication in other contexts. Function and context might be specific to one sex or age group: drumming in juveniles, for example, is often part of play; female chimps will slap the ground in displays rather than drum, even though drumming is sometimes observed.
Almost no decision in the simulation process is justified<br />
In general, it would be fantastic if there was even a basic description of why any of the simulations was designed as described here; many of the choices seem arbitrary and could not be replicated by someone attempting the same exercise.
‘Innovation probability’ is meaningless for social behaviours<br />
The innovation probability of social behaviour, p_social, seems to have no correspondence in the real world, and it is unclear to the reader why this way of calculating the necessity to innovate was chosen. This again highlights the inability of this framework to account for social traditions. Chimpanzees groom every day, play every day, display regularly, etc; they also observe others doing these behaviours, and are the recipients of them, long before they actively take part in social interactions in their group. Also, from a modelling perspective, it is not clear here what is being innovated: for example, if an individual already has one ‘play’ behaviour but none in the other categories, is the play behaviour potentially reinnovated? Why are these social categories treated as fulfilling one overwhelming urge for ‘social’? I would understand if individuals were assigned a random number of behaviours in each category, and had to reinnovate if these did not match those of other group members (which seems biologically much more plausible), but the way this process is described here seems meaningless. Also, the state of an individual's social behaviours cannot exist outside of the state of other group members.
Group members’ needs are not independent<br />
The same is true for food: all group members at a certain time point would obviously be exposed to the same need for and availability of food resources, so why does the simulation assume these are independent across individuals?
Socially-mediated innovation<br />
Individuals lack memory and the concept of ‘innovation’ used here is meaningless<br />
Now we come to some of the most irritating decisions taken in the modelling process, and it is hard for the reader to understand why they were taken. These seem to completely ignore anything we know about social learning or the life history of animals. Let's start with the first one: what is the meaning of ‘innovation’ if innovation can happen every month, over and over again? If I read this right, each individual can ‘reinnovate’ the same behaviour multiple times in their life? That seems nonsensical – individuals create a repertoire of skills that they apply when necessary. They don't ‘reinnovate’ nutcracking every time they are hungry; this renders the idea of innovation meaningless. If they already have one ‘play’ behaviour, and their state tells them to play, they use that behaviour. Essentially, the simulation pretends that these behaviours are constantly in flux in a population and an individual, but we know that they are not in wild ape groups. Also, the concept of innovation makes sense for zoo-based apes who are exposed to new tasks, but is completely nonsensical for wild individuals: for example, by the time any chimp starts nut cracking, they will have observed several million strikes by their mother and other group members – are we to believe that they did not in any way take this information into account when acting? They will observe these actions by others while they themselves are in a sensitive period for learning the skills. This is fundamentally incomparable to throwing some stones into a zoo enclosure and hoping that an adult chimpanzee will potentially bang them on a nut. That reinnovation is theoretically possible does not rule out that most individuals in a community do not, in fact, reinnovate. For example, I could easily re-invent the handshake; that does not mean that I initially learned the form and function of handshakes by myself.
The simulation rules used are indistinguishable from copying<br />
The second confusing aspect of the modelling approach described here is that it would look essentially the same if copying were being described; it is unclear to the reader how these two models would differ from each other in real life. This does not mean that you rule out copying – it means that you are essentially modelling the same effect and giving it a fancy name. The assumed difference in the simulations between socially mediated reinnovation (I find a pattern, I check whether this pattern fits the group pattern, I adopt the pattern) and copying (I check the group pattern, I adopt the group pattern) is the order of action and social information. As they are modelled here in the same step, there is no difference. The frequency of ‘innovation’ for any behaviour depends on its frequency of occurrence in the group. That is the same for copying – I can only copy a behaviour that I can observe, and the more often I observe it, the more likely I am to copy it. Let us say I am looking for a way to crack nuts. There are three different ways of cracking nuts in my community. I choose to use one of those ways. This is particularly pertinent if the frequency of most behavioural options for a state is zero in the population, and no real ‘innovation’ (new solution) occurs. In the example run in the additional information, many behaviours seem to have one fixed choice in the population. I am not ‘reinnovating’ anything – I make use of the information that is present in the group, which I have observed my entire life. What the authors call ‘innovation’ from an individual perspective is not an ‘innovation’ from an information perspective – in my opinion, it is thoroughly misnamed, because it assumes that individuals only incorporate social information after they have found an individual solution, which seems wasteful. It is therefore unclear why ‘socially mediated reinnovation’ is supposed to be a simpler explanation than copying.
The S factor indicates that sometimes individuals do not copy faithfully; that does not provide any evidence that the rule described here differs from copying. There is also no accounting for the fact that we do not know at what rate chimpanzees and orang-utans innovate at all; the assumption of the modelling approach seems to be that each individual innovates whatever they need all the time, but this stands in complete contrast to the fact that chimpanzee communities seem to spend decades doing things the same way. I would urge the authors to somehow indicate why they think that their approach is not simply modelling exactly the same process that everyone else would call ‘copying’, except that they switch whether individuals first observe and then do, or first do and then observe. The latter is meaningless for long-lived animals with a long infancy.
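The equivalence argued above can be sketched directly. The rule names and the group composition below are illustrative, not taken from the preprint's code: if the probability of ‘reinnovating’ a variant is proportional to its observed frequency in the group, then the sampling distribution over variants is identical to that of frequency-based copying, and no data could distinguish the two.

```python
import random

# Hedged sketch: two choice rules that differ only in narrative order
# (observe-then-act vs. act-then-check) but sample from the same
# distribution P(variant) = freq(variant) / N.

def copy_rule(group_variants, rng):
    """Observe the group, then adopt a variant in proportion to its frequency."""
    return rng.choice(group_variants)

def reinnovate_rule(group_variants, rng):
    """'Independently' produce a variant, with production probability
    proportional to how often it is observed -- same distribution."""
    freqs = {v: group_variants.count(v) for v in set(group_variants)}
    variants, weights = zip(*freqs.items())
    return rng.choices(variants, weights=weights)[0]

# A toy community where variant A is shown by 6 individuals, B by 3, C by 1.
group = ["A"] * 6 + ["B"] * 3 + ["C"] * 1

rng = random.Random(0)
draws = [reinnovate_rule(group, rng) for _ in range(1000)]
print(draws.count("A") / 1000)  # close to 0.6, exactly as for copy_rule
```

Since both rules induce the same distribution over adopted variants, any population-level pattern produced by one is produced by the other – which is why the relabelling carries no explanatory weight.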
Results<br />
The simulations cover the full range of possible values, and the authors simply pick the one they like best<br />
I am not going to go into much detail for the results, because I am not sure what they are supposed to show given the problems raised above. Just relating to Figure 1: it is clear from this figure that a) the result of the simulation is dramatically influenced by the ‘genetic variability’ parameter, which seems artificial and not anchored in any real-world research, even when ignoring the problems of preselecting a subset of behaviours raised above; you can basically achieve any distribution between 0 and 64 by varying this parameter, so some of them will necessarily cover the value from Whiten et al 1999. b) On top of that, the variation for each of the combinations of environmental and genetic factors is huge; all of them cover a range of 20-30 cultural traits (about half the possible values). So, again, what does it mean in this case that the value described in Whiten et al falls into this category? Every other value does as well under some conditions, and the authors simply pick a subset and argue that this is the one they were looking for all along. For example, the manuscript says that there is a good match for alpha_e = 0.8 and alpha_g = 0.2. What does this mean biologically? Is there any indication that this in any way represents the actual circumstances of the chimpanzee communities in the original paper? Is this a better representation of chimpanzee communities than the ones with alpha_g = 0 or alpha_g > 0.5, and what are the criteria for making this decision? If we assume that these chimpanzee communities share several hundred behaviours that were not included in the original 65 possibly cultural behaviours, then alpha_g for chimpanzees is probably very large; the picture is distorted by just using one specific subset. It is repeatedly stated that this simulation represents ‘realistic values for genetic propensity and ecological availability’; it seems a bit cartoonish to reduce genetics and ecology to one value each and call that ‘realistic’.
Obviously models need to abstract, but then this should be presented as what it is.
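The problem with selecting the parameter setting that happens to match the observed value can be made concrete with a toy sweep. The model below is a deliberately crude stand-in, not the authors' oranzees code, and alpha_g here is just an invented knob: the point is that when a free parameter can push the simulated trait count anywhere between 0 and 64, a match with the single observed value is guaranteed for SOME setting, and therefore says nothing about that setting being biologically right.

```python
import random

# Illustrative stand-in model: each of 64 behaviours is genetically fixed
# with probability alpha_g and otherwise free to vary; only varying
# behaviours can count as 'cultural'. (Invented rule, for illustration.)

N = 64

def simulated_trait_count(alpha_g, rng):
    """Toy trait count under a given 'genetic variability' weight."""
    return sum(rng.random() > alpha_g for _ in range(N))

rng = random.Random(0)
sweep = {round(a / 10, 1): simulated_trait_count(a / 10, rng)
         for a in range(11)}
print(sweep)
# The counts slide from about 64 at alpha_g = 0 down to 0 at alpha_g = 1:
# any observed value in [0, 64] is matched by some alpha_g, so a match
# alone cannot validate a particular parameter choice.
```

Without an independent criterion for why one alpha_g is more realistic than another, picking the matching run is just reading the answer back off the dial.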
Discussion<br />
I just want to very specifically point out this statement: ‘More generally, the results of our models suggest caution when deriving individual-level mechanisms from population-level patterns (see also (Acerbi et al. 2016; Barrett 2019)).’ However, the same thing is also true the other way round. This paper obviously produces some population-level patterns, and under certain circumstances, and when one abstracts away everything one knows about primates, they look like they might be similar to the ones reported in the wild. That does not mean that the individual-level process that was used to generate the data was biologically meaningful or represents the system you want to study; as described above, there are many unexplained decisions taken by the authors, and they fail to convince this reader that their choices are replicable and accurately describe chimpanzee or orang-utan behaviour.