Reviewer #1 (Public review):
Summary:
The authors develop a multivariate extension of structural equation models (SEMs) incorporating transmitted and non-transmitted polygenic scores to disentangle genetic and environmental intergenerational effects across multiple traits. Their goal is to enable unbiased estimation of cross-trait vertical transmission, genetic nurture, gene-environment covariance, and assortative mating within a single coherent framework. By formally deriving multivariate path-tracing rules and validating the model through simulation, they show that ignoring cross-trait structure can severely bias both cross- and within-trait estimates. The proposed method provides a principled tool for studying complex gene-environment interplay in family genomic data.
Strengths:
It has become apparent in recent years that multivariate processes play an important role in the genetic effects under study (e.g., Border et al., 2022), and that these processes can affect how such studies are interpreted. This paper develops a comprehensive framework for polygenic score studies using trio data. Their model allows for assortative mating, vertical transmission, gene-environment correlation, and genetic nurture. Their study makes it clear that within-trait and cross-trait influences are important considerations. While their exposition and simulation focus on a bivariate model, the authors point out that their approach can be easily extended to higher-dimensional applications.
Weaknesses:
(1) My primary concern is that the paper is very difficult to follow. Perhaps this is inevitable for a model as complicated as this one. Admittedly, I have limited experience working with SEMs, so that might be partly why I really struggled with this paper, but I ultimately still have many questions about how to interpret many aspects of the path diagram, even after spending a considerable amount of time with it. Below, I will try to point out the areas where I got confused (and some where I still am confused). If the authors choose to revise the paper, clarifying some of these points would substantially broaden the paper's accessibility and impact.
(1a) Figure 1 contains a large number of paths and variable names, and it is not always apparent which variables correspond to which paths. For example, at first glance, the "k + g_c" term next to the "T_m" box could arguably correspond to any of the four paths near it. Disentangling this requires finding other, more plausible variables for the other lines and sifting through the 3 pages of tables describing the elements of the figure.
(1b) More hand-holding, describing the different parameters in the model, would help readers who don't have experience with SEMs. For example, many parameters show up several times (e.g., delta, a, g_c, i_c, w) and describing what these parameters are and why they show up several times would help. Some of this information is found in the tables (e.g., "Note: [N]T denotes either NT or T, as both share the same matrix content"), though I don't believe it is explained what it means to "share the same matrix content."
(1c) Relatedly, descriptions of the path tracing were very confusing to me. I was relieved to see the example on the bottom of page 10 and top of page 11, but then as I tried to follow the example, I was again confused. Because multiple paths have the same labels, I could not tell which exact path in Figure 1 corresponded to each element of the sum that made up Theta_{Tm}. Also, based on my understanding of the path-tracing rules described, some paths seemed to be missing. After a while, I think I decided that these paths were captured by the (1/2)*w term since that term didn't seem to be represented by any particular path in the figure, but I'm still not confident I'm right. In this example, rather than referring to things like "four paths through the increased genetic covariance from AM", it might be useful to identify the exact paths represented by indicating the nodes those paths go through. If there aren't space constraints, the authors might even consider adding a figure which just contains the relevant paths for the example.
(1d) The paper has many acronyms and variable names that are defined early in the paper and used throughout. Generally, I would limit acronyms wherever possible in a setting like this, where readers are not necessarily specialists. For the variables, while the definitions are technically found in the paper, it would be useful to readers if they were reminded what the variables stood for when they are referred to later, especially if that particular variable hasn't been mentioned for a while. As I read, I found myself constantly having to scroll back up to the several pages of figures and tables to remind myself of what certain variables meant. Then I would have to find where I was again. It really made a dense paper even harder to follow.
(1e) Relatedly, on page 13, the authors make reference to a parameter eta, and I don't see it in Figure 1 or any of the tables. What is that parameter?
(2) This point may be related to me misunderstanding the model, but if LT_p represents the actual genetic factors for the two traits for variants that are transmitted to the child, and T_p represents the PGS for transmitted variants, shouldn't there be a unidirectional arrow from LT_p to T_p (since the genetic factor affects the PGS and not the other way around) and shouldn't there be no arrow from T_p to Y_0 (since the entire effect of the transmitted SNPs is represented by the arrow from LT_p to Y_0)? If I'm mistaken here, it would be useful to explain why these arrows are necessary.
(3) Some explanation of how the interpretation of the coefficients differs in a univariate model versus a bivariate model would be useful. For example, in a univariate model, the delta parameter represents the "direct effect" of the PGI on the offspring's outcome (roughly corresponding to a regression of the offspring's outcome onto the offspring's PGI and each parent's PGI). Does it have the same interpretation in the bivariate case, or is it more closely related to a regression of one of the outcomes onto the PGIs for both traits?
(4) It appears from the model that the authors are assuming away population stratification since the path coefficient between T_m and T_m is delta (the same as the path coefficient between T_m and Y_0). Similarly, I believe the effect of NT_m on Y_0 only has a genetic nurture interpretation if there is no population stratification. Some discussion of this would be valuable.
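To make points (3) and (4) concrete, here is a minimal univariate sketch of the trio regression I have in mind. It is not the authors' multivariate SEM. It is a toy simulation under my own assumptions: independent, standardized haplotypic scores, no assortative mating, and hypothetical parameter names (`delta`, `nurture`, `strat_shift`). Stratification is modeled crudely as a two-group indicator that shifts both the haplotypic scores and the outcome.

```python
import numpy as np


def simulate_trios(n=20000, delta=0.5, nurture=0.0, strat_shift=0.0, seed=0):
    """Simulate toy univariate trio data (all names are illustrative assumptions).

    delta       : direct effect of the offspring PGI on the offspring outcome
    nurture     : effect of each parent's PGI on the offspring outcome
    strat_shift : mean shift between two subpopulations, applied to both
                  the haplotypic scores and the outcome
    """
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, n)  # subpopulation label, 0 or 1
    # Four haplotypic scores per family: maternal/paternal, transmitted (T)
    # and non-transmitted (NT). Assumed independent apart from the shared shift.
    hap = rng.standard_normal((4, n)) + strat_shift * pop
    t_m, nt_m, t_p, nt_p = hap
    pgi_o = t_m + t_p    # offspring PGI: the two transmitted haplotypes
    pgi_m = t_m + nt_m   # maternal PGI
    pgi_p = t_p + nt_p   # paternal PGI
    y = (delta * pgi_o
         + nurture * (pgi_m + pgi_p)
         + strat_shift * pop
         + rng.standard_normal(n))
    return y, pgi_o, pgi_m, pgi_p


def trio_regression(y, pgi_o, pgi_m, pgi_p):
    """OLS of the offspring outcome on offspring, maternal, and paternal PGIs.

    Returns coefficients in the order [intercept, offspring, mother, father].
    """
    X = np.column_stack([np.ones_like(y), pgi_o, pgi_m, pgi_p])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    # Genetic nurture, no stratification.
    print(trio_regression(*simulate_trios(nurture=0.2)))
    # No genetic nurture, but stratification present.
    print(trio_regression(*simulate_trios(strat_shift=1.0)))
```

In this toy setup, the offspring coefficient tracks `delta` in both scenarios, whereas the parental coefficients are nonzero in the second scenario even though `nurture` is zero. That is the sense in which I think the non-transmitted paths only have a genetic nurture interpretation when stratification is absent. Whether the analogous statements hold for the cross-trait coefficients in the bivariate model is exactly what I would like the authors to spell out.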
References:
Border, R., Athanasiadis, G., Buil, A., Schork, A. J., Cai, N., Young, A. I., ... & Zaitlen, N. A. (2022). Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science, 378(6621), 754-761.