Reviewer #2 (Public Review):
This work conducts a Mendelian randomization (MR) analysis of triglycerides (TG) against a large number of disease traits in biobanks. The authors leverage publicly available summary statistics from the European-ancestry samples of the UK Biobank and FinnGen. A solid but routine summary-statistics-based MR study is conducted. Several significant causal associations from TG to phenotypes are identified using a p-value cutoff with a Bonferroni correction. Sensitivity analyses yield largely consistent results. The research problem is important and relevant for public health as well as for drug development. Overall, this is a solid execution of current methods on an appropriate data source, and it yields a convincing result. The interpretation of the results in the Discussion is also well balanced.
While the paper has strengths in principle, I noted a few technical weaknesses.
The authors used UK Biobank as the discovery cohort and FinnGen as the replication cohort, but the two cohorts are in practice used rather symmetrically. In particular, Tier 3 (NB) appears to reuse the replication cohort as a discovery cohort. I wonder whether this creates an additional multiple-testing burden, since a greater number of hypotheses is being considered.
The replication p-value cutoff is somewhat lenient statistically. In a typical discovery-replication setting, the two stages are conducted sequentially, and the replication stage should apply a Bonferroni adjustment for the number of significant discovery signals that are tested in replication. In this case, for Tier 2 the cutoff should be 0.05/39 (about 0.0013). This may make the association with leiomyoma of the uterus slightly non-significant, however. A similar cutoff should be applied to Tier 3 as well.
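A minimal sketch of the sequential replication threshold described above, in Python. The function names and the example p-values are hypothetical and are not taken from the manuscript; only the alpha of 0.05 and the 39 Tier 2 signals carried forward come from the comment above.

```python
def replication_threshold(alpha: float, n_carried_forward: int) -> float:
    """Bonferroni threshold for the replication stage: alpha divided by the
    number of discovery signals that are carried forward and tested."""
    if n_carried_forward < 1:
        raise ValueError("at least one discovery signal must be carried forward")
    return alpha / n_carried_forward


def replicated(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Flag which carried-forward signals pass the Bonferroni-adjusted cutoff.

    `p_values` maps each discovery signal to its replication p-value
    (hypothetical inputs for illustration only)."""
    threshold = replication_threshold(alpha, len(p_values))
    return {trait: p < threshold for trait, p in p_values.items()}


# Tier 2 in the manuscript carries 39 signals forward to replication.
print(f"{replication_threshold(0.05, 39):.5f}")

# Hypothetical replication p-values for two illustrative traits.
print(replicated({"trait_a": 0.001, "trait_b": 0.03}))
```

With 39 signals carried forward, the threshold drops from a nominal 0.05 to about 0.00128, so a replication p-value that is merely below 0.05 would no longer count as replicated.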
The evidence for a causal effect of TG on leiomyoma of the uterus is weak, as indicated both by the sub-threshold significance in replication and by the non-significant MR-PRESSO result. Similarly, I would recommend more caution, given the weaker statistical rigor, when interpreting the Tier 2 and Tier 3 results.
Another methodological choice that may need justification is the use of the UKB TG GWAS loci (1,248 SNPs) as the instruments for FinnGen. This may create some subtle interference with the use of UKB for the outcomes in the discovery analysis. The issue may be minor, but some justification, or at least a discussion of the potential limitations, should be included. What about the alternative of using GLGC loci as instruments in the replication?
For disease outcomes (line 188), the UKB European sample size is ~400,000 rather than ~500,000. Could the authors clarify the sample size they used?
It would be reassuring to readers to know whether the TG measurements were taken in a treatment-naïve manner.
"Phenome-wide MR is a high-throughput extension of MR that, under specific assumptions, estimates the causal effects of an exposure on multiple outcomes simultaneously." It would be more informative to state these specific assumptions, at least briefly, in the Introduction, so that readers can interpret the results more easily.
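For reference, the three core instrumental-variable assumptions usually invoked in MR can be written compactly as below, where G denotes a genetic instrument, X the exposure (here TG), Y an outcome, and U the unmeasured confounders of X and Y. This is the standard textbook formulation, offered as a suggestion; it is not necessarily the exact set of assumptions the authors rely on, and in phenome-wide MR it would be applied to each outcome in turn.

```latex
\begin{align*}
&\text{(IV1) Relevance:} && G \not\perp X \\
&\text{(IV2) Independence:} && G \perp U \\
&\text{(IV3) Exclusion restriction:} && G \perp Y \mid (X, U)
\end{align*}
```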