3. Negative general ability
couldn't / not be able to
Complete these sentences about past ability using could/couldn't or was/were able to. More than one answer may be correct. If two answers are correct, write both.

1. At the age of eight, she ________ speak three languages.
2. And when she was ten, she ________ speak another two languages.
3. When I was fifteen, I ________ ride a motorbike.
4. However, at age fifteen, I ________ drive a car.
5. Unfortunately, I ________ find my keys when I got home last night.
6. Fortunately, I ________ unlock the door with a spare key last night.
7. However, I ________ find the original keys at that time.
8. The rescuers ________ contact the trapped miners soon after the mine collapsed yesterday.
9. She ________ smell gas coming from the stove, so she turned off the burner.
10. However, she ________ see the escaping gas.
11. I ________ understand anything that he was saying.
12. We ________ hear Joe enter the house because he made a loud noise.
1. she could / was able to speak
2. she was able to speak
3. I could / was able to ride
4. I couldn't / wasn't able to drive
5. I couldn't / wasn't able to find
6. I was able to unlock
7. I couldn't / wasn't able to find
8. were able to contact
9. She could smell
10. she couldn't see
11. I couldn't / wasn't able to understand
12. We could hear
Decide whether each sentence describes a specific or general ability. Write SA for specific ability and GA for general ability.

1. ____ I was able to ride a bike when I was ten years old.
2. ____ Unfortunately, I wasn't able to make a dentist appointment for this week.
3. ____ We were able to get an excellent price on flights for our trip.
4. ____ Sherry was able to run faster than her brother when they were young.
5. ____ David was able to lift heavy boxes before his injury.
6. ____ Randy was not able to drive until age eighteen.
7. ____ I used to be able to sing.
8. ____ The students were able to pass the Final Exam last fall.
9. ____ Were you able to finish your homework yesterday?
1 GA, 2 SA, 3 SA, 4 GA, 5 GA, 6 GA, 7 GA, 8 SA, 9 SA

The general-ability sentences can also be rewritten with could:

I could ride a bike when I was ten years old.
Sherry could run faster than her brother when they were young.
David could lift heavy boxes before his injury.
Randy could not drive until age eighteen.
I could sing.
jaguars
Jaguars is the plural noun of jaguar. Jaguars are large wild cats native to the Americas, especially Central and South American rainforests. They are the largest big cat in the Americas.
In simple terms: jaguars = powerful spotted big cats of the American tropics
| Feature | Jaguars | Leopards |
| --- | --- | --- |
| Region | Americas | Africa & Asia |
| Build | Stocky, very powerful | Slimmer |
| Rosettes | Larger, with central dot | Smaller, no central dot |
| Water | Strong swimmers | Avoid water |
| Bite | Extremely strong | Strong, but less than jaguar |
Jaguars are top predators that:
three-toed sloth
A three-toed sloth is a slow-moving tree-dwelling mammal found in Central and South American rainforests. The name comes from the fact that it has three toes on each limb.
In simple terms: three-toed sloth = a very slow rainforest animal that lives in trees and has three toes
Three-toed sloths survive by not being noticed:
This is a classic example of adaptation to the canopy environment.
| Feature | Three-toed sloth | Two-toed sloth |
| --- | --- | --- |
| Toes | 3 on all limbs | 2 on front limbs |
| Neck | Longer (can rotate head far) | Shorter |
| Activity | More diurnal | More nocturnal |
| Diet | Mostly leaves | Leaves + fruit |
Despite the names, both have three toes on their hind feet.
toucans
Toucans is the plural noun of toucan. Toucans are tropical birds best known for their very large, colorful beaks and their life in rainforest canopies.
In simple terms: toucans = tropical birds with big, bright beaks
Despite appearances, the beak is:
This is a classic example of adaptation.
Toucans are important because they:
orchids
Orchids is the plural noun of orchid. It refers to plants in the orchid family (Orchidaceae)—one of the largest and most diverse plant families on Earth. Orchids are flowering plants known for their complex, often beautiful flowers.
In simple terms: orchids = a very large group of flowering plants with specialized flowers
Growth forms:

- Epiphytic (grow on trees, not parasitic)
Orchids play important ecological roles:
Orchids are classic examples of evolutionary adaptation:
Vines
Vines is the plural noun of vine. It refers to plants with long, flexible stems that climb, trail, or spread rather than standing upright on their own.
In simple terms: vines = plants that grow by climbing or creeping instead of standing straight
Vines use other objects for support instead of building thick, rigid stems. They may climb by:
This strategy saves energy and helps vines reach sunlight.
Vines are especially common in warm, wet climates, but many also grow in temperate regions.
Some vines are woody, others are soft and flexible.
Vines:
In tropical forests, vines are a major part of the canopy and understorey.
understorey
Understorey (British/Canadian spelling; understory in American English) is the layer of vegetation that grows beneath the forest canopy but above the forest floor.
In simple terms: understorey = the middle plant layer under the tree canopy
The understorey typically includes:
These plants receive filtered sunlight, not full sun.
The understorey plays key roles in forest ecosystems:
Forests are often described as having layers:
Understanding the understorey helps explain light competition and plant adaptation.
Plants in the understorey are adapted to low light:
| Term | Meaning |
| --- | --- |
| understorey | Middle vegetation layer |
| canopy | Upper tree layer |
| forest floor | Ground layer |
| shrub layer | Often overlaps with understorey |
canopy
A canopy is the upper layer formed by the tops of trees or plants, where leaves and branches overlap to create a kind of roof over what is below.
In simple terms: canopy = the leafy “ceiling” at the top of a forest or plant community
In forests, the canopy:
Many textbooks describe forests by layers:
The canopy:
In rainforests, the canopy is often the most biologically diverse layer.
Canopy cover means the percentage of ground shaded by the canopy when viewed from above.
This term is widely used in ecology, forestry, and geography.
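That percentage can be computed directly. A minimal sketch (the survey grid and function name here are hypothetical, purely for illustration):

```python
# Hypothetical illustration: estimate canopy cover from a top-down
# presence grid, where True marks a ground cell shaded by canopy.
def canopy_cover_percent(grid):
    """Return the percentage of cells in the grid that are shaded."""
    cells = [cell for row in grid for cell in row]
    return 100.0 * sum(cells) / len(cells)

survey = [
    [True,  True,  False, True],
    [False, True,  True,  True],
]
print(canopy_cover_percent(survey))  # 6 of 8 cells shaded -> 75.0
```

Real surveys estimate the same quantity from aerial photos or point sampling; the idea is simply shaded area divided by total area.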
Outside ecology, canopy can also mean:
But in science texts, it almost always refers to the plant/forest layer.
Tropic of Capricorn
The Tropic of Capricorn is an imaginary line of latitude at about 23.5° south of the Equator. It marks the southernmost point on Earth where the Sun can be directly overhead (at noon) during the year.
In simple terms: Tropic of Capricorn = the southern limit of the tropics
The Tropic of Capricorn crosses parts of:
| Line | Latitude | Meaning |
| --- | --- | --- |
| Equator | 0° | Middle of Earth |
| Tropic of Cancer | 23.5° N | Northern limit of tropics |
| Tropic of Capricorn | 23.5° S | Southern limit of tropics |
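The 23.5° boundary comes from Earth's axial tilt. As a supplementary sketch (a standard textbook approximation, not from the source), the latitude where the noon Sun is directly overhead — the solar declination $\delta$ — on day $N$ of the year is roughly:

```latex
\delta \approx 23.5^\circ \sin\!\left(\frac{360^\circ}{365}\,(N - 81)\right)
```

Near the December solstice ($N \approx 355$) this gives $\delta \approx -23.5^\circ$, the latitude of the Tropic of Capricorn; near the June solstice ($N \approx 172$), $\delta \approx +23.5^\circ$, the Tropic of Cancer. Because $|\delta|$ never exceeds the axial tilt, only latitudes between the two tropics ever see an overhead Sun.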
Tropic of Cancer
The Tropic of Cancer is an imaginary line of latitude located at about 23.5° north of the Equator. It marks the northernmost point on Earth where the Sun can be directly overhead (at noon) during the year.
In simple terms: Tropic of Cancer = the northern limit of the tropics
The Tropic of Cancer crosses parts of:
| Line | Latitude | Meaning |
| --- | --- | --- |
| Equator | 0° | Middle of Earth |
| Tropic of Cancer | 23.5° N | Northern limit of tropics |
| Tropic of Capricorn | 23.5° S | Southern limit of tropics |
prairie
A prairie is a type of large, open grassland with few or no trees, found mainly in central North America. The land is usually flat or gently rolling and covered mostly by grasses and wildflowers.
In simple terms: prairie = wide, open grassland
Prairies are often divided into types based on rainfall:
| Type | Rainfall | Description |
| --- | --- | --- |
| Tallgrass prairie | Higher | Tall grasses, very fertile soil |
| Mixed-grass prairie | Medium | Mix of tall and short grasses |
| Shortgrass prairie | Lower | Short grasses, drier conditions |
Many animals are adapted for open spaces, burrowing, or running.
Prairies:
| Feature | Prairie | Forest | Desert |
| --- | --- | --- | --- |
| Trees | Few | Many | Very few |
| Rainfall | Medium | High | Very low |
| Plants | Grasses | Trees | Sparse plants |
| Soil | Very fertile | Moderate | Poor |
burrow
Burrow can be both a noun and a verb.
In simple terms: burrow = an underground home or tunnel made by animals
A burrow is an underground space used by animals for:
Examples:
As a verb, to burrow means to dig a hole or tunnel in the ground.
Examples:
Burrows affect ecosystems by:
Animals that do this are sometimes called ecosystem engineers.
| Word | Difference |
| --- | --- |
| burrow | underground tunnel |
| nest | built above ground or in trees |
| den | shelter (often larger animals) |
| hole | general, not necessarily a home |
gophers
Gophers is the plural noun of gopher. In biology, it usually refers to small burrowing rodents, especially pocket gophers, known for living underground and pushing up soil mounds.
In simple terms: gophers = small digging rodents that live in tunnels underground
Gophers play important roles in ecosystems:
coyotes
Coyotes is the plural noun of coyote. Coyotes are medium-sized wild canines (Canis latrans) native to North America and closely related to wolves and dogs.
In simple terms: coyotes = wild, dog-like predators that are highly adaptable
Coyotes have expanded dramatically and now live:
Coyotes are mesopredators that:
kangaroos
Kangaroos is the plural noun of kangaroo. Kangaroos are large marsupial mammals native to Australia, famous for hopping on powerful hind legs and for carrying their young in a pouch.
In simple terms: kangaroos = large Australian animals that hop and carry babies in a pouch
leopards
Leopards is the plural noun of leopard. Leopards are large wild cats known for their spotted coats, stealth, and adaptability.
In simple terms: leopards = powerful, stealthy big cats with spotted fur
Leopards have one of the widest ranges of any big cat:
They can live near human settlements due to their adaptability.
| Feature | Leopards | Cheetahs |
| --- | --- | --- |
| Spots | Rosettes | Solid black spots |
| Body | Stocky, powerful | Slim, built for speed |
| Speed | Fast, but not fastest | Fastest land animal |
| Climbing | Excellent climbers | Poor climbers |
| Hunting | Stealth & strength | Speed chases |
Leopards are top predators that:
cheetahs
Cheetahs is the plural noun of cheetah. It refers to a species of large wild cat best known as the fastest land animal.
In simple terms: cheetahs = very fast spotted big cats
Cheetahs are predators that help control prey populations, contributing to ecosystem balance.
grazing
Grazing is a noun and also the -ing form of the verb “graze.” It refers to the act of animals feeding on grasses and other low-growing plants.
In simple terms: grazing = animals eating grass or similar plants
Typical grazing animals include:
These animals are often called grazers.
| Term | What is eaten | Examples |
| --- | --- | --- |
| Grazing | Grass, low plants | Cows, horses |
| Browsing | Leaves, shrubs, twigs | Deer, giraffes |
This distinction is common in biology and ecology texts.
Grazing plays a major role in ecosystems:
Moderate grazing can be beneficial to ecosystems.
Overgrazing happens when animals eat plants faster than they can regrow.
Consequences include:
This term often appears in environmental science and geography.
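The regrowth-versus-consumption idea behind overgrazing can be sketched numerically. A toy model (the rates and numbers are invented for illustration, not from the source):

```python
# Toy model: plant biomass with a fixed yearly regrowth amount and a
# fixed yearly amount eaten by grazers. If grazing exceeds regrowth,
# biomass falls toward zero -- the overgrazing scenario.
def biomass_after(start, regrowth, grazing, years):
    """Biomass remaining after the given number of years (never below 0)."""
    biomass = start
    for _ in range(years):
        biomass = max(0.0, biomass + regrowth - grazing)
    return biomass

moderate = biomass_after(100.0, regrowth=10.0, grazing=8.0, years=5)
overgrazed = biomass_after(100.0, regrowth=10.0, grazing=30.0, years=5)
print(moderate, overgrazed)  # 110.0 0.0
```

With grazing below the regrowth rate the grassland is sustained (even gains biomass); above it, the plants cannot recover, which leads to the erosion and soil loss discussed in environmental science texts.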
Besides feeding, graze can also mean to touch lightly or scrape: "The ball grazed his arm."
But in science/ecology contexts, grazing almost always means feeding.
acacia
Acacia is a noun referring to a group of trees and shrubs in the genus Acacia (pea/legume family). They are especially common in Australia, Africa, and tropical–subtropical regions.
In simple terms: acacia = a thorny tree or shrub adapted to hot, dry environments
Acacias show classic dry-environment adaptations:
These traits make acacias dominant in savannas and semi-deserts.
clover
Clover is a noun referring to a group of low-growing flowering plants in the genus Trifolium (pea/legume family). They are common in lawns, meadows, and fields.
In simple terms: clover = a small plant with three-part leaves and round flower heads
Note: A four-leaf clover is rare and traditionally considered lucky.
Clover is mainly insect-pollinated:
This makes clover important for pollinator health.
Clover plays several key roles:
goldenrod
Goldenrod is a noun referring to a group of wildflower plants in the genus Solidago. They are named for their bright golden-yellow flowers arranged in clusters on tall stems.
In simple terms: goldenrod = tall yellow wildflowers commonly seen in late summer and fall
Goldenrod is insect-pollinated, not wind-pollinated.
⚠️ Common misconception: Goldenrod does NOT cause hay fever. Fall allergies are usually triggered by ragweed, which is wind-pollinated.
Goldenrod:
asters
Asters is the plural noun of aster. It refers to a group of wildflower plants in the daisy family (Asteraceae), known for their star-shaped flowers.
The word aster comes from Greek, meaning “star”, describing the flower’s shape.
In simple terms: asters = star-shaped wildflowers, often blooming in late summer and fall
Flower shape: what looks like a single flower is actually a composite flower head, with disk florets in the center and ray florets around the edge.
Asters are typically insect-pollinated:
This makes asters ecologically important in temperate ecosystems.
Asters:
wind- and insect-pollinated wildflowers
This phrase describes wildflowers (naturally growing flowering plants) that are pollinated in two different ways:
So, the phrase groups wildflowers by their pollination method.
In simple terms: wind- and insect-pollinated wildflowers = wildflowers that rely on wind or insects to transfer pollen
Pollination is the transfer of pollen from the male parts of a flower (anthers) to the female parts (stigma). This process is necessary for:
Wind-pollinated plants do not rely on animals, so they usually have:
Examples (general): grasses, some meadow plants.
👉 Strategy: quantity over attraction — release lots of pollen and let the wind do the work.
Insect-pollinated plants attract insects, so they usually have:
Examples: daisies, lupines, buttercups.
👉 Strategy: attraction and precision — fewer pollen grains, but targeted delivery.
| Feature | Wind-pollinated | Insect-pollinated |
| --- | --- | --- |
| Petals | Small / dull | Bright / showy |
| Scent | None | Often strong |
| Nectar | None | Present |
| Pollen | Light, abundant | Sticky, heavier |
| Efficiency | Low precision | High precision |
sticky
Sticky is an adjective describing something that clings, adheres, or sticks to other things when touched. It usually involves a tacky or glue-like surface.
In simple terms: sticky = able to stick to things
Something described as sticky often has one or more of these features:
Examples:
Here, sticky often describes an adaptation for protection or feeding.
Sticky can also describe difficult or awkward situations:
Meaning: hard to deal with or escape from.
Related word: stick (verb).
| Word | Difference |
| --- | --- |
| sticky | general, everyday term |
| adhesive | more technical |
| tacky | lightly sticky |
| gluey | thick, glue-like |
beaks
Beaks is the plural noun of beak. A beak is the hard, pointed mouthpart of birds (and some other animals) used for eating, grasping, grooming, building nests, and defense.
In simple terms: beaks = the hard mouths of birds
Beak shape reflects what and how an animal eats:
| Beak type | Shape | Function |
| --- | --- | --- |
| Short & thick | Cone-shaped | Cracking seeds (e.g., finches) |
| Long & thin | Needle-like | Sipping nectar / catching insects |
| Hooked | Sharp, curved | Tearing meat (hawks, owls) |
| Flat & broad | Wide | Filtering food from water (ducks) |
This is a classic example of adaptation and natural selection.
While most common in birds, beaks are also found in:
tree bark
Tree bark is the outer protective covering of a tree’s trunk, branches, and roots. It is not wood; it’s the layer outside the wood that shields the tree from harm.
In simple terms: tree bark = the tree’s “skin”
Bark usually includes two main parts:

- Outer bark: dead tissue
- Inner bark (phloem): living tissue

Together, these layers keep the tree alive and protected.
Tree bark helps a tree by:
Bark appearance varies widely:
These differences help with tree identification.
Tree bark:
Ferns
Ferns is the plural noun of fern. It refers to a group of non-flowering, seedless plants that reproduce by spores, not seeds.
In simple terms: ferns = leafy plants with fronds that reproduce by spores instead of flowers or seeds
Ferns have alternation of generations:
This explains why ferns are common near forests, streams, and wetlands.
| Feature | Ferns | Mosses | Seed plants |
| --- | --- | --- | --- |
| Vascular tissue | ✅ Yes | ❌ No | ✅ Yes |
| Flowers | ❌ No | ❌ No | ✅ (many) |
| Seeds | ❌ No | ❌ No | ✅ Yes |
| Reproduction | Spores | Spores | Seeds |
| Dominant stage | Sporophyte | Gametophyte | Sporophyte |
lichens
Lichens is the plural noun of lichen. Lichens are not single organisms; they are a partnership (symbiosis) between a fungus and an alga (or a cyanobacterium).
In simple terms: lichens = organisms made of fungus + algae living together
Lichens can grow in extreme environments:
Lichens are very important because they:
draped
Draped is the past tense / past participle of the verb drape. It means hung loosely or laid over something so that it falls naturally in folds.
In simple terms: draped = loosely hung or spread over something
The key idea behind draped is gravity + softness:
Here, draped emphasizes gentle coverage, not force or damage.
This use creates a soft, atmospheric image.
Draped is often used in the passive voice: "The hills were draped in fog."

Common patterns:
| Word | Difference |
| --- | --- |
| draped | loose, natural, soft |
| covered | neutral, no texture implied |
| wrapped | tight, enclosed |
| coated | thin layer, often liquid |
Mosses




Mosses is the plural noun of moss. It refers to a group of small, non-vascular plants (called bryophytes) that do not produce flowers or seeds and instead reproduce by spores.
In simple terms: mosses = tiny green plants that grow in damp places and reproduce by spores
Mosses show alternation of generations, but with a key difference from ferns:
| Feature | Mosses | Ferns | Seed plants | | --------------- | ----------- | ---------- | ----------- | | Vascular tissue | ❌ No | ✅ Yes | ✅ Yes | | Seeds | ❌ No | ❌ No | ✅ Yes | | Flowers | ❌ No | ❌ No | ✅ (many) | | Dominant stage | Gametophyte | Sporophyte | Sporophyte | | Typical size | Very small | Medium | Large |
Mosses play major roles in ecosystems:
If the excerpt is just “Mosses”, it most likely functions as:




Mosses is the plural form of moss. Mosses are non-vascular plants that do not flower or produce seeds and reproduce by spores.
Simply put: mosses = moss plants
Mosses show alternation of generations, with the key feature that:
This is why mosses are mostly found in damp, shady environments.
| Feature | Mosses | Ferns | Seed plants | | --------------- | ----------- | ---------- | ----------- | | Vascular tissue | ❌ | ✅ | ✅ | | Seeds | ❌ | ❌ | ✅ | | Flowers | ❌ | ❌ | ✅ | | Dominant stage | Gametophyte | Sporophyte | Sporophyte | | Size | Very small | Medium | Large |
If only Mosses appears, it usually indicates:
If you want, paste the full sentence, diagram, or exam question where Mosses appears, and I'll explain its exact meaning and test-relevant points in that context, in both English and Chinese.
Sitka spruce




Sitka spruce is a large evergreen coniferous tree (Picea sitchensis) native to the Pacific Northwest coast of North America—from Alaska down through British Columbia to California.
In simple terms: Sitka spruce = a very tall coastal evergreen tree common in wet, cool climates.
If you see “Sitka spruce” alone, it’s usually:
Example sentence:



Sitka spruce refers to the Sitka spruce (scientific name Picea sitchensis), a large evergreen conifer that grows along the Pacific coast of North America.
Simply put: Sitka spruce = a tall spruce tree common along the Pacific coast
If Sitka spruce appears on its own, it is usually:
If you want, you can paste the full sentence or diagram (for example from a Science 10 or BC ecosystems reading), and I’ll explain why Sitka spruce is mentioned there and what point it supports, in both English and Chinese.
blue jays




Blue jays is the plural noun of blue jay, a medium-sized songbird native to North America. They are famous for their bright blue feathers, white chest, black markings, and loud, intelligent behavior.
In simple terms: blue jays = bright blue birds known for loud calls and high intelligence.
Blue jays are important to ecosystems because they:
If the excerpt is just “blue jays”, it most likely functions as:



blue jays is the plural form of blue jay, a bird native to North America famous for its bright blue feathers and its loud, intelligent behavior.
Simply put: blue jays = blue, intelligent, very noisy birds
If only blue jays appears, it usually indicates:
If you’d like, paste the full sentence or paragraph (for example from a Science 10 ecosystem or food-web passage), and I’ll explain their exact role and meaning in context in both English and Chinese.
chipmunks




Chipmunks is the plural noun of chipmunk. It refers to small, ground-dwelling rodents in the squirrel family, easily recognized by their striped backs and cheek pouches.
In simple terms: chipmunks = small striped rodents that store food in their cheeks.
Chipmunks play useful roles in ecosystems:
If the excerpt is just “chipmunks”, it likely functions as:




chipmunks is the plural form of chipmunk, a very small rodent with distinct stripes on its back.
Simply put: chipmunks = small, striped animals that store food
If only chipmunks appears, it usually indicates:
If you want, you can paste the full sentence or paragraph (for example from a Science 10 ecosystem or food-web passage), and I’ll explain the exact role and meaning in context in both English and Chinese.
cougars



Cougars is the plural noun of cougar. It refers to a large wild cat species found mainly in the Americas. The same animal is also commonly called mountain lion, puma, or panther (regional names).
In simple terms: cougars = large wild cats that are powerful predators.
Cougars live in a wide range of environments:
They are found from Canada through the western United States and down into South America. In British Columbia, cougars are part of the natural ecosystem.
Cougars are apex predators:
If the excerpt is just “cougars”, it most likely functions as:




cougars is the plural form of cougar (also commonly called mountain lion or puma; the name varies by region).
Simply put: cougars = cougars, a species of large wild cat
Cougars have a very wide range, including:
In Canada (including BC), cougars are important native large predators.
Cougars are apex predators:
If only cougars appears, it usually indicates:
If you want, you can paste the full sentence or paragraph (for example from a Science 10 food web or forest ecosystem passage), and I’ll explain exact meaning, ecological role, and exam-relevant points in both English and Chinese.
skunks




Skunks is the plural noun of skunk. It refers to small to medium-sized mammals best known for their ability to spray a strong-smelling liquid as a defense.
In simple terms: skunks = animals famous for their black-and-white coloring and strong defensive smell.
Skunks do not spray immediately. They usually:
The spray can reach several meters and is very hard to remove.
Skunks are common in North and Central America and often live:
They adapt well to human environments.
If the excerpt is just “skunks”, it most likely functions as:



skunks is the plural form of skunk, a mammal famous for its ability to spray a strong-smelling defensive liquid.
Simply put: skunks = mostly black-and-white animals that defend themselves with smell
Skunks usually do not spray immediately; they warn first:
Skunks are found mainly in North and Central America, are very common in Canada, and often live in:
If the single word skunks appears, it is usually:
If you want, you can paste the full sentence or paragraph (for example from a Science 10 ecosystem passage), and I’ll explain how skunks fit into that context, in both English and Chinese.
breakage




Breakage is a noun that refers to the act, process, or result of breaking. It describes damage caused when something breaks, often into pieces or becomes structurally weakened.
In simple terms: breakage = the damage or loss caused by something breaking.
Breakage can apply to many contexts:
Plants & nature:
“Snow causes branch breakage.” (Branches snap under heavy snow.)
Materials & objects:
“Glass breakage during transport.”
Hair & fibers:
“Hair breakage from dryness or heat.”
Engineering & science:
“Metal breakage due to stress or fatigue.”
👉 The focus is usually on damage or loss, not the action itself.
Related forms:
break (verb)
Common patterns:
Example sentences:
In biology or geography texts, breakage often appears when explaining adaptations:
So here, breakage means structural damage caused by external forces.




breakage is a noun meaning breaking, fracture, or damage, emphasizing the result or loss caused by something breaking.
Simply put: breakage = damage caused by breaking
Nature / plants:
Snow causes branches to snap
Shipping goods:
Breakage rate of glassware
Everyday life:
Hair breakage
Engineering / science:
Material fracture under stress
👉 The focus is not the act of breaking itself but the damaged state that results.
Common collocations:
reduce / prevent breakage
Example sentences:
In science class, breakage is often used to explain adaptive structures:
If you want, you can paste the full sentence or paragraph where breakage appears (for example in a Science 10 ecosystem passage), and I’ll explain the exact meaning and why it’s used there, in both English and Chinese.
shed




Shed can be both a verb and a noun, with related but different meanings.
To shed means to let something fall off, drop, or be removed naturally, often as part of a normal process.
Common uses:
👉 Core idea: losing or getting rid of something, often naturally or gradually.
A shed (noun) is a small, simple building, usually for storage, such as a garden shed or tool shed.
Example:
Verb forms of “shed” are irregular but simple:
Examples:
In science texts (like Science 10), shed usually means:
Example:
| Form | Meaning | Example | | --------------- | -------------- | ----------------------- | | shed (verb) | lose naturally | “Snakes shed skin.” | | shed (noun) | small building | “A tool shed.” |




shed can be both a verb and a noun.
To shed means: to drop, cast off, or get rid of something naturally, usually as a natural or cyclical process.
Common uses:
Examples:
👉 Core meaning: naturally losing or removing something.
shed (noun) means a small shed / storage hut, often used for tools or odds and ends.
Example:
The three verb forms of shed are identical:
Examples:
In biology or geography class, shed is often used to describe adaptation to the environment:
If you want, you can paste the full sentence where shed appears (for example, “deciduous trees shed their leaves”), and I’ll explain exact meaning + exam-relevant reasoning in both English and Chinese.
mosses




Mosses is the plural noun of moss. It refers to a group of small, non-vascular plants (called bryophytes) that do not have true roots, stems, or leaves and do not produce flowers or seeds.
In simple terms: mosses = tiny green plants that grow in damp places and reproduce by spores.
Mosses share several defining features:
Mosses show alternation of generations, but unlike ferns:
This explains why mosses thrive in wet, shady environments.
| Feature | Mosses | Ferns | Seed plants | | --------------- | ----------- | ---------- | ----------- | | Vascular tissue | ❌ No | ✅ Yes | ✅ Yes | | Seeds | ❌ No | ❌ No | ✅ Yes | | Flowers | ❌ No | ❌ No | ✅ (many) | | Dominant stage | Gametophyte | Sporophyte | Sporophyte | | Size | Very small | Medium | Large |
Mosses play major roles in ecosystems:
If the excerpt is just “mosses”, it is most likely:




mosses is the plural form of moss. Mosses are non-vascular plants that do not flower or produce seeds and reproduce by spores.
Simply put: mosses = moss plants
Mosses also show alternation of generations, with the key feature that:
This is why mosses are mostly found in damp, shady environments.
| Feature | Mosses | Ferns | Seed plants | | --------------- | ----------- | ---------- | ----------- | | Vascular tissue | ❌ | ✅ | ✅ | | Seeds | ❌ | ❌ | ✅ | | Flowers | ❌ | ❌ | ✅ | | Dominant stage | Gametophyte | Sporophyte | Sporophyte | | Size | Very small | Medium | Large |
Mosses are very important in ecosystems:
If only mosses appears, it usually indicates:
If you’d like, you can paste the full sentence, diagram, or exam question where mosses appears, and I’ll explain exact meaning, comparison points, and test-relevant details in both English and Chinese.
ferns



Ferns is the plural noun of fern. It refers to non-flowering, seedless plants that reproduce using spores rather than seeds.
In simple terms: ferns = leafy plants that do not produce flowers or seeds.
Ferns are easy to recognize because they share several features:
Ferns have a two-stage life cycle called alternation of generations:
If the excerpt is just “ferns”, it likely functions as:



ferns is the plural form of fern. Ferns do not flower or fruit; instead, they reproduce by spores.
Simply put: ferns = fern plants
Ferns show alternation of generations:
This is why ferns are common in damp areas.
If the single word ferns appears, it is usually:
If you want, you can paste the full sentence or diagram (for example from Science 10 or a forest ecosystem passage), and I’ll explain exact meaning, function, and exam-relevant points in both English and Chinese.
birch




Birch is a noun referring to a type of deciduous tree in the genus Betula. Birch trees are well known for their light-colored (often white or silvery) bark and are common in cool and temperate regions, including Canada.
In simple terms: birch = a deciduous tree with pale bark that sheds its leaves seasonally.
Each type shares the same general features but differs slightly in bark color and habitat.
Birch trees:
If the excerpt is just “birch”, it most likely functions as:



birch is a noun referring to the birch tree, genus Betula, a common deciduous tree that is very widespread in Canada.
Simply put: birch = a birch tree, with pale bark, that sheds its leaves seasonally
If the single word birch appears, it is usually:
If you want, you can paste the full sentence or paragraph where birch appears (for example, a Science 10 reading about Canadian forests), and I’ll explain the exact meaning and exam-relevant nuance in both English and Chinese.
Reptiles



Reptiles is the plural noun referring to animals in the class Reptilia. They are cold-blooded (ectothermic) vertebrates that typically have dry, scaly skin and reproduce mainly by laying eggs (with some exceptions).
In simple terms: reptiles = snakes, lizards, turtles, crocodiles, and their relatives.
Most reptiles share these features:
| Feature | Reptiles | Amphibians | | ---------- | ----------------- | ----------------------------- | | Skin | Dry, scaly | Moist, smooth | | Eggs | On land (usually) | In water | | Body temp | Cold-blooded | Cold-blooded | | Life cycle | No larval stage | Metamorphosis (e.g., tadpole) |
Reptiles live on every continent except Antarctica, from deserts to forests to wetlands. Many bask in the sun to warm up, and some hibernate or brumate in cold seasons.
If the excerpt is just “Reptiles”, it most likely functions as:




Reptiles is the plural noun for animals in the class Reptilia. They are ectothermic, their skin is usually dry and scaly, and most of them reproduce by laying eggs.
Simply put: reptiles = snakes, lizards, turtles, crocodiles, and similar animals.
| Feature | Reptiles | Amphibians | | ---------- | ---------------- | --------------------- | | Skin | Dry, scaly | Moist, smooth | | Eggs | Mostly on land | Mostly in water | | Life cycle | No metamorphosis | Metamorphosis | | Breathing | Lungs | Larvae can use gills |
Reptiles are widely distributed (on every continent except Antarctica); they often bask in the sun to raise their body temperature and may hibernate/brumate in cold seasons.
If the single word Reptiles appears, it is usually:
If you want, paste the full sentence or paragraph where Reptiles appears (e.g., a Science 10 text or exam question), and I’ll explain its exact meaning and role in that context in both English and Chinese.
camouflages




Camouflages is the third-person singular form of the verb camouflage.
Example:
To camouflage means reducing visibility by matching:
So camouflages emphasizes the function: something actively makes another thing harder to see.
a) Biology / ecology
b) Military / equipment
c) Figurative / abstract use
Sentence pattern:
Subject + camouflages + object + (environment)
Example:



camouflages is the third-person singular form of the verb camouflage.
It means: to hide, conceal, or disguise something by blending it into its surroundings.
Example:
"camouflages" emphasizes functional hiding, through means such as:
In essence: 👉 making something undetectable to the observer
① Biology / natural science
② Military / technology
③ Figurative use
Common structure:
Subject + camouflages + object + environment
Example:
If you want, you can paste the full sentence or paragraph where camouflages appears (for example, a biology passage describing animals like caribou or spruce forests), and I’ll explain exact meaning, grammar role, and nuance in both English and Chinese.
waxy



Waxy is an adjective meaning like wax or covered with wax. It describes something that has the look, feel, or properties of wax—for example, being smooth, slightly shiny, slippery, or water-repellent.
In simple terms: waxy = having a wax-like surface or quality.
When something is described as waxy, it often has one or more of these traits:
a) Biology & plants
Many leaves have a waxy coating (cuticle) that reduces water loss.
“The leaves are waxy, helping the plant survive dry conditions.”
b) Food & materials
c) Medicine & appearance
d) Figurative / descriptive language
Example sentences:




waxy is an adjective meaning wax-like / waxen / covered with wax. It is often used to describe a surface or appearance that is smooth, slightly shiny, water-repellent, or slippery.
Simply put: waxy = wax-like, waxen.
Something described as waxy usually has:
① Biology / plants
Many plant leaves have a waxy layer that reduces water loss.
"This plant's leaves are waxy."
② Food / materials
③ Medicine / physical description
④ Figurative / literary use
Related words:
wax (noun)
Example sentences:
If you’d like, you can paste the full sentence or passage where waxy appears (science text, poem, exam question, etc.), and I’ll explain the exact meaning and nuance in that context in both English and Chinese.
spruce



Spruce is a noun referring to a type of evergreen coniferous tree in the genus Picea. Spruce trees are common in cold and temperate regions, especially across Canada, Alaska, Scandinavia, and northern Europe.
In simple terms: spruce = a tall evergreen tree with sharp needles and hanging cones.
| Tree | Needles | Cones | Texture | | ---------- | --------------------- | ------------- | --------------- | | Spruce | Sharp, single needles | Hang down | Rough branches | | Fir | Flat, soft needles | Stand upright | Smooth branches | | Pine | Needles in bundles | Hang down | Long needles |
This distinction is very common in biology and forestry texts.
Besides the tree, spruce can also be used as:
Adjective: spruce = neat, tidy, smart-looking
“He looks very spruce in his uniform.”
Verb (phrasal): spruce up = clean, decorate, or improve appearance
“They spruced up the classroom.”
These meanings come from an older sense of “spruce” meaning lively or elegant.
If the excerpt is just “spruce” by itself (like your example), it most likely functions as:




spruce is a noun meaning spruce tree, a common evergreen conifer in the genus Picea. It is very common in Canada and northern Europe.
Simply put: spruce = a spruce, a tall evergreen conifer.
| Tree | Needles | Cones | Feel | | ------ | ------------- | ------------- | --------------- | | Spruce | Sharp, single | Hang down | Rough branches | | Fir | Flat, soft | Stand upright | Smooth branches | | Pine | In bundles | Hang down | Longer needles |
Besides "spruce tree", spruce has extended uses:
Adjective: spruce = neat, smart-looking
He looks very smart in his uniform.
Phrasal verb: spruce up = tidy up, decorate, smarten up
Spruce up the classroom a bit.
If you only see the single word spruce, it is usually:
If you want, you can paste the full sentence or surrounding paragraph (for example from a poem, biology text, or exam question), and I’ll explain exact meaning, tone, and implication in both English and Chinese.
Caribou
Caribou is the common name for a large, hoofed, deer-like animal in the species Rangifer tarandus. In many parts of the world, the same species is called reindeer.
So in simple terms: Caribou = a large northern deer, closely related to (and often the same as) reindeer.
Think of it like a naming convention:
Caribou is commonly used as both singular and plural:
Sometimes you’ll see caribous, but caribou is more common.
If the excerpt is just “Caribou” by itself (like your example), it most likely serves as:
Caribou refers to a large deer-family animal living in cold and subarctic regions.
Biologically, it is usually the same animal as the "reindeer" (the same species, Rangifer tarandus); the name simply differs by region.
Easy way to remember:
In English, caribou often serves as both singular and plural:
You may also see caribous, but caribou is more common.
If only the single word Caribou appears (with no sentence), it is most likely:
Pronounced roughly "CARE-ih-boo".
If you paste the sentence or paragraph around “Caribou”, I can explain the exact meaning, including whether it’s literal (the animal) or symbolic/metaphorical in that context, in both English and Chinese.
hares
Hares is the plural of hare — a type of wild mammal that looks similar to a rabbit. So the excerpt is simply naming the animal: “hares” = more than one hare.
People often mix up hares and rabbits, but they’re not the same animal.
Babies:
Baby hares (called leverets) are born with fur and eyes open, and can move soon after birth.
Depending on the context, “hares” may do different jobs in a sentence:
Since your excerpt is only one word with no sentence, it most likely functions as a label/title or a standalone noun.
hares is the plural form of hare, meaning more than one hare. So this one-word excerpt simply names the animal.
Many people call both hares and rabbits "rabbits", but they differ biologically and in their habits:
Differences in the young:
Baby hares (leverets) are usually born with fur, with their eyes open, and able to move fairly soon after birth;
Since the excerpt you gave is a single word with no sentence, it may be:
Placed in a sentence, it can also be used as a noun:
If you tell me where this word appears (a sentence, poem, biology text, idiom like “Mad as a March hare,” etc.), I can explain the exact meaning in that context in both languages.
Write 4 or 5 sentences about things you can do, but make one of them false.
1) I can translate text between over 100 different languages in just a few seconds.
2) I am able to generate high-quality images based on any description you give me.
3) I can feel physical emotions like happiness or sadness when we talk.
4) I am able to write complex computer code in languages like Python, C++, and JavaScript.
5) I can summarize a 500-page book into a few short paragraphs almost instantly.
3. They _________ do it today.
can't \ can
3. They _________ do it today.
can't
3. They _________ do it today.
can
• Corporeal transfer of possession (art. 3:114 BW):
providing possession through an actual, physical handover of a thing
Introduction
Overall, the introduction lacks development of the problem from the literature and from the university's own documentation (regulations, educational model, admission policies, etc.). It is important to do that review in order to contextualize the problem: the analysis of the grade distribution, its variation over time (where possible), and the associated factors. With a better context, we can define the question more precisely and guide the analyses.
In the press, something on this discussion appeared in recent days: - https://www.lun.com/Pages/NewsDetail.aspx?dt=2025-12-30&NewsID=556972&BodyID=0&PaginaId=3
In addition, there are interesting international sources to start from: - https://www.tandfonline.com/doi/full/10.1080/02602938.2020.1795617 - https://www.tandfonline.com/doi/full/10.1080/03075079.2025.2470297#:~:text=Generally%2C%20grade%20inflation%20is%20harmful,;%20Yeritsyan%20and%20Mjelde%202024).
Reviewer #1 (Public review):
Summary:
A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. This study seeks to address the question of which molecular mechanisms contribute to the regulation of the extent of glial ensheathment, focusing on the interaction of wrapping glia with axons.
Strengths and Weaknesses:
For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute CRISPR-Cas9 knockouts, and genetic epistasis approaches to manipulate gene function with the help of cell-type-specific drivers. The successful use of acute CRISPR-Cas9-mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.
The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutively active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together, these observations are in line with the notion that htl may act upstream of uif.
Further epistasis experiments using this model also implicated the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons further suggest that neither Delta nor Serrate acts as a neuronal ligand to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI-anchored immunoglobulin-domain-containing protein, led to reduced axon wrapping by glia and thus could act as an activating ligand in this context.
Based on these results, the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl to trigger the ensheathment of axons. While these are intriguing propositions, future experiments will need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.
Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling the visualization of Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.
However, the points raised above remain at present technically difficult to address because of the lack of appropriate genetic reagents. Also, more detailed electron microscopy analyses of early developmental stages and comparisons of effects on cell bodies versus branches would be very labor-intensive, and indeed may represent a new study.
In summary, in light of the importance of correct ensheathment of axons by glia for neuronal function, the proposed model for the interactions between Htl, Uif and N to control the correct extent of neuron and glial contacts will be of general interest to the glial biology community.
Comments on revisions:
The authors have addressed all my comments. However, the sgRNAs in the STAR Methods table are still all for cleavage just before the transmembrane domain, while the Supplemental figure suggests different locations.
Author response:
The following is the authors’ response to the current reviews.
We would like to proceed with this paper as a Version of Record, but we will correct the mistake that we made in the Key Resources Table. As the reviewer noted, we had added the wrong guide RNA sequence here. We are very thankful to the reviewer and apologize for the mistake.
The following is the authors’ response to the original reviews.
eLife Assessment
This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality.
We are thankful for this kind and very positive judgment.
However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion.
This is now addressed.
Reviewer #1 (Public review):
Summary:
A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to address the question of which molecular mechanisms contribute to the regulation of the extent of glial ensheathment, focusing on the interaction of wrapping glia with axons.
Strengths and Weaknesses:
For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute CRISPR-Cas9 knockouts, and genetic epistasis approaches to manipulate gene function with the help of cell-type-specific drivers. The successful use of acute CRISPR-Cas9-mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.
The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutively active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together, these observations are in line with the notion that htl may act upstream of uif.
Further epistasis experiments using this model also implicated the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons further suggest that neither Delta nor Serrate acts as a neuronal ligand to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI-anchored immunoglobulin-domain-containing protein, led to reduced axon wrapping by glia and thus could act as an activating ligand in this context.
Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.
We absolutely agree with the reviewer that it would be fantastic to understand whether and how Uif could stabilize specific membrane domains that are capable of interacting with axons. To address this, we need to be able to label such membrane domains, and unfortunately we still cannot do so. We analyzed the distribution of PIP2/PIP3 but failed to detect any differences. Thus, we still lack wrapping glial membrane markers that are able to label specific compartments.
Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling visualization of Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.
In the revised version of the manuscript we have now included the use of Notch and RTK-signaling reporters.
(1) reporters for N and Htl signaling in larvae,
We had already employed the classic reporter generated by the Bray lab: Gbe-Su(H)-lacZ. This unfortunately failed to detect any activity in larval wrapping glia nuclei but was able to detect Notch activity in the adult wrapping glia (Figure S5C,F).
As requested, we performed the analysis of an RTK signaling reporter. The activity of sty-lacZ, which we had previously characterized in the lab (Sieglitz et al., 2013), increases by 22% when Notch is silenced. Given the normal distribution of the data points, this shows a trend which, however, does not reach significance. We have not included this in the paper, but would be happy to do so if requested.
Author response image 1.
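For illustration only, the kind of comparison described above (a relative increase that remains non-significant) can be sketched as a simple two-sample computation; all values and function names below are hypothetical, not the actual reporter measurements.

```python
from statistics import mean, stdev

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (illustrative;
    a full analysis would also compute a p-value, e.g. with
    scipy.stats.ttest_ind(a, b, equal_var=False))."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    return (mean(sample_a) - mean(sample_b)) / ((va / na + vb / nb) ** 0.5)

def percent_change(control, treated):
    """Relative increase of the treated mean over the control mean, in %."""
    return 100.0 * (mean(treated) - mean(control)) / mean(control)

# Hypothetical reporter intensities (arbitrary units), not real data:
control = [100, 95, 110, 102, 98]
notch_kd = [125, 118, 130, 115, 122]
print(percent_change(control, notch_kd))  # relative increase in percent
```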
(2) monitoring of different stages at a time point when branch extension begins,
The reviewer raises an important question; however, this is extremely difficult to tackle experimentally. It would require a detailed electron microscopic analysis of early larval stages, which cannot be done in a reasonable amount of time. We have, however, added additional information on wrapping glia growth, summarizing recently published work from the lab (Kautzmann et al., 2025).
(3) a reagent enabling to visualize Uif expression could be important next tools/approaches.
The final comment of the reviewer also addresses an extremely relevant and important issue. We employed antibodies generated by the lab of R. Ward, but they did not allow detection of the protein in larval nerves. We also attempted to generate anti-Uif peptide antibodies but these antibodies unfortunately do not work in tissue. We are still trying to generate suitable reagents but for the current revision cannot offer any solution.
Lastly, we agree with the reviewer that it would be worthwhile to explore how Uif controls membrane formation at the subcellular level. This, however, is a completely new project and will require the identification of the binding partners of Uif in wrapping glia before we can start working on a link between Uif and membrane extension. The reduced branching phenotype might well be a direct consequence of excessive membrane formation, as it likely blocks resources needed for efficient growth of glial processes.
Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community.
We are very grateful for this very positive comment.
Reviewer #2 (Public review):
The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.
Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.
Reviewer #1 (Recommendations for the authors):
(1) To be reproducible and understandable, it would be important to provide detailed information about crosses and genotypes, as reagents are currently listed individually and genotypes are provided in rather simplified versions.
We have added the requested information to the text.
(2) Neurons are inherently resistant to RNAi-mediated knockdown and it thus may be necessary to introduce the over-expression of UAS-dcr2 when assessing neuronal requirements and to specifically exclude Delta or Serrate as ligands.
We agree with the reviewer and have repeated the knockdown experiments using UAS-dcr2, obtaining the same results. As an RNAi-independent approach, we also employed sgRNA expression in the presence of Cas9. The neuron-specific gene knockout also showed no glial wrapping phenotype. These results are now added to the manuscript.
(3) Throughout the manuscript, the authors use the terms "growth" and "differentiation" referring to the extent of branch formation versus axon wrapping. However glial differentiation and growth could have different meanings (for instance, growth could implicate changes in cell size or numbers, while differentiation could refer to a change from an immature precursor-like state to a mature cell identity). It may thus be useful to replace these general terms with more specific ones.
This is a very good point. When we use the term “growth” we refer only to glial cell growth, i.e. the increase in cell mass. Proliferation is excluded, and this is now explicitly stated in the manuscript. The term “differentiation” is indeed difficult, and we therefore changed it to either directly address the morphology or to refer to axon wrapping.
(4) Page 4. "remake" fibers should be Remak fibers.
We have corrected this typo.
(5) Page 5. "Heartless controls glial growth but does promote axonal wrapping", this sentence is not clear in its message because of the "but".
We have corrected this sentence.
(6) Generally, many gene names are used as abbreviations without introductions (e.g. Sos, Rl, Msk on page 7). These would require an introduction.
All genetic elements are now introduced.
(7) Page 8. When Cas9 is expressed ubiquitously ... It would be helpful to add how this is done (nsyb-Gal4, nrv2-Gal4, or another Gal4 driver are used to express UAS-Cas9, as the listed Gal4 drivers seem to be specific to neurons or glia?).
This is now added. We used the following genotype for ubiquitous knockout using the four different uif-specific sgRNAs (UAS-uif<sup>sgRNA X</sup>): [w; UAS-Cas9/Df(2L)ED438; da-Gal4/UAS-uif<sup>sgRNA X</sup>]. For the glial knockout in wrapping glia we used [+/+; UAS-Cas9/+; nrv2-Gal4, UAS-CD8::mCherry/UAS-uif<sup>sgRNA X</sup>].
We had previously shown that nrv2-Gal4 is a wrapping glia specific driver in the larval PNS (Kottmeier et al., 2020).
Moreover, the authors mention that "This indicates that a putatively secreted version of Uif is not functional". This conclusion would need to be explained in detail.
First, because it requires quite some detective work to understand the panels in Figure 1 on which this statement is based; second, since the acutely induced double-stranded breaks in the DNA and subsequent repair may cause variable defects, it may indeed not be certain what changes have been induced in each cell; and third, considering that there is a putative cleavage site, would it not be expected that the protein is not functional when it is not cleaved and there is no secreted extracellular part (unless the cleavage site is not required)? The latter could probably only be addressed by rescue experiments with UAS transgenes with identified changes.
We agree with the reviewer. The rescue experiments are unfortunately difficult, since even expression of a full-length uif construct does not fully rescue the uif mutant phenotype (Loubéry et al., 2014). We therefore better explained the conclusions drawn from the different sgRNA knockout experiments and also removed the statement that secreted Uif forms are non-functional.
In the Star Method reagent table, it is not clear, why all 8 oligonucleotides are for "uif cleavage just before transmembrane domain" despite targeting different locations.
We are very sorry for this mistake and corrected it now. Thank you very much for spotting this.
(9) Page 13. However, we expressed activated Notch,... the word "when" seems to be missing, and it would be helpful to specify how this was done (over-expression of N[ICD]).
We corrected it now accordingly.
(9) To strengthen the point similarity of phenotypes caused by Htl pathway over-activation and Uif over-expression, it would be helpful to also show an EM electron micrograph of the former.
We now added an extensive description of the phenotype caused by activated Heartless. This is shown as new Figure 2.
(10) Figure 4C, the larval nerve seems to be younger, as many extracellular spaces between axons are detected.
This perception is a misunderstanding, and we are sorry for not explaining this better. The third instar larvae are all age-matched. The particular specimen in Figure 4C shows some fixation artifacts that result in the loss of material; importantly, however, membranes are not affected. A similar loss of material is also seen in Figure 6C. For further examples, please see a study on nerve anatomy (Kautzmann et al., 2025).
(11) The model could be presented as a figure panel in the manuscript. To connect the recommendation section with the above public review, a step forward could be to adjust the model and the wording in the Result section and to move some of the less explored points and thoughts to the discussion.
We are thankful for this advice and have moved an updated model figure to the end of the main text (now Figure 7).
Reviewer #2 (Recommendations for the authors):
(1) Screen and the interest in Uif: Out of the ~62 genes that came out of the RNAi screen, why did the authors prioritize and focus on Uif? What were the other genes that came out of the screen, and did any of those impinge on Notch signaling?
We have now described the results of the screen more thoroughly. We selected Uif as it was the only transmembrane/adhesion protein identified, and given the finding that Uif decorates apical membrane domains in epithelial cells, we hoped to have identified a protein specific for a similar membrane domain in wrapping glia.
Notch as well as its downstream transcription factors were not included in the initial screen and were only analyzed once we had seen the contribution of Notch. Interestingly, there is one single hit in our screen linked to Notch signaling: Gp150. Here, however, we tested additional dsRNA-expressing lines and were not able to reproduce the phenotype. This information is added to the discussion.
The authors performed a large-scale screen of 2600 RNAi lines, it seems more details about what came out of the screen and why the focus on Uif would benefit the manuscript.
See above comment.
Relatedly, a discussion of the limitations of the screen would be useful: it was really a screen looking to modify a gain-of-function phenotype from the activated Htl allele, and a screen of this design may lead to artifacts that do not reflect endogenous signaling.
We have now added a short paragraph on suppressor screens, employing gain of function alleles to the introduction.
“In Drosophila, such suppressor screens have been used successfully many times (Macagno et al., 2014; Rebay et al., 2000; Therrien et al., 2000). Possibly, such screens also uncover genes that are not directly linked to the signaling pathway under study but this can be tested in further experiments. Our screen led to the unexpected identification of the large transmembrane protein Uninflatable, which in epithelial cells localizes to the apical plasma membrane. Loss of uninflatable suppresses the phenotype caused by activated RTK signaling. In addition, we find that uif knockdown and uif knockout larvae show impaired glial growth while an excess of Uninflatable leads to the formation of ectopic wrapping membrane processes that, however, fail to interact with axons. uninflatable is also known to inhibit Notch. “
(2) In general this study relies on RNAi knockdown, and is generally well controlled in using multiple RNAi lines giving the same phenotype, and also controlled for by tissue-specific gene knockout. However, there is little in the way of antibody staining to directly confirm the target of interest is lost/reduced, which would obviously strengthen the study.
Lacking the tools or ability to assess RNAi efficiency (qPCR, antibody staining), some conclusions need to be tempered. For example, in the experiments in Figure S6 regarding canonical Notch signaling, the authors do not find a phenotype by Delta or Serrate knockdown, but there are no experiments that show Delta or Serrate are lost. Thus, if the authors cannot directly test for RNAi efficiency, these conclusions should be tempered throughout the manuscript.
We agree with the reviewer and now provide information on the use of Dicer in our RNAi experiments and conducted new sgRNA/Cas9 experiments. In addition, we tempered our wording, stating that Dl and/or Ser are still possible ligands.
(3) More description is needed regarding how the authors are measuring and calculating the "wrapping index". In principle, the approach seems sound. However, are there cases where axons are "partially" wrapped of various magnitudes, and how are these cases treated in the analysis? Are there additional controls of previously characterized mutants to illustrate the dynamic range of the wrapping index in various conditions?
This is now explained.
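As a rough illustration of the kind of calculation involved, the sketch below assumes the wrapping index is the percentage of axon profiles in an EM cross-section that are enwrapped by glial membrane, either individually or as part of a wrapped bundle; the precise definition follows the manuscript's methods, and the counts here are hypothetical.

```python
def wrapping_index(wrapped_axons, wrapped_bundles, total_axons):
    """Illustrative wrapping index: percentage of axon profiles in an EM
    cross-section surrounded by glial membrane, counting axons wrapped
    individually or as part of a fully enwrapped bundle. The exact
    definition used in the manuscript may differ."""
    if total_axons == 0:
        raise ValueError("no axons counted")
    return 100.0 * (wrapped_axons + wrapped_bundles) / total_axons

# Hypothetical counts from one nerve cross-section:
print(wrapping_index(wrapped_axons=12, wrapped_bundles=3, total_axons=60))  # 25.0
```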
Further, can the authors quantify the phenotypes in the axonal "bulges" in Figures 1, 3, and 5?
This is a difficult question. Although we can easily quantify the number of bulges, we cannot quantify the severity of the phenotype, as this would require EM analysis. Sectioning nerves at a specific distance from the ventral nerve cord already requires very careful adjustments. Sectioning at the level of a bulge is far more difficult, and it is not possible to obtain the number of sections needed to quantify the bulge phenotype.
The fact is that all wrapping glial cells develop swellings (bulges) at the position of the nucleus. As there are in general three wrapping glial cells per segmental nerve, the number of bulges is three.
(4) It seems difficult to clearly untangle the functions of Htl/Uif/Notch in differentiation itself vs subsequent steps in growth/wrapping. For example, if the differentiation steps are not properly coordinated, couldn't this give rise to some observed differences in growth or wrapping at later stages? I'm not sure of any obvious experiments to pursue here, but at least a brief discussion of these issues in the manuscript would be of use.
We now discuss this more carefully in the discussion, addressing whether the three genes function in differentiation itself or in a stepwise mode of growth and differentiation.
When comparing the different loss-of-function phenotypes, they all appear the same, which would argue that all three genes act in a common process.
However, when we look at gain-of-function phenotypes, Htl and Uif behave differently compared to Notch. This would favor two distinct processes.
We have now added activity markers for RTK signaling to directly show that Notch silences RTK activity. Unfortunately, we were not able to perform a similar reciprocal experiment.
Minor:
(1) The Introduction is too long, and would benefit from revisions to make it shorter and more concise.
We have shortened the introduction and hopefully made it more concise.
(2) A schematic illustrating the model the authors propose about Htl, Uif, and Notch in glial differentiation, growth, and wrapping would benefit the clarity of this work.
We had previously added the graphical abstract shown below; we have now updated it and included it as a figure in the main text.
References
Kautzmann, S., Rey, S., Krebs, A., and Klämbt, C. (2025). Cholinergic and glutamatergic axons differentially require glial support in the Drosophila PNS. Glia. 10.1002/glia.70011.
Kottmeier, R., Bittern, J., Schoofs, A., Scheiwe, F., Matzat, T., Pankratz, M., and Klämbt, C. (2020). Wrapping glia regulates neuronal signaling speed and precision in the peripheral nervous system of Drosophila. Nature communications 11, 4491-4417. 10.1038/s41467-020-18291-1.
Loubéry, S., Seum, C., Moraleda, A., Daeden, A., Fürthauer, M., and González-Gaitán, M. (2014). Uninflatable and Notch control the targeting of Sara endosomes during asymmetric division. Current biology : CB 24, 2142-2148. 10.1016/j.cub.2014.07.054.
Macagno, J.P., Diaz Vera, J., Yu, Y., MacPherson, I., Sandilands, E., Palmer, R., Norman, J.C., Frame, M., and Vidal, M. (2014). FAK acts as a suppressor of RTK-MAP kinase signalling in Drosophila melanogaster epithelia and human cancer cells. PLoS Genet 10, e1004262. 10.1371/journal.pgen.1004262.
Rebay, I., Chen, F., Hsiao, F., Kolodziej, P.A., Kuang, B.H., Laverty, T., Suh, C., Voas, M., Williams, A., and Rubin, G.M. (2000). A genetic screen for novel components of the Ras/Mitogen-activated protein kinase signaling pathway that interact with the yan gene of Drosophila identifies split ends, a new RNA recognition motif-containing protein. Genetics 154, 695-712. 10.1093/genetics/154.2.695.
Sieglitz, F., Matzat, T., Yuva-Aydemir, Y., Neuert, H., Altenhein, B., and Klämbt, C. (2013). Antagonistic Feedback Loops Involving Rau and Sprouty in the Drosophila Eye Control Neuronal and Glial Differentiation. Science signaling 6, ra96. 10.1126/scisignal.2004651.
Therrien, M., Morrison, D.K., Wong, A.M., and Rubin, G.M. (2000). A genetic screen for modifiers of a kinase suppressor of Ras-dependent rough eye phenotype in Drosophila. Genetics 156, 1231-1242.
Reviewer #3 (Public review):
Summary:
The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod- vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.
Strengths:
Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights onto their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.
The authors have done considerable work to address reviewer concerns from the first draft. The current version of the manuscript is strong and supports the claims.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary
In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown.
Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors.
Further, the authors show that these factors act through species-specific regulatory elements. And overall, functional experiments support a role for several of these candidates in cone production.
Strengths
This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease.
Weaknesses
(1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.
We have revised the text to qualify these conclusions as suggested.
“Zic3 promotes cone-specific gene expression and is necessary for generating the full complement of cone photoreceptors”
“Pou2f1 overexpression upregulated an overlapping but distinct, and larger, set of cone-specific genes relative to Zic3, while also downregulating many of the same rod-specific genes, often to a greater extent (Fig. 3C).”
“This resulted in a statistically significant ~20% reduction in the density of cone photoreceptors in the mutant retina (Fig. 3E,F), while the relative numbers of rods and horizontal cells remained unaffected (Fig. S4A-D).”
“Our analysis suggests that gene regulatory networks controlling cone specification are highly redundant, with transcription factors acting in complex, redundant, and potentially synergistic combinations. This is further supported by our findings on the synergistic effects of combined overexpression of Zic3 and Pou2f1 increasing both the number of differentially expressed genes and their level of change in expression relative to the modest changes seen with overexpression of either gene alone (Fig. 3) and the relatively mild or undetectable phenotypes observed following loss of function of Zic3 and Mef2c (Fig. 3, Fig. S6), as well as other cone-promoting factors such as Onecut1 and Pou2f1[18,19].“
(2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cone and rod are clearly detailed, but do those lead to changes in other cell types?
The 13LGS retina, like most cone-dominant retinas, shows relatively lower numbers of rod and cone photoreceptors (~20%) than nocturnal species such as mice (~80%). The difference is made up by increased numbers of inner retinal neurons and Müller glia. While rigorous histological quantification of the abundance of inner retinal cell types has not yet been performed for 13LGS, we can estimate these values using our snATAC-seq data. These numbers are provided in Table ST1 and are now discussed in the text.
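The estimate described above amounts to counting annotated nuclei per cluster; a minimal sketch (with hypothetical labels, not the actual 13LGS annotations) could look like this:

```python
from collections import Counter

def cell_type_fractions(cluster_labels):
    """Estimate cell-type composition as the fraction of nuclei assigned
    to each annotated cluster in an snATAC-seq experiment (a rough proxy
    for true tissue composition; capture biases are ignored here)."""
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical nucleus annotations, not the actual 13LGS data:
labels = ["cone"] * 15 + ["rod"] * 5 + ["bipolar"] * 40 + ["muller_glia"] * 40
print(cell_type_fractions(labels)["cone"])  # 0.15
```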
(3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population?
This is indeed likely to be the case. This is now stated explicitly in the text.
“However, no clear trajectory for rod differentiation was detected, likely due to the very low number of rod cells detected prior to P17 (Fig. 2A).”
(4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data.
The figure legend has been corrected, and negative controls for P24 have been added. The figure legend has been modified as follows:
“Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”
(5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the conelike photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas it is not in Figure S3A. How are these experiments different?
These UMAP data represent two independent experiments. Total numbers and relative fractions of each cell type are now included in Table ST5.
In these experiments, cone-like precursors were identified by both cell-type clustering and differential gene expression. Cells from all conditions were found in the cone-like precursor cluster. However, cells electroporated with a plasmid expressing GFP alone showed only GFP as a differentially expressed gene, identifying them most likely as GFP+ rods. In contrast, Zic3 overexpression resulted in increased expression of cone-specific genes and decreased expression of rod-specific genes in both cone-like precursors and rods relative to controls electroporated with GFP alone. Cell type proportions across independent overexpression single-cell experiments could be influenced by a number of factors, including electroporation efficiency and ex vivo growth conditions.
(6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset.
We observe that loss of function of Zic3 in developing retinal progenitors leads to a reduction in the total number of cones (Fig. 4E,F). In Fig. S4, we investigate how gene expression is altered in both the remaining cones and in other retinal cell types. We only observed significant changes in mutant cones and Müller glia relative to controls. We observe a mixed phenotype in cones, with a subset of cone-specific genes downregulated (notably including Thrb) and another subset upregulated (including Opn1sw). We also find that genes expressed in both rods and cones, as well as rod-specific genes, are downregulated in cKO cones. Since rods are fragile cells located immediately adjacent to cones, some level of contamination with rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In mutant Müller glia, in contrast, we see a broad decrease in the expression of Müller glia-specific genes, which likely reflects the indirect effects of Zic3 loss of function in retinal progenitors, and an upregulation of both broadly photoreceptor-specific genes and a subset of rod-specific genes, which may also result from altered adhesion between Müller glia and rods.
This is consistent with the conclusions in the text, although we have both modified the text and included heatmaps showing downregulation of rod-specific genes in mutant cones, to clarify this finding.
“In addition, we observe a broad decrease in expression of genes expressed at high levels in both cones and rods (Rpgrip1, Drd4) and rod-specific genes (Rho, Cnga1, Pde6b) in mutant cones (Fig. S4F). Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In contrast, increased expression of rod-specific genes (Rho, Nrl, Pde6g, Gngt1) and pan-photoreceptor genes (Crx, Stx3, Rcvrn) was observed in Müller glia (Fig. S4G), which may likewise result from altered adhesion between Muller glia and rods. Finally, several Müller glia-specific genes were downregulated, including Clu, Aqp4, and Notch pathway components such as Hes1 and Id3, with the exception of Hopx, which was upregulated (Fig. S4G). This likely reflects the indirect effects of Zic3 loss of function in retinal progenitors. These findings indicate that Zic3 is essential for the proper expression of photoreceptor genes in cones while also playing a role in regulating expression of Müller glia-specific genes.”
(7) Throughout the text, the authors used the term "evolved". To substantiate this claim, it would be important to include sequence analyses or to rephrase to a more neutral term that does not imply evolutionary inference.
We have modified the text as requested to replace “evolved” and “evolutionarily conserved” where possible, with examples of revised text listed below:
“These results demonstrate that modifications to gene regulatory networks underlie the development of cone-dominant retina,...”
“Our results demonstrate that heterochronic expansion of the expression of transcription factors that promote cone development is a key event in the development of the cone-dominant 13LGS retina.”
“Conserved patterns of motif accessibility, identified using ChromVAR and theTRANSFAC2018 database, (Fig. S1F, Table ST1)...”
“However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”
“We conclude that the development of the cone-dominant retina in 13LGS is driven by novel cisregulatory elements…”
“Based on our bioinformatic analysis, the cone-dominant 13LGS retina follows this paradigm, in which species-specific enhancer elements…”
“Dot plots showing the enrichment of binding sites for Otx2 and Neurod1, TFs which are broadly expressed in both neurogenic RPC and photoreceptor precursors, which are enriched in both conserved cis-regulatory elements in both species. (D) Bar plots showing the number of conversed and species-specific enhancers per TSS in four cone-promoting genes between 13LGS and mouse.”
Reviewer #2 (Public review):
Summary:
This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the conedominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.
Strengths:
The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field.
Weaknesses:
The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement.
We have modified the text and figures to more clearly describe the observed roles of Zic3 and Mef2c in cone photoreceptor development as detailed in our responses to reviewer recommendations.
Reviewer #3 (Public review):
Summary:
The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.
Strengths:
Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.
Weaknesses:
(1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry that the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F is notable and would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse?
We were unable to include 13LGS rods in our analysis due to the extremely low number of cells detected prior to P17. Relative expression levels of cone-promoting transcription factors in 13LGS in early RPCs and early NG cells is shown in Fig. 2H. Particularly when compared to mice, we also observe elevated expression of cone-promoting genes in early-stage RPC and/or early NG cells. These include Zic3, Onecut2, Mef2c, and Pou2f1, as well as transcription factors that promote the differentiation of post-mitotic cone precursors, such as Thrb and Rxrg. Contrast this with genes that promote specification and differentiation of both rods and cones, such as Otx2 and Crx, which show similar or even slightly higher expression in mice. Genes such as Casz1, which act in late NG cells to promote rod specification, are indeed downregulated in 13LGS late NG cells relative to mice. We have modified the text to clarify these points, as shown below:
“To further characterize species-specific patterns of gene expression and regulation during postnatal photoreceptor development, we analyzed differential gene expression, chromatin accessibility, and motif enrichment across late-stage primary and neurogenic progenitors, immature photoreceptor precursors, rods, and cones. Due to their very low number before time point P17, we were unable to include 13LGS rods in the analysis.”
“In contrast, two broad patterns of differential expression of cone-promoting transcription factors were observed between mouse and 13LGS.”
“First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”
“Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and latestage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of conepromoting factors into later developmental stages.”
“In contrast, genes such as Casz1, which act in late neurogenic RPCs to promote rod specification, are downregulated in 13LGS late neurogenic RPCs relative to mice.”
(2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATACseq data at their disposal, the authors should compare early vs late born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in the 13LGS. This analysis will answer the important question of whether the enhancers activated/evolved to promote all cones, or are only and specifically activated within late RPCs to drive cone genesis at the expense of rods.
This is an excellent question. We have addressed this question by analyzing both expression of the cone-promoting genes identified in C2 and C3 in Figure 2C and accessibility of their associated enhancer sequences, which are shown in Figure 6B, in early and late-stage RPCs and cone precursors. The results are shown in Author response image 1 below. We observe that cone-promoting genes consistently show higher expression in both late-stage RPCs and cones. We do not observe any clear differences in the accessibility of the associated enhancer regions, as determined by snATAC-Seq. However, since we have not performed CUT&RUN analysis in embryonic retina for H3K27Ac or any other marker of active enhancer elements, we cannot determine whether the total number of active enhancers differs between early and late-stage RPCs. We suspect, however, this is likely to be the case, given the differences in the expression levels of these genes.
Author response image 1.
Relative expression levels of cone-promoting genes and accessibility of enhancer elements associated with these genes in early- and late-stage RPCs and cone precursors.
(3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS.
Novel enhancer sequences here are defined as having divergent sequences rather than simply divergent activity. This point has been clarified in the text, with the following changes made:
“However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”
“...demonstrated far greater motif enrichment in active regulatory elements in 13LGS than in mice, though few of these elements mapped to sequences that were shared between 13LGS and mouse (Fig. 5C,D, Table ST10).”
(4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact, this is only the conepromoting enhancers of late 13LGS RPCs. In a way, this is not surprising since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest this mechanism of enhancer number is specific to cone production in the 13LGS- it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similar increased predicted binding of TFs near rodpromoting genes in C7-8.
The Reviewer’s point is well taken, and we agree that this mechanism is unlikely to be specific to cone photoreceptors, since we are simply looking at genes that show higher expression in late-stage neurogenic RPCs in 13LGS. We have changed the relevant text to now state:
“Enhancer elements associated with cone-specific genes in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors in late-stage neurogenic RPCs than in mice, as might be expected, given the higher expression levels of these genes.”
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) Minor: Clusters C1-C8 (Figure 2) are labeled as "C1-8" in the text but "G1-8" in the figure.
This has been done.
(2) Minor: Showing other neurogenic factors (Olig2, Ascl1, Otx2) and late-stage specific factors (Lhx2, Sox8, Nfia/b) could be shown in Figure 2 to better support the text.
This has been done. These motifs are consistent in both species, but Figure 2F shows differential motifs. The reference to Figure 2F has been altered to include Table ST4, while Neurod1 motifs are shown in Fig. 2F.
Reviewer #2 (Recommendations for the authors):
(1) Figure 2
2A-B: The exclusion of early-stage data from the species-integrated analysis is puzzling, as it could reveal significant differences between early-stage neurogenic progenitors in mice and late-stage progenitors in 13LGS that both give rise to cones. This analysis would also shed light on how cone-promoting transcription factors are suppressed in mouse early-stage progenitors, limiting the window for cone genesis.
2C: The figure labels G1-8, while C1-8 are referenced in the text.
2F: Neurog2, Olig2, Ascl1, and Neurod1 are mentioned in the text but not labeled in the figure.
2A-B: There are indeed substantial differences between early-stage RPC in 13LGS and latestage RPC in mice that are broadly linked to control of temporal patterning, which are mentioned in the text. For instance, early-stage RPCs in both animals express higher levels of Nr2f1/2, Meis1/2, and Foxp1/4, while late-stage RPCs express higher levels of Nfia/b/x, indicating that core distinction between early- and late-stage RPCs is maintained. What most clearly differs in 13-LGS is the sustained expression of a subset of cone-promoting transcription factors in late-stage RPCs that are normally restricted to early-stage RPCs in mice. However, as mentioned in response to Reviewer #3’s first point, we do observe some evidence for increased expression of cone-promoting transcription factors in early-stage RPCs and NG cells of 13LGS relative to mice, although this is much less dramatic than observed at later stages. We have modified the text to directly mention this point. G1-8 has been corrected to C1-8 in the figure, a reference to Table ST4 has been added in discussion of neurogenic bHLH factors, and Fig. 2F has been modified to label Neurod1.
“First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”
“Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and latestage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of conepromoting factors into later developmental stages.”
(2) Figure 3
In 3F, the cone density in the WT retina is approximately 0.25 cones per micron, while in the Zic3 cKO retina, it is about 0.2 cones per micron. However, the WT control in Figure S6C also shows about 0.2 cones per micron, raising questions about whether there is a genuine decrease in cone number or if it results from quantification variability. Additionally, the proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, which is inconsistent with the conclusion that Zic3 cKO leads to reduced cone production. Therefore, I found that the conclusion that Zic3 is necessary for cone development is not supported by the data.
The cone density counts in the two mutant lines and accompanying littermate controls were collected by blinded counting by two different observers (R.A. for the Zic3 cKO and N.P. for the Mef2c cKO). We believe that the ~20% difference in the observed cone density in the two control samples likely represents investigator-dependent differences. These can exceed 20% between even highly skilled observers when quantifying dissociated cells (PMID: 35198419) and are likely to be even higher for immunohistochemistry samples. Since both controls were done in parallel with littermate mutant samples, we therefore stand by our interpretation of these results.
(3) Figures 4 and 5
These figures are duplicates. In Figure 4, Mef2C overexpression in postnatal progenitors leads to increased numbers of neurogenic RPCs, suggesting it may promote cell proliferation rather than inhibit rod cell fate or promote cone cell fate. Electroporation of plasmids into P0 retina typically does not label cone cells, as cones are born prenatally in mice. Given the widespread GFP signal in Figure 4D, the authors should consider that the high background of GFP signal may have misled the quantification of the result.
The figure duplication has been corrected. We respectfully disagree with the Reviewer’s statement that ex vivo electroporation performed at P0, as is the case here, does not label cones. We routinely observe small numbers of electroporated cones when performing this analysis. Cones at this age are located on the scleral face of the retina at this age and therefore in direct contact with the buffer solution containing the plasmid in question (c.f. PMID: 20729845, 31128945, 34788628, 40654906). Furthermore, since the level of GFP expression that is used to gate electroporated cells for isolation using FACS is typically considerably less than that used to identify a GFP-positive cell using standard immunohistochemical techniques, making it difficult to directly compare the efficiency of cone electroporation between these approaches. We agree, however, that Mef2c overexpression seems to broadly delay the differentiation of rod photoreceptors, and have modified the text to include discussion of this point.
“Although a few GFP-positive electroporated cells co-expressing the cone-specific marker Gnat2 were detected in control (likely due to the electroporation of cone precursors, which we have previously observed in P0 retinal explants (Clark et al., 2019; Leavey et al., 2025; Lyu et al., 2021; Onishi et al., 2010)), there was a significant increase in double-positive cells in the test condition, matching the novel cone-like precursor population found in the scRNA-Seq (Fig. 4E).”
“Indeed, overexpression of Mef2c increased the number of both neurogenic RPCs and immature photoreceptor precursors, suggesting that rod differentiation was broadly delayed.”
(4) Figure S2
The figure legend lacks information about panels A and B. It is unclear which panels represent immunohistochemistry and which represent RNA hybridization chain reaction. Overall, the staining results are difficult to interpret, as it appears that all examined RNAs/proteins are positively stained across the sections with varying background levels. Specificity is hard to assess. For instance, in Figure S2B, the background intensity of Zic3 staining varies inconsistently from P1 to P24. The number of Zic3 mRNA dots seems to peak at P5 and decrease at P10, which contradicts the scRNA-seq results showing peak expression in mature cones.
The figure legend has been corrected. Negative controls are now included for both in situ hybridization (Fig. S2C’) and immunostaining (Fig. S2G) at P24, along with paired experimental data. We have quantified the total fraction of Otx2+ cells that also contain Zic3 foci, and find that coexpression peaks at P5 and P10. This is now included as Fig. S2E.
The number of Zic3 foci is in fact higher at P5 than P10, with XX foci/Otx2+ cell at P5 vs. YY foci/Otx2+ cell at P10.
“Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”
(5) Figure S3
In S3A and S3B, the UMAPs of the empty vector-treated groups are distinctly different. The same goes for Zic3+Pou2F1 UMAPS.
In S3A, Zic3 overexpression alone does not appear to have any impact on cell fate. It is not evident that Zic3, even in combination with Pou2F1, has any significant impact on cone or other cell type production, as the proportions of the cones and cone precursors seem similar across different groups.
In S3B, Zic3+Pou2F1 seems to increase HC-like precursors without increasing cone-like procursors or cones.
Moreover, the cone-like precursors described do not seem to contribute to cone generation, as there is no increase in cones in the adult mouse retina; rather, these cells resemble rod-cone mosaic cells with expression of both rod- and cone-specific genes.
As the Reviewer states, we observe some differences in the proportion of cell types in both control and experimental conditions between the two experiments. Notably, relatively more photoreceptors and correspondingly fewer progenitors, bipolar, and amacrine cells are observed in the samples shown in Fig. S3A relative to Fig. S3B. However, these represent two independent experiments. Cell type proportions seen across independent ex vivo electroporation experiments such as these can be affected by a number of variables, including precise developmental age of the samples, electroporation efficiency, cell dissociation conditions, and ex vivo growth conditions. Some differences are inevitable, which is why paired negative controls must always be done for results to be interpretable.
In both experiments, we observe that overexpression of Zic3, Pou2f1, and most notably Zic3 and Pou2f1 lead to an increase in the relative fraction of cone-like precursors. In the experiment shown in Fig. S3B, we also observe that Zic3 alone, Onecut1 alone, and Zic3 and Pou2f1 in combination also promote generation of horizontal-like cells. All treatments likewise induce expression of different subsets of cone-enriched genes in the cone-like precursors, while also suppressing rod-specific genes in these same cells.
Total numbers and relative fractions of each cell type are now included in Table ST5.
(6) Figure S4
The proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, contradicting the conclusion that Zic3 cKO leads to reduced cone production.
Total numbers and relative fractions of each cell type are now included in Table ST6.
(7) Figure S5
In Figure S5A, Mef2C overexpression does not decrease expression of the rod gene Nrl.
This is correct, and is mentioned in the text.
“No obvious reduction in the relative number of Nrl-positive cells was observed (Fig. S5A).”
Reviewer #3 (Recommendations for the authors):
(1) The authors make several broad and definitive statements that have the potential to confuse readers. In the first sections of Results: 'retinal ganglion cells and amacrine cells were generated predominantly by early stage progenitors' but later say 'late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early born cell types.' In the discussion, the authors themselves point out limitations of analyses without birthdating. These definitive statements should be qualified/amended.
Both single-cell RNA and ATAC-Seq analysis can be used to accurately profile cells that have recently exited mitosis and committed to a specific cell fate. When applied to data obtained from a developmental timecourse such as is the case here, this can in turn serve as a reasonable proxy for generating birthdating data. Nonetheless, we have modified the text to state that BrdU/EdU labeling is indeed the gold standard for drawing conclusions about cell birthdates, and should be used to confirm these findings in future studies.
“The expected temporal patterns of neurogenesis were observed in both species: retinal ganglion cells and amacrine cells were generated predominantly in the early stage, whereas bipolar cells and Müller glia were produced in the late stage.”
“Though BrdU/EdU labeling would be required to unambiguously demonstrate species-specific differences in birthdating, our findings strongly indicate that 13LGS exhibit a selective expansion of the temporal window of cone generation, extending into late stages of neurogenesis.”
This sentence does not make a definitive statement about 13LGS RPC competence, and we have left it unaltered.
“These findings suggest that late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early-born cell types…”
(2) Figure 2C clusters are referred to as C1-8 in the text but G1-8 in the figure. This is confusing and should be fixed.
This has been corrected.
(3) The authors refer to many genes that show differential expression in Figure 2F, but virtually none of these are labelled in the heatmap, making it hard to follow the narrative.
Figure 2F represents transcription factor binding motifs that are differentially active between mouse and 13LGS, not gene expression. We have modified the figure to include names of all differentially active motifs discussed in the text, and otherwise refer the reader to Table ST4, which includes a list of all differentially expressed genes.
Reviewer #3 (Public review):
Summary:
This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.
Summary of Revision 1. This is a Valuable study supported by Solid evidence. The revisions meaningfully strengthen the manuscript by clarifying methods, improving transparency, and refining presentation. The work provides useful conceptual and methodological advances for understanding CGM-derived glucose dynamics and their possible relationship to cardiovascular pathology.
Strengths:
The authors have provided a much clearer exposition of how each glycemic component was defined and validated across cohorts. The revised manuscript now includes explicit pairwise correlations, clarified p- and q-value reporting, and better visualization of key associations between CGM indices and %NC. The justification for LASSO and PLS use is now well explained, and additional details on cohort timing relative to PCI, validation dataset structure, and statistical robustness (e.g., VIP stability with covariates) address prior concerns. The inclusion of precise factor definitions and clearer graphics notably improves interpretability.
Limitations:
Some limitations remain inherent to the study design, including the modest primary sample size, reliance on retrospective data, and differences between validation datasets in outcome ascertainment. However, these are now acknowledged more openly.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
We appreciate the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly.
The reviewer’s comments in this letter are in Bold and Italics.
Summary:
This study identified three independent components of glucose dynamics-"value," "variability," and "autocorrelation", and reported important findings indicating that they play an important role in predicting coronary plaque vulnerability. Although the generalizability of the results needs further investigation due to the limited sample size and validation cohort limitations, this study makes several notable contributions: validation of autocorrelation as a new clinical indicator, theoretical support through mathematical modeling, and development of a web application for practical implementation. These contributions are likely to attract broad interest from researchers in both diabetology and cardiology and may suggest the potential for a new approach to glucose monitoring that goes beyond conventional glycemic control indicators in clinical practice.
Strengths:
The most notable strength of this study is the identification of three independent elements in glycemic dynamics: value, variability, and autocorrelation. In particular, the metric of autocorrelation, which has not been captured by conventional glycemic control indices, may bring a new perspective for understanding glycemic dynamics. In terms of methodological aspects, the study uses an analytical approach combining various statistical methods such as factor analysis, LASSO, and PLS regression, and enhances the reliability of results through theoretical validation using mathematical models and validation in other cohorts. In addition, the practical aspect of the research results, such as the development of a Web application, is also an important contribution to clinical implementation.
We appreciate reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.
Weaknesses:
The most significant weakness of this study is the relatively small sample size of 53 study subjects. This sample size limitation leads to a lack of statistical power, especially in subgroup analyses, and to limitations in the assessment of rare events.
We appreciate the reviewer’s concern regarding the sample size. We acknowledge that a larger sample size would increase statistical power, especially for subgroup analyses and the assessment of rare events.
We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.
Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7 (n=57); Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS Biology 16.7 (2018): e2005143 (n=57); and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18 (n=32).
Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance, and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective.
Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.
However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
We appreciate the reviewer’s feedback and believe that these clarifications improve the manuscript.
In terms of validation, several challenges exist, including geographical and ethnic biases in the validation cohorts, lack of long-term follow-up data, and insufficient validation across different clinical settings. In terms of data representativeness, limiting factors include the inclusion of only subjects with well-controlled serum cholesterol and blood pressure and the use of only short-term measurement data.
We appreciate the reviewer’s comment regarding the challenges associated with validation. In terms of geographic and ethnic diversity, our study includes validation datasets from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These datasets include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. In addition, we recognize the limited availability of publicly available datasets with sufficient sample sizes for factor decomposition that include both healthy individuals and those with type 2 diabetes (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). The main publicly available datasets with relevant clinical characteristics have already been analyzed in this study using unbiased approaches.
However, we fully agree with the reviewer that expanding the geographic and ethnic scope, incorporating long-term follow-up data, and validating our approach in different clinical settings would further strengthen the robustness and generalizability of our findings. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
Regarding the validation considerations, we have added the following sentences to the Discussion section (lines 409-414, 354-361):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.
In terms of elucidation of physical mechanisms, the study is not sufficient to elucidate the mechanisms linking autocorrelation and clinical outcomes or to verify them at the cellular or molecular level.
We appreciate the reviewer’s point regarding the need for further elucidation of the physical mechanisms linking glucose autocorrelation to clinical outcomes. We fully agree with the reviewer that the detailed molecular and cellular mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.
However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes. While further research at the cellular and molecular level is needed to fully validate these findings, it is important to note that the primary goal of this study was to analyze the characteristics of glucose dynamics and gain new insights into metabolism, rather than to perform molecular biology experiments.
Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.
Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.
While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations into the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352):
This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.
Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.
Reviewer #2 (Public review):
We appreciate the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in Bold and Italics.
Sugimoto et al. explore the relationship between glucose dynamics - specifically value, variability, and autocorrelation - and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.
Strengths:
(1) The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.
(2) Inclusion of datasets from diverse regions enhances generalizability.
(3) The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.
(4) The focus on CGM-derived indices aligns with personalized medicine trends, showcasing the potential for CAD risk stratification.
We appreciate reviewer #2 for the positive assessment and for the valuable and constructive comments on our manuscript.
Weaknesses:
(1) The link between autocorrelation and plaque vulnerability remains speculative without a proposed biological explanation.
We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. We fully agree with the reviewer that the detailed biological mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.
However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes.
Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.
Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.
While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations into the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352):
This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.
Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.
(2) The relatively small sample size (n=270) limits statistical power, especially when stratified by glucose tolerance levels.
We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.
We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.
Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7 (n=57); Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS Biology 16.7 (2018): e2005143 (n=57); and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18 (n=32).
Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.
However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
(3) Strict participant selection criteria may reduce applicability to broader populations.
We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.
Our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. However, we acknowledge that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.
(4) CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines.
We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We acknowledge that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.
To address this concern, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability (https://cgmregressionapp2.streamlit.app/). This tool eliminates the need for manual calculations, making these indices more practical for clinical implementation.
Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices such as p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines; establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.
To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.
We have added the following text to the Discussion section to address these considerations (lines 415-419):
While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.
(5) The study does not compare CGM-derived indices to existing advanced CAD risk models, limiting the ability to assess their true predictive superiority.
We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.
Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters - CGM_Mean, CGM_Std, and AC_Var - achieved a substantially improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy.
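For readers who wish to reproduce this kind of model comparison, a minimal sketch follows. The data are synthetic, the coefficients are illustrative assumptions, and the AIC uses one common convention (n·ln(RSS/n) + 2k), so absolute values will not match those reported above; only the direction of the comparison is meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_r2_aic(X, y):
    """OLS fit; returns R-squared and a Gaussian-likelihood AIC.
    AIC convention: n*ln(RSS/n) + 2k (k includes the intercept);
    published AIC values may differ by an additive constant."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    n, k = len(y), X1.shape[1]
    return 1 - rss / tss, n * np.log(rss / n) + 2 * k

# Hypothetical illustration: an outcome partly driven by three CGM features.
cgm = rng.normal(size=(53, 3))  # stand-ins for CGM_Mean, CGM_Std, AC_Var
nc = cgm @ np.array([0.4, 0.3, 0.3]) + rng.normal(scale=0.8, size=53)
score = rng.normal(size=53)     # an unrelated risk score for comparison
r2_cgm, aic_cgm = fit_r2_aic(cgm, nc)
r2_score, aic_score = fit_r2_aic(score.reshape(-1, 1), nc)
print(r2_cgm > r2_score, aic_cgm < aic_score)
```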
We have added the following text to the Result section (lines 115-122):
The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).
(6) Varying CGM sampling intervals (5-minute vs. 15-minute) were not thoroughly analyzed for impact on results.
We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Furthermore, the regression model using CGM_Mean, CGM_Std, and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321, identical to the model using 5-minute intervals. These results indicate that our findings are robust to variations in CGM sampling frequency.
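As an illustration of the downsampling procedure, the sketch below takes every third 5-minute sample to obtain a 15-minute series and recomputes an autocorrelation-variance index. The AC_Var definition used here (variance of the autocorrelation coefficients over lags 1-30), the lag range, and the synthetic glucose trace are all assumptions for illustration:

```python
import numpy as np

def ac_var(glucose, max_lag=30):
    """Variance of the autocorrelation coefficients over lags 1..max_lag
    (AC_Var definition assumed here; the lag range is illustrative)."""
    x = np.asarray(glucose, dtype=float)
    x = x - x.mean()
    denom = float(x @ x)
    acf = np.array([x[:-k] @ x[k:] / denom for k in range(1, max_lag + 1)])
    return float(acf.var())

# Hypothetical one-day 5-minute trace; every 3rd sample gives 15-min spacing.
rng = np.random.default_rng(1)
t = np.arange(0, 24 * 60, 5)                        # minutes, 5-min grid
trace5 = 120 + 30 * np.sin(2 * np.pi * t / 300) + rng.normal(0, 5, t.size)
trace15 = trace5[::3]                               # downsampled to 15 min
# max_lag=10 keeps the same 150-minute lag span as 30 lags at 5 minutes.
print(ac_var(trace5), ac_var(trace15, max_lag=10))
```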
We have added this analysis to the Result section (lines 122-125):
The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15-minute features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).
Reviewer #3 (Public review):
We appreciate the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in Bold and Italics.
Summary:
This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.
We appreciate reviewer #3 for the valuable and constructive comments on our manuscript.
The goal of this study was to identify CGM features that relate to %NC. Through multiple feature selection methods, they arrive at 3 components: value, variability, and autocorrelation. While the feature list is highly correlated, the authors take steps to ensure feature selection is robust. There is a lack of clarity of what each component (value, variability, and autocorrelation) includes as while similar CGM indices fall within each component, there appear to be some indices that appear as relevant to value in one dataset and to variability in the validation.
We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.
For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components (value, variability, and autocorrelation). Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.
To address these considerations, we have added the following text to the Discussion section (lines 388-396):
Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.
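To make this threshold dependence concrete, the following minimal sketch implements the standard HBGI definition (based on the Kovatchev symmetrization of the glucose scale); the two traces are hypothetical illustrations, not study data:

```python
import numpy as np

def hbgi(glucose_mg_dl):
    """High Blood Glucose Index: only readings whose symmetrized risk is
    positive (glucose above roughly 112.5 mg/dL) contribute to the score."""
    bg = np.asarray(glucose_mg_dl, dtype=float)
    f = 1.509 * (np.log(bg) ** 1.084 - 5.381)  # symmetrized glucose scale
    return float(np.mean(10 * np.maximum(f, 0) ** 2))

# Hypothetical traces: identical variability, different mean levels
low = np.array([90.0, 100.0, 110.0, 100.0, 90.0])   # mostly below threshold
high = low + 80                                      # shifted above threshold
```

For `low`, no reading crosses the threshold, so HBGI stays at zero and only larger excursions (i.e. variability) would register; for `high`, most readings contribute, so HBGI tracks the mean. This mirrors why HBGI can load on the variance component in one cohort and the mean component in another.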
We are sceptical about statements of significance without documentation of p-values.
We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.
First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
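For readers unfamiliar with q values, the Benjamini–Hochberg adjustment underlying them can be sketched as follows (a generic illustration, not the exact implementation used in the study):

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p values (q values): each sorted p value
    is scaled by m / rank, then made monotone non-decreasing in p."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p value downward
    q = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty_like(q)
    out[order] = np.clip(q, 0.0, 1.0)
    return out
```

Controlling the false discovery rate this way is less conservative than Bonferroni correction while still guarding against false positives under multiple testing.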
Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and that authors “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero, a method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. "The correct interpretation of confidence intervals." Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).
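The confidence-interval-based significance check can be sketched with the Fisher z transform (an illustrative implementation under the usual normal approximation; the function names are ours):

```python
import numpy as np

def pearson_ci(r, n, z_crit=1.959964):
    """95% CI for a Pearson correlation via the Fisher z transform."""
    z = np.arctanh(r)                 # Fisher transform of r
    se = 1.0 / np.sqrt(n - 3)         # approximate standard error
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

def excludes_zero(r, n):
    """True if the 95% CI for r does not contain zero."""
    lo, hi = pearson_ci(r, n)
    return lo > 0 or hi < 0
```

For example, with n = 53 a correlation of r = 0.4 yields a CI entirely above zero, whereas r = 0.1 does not.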
For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively.
We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.
While hesitations remain, the ability of these authors to find groupings of these many CGM metrics in relation to %NC is of interest. The believability of the associations is impeded by an obtuse presentation of the results, with core data (i.e. correlation plots between CGM metrics and %NC) buried in the supplement while the main figures contain plots of numerical estimates from models, which would be more usefully presented in supplementary tables.
We appreciate the reviewer’s comment regarding the presentation of our results and recognize the importance of ensuring clarity and accessibility of the core data.
The central finding of our study is twofold: first, that the numerous CGM-derived measures can be systematically classified into three distinct components (mean, variance, and autocorrelation), and second, that each of these components is independently associated with %NC. This insight cannot be derived simply from examining scatter plots of individual correlations, which are provided in the Supplementary Figures. Instead, it emerges from our statistical analyses in the main figures, including multiple regression models that reveal the independent contributions of these components to %NC.
We acknowledge the reviewer’s concern regarding the accessibility of key data. To improve clarity, we have moved several scatter plots from the Supplementary Figures to the main figures (Fig. 1D-J) to allow readers to more directly visualize the relationships between CGM-derived measures and %NC. We believe this revision improved the transparency and readability of our results while maintaining the rigor of our analytical approach.
Given the small sample size in the primary analysis, there is a lot of modeling done with parameters estimated where simpler measures would serve and be more convincing as they require less data manipulation. A major example of this is that the pairwise correlation/covariance between CGM_mean, CGM_std, and AC_var is not shown and would be much more compelling in the claim that these are independent factors.
We appreciate the reviewer’s feedback on our statistical analysis and data presentation. The correlations between CGM_Mean, CGM_Std, and AC_Var were documented in Figure S1B. However, to improve accessibility and clarity, we have moved these correlation analyses to the main figures (Fig. 1F).
Regarding our modeling approach, we chose LASSO and PLS methods because they are well-established techniques that are particularly suited to scenarios with many input variables and a relatively small sample size. These methods have been used in the literature as robust approaches for variable selection under such conditions (Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc 58:267–288; Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometrics Intellig Lab Syst 58:109–130; Pei X, Qi D, Liu J, Si H, Huang S, Zou S, Lu D, Li Z. 2023. Screening marker genes of type 2 diabetes mellitus in mouse lacrimal gland by LASSO regression. Sci Rep 13:6862; Wang C, Kong H, Guan Y, Yang J, Gu J, Yang S, Xu G. 2005. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal Chem 77:4108–4116).
Lack of methodological detail is another challenge. For example, the time period of CGM metrics or CGM placement in the primary study in relation to the IVUS-derived measurements of coronary plaques is unclear. Are they temporally distant or proximal/ concurrent with the PCI?
We appreciate the reviewer’s important question regarding the temporal relationship between CGM measurements and IVUS-derived plaque assessments. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all individuals underwent continuous glucose monitoring for at least three consecutive days within the seven-day period prior to the PCI procedure. To improve clarity for readers, we have added the following text to the Methods section (lines 440-441):
All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.
A patient undergoing PCI for coronary intervention would be expected to have physiological and iatrogenic glycemic disturbances that do not reflect their baseline state. This is not considered or discussed.
We appreciate the reviewer’s concern regarding potential glycemic disturbances associated with PCI. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all CGM measurements were performed before the PCI procedure. This temporal separation ensures that the glycemic patterns analyzed in our study reflect the baseline metabolic state of the patients, rather than any physiological or iatrogenic effects of PCI. To avoid any misunderstanding, we have clarified this temporal relationship in the revised manuscript (lines 440-441):
All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.
The attempts at validation in external cohorts (Japanese, American, and Chinese) are very poorly detailed. We could find an attempt to examine cardiovascular parameters only in the Chinese dataset, but the outcome variables are unspecified with regard to which macrovascular events are included, their temporal relation to the CGM metrics, etc. Notably, macrovascular event diagnoses are very different from the coronary plaque necrosis quantification. This could be a source of strength in the findings if carefully investigated and detailed, but due to the lack of detail it seems like an apples-to-oranges comparison.
We appreciate the reviewer’s comment regarding the validation cohorts and the need for greater clarity, particularly in the Chinese dataset. We acknowledge that our initial description lacked sufficient methodological detail, and we have expanded the Methods section to provide a more comprehensive explanation.
For the Chinese dataset, the data collection protocol was previously documented (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. Physical examinations included anthropometric measurements, and body mass index was calculated using standard protocols. CGM was performed using the FreeStyle Libre H device (Abbott Diabetes Care, UK), which records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events (Xie, Yong, et al. “Clinical outcome of nonculprit plaque ruptures in patients with acute coronary syndrome in the PROSPECT study.” JACC: Cardiovascular Imaging 7.4 (2014): 397-405.), we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than standardized diagnostic procedures or imaging studies. To address these concerns, we have added the following text to the Methods section (lines 496-504):
The data collection protocol for the Chinese dataset was previously documented (Zhao et al., 2023). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. CGM records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events, we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than from standardized diagnostic procedures or imaging studies.
Finally, the simulations at the end are not relevant to the main claims of the paper and we would recommend removing them for the coherence of this manuscript.
We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variability, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. Because temporal autocorrelation can be conceptually difficult to interpret, these visualizations were intended to provide intuitive examples for the readers.
However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246 and 502-526), while retaining only those that enhance understanding of the three glycemic components. Regarding reviewer 2’s minor comment #4, we acknowledge that autocorrelation can be challenging to understand intuitively. To address this, we kept Fig. 4A with a brief description.
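The point that traces can share mean and variance while differing in autocorrelation can be illustrated with a toy simulation (not the manuscript's actual simulation code): an AR(1) series and a random shuffle of the same values have identical value distributions but very different temporal structure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# AR(1) process: each value depends strongly on the previous one
ar = np.empty(n)
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

# A permutation of the same values: identical mean and variance,
# but the temporal dependence is destroyed
shuffled = rng.permutation(ar)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    d = x - x.mean()
    return float((d[:-1] * d[1:]).sum() / (d ** 2).sum())
```

Conventional summary measures computed from the value distribution alone cannot distinguish `ar` from `shuffled`; an autocorrelation-based measure can.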
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
Summary:
The study by Sugimoto et al. investigates the association between components of glucose dynamics (value, variability, and autocorrelation) and coronary plaque vulnerability (%NC) in patients with varying glucose tolerance levels. The research identifies three key factors that independently predict %NC and highlights the potential of continuous glucose monitoring (CGM)-derived indices in risk assessment for coronary artery disease (CAD). Using robust statistical methods and validation across diverse populations, the study emphasizes the limitations of conventional diagnostic markers and suggests a novel, CGM-based approach for improved predictive performance. While the study demonstrates significant novelty and potential impact, several issues must be addressed by the authors.
Major Comments:
(1) The study demonstrates originality by introducing autocorrelation as a novel predictive factor in glucose dynamics, a perspective rarely explored in prior research. While the innovation is commendable, the biological mechanisms linking autocorrelation to plaque vulnerability remain speculative. Providing a hypothesis or potential pathways would enhance the scientific impact and practical relevance of this finding.
We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. Our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. We have added the following sentences to the Discussion section (lines 341-352):
Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.
(2) The inclusion of datasets from Japan, America, and China adds a valuable cross-cultural dimension to the study, showcasing its potential applicability across diverse populations. Despite the multi-regional validation, the sample size (n=270) is relatively small, especially when stratified by glucose tolerance categories. This limits the statistical power and applicability to diverse populations. A larger, multi-center cohort would strengthen conclusions.
We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.
We would like to clarify several points regarding the statistical power and validation of our findings. Our study adheres to established methodological frameworks for sample size determination, including the guidelines outlined by Bostely Muyembe Asenahabi and Peters Anselemo Ikoha (“Scientific research sample size determination,” 2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4 indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section.
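The stated minimum of 47 participants can be reproduced with the standard Fisher z approximation for correlation power analysis (a sketch; the exact software used for the original calculation is not specified here):

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Minimum sample size to detect a Pearson correlation r (two-sided test),
    via the Fisher z approximation: n = ((z_{1-a/2} + z_{power}) / atanh(r))^2 + 3."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)
```

With r = 0.4, alpha = 0.05, and power = 0.8, this yields 47, matching the threshold cited in the response.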
Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32). Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.
Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components. Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors (mean, variance, and autocorrelation), whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective.
However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring.” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
To address the sample size considerations, we have added the following sentences to the Discussion section (lines 409-414):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
(3) The study focuses on a well-characterized cohort with controlled cholesterol and blood pressure levels, reducing confounding variables. However, this stringent selection might exclude individuals with significant variability in these parameters, potentially limiting the study's applicability to broader, real-world populations. The authors should discuss how this may affect generalizability and potential bias in the results.
We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.
Our validation strategy included multiple cohorts from different regions, specifically 64 Japanese, 53 American and 100 Chinese individuals. These cohorts represent a clinically diverse population, including both healthy individuals and those with diabetes, allowing for validation across a broad spectrum of metabolic conditions. However, we recognize that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.
We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):
Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.
Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.
(4) The study effectively highlights the potential of CGM-derived indices as a tool for CAD risk assessment, a concept that aligns with contemporary advancements in personalized medicine. Despite its potential, the complexity of CGM-derived indices like AC_Var and ADRR may hinder their routine clinical adoption. Providing simplified models or actionable guidelines would facilitate their integration into everyday practice.
We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We recognize that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.
To address this, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability. By eliminating the need for manual calculations, this tool streamlines the process and makes these indices more practical for clinical use.
Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.
To address this limitation, we conducted a large follow-up study of over 8,000 individuals in the next study (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.
We have added the following text to the Discussion section to address these considerations (lines 415-419):
While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.
(5) The exclusion of TIR from the main analysis is noted, but its relevance in diabetes management warrants further exploration. Integrating TIR as an outcome measure could provide additional clinical insights.
We appreciate the reviewer’s comment regarding the potential role of time in range (TIR) as an outcome measure in our study. Because TIR is primarily influenced by the mean and variance of glucose levels, it does not fully capture the distinct role of glucose autocorrelation, which was the focus of our investigation.
To clarify this point, we have expanded the Discussion section as follows (lines 380-388):
Although time in range (TIR) was not included in the main analyses due to the relatively small number of T2DM patients and the predominance of participants with TIR >70%, our results demonstrate that CGM-derived indices outperformed conventional markers such as FBG, HbA1c, and PG120 in predicting %NC. Furthermore, multiple regression analysis between factor scores and TIR revealed that only factor 1 (mean) and factor 2 (variance) were significantly associated with TIR (Fig. S8C, D). This finding confirms the presence of three distinct components in glucose dynamics and highlights the added value of examining AC_Var as an independent glycemic feature beyond conventional CGM-derived measures.
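To illustrate why TIR cannot capture autocorrelation: TIR depends only on the distribution of glucose values, not their temporal order, so reordering a trace leaves TIR unchanged while altering its autocorrelation. A minimal sketch (hypothetical traces, consensus 70–180 mg/dL range):

```python
import numpy as np

def time_in_range(glucose_mg_dl, lo=70.0, hi=180.0):
    """Percent of CGM readings inside the consensus 70-180 mg/dL target range."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    return float(100.0 * np.mean((g >= lo) & (g <= hi)))

# Same multiset of values in a different order: TIR is identical,
# but the temporal structure (autocorrelation) differs.
trace = np.array([80.0, 120.0, 160.0, 120.0, 80.0, 120.0, 160.0, 120.0])
shuffled = np.array([160.0, 80.0, 120.0, 120.0, 160.0, 120.0, 80.0, 120.0])
```

This order-invariance is one reason an autocorrelation measure such as AC_Var adds information beyond TIR and other distribution-based indices.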
(6) While the study reflects a commitment to understanding CAD risks in a global context by including datasets from Japan, America, and China, the authors should provide demographic details (e.g., age, gender, socioeconomic status) and discuss how these factors might influence glucose dynamics and coronary plaque vulnerability.
We appreciate the reviewer’s comment regarding the potential influence of demographic factors on glucose dynamics and coronary plaque vulnerability. We examined these relationships and found that age and sex had minimal effects on glucose dynamics characteristics, as shown in Figure S8A and S8B. These findings suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust across demographic groups within our data set.
To address the reviewer’s suggestion, we have added the following discussion (lines 361-368):
In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.
(7) While the article shows CGM-derived indices outperform traditional markers (e.g., HbA1c, FBG, PG120), it does not compare these indices against existing advanced risk models (e.g., Framingham Risk Score for CAD). A direct comparison would strengthen the claim of superiority.
We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.
Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters (CGM_Mean, CGM_Std, and AC_Var) achieved a substantially improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. We have updated the Results section as follows (lines 115-122):
The regression model including CGM_Mean, CGM_Std, and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).
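For illustration, a comparison of this kind can be computed with a small ordinary-least-squares helper. The sketch below uses synthetic data and placeholder variable names (not the study's dataset), and the Gaussian AIC formulation n·ln(RSS/n) + 2k, which is a common form up to an additive constant:

```python
import numpy as np

def ols_r2_aic(X, y):
    """Fit OLS by least squares; return R^2 and a Gaussian AIC
    (n*ln(RSS/n) + 2k, a common formulation up to an additive constant)."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    n, k = X1.shape
    return 1 - rss / tss, n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 100
# Synthetic predictors: one informative set, one noise-only set
X_cgm = rng.normal(size=(n, 3))    # stand-ins for CGM_Mean, CGM_Std, AC_Var
X_conv = rng.normal(size=(n, 3))   # stand-ins for FBG, HbA1c, PG120 (pure noise here)
pct_nc = X_cgm @ np.array([0.5, 0.4, 0.6]) + rng.normal(scale=1.0, size=n)

r2_cgm, aic_cgm = ols_r2_aic(X_cgm, pct_nc)
r2_conv, aic_conv = ols_r2_aic(X_conv, pct_nc)
# The informative model should show higher R^2 and lower (better) AIC
```

Lower AIC with higher R² is the pattern reported above for the CGM-based model relative to the conventional markers and the Framingham score.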
(8) The study mentions varying CGM sampling intervals across datasets (5-minute vs. 15-minute). Authors should employ sensitivity analysis to assess the impact of these differences on the results. This would help clarify whether higher-resolution data significantly improves predictive performance.
We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Consequently, the main findings remained consistent across both sampling frequencies, indicating that our results are robust to variations in temporal resolution. We have added this analysis to the Results section (lines 122-126):
The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15-minute features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).
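The downsampling procedure can be sketched as follows. The AC_Var formulation here (variance of the autocorrelation coefficients over lags 1-10) and the synthetic glucose trace are assumptions for illustration only; the paper's exact lag range and data may differ:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of series x at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[:-lag] @ x[lag:]) / float(x @ x)

def ac_var(x, max_lag=10):
    """Variance of autocorrelation coefficients over lags 1..max_lag
    (an assumed formulation of AC_Var for illustration)."""
    return float(np.var([autocorr(x, k) for k in range(1, max_lag + 1)]))

rng = np.random.default_rng(1)
# Synthetic 24 h of 5-minute readings: slow meal-like oscillation plus sensor noise
t = np.arange(288)
glucose_5min = 120 + 25 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 5, size=288)

# Downsample by keeping every third reading -> a 15-minute grid
glucose_15min = glucose_5min[::3]

v5, v15 = ac_var(glucose_5min), ac_var(glucose_15min)
```

In the study itself, AC_Var was computed per individual on each grid and the two sets of values were correlated across participants.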
(9) The identification of actionable components in glucose dynamics lays the groundwork for clinical stratification. The authors could explore the use of CGM-derived indices to develop a simple framework for stratifying risk into certain categories (e.g., low, moderate, high). This could improve clinical relevance and utility for healthcare providers.
We appreciate the reviewer’s suggestion regarding the potential for CGM-derived indices to support clinical stratification. We completely agree that establishing risk categories (e.g., low, moderate, high) based on specific thresholds would enhance the clinical utility of these measures. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices such as p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials and establishing clinical thresholds; establishing such thresholds typically requires the accumulation of multiple studies over many years. Recognizing this, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.
To address this limitation, we conducted a follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring.” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper. However, we expect to make these measures more actionable in clinical practice by integrating automated calculation tools with clear clinical thresholds.
We have added the following text to the Discussion section to address these considerations (lines 415-419):
While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.
(10) While the study acknowledges several limitations, authors should also consider explicitly addressing the potential impact of inter-individual variability in glucose metabolism (e.g., age-related changes, hormonal influences) on the findings.
We appreciate the reviewer’s comment regarding the potential impact of inter-individual variability in glucose metabolism, including age-related changes and hormonal influences, on our results. In our analysis, we found that age had minimal effects on glucose dynamics characteristics, as shown in Figure S8A. In addition, CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of insulin secretion (I.I.) and insulin sensitivity (Composite index) (Fig. 2). These results suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust despite individual differences in glucose metabolism.
To address the reviewer’s suggestion, we have added the following discussion (lines 186-188, 361-368):
Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.
In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.
(11) It's unclear whether the identified components (value, variability, and autocorrelation) could serve as proxies for underlying physiological mechanisms, such as beta-cell dysfunction or insulin resistance. Please clarify.
We appreciate the reviewer’s comment regarding the physiological underpinnings of the glucose components we identified. The mean, variance, and autocorrelation components we identified likely reflect specific underlying physiological mechanisms related to glucose regulation. In our previous research (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.), we explored the relationship between glucose dynamics characteristics and glucose control capabilities using clamp tests and mathematical modelling. These investigations revealed that autocorrelation specifically shows a significant correlation with the disposition index (the product of insulin sensitivity and insulin secretion) and insulin clearance parameters.
Furthermore, our current study demonstrates that CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of established metabolic parameters including insulin secretion (I.I.) and insulin sensitivity (Composite index), as shown in Figure 2. These results suggest that the components we identified capture distinct physiological aspects of glucose metabolism beyond traditional measures of beta-cell function and insulin sensitivity. Further research is needed to fully characterize these relationships, but our results imply that these characteristics of glucose dynamics offer supplementary insight into the underlying beta-cell dysregulation that contributes to coronary plaque vulnerability.
To address the reviewer’s suggestion, we have added the following discussion to the Result section (lines 186-188):
Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.
Minor Comments:
(1) The use of LASSO and PLS regression is appropriate, but the rationale for choosing these methods over others (e.g., Ridge regression) should be explained in greater detail.
We appreciate the reviewer’s comment and have added the following discussion to the Methods section (lines 578-585):
LASSO regression was chosen for its ability to perform feature selection by identifying the most relevant predictors. Unlike Ridge regression, which shrinks coefficients toward zero without setting any of them exactly to zero, LASSO produces sparse models, which is consistent with our goal of identifying the most critical features of glucose dynamics associated with coronary plaque vulnerability. In addition, we implemented PLS regression as a complementary approach because of its effectiveness in handling multicollinearity, which was particularly relevant given the high correlation among several CGM-derived measures.
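The sparsity contrast between the two penalties is easiest to see in the textbook orthonormal-design case, where both have closed-form solutions: Ridge rescales every ordinary-least-squares coefficient, while LASSO soft-thresholds them. The coefficient values below are invented for illustration, not taken from the study:

```python
import numpy as np

# Illustrative OLS coefficients for six candidate predictors; the small ones are noise
beta_ols = np.array([0.90, 0.55, 0.40, 0.05, -0.03, 0.02])
lam = 0.10  # penalty strength

# Ridge (orthonormal design): proportional shrinkage, never exactly zero
beta_ridge = beta_ols / (1 + lam)

# LASSO (orthonormal design): soft-thresholding sets weak coefficients exactly to zero
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# beta_ridge keeps all six predictors; beta_lasso drops the three weak ones
```

This is why LASSO, rather than Ridge, suits a goal of selecting a small set of critical glucose-dynamics features.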
(2) While figures are well-designed, adding annotations to highlight key findings (e.g., significant contributors in factor analysis) would improve clarity.
We appreciate the reviewer’s suggestion to improve the clarity of our figures. In the factor analysis, we decided not to include annotations because indicators such as ADRR and J-index can be associated with multiple factors, which could lead to misleading or confusing interpretations. However, in response to the suggestion, we have added annotations to the PLS analysis, specifically highlighting items with VIP values greater than 1 (Fig. 2D, S2D) to emphasize key contributors.
(3) The term "value" as a component of glucose dynamics could be clarified. For instance, does it strictly refer to mean glucose levels, or does it encompass other measures?
We appreciate the reviewer’s question regarding the term “value” in the context of glucose dynamics. Factor 1 was predominantly influenced by CGM_Mean, with a factor loading of 0.99, indicating that it primarily represents mean glucose levels. Given this strong correlation, we have renamed Factor 1 to “Mean” (Fig. 3A) to more accurately reflect its role in glucose dynamics.
(4) The concept of autocorrelation may be unfamiliar to some readers. A brief, intuitive explanation with a concrete example of how it manifests in glucose dynamics would enhance understanding.
We appreciate the reviewer’s suggestion. Autocorrelation refers to the relationship between a variable and its past values over time. In the context of glucose dynamics, it reflects how current glucose levels are influenced by past levels, capturing patterns such as sustained hyperglycemia or recurrent fluctuations. For example, if an individual experiences sustained high glucose levels after a meal, the strong correlation between successive glucose readings indicates high autocorrelation. We have included this explanation in the revised manuscript (lines 519-524) to improve clarity for readers unfamiliar with the concept. Additionally, Figure 4A shows an example of glucose dynamics with different autocorrelation.
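The intuition can also be made concrete numerically: two series containing exactly the same values (hence identical mean and variance) but in different temporal order have very different lag-1 autocorrelation. The glucose traces below are synthetic and purely illustrative:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation: correlation of the series with itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

rng = np.random.default_rng(42)
t = np.arange(200)
# Smooth post-meal-like excursion: successive readings resemble each other
smooth = 110 + 40 * np.exp(-((t - 60) / 30.0) ** 2) + rng.normal(0, 2, 200)

# The same values in random order: identical mean/variance, no temporal structure
shuffled = rng.permutation(smooth)

high = lag1_autocorr(smooth)    # close to 1: sustained excursion
low = lag1_autocorr(shuffled)   # near 0: fluctuation without persistence
```

This is the sense in which sustained hyperglycemia yields high autocorrelation even when overall variability is unchanged.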
(5) Ensure consistent use of terms like "glucose dynamics," "CGM-derived indices," and "plaque vulnerability." For instance, sometimes indices are referred to as "components," which might confuse readers unfamiliar with the field.
We appreciate the reviewer’s comment about ensuring consistency in terminology. To avoid confusion, we have reviewed and standardized the use of terms such as “CGM-derived indices” and “plaque vulnerability” throughout the manuscript. Additionally, while many of our measures are strictly CGM-derived indices, several “components” in our analysis include fasting blood glucose (FBG) and glucose waveforms during the OGTT. For these measures, we retained the descriptors “glucose dynamics” and “components” rather than relabelling them as CGM-derived indices.
(6) Provide a more detailed overview of the supplementary materials in the main text, highlighting their relevance to the key findings.
We appreciate the reviewer’s suggestion. We revised the manuscript by integrating the supplementary text into the main text (lines 129-160), which provides a clearer overview of the supplementary materials. Consequently, the Supplementary Information section now only contains supplementary figures, while their relevance and key details are described in the main text.
Reviewer #3 (Recommendations for the authors):
Other Concerns:
(1) The text states the significance of tests, however, no p-values are listed: Lines 118-119: Significance is cited between CGM indices and %NC, however, neither the text nor supplementary text have p-values. Need p-values for Figure 3C, Figure S10. When running the https://cgm-basedregression.streamlit.app/ multiple regression analysis, a p-value should be given as well. Do the VIP scores (Line 142) change with the inclusion of SBP, DBP, TG, LDL, and HDL? Do the other datasets have the same well-controlled serum cholesterol and BP levels?
We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.
First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and that authors “replace p values with estimates of effects or association and 95% confidence intervals”. Following these guidelines, p values should not be reported in this type of study. We determined significance based on whether the 95% confidence intervals excluded zero, a standard method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).
For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively. We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.
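For readers unfamiliar with q values: one common way to obtain FDR-controlling q values is the Benjamini-Hochberg adjustment sketched below. The authors' exact procedure (e.g., Storey's method) may differ, and the p values used here are illustrative:

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p values (q values): for the i-th smallest
    p value, q = min over j >= i of p_(j) * m / j, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest rank downward
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0, 1)
    return q

# Illustrative p values echoing the magnitudes quoted for Fig. 1A-B
pvals = [0.02, 0.02, 0.005, 0.83, 0.91, 0.25]
qvals = bh_qvalues(pvals)
```

The adjustment never decreases a p value, which is what makes q values the more conservative quantity under multiple testing.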
We confirmed that the results of the variable importance in projection (VIP) analysis remained stable after including additional covariates, such as systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C). The VIP values for ADRR, MAGE, AC_Var, and LI consistently exceeded one even after these adjustments, suggesting that the primary findings are robust in the presence of these clinical variables. We have added the following sentences in the Results and Methods section (lines 188-191, 491-494):
Even when SBP, DBP, TG, LDL-C, and HDL-C were included as additional input variables, the results remained consistent, and the VIP scores for ADRR, AC_Var, MAGE, and LI remained greater than 1 (Fig. S2D).
Of note, as the original reports document, the validation datasets did not specify explicit cutoffs for blood pressure or cholesterol. Consequently, they included participants with suboptimal control of these parameters.
(2) Negative factor loadings have not been addressed and consistency in components: Figure 3, Figure S7. All the main features for value in Figure 3A are positive. However, MVALUE in S7B is very negative for value whereas the other features highlighted for value are positive. What is driving this difference? Please explain if the direction is important. Line 480 states that variables with factor loadings >= 0.30 were used for interpretation, but it appears in the text (Line 156, Figure 3) that oral DI was used for value, even though it had a -0.61 loading. Figure 3, Figure S7. HBGI falls within two separate components (value and variability). There is not a consistent component grouping. Removal of MAG (Line 185) and only MAG does not seem scientific. Did the removal of other features also result in similar or different Cronbach's α? It is unclear what Figure S8B is plotting. What does each point mean?
We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.
For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components (value, variability, and autocorrelation). Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.
With respect to negative factor loadings, we agree that they may appear confusing at first. However, in the context of exploratory factor analysis, the magnitude, or absolute value, of the loading is most critical for interpretation, rather than its sign. Following established practice, we considered variables with absolute loadings of at least 0.30 to be meaningful contributors to a given component. Accordingly, although the oral DI had a negative loading of –0.61, its absolute magnitude exceeded the threshold of 0.30, so it was considered in our interpretation of the “value” component. Regarding the reviewer’s observation that MVALUE in Figure S7B shows a strongly negative loading while other indices in the same component show positive loadings, we believe this reflects the relative orientation of the factor solution rather than a substantive difference in interpretation. In factor analysis, the direction of factor loadings is arbitrary: multiplying all the loadings for a given factor by –1 would not change the factor’s statistical identity. Therefore, what matters is not whether a variable loads positively or negatively but rather the strength of its association with the latent component (i.e., the absolute value of the loading).
The rationale for removing MAG was based on statistical and methodological considerations. As is common practice in reliability analyses, we examined whether Cronbach’s α would improve if we excluded items with low factor loadings or weak item–total correlations. In the present study, we recalculated Cronbach’s α after removing MAG because it had a low loading; its exclusion did not substantially affect the theoretical interpretation of the factor, which we conceptualize as “secretion” (without CGM). Removing MAG alone is scientifically justified because it was the only item whose exclusion improved Cronbach’s α while preserving interpretability; removing other items would have undermined the conceptual clarity of the factor or would not have meaningfully improved α. Furthermore, the MAG item loads highly on factor 2, which further supports its removal from this factor.
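The "alpha if item deleted" check described here can be sketched with the standard Cronbach's α formula on synthetic item scores, in which one item (standing in for MAG) is nearly unrelated to the others; the data and loadings are invented for illustration:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
n = 200
latent = rng.normal(size=n)
# Three items loading on the same latent trait, plus one nearly unrelated item
consistent = np.column_stack([latent + rng.normal(0, 0.5, n) for _ in range(3)])
unrelated = rng.normal(size=(n, 1))          # stand-in for a weakly loading item
scale = np.hstack([consistent, unrelated])

alpha_full = cronbach_alpha(scale)
alpha_dropped = cronbach_alpha(scale[:, :3])  # "alpha if item deleted" for the weak item
```

Dropping the weakly loading item raises α, which is the pattern that justified excluding MAG.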
Each point in Figure S8B (old version) corresponds to an individual participant.
To address these considerations, we have added the following text to the Discussion, Methods, (lines 388-396, 600-601) and Figure S6B (current version) legend:
Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.
Variables with absolute factor loadings of ≥ 0.30 were used in interpretation.
Box plots comparing factors 1 (Mean), 2 (Variance), and 3 (Autocorrelation) between individuals without (-) and with (+) diabetic macrovascular complications. Each point corresponds to an individual. The boxes represent the interquartile range, with the median shown as a horizontal line. Mann–Whitney U tests were used to assess differences between groups, with P values < 0.05 considered statistically significant.
Minor Concerns:
(1) NGT is not defined.
We appreciate the reviewer for pointing out that the term “NGT” was not clearly defined in the original manuscript. We have added the following text to the Methods section (lines 447-451):
T2DM was defined as HbA1c ≥ 6.5%, fasting plasma glucose (FPG) ≥ 126 mg/dL or 2-h plasma glucose during a 75-g OGTT (PG120) ≥ 200 mg/dL. IGT was defined as HbA1c 6.0–6.4%, FPG 110–125 mg/dL or PG120 140–199 mg/dL. NGT was defined as values below all prediabetes thresholds (HbA1c < 6.0%, FPG < 110 mg/dL and PG120 < 140 mg/dL).
(2) Is it necessary to list the cumulative percentage (Line 173), it could be clearer to list the percentage explained by each factor instead.
We appreciate the reviewer’s suggestion to list the percentage explained by each factor rather than the cumulative percentage for improved clarity. According to the reviewer’s suggestion, we have revised the results to show the individual contribution of each factor (39%, 21%, 10%, 5%, 5%) rather than the cumulative percentages (39%, 60%, 70%, 75%, 80%) that were previously listed (lines 220-221).
(3) Figure S10. How were the coefficients generated for Figure S10? No methods are given.
We conducted a multiple linear regression analysis in which time in range (TIR) was the dependent variable and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. We have added the following text to the figure legend (Fig. S8C) to provide a more detailed description of how the coefficients were generated:
Comparison of predicted Time in range (TIR) versus measured TIR using multiple regression analysis between TIR and factor scores in Figure 3. In this analysis, TIR was the dependent variable, and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. Each point corresponds to the values for a single individual.
(4) In https://cgm-basedregression.streamlit.app/, more explanation should be given about the output of the multiple regression. Regression is spelled incorrectly on the app.
We appreciate the reviewer for pointing out the need for a clearer explanation of the multiple regression analysis presented in the online tool (https://cgmregressionapp2.streamlit.app/). We have added a description of the regression output and corrected the typographical error in the spelling of “regression” within the app.
(5) The last section of results (starting at line 225) appears to be unrelated to the goal of predicting %NC.
We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variance, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. As reviewer 2 pointed out in minor comment #4, temporal autocorrelation can be difficult to interpret, so these visualizations were intended to provide intuitive examples for readers.
However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing technical simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components (Fig. 4A).
(6) Figure S2. The R2 should be reported.
We appreciate the reviewer for suggesting that we report R² in Figure S2. In the revised version, we have added the correlation coefficients and their 95% confidence intervals to Figure 1E.
(7) Multiple panels have a correlation line drawn with a slope of 1 which does not reflect the data or r^2 listed. this should be fixed.
We appreciate the reviewer’s concern that several panels included regression lines with a fixed slope of one that did not reflect the associated R² values. We have corrected Figures 1A–C and 3C to display regression lines representing the estimated slopes derived from the regression analyses.
Reviewer #3 (Public review):
Summary:
The researchers performed a genetic screen and identified ZNF-236, a zinc finger family protein required for repression of heat-shock-inducible genes. The researchers applied a new method to map the binding sites of ZNF-236 and, based on the data, suggested that the protein does not repress genes by directly binding to the regulatory regions targeted by HSF1. Insertion of a reporter into multiple genomic regions indicates that repression is not needed in repetitive genomic contexts. Together, this work identifies ZNF-236, a protein that is important for repressing heat-shock-responsive genes in the absence of heat shock.
Strengths:
A hit from a productive genetic screen was validated, and followed up by a series of well-designed experiments to characterize how the repression occurs. The evidence that the identified protein is required for the repression of heat shock response genes is strong.
Weaknesses:
The researchers propose and discuss one model of repression based on protein binding data, which depends on a new technique and data that are not fully characterized.
Major Comments:
(1) The phrase "results from a shift in genome organization" in the abstract lacks strong evidence. This interpretation heavily relies on the protein binding technique, using ELT-2 as a positive and an imperfect negative control. If we assume that the binding is a red herring, the interpretation would require some other indirect regulation mechanism. Is it possible that ZNF-236 binds to the RNA of a protein that is required to limit HSF-1 and potentially other transcription factors' activation function? In the extrachromosomal array/rDNA context, perhaps other repressive mechanisms are redundant, and thus active repression by ZNF-236 is not required. This possibility is mentioned in one sentence in the discussion, but most of the other interpretations rely on the ZNF-236 binding data being correct. Given that there is other evidence for a transcriptional role for ZNF-236, and no negative control (e.g., deletion of the zinc fingers, or a control akin to those done for ChIP-seq, such as a null mutant or knockdown), a stronger foundation is needed for the presented model of genome organization.
(2) Continuing along the same line, the study assumes that ZNF-236 function is transcriptional. Is it possible to tag a protein and look at localization? If it is in the nucleus, it could be additional evidence that this is true.
(3) I suggest that the authors analyze the genomic data further. A MEME analysis for ZNF-236 can be done to test if the motif occurrences are enriched at the binding sites. Binding site locations in the genome with respect to genes (exon, intron, promoter, enhancer?) can be analyzed and compared to existing data, such as ATAC-seq. The authors also propose that this protein could be similar to CTCF. There are numerous high-quality and high-resolution Hi-C data in C. elegans larvae, and so the authors can readily compare their binding peak locations to the insulation scores to test their hypothesis.
(4) The researchers suggest that ZNF-236 is important for some genomic context. Based on the transcriptomic data, can they find a clue for what that context may be? Are the ZNF-236 repressed genes enriched for not expressed genes in regions surrounded by highly expressed genes?
R0:
Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1:
1. The manuscript primarily shows that adding a visual inspection step increased the proportion of prosthetic feet deemed usable (83% to 94%). This outcome is predictable and does not constitute meaningful scientific innovation. The work reads as an operational description rather than rigorous research; novelty and contribution are therefore limited.
2. The proposed checklist is not validated. There is no mechanical or structural testing, no clinical functional outcomes, no prospective field evaluation, no inter-rater reliability assessment, and no sensitivity or specificity analysis. Accordingly, the checklist cannot be considered a standard, and the conclusions overstate the evidence. A formal validation phase is required.
3. Safety, mechanical integrity, and lifespan have not been evaluated. Visual inspection alone is inadequate for medical devices. No ISO-aligned static or cyclic loading tests are presented, nor are durability or time-in-service data available. This is a critical omission given the manuscript’s intent to inform international practice.
4. No patient-level outcomes are included (for example, fit success, comfort, skin issues, mobility, abandonment, repair frequency, or time-to-failure). Without these data, the practical value of the intervention remains uncertain.
5. Brand-level comparisons are underpowered, and model-level or material-level analyses are not presented. Despite acknowledging this limitation, the manuscript still interprets brand-related effects.
6. The Introduction and narrative sections are disproportionately long and repetitive; substantial condensation is recommended. In contrast, the Methods and Results require greater depth and clarity.
7. The statistical analysis is limited. Logistic models do not account for key confounders such as service age, storage duration, materials, or model type. Model diagnostics, effect sizes with confidence intervals, and multiple-comparison considerations are not reported.
8. Economic evaluation is absent. Donation and reuse programs in low- and middle-income settings are cost sensitive, and without cost modeling, the recommendations have limited actionable value.
9. Several claims are overstated, including suggestions related to circular economy effects, international standard development, and safety assurance. These assertions are not supported by the presented data and should be moderated.
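The sensitivity/specificity analysis that Reviewer #1 requests in point 2 would amount to comparing checklist verdicts against a reference standard such as mechanical testing. A minimal illustrative sketch, with entirely hypothetical confusion-matrix counts (these are not study data):

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP/(TP+FN): share of truly unusable feet the checklist
    flags. Specificity = TN/(TN+FP): share of truly usable feet it passes."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for 100 donated feet, judged by the checklist and by
# a reference mechanical test (all numbers invented for illustration).
sens, spec = sensitivity_specificity(tp=40, fn=10, fp=5, tn=45)
print(sens, spec)  # 0.8 0.9
```

Reporting both values (with confidence intervals) would let readers judge how often the visual checklist misses unsafe feet versus how often it discards serviceable ones.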
Reviewer #2: It is suggested to review the Nippon Foundation/Exceed Cambodia work in proposing the standards of P&O. The case studies conducted in Cambodia, Myanmar, Laos, Vietnam and Sri Lanka will guide the current P&O Standard in low and middle income countries.
It is best to review the minimum standards of P&O in these countries as an underlying theory to govern the foundation of foot reuse and donation.
Robust systematic reviews are vital in proposing standards for foot reuse and donation in low and middle income countries. An updated literature review is needed.
It is suggested to explore the preliminary findings in these low and middle income countries.
Reviewer #3: GENERAL
This reviewer welcomes the ambition of the authors to start developing standards for donated prosthetic componentry to LMICs. Such standards are indeed much needed as one important factor to improve the quality of the prosthetic devices provided within LMICs.
The authors’ work has been carefully embedded into a wealth of information and reasons for why the need is urgent for developing standards of donated prosthetic components. This information has been mindfully drafted, including viewpoints and situations of many LMICs as well as HICs. Well done!
What left this reviewer wondering is why the development of the checklist has not been carried out with locals at the two centers, where MB and PM were able to collect the data of the stored feet. The rationale for not doing so should be included into the Limitations section.
Further, why has no testing of the developed checklist been carried out with the two centers? For example, dividing the available feet into two equal-sized groups would have provided the opportunity to develop the checklist, including the regression model, with one group of feet and then test it on the remaining feet in the second group. Why was this not considered? One could classify all available feet as indicated in Table 1, but then consider only those feet that were mostly used in the field or were mostly available. Lowering the number of independent variables to those variables that would best represent the essence of the checklist would have given the option for a regression model, or is this reviewer mistaken? These points should be discussed in the paper. In case the paper gets too long (word count), it is recommended to condense the actual discussion section, as it provides similar points to those stated in the introduction.
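The develop-then-test split the reviewer proposes can be sketched in a few lines. Everything below is invented for illustration (the damage/age rule, the labels, the variable names); it is not the manuscript's method:

```python
import random

random.seed(1)

# Each foot: (visible_damage, years_in_service), with a made-up ground
# truth: a foot counts as "reusable" when undamaged and under 12 years old.
feet = [(random.random() < 0.3, random.randint(0, 15)) for _ in range(366)]
feet = [(damaged, age, not damaged and age < 12) for damaged, age in feet]

# Divide the available feet into two equal-sized groups.
random.shuffle(feet)
develop, test = feet[:183], feet[183:]

def accuracy(group, cutoff):
    """Fraction of feet whose checklist verdict matches the ground truth."""
    return sum(((not damaged and age < cutoff) == reusable)
               for damaged, age, reusable in group) / len(group)

# "Develop" the checklist on the first group by tuning the age cut-off...
best_cutoff = max(range(1, 16), key=lambda c: accuracy(develop, c))

# ...then evaluate the fixed checklist on the held-out second group.
print(len(develop), len(test), round(accuracy(test, best_cutoff), 2))
```

The point of the sketch is only the procedure: criteria (or a regression model) are fitted on one half of the sample, frozen, and then scored on the other half, giving an unbiased estimate of how the checklist would perform on feet it has not seen.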
And lastly, this reviewer does not think that retesting used feet similar to the stated ISO standards would be feasible. Instead, it might be worthwhile checking in other industries (aviation, deep-sea shipping) what types of non-mechanical controls for checking wear and tear on materials/motors are available without dismantling motors or testing used structures. Perhaps some light and/or sonar evaluation would be a way to check the mechanical structure of used prosthetic feet and other componentry without putting any more strain on the used materials. That might be some thoughts for the Future Work section. Also, possible collaboration with universities in LMICs should be considered as a close source of additional brain power for the development of standards within a given country.
DETAILED
The reviewer finds the word ‘prosthetics’ difficult and prefers the (correct) term ‘prosthetic componentry’ or ‘prosthetic components’ instead. In her experience, using the nomenclature of the P/O profession adds clarity in an interdisciplinary context. It is often unclear to people outside of or adjacent to the P/O profession that a ‘prosthetics’ is composed of different products, i.e. some industrially produced prosthetic components and – in most cases – a bespoke, locally fabricated prosthetic socket. By using ‘prosthetic components’ or ‘prosthesis/prostheses’ when referring to the final product, the authors will signal directly that there are ‘pieces’ needed to compose an entire prosthesis. Further, using the correct term assists in distinguishing prostheses fabricated with componentry from those fabricated by 3D printing, also a field needing standards for C2C design. Therefore, please change the wording accordingly within the entire paper – thank you!
Lines 165-168. This sentence seems to be incomplete – please check.
Line 229. This statement is incorrect. In Switzerland (and the reviewer is sure this is the case in France, Netherlands and the UK), prosthetic componentry has different life/warranty cycles depending on the type of prosthetic component and its model. Please rephrase this sentence pointing out that different prosthetic components and their models have different life/warranty cycles set by the industrial manufacturers.
Lines 284-286. This sentence is unclear: Are the authors checking prosthetic feet shipped to Africa prior to the study, or as part of the study when these feet arrive in Africa? If they are analyzed prior to the study, how do the authors make sure that the damage seen is indeed due to shipping and not due to storage, for example? If the authors controlled feet within the study time period, would the sentence not need to state “… we review prosthetic feet ALSO in Africa.”? Or did the authors not review the feet at the study place, but only in Africa? Please clarify and rephrase – thank you. These clarifications/details seem to be better placed within the Materials and Methods chapter.
Lines 287-311, in particular lines 311-317. Because the authors use an experimental setup, variables are usually considered as ‘independent’ or ‘dependent’. Please clarify which variables (independent, dependent) were considered. All variables the authors used to classify the different feet need to be listed together with the rationale for the decision to include them in the regression model, including their order.
Ok – are the variables listed on line 314 the ones considered as independent variables to classify a prosthetic foot as ‘reusable’ or ‘not reusable’? If so, why? In other words, why do the authors consider the ‘brand’ to be more important than the condition of the foot itself? Or is it the case because only those feet that passed the visual test of being 'usable' were included in the regression model? Up to this point, this reviewer understood the aim of the study as being to develop a set of criteria to classify a prosthetic foot as reusable or not. If a visual pre-selection needs to be carried out first, how good/robust is the regression model that follows? Please clarify and add this clarification to the text – thank you.
Lines 296-298. What variables (the authors call them ‘flaws’, if understood correctly) did the authors consider during the usability tests? How were these tests carried out? What happened with the feet the authors considered as ‘not usable’: were they removed from the total sample of 366 feet (see below remarks to line 319)? For illustration: assuming the authors used for their visual check a variable called ‘cracks within the cosmetic’: did the authors classify a foot as still usable when only surface cracks were present, or did they exclude any foot with a crack in its shell? What were the criteria to classify a SACH foot as ‘usable’? More detailed information about the entire method for the visual checks and the resulting classification needs to be stated.
When did the authors add any of these variables into the regression model, and did they give some of the variables a weighting, i.e. were some of the variables considered more important than others, and if so, why? Please add this information and make a reference to Table 2 or, better, create a new table or flowchart showing the authors’ thoughts and decision process, including the variables upon which they based their decision to classify a foot as ‘usable’ or ‘not usable’. Clarification on this matter will strengthen the work as it helps the reader to better understand the authors’ rationale – thank you!
Line 319. Please start the results section with “A total of 366 feet were analyzed, 196 left and 170 right feet…”
Line 320. Please add “… and a brand could be identified for… ” – thank you.
Lines 320-322. Based on the information given in Table 1, there were 12 brands identified as categories plus one category with feet unknown to the authors. Because ‘unknown’ is not a brand, the sentence needs to be rephrased – thank you.
Lines 353-357. These sentences seem to be missing some text, at least, they do not make sense to this reviewer. In lines 353-355 the authors state that the feet of Trulife and Ossur performed worst. Then in the following lines the authors state that they are (nevertheless??) considered as appropriate for donation. Please clarify – thank you.
Table 4. Please explain/add, either in the corresponding text (lines 350 and subsequently), how the negative signs have to be read. Why was the measurement made against ‘BioQuest’ and not ‘Janton’, and how do the authors explain the difference in the coefficient between these two feet? Both feet were represented with n=1; why is there a difference? Please explain and add the clarification into the text within the Discussion section – thank you.
Figure 2. Please add to Fig. 2, a, b, and c, as done in Fig. 1. This assists in clarifying matters. Please add this clarification into the text: line 364 = Figure 2a; line 378: delete (Figure 2) and add after ‘NCRPPD’ (Figure 2b); line 379: add (Figure 2c) after ‘K4C’.
Line 388. Add at the end of the sentence ‘(Figure 3)’.
Line 395. Please expand this sentence as proposed, or similarly: “…can be a burden to the recipient LMIC [31, 39, 40], as indicated by Marks et al (2019 – please check PLOS rules!!):” and then have the quotation follow. This will connect the quotation with the text and make it easier to read.
Line 469. Please check this sentence – the word ‘design’ seems to be twice stated. If this is correct, consider rephrasing as the sentence reads strange, thank you.
Checklist questions: • Question (1): Please add example of ‘completeness’ of a prosthetic foot, as you did for Question 2. • Question (3): Add examples of what the authors consider ‘compliant’: forefoot, heel, middle section? All of these, only one? Usable for light persons, like children if only one part of the foot is too compliant? If so, which one do the authors consider as the most important variable for a foot to be still considered ‘usable’?
Line 529. Word missing: “..cost of what” was the biggest barrier? Please complete.
Line 533. Please consider replacing ‘in this way’ with ‘Therefore’ or similar that would connect clearer the content of the previous paragraph with this new one.
Line 544. Typos: ‘reduce’ instead of ‘reduces’, ‘limit’ instead of ‘limits’.
Line 567. Stop the sentence after ‘repair of equipment’ and continue with a new sentence starting, for example, with “Hamner et al (please check PLOS rules!!) point out that …” and then add the quotation.
Line 570. Please delete ‘etc.’ This should not be used in a text as it leaves the reader wondering what else – in this case – could have had an influence. Instead, write ‘for example’ and list the three most important missing points that were not considered.
Line 620. Keep the number correct: the authors tested 366 feet. The number speaks for itself; no need to bolster it. To this reviewer, bolstering looks bad – stay with the figures.
Line 622. Replace ‘are’ with ‘were’, as this was the case for the authors' sample. Samples of other authors might vary.
Reviewer #3 (Public review):
Summary:
This paper reports new findings regarding neuronal circuitries responsible for female post-mating responses (PMRs) in Drosophila. The PMRs are induced by sex peptide (SP) transferred from males during mating. The authors sought to identify SP target neurons using a membrane-tethered SP (mSP) and a collection of GAL4 lines, each containing a fragment derived from the regulatory regions of the SPR, fru, and dsx genes involved in PMR. They identified several lines that induced PMR upon expression of mSP. Using split-GAL4 lines, they identified distinct SP-sensing neurons in the central brain and ventral nerve cord. Analyses of pre- and post-synaptic connection using retro- and trans-Tango placed SP target neurons at the interface of sensory processing interneurons that connect to two common post-synaptic processing neuronal populations in the brain. The authors proposed that SP interferes with the processing of sensory inputs from multiple modalities.
Strengths:
Besides the main results described in the summary above, the authors discovered the following:
(1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.
(2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.
Author response:
Public Reviews:
Reviewer #1 (Public Review):
Areas of improvement and suggestions:
(1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"
The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDN neurons. A combination of available split-Gal4 lines should be sufficient to prove this.
We agree that this information is important to distinguish neurons which are direct SP targets from neurons which are involved in directing reproductive behaviors. We have now tested drivers for these neurons and added these data in Fig 3 (SAG neurons) and as Suppl Figs S4 (SPSN and genital tract neuron drivers SPR3 and SPR21), Suppl Fig S6 (overlap in single cell expression atlas), Suppl Fig S7 (overlap of SPSN split drivers with SPR8, fru11/12 and dsx split drivers in the brain inducing PMRs) and Suppl Fig S9 (pC1, OviDNs, OviENs, OviINs and vpoDN).
The newly added data are in full support of our conclusion that SP targets central nervous system neurons, which we termed SP Response Inducing Neurons (SPRINz). In particular, we find lines that express in genital tract neurons, but do not induce an SP response (Supp Figs S4, S7 and S10) or do not express in genital tract neurons and induce an SP response (Fig 2 and Supp Fig S2).
We have analysed the expression of SPSN in the brain and VNC and find expression in few neurons (Suppl Fig S4). This result is consistent with expression of the genes driving SPSN expression in the single cell expression atlas indicating overlap of expression in very few neurons (Suppl Fig S6). We have already shown that FD6 (VT003280) which is part of the SPSN splitGal4 driver, expresses in the brain and VNC and can induce PMRs from SP expression (Fig 4).
We have taken this further to test another SPSN driver (VT058873) in combination with SPR8, fru11/12 and dsx and find PMRs induced by mSP expression (Suppl Fig S7). Moreover, if we restrict expression of mSP to the brain with otdflp we can induce PMRs from mSP expression and obtain the same response by activating these brain neurons (Suppl Fig S7). We note that the VT058873 ∩ fru11/12 intersection in combination with otdflp stopmSP or stopTrpA1 in the head, did not result in PMRs. Here, PMR inducing neurons likely reside in the VNC, but currently no tools are available to test this further.
We further tested pC1, OviDNs, OviENs, OviINs and vpoDN for induction of PMRs from expression of mSP. We are pleased to see that OviEN-SS2s, OviIN-SS1 and vpoDN splitGAL4 drivers can reduce receptivity, but not induce oviposition (Suppl Fig S8). We predicted such drivers based on previously published data (Haussmann et al. 2013), which we have now validated.
(2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.
The expression pattern for tshGAL, which expresses in the trunk, is already published (Soller et al., 2006). We have added images for “head” expression for tshGAL and adjusted our statement to be predominantly expressed in the VNC in Suppl Fig 1.
(3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.
According to the reviewer's suggestion, we have clarified the specificity of VT003280 and now say that this is FD6.
(4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with SPSN-Gal80 construct. In line with this, most of their lines targets the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes. Especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".
We agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.
In principle, use of GAL80 is a valid approach to restrict expression, if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.
(5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to (a) show that the intersectional approach is indeed labelling distinct subtypes and (b) show how these distinct neurons influence oviposition.
Again, we agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.
Reviewer #2 (Public Review):
Strength:
The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour de force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.
We thank reviewer 2 for recognizing the advance of our work.
Weakness:
Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.
We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and identified in a tour de force approach subsets of neurons in the brain and VNC.
We agree with this reviewer that we need a higher resolution of expression to only one cell type. As pointed out by this reviewer, the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow up studies.
Reviewer #3 (Public Review):
Strengths:
Besides the main results described in the summary above, the authors discovered the following:
(1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.
(2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.
We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response. To further substantiate these findings, we have now added a splitGal4 nSyb ∩ ppk which expresses in genital tract neurons, but does not induce PMRs from mSP expression.
Weaknesses:
(1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.
We have added a positive control for ppk expression by combining the ppk-DBD line with a nSyb-AD which expresses in all neurons in Supp Fig S8. This experiment confirms our previous observations that ppk splitGal4 in combination with other drivers does not induce an SP response despite driving expression in genital tract neurons. We have expanded the discussion section to point out that we have identified additional cells in the brain expressing ppkGAL4, but expression of split-GAL4 ppk is absent in these cells. Part of this work has previously been published (Nallasivan et al. 2021). Accordingly, we amended the text to say when expression was achieved with ppkGAL4 or ppk splitGAL4.
(2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.
We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). The results are conclusive for all three intersections when expressing UAS mSP in SPR RNAi with respect to oviposition, i.e. egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru11/12.
Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.
Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that we need a higher resolution of expression to only one cell type to resolve this unexpected result. However, this is a major task that we will continue in follow up studies.
SPR RNAi knock-down experiments may also help clarify whether mSP worked autocrine or juxtacrine to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.
Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.
Removing SPR by SPR RNAi and expression of mSP in the same neurons did not induce egg laying for all three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.
We have added this aspect to the discussion section.
What to do about cyberbullying? There are very few interventions specifically targeting cyberbullying behaviours, and even fewer rigorous evaluations of these interventions (Cioppa, O’Neil, & Craig, 2015; Nocentini, Zambuto, & Menesini, 2015). Adolescents often refuse to seek help from an adult for fear that their technology will be taken away (Tokunaga, 2010). Research results suggest that more work needs to be done around making reporting safer and more convenient, as well as ensuring that appropriate actions are taken after a report is received. Engaging both adults and youth in this process is essential. Innovative, youth-friendly solutions are needed. For example, infographics, online games, apps, or software may be an effective way of educating youth about cyberbullying and changing patterns of interaction. However, none of these innovative initiatives have been rigorously tested for effectiveness.

Defending behaviour (i.e., attempting to stop bullying and provide comfort and support to victimized peers) has been found to significantly reduce bullying in schools and have a protective effect on victimized peers (Sainio et al., 2010). Defenders are most likely to be female or empathetic males (Gini et al., 2007). The effects of defending behaviour on bullying and on the adjustment of victimized students have informed a number of bullying prevention and intervention programs (e.g., KiVa: Kärnä et al., 2011; Befriending Interventions: Menesini et al., 2003; and Promoting Alternative Thinking Strategies (PATHS): Domitrovich, Cortes, & Greenberg, 2007). However, recent research suggests that defenders may experience problems due to defending (Sandre & Craig, 2015). In a recent fMRI study, Sandre and Craig (2015) found that, compared to controls, defenders display greater neural responsivity in the posterior insular cortex, an area of the brain associated with emotional arousal and social pain, when witnessing the victimization of peers.
More research is needed regarding the risks and benefits of defending behaviours towards cyberbullying. Previous research on protective factors for bullying perpetration and victimization in general may apply to cyberbullying. For example, a positive school climate and feeling connected to school have both been found to be protective factors against bullying (Resnick et al., 1997; Williams & Guerra, 2007). Therefore, the same may be the case for cyberbullying. Fanti and colleagues (2012) found that parental/family support helped to protect youth from engaging in cyberbullying behaviours, and from being victimized online. This association was especially important for children from single-parent families who, compared to children from intact families, were more likely to be targets of cyberbullying by their peers. Social support from peers is also an important protective factor (Sainio et al., 2010).
Government reports highlight that cyberbullying is widespread and often chronic, affecting many youth for long periods, but they seem to lack the resources needed to help effectively.
Acknowledgement
This study was funded by Public Safety Canada.

Overview of the study
Using multiple surveys, this article examines cyberbullying and cybervictimization among Canadian youth and young adults aged 12 to 29. With rates of online and social media use being high among young people, there is an increased risk of online forms of bullying and victimization. This paper examines the prevalence of cyberbullying and cybervictimization among young people, with a focus on identifying the at-risk populations, behaviours related to prevalence, such as internet and smart phone usage, and the association of online victimization with other forms of victimization, such as fraud and assault. Some young people are more vulnerable to cybervictimization, including Indigenous youth, sexually diverse and non-binary youth, youth with a disability, and girls and women. Cybervictimization increases during adolescence and remains high among young adults in their early 20s. It then tapers off in the late 20s. Increased internet usage, as well as using smart phones before bed and upon waking, are associated with an increased risk of being cyberbullied. For youth aged 12 to 17, not using devices at mealtime, having parents who often know what their teens are doing online, and having less difficulty making friends act as potential buffers against cybervictimization. Cybervictimized young adults often change their behaviour, both online—from blocking people and restricting their own access—and offline—such as carrying something for protection. Cybervictimized young adults were also more likely to have experienced other forms of victimization, such as being stalked and being physically or sexually assaulted.

Introduction
Internet use is now woven into the fabric of Canadian society.
It has become a large part of everyday life, whether it is in the context of online learning, remote working, accessing information, e-commerce, obtaining services (including healthcare), streaming entertainment, or socializing. And while nearly all Canadians use the internet to some degree, Canadians under 30 represent the first generation born into a society where internet use was already ubiquitous. As such, it may not be surprising that Canadians under the age of 30 are more likely to be advanced users of the internet, compared to older generations. In addition, they often spend many hours on the internet, with this usage increasing during the COVID-19 pandemic, more so than any other age group.

Besides proficiency and intensity, the way in which young people interact with the internet is often different from older generations. Previous Statistics Canada research has shown that younger people are more likely than their older counterparts to use social media, more likely to use multiple social media apps, and engage in more activities on these apps. This use has been related to some negative outcomes for younger people, including lost sleep and trouble concentrating. Social media and online activities may also place youth and young people at increased risk of cybervictimization or cyberbullying. Numerous studies have investigated both the prevalence and impact of cybervictimization, noting that youth are often at increased risk. While comparisons across studies are often difficult because of definitional differences, ages of the youth being studied, and the time frames, there is consensus on the criteria for measuring cybervictimization.
These include (1) intention to harm the victim, (2) a power imbalance between the bully and victim, (3) the repeated nature of the aggression, (4) the use of electronic devices (including phones or computers), and (5) possible anonymity.Note This article examines cyberbullying among youth and young adults aged 12 to 29 in Canada using four population-based surveys. The Canadian Health Survey of Children and Youth (CHSCY) collects information on cyberbullying among youth aged 12 to 17, while three surveys capture this information for adults aged 18 to 29: the Canadian Internet Use Survey (CIUS), the General Social Survey (GSS-Cycle 34) on Victimization and the Survey of Safety in Public and Private Spaces (SSPPS). Each will be used to help paint a picture of cyberbullying of younger people in Canada.Note Definitions and measures of cyberbullying within each of the surveys are detailed in the "Cyberbullying content across four Statistics Canada surveys" text box. The study starts by discussing the prevalence of, and risk factors associated with, cyberbullying among teens aged 12 to 17. This is followed by an analysis of cyberbullying among young adults aged 18 to 29. Along with providing a profile of cyberbullying, another goal is to highlight data and knowledge gaps in this area and potential areas where future surveys and research should focus.

One-quarter of teens experience cyberbullying

In 2019, one in four teens (25%) aged 12 to 17 reported experiencing cyberbullying in the previous year (Chart 1). Being threatened or insulted online or by text messages was the most common form, at 16%. This was followed by being excluded from an online community (13%) and having hurtful information posted on the internet (9%). Among those aged 12 to 17, rates of cyberbullying increased with age, rising from 20% at age 12 to 27% by age 17. This perhaps reflects increased use of the internet, and specifically of social media, with age.
The largest increase in cyberbullying prevalence related to being threatened or insulted online or by text messages (from 11% at age 12 to 19% at age 17).

Data table for Chart 1 (percentage of youth aged 12 to 17)
  Total cyberbullied: 25
  Hurtful information was posted on the internet: 9
  Excluded from an online community: 13
  Threatened/insulted online or by text messages: 16
Source: Statistics Canada, Canadian Health Survey on Children and Youth, 2019.

Besides age, the likelihood of being victimized online varied by gender, sexual attraction, Indigenous identity and educational accommodations. Generally, boys and girls have quite similar prevalence of cybervictimization: about 1 in 4 of each (24% of boys and 25% of girls) reported experiencing any of the three forms of cybervictimization. Non-binary teens, however, experienced cybervictimization at significantly higher levels than both boys and girls. Over half (52%) of teens who reported a gender other than male or female said that they were cybervictimized in the past year. The higher prevalence among non-binary teens was seen across all types of cybervictimization. The greatest difference, however, was seen for being excluded from an online community: the proportion of non-binary teens who reported this type of cybervictimization was about three and a half times the proportion recorded for boys and girls (45% versus 12% for boys and 13% for girls). In addition, youth aged 15 to 17Note who reported being attracted to the same gender had a significantly higher likelihood of being cyberbullied (33%) than their peers who were exclusively attracted to a different gender (26%). This increased risk was seen for all types of cyberbullying but was most pronounced for hurtful information being posted on the internet and being excluded from an online community.
First Nations youth (off-reserve) are at greater risk of cyberbullying

First NationsNote youth living off-reserve were more likely than their non-Indigenous peers to have been cyberbullied in the past year. In particular, 34% of First Nations youth reported being bullied online, compared to 24% of non-Indigenous youth. The risk was heightened for certain types of cyberbullying, including having hurtful information posted on the internet and being threatened/insulted online or by text messages. These higher levels of cybervictimization mirror the overall higher rates of victimization for Indigenous people, which could be rooted in the long-standing legacy of colonialism resulting in discrimination and systemic racismNote (Table 1). No significant differences were observed for Inuit and Métis youth.Note Most racialized groups had either similar or lower prevalence rates of cyberbullying compared to non-racialized and non-Indigenous youth. For example, 16% of South Asian youth and 18% of Filipino youth said that they had experienced cyberbullying in the past year, much lower than the 27% of non-racialized, non-Indigenous youth who reported being victimized online. In addition, those born in Canada had a higher likelihood of being cyberbullied than the immigrant youth population (26% versus 19%). This was seen for all forms of online victimization. The differences in risk may be due to variations in frequency of going online. Indeed, previous research has shown that immigrants are less likely to be advanced users of the internet, and are more often non-users, basic users or intermediate users.Note

Table 1
Prevalence of cyberbullying among youth aged 12 to 17, by population group, 2019 (percentage)
Columns: (1) hurtful information was posted on the internet; (2) threatened/insulted online or by text messages; (3) excluded from an online community; (4) any of the 3 types of cyberbullying.

Population group                           (1)    (2)    (3)    (4)
Gender
  Boys (ref.)                               7     16     12     24
  Girls                                    10     16     13     25
  Non-binary                              30E*   34E*   45E*   52E*
Indigenous identity
  First Nations                           14E    23*    16E    34*
  Métis                                   12E    20     13E    30
  Inuit                                   14E    30E     F     36E
  Non-Indigenous (ref.)                     8     16     13     24
Racialized group
  Black                                     8     16     12     24
  Chinese                                   7     11*    12     22
  Filipino                                 10     10*     7*    18*
  South Asian                               5*     9*     9*    16*
  Not part of a racialized group (ref.)     9     18     14     27
Country of birth
  Canada (ref.)                             9     17     14     26
  Outside Canada                            5*    11*    10*    19*
Gender attraction (youth aged 15 to 17 only)
  Same gender (ref.)                       15     22     17     33
  Opposite gender                           9*    18     13*    26*
Youth has an education accommodation
  Yes                                      11*    19*    15     27*
  No (ref.)                                 7     14     12     23
  Don't know                               12*    19*    15     29*

E use with caution; F too unreliable to be published; * significantly different from the reference category (ref.) (p < 0.05).
Source: Statistics Canada, Canadian Health Survey of Children and Youth, 2019.
Higher likelihood of cyberbullying among youth with an education accommodation

Based on results from the CHSCY, having an education accommodation, such as an Individual Education Plan (IEP), Special Education Plan (SEP) or Inclusion and Intervention Plan (IIP), places youth at increased risk of cyberbullying. Overall, 27% of youth with some type of education accommodation for learning exceptionalities or special education needs were bullied online, compared to 23% of their peers without an accommodation. The risk was greatest when the cyberbullying incidents involved hurtful information being posted on the internet or being threatened or insulted online or by text messages. The increased risk of cyberbullying among those with an education accommodation peaks at age 16, with 36% of 16-year-olds with an educational accommodation reporting being cyberbullied, compared with 24% of youth without an accommodation.Note

Frequent use of social media tied to higher prevalence of cyberbullying among youth

Because of the potential negative impacts of cyberbullying, including the effects on mental wellbeing, it is important to understand the factors that can expose youth to online harm. One of these possible factors relates to the frequency of online activity. The CHSCY asked youth how often they go online for social networking, video/instant messaging, and online gaming. The majority (about 80%) said that they went online at least weekly, with 60% saying they used social networking platforms several times a day, and just over 50% reporting that they used video or instant messaging apps at this same level of frequency. About 1 in 3 (32%) teens said that they went online for gaming at least once a day. In general, results from the CHSCY show that more frequent social networking, instant messaging use and online gaming had a strong association with an increased risk of cybervictimization.
For instance, among youth who stated that they constantly use social networking, video and instant messaging, or online gaming, about one-third (34%, 36% and 30%, respectively) said that they had been cyberbullied in the past year. Conversely, the proportion reporting cybervictimization drops to around 20% when these apps were used less than once a week (22%, 22% and 24%, respectively). The risk decreases even further, to less than 15%, among youth who never used social networking or video and instant messaging apps (Table 2).

Table 2
Prevalence of cyberbullying among youth aged 12 to 17, by frequency of social media use and gender, 2019
Proportion cyberbullied in past year (percentage). SN = social networking; VIM = video or instant messaging; OG = online gaming.

                          Total                 Boys                  Girls
Frequency of use     SN    VIM    OG       SN    VIM    OG       SN    VIM    OG
Constantly           34*   36*    30       33*   32*    30       34*   38*    28
Several times a day  27*   27*    30       26    27     30       27*   27*    29
Once a day (ref.)    21    23     27       22    25     26       20    20     29
Weekly               27    24     24       30    27     23       21    21     27
Less than weekly     22    20     24       22    21     19*      21    17     29†
Never                12*   14*    22*      14*   15*    15*       9*   13*    24†

† significant gender difference (p < 0.05); * significantly different from reference category (ref.) (p < 0.05).
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Canadian Health Survey of Children and Youth, 2019.

No gender differences were found in the relationship between social media, video or instant messaging use and cybervictimization.Note For instance, for both boys and girls, the proportion who said they were cybervictimized in the past year was over 30% if they constantly checked their social networking and instant messaging applications, with the risk decreasing similarly at lower levels of use. The risk of cybervictimization increases with age, from 12 to 17, mirroring the increased frequency of social networking, video and instant messaging use as youth age. Going online more frequently had the same impact on cybervictimization risk for Indigenous and non-Indigenous youth; that is, going on social media more frequently increased the risk to the same extent for both groups. However, this was not the case for all youth. For instance, the risk associated with more frequent social media and gaming use was greater for non-racialized youth than it was for racialized youth.

Cyberbullying is sometimes related to usage patterns of electronic devices

In addition to frequency of use, usage patterns of electronic devices may also be related to risk. Among youth aged 12 to 17, three-quarters (75%) used an electronic device before falling asleep in the past week. This usage rises from a low of 54% at age 12 to a high of 92% by age 17. Using electronic devices before going to sleep appears to increase the risk of being cyberbullied: about 27% of youth who used their electronic device before going to sleep were cyberbullied in the past year, compared to 19% of those who had not. The increased risk was most often related to being threatened or insulted online or by text messages (18% versus 11% among those who had not used a device before going to sleep) (Chart 2).
Data table for Chart 2 (percentage cyberbullied in past year)
                                                  Device used    Device not used (ref.)
Total youth aged 12 to 17                             27*               19
Hurtful information was posted on the internet        10*                5
Threatened/insulted online or by text messages        18*               11
Excluded from an online community                     14*               10
* significantly different from the reference category (ref.) (p < 0.05)
Source: Statistics Canada, Canadian Health Survey of Children and Youth, 2019.

The association between using electronic devices before going to sleep and the risk of cybervictimization is fairly constant across ages but appears to be strongest at age 15, where 31% of those who used a device before bedtime had been cybervictimized in the past year, compared with 16% of those who had not. Results suggest that parents may, in some cases, serve as protective agents, by not allowing electronic devices at the dinner table and by having greater knowledge of what their teens are doing online. For most youth (71%), parents did not allow electronic devices during the evening meal. However, 21% of youth said that their parents allowed electronic devices at the evening meal, and another 7% said that their family does not eat together. The risk of cybervictimization, especially being threatened or insulted online or by text messages, was higher if electronic devices were allowed at dinner (18% versus 15%); however, there were no differences with respect to other types of cybervictimization. The stronger association, though, was not with whether a device was used at dinner but with whether the family ate together at all, which can be influenced by financial or other circumstances, such as work schedules or extracurricular activities.
Across all types of cybervictimization, 35% of youth who had not eaten dinner with parents reported that they had been cybervictimized in the past year, significantly greater than the 26% of youth who said that electronic devices were allowed at the evening meal, and the 23% who said that electronic devices were not allowed. This association is strongest at ages 12 and 16. Parents' knowledge of youth's online activities may also help lower the risk of cybervictimization. Most Canadian youth who go online have some type of rules or guidelines established by their parents, which are usually more stringent for younger children and are typically relaxed as teens age and gain more trust.Note In 2019, the proportion who stated that their parents often or always know what they are doing online was quite high: 63% reported this level of parental knowledge, while the other 37% said that their parents never or only sometimes knew what they were doing online. Parental knowledge about online activity declines with age. At age 12, 77% of youth stated that their parents often or always know what they are doing online; this drops to 51% by age 16 and to 49% by age 17. As may be expected, increased parental knowledge of teens' online activity was associated with a lower risk of cybervictimization (Chart 3). In particular, close to a third of youth (29%) who said their parents never or only sometimes knew about their online activities reported that they had been cybervictimized. This proportion drops to 22% when parents often or always knew what their teen was doing online. A similar pattern is noted regardless of the type of cybervictimization experienced.

Data table for Chart 3 (percentage cyberbullied in past year)
                                                  Parents never or sometimes    Parents often or always
                                                  know online activity          know online activity (ref.)
Total youth aged 12 to 17                                 29*                          22
Hurtful information was posted on the internet            12*                           7
Threatened/insulted online or by text messages            20*                          13
Excluded from an online community                         15*                          12
* significantly different from the reference category (ref.) (p < 0.05)
Source: Statistics Canada, Canadian Health Survey of Children and Youth, 2019.

Youth who have difficulty making friends are most vulnerable to online victimization

Based on previous research,Note knowing more people and having more friends, especially close friends, can perhaps shield youth from being victimized; and if they are victimized, having friends can perhaps offset some of the negative impacts. Therefore, individuals who have a difficult time making friends may be expected to be at greater risk of cyberbullying, as the person or persons victimizing them may believe them to be easier targets of abuse. In general, most youth aged 12 to 17 do not have any difficulty making friends, based on responses from parents. Just over 80% of parents reported that their teen had no difficulty making friends, while 15% said that their teen had some difficulty and around 4% said that their teen had a lot of difficulty or could not make friends at all. Across individual ages, these proportions are similar. Boys and girls also have very similar patterns of ease of making friends (parents of around 80% of both boys and girls said that they had no difficulty making friends).Note It bears mentioning that these are parents' reports of their child's difficulty making friends and therefore may not be fully accurate; parents may not be aware of how well their child develops friendships, as this information may be intentionally hidden from them.
With respect to cybervictimization, teens who have greater difficulty making friends have a greater risk of being cybervictimized than their peers without any difficulty. For example, 23% of youth whose parents said they have no difficulty making friends reported that they had been victims of cyberbullying in the past year. This proportion climbs 12 percentage points, to 35%, if teens had a lot of difficulty or were unable to make friends (Table 3). A similar pattern was observed regardless of the type of cyberbullying. The relationship between ease of making friends and cyberbullying was seen across all ages, though the gap appears greatest at age 16: 44% of 16-year-olds who had trouble forming friendships were cyberbullied, compared with 24% of those who had no difficulty making friends. Girls were especially vulnerable to cyberbullying when they had trouble making friends.Note Overall, 40% of girls whose parents said they had a lot of difficulty making friends, or were unable to do so, were cybervictimized, compared to 23% of girls who had no difficulty making friends. The corresponding difference for boys was much smaller: 28% were cyberbullied if they had trouble making friends, versus 23% of those without any difficulty.

Table 3
Prevalence of cyberbullying among youth aged 12 to 17, by ease of developing friendships, 2019 (percentage)

                                                 No difficulty (ref.)   Some difficulty   A lot of difficulty or cannot make friends
Total youth aged 12 to 17                               23                   32*                35*
Type of cyberbullying
  Hurtful information was posted on the internet         7                   14*                15*
  Threatened/insulted online or by text messages        15                   22*                22*
  Excluded from an online community                     12                   18*                24*
Age
  12 years                                              18                   27*                29
  13 years                                              21                   32*                32
  14 years                                              22                   28                 39
  15 years                                              27                   32                 28
  16 years                                              24                   35*                44*
  17 years                                              24                   40*                39
Gender
  Boys                                                  23                   29*                28
  Girls                                                 23                   35*                39*

Notes: Difficulty making friends is based on responses from parents. * significantly different from reference category (ref.) (p < 0.05). Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Canadian Health Survey of Children and Youth, 2019.

Young adults: Women and younger adults most often the target of cybervictimization

The remainder of the study examines the patterns of cybervictimization among young adults aged 18 to 29. To understand cyberbullying among this age group, three population-based surveys were used. These complementary surveys, while differing in survey design and measurement, shed light on the nature of cyberbullying and the young people most at risk. According to the 2018 SSPPS, 25% of young people aged 18 to 29 experienced some form of cybervictimization, the most common being receiving unwanted sexually suggestive or explicit images or messages (15%) and aggressive or threatening emails, social media or text messages (13%) (Table 4). Young women were more often the target of online abuse, with a prevalence almost double that of young men (32% versus 17%).
This gender difference was even more pronounced for receiving unwanted sexually suggestive or explicit material, where young women were almost three times as likely to be targeted (22% versus 8%).Note The main gender differences therefore appear to involve cybervictimization of a sexualized nature, as there were no gender differences for solely aggressive content without a sexual component.Note

Table 4
Prevalence of cybervictimization among young people aged 18 to 29, by age group, gender and type of cybervictimization, 2018 (percentage)
For each of Total, Men and Women, the columns show all aged 18 to 29, then ages 18 to 21 (ref.), 22 to 25 and 26 to 29.

                                                 Total                     Men                       Women
Type of cybervictimization             18-29  18-21  22-25  26-29  18-29  18-21  22-25  26-29  18-29  18-21  22-25  26-29
Total                                    25     31     25    19*     17     25     16    13*    32†    38†    34†   26†*
Received any threatening or aggressive
emails, social media messages or text
messages where you were the only
recipient                                13     14     13    11       9     12      8     8     16†    17     18†   14
You were the target of threatening or
aggressive comments spread through
group emails, group text messages or
postings on social media                  6      6      7     6       5      7      5     4      8      6      9     7
Someone posted or distributed (or
threatened to) intimate or sexually
explicit videos or images of you
without your consent                      2      2      3     2       2      3      2     1      3      2      5     3
Someone pressured you to send, share,
or post sexually suggestive or explicit
images or messages                        6     10      5*    4*      3      5      3     3      9†    16†     8†*   6*
Someone sent you sexually suggestive or
explicit images or messages when you
did not want to receive them             15     20     17    10*      8     13      8     5*    22†    27†    26†   16†*

† significant gender difference for a particular group (p < 0.05); * significantly different from reference category (ref.) (p < 0.05).
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Survey of Safety in Public and Private Spaces, 2018.

For some types of cybervictimization, there was a significantly greater risk for young adults aged 18 to 21 compared with young adults aged 26 to 29. For instance, about 20% of young adults aged 18 to 21 reported receiving unwanted sexually suggestive or explicit images or messages in the last year, double the 10% of young adults aged 26 to 29 who reported receiving these types of unwanted images or messages.
Young adults aged 18 to 21 were also twice as likely to report being pressured to send, share or post sexually suggestive or explicit images or messages (10%) than their older counterparts (5% for ages 22 to 25 and 4% for ages 26 to 29). The relationship between cybervictimization and age is similar for both men and women, though rates are consistently higher for women. Both men and women show a gap of about 12 percentage points between ages 18 to 21 and ages 26 to 29 in experiencing any of the five forms of cybervictimization in the past year (25% versus 13% for men; 38% versus 26% for women). With respect to the individual forms of cybervictimization, the largest decreases by age group related to sexual victimization, especially for women. For example, among women there was about a 10-percentage-point decline from ages 18 to 21 to ages 26 to 29 in being pressured to send, share or post sexually suggestive or explicit images or messages (16% to 6%) and in receiving unwanted sexually suggestive or explicit images or messages (27% to 16%).

Greater risk of cybervictimization among LGBTQ2 young adults

Data from the SSPPS also show that LGBTQ2Note young adults were more likely than their non-LGBTQ2 counterparts to have experienced cybervictimization (49% versus 23%).Note, Note Moreover, the decrease in the risk of cybervictimization across age groups is not seen among the LGBTQ2 population. That is, the proportion experiencing cybervictimization at ages 18 to 21 and in the late 20s is similar for LGBTQ2 adults, whereas the prevalence of cyberbullying among non-LGBTQ2 young adults declines by about half between the same ages (from 30% at ages 18 to 21 to 18% at ages 26 to 29). Interestingly, among the LGBTQ2 population, the age group with the highest rate of cybervictimization is young adults aged 22 to 25 (58%).
This is a rare departure from the general pattern of cybervictimization declining between ages 18 and 29.Note

First Nations young adults are more frequently the victims of cyberbullying

Almost half (46%) of First Nations young people living off-reserve had experienced some form of cyberbullying in the preceding year. This was nearly double the share of non-Indigenous young adults (26%). There was no increased risk among Métis or Inuit young people.Note Among racialized groups, the likelihood of being cyberbullied was similar to that of the non-racialized, non-Indigenous population. There was also no difference in risk by immigrant status.

Table 5
Prevalence of cybervictimization among young people aged 18 to 29, by selected characteristics, 2018 (percentage)

Total                          25
Gender
  Men (ref.)                   17
  Women                        32*
Racialized population
  Black                        23
  Chinese                      19
  Filipino                     16
  South Asian                  18
  Non-racialized (ref.)        27
Immigrant status
  Immigrant (ref.)             20
  Canadian-born                27
Indigenous identity
  First Nations                46*
  Métis                        31
  Inuit                        13
  Non-Indigenous (ref.)        26
Disability
  No                           17*
  Yes (ref.)                   39
Sexual/gender diversity
  LGBTQ2 (ref.)                49
  Non-LGBTQ2                   23*

* significantly different from reference category (ref.) (p < 0.05)
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Survey of Safety in Public and Private Spaces, 2018.

Young adults with a disability are more often targeted

Young adults aged 18 to 29 with a disabilityNote were significantly more likely to report that they were cybervictimized in the past year.
Across all forms of cybervictimization measured in the SSPPS, 39% of young adults with a disability reported having experienced cyberbullying in the past year, compared with 17% of the young adult population without a disability (Table 5).Note The SSPPS also allows for the examination of gender differences among young men and women with a disability. Almost half (46%) of women with a disability had experienced cybervictimization in the past year, much higher than the 22% of women without a disability. The difference for men was less marked: in 2018, 27% of men with a disability were targeted online, compared to 14% of other young men. The severity of the disability also appears to heighten risk. Based on the SSPPS, 56% of young adults with a severe to very severe disability stated that they had been cybervictimized in the past year, while 46% of those with a moderate disability and 34% of those with a mild disability stated the same. This compares to 17% of young adults without a disability.Note

Frequent smart phone use is related to cybervictimization

Being continually connected to the internet is common among young adults aged 18 to 29, though this may place them at increased risk. Over half (55%) checked their smart phone at least every 15 to 30 minutes, and another 30% checked their smart phone at least once per hour on a typical day. Heavy smart phone use, defined as checking at least every 5 minutes, was the least common, with 15% of young adults falling into this category. However, heavy use was more prevalent in the younger age groups: in 2018, 17% of young adults aged 18 to 20 were heavy users, falling to 11% among those aged 27 to 29. The majority, around three-quarters, of young adults aged 18 to 29 also stated that the last thing they do before going to sleep is check their phone, and a similar percentage stated that they do this again first thing upon waking.
The rates of checking before bed and upon waking are very similar regardless of gender and age. About 4 out of 5 (82%) young adults aged 18 to 20 checked their phones upon waking up, as did 71% of young adults aged 27 to 29; this difference, however, was not statistically significant. A pattern, albeit weak, emerges showing that more frequent smart phone use is associated with more online victimization. Based on data from the CIUS, 15% of young adults who used their smart phone at least every 5 minutes said that they had been cybervictimized in the past year. This was double the rate of young adults who checked their phone less often (7%), a difference statistically significant at the p < 0.10 level.Note There was no significant association between using a smart phone before going to bed or after waking up and cybervictimization in the past year. While a direct comparison cannot be made with the CHSCY data on ages 12 to 17 presented earlier, it is interesting to note that among 12- to 17-year-olds there was a significant association between using one's electronic device at bedtime and the risk of cybervictimization, with a higher risk noted especially for teens aged 12 and 15.

Using protective measures online is more common among younger women

Being victimized online can also lead people to pull back from social media and other online activities. For example, information from the SSPPS shows that about 22% of young adults aged 18 to 29 said that in the past year they blocked people on the internet because of harassment, while 13% said they restricted their own access to the internet to protect themselves from harassment. A further 3% deleted their online account because of harassment. Young women were more than twice as likely as young men to block people because of harassment (31% versus 13%) and were also more likely to restrict their own access (17% versus 10%) (Chart 4).
These gender differences may be driven by the higher overall cybervictimization rates for women.Note

Data table for Chart 4 (percentage)

                                           Men                          Women
                                 18-29  18-21  22-25  26-29   18-29  18-21  22-25  26-29
Blocked people because of
harassment                        13*    15*    13*    11*     31     35     33     27
Restricted own access to
protect self                      10*     7*    10*    11      17     14     20     17
Deleted online account
because of harassment              3      2      3      2       4      4      5      4

* significant difference (p < 0.05) between men and women for a particular age group.
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Survey of Safety in Public and Private Spaces (SSPPS), 2018.

Limiting online activities as a response to cybervictimization is not surprising. Results from the GSS show a strong association between being victimized online and taking other precautions for one's safety beyond unplugging from the internet. For example, when asked if they routinely do certain things to make themselves safer from crime, young adults aged 18 to 29 who had been cybervictimized in the past year were much more likely to say that they carry something for defense, such as a whistle, a knife or pepper spray, than young adults who had not experienced online victimization (12% versus 3%).

Cybervictimization associated with other forms of victimization among young people

There is often a strong association between different types of in-person victimization.Note This is also the case for cybervictimization.
Young adults who have been cybervictimized were more likely to be victims of fraud, more likely to have been stalked and also more likely to have been physically or sexually assaulted in the past year. Data from the GSS showed a connection between cybervictimization and risk of fraud. For example, 17% of young adults who had been cybervictimized in the past year said that they had also been a victim of fraud in the past year, more than four times higher than young adults who had not experienced cybervictimization (4%).Note Cybervictimization is also highly correlated with other forms of victimization and behaviour. For instance, information from the SSPPS shows that young adults who have experienced unwanted behaviours in public that made them feel unsafe or uncomfortable had also been victims of online harassment and bullying in the past year.Note About 45% of young adults who had experienced such behaviours had been cybervictimized in the past year, compared with 11% who had not experienced such behaviours (Table 6). The relationship between online victimization and unwanted behaviours in public appears to be similar for men and women. In particular, 41% of men and 46% of women who had experienced unwanted behaviours in public had also been cybervictimized. This compares to around 10% of men and women who had not experienced such incidents.Note Cybervictimization may manifest itself in real-world public encounters because victims of online abuse may be highly sensitized to possibly unsafe or uncomfortable situations in public, especially in instances where the identity of the online abuser is not known. For all they know, the person making them feel unsafe or uncomfortable in public might be the very same person harassing them online. 
Table 6
Prevalence of cybervictimization among young people aged 18 to 29, by experiences of in-person victimization in the past 12 months and gender, 2018 (percentage)

                                  Felt unsafe or uncomfortable    Stalked (2)         Experienced physical/sexual assault (3)
                                  in public (1)
Gender                            Yes (ref.)    No                Yes (ref.)    No    None (ref.)    One incident    Two or more incidents
Total young people aged 18 to 29     45         10*                  67         22*      21              54*               64*
Men                                  41         10*                  57         16*      15              44*               54*
Women                                46         11*                  72         29*      27              62*               70*

* significantly different from reference category (ref.) (p < 0.05)
Note 1: Respondents were asked: Thinking about time you spent in public spaces in the past 12 months, how many times has anyone made you feel unsafe or uncomfortable by doing any of the following? Making unwanted physical contact, such as hugs or shoulder rubs, or getting too close to you in a sexual manner; indecently exposing themselves to you or inappropriately displaying any body parts to you in a sexual manner; making unwanted comments that you do not look or act like a [man/woman/man or woman] is supposed to look or act; making unwanted comments about your sexual orientation or assumed sexual orientation; or giving you unwanted sexual attention, such as inappropriate comments, whistles, calls, suggestive looks, gestures, or body language.
Note 2: Respondents were asked: In the past 12 months, have you been stalked, that is, have you been the subject of repeated and unwanted attention, by someone other than a current or former spouse, common-law partner or dating partner?
Note 3: Respondents were asked if the following incidents happened to them in the past 12 months (excluding acts committed by a current or previous spouse, common-law partner or dating partner): (a) been attacked; (b) been threatened with being hit or attacked, or threatened with a weapon; (c) been touched in a sexual way against their will; (d) been forced, or subjected to an attempt to force them, into unwanted sexual activity by being threatened, held down or hurt in some way; or (e) been subjected to a sexual activity to which they were not able to consent, that is, drugged, intoxicated, manipulated or forced in ways other than physically. Respondents were then asked whether these things happened in one incident or more than one incident.
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Statistics Canada, Survey of Safety in Public and Private Spaces, 2018.

According to the SSPPS, young adults who had been stalked in the past year had also been victims of online bullying and harassment in that period.Note For instance, 67% of young adults who stated that they had been stalked in the past year also stated that they had been cybervictimized in the past 12 months, three times the rate among young adults who had not been stalked (22%). The relationship is similar for men and women, with 72% of women and 57% of men who had been stalked also stating that they had been cybervictimized. Being a victim of stalking is more prevalent among women in general: 32% of women stated they had been stalked, significantly more than the 17% of men who said the same.Note

A connection between online victimization and physical and sexual assaults also exists.Note Overall, among victims of physical and sexual assault, the proportion that said they were also cybervictimized was very high.
In 2018, 54% of physical or sexual assault victims reported being cybervictimized, climbing to 64% among young people who had experienced two or more incidents of physical or sexual assault. The strong association is present for both young adult men and women, with consistently higher prevalence for women regardless of the number of physical or sexual assaults.

Perpetrators of online victimization are most often men and known to the victim

An area of research on cybervictimization that is often lacking relates to the gender of the offender and the relationship between the offender and the victim. Using the SSPPS, it is possible to examine the characteristics of the perpetrator in cybervictimization incidents (Chart 5). About two-thirds (64%) of young adults who had been cybervictimized stated that a man (or men) was responsible, while 19% said it was a woman (or women), 4% said it was both, and 13% did not know the gender of their online attacker. This general pattern was similar regardless of the gender of the victim, though for women victims the perpetrator was much more likely to be a man (or men). For instance, 73% of women who had been victimized stated that their offender(s) was a man or men, while 13% stated that it was a woman or women. In contrast, 45% of men said that a man (or men) was responsible, while 31% stated that their offender(s) was a woman or women. At the same time, 19% of men and 11% of women did not know the gender of their online offender.Note

Data table for Chart 5 (percentage)

                                  Total   Male victim (ref.)   Female victim
Male offender                       64           45                 73*
Female offender                     19           31                 13*
Both male and female offenders       4            6                  3
Don't know                          13           19                 11

* significantly different from reference category (ref.) (p < 0.05)
Note: Due to sample size limitations, the non-binary category is not releasable.
Source: Survey of Safety in Public and Private Spaces (SSPPS), 2018.

The SSPPS also has information on the relationship between the offender and the victim for the most serious incident of inappropriate online behaviour (combining single- and multiple-offender incidents). The most common offenders, at 55%, were known to the victim, including friends, neighbours, acquaintances, teachers, professors, managers, co-workers and classmates, as well as family members or current or former partners, including spouses, common-law partners or dating partners. The remaining 45% were offenders not known to the victim, including strangers or persons known by sight only. The perpetrator was thus known to the victim in more than half of cases regardless of the victim's gender: based on the SSPPS, 53% of men victims and 56% of women victims knew the person victimizing them online.

Conclusion

Internet and smart phone use among youth and young adults in Canada is at a very high level, particularly since the pandemic. The internet is a tether to the outside world, allowing young people to communicate with one another, expand their knowledge, and be entertained. It is this importance and pervasiveness that makes the risk of online victimization particularly challenging. A goal of this study was to highlight the current state of cybervictimization among Canadian youth and young adults aged 12 to 29. Four separate surveys were used to paint a picture of who is most at risk of cybervictimization, how online and offline behaviours may contribute to this risk, and the association with other forms of victimization.
Based on the analysis of the data, there are five key messages related to cybervictimization of youth and young adults:

1. Not all youth and young adults experience cybervictimization equally. Those most vulnerable to online harm were youth aged 15 to 17 with same-gender attraction or, more broadly, LGBTQ2 young adults aged 18 to 29; youth and young adults with a disability; Indigenous youth; and young adult women when the cybervictimization measures were more sexual in nature.

2. Cybervictimization increases during adolescence and remains high among young adults in their early 20s, dropping somewhat as young adults approach age 30. This age pattern was found using the two surveys that allowed for prevalence estimates by smaller age groupings (CHSCY and SSPPS). The prevalence estimates were not completely comparable across ages 12 to 29, but the pattern held.

3. Greater internet use, as well as using devices at bedtime and upon waking up, was associated with being cybervictimized. Potential buffers of this connection, especially for the teenage population (ages 12 to 17), were not using devices at mealtime, having parents who often know what their teens are doing online, and having less difficulty making friends.

4. Youth and young adults who had been cybervictimized took action to make themselves safer, including blocking people online, restricting their own internet access, and carrying something for protection when offline.

5. Experiencing other forms of victimization was more common among those who were cybervictimized, including being stalked, being physically or sexually assaulted, and experiencing other types of unwanted behaviours in public.

The benefits of the internet for the youth and young adult population are numerous; however, as this study has illustrated, there are certain risks associated with the anonymity and widespread exposure to many unknown actors while online.
Knowing the socio-demographic factors and internet use patterns associated with cybervictimization can help tailor interventions to better prevent and respond to it. Future analytical work should continue to improve understanding of the online victimization faced by youth and young adults.

Darcy Hango is a senior researcher with Insights on Canadian Society at Statistics Canada.

Start of text box
Data sources, methods and definitions

Four surveys are used in this paper: (1) Canadian Health Survey on Children and Youth (CHSCY), 2019; (2) Canadian Internet Use Survey (CIUS), 2018-2019; (3) General Social Survey (GSS) on Victimization (Cycle 34), 2019-2020; and (4) Survey of Safety in Public and Private Spaces (SSPPS), 2018. The analysis is split into two broad age groups: ages 12 to 17, examined using the CHSCY, and ages 18 to 29, examined using the CIUS, the GSS, and the SSPPS.

Data gaps in cybervictimization remain. For instance, there is a need for more information on the perpetrators of cybervictimization; this may involve adding follow-up questions to existing surveys, whether the CHSCY or victimization surveys. Moreover, information on specific types of social media platforms, such as social networking sites, image-based sites and discussion forums, would help pinpoint which applications see the most incidents of cyberbullying. As internet use and potential harm are not restricted to people aged 12 and older, it would also be critical to understand the prevalence and nature of cybervictimization for the youngest Canadians, those under the age of 12, recognizing that survey adaptation and ethical considerations would need to be addressed.
Lastly, certain population subgroups are more at risk of cybervictimization than others, and the research for this study revealed that inadequate sample sizes for some groups, such as Indigenous youth and young adults as well as sexually and gender diverse youth and young adults, limit the ability to understand the dimensions of the issue for these populations. As such, oversampling certain groups should be considered in order to produce meaningful cybervictimization estimates.

An additional concern, overarching many of the above issues, is the "digital divide", which particularly affects communities in rural areas and the North. Recent statistics reveal that in 2017, 99% of Canadians had access to long-term evolution (LTE) networks, though this was true for only about 63% of Northern residents.Note The disparity in connectivity may have an adverse impact especially on the Indigenous population, in terms of both Indigenous youths' underrepresentation in Canadian data on cyberbullying and digital literacy initiatives in Northern, First Nations and Inuit communities.
End of text box

Start of text box
Cyberbullying content across four Statistics Canada surveys

1. Canadian Health Survey on Children and Youth (CHSCY), youth aged 12 to 17 years, 2019 (data collection period between February and August 2019)
During the past 12 months, how often did the following things happen to you?
- Someone posted hurtful information about you on the Internet
- Someone threatened or insulted you through email, instant messaging, text messaging or an online game
- Someone purposefully excluded you from an online community

2. Canadian Internet Use Survey (CIUS), people aged 15 years and older, 2018-2019 (data collection period between November 2018 and March 2019)
Universe: Internet users in the past 3 months
During the past 12 months, have you felt that you were a victim of any of the following incidents on the Internet? Did you experience:
- Bullying, harassment, discrimination
- Misuse of personal pictures, videos or other content
- Other incident

3. General Social Survey (GSS) on Victimization (Cycle 34), people aged 15 years and older, 2019-2020 (data collection period between April 2019 and March 2020)
Universe: Internet users in the past 12 months
In the past 5 years, have you experienced any of the following types of cyber-stalking or cyber-bullying? (This can be narrowed down to the past year by the follow-up question: "You indicated that you experienced some type of cyber-stalking or cyber-bullying in the past 5 years. Did any occur in the past 12 months?")
- You received threatening or aggressive emails or instant messages where you were the only recipient
- You were the target of threatening or aggressive comments spread through group emails, instant messages or postings on Internet sites
- Someone sent out or posted pictures that embarrassed you or made you feel threatened
- Someone used your identity to send out or post embarrassing or threatening information
- Any other type

4. Survey of Safety in Public and Private Spaces (SSPPS), people aged 15 years and older, 2018 (data collection period between April and December 2018)
Universe: Internet users in the past 12 months
Indicate how many times in the past 12 months you have experienced each of the following behaviours while online.
- You received any threatening or aggressive emails, social media messages, or text messages where you were the only recipient
- You were the target of threatening or aggressive comments spread through group emails, group text messages or postings on social media
- Someone posted or distributed, or threatened to post or distribute, intimate or sexually explicit videos or images of you without your consent
- Someone pressured you to send, share, or post sexually suggestive or explicit images or messages
- Someone sent you sexually suggestive or explicit images or messages when you did not want to receive them
End of text box

Notes
Note: Internet-use Typology of Canadians: Online Activities and Digital Skills.
Note: See Bilodeau, Kehler, and Minnema 2021.
Note: Canadians' assessments of social media in their lives.
Note: Other concerns resulting from increased internet and/or smart phone use, such as lack of sleep and anxiety, are important but are left for other research. A recent example is an article by Schimmele et al. 2021.
Note: Because very comprehensive reviews of the prevalence and consequences of cybervictimization in Canada and abroad already exist, they are not detailed here. Readers should consult Zych et al. 2019 and Field 2018 for reviews, and Kim et al. 2017, Hango 2016, and Holfeld and Leadbeater 2015 for examples of recent research using Canadian data.
Note: See Field 2018.
Note: All differences are significant at the p < 0.05 level, unless otherwise noted.
Note: Questions on sexual attraction were only asked of youth aged 15 to 17.
Note: The Indigenous population covered in this paper is from all provinces and territories. In both the CHSCY and the SSPPS, samples were selected from across Canada.
The samples do not include youth and young adults living on First Nations reserves and other Aboriginal settlements.
Note: See Perreault 2022 for recent research exploring victimization trends among the Indigenous population in Canada.
Note: The sample size for Inuit youth was too small to detect significant differences between groups.
Note: Wavrock, Schellenberg, and Schimmele 2021.
Note: The analysis by age is not shown but is available upon request.
Note: Sample size was not sufficient to conduct the analyses in this section separately for the gender diverse population.
Note: See MediaSmarts 2022.
Note: See, for example, research by Bollmer et al. 2005 and Kendrick et al. 2012.
Note: Due to sample size limitations, analysis does not include gender diverse youth.
Note: Due to sample size limitations, analysis does not include gender diverse youth.
Note: Due to sample size limitations, analysis does not include gender diverse young adults.
Note: Among ages 12 to 17, there were no differences between boys and girls on cybervictimization because none of the measures explicitly asked whether the bullying was of a sexual nature. Additional analysis of the SSPPS for ages 15 to 17 (available upon request) showed that teen girls did report a significantly higher probability than teen boys of experiencing the three cybervictimization forms that explicitly tapped into the sexualized nature of the abuse. There were no gender differences on the two measures that asked only about aggressive cybervictimization.
Note: Based on the SSPPS derived variable 'LGBTQ2', which uses responses on sex at birth, gender, and sexual orientation.
Note: This aligns with other research on violent victimization among the LGBTQ population; see Jaffray 2020 and Cotter and Savage 2019.
Note: In the GSS, LGBTQ2 young adults also reported a significantly higher probability of experiencing cybervictimization in the form of pictures that embarrassed or threatened them (4.4% versus 1%).
Note: These estimates are not presented in a table but are available upon request.
Note: The sample size for Inuit young adults was too small to detect significant differences between groups.
Note: A person is defined as having a disability if he or she has one or more of the following types of disability: seeing, hearing, mobility, flexibility, dexterity, pain-related, learning, developmental, memory, or mental health-related.
Note: In the GSS, a larger share of young adults with a disability also reported being cybervictimized via aggressive comments through email (4.3% versus 1.1%), and in the CIUS, on any of the three types of cybervictimization measures (18.1% versus 7%).
Note: These results are not in a table and are available upon request. Based on the global severity score, severity classes were established. Severity scores increase with the number of disability types, the level of difficulty associated with the disability and the frequency of the activity limitation. The name assigned to each class is simply intended to facilitate use of the severity score; it is not a label or judgement concerning the person's level of disability. The classes should be interpreted as follows: people in class 1 have a less severe disability than people in class 2; the latter have a less severe disability than people in class 3; and so on. For more information on severity scores and classes, please refer to the Canadian Survey on Disability (CSD), 2017: Concepts and Methods Guide.
Note: These proportions are not statistically different from each other due to high sampling variability.
Note: Recall that data from the SSPPS showed that 32% of young women said they were cybervictimized in the past year, compared with 17% of young men.
Note: For examples of research examining links between different types of victimization, see Finkelhor et al. 2011; Turner et al. 2016; and Waasdorp and Bradshaw 2015.
Note: Fraud in this case refers to having one's personal information or account details used to obtain money or buy goods and services; having one's personal information or account details used to create or access an account, or to apply for benefits, services or documents; and having been tricked or deceived out of money or goods in person, by telephone or online.
Note: Respondents were asked: Thinking about time you spent in public spaces in the past 12 months, how many times has anyone made you feel unsafe or uncomfortable by doing any of the following? (a) making unwanted physical contact, such as hugs or shoulder rubs, or getting too close to you in a sexual manner; (b) indecently exposing themselves to you or inappropriately displaying any body parts to you in a sexual manner; (c) making unwanted comments that you do not look or act like a (man/woman) is supposed to look or act; (d) making unwanted comments about your sexual orientation or assumed sexual orientation; or (e) giving you unwanted sexual attention, such as inappropriate comments, whistles, calls, suggestive looks, gestures, or body language.
Note: Due to sample size limitations, analysis does not include gender diverse young adults.
Note: Respondents were asked: In the past 12 months, have you been stalked, that is, have you been the subject of repeated and unwanted attention, by someone other than a current or former spouse, common-law partner or dating partner?
Note: These results are not shown in a table but are available upon request.
Note: In the SSPPS, respondents were asked if the following things happened to them in the past 12 months (excluding acts committed by a current or previous spouse, common-law partner or dating partner): (a) been attacked; (b) been threatened with being hit or attacked, or threatened with a weapon; (c) been touched in a sexual way against their will; (d) been forced, or subjected to an attempt to force them, into unwanted sexual activity by being threatened, held down or hurt in some way; or (e) been subjected to a sexual activity to which they were not able to consent, that is, drugged, intoxicated, manipulated or forced in ways other than physically. Respondents were then asked whether these things happened in one incident or more than one incident.
Note: Due to sample size limitations, analysis does not include non-binary young adults.
Note: See CRTC Communications Monitoring Report, 2019.
Related information

Related articles
- Bullying victimization among sexually and gender diverse youth in Canada
- Social Media Use, Connections and Relationships in Canadian Adolescents: Findings from the 2018 Health Behaviour in School-aged Children (HBSC) Study

Data sources
- Canadian Health Survey on Children and Youth
- Survey of Safety in Public and Private Spaces
- General Social Survey - Canadians' Safety
- Canadian Internet Use Survey

ISSN: 2291-0840
Published by authority of the Minister responsible for Statistics Canada. © His Majesty the King in Right of Canada as represented by the Minister of Industry, 2023. Use of this publication is governed by the Statistics Canada Open Licence Agreement.
Catalogue no. 75-006-x. Frequency: Occasional. Ottawa.

Related infographics
- Cyberbullying among youth in Canada
- Cybervictimization among young adults in Canada

Date modified: 2023-03-15
1. I can speak 3 languages.
ability
3. Larry ________________ forget to practice his piano because he _______________ play at a wedding next week.
must not / has to
3. Our car is almost out of gas. We _________________ fill it up right away, or we will run out of gas.
had better
3. I want some coffee.
Could I have some coffee, please?
Onverschuldigde betaling (undue payment)
means that someone pays or delivers something for which no legal basis exists
Relatieve onmogelijkheid (relative impossibility)
only this particular debtor cannot perform, but another could in theory still do so
Absolute onmogelijkheid (absolute impossibility)
performance is impossible - no one can perform any longer
Open access | Tips and Tools | 29 January 2021
Sharing Notes Is Encouraged: Annotating and Cocreating with Hypothes.is and Google Docs
Authors: Carlos C. Goller (ccgoller@ncsu.edu), Micah Vandegrift, Will Cross, Davida S. Smyth
https://doi.org/10.1128/jmbe.v22i1.2135
JMBE Volume 22, Number 1, 30 April 2021

ABSTRACT
Effectively analyzing literature can be challenging for those unfamiliar with studies from rapidly evolving research fields. Previous studies have shown that incorporating primary literature promotes scientific literacy and critical thinking skills. We have used collaborative note-taking and annotation of peer-reviewed articles to increase student engagement with course content and primary literature. Participants annotate articles using the web-annotation tool Hypothes.is and have access to comments from their peers. Groups are then assigned to summarize the annotations and findings, posting a synthesis for the course's Hypothes.is group. In parallel, students contribute to common notes, and the instructor generates a weekly video discussing the student notes. The goal of these activities is to foster an environment of open annotation and cocreation of knowledge to aid studying for deeper learning. Compiled notes can be used to create an open educational resource (OER), which provides an entry point for future students and the public.
Based on the evaluation of annotations, notes, and assessments, we conclude that these activities encourage student engagement and achievement of learning outcomes while raising awareness of the importance of open and collaborative practices.

INTRODUCTION
Reading primary literature can be challenging for those unfamiliar with terminology or methodology (1–3). Often, students highlight long passages or read over unfamiliar jargon without fully comprehending the significance and details of a study. Several approaches have been described to promote the critical reading and analysis of primary literature (4–9). While these methods provide structure, students often read and analyze in isolation, as the methods do not facilitate virtual and open peer collaboration. Additionally, note-taking is a skill that is not commonly taught or emphasized in science courses (10). To create an inclusive and empowering environment of cocreation of knowledge, we have infused an upper-division metagenomics course with activities that reduce the anxiety of reading primary literature and note-taking and that promote collective and collaborative constructivism.

Many tools allow collaborative work on electronic documents. Google Docs, Sheets, and Slides can be used to enable participants to contribute, and there are also resources for annotating web pages. One such tool, often used in the humanities, is Hypothes.is (11–14); it is free, open source, and easy to use in classroom settings, including online courses. Initiatives such as Science in the Classroom (https://www.scienceintheclassroom.org/) have led to studies highlighting the use of annotation as a pedagogical tool (15–17).

Student collaborative notes and summaries can be used to create an Open Educational Resource (OER).
Furthermore, student-created OERs can foster a sense of ownership as class participants work toward creating a common resource that will serve them and a wider audience beyond the course (18).

PROCEDURE
We introduced the use of Hypothes.is and collaborative notes in the fall of 2019 in an 8-week upper-division undergraduate and graduate Metagenomics course (19). The course has weekly 1-hour-50-minute lectures and 5-hour labs, with a course-based research project that relies heavily on the assigned readings. There were 15 students enrolled in the course: 4 undergraduates and 11 graduate students. The study was approved by the NCSU IRB (#20309).

Students annotate articles using Hypothes.is (https://web.hypothes.is/) and have access to all comments. Hypothes.is is a free, open-source software package that allows users to highlight and annotate websites and text. Students are required to submit at least 10 meaningful annotations before the in-class discussion (see Appendix 1). A week after the discussion, groups of three or four students assigned to summarize the article post a brief synopsis to the class's Hypothes.is group (Fig. 1 and Appendix 1).

FIGURE 1. "Sharing Notes is Encouraged" workflow. Students annotate and cocreate notes to produce an OER for studying and for future course participants. Students use Hypothes.is to annotate primary literature as homework assignments, following set guidelines (for details see Appendix 1), and groups are tasked with creating shared summaries for the class to view within a Hypothes.is group. Students contribute to shared notes both during and outside of the class session. The instructor then uses the notes to produce weekly recaps to provide feedback and encouragement.
Student contributions are then compiled to create a final OER containing all notes and annotations generated over the course of the semester in a publicly viewable, dynamic resource (for a sample OER, go to go.ncsu.edu/bitmetagenomics and click on "Meta Book"). PB, Pressbooks.

For collective notes, students have access to a Google Doc with the learning outcomes for each class session. Students are encouraged to contribute by providing definitions, examples, and links to additional resources. Notes are not graded but are lightly edited by the instructor for accuracy. Peers can provide constructive feedback and correct, remedy, or amend misconceptions and inaccuracies. Each week the instructor generates a video reviewing the notes and administers individual quizzes based on the content of the class notes.

Students are informed that, with their consent, their notes and annotations can be used to create an OER that would benefit them and others beyond the course. An example of a student-generated OER is available at go.ncsu.edu/bitmetagenomics.

Materials and preparation
Instructors create a private course Hypothes.is "group" and share the link with students via their learning management system (LMS) or e-mail. Students require free Hypothes.is accounts. If using the Google Chrome browser, there is a useful Hypothes.is extension. Helpful tutorials for using Hypothes.is in education can be found on the website: https://web.hypothes.is/education/. If Google Docs are to be used for shared class notes, the instructor needs to make the document editable by participants. Creating a short link that is easy to remember may be helpful. The instructor should provide guidelines for annotation (types of annotations, including asking questions, clarifying or linking to resources, and examples of tags used by others), expectations for the number of individual annotations, and grading rubrics (Appendix 1).
Frequently presenting or projecting the progress of the class notes encourages participation. The instructor can read and discuss the class notes in a short (6- to 15-min) weekly screencast video posted on an unlisted YouTube playlist (e.g., https://go.ncsu.edu/metanotes19).

Students who contribute to class notes can produce a final web-based Pressbooks OER. Pressbooks is an affordable ($20 to $100/eBook) and easy-to-use online eBook creation platform used by universities and the OER community [e.g., Granite State College OERs (https://granite.pressbooks.pub/) and BC Open Textbooks (https://opentextbc.ca/pressbooks/)].

Modifications and extensions
The Hypothes.is annotation and group summaries assignment has been adapted for other lab-based courses. For example, for an undergraduate and graduate 8-week Yeast Metabolic Engineering lab module (20), we modified the assignment guidelines to allow students to complete the minimum number of meaningful annotations after the in-person discussion of articles. This extension resulted in several participants returning to the papers weeks later to provide additional information. Guidelines can be modified to increase the minimum number of annotations, have students ask and respond to each other, find related studies, or alter the due dates (e.g., until after in-class discussion). The rubric for group summaries can be modified for different course learning outcomes (e.g., data analysis). An example of an annotated paper can be shared with students; for example, a microbiome study from Science in the Classroom can help students learn to annotate using different tags/elements (https://www.scienceintheclassroom.org/research-papers/whats-normal-scoop-poop).

Instructors can choose to encourage all participants to contribute to class notes by making the assignment credit-bearing. Instead of weekly screencast videos, alternatives include an audio file, podcast, or e-mail announcement.
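Checking the 10-annotation minimum and reviewing each group's tags by hand can become tedious; Hypothes.is exposes a public REST search API that can retrieve a group's annotations for grading or for compiling an OER. The sketch below is a minimal, hypothetical helper (the group ID and API token are placeholders you would supply from your own Hypothes.is account), relying on the documented `search` endpoint and its JSON `rows` fields:

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API = "https://api.hypothes.is/api/search"

def fetch_annotations(group_id, token, limit=200):
    """Fetch up to `limit` annotations from a private Hypothes.is group.

    `group_id` and `token` are placeholders: use your course group's ID and a
    developer API token generated from the instructor's Hypothes.is account.
    """
    query = urllib.parse.urlencode({"group": group_id, "limit": limit})
    req = urllib.request.Request(
        f"{API}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["rows"]  # each row has "user", "text", "tags", "uri", ...

def summarize(rows):
    """Count annotations per student and tally tags across the group."""
    per_user = Counter(row["user"] for row in rows)
    tags = Counter(tag for row in rows for tag in row.get("tags", []))
    return per_user, tags

# Typical use (network call; placeholders shown, not real credentials):
# rows = fetch_annotations("GROUP_ID", "API_TOKEN")
# per_user, tags = summarize(rows)
# short = [u for u, n in per_user.items() if n < 10]  # students below the minimum
```

The per-user counts map directly onto the grading rubric (minimum number of annotations), and the tag tally gives a quick picture of which article elements students are engaging with.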
Other OER platforms exist, and some faculty may decide to use WordPress or Google Sites to create publicly accessible sites to publish the collective contributions of participants. Data privacy and consent cannot be overlooked: talk to your students about posting their names on public-facing sites, after asking for their consent in writing. Instructors are encouraged to contact other faculty to collaborate on topic-specific OERs.

CONCLUSION
Students annotate and produce summaries and collaborative notes following the guidelines. Analysis of the annotations and quiz grades suggests that participants are engaging with the articles (Fig. 2) and are able to summarize the findings of the studies (Appendix 2). Annotations of student-selected papers by groups indicate that students continue to use rich annotations. Participants contribute to a Google Doc and view weekly video summaries.

FIGURE 2. Students annotated assigned papers frequently and began using descriptive "tags." (A) Students used the Hypothes.is tool to annotate reading assignments and tag keywords or phrases. Some students responded to other comments and included links and other resources. (B) All students' (n = 15) annotated readings.

We note that, while students unfamiliar with Hypothes.is require a demonstration, once they have seen it, participants are capable of providing productive comments about the studies. For all the articles we've included as reading, students have contributed definitions, links to additional resources, and even responses to questions posed by peers. We advise that instructors highlight the benefits of collaborative annotation and critical note-taking. Our study demonstrates the impact of creating a scholarly community to promote learning and how it can encourage participation and ownership of an OER project. In our implementation, all students made annotations and contributed their thoughts and ideas to the shared notes document.
These efforts helped constitute a student-derived OER that could serve not only these students beyond the course but others as well.

ACKNOWLEDGMENTS
The NCSU OPEN Incubator Program (summer 2019) provided training and inspired us to use Hypothes.is in this course and beyond. We are grateful for funding from the National Science Foundation (NSF) and to the PALM network for providing mentorship and access to active learning resources. The NCSU Biotechnology Program (BIT) provided the resources to offer the Metagenomics and Yeast Metabolic Engineering courses in which these activities were implemented. C.C.G. is also supported by an NIH Innovative Program to Enhance Research Training (IPERT) grant, "Molecular Biotechnology Laboratory Education Modules (MBLEMs)" 1R25GM130528-01A1. We thank the students in the fall 2019 BIT 477/577 Metagenomics course for their patience, commitment, feedback, and energy. This study has been reviewed by the Institutional Review Board at North Carolina State University and approved under protocol number #20309. We do not have any conflicts of interest to declare.

Supplemental Material
File: jmbe00006-21_supp_1_seq2.pdf. ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES
1. Rawlings JS. 2019. Primary literature in the undergraduate immunology curriculum: strategies, challenges, and opportunities. Front Immunol 10:1857.
2. Nelms AA, Segura-Totten M. 2019. Expert–novice comparison reveals pedagogical implications for students' analysis of primary literature. CBE Life Sci Educ 18:ar56.
3. Abdullah C, Parris J, Lie R, Guzdar A, Tour E. 2015.
Critical analysis of primary literature in a master's-level class: effects on self-efficacy and science-process skills. CBE Life Sci Educ 14:ar34.
4. Liao MK. 2017. A simple activity to enhance the learning experience of reading primary literature. J Microbiol Biol Educ 18.
5. Hoskins SG, Lopatto D, Stevens LM. 2011. The C.R.E.A.T.E. approach to primary literature shifts undergraduates' self-assessed ability to read and analyze journal articles, attitudes about science, and epistemological beliefs. CBE Life Sci Educ 10:368–378.
6. Gottesman AJ, Hoskins SG. 2013. CREATE cornerstone: introduction to scientific thinking, a new course for STEM-interested freshmen demystifies scientific thinking through analysis of scientific literature. CBE Life Sci Educ 12:59–72.
7. Carmichael JS, Allison LA. 2019. Using "research boxes" to enhance understanding of primary literature and the process of science. J Microbiol Biol Educ 20(2).
8. Round JE, Campbell AM. 2013. Figure facts: encouraging undergraduates to take a data-centered approach to reading primary literature. CBE Life Sci Educ 12:39–46.
9. Lo SM, Luu TB, Tran J. 2020. A modified CREATE intervention improves student cognitive and affective outcomes in an upper-division genetics course. J Microbiol Biol Educ 21(1).
10. Morehead K, Dunlosky J, Rawson KA, Blasiman R, Hollis RB. 2019. Note-taking habits of 21st-century college students: implications for student learning, memory, and achievement. Memory 27:807–819.
11. Kennedy M. 2016. Open annotation and close reading the Victorian text: using hypothes.is with students.
J Vic Cult 21:550–558.
12. Shrout AH. 2016. Hypothes.is. J Am Hist 103:870–871.
13. Perkel JM. 2015. Annotating the scholarly web. Nat News 528:153.
14. Kalir JH, Dean J. 2018. Web annotation as conversation and interruption. Media Pract Educ 19:18–29.
15. Kararo M, McCartney M. 2019. Annotated primary scientific literature: a pedagogical tool for undergraduate courses. PLOS Biol 17:e3000103.
16. Miller K, Zyto S, Karger D, Yoo J, Mazur E. 2016. Analysis of student engagement in an online annotation system in the context of a flipped introductory physics class. Phys Rev Phys Educ Res 12:e020143.
17. Sahota M, Leung B, Dowdell S, Velan GM. 2016. Learning pathology using collaborative vs. individual annotation of whole slide images: a mixed methods trial. BMC Med Educ 16:311.
18. Yaeger J, Wolfe T. 2018. Creating the ripple effect: applying student-generated OER to increase engagement in distance education and enhance the OER community. Digital Universities 1/2.
19. Goller CC, Ott LE. 2020. Evolution of an 8-week upper-division metagenomics course: diagramming a learning path from observational to quantitative microbiome analysis. Biochem Mol Biol Educ 48:391–403.
20. Gordy CL, Goller CC. 2020. Using metabolic engineering to connect molecular biology techniques to societal challenges. Front Microbiol.

Published in Journal of Microbiology & Biology Education, Volume 22, Number 1, 30 April 2021. eLocator: 10.1128/jmbe.v22i1.2135. PubMed: 33584941. Copyright © 2021 Goller et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. Received: 12 April 2020. Accepted: 28 November 2020. Published online: 29 January 2021.

Author affiliations: Carlos C. Goller (ccgoller@ncsu.edu), Department of Biological Sciences and Biotechnology Program (BIT), North Carolina State University, Raleigh, NC 27695; Micah Vandegrift and Will Cross, NC State University Libraries, Raleigh, NC 27695; Davida S. Smyth, Eugene Lang College of Liberal Arts at The New School, New York City, NY 10011.
Mysticeti [3] builds on Cordial Miners and establishes a mechanism to accommodate multiple leaders within a single round.
3. I don’t have to work this weekend.
It is not necessary for me to work this weekend.
3. They _________ do it today.
can't / can
3. They _________ do it today.
CAN NOT
Reviewer #1 (Public review):
The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA-seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L1 serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs, they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written, potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns, there remain concerns about the current data that continue to limit my enthusiasm for the study. Please see my specific comments below.
Major:
(1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. The authors have added new data suggesting that LysM-Cre leads to only a 40% reduction of Chil1 in KCs and that this explains the difference in phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow-sorted TIM4hi KCs, as the plating method can lead to heterogeneous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f, this result calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather to some other aspect of the model (microbiota, etc.). It would be more convincing if the authors could show the CHI3L1 reduction via IF in the tissue of these mice.
(2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterizes this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by no meaningful changes in the CD11b hi, F4/80 int cells, which are predominantly monocytes and early MdMs. It is also concerning that loss of KCs does not lead to an increase in Mo-KCs, as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.
(3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.
(4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The authors agree that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L1. The IF in the paper only shows a high-power view of one KC. I would like to see what percentage of KCs express CHI3L1 and how this changes with the HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.
Minor:
(1) The authors have answered my question about liver fibrosis. In line with their macrophage data their diet model does not appear to induce even mild MASH.
Reviewer #3 (Public review):
This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.
Here are my comments:
(1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.
(2) The LysM-Cre experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? The LysM-Cre system is neither specific to macrophages (it also deletes in neutrophils, etc.), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs. other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.
(3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.
(4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.
Comments on revisions:
The authors have done a thorough job addressing my comments. However, I am not convinced by the MCD diet model, which is somewhat hidden in the Supplementary Files. Neither does the MASH phenotype seem different, nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model would support the conclusions.
Author response:
The following is the authors’ response to the original reviews
Reviewer #1 (Public review):
The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-cholesterol diet (HFHC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, there are some concerns about the current data that limit my enthusiasm for the study in its current form. Please see my specific comments below.
(1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. For example, in Figure 2 the authors present data that knockout of Chil1 in KCs using Clec4f Cre produces worse liver steatosis and insulin resistance. However, in supplemental Figure 4, they perform the same experiment in LysM-Cre mice and find a somewhat different phenotype. The authors appear to be under the impression that LysM-Cre does not cause recombination in KCs and therefore interpret this data to mean that Chil1 is relevant in KCs and not MdMs. However, LysM-Cre DOES lead to efficient recombination in KCs and therefore Chil1 expression will be decreased in both KCs and MdM (along with PMNs) in this line.
Therefore, a phenotype observed with KC-KO should also be present in this model unless the authors argue that loss of Chil1 from the MdMs has the opposite phenotype of KCs and therefore attenuates the phenotype. The Cx3Cr1 CreER tamoxifen inducible system is currently the only macrophage Cre strategy that will avoid KC recombination. The authors need to rethink their results with the understanding that Chil1 is deleted from KCs in the LysM-Cre experiment. In addition, it appears that only one experiment was performed, with only 5 mice in each group for both the Clec4f and LysM-Cre data. This is generally not enough to make a firm conclusion for MASH diet experiments.
We thank the reviewer for raising this important point regarding our data interpretation. We have carefully examined the deletion efficiency of Chi3l1 in primary Kupffer cells (KCs) from Lyz2<sup>∆Chil1</sup> (LysM-Cre) mice. Our results show roughly a 40% reduction in Chi3l1 expression at both the mRNA and protein levels (Revised Manuscript, Figure S7B and C). Given this modest decrease, Chi3l1 deletion in KCs of Lyz2<sup>∆Chil1</sup> mice was incomplete, which likely accounts for the phenotypic differences observed between Clec4f<sup>∆Chil1</sup> and Lyz2<sup>∆Chil1</sup> mice in the MASLD model.
Furthermore, we have increased the sample size in both the Clec4f- and LysM-Cre experiments to 9–12 mice per group following the HFHC diet, thereby strengthening the statistical power and reliability of our findings (Revised Figures 2 and S8).
(2) The mouse weight gain is missing from Figure 2 and Supplementary Figure 4. This data is critical to interpret the changes in liver pathology, especially since they have worse insulin resistance.
We thank the reviewer for this valuable comment. We have now included the mouse body weight data in the revised manuscript (Figure 2A, B and Figures S8A, B). Compared with mice on a normal chow diet (NCD), all groups exhibited progressive weight gain during HFHC diet feeding. Notably, Clec4f<sup>∆Chil1</sup> mice gained significantly more body weight than Chil1<sup>fl/fl</sup> controls, whereas Lyz2<sup>∆Chil1</sup> mice showed a similar weight gain trajectory to Chil1<sup>fl/fl</sup> mice under the same conditions.
(3) Figure 4 suggests that KC death is increased with KO of Chil1. However, this data cannot be concluded from the plots shown. In Supplementary Figure 6 the authors provide a more appropriate gating scheme to quantify resident KCs that includes TIM4. The TIM4 data needs to be shown and quantified in Figure 4. As shown in Supplementary Figure 6, the F4/80 hi population is predominantly KCs at baseline; however, this is not true with MASH diets. Most of the recruited MoMFs also reside in the F4/80 hi gate where they can be identified by their lower expression of TIM4. The MoMF gate shown in this figure is incorrect. The CD11b hi population is predominantly PMNs, monocytes, and cDC2, not MoMFs (PMID: 33997821). In addition, the authors should stain the tissue for TIM4, which would also be expected to reveal a decrease in the number of resident KCs.
We thank the reviewer for raising this critical point regarding the gating strategy and interpretation of KC death. We have now refined our flow cytometry gating based on the reviewer’s suggestion. Specifically, we analyzed TIM4 expression and attempted to identify TIM4<sup>low</sup> MoMF populations in our model. However, we did not detect a distinct TIM4<sup>low</sup> population, likely because our mice were fed the HFHC diet for only 16 weeks and had not yet developed liver fibrosis. We therefore reason that MoMFs had not yet acquired TIM4 expression at this stage.
To improve our analysis, we referred to published strategies (PMID: 41131393; PMID: 32562600) and gated KCs as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> and MoMFs as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/-</sup>. Using this approach, we observed a gradual reduction of KCs and a corresponding increase in MoMFs in WT mice, with a significantly faster loss of KCs in Chil1<sup>-/-</sup> mice (Revised Figure 4C, D; Figure S10A).
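The refined gating strategy above is, in effect, a sequence of boolean marker filters. The toy sketch below encodes the KC and MoMF definitions as such filters; the intensity values and gate cutoffs are invented for illustration and are not the actual cytometry thresholds used in the study.

```python
# Hypothetical per-cell marker intensities; binary markers are 0/1 gates.
cells = [
    {"CD45": 1, "Ly6G": 0, "CD11b": 1, "F4_80": 900, "TIM4": 850},  # KC-like
    {"CD45": 1, "Ly6G": 0, "CD11b": 1, "F4_80": 200, "TIM4": 30},   # MoMF-like
    {"CD45": 1, "Ly6G": 1, "CD11b": 1, "F4_80": 150, "TIM4": 10},   # neutrophil-like
]

F480_HI, TIM4_HI, TIM4_LOW = 500, 500, 100  # illustrative gate cutoffs only

def is_kc(c):
    # KCs gated as CD45+ CD11b+ F4/80-hi TIM4-hi
    return bool(c["CD45"] and c["CD11b"]
                and c["F4_80"] >= F480_HI and c["TIM4"] >= TIM4_HI)

def is_momf(c):
    # MoMFs gated as CD45+ Ly6G- CD11b+ F4/80-low TIM4-low/negative
    return bool(c["CD45"] and not c["Ly6G"] and c["CD11b"]
                and c["F4_80"] < F480_HI and c["TIM4"] <= TIM4_LOW)

kcs = [c for c in cells if is_kc(c)]
momfs = [c for c in cells if is_momf(c)]
```

The Ly6G-negative filter in the MoMF gate is what excludes neutrophils, which otherwise share CD11b positivity, mirroring the published strategies cited above.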
Furthermore, immunofluorescence staining for TIM4 combined with TUNEL or cleaved caspase-3 confirmed an increased number of dying KCs in Chil1<sup>-/-</sup> mice compared to WT following HFHC diet feeding (Revised Figure 4E; Figure S10B).
(4) While the Clec4F Cre is specific to KCs, there is also less data about the impact of the Cre system on KC biology. Therefore, when looking at cell death, the authors need to include some mice that express Clec4F cre without the floxed allele to rule out any effects of the Cre itself. In addition, if the cell death phenotype is real, it should also be present in LysM Cre system for the reasons described above. Therefore, the authors should quantify the KC number and dying KCs in this mouse line as well.
We thank the reviewer for raising this important point. During our study, we indeed observed an increased number of KCs in Clec4f-Cre mice compared to WT controls, suggesting that the Clec4f-Cre system itself may modestly affect KC homeostasis. To address this, we compared KC numbers between Clec4f<sup>∆Chil1</sup> and Clec4f-Cre mice and found that Clec4f<sup>∆Chil1</sup> mice displayed a significant reduction in KC numbers following HFHC diet feeding. Moreover, co-staining for TIM4 and TUNEL revealed a marked increase in KC death in Clec4f<sup>∆Chil1</sup> mice relative to Clec4f-Cre mice, indicating that the observed phenotype is attributable to Chil1 deletion rather than Cre expression alone. These data have been reported in our related manuscript (He et al., bioRxiv, 2025.09.26.678483; doi: 10.1101/2025.09.26.678483).
In addition, we quantified KC numbers and KC death in the Lyz2-Cre line. TIM4/TUNEL co-staining showed comparable levels of KC death between Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> mice (Revised Figure S11B). Consistently, flow cytometry analyses revealed no significant differences in KC numbers between these two groups before (0 weeks) or after (20 weeks) HFHC diet feeding (Revised Figures S11C, D). As discussed in our response to Comment 1, this may be due to the incomplete deletion of Chi3l1 in KCs (<50%) in the Lyz2-Cre line, which likely attenuates the phenotype.
(5) I am somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. Looking at our own data and those from the Liver Atlas it appears that this gene is primarily expressed in neutrophils. At a minimum, the authors should address the expression of Chil1 in macrophage populations from other publicly available datasets in mouse MASH to validate their findings (several options include - PMID: 33440159, 32888418, 32362324). If expression of Chil1 is not present in these other data sets, perhaps an environmental/microbiome difference may account for the distinct expression pattern observed. Either way, it is important to address this issue.
We thank the reviewer for this insightful comment and agree that analysis of scRNA-seq data, including our own and those reported in the Liver Atlas as well as in the referenced studies (PMID: 33440159, 32888418, 32362324), indicates that Chil1 is predominantly expressed in neutrophils.
However, our immunofluorescence staining under normal physiological conditions revealed that Chi3l1 protein is primarily localized in Kupffer cells (KCs), as demonstrated by strong co-staining with TIM4 (Revised Figure 1E). In MASLD mouse models induced by HFHC or MCD diets, we observed that both KCs and monocyte-derived macrophages (MoMFs) express Chi3l1, with particularly high levels in MoMFs.
We speculate that the apparent discrepancy between scRNA-seq datasets and our in situ findings may reflect differences in cellular proportions and detection sensitivity. Since hepatic macrophages (particularly KCs and MoMFs) constitute a larger proportion of total liver immune cells compared with neutrophils, their contribution to total Chi3l1 protein levels in tissue staining may appear dominant, despite lower transcript abundance per cell in sequencing datasets. We have included a discussion of this point in the revised manuscript to clarify this distinction (Revised manuscript, page 8, lines 341–350).
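This proportion argument can be made concrete with a back-of-the-envelope calculation. The cell counts and per-cell expression levels below are invented purely for illustration: a population with lower per-cell expression can still dominate the tissue-level signal if it is sufficiently abundant.

```python
# Hypothetical numbers: tissue-level signal ~ cell count x per-cell expression.
populations = {
    "neutrophils": {"count": 1_000,  "per_cell": 10.0},  # high per-cell transcript, rare
    "macrophages": {"count": 20_000, "per_cell": 2.0},   # lower per-cell, abundant
}

total_signal = {name: p["count"] * p["per_cell"] for name, p in populations.items()}
for name, signal in total_signal.items():
    print(name, signal)
```

With these assumed numbers, macrophages contribute four times the aggregate signal of neutrophils despite five-fold lower per-cell expression, consistent with macrophage-dominant staining alongside neutrophil-dominant per-cell transcript counts.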
Minor points:
(1) Were there any changes in liver fibrosis or liver fibrosis markers present in these experiments?
We assessed liver fibrosis using Sirius Red staining and α-SMA Western blot analysis.
We found no induction of liver fibrosis in our HFHC-induced MASLD model (Revised Figure S1A, B), but a clear elevation of fibrosis markers in the MCD-induced MASH model (Revised Figure S6A, B).
(2) In Supplementary Figure 3, the authors do a western blot for CHI3L1 in BMDMs. This should also be done for KCs isolated from these mice. Does this antibody work for immunofluorescence? Staining liver tissue would provide valuable information on the expression patterns.
We have included qPCR and Western blot analyses of Chi3l1 in primary KCs isolated from Lyz2<sup>∆Chil1</sup> mice. The data show a slight, non-significant reduction in both mRNA and protein levels in KCs (Revised Figure S7B, C). Immunofluorescence staining of liver tissue showed that Chi3l1 localizes predominantly to the plasma membrane of TIM4<sup>+</sup>F4/80<sup>+</sup> KCs under both NCD and HFHC diet (Revised Figure 1E).
(3) What is the impact of MASH diet feeding on Chil1 expression in KCs or in the liver in general?
In both our MASLD and MASH models, diet feeding consistently upregulates Chi3l1 in KCs and in the liver overall (Revised Figures 1F, G and S6C, D).
(4) In Figure S1 the authors show tSNE plots of various monocyte and macrophage genes in the liver. Are these plots both diets together? How do things look when comparing these markers between the STD and HFHC diet? The population of recruited LAMs seems very small for 16 weeks of diet. Moreover, Chil1 should also be shown on these tSNE plots as well.
Yes, these plots include both diets together. When compared separately, core marker expression is consistent between the NCD and HFHC diets. However, the HFHC diet induces a relative increase in KC marker expression within the MoMF cluster, suggesting phenotypic adaptation (Author response image 1A, below). Chil1 expression is now shown on these plots as well (Author response image 1B, below); however, compared to lineage-specific marker genes, Chil1 expression is rather low.
Author response image 1.
Gene expression levels of lineage-specific marker genes in monocytes/macrophages clusters between NCD and HFHC diets. (A) UMAP plots show the scaled expression changes of lineage-specific markers in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression. (B) UMAP plots show the scaled expression changes of Chil1 in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression.
(5) In Figure 5, the authors demonstrate that CHI3L1 binds to glucose. However, given that all chitin molecules bind to carbohydrates, is this a new finding? The data showing that CHI3L is elevated in the serum after diet is interesting. What happens to serum levels of this molecule in KC KO or total macrophage KO mice? Do the authors think it primarily acts as a secreted molecule or in a cell-intrinsic manner?
We thank the reviewer for these insightful comments, which helped us clarify the novelty of our findings.
(1) Novelty of CHI3L1-Glucose Binding:
While chitin-binding domains are known to interact with carbohydrate polymers, our key discovery is that CHI3L1 (YKL-40), a mammalian chitinase-like protein lacking enzymatic activity, specifically binds to glucose, a simple monosaccharide. This differs fundamentally from canonical binding to insoluble polysaccharides such as chitin and reveals a potential role for CHI3L1 in monosaccharide recognition, linking it to glucose metabolism and energy sensing. We clarified this point in the revised manuscript (page 9, lines 374–379).
(2) Serum CHI3L1 in Knockout Models:
Consistent with the reviewer’s suggestion, serum Chi3l1 levels are altered in our knockout models:
KC-specific KO (Clec4f<sup>ΔChil1</sup>): Under normal chow, serum CHI3L1 is markedly reduced compared to controls and remains lower following HFHC feeding (Author response image 2A, below), indicating that Kupffer cells are the main source of circulating CHI3L1 under basal and disease conditions.
Macrophage KO (Lyz2<sup>ΔChil1</sup>): No significant changes were observed between Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice under either diet (Author response image 2B, below), likely due to minimal monocyte-derived macrophage recruitment in this HFHC model (see Revised Figure 4C,D).
(3) Secreted vs. Cell-Intrinsic Role:
CHI3L1 predominantly localizes to the KC plasma membrane, consistent with a secreted role, and its serum reduction in KC-specific knockouts supports the physiological relevance of its secreted role. While cell-intrinsic effects have been reported elsewhere, our current data do not address this in KCs and warrant future investigation.
Author response image 2.
Chi3l1 expression in serum before and after HFHC diet in CKO mice. (A) Western blot detecting Chi3l1 in serum of Chil1<sup>fl/fl</sup> and Clec4f<sup>ΔChil1</sup> mice before and after 16 weeks of HFHC diet. n = 3 mice/group. (B) Western blot detecting Chi3l1 in serum of Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice before and after 16 weeks of HFHC diet. n = 3 mice/group.
Reviewer #2 (Public review):
The manuscript from Shan et al. sets out to investigate the role of Chi3l1 in different hepatic macrophage subsets (KCs and MoMFs) in MASLD, following the identification that KCs highly express this gene. To this end, they utilise Chi3l1-KO, Clec4f-Cre x Chi3l1-fl, and Lyz2-Cre x Chi3l1-fl mice and WT controls fed an HFHC diet for different periods of time.
Major:
Firstly, the authors perform scRNA-seq, which led to the identification of Chi3l1 (encoded by Chil1) in macrophages. However, this is on a limited number of cells (especially in the HFHC context), and hence it would also be important to validate this finding in other publicly available MASLD/Fibrosis scRNA-seq datasets. Similarly, it would be important to examine if cells other than monocytes/macrophages also express this gene, given the use of the full KO in the manuscript. Along these lines, utilisation of publicly available human MASLD scRNA-seq datasets would also be important to understand where the increased expression observed in patients comes from and the overall relevance of macrophages in this finding.
We thank the reviewer for this valuable suggestion and acknowledge the limited number of cells analyzed under the HFHC condition in our original dataset. To strengthen our findings, we have now examined four additional publicly available scRNA-seq datasets, two from mouse models and two from human MASLD patients (Revised Figure S3; manuscript page 4, lines 164–172). Across these datasets, the specific cell type showing the highest Chil1 expression varied somewhat between studies, likely reflecting model differences and disease stages. Nevertheless, Chil1 expression was consistently enriched in hepatic macrophage populations, including both Kupffer cells and infiltrating macrophages, in mouse and human livers. Notably, Chil1 expression was higher in infiltrating macrophages compared to resident Kupffer cells, supporting its upregulation during MASLD progression. These additional analyses confirm the robustness and cross-species relevance of our finding that macrophages are the primary Chil1-expressing cell type in the liver.
Next, the authors use two different Cre lines (Clec4f-Cre and Lyz2-Cre) to target KCs and MoMFs, respectively. However, no evidence is provided to demonstrate that Chil1 is only deleted from the respective cells in the two Cre lines. Thus, KCs and MoMFs should be sorted from both lines, and a qPCR performed to check the deletion of Chil1. This is especially important for the Lyz2-Cre, which has been routinely used in the literature to target KCs (as well as MoMFs) and has (at least partial) penetrance in KCs (depending on the gene to be floxed). Also, while the Clec4f-Cre mice show an exacerbated MASLD phenotype, there is currently no baseline phenotype of these animals (or the Lyz2-Cre) in steady state in relation to the same readouts provided in MASLD and the macrophage compartment. This is critical to understand if the phenotype is MASLD-specific or if loss of Chi3l1 already affects the macrophages under homeostatic conditions.
We thank the reviewer for raising this important point.
(1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:
We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.
For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that TIM4 staining indicated a KC purity of approximately 90%. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.
Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.
(2) Whether the phenotype is MASLD-specific or whether loss of Chi3l1 already affects macrophages under homeostatic conditions: We have now included phenotypic data from Clec4f<sup>ΔChil1</sup> mice (KC-specific KO) and Lyz2<sup>∆Chil1</sup> mice (MoMF-specific KO) fed an NCD for 16 weeks (Revised Figures 2A-F, S8A-F). In short, there is no baseline difference between Chil1<sup>fl/fl</sup> and Clec4f<sup>ΔChil1</sup> or Lyz2<sup>∆Chil1</sup> mice at steady state for the same readouts provided in MASLD.
Next, the authors suggest that loss of Chi3l1 promotes KC death. However, to examine this, they use Chi3l1 full KO mice instead of the Clec4f-Cre line. The reason for this is not clear, because in this regard, it is now not clear whether the effects are regulated by loss of Chi3l1 from KCs or from other hepatic cells (see point above). The authors mention that Chi3l1 is a secreted protein, so does this mean other cells are also secreting it, and are these needed for KC death? In that case, this would not explain the phenotype in the CLEC4F-Cre mice. Here, the authors do perform a basic immunophenotyping of the macrophage populations; however, the markers used are outdated, making it difficult to interpret the findings. Instead of F4/80 and CD11b, which do not allow a perfect discrimination of KCs and moMFs, especially in HFHC diet-fed mice, more robust and specific markers of KCs should be used, including CLEC4F, VSIG4, and TIM4.
We thank the reviewer for raising this important point. We performed these experiments in the Clec4f<sup>∆Chil1</sup> (KC-specific KO) model. The phenotype in these mice closely mirrors that of the full KO: we observed a significant reduction in KC numbers and a concurrent increase in KC death in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-Cre mice following HFHC diet feeding. We have reported these data in the related manuscript cited below (Figure 6D-G). This confirms that loss of CHI3L1 specifically from KCs is sufficient to drive this effect.
He J, Li R, Xie C, Zhu X, Wang K, Shan Z. Hyperactivated Glycolysis Drives Spatially-Patterned Kupffer Cell Depletion in MASLD. bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483
While other hepatic cells (e.g., neutrophils and liver sinusoidal endothelial cells) also express Chi3l1, our data indicate that KC-secreted Chi3l1 plays a dominant and cell-autonomous role in maintaining KC viability. The potential contribution of other cellular sources to this phenotype remains an interesting direction for future study.
We apologize for the lack of clarity in our initial immunophenotyping. We have revised the flow cytometry data to clearly show that KCs are rigorously defined as TIM4<sup>+</sup> cells (Revised Figure 4C, D).
Additionally, while the authors report a reduction of KCs in terms of absolute numbers, there are no differences in proportions. This, coupled with a decrease also in MoMF numbers at 16 weeks (when one would expect an increase if KCs are decreased, based on previous literature), suggests that the differences in KC numbers may be due to differences in total cell counts obtained from the obese livers compared with controls. To rule this out, total cell counts and total live CD45+ cell counts should be provided. Here, the authors also provide TUNEL staining in situ to demonstrate increased KC death, but as it is typically notoriously difficult to visualise dying KCs in MASLD models, it would be important to provide more images. Similarly, there appear to be many more TUNEL+ cells in the KO that are not KCs; thus, it would be important to examine this in the CLEC4F-Cre line to ascertain direct versus indirect effects on cell survival.
We thank the reviewer for raising this important point. We have now included the total cell counts and total live CD45<sup>+</sup> cell counts, which were similar between WT and Chil1<sup>-/-</sup> mice post HFHC diet (Author response image 3, below).
Moreover, we included cleaved caspase-3 and TIM4 co-staining in WT and Chil1<sup>-/-</sup> mice before and after HFHC diet, which confirmed increased KC death in Chil1<sup>-/-</sup> mice (Revised Figure S10B). We have compared KC numbers and KC death between Clec4f-Cre and Clec4f<sup>∆Chil1</sup> mice under NCD and HFHC diet in our related manuscript (He et al., bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483; Figure 6D-G). The data showed similar KC numbers under NCD and reduced KC numbers in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-Cre mice after HFHC diet, confirming a direct effect of Chi3l1 on cell survival rather than an artifact of Cre insertion.
Author response image 3.
Numbers of total cells and total live CD45<sup>+</sup> cells in livers of WT and Chil1<sup>-/-</sup> mice. (A) Numbers of total cells and total live CD45<sup>+</sup> cells per liver were statistically analyzed. n = 3-4 mice per group.
Finally, the authors suggest that Chi3l1 exerts its effects through binding glucose and preventing its uptake. They use ex vivo/in vitro models to assess this with rChi3l1; however, here I miss the key in vivo experiment using the CLEC4F-Cre mice to prove that this in KCs is sufficient for the phenotype. This is critical to confirm the take-home message of the manuscript.
We agree that it is essential to confirm the in vivo relevance of Chi3l1-mediated glucose regulation in Kupffer cells (KCs). Our data suggest that KCs undergo cell death not because they express Chi3l1 per se, but because they exhibit a glucose-hungry metabolic phenotype that makes them uniquely dependent on Chi3l1-mediated regulation of glucose uptake. To directly assess this mechanism in vivo, we injected 2-NBDG, a fluorescent glucose analog, into overnight-fasted and refed mice and quantified its uptake in hepatic KCs. Notably, Chi3l1-deficient KCs exhibited significantly increased 2-NBDG uptake compared with controls, and this effect was markedly suppressed by co-treatment with recombinant Chi3l1 (rChi3l1) (Revised Figure 6G, H). These findings demonstrate that Chi3l1 regulates glucose uptake by KCs in vivo, supporting our proposed mechanism that Chi3l1 controls KC metabolic homeostasis through modulation of glucose availability.
Minor points:
(1) Some key references of macrophage heterogeneity in MASLD are not cited: PMID: 32362324 and PMID: 32888418.
We thank the reviewer for highlighting these critical references and have included them in the introduction (Revised manuscript, page 2, line 64-73).
(2) In the discussion, Figure 3H is referenced (Serum data), but there is no Figure 3H. If the authors have this data (increased Chi3l1 in serum of mice fed HFHC diet), what happens in CLEC4F-Cre mice fed the diet? Is this lost completely? This comes back to the point regarding the specificity of expression.
We apologize for the mistake. It should be Figure 5F in the revised version, in which serum Chi3l1 was significantly upregulated after HFHC diet. Moreover, under a normal chow diet (NCD), serum CHI3L1 is significantly lower in Clec4f<sup>ΔChil1</sup> mice compared to controls (Chil1<sup>fl/fl</sup>). Following an HFHC diet, levels increase in both genotypes but remain relatively lower in the KC-KO mice (please see Author response image 2A above). These data strongly suggest that Kupffer cells (KCs) are the primary source of serum CHI3L1 under basal conditions and a major contributor during MASLD progression.
Reviewer #3 (Public review):
This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.
(1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.
We thank the reviewer for this insightful comment regarding the novelty of our findings. We agree that Chi3l1 has previously been linked to macrophage survival and function in models of liver injury and fibrosis (e.g., PMID: 37166517, 31250532). However, our study focuses specifically on the early stage of MASLD, prior to the onset of fibrosis, revealing a distinct mechanistic role for CHI3L1 in this context.
We demonstrate that CHI3L1 directly interacts with extracellular glucose to regulate its cellular uptake—a previously unrecognized biochemical function. Furthermore, we show that CHI3L1’s protective role is metabolically dependent, safeguarding glucose-dependent Kupffer cells (KCs) but not monocyte-derived macrophages (MoMFs). This metabolic dichotomy and the direct link between CHI3L1 and glucose sensing represent conceptual advances beyond previous studies of CHI3L1 in fibrotic or injury models.
(2) The LysM-Cre experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? The LysM-Cre system is neither specific to macrophages (it also deletes in neutrophils, etc.), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.
We thank the reviewer for this important comment and the opportunity to clarify both the efficiency and specificity of our conditional knockouts, as well as the differences from the study by Feldstein’s group (PMID: 37166517).
(1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:
We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.
For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that TIM4 staining indicated a KC purity of approximately 90%. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.
Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.
(2) Explanation for Differences from Feldstein et al. (PMID: 37166517):
Our findings differ from those reported by Feldstein's group primarily due to differences in disease stage and model. We used a high-fat, high-cholesterol (HFHC) diet to model early-stage MASLD characterized by steatosis and inflammation without fibrosis (Revised Figure S1A, B). In this context, we observed KC death but minimal MoMF infiltration (Revised Figure 4D). Accordingly, deletion of Chi3l1 in MoMFs (Lyz2<sup>∆Chil1</sup>) had no measurable effect on insulin resistance or steatosis, consistent with limited MoMF involvement at this stage. In contrast, the Feldstein study employed a CDAA-HFAT diet that models later-stage MASH with fibrosis. In that setting, Lyz2<sup>∆Chil1</sup> mice showed reduced recruitment of neutrophils and MoMFs, which likely underlies the attenuation of fibrosis and disease severity reported. Together, these data support a model in which KCs and MoMFs play temporally distinct roles during MASLD progression: KCs primarily drive early lipid accumulation and metabolic dysfunction, whereas MoMFs contribute more substantially to inflammation and fibrosis at later stages.
(3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.
We thank the reviewer for this valuable suggestion to validate our findings in an additional MASH model. We have now included data from a methionine- and choline-deficient (MCD) diet-induced MASH model, which exhibits pronounced hepatic lipid accumulation and fibrosis (Revised Figure S6A, B). Consistent with our HFHC results, Clec4f<sup>∆Chil1</sup> mice displayed exacerbated MASH progression in this model, including increased lipid deposition, inflammation, and fibrosis (Revised Figure S6E-G). These findings confirm that CHI3L1 deficiency in Kupffer cells promotes hepatic lipid accumulation and disease progression across distinct MASLD/MASH models.
(4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.
We thank the reviewer for this important comment regarding translational relevance. We fully agree that validation in human liver samples would further strengthen our study. However, obtaining tissue from early-stage steatotic livers is challenging due to the asymptomatic nature of this disease stage. Nonetheless, multiple studies have consistently reported Chi3l1 upregulation in human fibrotic and steatotic liver disease (PMID: 31250532, 40352927, 35360517), supporting the clinical significance of our mechanistic findings. We have now expanded the Discussion to highlight these human data and better contextualize our results within the spectrum of human MASLD/MASH progression (Revised manuscript, page 9, lines 390–394).
Minor points:
The authors need to follow the new nomenclature (e.g., MASLD instead of MAFLD, e.g., in Figure 1).
We now use "MASLD" throughout the manuscript, including in Figure 1.
We thank the reviewers again for their rigorous critique. We thank eLife for fostering an environment of fairness and transparency that enables authors to communicate openly and present their data honestly.
Author response:
The following is the authors’ response to the original reviews.
We are grateful for the insightful and constructive feedback received from reviewers. As outlined in our previous response to the public reviews of the manuscript, we have made only minor changes to the manuscript to clarify some points noted by Reviewers 1 and 3. Firstly, we identify the DUB shown in the correlation plot (Fig 3B), whose knockdown enhances PROTAC sensitivity without significantly altering cell cycle progression, as BAP1. Secondly, we explain in more detail how we selected DUB hits for further study, and thirdly, we acknowledge that the result in Figure 5G is unexpected given prevailing knowledge in the field.
Please see below the detailed list of changes we have made to the manuscript.
In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author)
We have labelled one of the hits (as BAP1) in Figure 3B
In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author) and Reviewer 3 (Point 6 in recommendations to authors)
We have rewritten our description of Figure 3 to clarify how we selected which hits to take forward in our study
In response to Reviewer 3 (Point 1 in the recommendation to authors)
We corrected a typo in the first subtitle of the results section
In response to Reviewer 3 (Point 2 in the recommendation to authors)
We added information requested about how we selected our top hits
In response to Reviewer 1 (Point 4 in public review and Point 4 in recommendation to authors)
We pointed out the seemingly contradictory nature of the UCHL5 result in Figure 5G for the reader
All of the changes have been aimed at clarifying our narrative, without any change to data content, analysis or interpretation, and we hope these improvements can be agreed by editorial review.
Reviewer #1 (Public review):
Summary:
Wang, Po-Kai, et al. utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning, and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT, and CEP120), as well as the use of dominant-negative constructs to inhibit centrosomal microtubule nucleation, did not affect centrosome positioning in 3D-cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.
Strengths:
The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable. The authors have provided a detailed analysis of many of these components at defined stages of polarity establishment, and well demonstrate that centrosomes are not necessary for apical polarity formation, but are involved in the efficient establishment of the apical membrane.
Weaknesses:
Key questions remain regarding the structure of the intracellular cytoskeleton following depletion of centrosomes, centrosome proteins, or abrogation of centrosome microtubule nucleation. The authors strengthen their model that centrosomes are positioned independently of microtubule nucleation using dominant-negative Cdk5RAP2 and NEDD-1 constructs; however, the structure of the intracellular microtubule network remains unresolved and will be an important avenue for future investigation.
Reviewer #3 (Public review):
Here, Wang et al. resubmit their manuscript describing the events in the establishment of polarity in MDCK cells cultured in vitro. As with the original version, the description is thorough and important for the field to report, as it establishes a hierarchy of events in polarization, placing Par3 upstream of centrosome positioning and apical membrane component trafficking. Unfortunately, in the revised version, the authors addressed almost none of my points. They did a cursory job of responding in the rebuttal letter but made little attempt to actually address what was being asked or to incorporate any of my suggestions into the manuscript. The particularly egregious examples are cited below:
Comments on revisions:
(1) My original main experimental concern was not addressed: I had originally asked what role microtubules play in the process of polarization (either centrosomal or non-centrosomal). An obvious model is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. Centrosomes might also be pulled to the AMIS via cortically derived microtubules, as is the case in the C. elegans intestine, where the centrosome moves apically on apical microtubules via dynein-directed transport to the cortically anchored minus ends. The authors do not explore the role of microtubules in the revision, citing that it was not possible to observe the microtubules directly or to perform nocodazole experiments during polarization. Instead, the authors use a relatively new genetic tool to disrupt centrosomal microtubules. They appear to succeed in displacing centrosomal γ-tubulin using this tool, but without being able to observe microtubules, a remaining caveat of this experiment is that it is still unclear whether the authors have removed centrosomal microtubules. Compounding this issue is that this tool has never been used in MDCK cells. The authors conclude "we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically.", but they have not shown this; instead, the data suggest this conclusion, and the authors should acknowledge the caveat that they have no idea whether centrosomal microtubules are abolished. Similarly, the authors also state: "Additionally, although PCNT knockout cells show reduced microtubule nucleation ability, they still recruit a small amount of γ-tubulin". Where are the data that show that microtubule nucleation is reduced in these PCNT knockout cells?
(2) Many of my comments were addressed in the rebuttal, but not in the text. The non-centrosomal GP135 in Figure 2 is not acknowledged or explained.
That the polarity index does not actually measure polarity, but nuclear-centrosome distance is not acknowledged or explained in the paper.
I still don't believe that the quantification in Figure 3D matches the images I am being shown in Figure 3A. In the centrinone treatment condition, there is certainly an enrichment of GP135 at the AMIS that is not detected in the quantification. The method described in the rebuttal might miss this enrichment if it is offset from line drawn between the centroid of the two nuclei.
Cell height changes in the centrosome depleted cysts are still referenced in the text ("the cell heights of the centrosome-depleted cysts are less uniform"), but no specific data or image is called out. Currently, Figure 3G is referenced, but that is a graph of GP135 intensity
In my original review, I called on the authors to comment on the striking similarity of the mechanisms they documented in MDCK cells to what has been shown in in vivo systems. The authors did not do this, instead restating in the rebuttal some features of what they found. But the mechanisms shown here are remarkably similar to the polarization of primordia that generate tubular organs in vivo. Perhaps most striking is the similarity to the C. elegans intestine, where Par3 localizes to the cortex at the site of an apical MTOC that pulls the centrosome to the apical surface via dynein (Feldman and Priess, 2012). Instead of discussing this similarity, the authors state: "Par3 is likely to regulate centrosome positioning through some intermediate molecules or mechanisms, but its specific mechanism is still unclear and requires further investigation." Given the acetylated tubulin signal emanating from the Par3-positive patch in Figure 5E and F, I suspect similar mechanisms to the C. elegans intestine are at play here. Such a parallel should be noted in the Discussion.
I had originally commented that "I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome. Could the authors discuss what this means?" The authors responded that "The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome". In Figure 6G, the localization of GP135 to the centrosome appears significantly delayed compared to its localization to the centrosome in images where cells were cultured in Matrigel. Indeed, the authors argue that the centrosomal localization precedes and contributes to its localization to the AMIS. In the absence of ECM, GP135 localizes to the membrane before it localizes to the centrosome and its localization to the centrosome appears significantly reduced. Thus, my original and current interpretation is that ECM signaling is somehow required for the centrosomal targeting of GP135. One could make a competition argument, i.e. that the cortex in the absence of ECM is somehow a more desirable place to localize than the centrosome, but this experiment also argues that the centrosome does not need to be a source of this material in order for it to end up on the cortex.
(3) There needs to be precision in the language used in many places:
I don't understand this line in the abstract: "When cultured in Matrigel, de novo polarization of a single epithelial cell is often coupled with mitosis." If a cell has divided, it is no longer a single cell.
The authors state in the Introduction "Because of its strong ability to nucleate microtubules, the centrosome functions as the primary microtubule organizing center", but then state "In polarized epithelial cells, the centrosome is localized at the apical region during interphase, which contributes to the construction of an asymmetric microtubule network conducive to polarized vesicle trafficking". In the latter statement, I assume the authors are describing the well-characterized apical microtubule network in epithelial cells that is non-centrosomal. Thus, the latter sentence is at odds with the former.
The authors continually refer to Par3 as a tight junction protein. "Par3, which controls tight junction assembly to partition the apical surface from the basolateral surface". To my knowledge, PARD3 is an apical protein with similar localization to C. elegans PAR-3 and Drosophila Bazooka. PARD3B is a junctional protein. I assume that the antibody that the authors are using is to PARD3 and not PARD3B? Can the authors please clarify this in the text.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
Wang, Po-Kai, et al., utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning, and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT, and CEP120) did not affect centrosome positioning in 3D cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.
Strengths:
The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable to the community. Wang et al., have provided a detailed analysis of many of these components at defined stages of polarity establishment. Furthermore, the generation of PCNT, p53, ODF2, Cep120, and Cep164 knockout MDCK cell lines is likely valuable to the community.
Weaknesses:
Additional quantifications would greatly improve this manuscript. For example, it is unclear whether the centrosome perturbations affect gamma-tubulin levels and therefore microtubule nucleation; it is also not clear how they affect the localization of the trafficking machinery/polarity proteins. For example, in Figure 4, the authors measure the intensity of Gp134 at the apical membrane initiation site following cytokinesis, but there is no measure of Gp134 at the centrosome prior to this.
We thank the reviewer for this important suggestion. Previous studies have shown that genes encoding appendage proteins and CEP120 do not regulate γ-tubulin recruitment to centrosomes (Betleja, Nanjundappa, Cheng, & Mahjoub, 2018; Vasquez-Limeta & Loncarek, 2021). Although the loss of PCNT reduces γ-tubulin levels, this reduction is partially compensated by AKAP450. Even in the case of PCNT/AKAP450 double knockouts, low levels of γ-tubulin remain at the centrosome (Gavilan et al., 2018), suggesting that it is difficult to completely eliminate γ-tubulin by perturbing centrosomal genes alone.
To directly address this question, in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we employed a recently reported method to block γ-tubulin recruitment by co-expressing two constructs: the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). This approach effectively depleted γ-tubulin and abolished microtubule nucleation at the centrosome (Vinopal et al., 2023). Interestingly, despite the reduced efficiency of apical vesicle trafficking, these cells were still able to establish polarity, with centrioles positioned apically. These results suggest that microtubule nucleation at the centrosomes (centrosomal microtubules) facilitates—but is not essential for—polarity establishment.
Regarding Figure 4, we assume the reviewer was referring to Gp135 rather than Gp134. In the revised manuscript (Page 8, Paragraph 2; Figure 4I), we observed a slight decrease in Gp135 intensity near PCNT-KO centrosomes at the pre-Abs stage. However, its localization at the AMIS following cytokinesis remained unaffected. These results suggest that the loss of PCNT has a limited impact on Gp135 localization.
Reviewer #2 (Public review):
Summary:
The authors decoupled several players that are thought to contribute to the establishment of epithelial polarity and determined their causal relationship. This provides a new picture of the respective roles of junctional proteins (Par3), the centrosome, and endomembrane compartments (Cdc42, Rab11, Gp135) from upstream to downstream.
Their conclusions are based on live imaging of all players during the early steps of polarity establishment and on the knock-down of their expression in the simplest ever model of epithelial polarity: a cell doublet surrounded by ECM.
The position of the centrosome is often taken as a readout for the orientation of the cell polarity axis. There is a long-standing debate about the actual role of the centrosome in the establishment of this polarity axis. Here, using a minimal model of epithelial polarization, a doublet of daughter MDCK cells cultured in Matrigel, the authors made several key observations that bring new light to our understanding of a mechanism that has been studied for many years without being fully explained:
(1) They showed that centriole can reach their polarized position without most of their microtubule-anchoring structures. These observations challenge the standard model according to which centrosomes are moved by the production and transmission of forces along microtubules.
(2) (However) they showed that epithelial polarity can be established in the absence of a centriole.
(3) (Somehow more expectedly) they also showed that epithelial polarity can't be established in the absence of Par3.
(4) They found that most other polarity players that are transported through the cytoplasm in lipid vesicles, and finally fused to the basal or apical pole of epithelial cells, are moved along an axis which is defined by the position of centrosome and orientation of microtubules.
(5) Surprisingly, two non-daughter cells that were brought in contact (for 6h) could partially polarize by recruiting a few Par3 molecules but not the other polarity markers.
(6) Even more surprisingly, in the absence of ECM, Par3 and centrosomes could move to their proper position close to the intercellular junction after cytokinesis, but other polarity markers (at least GP135) localized to the opposite, non-adhesive side. So the polarity of the centrosome-microtubule network could be dissociated from the localisation of GP135 (which was believed to be transported along this network).
Strengths:
(1) The simplicity and reproducibility of the system allow a very quantitative description of cell polarity and protein localisation.
(2) The experiments are quite straightforward, well-executed, and properly analyzed.
(3) The writing is clear and conclusions are convincing.
Weaknesses:
(1) The simplicity of the system may not capture some of the mechanisms involved in the establishment of cell polarity in more physiological conditions (fluid flow, electrical potential, ion gradients,...).
We agree that certain mechanisms may not be captured by this simplified system. However, the model enables us to observe intrinsic cellular responses, minimize external environmental variables, and gain new insights into how epithelial cells position their centrosomes and establish polarity.
(2) The absence of centriole in centrinone-treated cells might not prevent the coalescence of centrosomal protein in a kind of MTOC which might still orient microtubules and intracellular traffic. How are microtubules organized in the absence of centriole? If they still form a radial array, the absence of a centriole at the center of it somehow does not conflict with classical views in the field.
Previous studies have shown that in the absence of centrioles, centrosomal proteins can relocate to alternative microtubule-organizing centers (MTOCs), such as the Golgi apparatus (Gavilan et al., 2018). Furthermore, centriole loss leads to increased nucleation of non-centrosomal microtubules (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). However, these microtubules typically do not form the classical radial array or a distinct star-like organization.
While this non-centrosomal microtubule network can still support polarity establishment, it does so less efficiently—similar to what is observed in p53-deficient cells undergoing centriole-independent mitosis (Meitinger et al., 2016). Thus, although the absence of centrioles does not completely prevent microtubule-based organization or polarity establishment, it impairs their spatial coordination and reduces overall efficiency compared to a centriole-centered microtubule-organizing center (MTOC).
(3) The mechanism is still far from clear and this study shines some light on our lack of understanding. Basic and key questions remain:
(a) How is the centrosome moved toward the Par3-rich pole? This is particularly difficult to answer if the mechanism does not imply the anchoring of MTs to the centriole or PCM.
Previous studies have shown that Par3 interacts with dynein, potentially anchoring it at the cell cortex (Schmoranzer et al., 2009). This interaction enables dynein, a minus-end-directed motor, to exert pulling forces on microtubules, thereby promoting centrosome movement toward the Par3-enriched pole.
In our experiments (Figure 4), we attempted to disrupt centrosomal microtubule nucleation by knocking out multiple genes involved in centrosome structure and function, including ODF2 and PCNT. Under these perturbations, γ-tubulin still remained detectable at the centrosome, and we were unable to completely eliminate centrosomal microtubules.
To address this question more directly, we employed a strategy to deplete γ-tubulin from centrosomes by co-expressing the centrosome-targeting C-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). As shown in the new data of the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), this approach effectively depleted γ-tubulin from centrosomes, thereby abolishing microtubule nucleation at the centrosome.
Surprisingly, even under these conditions, centrioles remained apically positioned (Page 8, Paragraph 4; Figure 4—figure supplement 3), indicating that centrosomal microtubules are not essential for centrosome movement during polarization.
Given these findings, we agree that the precise mechanism by which the Par3-enriched cortex attracts or guides centrosome movement remains unclear. Although dynein–Par3 interactions may contribute, further studies are needed to elucidate how centrosome repositioning occurs in the absence of microtubule-based pulling forces from the centrosome itself.
(b) What happens during cytokinesis that organises Par3 and the intercellular junction in a way that can't be achieved by simply bringing two cells together? In larger epithelia, cells have neighbours that are not daughters; still, they can form tight junctions with Par3, which participates in the establishment of cell polarity as much as those that are closer to the cytokinetic bridge (as judged by the overall cell symmetry). Is the protocol of cell aggregation fully capturing the interaction mechanism of non-daughter cells?
We speculate that a key difference between cytokinesis and simple cell-cell contact lies in the presence or absence of actomyosin contractility during the process of cell division. Specifically, contraction of the cytokinetic ring generates mechanical forces between the two daughter cells, which are absent when two non-daughter cells are simply brought together. While adjacent epithelial cells can indeed form tight junctions and recruit Par3, the lack of shared cortical tension and contractile actin networks between non-daughter cells may lead to differences in how polarity is initiated. This mechanical input during cytokinesis may serve as an organizing signal for centrosome positioning. This idea is supported by recent work showing that the actin cytoskeleton can influence centrosome positioning (Jimenez et al., 2021), suggesting that contractile actin structures formed during cytokinesis may contribute to spatial organization in a manner that cannot be replicated by simple aggregation.
In our experiments, we simply captured two cells that were in contact within Matrigel. We cannot say for sure that it captures all the interaction mechanisms of non-daughter cells, but it does provide a contrast to daughter cells produced by cytokinesis.
Reviewer #3 (Public review):
Here, Wang et al. aim to clarify the role of the centrosome and conserved polarity regulators in apical membrane formation during the polarization of MDCK cells cultured in 3D. Through well-presented and rigorous studies, the authors focused on the emergence of polarity as a single MDCK cell divided in 3D culture to form a two-cell cyst with a nascent lumen. Focusing on these very initial stages, rather than in later large cyst formation as in most studies, is a real strength of this study. The authors found that conserved polarity regulators Gp135/podocalyxin, Crb3, Cdc42, and the recycling endosome component Rab11a all localize to the centrosome before localizing to the apical membrane initiation site (AMIS) following cytokinesis. This protein relocalization was concomitant with a repositioning of centrosomes towards the AMIS. In contrast, Par3, aPKC, and the junctional components E-cadherin and ZO1 localize directly to the AMIS without first localizing to the centrosome. Based on the timing of the localization of these proteins, these observational studies suggested that Par3 is upstream of centrosome repositioning towards the AMIS and that the centrosome might be required for delivery of apical/luminal proteins to the AMIS.
To test this hypothesis, the authors generated numerous new cell lines and/or employed pharmacological inhibitors to determine the hierarchy of localization among these components. They found that removal of the centrosome via centrinone treatment severely delayed and weakened the delivery of Gp135 to the AMIS and single lumen formation, although normal lumenogenesis was apparently rescued with time. This effect was not due to the presence of CEP164, ODF2, CEP120, or Pericentrin. Par3 depletion perturbed the repositioning of the centrosome towards the AMIS and the relocalization of Gp135 and Rab11 to the AMIS, causing these proteins to get stuck at the centrosome. Finally, the authors cultured the MDCK cells in several ways (forced aggregation and ECM-depleted) to try to further uncouple the localization of the pertinent components, finding that Par3 can localize to the cell-cell interface in the absence of cell division. Par3 localized to the edge of the cell-cell contacts in the absence of ECM, and this localization was not sufficient to orient the centrosomes to this site, indicating the importance of other factors in centrosome recruitment.
Together, these data suggest a model where Par3 positions the centrosome at the AMIS and is required for the efficient transfer of more downstream polarity determinants (Gp135 and Rab11) to the apical membrane from the centrosome. The authors present solid and compelling data and are well-positioned to directly test this model with their existing system and tools. In particular, one obvious mechanism here is that centrosome-based microtubules help to efficiently direct the transport of molecules required to reinforce polarity and/or promote lumenogenesis. This model is not really explored by the authors except by Pericentrin and subdistal appendage depletion and the authors do not test whether these perturbations affect centrosomal microtubules. Exploring the role of microtubules in this process could considerably add to the mechanisms presented here. In its current state, this paper is a careful observation of the events of MCDK polarization and will fill a knowledge gap in this field. However, the mechanism could be significantly bolstered with existing tools, thereby elevating our understanding of how polarity emerges in this system.
We agree that further exploration of microtubule dynamics could strengthen the mechanistic framework of our study. In our initial experiments, we disrupted centrosome function through genetic perturbations (e.g., knockout of PCNT, CEP120, CEP164, and ODF2). However, consistent with previous reports (Gavilan et al., 2018; Tateishi et al., 2013), we found that single-gene deletions did not completely eliminate centrosomal microtubules. Furthermore, imaging microtubule organization in 3D culture presents technical challenges. Due to the increased density of microtubules during cell rounding, we were unable to obtain clear microtubule filament structures—either using α-tubulin staining in fixed cells or SiR-tubulin labeling in live cells. Instead, the signal appeared diffusely distributed throughout the cytosol.
To overcome this, we employed a recently reported approach by co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1 to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). In our new data presented in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may contribute to efficient apical transport.
Reference
Betleja, E., Nanjundappa, R., Cheng, T., & Mahjoub, M. R. (2018). A novel Cep120-dependent mechanism inhibits centriole maturation in quiescent cells. Elife, 7. doi:10.7554/eLife.35439
Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942
Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002
Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864
Meitinger, F., Anzola, J. V., Kaulich, M., Richardson, A., Stender, J. D., Benner, C., . . . Oegema, K. (2016). 53BP1 and USP28 mediate p53 activation and G1 arrest after centrosome loss or extended mitotic duration. J Cell Biol, 214(2), 155-166. doi:10.1083/jcb.201604081
Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065
Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071
Vasquez-Limeta, A., & Loncarek, J. (2021). Human centrosome organization and function in interphase and mitosis. Semin Cell Dev Biol, 117, 30-41. doi:10.1016/j.semcdb.2021.03.020
Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Figures:
(1) Figure 3B+C - Although, in comparison to Figure 2, it appears the p53 mutation does not affect θN-C or Lo-c, the figure would benefit from a direct comparison to control cells.
We appreciate your suggestion to improve the clarity of the figure. In response, we have revised Figure 3B+C to include control cell data, allowing for clearer side-by-side comparisons in the updated figures.
(2) Figure 3D - Clarify if both were normalized to time point 0:00 of the p53 KO. The image used appears that Gp135 intensity increases substantially between 0:00 and 0:15 in the figure, but the graph suggests that the intensity is the same if not slightly lower.
Figure 3D – The data were normalized to the respective 0:00 time point for each condition. Because the intensity profile was measured along a line connecting the two nuclei, Gp135 signal could only be detected if it appeared along this line. However, the images shown are maximum-intensity projections, meaning that Gp135 signals from peripheral regions are projected onto the center of the image. This may create the appearance of increased intensity at certain time points (e.g., Figure 3A, p53-KO + CN, 0:00–0:15).
(3) Figure 4A: The diagram does not accurately represent the effect of the mutations, for example, PCNT mutation likely doesn't completely disrupt PCM (given gamma-tubulin is still visible in the staining), but instead results in its disorganization, Cep164 also wouldn't be expected to completely ablate distal appendages.
Thank you for your comment. We have modified the figure in the revised manuscript (Figure 4A) to more clearly depict the defective DAs.
(4) Figure 4 + Supplements: A more in-depth characterization of the mutations would help address the previous comment and strengthen the manuscript. Especially as these components have previously been implicated in centrosome transport.
Thank you for your valuable suggestion. As noted in previous studies, CEP164 is essential for distal appendage function and basal body docking, with its loss resulting in blocked ciliogenesis (Tanos et al., 2013); CEP120 is required for centriole elongation and distal appendage formation, and its loss also results in blocked ciliogenesis (Comartin et al., 2013; Lin et al., 2013; Tsai, Hsu, Liu, Chang, & Tang, 2019); ODF2 functions upstream in the formation of subdistal appendages, and its loss eliminates these structures and impairs microtubule anchoring (Tateishi et al., 2013); and PCNT functions as a PCM scaffold, necessary for the recruitment of PCM components and for microtubule nucleation at the centrosome (Fong, Choi, Rattner, & Qi, 2008; Zimmerman, Sillibourne, Rosa, & Doxsey, 2004).
Given that the phenotypes of these mutants have been well characterized in the literature, we focus here on their roles in centrosome migration and polarized vesicle trafficking within the specific context of our study.
(5) Figure 4: It would be interesting to measure the Gp135 intensity at the centrosomes, given that the model proposes it is trafficked from the centrosomes to the AMIS.
Thank you for your suggestion. We have included measurements of Gp135 intensity at the centrosomes during the Pre-Abs stage in the revised figure (Figure 4I). Our data show no significant differences in Gp135 intensity between wild-type (WT) and CEP164-, ODF2-, or CEP120-knockout (KO) cell lines. However, a slight decrease in Gp135 intensity was observed in PCNT-KO cells.
(6) Figure 6F shows that in suspension culture polarity is reversed, however, in Figure 6G gp135 still localizes to the cytokinetic furrow prior to polarity reversal. Given this paper demonstrates Par-3 is upstream of centrosome positioning, it would be important to have temporal data of how Par-3 localizes prior to the ring observed in 6F.
Thank you for your comment. We have included a temporal analysis of Par3 localization using fixed-cell staining in the revised figure (Figure 6—figure supplement 1D). This analysis shows that Par3 also localizes to the cytokinesis site during the Pre-Abs stage, prior to the ring formation observed during the Post-CK stage (Figure 6F). Interestingly, during the Pre-Abs stage, the centrosomes also migrate toward the center of the cell doublets in suspension culture, and the Gp135 surrounding the centrosomes is likewise recruited to a region near the center (Figure 6—figure supplement 1E). These data suggest that Par3 is also initially recruited to the cytokinesis site before polarity reversal, potentially promoting centrosome migration. The main difference from Matrigel culture is the peripheral localization of Par3 and Gp135 in suspension, which is likely due to the lack of external ECM signaling.
Results:
(1) Page 7 Paragraph 1 - consistently use AMIS (Apical membrane initiation site) rather than "the apical site".
Thank you for your helpful comment. We have revised the manuscript (Page 7, Paragraph 1) and now use "AMIS" (Apical Membrane Initiation Site) instead of "the apical site" throughout the text.
(2) Page 7 Paragraph 4 - A single sentence explaining why the p53 background had to be used for the Cep120 deletion would be beneficial. Did the cell line have a reduced centrosome number? Does this affect apical membrane initiation similarly to centrinone?
We have revised the text (Page 7, Paragraph 4) to clarify that we were unable to generate a CEP120-KO line in p53-WT cells for unknown reasons. CEP120-KO cells have a normal number of centrosomes, but their centrioles are shorter. Because this KO line still contains centrioles, the effect is different from centrinone treatment, which results in a complete loss of centrioles.
(3) Page 10 paragraph 4 - This paragraph is confusing to read. I understand that in the cysts and epithelial sheet the cytokinetic furrow is apical, therefore a movement towards the AMIS could be due to its coincidence with the furrow. However, the phrasing "....we found that centrosomes move towards the apical membrane initiation site direction before bridge abscission. Taken together these findings indicate the position is strongly associated with the site of cytokinesis but not with the apical membrane" is confusing to the reader.
We have revised the manuscript (Page 11, Paragraph 4), replacing "the AMIS" with "the center of the cell doublet." During de novo epithelial polarization, the apical membrane has not yet formed at the Pre-Abs stage. However, at this stage the centrosome has already migrated toward the site of cytokinesis, suggesting that centrosome positioning is correlated with the site of cell division. A similar phenomenon occurs in fully polarized epithelial cysts and sheets, where the centrosomes also migrate before bridge abscission. Thus, we propose that the position of the centrosome is closely associated with the site of cytokinesis and is independent of apical membrane formation.
Discussion
(1) Page 11, Paragraph 2 - citations needed when discussing previous studies.
Thank you for your suggestion. We have added the necessary references to the discussion of previous studies in the revised manuscript (Page 12, Paragraph 2).
(2) Page 12, Paragraph 2 - This section of the discussion would be strengthened by discussing the role of the actomyosin network in defining centrosome position (Jimenez et al., 2021). It seems plausible that the differences observed in the different conditions could be due to altered actomyosin architecture. Especially where the cells haven't undergone cytokinesis.
We appreciate the suggestion of a role for the actomyosin network in determining centrosome positioning. Recent studies have indeed highlighted the role of the actomyosin network in regulating centrosome centering and off-centering (Jimenez et al., 2021). During the pre-abscission stage of cell division, the actomyosin network undergoes significant dynamic changes, with the contractile ring forming at the center and actin levels decreasing at the cell periphery. In contrast, under aggregated cell conditions—meaning cells that have not undergone division—the actomyosin network does not exhibit such dynamic changes. The loss of actomyosin remodeling may therefore influence whether the centrosome moves. Thus, alterations in actomyosin architecture may contribute to the differences observed under various conditions, particularly when cells have not yet completed cytokinesis. We have revised Paragraph 2 on Page 13 to briefly mention the referenced study and to propose that the actomyosin network may influence centrosome positioning, contributing to our observed results. This addition strengthens the discussion and clarifies our findings.
(3) Page 12 paragraph 3 - Given that centrosome translocation during cytokinesis in MDCK cells (this study) appears to be similar to that observed in HeLa cells and the zebrafish Kupffer's vesicle (Krishnan et al., 2022), it would be interesting to discuss why Rab11a and PCNT may not be essential to centrosome positioning in MDCK cells.
Thank you for your insightful comment. We agree that it is interesting that centrosome translocation during cytokinesis in MDCK cells (as observed in our study) is similar to that observed in HeLa cells and zebrafish Kupffer's vesicle (Krishnan et al., 2022). However, there are notable differences between these systems that may help explain why Rab11a and PCNT are not essential for centrosome positioning in MDCK cells.
Our study used 3D culture of MDCK cells, while the reference study examined adherent culture of HeLa cells. In the adherent culture, cells attached to the culture surface form large actin stress fibers on their basal side, which weakens the actin networks in the apical and intercellular regions. In contrast, the 3D culture system used in our study better preserves cell polarity and the integrity of the actin network, which might contribute to centrosome positioning independent of Rab11a and PCNT. Differences in culture conditions and actin network architecture may explain why Rab11a and PCNT are not required for centrosome positioning in MDCK cells.
Furthermore, the referenced study focused on Rab11a and PCNT in zebrafish embryos at 3.3–5 hours post-fertilization (hpf), a time point before the formation of the Kupffer’s vesicle. At this stage, the cells they examined may not yet have become epithelial cells, which may also influence the requirement of Rab11a and PCNT for centrosome positioning. We hypothesize that during the pre-abscission stage, centrosome migration toward the cytokinetic bridge occurs primarily in epithelial cells, and that the polarity and centrosome positioning mechanisms in these cells may differ from those in other cell types, such as zebrafish embryos.
Furthermore, data from Krishnan et al. (2022) suggest that cytokinesis failure in pcnt+/- heterozygous embryos and Rab11a functional-blocked embryos may be due to the presence of supernumerary centrosomes. Consistent with this, our data show that blocking cytokinesis inhibits centrosome movement in MDCK cells. However, in our MDCK cell lines with PCNT or Rab11a knockdown, we did not observe significant cytokinesis failure, and centrosome migration proceeded normally.
Reviewer #2 (Recommendations for the authors):
Suggestions for experiments:
(1) A description of the organization of microtubules in the absence of centriole, or in the absence of ECM would be interesting to understand how polarity markers end up where you observed them. This easy experiment may significantly improve our understanding of this system.
Previous studies have shown that in the absence of centrioles, microtubule organization undergoes significant changes. Specifically, the number of non-centrosomal microtubules increases, and these microtubules are not radially arranged, leading to the absence of focused microtubule organizing centers in centriolar-deficient cells (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). This disorganized microtubule network reduces the efficiency of vesicle transport during de novo epithelial polarization at the mitotic preabscission stage.
In contrast, the organization of microtubules under ECM-free conditions remains less well characterized. Here, we show that while the ECM plays a critical role in establishing the direction of epithelial polarity, it does not influence the positioning of the centrosome, the microtubule-organizing center (MTOC).
(2) Would it be possible to knock down ODF2 and pericentrin to completely disconnect the centrosome from microtubules?
ODF2 forms the base of the subdistal appendages; its knockout impairs the recruitment of all downstream proteins to these structures (Mazo, Soplop, Wang, Uryu, & Tsou, 2016). One study has shown that ODF2-knockout cells almost completely lose subdistal appendage structures and show significantly reduced microtubule asters surrounding the centrioles (Tateishi et al., 2013). However, although pericentrin (PCNT) is the main scaffold of the pericentriolar matrix (PCM) of centrosomes, the microtubule-organizing ability of centrosomes can be compensated by AKAP450, a paralog of PCNT, after PCNT knockout. A previous study has even shown that in cells with a double knockout of PCNT and AKAP450, γ-tubulin can still be recruited to the centrosomes, and the centrosomes can still nucleate microtubules (Gavilan et al., 2018). This suggests that other proteins or pathways promote microtubule nucleation at centrosomes. We are therefore unsure whether a triple knockout of ODF2, PCNT, and AKAP450 could completely disconnect the centrosome from microtubules. However, a recent study reported a simpler approach involving the expression of dominant-negative fragments of the γ-tubulin-binding protein NEDD1 and the activator CDK5RAP2 at the centrosome (Vinopal et al., 2023). In our revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we applied this strategy, which depleted nearly all γ-tubulin from the centrosome. This indicates a strong suppression of centrosomal microtubule nucleation and an effective disconnection of the centrosome from the microtubule network.
(3) The study does not distinguish the role of cytokinesis from the role of tight junctions, which form only after cytokinesis and not simply by bringing cells into contact. Would it be feasible and interesting to study the polarization after cytokinesis in cells that could not form tight junctions (due to the absence of Ecad or ZO1 for example)?
Studying cell polarization after cytokinesis in cells unable to form tight junctions is a promising area of research.
Recent studies have shown that mouse embryonic stem cells (mESCs) cultured in Matrigel can form ZO-1-labelled tight junctions at the midpoint of cell–cell contact even in the absence of cell division. However, in the absence of E-cadherin, ZO-1 localization is significantly impaired. Interestingly, despite the loss of E-cadherin, the Golgi apparatus and centrosomes remain oriented toward the cell–cell interface (Liang, Weberling, Hii, Zernicka-Goetz, & Buckley, 2022). These findings suggest that cell polarity can be maintained independently of tight junction formation, highlighting the potential value of studying cell polarization in cells that lack tight junctions.
Furthermore, while studies have explored the effects of knocking down tight junction components such as JAM-A and Cingulin on lumen formation in MDCK 3D cultures (Mangan et al., 2016; Tuncay et al., 2015), the role of ZO-1 in this context remains underexplored. Cingulin knockdown has been shown to disrupt endosome targeting and the formation of the AMIS, while both JAM-A and Cingulin knockdown result in actin accumulation at multiple points, leading to the formation of multi-lumen structures rather than a reversal of polarity. However, previous research has not specifically investigated centrosome positioning in JAM-A and Cingulin knockdown cells, an area that could provide valuable insights into how polarity is maintained in the absence of tight junctions.
Writing details:
(1) The migration of the centrosome in the absence of appendages or PCM is proposed to be ensured by compensatory mechanisms ensuring the robustness of microtubule anchoring to the centrosome. It could also be envisaged that the centrosome motion does not require this anchoring and that other yet unknown moving mechanisms, based on an actin network for example, might exist.
Thank you for your valuable comments. We agree that there may indeed be some unexpected mechanisms that allow centrosomes to move independently of microtubule anchoring to the centrosome, such as mechanisms based on actin filaments or noncentrosomal microtubules; these mechanisms are worth further investigation.
In response to your suggestion, in Paragraph 5 of the Discussion we further clarified that while a microtubule-anchoring mechanism is one possible explanation, other mechanisms could also drive centrosome movement in the absence of appendages or PCM. Additionally, we revised Paragraph 4 to address the possibility of actin network-driven centrosome movement and emphasized the importance of future research for a deeper understanding of these processes.
(2) The actual conclusion of the study of Martin et al (eLife 2018) is not simply that centrosome is not involved in cell polarization but that it hinders cell polarization!
Thank you for your valuable feedback. We agree with the finding of Martin et al. (eLife 2018) that the centrosome is not merely irrelevant to cell polarization but actually hinders it. We have therefore revised the manuscript (Page 2, Paragraph 2) to reflect this viewpoint more accurately.
(3) This study recalls some conclusions of the study by Burute et al (Dev Cell 2017), in particular the role of Par3 in driving centrosome toward the intercellular junction of daughter cells after cytokinesis. It would be welcome to comment on the results of this study in light of their work.
Thank you for your valuable feedback. The study by Burute et al. (Dev Cell, 2017) showed that MCF10A cells grown on micropatterns polarize and position their centrosomes toward the intercellular junction, and that downregulation of Par3 expression disrupts this centrosome positioning. This result is similar to our findings in 3D-cultured MDCK cells and consistent with previous studies in C. elegans intestinal cells and migrating NIH 3T3 cells (Feldman & Priess, 2012; Schmoranzer et al., 2009), indicating that Par3 influences centrosome positioning across different cellular systems. However, Par3 does not localize directly to the centrosome; rather, it localizes to the cell cortex or cell-cell junctions. Par3 therefore likely regulates centrosome positioning through intermediary molecules or mechanisms, but the specific mechanism remains unclear and requires further investigation.
(4) Could the term apico-basal be used in the absence of a basement membrane to form a basal pole?
We understand that using the term "apico-basal" in the absence of a basement membrane might raise some questions. Traditionally, the apico-basal axis refers to the polarity of epithelial cells, where the apical surface faces the lumen or external environment, and the basal surface is oriented toward the basement membrane. However, in the absence of a basement membrane, such as in certain in vitro systems or under specific experimental conditions, polarity along a similar axis can still be observed. In such cases, the term "apico-basal" can still be used to describe the polarity between the apical domain and the region where it contacts the substrate or adjacent cells.
(5) The absence of centrosome movement to the intercellular bridge in spread cells in culture is not so surprising considering the work of Lafaurie-Janvore et al (Science 2013) about the role of cell spreading in the regulation of bridge tension and abscission delay.
Thank you for your valuable comment. Indeed, previous studies have shown that in some cell types the centrosome does move toward the intercellular bridge in spread cells (Krishnan et al., 2022; Piel, Nordberg, Euteneuer, & Bornens, 2001), while other studies have suggested that this movement may not be significant and may not occur universally across all cell types (Jonsdottir et al., 2010). In our study, we aim to demonstrate that this phenomenon is more pronounced in 3D culture systems than in 2D spread-cell culture systems. Previous studies and our work have observed that centrosome migration occurs during the pre-abscission stage, but whether this migration is directly related to cytokinetic bridge tension or the timing of abscission remains an open question. Further research is needed to explore the potential relationship between centrosome positioning, cytokinetic bridge tension, and the timing of abscission.
(6) GP135 (podocalyxin) has been proposed to have anti-adhesive/lubricant properties (hence its pro-invasive effect). Could it be possible that once localized at the cell surface it is systematically moved away from regions that are anchored to either the ECM or adjacent cells? So its localization away from the centrosome in an ECM-free experiment would not be a consequence of defective targeting but relocalization after reaching the plasma membrane?
Thank you for your valuable comment. We agree that GP135 may indeed move directly across the cell surface, away from the regions where the cell interacts with the ECM or adjacent cells. This re-localization could be due to its anti-adhesive or lubricating properties, which may facilitate its displacement from these adhesive sites. Validating this will require a higher-resolution real-time imaging system to observe the dynamic behavior of GP135 on the cell surface.
However, this does not contradict our main conclusion. Under suspension culture conditions without ECM, the centrosome positioning in cell doublets is indeed decoupled from apical membrane orientation. This suggests that the localization of the centrosome and the apical membrane is regulated by different mechanisms. Specifically, the GP135 protein tends to accumulate away from areas of contact with the ECM or adjacent cells, possibly through movement within the cell membrane or by recycling endosome transport. In contrast, centrosome positioning is closely related to the cytokinesis site. Our study clearly elucidates the differences between these two polarity properties.
Reviewer #3 (Recommendations for the authors):
Major:
(1) To me, a clear implication of these studies is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. The authors do not explore this model except to say that depletion of SD appendage or pericentrin has no effect on the protein relocalization to the AMIS. However, the authors do not observe microtubule association with the centrosome in these KO conditions. This analysis is imperative to interpret existing results since these are new KO conditions in this cell/culture system and parallel pathways (e.g. CDK5RAP2) are known to contribute to microtubule association with the centrosome. An ability to comment on the mechanism by which the centrosome contributes to the efficiency of polarization would greatly enhance the paper.
Microtubule requirement could also be tested in numerous additional ways requiring varying degrees of new experiments:
(a) faster live cell imaging at abscission to see if the deposition of those components appears to traffic on MTs;
(b) live cell imaging with microtubules (e.g. SPY-tubulin) and/or EB1 to determine the origin and polarity of microtubules at the pertinent stages;
For (a) and (b), because the cells were cultured in Matrigel, they tended to round up, with a dense internal structure that made observation difficult. In contrast, under adherent culture conditions, the cells were flattened with a more dispersed internal structure, making them easier to observe. We had previously used SPY-tubulin to label microtubules for live-cell imaging; however, because of the dense microtubule structure in 3D culture, the image contrast was reduced and we could not clearly observe the microtubule network within the cells.
(c) acute nocodazole treatment at abscission to determine the effect on protein localization.
Regarding the method of using nocodazole to study microtubule requirements at the abscission stage, we believe that nocodazole treatment may lead to cytokinesis failure. Cell division failure results in the formation of binucleated cells, which are unable to establish cell polarity. Furthermore, nocodazole treatment cannot distinguish between centrosomal and non-centrosomal microtubules, making it unsuitable for studying the specific role of centrosomal microtubules in this process.
In our new data (Figure 4—figure supplement 3) presented in the revised manuscript, we employed a recently reported method, co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1, to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). We found that cells lacking centrosomal microtubules were still able to polarize and position their centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may facilitate efficient apical transport.
(2) Similar to the expanded analysis of the role of microtubules in this system, it would be excellent if the author could expand on the role of Par3 and the centrosome, although this reviewer recognizes that the authors have already done substantial work. For example, what are the consequences of Gp135 and/or Rab11 getting stuck at the centrosome? Do the authors have any later images to determine when and if these components ever leave the centrosome? Existing literature focuses on the more downstream consequence of Par3 removal on single-lumen formation.
Similarly, could the authors expand on the description of polarity disruption following centrinone treatment? It is clear that Gp135 recruitment is disrupted, but how and when do things get fixed and what else is disrupted at the very earliest stages of AMIS formation? The authors have an excellent opportunity to really expand on what is known about the requirements for these conserved components.
Regarding centrinone treatment, we speculate that Gp135 can still accumulate at the AMIS over time, although the efficiency of its recruitment may be reduced.
Furthermore, under similar conditions, other apical membrane components (such as Crumbs3) may exhibit behavior similar to that of Gp135.
(3) Perhaps satisfying both of the above asks, could the authors do a faster time-lapse at the relevant time points, i.e. as proteins are being recruited to the AMIS (time points between 1Aiv and v)? This type of imaging again might help shed light on the mechanism.
We believe the above questions are very important and may require further experimental verification in the future.
Minor:
(1) What is the green patch of Gp135 in Figure 2A that does not colocalize with the centrosome? Is this another source of Gp135 that is being delivered to the AMIS? This type of patch is also visible in Figure 3A 15 and 30-minute panels.
During mitosis, membrane-composed organelles such as the Golgi apparatus are typically dispersed throughout the cytoplasm. However, during the pre-abscission stage, these organelles begin to reassemble and cluster around the centrosome. Furthermore, they also accumulate in the region between the nucleus and the cytokinetic bridge, corresponding to the “patch” mentioned in Figure 2A.
Live-cell imaging showed that this Gp135 patch initially appears in a region not associated with the centrosome. Subsequently, it was either transported directly to the AMIS or fused with the centrosome-associated Gp135 and transported together with it. Notably, this patch was only observed when Gp135 was overexpressed; no such distinct patches were observed when staining endogenous Gp135 (Figure 1A), suggesting that overexpression of Gp135 may lead to a localized increase in its concentration in that region.
(2) I am confused by the "polarity index" quantification as this appears to just be a nucleus centrosome distance measurement and wouldn't, for example, distinguish if the centrosomes separated from the nucleus but were on the basal side of the cell.
The position of the centrosome within the cell (i.e., its distance from the nucleus) can indeed serve as an indicator of cell polarity (Burute et al., 2017). We acknowledge that this quantitative method does not directly capture the specific direction in which the centrosome deviates from the cell center. To address this limitation, we have incorporated information about the angle between the nucleus and the centrosome, which allows for a more accurate description of changes in cell polarity (Rodriguez-Fraticelli, Auzan, Alonso, Bornens, & Martin-Belmonte, 2012).
(3) How is GP135 "at AMIS" measured? Is an arbitrary line drawn? This is important later when comparing to centrinone treatment in Figure 3D where the quantification does not seem to accurately capture the enrichment of Gp135 that is seen in the images.
To measure the expression level of Gp135 in the "AMIS" region of the cell, we first connected the centers of the two cell nuclei in three-dimensional space to form a straight line. Then, we used the Gp135 expression intensity at the midpoint of this line as the representative value for the AMIS region. This method is based on the assumption that the AMIS region is most likely located between the centers of the two cell nuclei. Therefore, this quantitative method provides a standardized assessment tool for comparing Gp135 expression levels under different conditions.
(4) The authors reference cell height (p.7) but no data for this measurement are shown
Thank you for the comment. Although we did not perform quantitative measurements, the differences in cell height are clearly visible in Figure 3E (p53-KO + CN), which visually illustrates this phenomenon.
(5) Can the authors comment on the seeming reduction of Par3 in p53 KO cells?
We did not observe a reduction of Par3 in p53-KO cells in our experiments.
(6) Can the authors make sense of the E-cad localization: Figure 5, Supplement 2.
Our study revealed that E-cadherin begins to accumulate at the cell-cell contact sites during the pre-abscission stage. Its appearance is similar to that of ZO-1, which also appears near the cell division site during this phase. Therefore, the behavior of E-cadherin contrasts sharply with that of Gp135, further highlighting the unique trafficking mechanisms of apical membrane proteins during this process.
(7) I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome? Could the authors discuss what this means?
We appreciate the reviewer’s valuable comments and thank you for the opportunity to clarify this point. The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome. Rather, our findings suggest that even in the absence of ECM, the centrosomes can migrate to a polarized position similar to that in Matrigel culture. This suggests that centrosome migration and the orientation of the nucleus–centrosome axis may be independent of ECM signaling and are primarily driven by cytokinesis alone.
Regarding the localization of Gp135, previous studies have shown that ECM signaling through integrin promotes endocytosis, which is crucial for the internalization of Gp135 from the cell membrane and its subsequent transport to the AMIS (Buckley & St Johnston, 2022). Our study found that, prior to its accumulation at the AMIS, Gp135 transiently localizes around the centrosome. In the absence of ECM, due to reduced endocytosis, Gp135 primarily remains on the cell membrane and does not undergo intracellular trafficking.
(8) The authors end the Discussion stating that these studies may have implication for in vivo settings, yet do not discuss the striking similarities to the C. elegans and Drosophila intestine or the findings from any other more observational studies of tubular epithelial systems in vivo (e.g. mouse kidney polarization, zebrafish neuroepithelium, etc.). These models should be discussed.
Thank you for your valuable comment. Indeed, all types of epithelial tissues or tubular epithelial systems in vivo share some common features during cell division, which have been well-documented across various species.
These features include: during interphase, the centrosome is located at the apical surface of the cells; after the cell enters mitosis, the centrosome moves to the lateral side of the cell to regulate spindle orientation; and during cytokinesis, the cleavage furrow ingresses asymmetrically from the basal to the apical side, with the cytokinetic bridge positioned at the apical surface. Our study using MDCK 3D culture and transwell culture systems successfully mimicked these key features, demonstrating that these in vitro models are of significant value for studying cell polarization dynamics.
Based on our observations, we speculate that the centrosome may return to the apical surface after anaphase, just before bridge abscission. This is consistent with our findings from studies using MDCK 3D cultures and transwell systems, which showed that the centrosome relocates prior to the final stages of cytokinesis.
Additionally, we propose that de novo polarization of the kidney tubule in vivo may not solely depend on the aggregation and mesenchymal-epithelial transition (MET) of the metanephric mesenchyme. It may also be related to the cell division process, which triggers centrosome migration and polarized vesicle trafficking. These processes likely contribute to enhancing cell polarization, as we observed in our in vitro models.
We hope this further clarifies the potential implications of our findings for in vivo model studies, as well as their broader impact on the field of tubular epithelial cell polarization research.
(9) There are several grammatical issues/typos throughout the paper. A careful readthrough is required. For example:
this sentence makes no sense "that the centrosome acts as a hub of apical recycling endosomes and centrosome migration during cytokinetic pre-abscission before apical membrane components are targeted to the AMIS"
We carefully reviewed the paper and made necessary revisions to address the issues raised. In particular, we revised certain sentences to improve clarity and readability (Page 5, Paragraph 3).
(10) P.8: have been previously reported [to be] involved in MDCK...
We appreciate the reviewer's valuable suggestions. We have revised the sentence accordingly (Page 9, Paragraph 2).
(11) This sentence seems misplaced: "Cultured conditions influence cellular polarization preferences."
The sentence itself is fine, but to improve the coherence and clarity of the paragraph, we adjusted the paragraph structure and added some transitional phrases (Page 13, Paragraph 1).
(12) "Play a downstream role in Par3 recruitment" doesn't make sense, this should just be downstream of Par3 recruitment.
Thank you for your suggestion. We have revised the wording accordingly, changing it to "downstream of Par3 recruitment" (Page 10, Paragraph 2).
Reference
Buckley, C. E., & St Johnston, D. (2022). Apical-basal polarity and the control of epithelial form and function. Nat Rev Mol Cell Biol, 23(8), 559-577. doi:10.1038/s41580-022-00465-y
Burute, M., Prioux, M., Blin, G., Truchet, S., Letort, G., Tseng, Q., . . . Thery, M. (2017). Polarity Reversal by Centrosome Repositioning Primes Cell Scattering during Epithelial-to-Mesenchymal Transition. Dev Cell, 40(2), 168-184. doi:10.1016/j.devcel.2016.12.004
Comartin, D., Gupta, G. D., Fussner, E., Coyaud, E., Hasegan, M., Archinti, M., . . . Pelletier, L. (2013). CEP120 and SPICE1 cooperate with CPAP in centriole elongation. Curr Biol, 23(14), 1360-1366. doi:10.1016/j.cub.2013.06.002
Feldman, J. L., & Priess, J. R. (2012). A role for the centrosome and PAR-3 in the hand-off of MTOC function during epithelial polarization. Curr Biol, 22(7), 575-582. doi:10.1016/j.cub.2012.02.044
Fong, K. W., Choi, Y. K., Rattner, J. B., & Qi, R. Z. (2008). CDK5RAP2 is a pericentriolar protein that functions in centrosomal attachment of the gamma-tubulin ring complex. Mol Biol Cell, 19(1), 115-125. doi:10.1091/mbc.e07-04-0371
Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942
Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002
Jonsdottir, A. B., Dirks, R. W., Vrolijk, J., Ogmundsdottir, H. M., Tanke, H. J., Eyfjord, J. E., & Szuhai, K. (2010). Centriole movements in mammalian epithelial cells during cytokinesis. BMC Cell Biol, 11, 34. doi:10.1186/1471-2121-11-34
Krishnan, N., Swoger, M., Rathbun, L. I., Fioramonti, P. J., Freshour, J., Bates, M., . . . Hehnly, H. (2022). Rab11 endosomes and Pericentrin coordinate centrosome movement during preabscission in vivo. Life Sci Alliance, 5(7). doi:10.26508/lsa.202201362
Liang, X., Weberling, A., Hii, C. Y., Zernicka-Goetz, M., & Buckley, C. E. (2022). E-cadherin mediates apical membrane initiation site localisation during de novo polarisation of epithelial cavities. EMBO J, 41(24), e111021. doi:10.15252/embj.2022111021
Lin, Y. N., Wu, C. T., Lin, Y. C., Hsu, W. B., Tang, C. J., Chang, C. W., & Tang, T. K. (2013). CEP120 interacts with CPAP and positively regulates centriole elongation. J Cell Biol, 202(2), 211-219. doi:10.1083/jcb.201212060
Mangan, A. J., Sietsema, D. V., Li, D., Moore, J. K., Citi, S., & Prekeris, R. (2016). Cingulin and actin mediate midbody-dependent apical lumen formation during polarization of epithelial cells. Nat Commun, 7, 12426. doi:10.1038/ncomms12426
Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864
Mazo, G., Soplop, N., Wang, W. J., Uryu, K., & Tsou, M. F. (2016). Spatial Control of Primary Ciliogenesis by Subdistal Appendages Alters Sensation-Associated Properties of Cilia. Dev Cell, 39(4), 424-437. doi:10.1016/j.devcel.2016.10.006
Piel, M., Nordberg, J., Euteneuer, U., & Bornens, M. (2001). Centrosome-dependent exit of cytokinesis in animal cells. Science, 291(5508), 1550-1553. doi:10.1126/science.1057330
Rodriguez-Fraticelli, A. E., Auzan, M., Alonso, M. A., Bornens, M., & Martin-Belmonte, F. (2012). Cell confinement controls centrosome positioning and lumen initiation during epithelial morphogenesis. J Cell Biol, 198(6), 1011-1023. doi:10.1083/jcb.201203075
Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065
Tanos, B. E., Yang, H. J., Soni, R., Wang, W. J., Macaluso, F. P., Asara, J. M., & Tsou, M. F. (2013). Centriole distal appendages promote membrane docking, leading to cilia initiation. Genes Dev, 27(2), 163-168. doi:10.1101/gad.207043.112
Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071
Tsai, J. J., Hsu, W. B., Liu, J. H., Chang, C. W., & Tang, T. K. (2019). CEP120 interacts with C2CD3 and Talpid3 and is required for centriole appendage assembly and ciliogenesis. Sci Rep, 9(1), 6037. doi:10.1038/s41598-019-42577-0
Tuncay, H., Brinkmann, B. F., Steinbacher, T., Schurmann, A., Gerke, V., Iden, S., & Ebnet, K. (2015). JAM-A regulates cortical dynein localization through Cdc42 to control planar spindle orientation during mitosis. Nat Commun, 6, 8128. doi:10.1038/ncomms9128
Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020
Zimmerman, W. C., Sillibourne, J., Rosa, J., & Doxsey, S. J. (2004). Mitosis-specific anchoring of gamma tubulin complexes by pericentrin controls spindle organization and mitotic entry. Mol Biol Cell, 15(8), 3642-3657. doi:10.1091/mbc.e03-11-0796
Reviewer #3 (Public review):
Summary:
In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and the identification of the PLC as a novel morphological feature of the adult liver portal veins. The text is now also much clearer and has better flow.
The additional assessment of the smartSeq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for liver homeostasis and injury response. While it may suffer from similar bias as other scRNA-seq datasets - multiple cell fate signatures arising from mRNA contamination from proximal cells during dissociation - it is unlikely that such contamination would yield such similar results.
Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.
The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of eLife readers.
Strengths:
The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.
This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).
Weaknesses:
The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.
Author response:
The following is the authors’ response to the original reviews
Public Reviews:
Reviewer #1 (Public review):
Summary:
In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.
Strengths:
This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.
Weaknesses:
This manuscript is well-written, organized, and informative. However, there are some points that need to be clarified.
(1) After MCNP-dye injection, does it remain in the blood vessels, adsorb onto the cell surface, or permeate into the cells? Does the MCNP-dye have cell selectivity?
The experimental results showed that after injection, the MCNP series nanoparticles predominantly remained within the lumens of blood vessels and bile ducts, with their tissue distribution determined by physical perfusion. No diffusion of the dye signal into the surrounding parenchymal tissue was observed, nor was there any evidence of adsorption onto the cell surface or entry into cells. The newly added Supplementary Figure S2A–H further confirmed this feature, demonstrating that the dye signals were strictly confined to the luminal space, clearly delineating the continuous course of blood vessels and the branching morphology of bile ducts. These findings strongly support the conclusion that “MCNP dyes are distributed exclusively within the luminal compartments.”
Therefore, the MCNP dyes primarily serve as intraluminal tracers within the tissue rather than as labels for specific cell types.
(2) All MCNP-dyes were injected after the mice were sacrificed, and the mice's livers were fixed with PFA. After the blood flow had ceased, how did the authors ensure that the MCNP-dyes were fully and uniformly perfused into the microcirculation of the liver?
Thank you for the reviewer’s valuable comments. Indeed, since all MCNP dyes were perfused after the mice were euthanized and blood circulation had ceased, we cannot fully ensure a homogeneous distribution of the dye within the hepatic microcirculation. The vascular labeling technique based on metallic nanoparticle dyes used in this study offers clear imaging, stable fluorescence intensity, and multiplexing advantages; however, it also has certain limitations. The main issue is that the dye distribution within the hepatic parenchyma can be affected by factors such as lobular overlap, local tissue compression, and variations in vascular pathways, resulting in regional inhomogeneity of dye perfusion. This is particularly evident in areas where multiple lobes converge or where anatomical structures are complex, leading to local dye accumulation or over-perfusion.
In our experiments, we attempted to minimize local blockage or over-perfusion by performing PBS pre-flushing and low-pressure, constant-speed perfusion. Nevertheless, localized dye accumulation or uneven distribution may still occur in lobe junctions or structurally complex regions. Such variation represents one of the methodological limitations. Overall, the dye signals in most samples remained confined to the vascular and biliary lumens, and the distribution pattern was highly reproducible.
We have addressed this issue in the Discussion section but would like to emphasize here that, although this system has clear advantages, it remains sensitive to anatomical variability in the liver—such as lobular overlap and vascular heterogeneity. At vascular junctions, local perfusion inhomogeneity or dye accumulation may occur; therefore, injection strategies and perfusion parameters should be adjusted according to liver size and vascular condition to improve reproducibility and imaging quality. It should also be noted that the results obtained using this method primarily aim to visualize the overall and fine anatomical structures of the hepatic vascular system rather than to quantitatively reflect hemodynamic processes. In the future, we plan to combine in vivo perfusion or dynamic fluid modeling to further validate the diffusion characteristics of the dyes within the hepatic microcirculation.
(3) It is advisable to present additional 3D perspective views in the article, as the current images exhibit very weak 3D effects. Furthermore, it would be better to supplement with some videos to demonstrate the 3D effects of the stained blood vessels.
Thank you for the reviewer’s valuable comments. In response to the suggestion, we have added perspective-rendered images generated from the 3D staining datasets to provide a more intuitive visualization of the spatial morphology of the hepatic vasculature. These images have been included in Figure S2A–J. In addition, we have prepared supplementary videos (available upon request) that dynamically display the three-dimensional distribution of the stained vessels, further enhancing the spatial perception and visualization of the results.
(4) In Figure 1-I, the authors used MCNP-Black to stain the central veins; however, in addition to black, there are also yellow and red stains in the image. The authors need to explain what these stains are in the legend.
Thank you for the reviewer’s constructive comment. In Figure 1I, MCNP-Black labels the central vein (black), MCNP-Yellow labels the portal vein (yellow), MCNP-Pink labels the hepatic artery (pink), and MCNP-Green labels the bile duct (green). We have revised the Figure 1 legend to include detailed descriptions of the color signals and their corresponding structures to avoid any potential confusion.
(5) There is a typo in the title of Figure 4F; it should be "stem cell".
Thank you for the reviewer’s careful correction. We have corrected the spelling error in the title of Figure 4F to “stem cell” and updated it in the revised manuscript.
(6) Nuclear staining is necessary in immunofluorescence staining, especially for Figure 5e. This will help readers distinguish whether the green color in the image corresponds to cells or dye deposits.
We thank the reviewer for the valuable suggestion. We understand that nuclear staining can help determine the origin of fluorescence signals. However, in our three-dimensional imaging system, the deep signal acquisition range after tissue clearing often causes nuclear dyes such as DAPI to generate highly dense and widespread fluorescence, especially in regions rich in vascular structures, which can obscure the fine vascular and perivascular details of interest. Therefore, this study primarily focuses on high-resolution visualization of the spatial architecture of the vascular and biliary systems. We have added an explanation regarding this point in Figures S2I–J.
Reviewer #2 (Public review):
Summary:
The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, inferior vena cava, and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers, and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.
Strengths:
The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.
Weaknesses:
This reviewer has several concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results. In this reviewer's opinion, the introduction contains overstatements regarding the potential of the method, there are severe caveats in the method descriptions, and several parts of the Results are not fully supported by the documentation. Thus, the conclusions of the paper may be critically viewed in their present form and may need reconsideration by the authors.
We sincerely thank the reviewer for the thorough evaluation and constructive comments on our study. We fully understand and appreciate the reviewer’s concerns regarding the methodological validity and interpretation of the results. In response, we have made comprehensive revisions and additions to the manuscript as follows:
First, we have carefully revised the Introduction and Discussion sections to provide a more balanced description of the methodological potential, removing statements that might be considered overstated, and clarifying the applicable scope and limitations of our approach (see the revised Introduction and Discussion).
Second, we have substantially expanded the Methods section with detailed information on model construction, imaging parameters, data processing workflow, and technical aspects of the single-cell transcriptomic reanalysis, to enhance the transparency and reproducibility of the study.
Third, we have added additional references and explanatory notes in the Results section to better support the main conclusions (see Section 6 of the Results).
Finally, we have rechecked and validated all experimental data, and conducted a verification analysis using an independent single-cell RNA-seq dataset (Figure S6). The results confirm that the morphological observations and transcriptomic findings are consistent and reproducible across independent experiments.
We believe these revisions have greatly strengthened the reliability of our conclusions and the overall scientific rigor of the manuscript. Once again, we sincerely appreciate the reviewer’s valuable comments, which have been very helpful in improving the logic and clarity of our work.
Reviewer #3 (Public review):
Summary:
In the reviewed manuscript, researchers aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the portal vein axis. The PLC originates from the portal vein and is characterized by a unique population of CD34⁺Sca-1⁺ dual-positive endothelial cells. Using available scRNA-seq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting the mRNA presence of genes linked to neurodevelopment, biliary function, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.
The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver at the same time. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists; however, some claims need more thorough assessment by functional experimental approaches to decipher the functional molecules and the sequence of events before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems. Similarly, the level of detail of the methods section does not appear to be sufficient to exactly recapitulate the performed experiments, which is of concern, given that the new technique is a cornerstone of the manuscript.
Nevertheless, the work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of readers.
Strengths:
The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.
This work proposes a new biological framework between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).
Weaknesses:
Possible overinterpretation of the CD34+Sca1+ findings was built on re-analysis of one scRNAseq dataset.
Lack of detail in the materials and methods section greatly limits the usefulness of the new technique to other researchers.
We thank the reviewer for this important comment. We agree that when conclusions are mainly based on a single dataset, overinterpretation should be avoided. In response to this concern, we have carefully re-evaluated and clearly limited the scope of our interpretation of the scRNA-seq analysis. In addition, we performed a validation analysis using an independent single-cell RNA-seq dataset (see new Figure S6), which consistently confirmed the presence and characteristic transcriptional profile of the periportal CD34⁺Sca1⁺ endothelial cell population. These supplementary analyses strengthen the robustness of our findings and address the reviewer’s concern regarding potential overinterpretation.
In the revised manuscript, we have also greatly expanded the Materials and Methods section by providing detailed information on sample preparation, imaging parameters, data processing workflow, and single-cell reanalysis procedures. These revisions substantially improve the transparency and reproducibility of our methodology, thereby enhancing the usability and reference value of this technique for other researchers.
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
Introduction
(1) In general, the Introduction is very lengthy and repetitive. It needs extensive shortening to a maximum of 2 A4 pages.
We thank the reviewer for the valuable suggestions. We have thoroughly condensed and restructured the Introduction, removing redundant content and merging related paragraphs to make the theme more focused and the logic clearer. The revised Introduction has been shortened to within two A4 pages, emphasizing the scientific question, innovation, and technical approach of the study.
(2) Please correct this erroneous sentence:
'...the liver has evolved the most complex and densely n organized vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].'
We thank the reviewer for pointing out this spelling error. The revised sentence is as follows:
“…the liver has evolved the most complex and densely organized ductal-vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].”
(3) '...we achieved a 63.89% improvement in clearing efficiency and a 20.12% increase in tissue transparency'
Please clarify what you exactly mean by 'clearing efficiency' and 'increased tissue transparency'.
We thank the reviewer for the valuable comments and have clarified the relevant terminology in the revised manuscript.
“Clearing efficiency” refers to the reduction in the time required for liver tissue to become completely transparent when treated with the optimized Liver-CUBIC protocol (40% urea + H₂O₂), compared with the conventional CUBIC method. In this study, the clearing time was reduced from 9 days to 3.25 days, representing a 63.89% reduction in clearing time.
“Tissue transparency” refers to the ability of the cleared tissue to transmit visible light. We quantified the optical transparency by measuring light transmittance across the 400–900 nm wavelength range using a microplate reader. The results showed that the average transmittance increased by 20.12%, indicating that Liver-CUBIC treatment markedly enhanced the optical clarity of the liver tissue.
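For concreteness, the headline efficiency figure can be reproduced directly from the two clearing times stated above. The short Python sketch below is purely illustrative (the 9-day and 3.25-day values come from the text; the helper function name is ours):

```python
# Reproduce the clearing-efficiency figure quoted above from the raw
# clearing times. This is an illustrative check, not part of the
# published analysis.

def percent_reduction(before: float, after: float) -> float:
    """Percentage reduction of `after` relative to `before`."""
    return (before - after) / before * 100

conventional_cubic_days = 9.0   # conventional CUBIC clearing time
liver_cubic_days = 3.25         # optimized Liver-CUBIC clearing time

reduction = percent_reduction(conventional_cubic_days, liver_cubic_days)
print(f"Clearing-time reduction: {reduction:.2f}%")  # -> 63.89%
```

The 20.12% transmittance figure is, analogously, the relative increase in mean transmittance over the 400–900 nm measurement range rather than a time-based quantity.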
(4) I am concerned about claiming this imaging method as real '3D imaging'. Namely, while the authors clear full lobes, they actually cut the cleared lobes into 200-micrometer-thick slices and perform further microscopy imaging on these slices. Considering that they focus on ductular structures of the liver (such as vasculature, the bile duct system, and innervation), 200 micrometers allow only a very limited 3D overview, particularly in comparison with whole-mount immuno-imaging methods combined with light sheet microscopy (such as Adori 2021, Liu 2021, etc). In this context, several parts of the Introduction feel like an overstatement: besides emphasizing the advantages of the technique (such as simultaneous visualization of different hepatic vascular compartments and the bile duct system by MCNPs, and the combination with immunostaining), the authors must honestly discuss the limitations (such as limited tissue overview and potential dye perfusion problems, e.g., uneven distribution of the dye).
We appreciate the reviewer’s insightful comments. It is true that most of the imaging depth in this study was limited to approximately 200 μm, and thus it could not achieve whole-liver three-dimensional imaging comparable to light-sheet microscopy. However, the primary focus of our study was to resolve the microscopic intrahepatic architecture, particularly the spatial relationships among blood vessels, bile ducts, and nerve fibers. Through high-resolution imaging of thick tissue sections, combined with MCNP-based multichannel labeling and immunofluorescence co-staining, we were able to accurately delineate the three-dimensional distribution of these microstructures within localized regions.
In addition to thick-section imaging, we also obtained whole-lobe dye perfusion data (as shown in Figure S1F), which comprehensively depict the three-dimensional branching patterns and distribution of the vascular systems within the liver lobe. These images were acquired from intact liver lobes perfused with MCNP dyes, revealing a continuous vascular network extending from major trunks to peripheral branches, thereby demonstrating that our approach is also capable of achieving organ-level visualization.
We have added this image and a corresponding description in the revised manuscript to more comprehensively present the coverage of our imaging system, and we have incorporated this clarification into the Discussion section.
Method
(5) More information may be needed about MCNPs:
a) As reported, there are nanoparticles with different colors in brightfield microscopy, but the particles are also excitable in fluorescence microscopy. Would you please provide a summary about excitation/emission wavelengths of the different MCNPs? This is crucial to understand to what extent the method is compatible with fluorescence immunohistochemistry.
We thank the reviewer for the careful attention and professional suggestion. We fully agree that this issue is critical for evaluating the compatibility of our method with fluorescent immunohistochemistry. Different types of metal compound nanoparticles (MCNPs) have clearly distinguishable spectral properties:
- MCNP-Green and MCNP-Yellow: AF488-matched spectra, with excitation/emission wavelengths of 495/519 nm.
- MCNP-Pink: Designed for far-red spectra, with excitation/emission wavelengths of 561/640 nm.
- MCNP-Black: Non-fluorescent, appearing black under bright-field microscopy only.
The above information has been added to the Materials and Methods section.
b) Also, is there more systematic information available concerning the advantage of these particles compared to 'traditional' fluorescence dyes, such as Alexa fluor or Cy-dyes, in fluorescence microscopy and concerning their compatibility with various tissue clearing methods (e.g., with the frequently used organic-solvent-based methods)?
We thank the reviewer for the detailed question. Compared with conventional organic fluorescent dyes, MCNP offers the following advantages:
- Enhanced photostability: Its inorganic core-shell structure resists fading even after hydrogen peroxide bleaching.
- High signal stability: Fluorescence is maintained during aqueous-based clearing (e.g., CUBIC) and multiple rounds of staining without quenching.
We appreciate the reviewer’s suggestion. In our Liver-CUBIC system, MCNP nanoparticles exhibited excellent multi-channel labeling stability and fluorescence signal retention. Regarding compatibility with other clearing methods (e.g., SCAFE, SeeDB, CUBIC), since these methods have limited effectiveness for whole-liver clearing (see Figure 2 of Tainaka et al., 2014) and cannot meet the requirements for high-resolution microstructural imaging in this study, we consider further testing of their compatibility unnecessary.
In summary, MCNP dye demonstrates superior signal stability and spectral separation compared with conventional organic fluorescent dyes in multi-channel, long-term, high-transparency three-dimensional tissue imaging.
c) When you perfuse these particles, to which structures do they bind inside the ducts (vessels, bile ducts)? Is the 48h post-fixation enough to keep them inside the tubes/bind them to the vessel walls? Is there any 'wash-out' during the complex cutting/staining procedure? E.g., in Figure 2D: the 'classical' hepatic artery in the portal triad is not visible - but the MCNP apparently penetrated to the adjacent sinusoids at the edge of the lobulus. Also, in Figure 3B, there is a significant mismatch between the MCNP-green (bile duct) signal and the CK19 (epithelium marker) immunostaining. Please discuss these.
The experimental results showed that following injection, MCNP nanoparticles primarily remained within the vascular and biliary lumens, and their tissue distribution depended on physical perfusion. No dye signal was observed to diffuse into the surrounding parenchyma, nor did the particles adhere to cell surfaces or enter cells. The newly added Supplementary Figures S2A–H further confirm this feature: the dye signal is strictly confined within the lumens, clearly delineating continuous vascular paths and biliary branching patterns, strongly supporting the conclusion that “MCNP dye is distributed only within luminal spaces.”
Thus, MCNP dye mainly serves as an intraluminal tracer rather than a label for specific cell types.
We provide the following explanations and analyses regarding MCNP distribution in the hepatic vascular and biliary systems and its post-fixation stability:
- Potential signal displacement during sectioning/immunostaining: During slicing and immunostaining, a small number of particles may be washed away due to mechanical cutting or washing steps; however, the overall three-dimensional structure retains high spatial fidelity.
- Observation in Figure 2D: MCNP was seen entering the sinusoidal spaces at the lobule periphery, but hepatic arteries were not visible, likely due to limitations in section thickness. Although arteries were not apparent in this slice, arterial distribution around the portal vein is visible in Figure 2C. It should be noted that Figures 2C, D, and E do not represent whole-liver imaging, so not all regions necessarily contain visible hepatic arteries. For easier identification, the main hepatic artery trunk is highlighted in cyan in Figure 2E.
- Incomplete biliary signal in Figure 3B: This may be because CK19 labeling only covers biliary epithelial cells, whereas MCNP-green distributes throughout the biliary lumen. In Figure 3B, the terminal MCNP-green signal exhibits irregular polygonal structures, which we interpret as the canalicular regions.
(6) Which fixative was used for 48h of postfixation (step 6) after MCNP injections?
After MCNP injection, mouse livers were post-fixed in 4% paraformaldehyde (PFA) for 48 hours. This fixation condition effectively “locks” the MCNP particles within the vascular and biliary lumens, maintaining their spatial positions, while also being compatible with subsequent sectioning and multi-channel immunostaining analyses.
The above information has been added to the Materials and Methods section.
(7) What is the 'desired thickness' in step 7? In the case of immunostained tissue, a 200-micrometer slice thickness is mentioned. However, based on the Methods, it is not completely clear what the actual thickness of the tissue was that was examined ultimately in the microscopes, and whether or not the clearing preceded the cutting or vice versa.
We appreciate the reviewer’s question. The “desired thickness” referred to in step 7 of the manuscript corresponds to the thickness of tissue sections used for immunostaining and high-resolution microscopic imaging, which is typically around 200 µm. We selected 200 µm because this thickness is sufficient to observe the PLC structure in its entirety, allows efficient staining, and preserves tissue architecture well. Other researchers may choose different section thicknesses according to their experimental needs.
In this study, the processing order for immunostained tissue samples was sectioning followed by clearing, as detailed below:
Section Thickness
To ensure antibody penetration and preservation of three-dimensional structure, tissue sections were typically cut to ~200 µm. Thicker sections can be used if more complete three-dimensional structures are required, but adjustments may be needed based on antibody penetration and fluorescence detection conditions.
Clearing Sequence
After sectioning, slices were processed using the Liver-CUBIC aqueous-based clearing system.
(8) More information is needed concerning the 'deep-focus microscopy' (Keyence), the applied confocal system, and the THUNDER 'high resolution imaging system': basic technical information, resolutions, objectives (N.A., working distance), lasers/illumination, filters, etc.
Imaging Systems and Settings
VHX-6000 Extended Depth-of-Field Microscope: Objective: VH-Z100R, 100×–1000×; resolution: 1 µm (typical); illumination: coaxial reflected; transmitted illumination on platform: ON.
Zeiss confocal microscope (LSM 980): objectives, 20× or 40×; image size, 1024 × 1024 pixels. Fluorescence was detected in three channels:
- Channel 1: 639 nm laser; detection range 673–758 nm; corresponding to Cy5-T1 (red, excitation/emission maxima 650/673 nm).
- Channel 2: 561 nm laser; detection range 547–637 nm; corresponding to Cy3-T2 (orange, excitation/emission maxima 548/561 nm).
- Channel 3: 488 nm laser; detection range 490–529 nm; corresponding to AF488-T3 (green, excitation/emission maxima 493/517 nm).
Leica THUNDER Imager 3D Tissue: Fluorescence detection in two channels:
- Channel 1: FITC channel (excitation 488 nm, emission ~520 nm).
- Channel 2: orange-red channel (excitation/emission 561/640 nm).
The system was equipped with matching filter sets to ensure signal separation.
The above information has been added to the Materials and Methods section.
(9) Liver-CUBIC, step 2: which lobe(s) did you clear (...whole liver lobes...).
In this study, all liver lobes (left, right, caudate, and quadrate lobes) were subjected to Liver-CUBIC aqueous-based clearing to ensure uniform visualization of MCNP fluorescence and immunolabeling throughout the three-dimensional imaging of the entire liver.
The above information has been added to the Materials and Methods section.
(10) For the DAB and TSA IHC stainings, did you use free-floating slices, or did you mount the vibratome sections and do the staining on mounted sections?
In this study, fixed livers were first sectioned into thick slices (~200 µm) using a vibratome. DAB and TSA immunohistochemical (IHC) staining was then performed on free-floating sections. Throughout the staining process, the slices were kept floating in the solutions, ensuring thorough antibody penetration into the thick sections while preserving the three-dimensional tissue architecture, thereby facilitating multiple rounds of staining and three-dimensional imaging.
(11) Regarding the 'transmission quantification': this was measured on 1 mm thick slices. While it is interesting to make a comparison between different clearing methods in general, one must note that it is relatively easy to clear 1mm thick tissue slices with almost any kind of clearing technique and in any tissues. The 'real' differences come with thicker blocks, such as >5mm in the thinnest dimension. Do you have such experiences (e.g., comparison in whole 'left lateral liver lobes')?
In this study, we performed three-dimensional visualization of entire liver lobes to depict the distribution of MCNPs and the overall spatial architecture of the vascular and biliary systems (Figure S1F). However, due to the limitations of the plate reader and fluorescence imaging systems in terms of spatial resolution and light penetration depth, quantitative analyses were conducted only on tissue sections approximately 1 mm thick.
Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous- or organic solvent–based clearing techniques can achieve relatively uniform transparency in 1 mm-thick tissue sections, so differences at this thickness are limited. We have not yet conducted systematic comparisons on whole-lobe sections thicker than 5 mm and therefore cannot provide “true” difference data for thicker tissues.
(12) There is no method description for the ELMI studies in the Methods.
Transmission Electron Microscopy (TEM) Analysis of MCNPs
Before imaging, the MCNP dye solution was centrifuged at 14,000 × g for 10 minutes at 4 °C to remove aggregates and impurities. The supernatant was collected, diluted 50-fold, and 3–4 μL of the sample was applied onto freshly glow-discharged Quantifoil R1.2/1.3 copper grids (Electron Microscopy Sciences, 300 mesh). The sample was allowed to sit for 30 seconds to enable particle adsorption, after which excess liquid was gently wicked away with filter paper and the grid was air-dried at room temperature. The sample was then negatively stained with 1% uranyl acetate for 30 seconds and air-dried again before imaging.
Negative-stain TEM images were acquired using a JEOL JEM-1400 transmission electron microscope operating at 120 kV and equipped with a CCD camera. Data acquisition followed standard imaging conditions.
The above information has been added to the Materials and Methods section.
(13) Please, provide a method description for the applied CCl4 cirrhosis model. This is completely missing.
(1) Under a fume hood, carbon tetrachloride (CCl₄) was dissolved in corn oil at a 1:3 volume ratio to prepare a working solution, which was filtered through a 0.2 μm filter into a 30 mL glass vial. In our laboratory, to mimic chronic injury, mice in the experimental group were intraperitoneally injected at a dose of 1 mL/kg body weight per administration.
(2) Mice were carefully removed from the cage and placed on a scale to record body weight for calculation of the injection volume.
(3) The needle cap was carefully removed, and the required volume of the pre-prepared CCl₄ solution was drawn into the syringe. The syringe was gently flicked to remove any air bubbles.
(4) Mice were placed on a textured surface (e.g., wire cage) and restrained. When the mouse was properly positioned, ideally with the head lowered about 30°, the left lower or right lower abdominal quadrant was identified.
(5) Holding the syringe at a 45° angle, with the bevel facing up, the needle was inserted approximately 4–5 mm into the abdominal wall, and the calculated volume of CCl₄ was injected.
(6) Mice were returned to their cage and observed for any signs of discomfort.
(7) Needles and syringes were disposed of in a sharps container without recapping. A new syringe or needle was used for each mouse.
(8) To establish a progressive liver fibrosis model, injections were administered twice per week (e.g., Monday and Thursday) for 3 or 6 consecutive weeks (n=3 per group). Control mice were injected with an equal volume of corn oil for 3 or 6 weeks (n=3 per group).
(9) Forty-eight hours after the last injection, mice were euthanized by cervical dislocation, and livers were rapidly harvested. Portions of the liver were processed for paraffin embedding and histological sectioning, while the remaining tissue was either immediately frozen or used for subsequent molecular biology analyses.
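To make the dosing in steps (1)–(3) unambiguous, the volume arithmetic can be sketched as follows (a minimal illustration; the function names are ours and not part of the protocol, and we assume the 1 mL/kg dose refers to the 1:3 (v/v) working solution, as step (1) indicates):

```python
# Minimal sketch of the dosing arithmetic; function names are ours, not part
# of the protocol. Assumption: the 1 mL/kg dose refers to the 1:3 (v/v)
# CCl4:corn-oil working solution prepared in step (1).

def injection_volume_ul(body_weight_g: float, dose_ml_per_kg: float = 1.0) -> float:
    """I.p. injection volume (uL) of working solution for one mouse."""
    body_weight_kg = body_weight_g / 1000.0
    return body_weight_kg * dose_ml_per_kg * 1000.0  # mL -> uL

def pure_ccl4_ul(injection_ul: float) -> float:
    """Pure CCl4 within the injected volume (1 part CCl4 : 3 parts corn oil)."""
    return injection_ul / 4.0
```

For a typical 25 g mouse this gives a 25 µL injection of working solution, of which one quarter (6.25 µL) is pure CCl4.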
The above information has been added to the Materials and Methods section.
(14) Please provide a method description for the quantifications reported in Figures 5D, 5F, and 6E.
ImageJ software was used to analyze the 3D fluorescence-stained images (Figures 5F and 6E), and the extended depth-of-field 3D analysis module was used to analyze the 3D DAB images (Figure 5D). The specific steps are as follows:
Figure 5D: DAB-stained 3D images from the control group and the CCl<sub>4</sub> 6-week (CCl<sub>4</sub>-6W) group were analyzed. For each group, 20 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. All measurements were plotted as scatter plots to reflect the spatial extension of bile ducts relative to the portal vein under different conditions.
Figure 5F: TSA 3D multiplex-stained images from the control group, CCl<sub>4</sub> 3-week (CCl<sub>4</sub>-3W), and CCl<sub>4</sub> 6-week (CCl<sub>4</sub>-6W) groups were analyzed. For each group, 5 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Measurements were plotted as scatter plots to illustrate bile duct spatial extension.
Figure 6E: TSA 3D multiplex-stained images from the control, CCl<sub>4</sub>-3W, and CCl<sub>4</sub>-6W groups were analyzed. For each group, 5 terminal nerve branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Scatter plots were generated to depict the spatial distribution of nerves under different treatment conditions.
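The summary step shared by these three analyses can be sketched as follows (our own minimal illustration, not the exact ImageJ workflow; the demo distances are hypothetical, while the real values come from path tracing in the images):

```python
# Per-group path distances (um) from terminal branch nodes to the nearest
# portal-vein surface are pooled and summarized before being plotted as
# scatter points. Demo values below are hypothetical.
from statistics import mean, stdev

def summarize_distances(groups: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Return {group: (mean_um, sd_um)} of the measured path distances."""
    return {name: (mean(d), stdev(d)) for name, d in groups.items()}

demo = {
    "Control": [42.0, 55.0, 48.0, 51.0, 46.0],
    "CCl4-6W": [88.0, 95.0, 102.0, 91.0, 99.0],
}
stats = summarize_distances(demo)  # e.g., Control mean 48.4 um
```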
(15) Please provide a method description for the human liver samples you used in Figure S6. Patient data, fixation, etc...
The human liver tissue samples shown in Figure S6 were obtained from adjacent non-tumor liver tissues resected during surgical operations at West China Hospital, Sichuan University. All samples were anonymized archived tissues, used for scientific research in accordance with institutional ethical guidelines, and did not involve any identifiable patient information. After fixation in 10% neutral-buffered formalin for 24 hours, the tissues were routinely processed for paraffin embedding (FFPE) and sectioned into 4 μm-thick slices for immunostaining and fluorescence imaging.
Results
(16) While it is stated in the Methods that certain color MCNPs were used for labelling different structures (i.e., yellow: hepatic artery; green: bile duct; portal vein: pink; central veins: black), in some figures, apparently different color MCNPs are used for the respective structures. E.g., in Figure 1J, the artery is pink and the portal vein is green. Please clarify this.
The color assignment of MCNP dyes is not fixed across different experiments or schematic illustrations. MCNP dyes of different colors are fundamentally identical in their physical and chemical properties and do not exhibit specific binding or affinity for particular vascular structures. We select different colors based on experimental design and imaging presentation needs to facilitate distinction and visualization, thereby enhancing recognition in 3D reconstruction and image display. Therefore, the color labeling in Figure 1F is primarily intended to illustrate the distribution of different vascular systems, rather than indicating a fixed correspondence to a specific dye or injection color.
(17) In Figure 1J, the hepatic artery is extremely shrunk, while the portal vein is extremely dilated - compared to the physiological situation. Does it relate to the perfusion conditions?
We appreciate the reviewer’s attention to this detail. In fact, under normal physiological conditions, the hepatic arteries labeled by CD31 are naturally narrow. Therefore, the relatively thin hepatic arteries and thicker portal veins shown in Figure 1J are normal and unrelated to the perfusion conditions (see Figure 1E of Adori et al., 2021).
(18) Re: MCNP-black labelled 'oval fenestrae': the Results state 50-100 nm, while they are apparently 5-10-micron diameter in Figure 1I. Accordingly, the comparison with the ELMI studies in the subsequent paragraph is inappropriate.
We thank the reviewer for the correction. The previous statement was a typographical error. In fact, the diameter of the “elliptical windows” marked by MCNP-black is 5–10 μm, so the diameter of 5–10 μm shown in Figure 1I is correct.
(19) Please, correct this erroneous sentence: 'Pink marked the hepatic arterial system by injection extrahepatic duct (Figure 2B).'
Original sentence: “Pink marked the hepatic arterial system by injection extrahepatic duct (Figure 2B).”
Revised sentence: “The hepatic arterial system was labeled in pink by injection through the left ventricle (Figure 2B).”
(20) How do you define the 'primary portal vein tract'?
We thank the reviewer for the question. The term “primary portal vein tract” refers to the first-order branches of the portal vein that enter the liver from the hepatic hilum. These are the major branches arising directly from the main portal vein trunk and are responsible for supplying blood to the respective hepatic lobes. This definition corresponds to the concept of the first-order portal vein in hepatic anatomy.
(21) I am concerned whether the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit. I also see these in 3D scans - in my opinion, these are fine, lower-order portal vein branches that connect the portal veins to the adjacent sinusoid. The strong MCNP-labelling of these structures may be caused by the 'sticking' of the perfused MCNP solutions in these 'pockets' during the perfusion process. What do these structures look like with SMA or CD31 immunostaining? Also, one may consider that the anatomical evaluation of these structures may have limitations in tissue slices. Have you ever checked MCNP-perfused, cleared full liver lobes in light sheet microscope scans? I think this would be very useful to have a comprehensive morphological overview. Unfortunately, based on the presented documentation, I am also not convinced that PLCs 'co-localize' with fine terminal bile duct branches (Figure 3E, S3C), or with TH+ 'neuronal bead chain networks' (Fig 6C). More detailed and more convincing documentation is needed here.
We thank the reviewer for the detailed comments. Regarding the existence and function of the periportal lamellar complex (PLC), our observations are based on MCNP-Pink labeling of the portal vein, through which we were able to identify the PLC structure surrounding the portal branches. It should be noted that the PLC represents a very small anatomical structure. Although we have not yet performed light-sheet microscopy scanning, we anticipate that such imaging would primarily visualize larger portal vein branches. Nevertheless, this does not affect our overall conclusions.
We also appreciate the reviewer’s suggestion that the observed structures might result from MCNP adherence during perfusion. To verify the structural characteristics of the PLC, we performed immunostaining for SMA and CD31, which revealed a specific arrangement pattern of smooth muscle and endothelial markers rather than simple perfusion-induced deposition (Figures 4F and S6B).
Regarding the apparent colocalization of the PLC with terminal bile duct branches (Figures 3E and S3C) and TH⁺ neuronal bead-like networks (Figure 6C), we acknowledge that current literature evidence remains limited. Therefore, we have carefully described these observations as possible spatial associations rather than definitive conclusions. Future studies integrating high-resolution three-dimensional imaging with functional analyses will help to further clarify the anatomical and physiological significance of the PLC.
(22) 'Extended depth-of-field three-dimensional bright-field imaging revealed a strict 1:1 anatomical association between the primary portal vein trunk (diameter 280 ± 32 μm) and the first-order bile duct (diameter 69 ± 8 μm) (Figures 3A and S3A)'.
How do you define '1:1 anatomical association'? How do you define and identify the 'order' (primary, secondary) of vessel and bile duct branches in 200-micrometer slices?
We thank the reviewer for the question. In this study, the term “1:1 anatomical correlation” refers to the stable paired spatial relationship between the main portal vein trunk and its corresponding primary bile duct within the same portal territory. In other words, each main portal vein branch is accompanied by a primary bile duct of matching branching order and trajectory, together forming a “vascular–biliary bundle.”
The definitions of “primary” and “secondary” branches were based on extended-depth 3D bright-field reconstructions, considering both branching hierarchy and vessel/duct diameters: primary branches arise directly from the main trunk at the hepatic hilum and exhibit the largest diameters (averaging 280 ± 32 μm for the portal vein and 69 ± 8 μm for the bile duct), whereas secondary branches extend from the primary branches toward the lobular interior with smaller calibers.
(23) In my opinion, the applied methodical approach in the single-cell transcriptomics part (data mining in the existing liver single-cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations) is largely inappropriate, and thus all the statements here are purely speculative. In my opinion, to identify the molecular characteristics of such small and spatially highly organized structures like those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics.
We thank the reviewer for the comment. We fully acknowledge the importance of high-resolution spatial transcriptomics in identifying the fine structural characteristics of portal vein branches. Due to current funding and technical limitations, we were unable to perform such high-resolution spatial transcriptomic analyses. However, we validated the molecular features of the PLC using another publicly available liver single-cell RNA-sequencing dataset, which provided preliminary supporting evidence (Figures S6B and S6C). In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.
(24) 'How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.'
Please consider the role of gap junctions between hepatocytes (e.g., Miyashita, 1991; Seseke, 1992).
In this study, we analyzed the spatial distribution of hepatic nerves in mice using immunofluorescence staining and found that nerve fibers were almost exclusively confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs markedly from that in humans. Previous studies have shown that, in human livers, nerves are not only located around the portal veins but also present along the central veins, interlobular septa, and within the parenchymal connective tissue (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).
Further research has provided a physiological explanation for this interspecies difference: even among species with distinct sympathetic innervation patterns in the parenchyma—i.e., with or without direct sympathetic input—the sympathetic efferent regulatory functions may remain comparable (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released from aminergic and peptidergic nerve terminals can be transmitted to hepatocytes through gap junctions as electrical signals (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017).
However, the scarcity of nerve fibers within the mouse hepatic parenchyma suggests that the mechanisms by which the autonomic nervous system regulates liver function in mice may differ from those in humans. This observation prompted us to further investigate the potential role of PLC endothelial cells in this process.
(25) Please, correct typos throughout the text.
We thank the reviewer for this comment. We have carefully proofread the entire manuscript and corrected all typographical errors and minor language issues throughout the text.
Reviewer #3 (Recommendations for the authors):
(1) A strong recommendation - the authors ought to challenge their scRNAseq re-analysis with another scRNAseq dataset, namely a recently published atlas of adult liver endothelial, but also mesenchymal, immune, and parenchymal cell populations https://pubmed.ncbi.nlm.nih.gov/40954217/, performed with a Smart-seq2 approach, which is perfectly suitable as it brings higher-resolution data and extensive cluster-identity validation with stainings. Pietilä et al. indicate a clear distinction of portal vein endothelial cells into two populations that express Adgrg6, Jag1 (e2c), from Vegfc double-positive populations (e5c and e2c). Moreover, the dataset also includes the arterial endothelial cells that were shown to be part of the PLC, but were not followed up with the scRNAseq analysis. This distinction could help the authors to further validate their results, better controlling for cross-contaminations that may occur during scRNAseq preparation.
We thank the reviewer for the valuable suggestion. As noted, we have further validated the molecular characteristics of the PLC using the recently published atlas of adult liver endothelial cells (Pietilä et al., PMID: 40954217). This dataset, generated using the Smart-seq2 technique, provides high-resolution transcriptomic profiles. By analyzing this dataset, we identified a CD34⁺LY6A⁺ portal vein endothelial cell population within the e2 cluster, which is localized around the portal vein. We then examined pathways and gene expression patterns related to hematopoiesis, bile duct formation, and neural signaling within these cells. The results revealed gene enrichment patterns consistent with those observed in our primary dataset, further supporting the robustness of our analysis of the PLC’s molecular characteristics.
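The marker-gating step described above can be sketched as follows (a simplified illustration of our own; the cell IDs, expression values, and zero threshold are hypothetical, while the real analysis was performed on the normalized Smart-seq2 expression matrix of the published atlas):

```python
# Sketch of the CD34+/LY6A+ (Sca1) gating. Cell IDs, values, and the zero
# threshold are hypothetical; the real analysis used the normalized
# Smart-seq2 expression matrix.

def select_double_positive(expr, g1="Cd34", g2="Ly6a", thr=0.0):
    """Return cell IDs whose expression of both marker genes exceeds thr."""
    return [cell for cell, genes in expr.items()
            if genes.get(g1, 0.0) > thr and genes.get(g2, 0.0) > thr]

expr = {
    "c1": {"Cd34": 1.2, "Ly6a": 0.9},
    "c2": {"Cd34": 0.0, "Ly6a": 1.1},
    "c3": {"Cd34": 0.8, "Ly6a": 0.0},
}
dp = select_double_positive(expr)  # only "c1" is double-positive
```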
(2) Improving the methods section is highly recommended; this includes more detailed information for materials and protocols used - catalog numbers; protocol details of the usage - rocking platforms, timing, and tubes used for incubations; a GitHub or similar page with code used for the scRNAseq re-analysis.
We thank the reviewer for the valuable suggestion. We have added more detailed information regarding the materials and experimental procedures in the Methods section, including catalog numbers, incubation conditions (such as the type of shaker, incubation time, and tube specifications), and other relevant parameters.
(3) In Figure 2A, the authors claim the size of the nanoparticle is 100 nm, while based on the image, the size is ~150–180 nm. A more thorough quantification of the particle size would help users estimate the usability of their method for further applications.
We thank the reviewer for the comment. In the TEM image shown in Figure 2A, the nanoparticles indeed appear to be approximately 150–200 nm in size. We have re-verified the particle dimensions and will update the corresponding description in the Methods section to allow readers to more accurately assess the applicability of this approach.
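The re-verification can be made reproducible with a simple scale-bar conversion (a sketch under assumed numbers; the pixel values below are illustrative, and the real measurements come from the calibrated TEM images):

```python
# Scale-bar conversion sketch: per-particle pixel diameters -> nanometers.
# Pixel values below are assumed for illustration only.

def diameters_nm(px, scalebar_px, scalebar_nm):
    """Convert per-particle pixel diameters to nm via the image scale bar."""
    nm_per_px = scalebar_nm / scalebar_px
    return [d * nm_per_px for d in px]

# Hypothetical example: a 200 nm scale bar spanning 100 px.
d = diameters_nm([80.0, 90.0, 75.0], scalebar_px=100.0, scalebar_nm=200.0)
# -> [160.0, 180.0, 150.0] nm, within the ~150-200 nm range we report
```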
(4) In Figure 3E, it is not clear what is labeled by the pink signal. Please consider labeling the structures in the figure.
We thank the reviewer for the valuable comment. The pink signal in Figure 3E was originally intended to label the hepatic artery. However, a slight spatial misalignment occurred during the labeling process, making its position appear closer to the central vein rather than the portal vein in the image. To avoid misunderstanding, we will add clear annotations to the image and clarify this deviation in the figure legend in the revised version. It should also be noted that this figure primarily aims to illustrate the spatial relationship between the bile duct and the portal vein, and this minor deviation does not affect the reliability of our experimental conclusions.
(5) The following statement is not backed by quantification, as it ought to be: "Dual-channel three-dimensional confocal imaging combined with CK19 immunostaining revealed that the sites of dye leakage did not coincide with the CK19-positive terminal bile duct epithelium, but instead were predominantly localized within regions adjacent to the PLC structures".
We thank the reviewer for the valuable comment. We have added the corresponding quantitative analysis to support this conclusion. Quantitative assessment of the extended-depth imaging data revealed that dye leakage predominantly occurred in regions adjacent to the PLC structure, rather than in the perivenous sinusoidal areas. The corresponding results have been presented in the revised Figure 3G.
(6) Similarly, Figure 4F is central to the Sca1+CD34+ cell type identification but lacks any quantification; providing it would strengthen the key statement of the article. A possible way to approach this is by FACS-sorting the double-positive cells and bulk/qRT-PCR validation.
We thank the reviewer for raising this point. We agree that quantitative validation of the Sca1⁺CD34⁺ population by FACS sorting could further support our conclusions. However, the primary focus of this study is on the spatial localization and transcriptional features of PLC endothelial cells. The identification of the Sca1⁺CD34⁺ subset is robustly supported by multiple complementary approaches, including three-dimensional imaging, co-staining with pan-endothelial markers, and projection mapping analyses. Collectively, these lines of evidence provide a solid basis for characterizing this unique endothelial population.
(7) The images in Figure S4D are not comparable, as the Sca1-stained image shows a longitudinal section of the PV, but the other stainings are cross-sections of PVs.
We thank the reviewer for the careful comment. We agree that the original Sca1-stained image, being a longitudinal section of the portal vein, was not optimal for direct comparison with other cross-sectional images. We have replaced it with a cross-sectional image of the portal vein to ensure comparability across all images. The updated image has been included in the revised Supplementary Figure S4D.
(8) I might be wrong, but Figure 4J is entirely missing, and only a cartoon is provided. Either remove the results part or provide the data.
We appreciate the reviewer’s careful observation. Figure 4J was intentionally designed as a schematic illustration to summarize the structural relationships and spatial organization of the portal vein, hepatic artery, and PLC identified in the previous panels (Figures 4A–4I). It does not represent newly acquired experimental data, but rather serves to provide a conceptual overview of the findings.
To avoid misunderstanding, we have clarified this point in the figure legend and the main text, stating that Figure 4J is a schematic summary rather than an experimental image. Therefore, we respectfully prefer to retain the schematic figure to aid readers’ interpretation of the preceding results.
(9) The methods section lacks information about the CCl4 concentration, and it is thus hard to estimate the dosage of CCl4 received (mL/kg). This is important for the interpretation of the severity of the fibrosis and presence of cirrhosis, as different doses may or may not lead to cirrhosis within the short regimen performed by the authors [PMID: 16015684; DOI: 10.3748/wjg.v11.i27.4167]. Validation of the fibrosis/cirrhosis severity is, in this case, crucial for the correct interpretation of the results. If the level of cirrhosis is not confirmed, only progressive fibrosis should be mentioned in the manuscript, as these two terms cannot be used interchangeably.
We thank the reviewer for this comment. We indeed omitted the concentration of carbon tetrachloride (CCl<sub>4</sub>) from the Methods section. In our experiments, mice received intraperitoneal injections of the CCl<sub>4</sub> working solution (CCl<sub>4</sub> diluted 1:3, v/v, in corn oil, i.e., 25% v/v CCl<sub>4</sub>) at a dose of 1 mL/kg body weight, twice per week, for a total of six weeks. We have revised the manuscript accordingly, using the term “progressive fibrosis” to avoid conflating fibrosis with cirrhosis.
(10) The following statement is not backed by any correlation analysis: "Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension correlating with fibrosis severity,.. ".
We thank the reviewer for the comment. The original statement that the “PLC correlates with fibrosis severity” lacked support from quantitative analysis. To ensure a precise description, we have revised the sentence as follows: “During liver fibrosis progression, the PLC exhibits dynamic structural extension.”
(11) Similarly, the following statement is not followed by data that would address the impact of innervation on liver function: "How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.".
This section has been revised. In this study, we analyzed the spatial distribution of nerves in the mouse liver using immunofluorescence staining. The results showed that nerve fibers were almost entirely confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs significantly from that in humans. Previous studies have demonstrated that in the human liver, nerves are not only distributed around the portal vein but also present in the central vein, interlobular septa, and connective tissue of the hepatic parenchyma (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).
Previous studies have further explained the physiological basis for this difference: even among species with differences in parenchymal sympathetic innervation (i.e., species with or without direct sympathetic input), their sympathetic efferent regulatory functions may still be similar (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released by adrenergic and peptidergic nerve terminals can be transmitted to hepatocytes as electrical signals through intercellular gap junctions (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017). However, the scarcity of nerve fibers in the mouse hepatic parenchyma suggests that the mechanism by which the autonomic nervous system regulates liver function in mice may differ from that in humans. This finding also prompts us to further explore the potential role of PLC endothelial cells in this process.
(12) Could the authors discuss their interpretation of the results in light of the fact that the innervation is lower in cirrhotic patients? https://pmc.ncbi.nlm.nih.gov/articles/PMC2871629/. Also, while ADGRG6 (Gpr126) may play important roles in liver Schwann cells, it is likely not through affecting myelination of the nerves, as the liver nerves are not myelinated https://pubmed.ncbi.nlm.nih.gov/2407769/ and https://www.pnas.org/doi/10.1073/pnas.93.23.13280.
We have revised the text to state that although most hepatic nerves are unmyelinated, GPR126 (ADGRG6) may regulate hepatic nerve distribution via non-myelination-dependent mechanisms. Studies have shown that GPR126 exerts both Schwann cell–dependent and –independent functions during peripheral nerve repair, influencing axon guidance, mechanosensation, and ECM remodeling (Mogha et al., 2016; Monk et al., 2011; Paavola et al., 2014).
(13) The manuscript would benefit from text curation that would:
a) Unify the language describing the PLC, so it is clear that (if) it represents protrusions of the portal veins.
We have standardized the description of the PLC throughout the manuscript, clearly specifying its anatomical relationship with the portal vein. Wherever appropriate, we indicate that the PLC represents protrusions associated with the portal vein, avoiding ambiguous or inconsistent statements.
b) Increase the accuracy of the statements.
Examples: "bile ducts, and the central vein in adult mouse livers."
We have refined all statements for accuracy.
c) Reduce the space given to discussion and results in the introduction, moving them to the respective parts. The same applies to the results section, where discussion occurs at more places than in the Discussion part itself.
We have edited the Introduction, removing detailed results and functional explanations, and retaining only a concise overview.
Examples: "The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis, and vascular-stem cell interactions."
"This finding suggests that PLC endothelial cells not only regulate the periportal microcirculatory blood flow, but also establish a specialized microenvironment that supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair. "
"Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver cirrhosis. " This one in particular would require further validation with protein stainings and similar, directly in your model.
d) Provide a clear reference for the used scRNA seq so it's clear that the data were re-analyzed.
Example: "single-cell transcriptomic analysis revealed significant upregulation of bile duct-related genes in the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelium of PLC in cirrhotic liver, with notably high expression of Lgals1 (Galectin-1) and HGF(Figure 5G) "
When describing the transcriptional analysis of PLC endothelial cells, we explicitly cited the original scRNA-seq dataset (Su et al., 2021), clarifying that these data were reanalyzed rather than newly generated.
e) Introducing references for claims that, in places, are crucial for further interpretation of experiments.
Examples: "It not only guides bile duct branching during development but also"; the authors show no data from liver development.
Thank you for pointing this out. We have revised the relevant statement to ensure that the claim is accurate and well-supported.
f) Results sentence "Instead, bile duct epithelial cells at the terminal ducts extended partially along the canalicular network without directly participating in the formation of the bile duct lumen." Lacks a callout to the respective Figure.
We would like to thank the reviewers for pointing out this issue. In the revised manuscript, the relevant image (Figure 3D) has been clearly annotated with white arrows to indicate the phenomenon of terminal cholangiocytes extending along the bile canaliculi network. Additionally, the schematic diagram on the right side clearly shows the bile canaliculi, cholangiocytes, and bile flow direction using arrows and color coding, thus intuitively corresponding to the textual description.
(14) Formal text suggestions: The manuscript text contains a lot of missed or excessive spaces and several typos that ought to be fixed. A few examples follow:
a) "densely n organized vascular network "
b) "analysis, while offering high spatial "
c) "specific differences, In the human liver, "
d) Figure 4F has a typo in the description.
e) "generation of high signal-to-noise ratio, multi-target " SNR abbreviation was introduced earlier.
f) Canals of Hering, CoH abbreviation comes much later than the first mention of the Canals of Hering.
We thank the reviewer for the helpful comment regarding textual consistency. We have carefully reviewed and revised the entire manuscript to improve the accuracy, clarity, and consistency of the text.
Reviewer #1 (Public review):
Domínguez-Rodrigo and colleagues make a largely convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at a site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities.
The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.
The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. The results are detailed and clearly presented.
The authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. The authors cautiously emphasize the temporal and spatial correlation of 1) elephant butchery, 2) Acheulian toolkits, and 3) larger sites, and discuss how these elements may be causally related.
Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.
Reviewer #2 (Public review):
The manuscript makes a valuable contribution to the Olduvai Gorge record, offering a detailed description of the EAK faunal assemblage. In particular, the paper provides a high-resolution record of a juvenile Elephas recki carcass, associated lithic artifacts, and several green-broken bone specimens. These data are inherently valuable and will be of significant interest to researchers studying Early Pleistocene taphonomy. My concerns do not relate to the quality or importance of the data themselves, but rather to the interpretive inferences drawn from these data, particularly regarding the strength of the claim for unambiguous proboscidean butchery.
This review follows the authors' response to an earlier round of reviewer feedback and addresses points raised in that exchange. In their rebuttal, the authors state that some of my initial concerns reflect misunderstandings of their analysis, but after carefully re-reading both the manuscript and their responses, I do not believe this is the case.
In their response, the authors state that they do not treat the EAK evidence as decisive, yet the manuscript repeatedly characterizes the assemblage in very definitive terms. For example, EAK is described as "the oldest unambiguous proboscidean butchery site at Olduvai" and as "the oldest secure proboscidean butchery evidence." These phrases communicate a high level of confidence that does not align with the more qualified position articulated in the rebuttal and extends beyond what the documented evidence securely supports.
I appreciate the authors' clarification regarding the fracture features, and I agree that these are well-established outcomes of dynamic hammerstone percussion. At the same time, several of these traits have been documented in non-anthropogenic contexts, including helicoidal spiral fractures resulting from trampling and carnivore activity (Haynes 1983), adjacent or flake-like scars created by carnivore gnawing (Villa and Bartram 1996), hackled break surfaces produced by heavy passive breakage such as trampling or sediment pressure (Haynes 1983), and impact-related bone flakes observed in carnivore-modified assemblages (Coil et al. 2020). One of the biggest issues is that there are no quantitative data or images of the bone fracture features that the authors refer to as the main diagnostic criteria at EAK. The only figures that show EAK specimens (S21, S22, S23) illustrate general green-bone fracture morphology but none of the specific traits listed in the text. In contrast, clear examples of similar features come from other Olduvai assemblages, which may be misleading to readers if they mistakenly interpret those as images from EAK. The manuscript also states that these traits "co-occur," but it is not defined whether this refers to multiple features on the same fragment or within the broader assemblage. Without images or counts that document these traits on EAK fossils, readers cannot evaluate the strength of the interpretation. Including that information would substantially strengthen the manuscript.
Regarding the statement that "natural elephant long limb breaks have been documented only in pre or peri-mortem stages when an elephant breaks a leg, and only in femora (Haynes et al., 2021)," it is not entirely clear what this example is intended to illustrate in relation to the EAK assemblage. My understanding is that the authors are suggesting that naturally produced green bone fractures in elephants are very limited, perhaps occurring only in pre or peri-mortem broken leg cases, and that fractures on other elements should therefore be attributed to hominin activity. If that is not the intended argument, I would encourage clarifying this point. This appears to conflate pre-mortem injury with the broader issue of equifinality. My original comment was not referring to pre-mortem breaks but to the range of natural (i.e., non-hominin) and post-mortem processes that can generate spiral or green bone fractures similar to those described by the authors.
I fully understand the spatial analyses, and I realize that the association between bones and lithics is statistically significant. My original concern was not about whether the correlation exists, but about how that correlation is interpreted. That point still stands. Statistical co-occurrence cannot distinguish among the multiple depositional and post-depositional processes that can generate similar spatial patterns. However, I agree that the spatial correlation is intriguing, particularly when viewed alongside the possible butchery evidence. The pattern is notable and worthy of publication, even if the behavioral interpretation requires caution.
Finally, in considering the authors' response on the Nyayanga material, I still find the basis for their dismissal of that evidence difficult to follow and the contrasting treatment of the Nyayanga and EAK evidence raises concerns about interpretive consistency. Plummer et al. (2023) specify that bone surface modifications were examined using low-power magnification (10×-40×) and strong light sources to identify modifications and that they attributed agency (e.g., hominin, carnivore) to modifications only after excluding possible alternatives. The rebuttal does not engage with the procedures reported. The existence of newer analytical techniques does not diminish the validity of long-standing methods that have been applied across many studies. It is also unclear why abrasion is presented as a more likely explanation than stone tool cutmarks. The authors dismiss the Nyayanga images as "blurry," but this is irrelevant to the interpretation, since the analysis was based on the fossils, not the photographs. The Nyayanga dataset is dismissed without a thorough engagement, while the EAK material, despite similar uncertainties and potential for alternative explanations, is treated as definitive.
These concerns do not diminish the significance of the EAK assemblage, and addressing them would allow the interpretations to more fully reflect the scope of the available data.
Literature Cited:<br /> Coil, R., Yezzi-Woodley, K., & Tappen, M. (2020). Comparisons of impact flakes derived from hyena and hammerstone long bone breakage. Journal of Archaeological Science, 120, 105167.
Haynes, G. (1983). A guide for differentiating mammalian carnivore taxa responsible for gnaw damage to herbivore limb bones. Paleobiology, 9(2), 164-172.<br /> Haynes, G., Krasinski, K., & Wojtal, P. (2021). A study of fractured proboscidean bones in recent and fossil assemblages. Journal of Archaeological Method and Theory, 28(3), 956-1025.
Plummer, T. W., et al. (2023). Expanded geographic distribution and dietary strategies of the earliest Oldowan hominins and Paranthropus. Science, 379(6632), 561-566.<br /> Villa, P., & Bartram, L. (1996). Flaked bone from a hyena den. Paléo, Revue d'Archéologie Préhistorique, 8(1), 143-159.
Author response:
The following is the authors’ response to the original reviews
Public Reviews:
Reviewer #1 (Public review):
Domínguez-Rodrigo and colleagues make a moderately convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at the site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities. The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.
The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. However, the methods of the zooarchaeological and taphonomic analysis - the core of the study - are peculiarly missing. Some of these are explained throughout the manuscript, but not in a standard Methods paragraph with suitable references and an explicit account of how the authors recorded bone-surface modifications and the mode of bone fragmentation. This seems more of a technical omission that can be easily fixed than a true shortcoming of the study. The results are detailed and clearly presented.
By and large, the authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. Nevertheless, some ambiguity surrounds the evolutionary significance part. The authors emphasize the temporal and spatial correlation of (1) elephant butchery, (2) Acheulian toolkits, and (3) larger sites, but do not actually discuss how these elements may be causally related. Is it not possible that larger group size or the adoption of Acheulian technology have nothing to do with megafaunal exploitation? Alternative hypotheses exist, and at least, the authors should try to defend the causation, not just put forward the correlation. The only exception is briefly mentioning food surplus as a "significant advantage", but how exactly, in the absence of food-preservation technologies? Moreover, in a landscape full of aggressive scavengers, such excess carcass parts may become a death trap for hominins, not an advantage. I do think that demonstrating habitual butchery bears very significant implications for human evolution, but more effort should be invested in explaining how this might have worked.
Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.
This is a good example of the advantages of the eLife reviewing process. It has become much too common among traditional peer-reviewed journals to reject articles when the reviews do not agree, regardless of the heuristics (i.e., the empirically supported weight) of the arguments on either side. Reviewers 1 and 2 provide contrasting evaluations, and the eLife dialogue between authors and reviewers enables us to address their comments differentially. Reviewer 1 (R1), whose evaluation is overall positive, remarks that the methods of the zooarchaeological and taphonomic analysis are missing. We have added them in the revised version of our manuscript. R1 also remarks that our work highlights correlation of events, but not necessarily causation. We did not establish causation because such interpretations carry a considerable amount of speculation (and might have fostered further criticism by R2); however, in the revised version, we have expanded our discussion of these issues substantially. Establishing causation among the events described is impossible, but we certainly provide arguments to link them.
Reviewer #2 (Public review):
The authors argue that the Emiliano Aguirre Korongo (EAK) assemblage from the base of Bed II at Olduvai Gorge shows systematic exploitation of elephants by hominins about 1.78 million years ago. They describe it as the earliest clear case of proboscidean butchery at Olduvai and link it to a larger behavioral shift from the Oldowan to the Acheulean.
The paper includes detailed faunal and spatial data. The excavation and mapping methods appear to be careful, and the figures and tables effectively document the assemblage. The data presentation is strong, but the behavioral interpretation is not supported by the evidence.
The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.
The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.
The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.
The broader evolutionary conclusions are not supported by the data. The paper presents EAK as marking the start of systematic megafaunal exploitation, but the evidence does not show this. The assemblage is described well, but the behavioral and evolutionary interpretations extend far beyond what can be demonstrated.
We disagree with the arguments provided by Reviewer 2 (R2). The arguments are based on two issues: bone breakage and spatial association. We will treat both separately here.
Bone breakage
R2 argues that:
“The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.”
In our manuscript, we argued that green breakage provides equally good (or even better) taphonomic evidence of butchery when documented following clear taphonomic indicators. Not all green breaks are equal, and not all "cut marks" are unambiguously identifiable as such. First, "natural" elephant long limb breaks have been documented only in pre/peri-mortem stages, when an elephant breaks a leg. As a matter of fact, they have only been reported in publication on femora, the thinnest long bone (Haynes et al., 2021). Unfortunately, those bones were studied many months after the death of the individuals, and the published diagnosis rests on the assumption that no other process intervened in their modification during this vast time span. Most of the breaks resulting from pre-mortem fractures produce long, smooth, oblique/helical outlines. Occasionally, some flake scarring may occur on the cortical surface. This has been documented as uneven, small-sized, and spaced, and we are not sure whether it resulted from broken fragments rubbing together while the animal was alive and attempting to walk, or whether some of it resulted from desiccation of the bone after a year. Examined in detail, such breaks sometimes contain step micro-fractures and angular (butterfly-like) outlines. They may also be accompanied by pseudo-notches, which are distinct from, and not comparable to, the deep notches that hammerstone breaking generates on the same types of bones. Commonly, the edges of the breaks show some polishing, probably from separate break planes rubbing against each other. It should be emphasized that the hammerstone-breaking experiments documented by Haynes et al. (2021) involved bones that were no longer completely green. The cracking documented in their hammerstone experimentation, with very irregular outlines, differs from the cracking that we have documented in the butchery of recently dead elephants.
All this contrasts with the overlapping notches and flake scars resulting from hammerstone breaking of green bone (mostly occurring on the medullary side), both larger in size, with clear smooth, spiral, and longitudinal trajectories, more intensive modification of the medullary surface, and sharp break edges. No "natural" break has been documented that replicates the morphologies displayed in the Supplementary File to our paper. We display specimens with inflection points, hackle marks on the break surfaces, and overlapping scarring on the medullary surface, several of them also bearing percussion marks and pitting (most likely also percussion-related). Most importantly, we document this patterned modification on elements other than femora, for which no published example exists of purported morphological equifinality caused by pre-mortem "natural" breaking. In contrast, such morphologies are documented in hammerstone-broken, completely green bones (work in progress). We cited the works of Haynes to support this because they do not show otherwise. As a matter of fact, Haynes himself had the courtesy of making a thorough reading of our manuscript and did not encounter any contradiction with his work.
Spatial association
R2 argues in this regard:
“The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.”
We should emphasize that there is some confusion in the use and interpretation of clustering by R2 when applied to EAK. R2 appears to interpret clustering as the typical naked-eye perception of the spatial association of different items. In contrast, we rely on the statistical concept of clustering, more specifically on spatial interdependence or covariance, which is different. Items may appear visually clustered but still be statistically independent. This could, for example, result from two independent depositional episodes that happen to overlap spatially. In such cases, the item-to-item relationship does not necessarily show any spatial interdependence between classes other than simple clustering (i.e., spatial coincidence in intensity).
Spatial statistical interdependence, on the other hand, reflects a spatial relationship or co-dependence between different items. This goes beyond the mere fact that classes appear clustered: items between classes may show specific spatial relationships — they may avoid each other or occupy distinct positions in space (regular co-dependence), or they may interact within the same spatial area (clustering co-dependence). Our tests indicate the latter for EAK.
Such patterns are difficult to explain when depositional events are unrelated, since the probability that two independent events would generate identical spatial patterns in the same loci is very low. They are also difficult to reconcile when post-depositional processes intervene and resediment part of the assemblage (Domínguez-Rodrigo et al. 2018).
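The distinction drawn here between coincident clustering and statistical interdependence can be illustrated with a minimal Monte Carlo sketch. This is our own toy example in Python, not the authors' actual analysis pipeline: the statistic (mean cross-nearest-neighbour distance) and the random-labelling null are illustrative choices. Two classes generated independently over the same hotspot look clustered together, yet show no significant cross-class attraction under a null that holds the shared intensity fixed; a genuinely co-dependent deposition does.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_cross_nn(a, b):
    """Mean distance from each point of class a to its nearest point of class b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean()

def random_label_p(bones, tools, n_sim=999):
    """Random-labelling Monte Carlo test: keep every point location fixed
    (so the shared intensity hotspot is preserved) and shuffle only the
    class labels. A small p-value indicates class-to-class attraction
    beyond mere coincident clustering."""
    pts = np.vstack([bones, tools])
    nb = len(bones)
    obs = mean_cross_nn(pts[:nb], pts[nb:])
    hits = 0
    for _ in range(n_sim):
        perm = rng.permutation(len(pts))
        if mean_cross_nn(pts[perm[:nb]], pts[perm[nb:]]) <= obs:
            hits += 1
    return (1 + hits) / (1 + n_sim)

centre = np.array([5.0, 5.0])
# Case 1: bones and tools each cluster on the same locus, but independently.
bones_ind = centre + rng.normal(0.0, 1.0, (60, 2))
tools_ind = centre + rng.normal(0.0, 1.0, (60, 2))
# Case 2: each tool is deposited right next to a bone (co-dependent deposition).
bones_dep = centre + rng.normal(0.0, 1.0, (60, 2))
tools_dep = bones_dep + rng.normal(0.0, 0.05, (60, 2))

p_ind = random_label_p(bones_ind, tools_ind)
p_dep = random_label_p(bones_dep, tools_dep)
print(f"independent overlap: p = {p_ind:.3f}")
print(f"co-dependent deposition: p = {p_dep:.3f}")
```

Random labelling is the key design choice here: because the point locations never move, two independent but spatially coincident depositional episodes remain non-significant, which is exactly the confound that a naked-eye reading of "clustering" cannot separate.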
Finally, R2 concludes:
“The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.”
The modifications on the Nyayanga hippo remains are not well-documented cut marks. Neither R2 nor we can differentiate those marks from marks inflicted by natural abrasive processes in the coarse-grained sedimentary contexts where the carcasses were found. The fact that the observable microscopic features (seen through the low-quality photographs in the original publication) differ between the cut marks documented on smaller animals and those inferred for the hippo remains makes them even more ambiguous. Nowhere in our manuscript do we treat the EAK evidence (or any other evidence) as decisive, but rather as the most likely interpretation given the methods used and the results reported.
References
Haynes G, Krasinski K, Wojtal P. 2021. A Study of Fractured Proboscidean Bones in Recent and Fossil Assemblages. Journal of Archaeological Method and Theory 28:956–1025.
Domínguez-Rodrigo, M., Cobo-Sánchez, L., yravedra, J., Uribelarrea, D., Arriaza, C., Organista, E., Baquedano, E. 2018. Fluvial spatial taphonomy: a new method for the study of post-depositional processes. Archaeological and Anthropological Sciences 10: 1769-1789.
Recommendations for authors:
Reviewer #1 (Recommendations for the authors):
I have several recommendations that, in my opinion, could enhance the communication of this study to the readers. The first point is the only crucial one.
(1) A detailed zooarchaeological methods section must be added, with explanations (or references to them) of precisely how the authors defined and recorded bone-surface modifications and mode of bone fragmentation.
This appears in the revised version of the manuscript in the form of a new sub-section within the Methods section.
(2) The title could be improved to better represent the contents of the paper. It contains two parts: the earliest evidence for elephant butchery (that's ok), and revealing the evolutionary impact of megafaunal exploitation. The latter point is not actually revealed in the manuscript, just alluded to here and there (see also below).
We have elaborated on this in the revised version, linking megafaunal exploitation and anatomical changes (which appear discussed in much more detail in the references indicated).
(3) The abstract does not make it clear whether the authors think that the megafaunal adaptation strongly correlates with the Acheulian technocomplex. It seems that they do, so please make this point apparent in the abstract.
From a functional point of view, we document the correlation, but we do not believe in the causation, since most butchering tools around these megafaunal carcasses are typologically non-Acheulian. We have indicated so in the abstract.
(4) Please define what you mean by "megafauna". How large should an animal be to be considered as megafauna in this particular context?
We have added this definition: we identify as “megafauna” those animals heavier than 800 kg.
(5) In the literature survey, consider also this Middle Pleistocene case-study of elephant butchery, including a probable bone tool: Rabinovich, R., Ackermann, O., Aladjem, E., Barkai, R., Biton, R., Milevski, I., Solodenko, N., and Marder, O., 2012. Elephants at the middle Pleistocene Acheulian open-air site of Revadim Quarry, Israel. Quaternary International, 276, pp.183-197.
Added to the revised version.
(6) The paragraph in lines 123-160 is unclear. Do the authors argue that the lack of evidence for processing elephant carcasses for marrow and grease is universal? They bring forth a single example of a much later (MIS 5) site in Germany. Then, the authors state the huge importance of fats for foragers (when? Where? Surely not in all latitudes and ecosystems). This left me confused - what exactly are you trying to claim here?
We have explained this a little more in the revised text. What we pointed out was that most prehistoric (and modern) elephant butchery sites leave grease-containing long bones intact. Evidence of anthropogenic breakage of these elements is rather limited. The most probable reason is the overabundance of meat and fat from the rest of the carcass and the time-consuming effort needed to access the medullary cavity of elephant long bones.
(7) The paragraph in lines 174-187 disrupts the flow of the text, contains previously mentioned information, ends with an unclear sentence, and could be cut.
(8) Results: please provide the MNI for the EAK site (presumably 1, but this is never mentioned).
Done in the revised version.
(9) Lines 292 - 295: The authors found no traces of carnivoran activity (carnivoran remains, coprolites, or gnawing marks on the elephant bones), yet they attribute the absence of some non-dense skeletal elements to carnivore ravaging. I cannot understand this rationale, given that other density-mediated processes could have deleted the missing bones and epiphyses.
This interpretation stems from our observations of several elephant carcasses in the Okavango delta in Botswana. Those that were monitored showed deletion of remains (i.e., disappearance of certain bones, like feet) without necessarily imprinting damage on the rest of the carcass. Carnivore intervention in an elephant death site can result in deletion of a few remains without much damage (if any), or if hyena clans access the carcass, much more conspicuous damage can be documented. There is a whole range of carnivore signatures in between. We are currently working on our study of several elephant carcasses subjected to these highly variable degrees of carnivore impact.
(10) Lines 412 - 422: "The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification." - how so? It could equally suggest that both hominins and elephants were drawn to the same lush environments.
We agree. Both hominins and megafauna must have been drawn to the same ecological loci for interaction to emerge. However, the fact that the highest density clusters of artifacts coincide with the highest density of carcasses “showing evidence of having been broken”, is suggestive of hominin use and consumption.
(11) Discussion: I suggest starting the Discussion with a concise appraisal of the lines of evidence detailed in the Results and their interpretation, and only then, the critical reassessment of other studies. Similarly, a new topic starts in line 508, but without any subheading or an introductory sentence that could assist the readers.
We added the introductory lines of the former Conclusion section to the revised Discussion section, as suggested by R1.
(12) Line 607: Neumark-Nord are Late Pleistocene sites (MIS 5), not Middle Pleistocene.
Corrected.
(13) Regarding the ambiguity in how megafaunal exploitation may be causally related to the other features of the early Acheulian, the authors can develop the discussion. Alternatively, they should explicitly state that correlation is not causation, and that the present study adds the megafaunal exploitation element to be considered in future discussion of the shifts in lifestyles 1.8 million years ago.
We have done so.
Reviewer #2 (Recommendations for the authors):
The following detailed comments are provided to help clarify arguments, ensure accurate representation of cited literature, and strengthen the logical and methodological framing of the paper. Line numbers refer to the version provided for review.
(1) Line 55: Such concurrency (sometimes in conjunction with other variables)
The term "other variables" is very vague. I would suggest expanding on this or taking it out altogether.
(2) Line 146: Megafaunal long bone green breakage (linked to continuous spiral fractures on thick cortical bone) is probably a less ambiguous trace of butchery than "cut marks", since many of the latter could be equifinal and harder to identify, especially in contexts of high abrasion and trampling (Haynes et al., 2021, 2020).
This reasoning is not supported by the evidence or the cited sources. Green-bone spiral fractures only show that a bone broke while it was fresh and do not reveal who or what caused it. Carnivore feeding, trampling, and natural sediment pressure can all create the same patterns, so these fractures are not clearer evidence of butchery than cut marks. Cut marks, when they are preserved and morphologically clear, remain the most reliable indicator of human activity. The Haynes papers actually show the opposite of what is claimed here. They warn that spiral fractures and surface marks can form naturally and that fracture patterns alone cannot be used to infer butchery. This section should be revised to reflect what those studies actually demonstrate.
The reasoning referred to in line 146 is further explained below in the original text as follows:
“Despite the occurrence of green fractures on naturally-broken bones, such as those trampled by elephants (Haynes et al., 2020), those occurring through traumatic fracturing or gnawed by carnivores (Haynes and Hutson, 2020), these fail to reproduce the elongated, extensive, or helicoidal spiral fractures (uninterrupted by stepped sections), accompanied by the overlapping conchoidal scars (both cortical and medullary), the reflected scarring, the inflection points, or the impact hackled break surfaces and flakes typical of dynamic percussive breakage. Evidence of this type of green breakage had not been documented earlier for the Early Pleistocene proboscidean or hippopotamid carcasses, beyond the documentation of flaked bone with the purpose of elaboration of bone tools (Backwell and d’Errico, 2004; Pante et al., 2020; Sano et al., 2020).”
The problem with the way R2 uses Haynes et al.'s works is that R2 considers the features separately. Natural breaks occurring while the bone is green can generate smooth spiral breaks, for example, but it is not the presence of a single feature that invalidates the diagnosis of agency or that is taphonomically relevant; it is the concurrence of several of them. The best example of a naturally (pre-mortem) broken bone was published by Haynes et al.
The natural break shows helical fractures, subordinated to linear (angular) fracture outlines. Notice how the crack displays a zig-zag. The break is smooth, but most damage occurs on the cortical surface, with flaking adjacent to the break and step micro-fracturing on the edges. The cortical scarring is discontinuous (almost marginal) and very small, almost limited to the very edge of the break. No modification occurs on the medullary surface. No extensive conchoidal fractures are documented, and certainly none inside the medullary surface of the break.
Compare with Figures S8, S10, S17 and S34 (all specimens are shown on their medullary surface):
In these examples, we see clearly modified medullary surfaces with multiple green breaks and large-sized step fractures, accompanied in some examples by hackle marks. Some show large overlapping scars (of substantially bigger size than those documented in the natural break image). Not a single example of naturally-broken bones has been documented displaying these morphologies simultaneously. It is the comprehensive analysis of the co-occurrence of these features, and not their marginal and isolated occurrence in naturally-broken bones, that makes the difference in the attribution of agency. Likewise, no example of a naturally-broken bone has been published that could mimic either of the two green-broken bones documented at EAK. In contrast, we do have bones from our on-going experimentation with green elephant carcasses that jointly reproduce these features. See also Figure 6 of the article for another example without any modern referent among documented naturally-broken bones.
We should emphasize that R2 is inaccurately portraying what Haynes et al.'s results really document. Contrary to R2's assertion, trampling does not reproduce any of the examples shown above. Neither do carnivores. It should be stressed that Haynes & Harrod only document similar overlapping scarring on the medullary surface of bones when using much smaller animals. In all the carnivore damage repertoire that they document for elephants, durophagous spotted hyenas can only inflict furrowing on the ends of the biggest long bones, especially those of adults. Long bone midshafts remain inaccessible to them. The mid-shaft portions of bones that we document in our Supplementary File and at EAK cannot be the result of hyena (or other carnivore) damage for this reason, and also because intense carnivore gnawing on elephant bones leaves tooth marking on most of the elements modified, marking that is absent in our sample.
(3) Line 176: other than hominins accessed them in different taphonomically-defined stages- stages - the "Stages" is repeated twice
Corrected in the revised version.
(4) Line 174: Regardless of the type of butchery evidence - and with the taphonomic caveat that no unambiguous evidence exists to confirm that megafaunal carcasses were hunted or scavenged other than hominins accessed them in different taphonomically-defined stages- stages - the principal reasons for exploring megafaunal consumption in early human evolution is its origin, its episodic or temporally-patterned occurrence, its impact on hominin adaptation to certain landscapes, and its reflection on hominin group size and site functionality.
This sentence is confusing and needs to be rewritten for clarity. It tries to combine too many ideas at once, and the phrasing makes it hard to tell what the main point is. The taphonomic caveat in the middle interrupts the sentence and obscures the argument. It should be broken into separate, clearer statements that distinguish what evidence exists, what remains uncertain, and what the broader goals of the discussion are.
We believe the ideas are expressed clearly.
(5) Line 179: landscapes, and its reflection on hominin group size and site functionality. If hominins actively sought the exploitation of megafauna, especially if targeting early stages of carcass consumption, the recovery of an apparent surplus of resources reflects a substantially different behavior from the small-group/small-site pattern documented at several earlier Oldowan anthropogenic sites (Domínguez-Rodrigo et al., 2019) -or some modern foragers, like the Hadza, who only exploit megafaunal carcasses very sporadically, mostly upon opportunistic encounters (Marlowe, 2010; O'Connell et al., 1992; Wood, 2010; Wood and Marlowe, 2013).
This sentence makes a reasonable point, but is written in a confusing way. The idea that early, deliberate access to megafauna would represent a different behavioral pattern from smaller Oldowan or modern foraging contexts is valid, but the sentence is awkward and hard to follow. It should be rephrased to make the logic clearer and more direct.
We believe the ideas are expressed clearly.
(6) Line 186: When the process started of becoming megafaunal commensal started has major implications for human evolution.
This sentence is awkward and needs to be rewritten for clarity. The phrasing "when the process started of becoming megafaunal commensal started" is confusing and grammatically incorrect. It could be revised to something like "Determining when hominins first began to interact regularly with megafauna has major implications for human evolution," or another version that clearly identifies the process being discussed.
Modified in the revised version
(7) Line189: The multiple taphonomic biases intervening in the palimpsestic nature of most of these butchery sites often prevent the detection of the causal traces linking megafaunal carcasses and hominins. Functional links have commonly been assumed through the spatial concurrence of tools and carcass remains; however, this perception may be utterly unjustified as we argued above. Functional association of both archaeological elements can more securely be detected through objective spatial statistical methods. This has been argued to be foundational for heuristic interpretations of proboscidean butchery sites (Giusti, 2021). Such an approach removes ambiguity and solidifies spatial functional association, as demonstrated at sites like Marathousa 1 (Konidaris et al., 2018) or TK Sivatherium (Panera et al., 2019). This method will play a major role in the present study.
This section overstates what spatial analysis can demonstrate and misrepresents the cited studies. The works by Giusti (2021), Konidaris et al. (2018), and Panera et al. (2019) do use spatial statistics to examine relationships between artifacts and faunal remains, but they explicitly caution that spatial overlap alone does not prove functional or behavioral association. These studies argue that clustering can support such interpretations only when combined with detailed taphonomic and stratigraphic evidence. None of them claims that spatial analysis "removes ambiguity" or "solidifies" functional links. The text should be revised to reflect the more qualified conclusions of those papers and to avoid implying that spatial statistics can establish behavioral causation on their own.
We disagree. Both works (Giusti and Panera) use spatial statistical tools to create an inferential basis reinforcing a functional association of lithics and bones. In both cases, the anthropogenic agency inferred is based on that. We should stress that this only provides a basis for argumentation, not a definitive causation. Again, those analyses show much more than just apparent visual clustering.
(8) Line 200: Here, we present the discovery of a new elephant butchery site (Emiliano Aguirre Korongo, EAK), dated to 1.78 Ma, from the base of Bed II at Olduvai Gorge. It is the oldest unambiguous proboscidean butchery site at Olduvai.
It is fine to state the main finding in the introduction, but the phrasing here is too strong. Calling EAK "the oldest unambiguous proboscidean butchery site" asserts certainty before the evidence is presented. The claim should be stated more cautiously, for example, "a new site that provides early evidence for proboscidean butchery," so that the language reflects the strength of the data rather than pre-judging it.
We understand the caution by R2, but in this case, EAK is the oldest taphonomically-supported evidence of elephant butchery at Olduvai (see discussion about FLK North in the text). Whether this is declared at the beginning or the end of the text is irrelevant.
(9) Line 224: The drying that characterizes Bed II had not yet taken place during this moment.
This sentence reads like a literal translation. It should be rewritten for clarity.
Modified in the revised version
(10) Line 233: During the recent Holocene, the EAK site was affected by a small landslide which displaced the...
This section contains far more geological detail than is needed for the argument. The reader only needs to know that the site block was displaced by a small Holocene landslide but retains its stratigraphic integrity. The extended discussion of regional faults, seismicity, and slope processes goes well beyond what is necessary for context and distracts from the main focus of the paper.
We disagree. The geological information is what is most commonly missing from archaeological reports. Here, it is relevant because of the atypical process involved, one that has been documented only twice at elephant butchery sites. Explaining the dynamic geological process that shaped the site helps to understand its spatial properties.
(11) Line 264: In June 2022, a partial elephant carcass was found at EAK on a fragmented stratigraphic block...
This section reads like field notes rather than a formal site description. Most of the details about the discovery sequence, trench setup, and excavation process are unnecessary for the main text. Only the basic contextual information about the find location, stratigraphic position, and anatomical composition is needed. The rest could be condensed or moved to the methods or supplementary material.
We disagree. See reply above.
(12) Line 291: hominins or other carnivores. Ongoing restoration work will provide an accurate estimate of well-preserved and modified fractions of the assemblage.
This sentence is unclear and needs to specify what kind of restoration work is being done and what is meant by well-preserved and modified fractions. It is not clear whether modified refers to surface marks, diagenetic alteration, or something else. If the bones are still being cleaned or prepared, the analysis is incomplete, and the counts cannot be considered final. If restoration only means conservation or stabilization, that should be stated clearly so the reader understands that it does not affect the results. As written, it is not clear whether the data presented here are preliminary or complete.
We added: "For this reason, until restoration is concluded, we cannot produce any assertion about the presence or absence of bone surface modifications."
(13) Line 294: The tibiae were well preserved, but the epiphyseal portions of the femora were missing, probably removed by carnivores, which would also explain why a large portion of the rib cage and almost all vertebrae are missing.
This explanation is not well supported. The missing elements could be the result of other forms of density-mediated destruction, such as sediment compaction or post-depositional fragmentation, especially since no tooth marks were found. Given the low density of ribs, vertebrae, and femoral epiphyses, these processes are more likely explanations than carnivore removal. The text should acknowledge these alternatives rather than attributing the pattern to carnivore activity without direct evidence.
Sediment compaction and post-depositional fragmentation can break bones but cannot make them disappear. Our excavation process was careful enough to detect bone if present. The absence of these elements indicates two possibilities: erosion over the years at the front of the excavation, or carnivore intervention. Carnivores can take elephant bones without impacting the remaining assemblage (see our reply above to a similar comment).
(14) Line 304: The fact that the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins. A more objective way to assess this association is through spatial statistical analysis.
The authors state that "the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins." This does not logically follow. Movement of the block explains why the bones and tools remain together, not how that association was created. The preserved association alone does not demonstrate butchery, especially in the absence of cut marks or other direct evidence of hominin activity.
Again, we are sorry that R2 is completely overlooking the strong signal detected by the spatial statistical analysis. The way the block moved preserved the original association of bones and tools. This statement is meant to clarify that, despite the allochthonous nature of the block, the original autochthonous depositional process of both types of archaeological materials has been preserved. The spatial association, as statistically demonstrated, indicates that the functional link is more likely than any alternative process. The additional fact that nowhere else in that portion of the outcrop do we identify scatters of tools (all appear clustered at a landscape scale with the elephant) adds more support to this interpretation. This would have been further supported by the presence of cut marks, no doubt, but their absence does not indicate lack of functional association, since, as Haynes' works have clearly shown, most bulk defleshing of modern elephants leaves no traces on most bones.
(15) Line 370: This also shows that the functional connection between the elephant bones and the tools has been maintained despite the block post-sedimentary movement.
The spatial analyses appear to have been carried out appropriately, and the interpretations of clustering and segregation are consistent with the reported results. However, the conclusion that the "functional connection" between bones and tools has been maintained goes beyond what spatial correlation alone can demonstrate. These analyses show spatial proximity and scale-dependent clustering but cannot, by themselves, confirm a behavioral or functional link.
R2 is making this comment repeatedly and we have addressed it more than once above. We disagree and we refer to our replies above to sustain it.
(16) Line 412: The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification. The presence of green broken elephant long bone elements in the area surveyed is only documented within such clusters, both for lower and upper Bed II. This constitutes inverse negative evidence for natural breaks occurring on those carcasses through natural (i.e., non-hominin) pre- and peri-mortem limb breaking (Haynes et al., 2021, 2020; Haynes and Hutson, 2020). In this latter case, it would be expected for green-broken bones to show a more random landscape distribution, and occur in similar frequencies in areas with intense hominin landscape use (as documented in high density artifact deposition) and those with marginal or non-hominin intervention (mostly devoid of anthropogenic lithic remains).
The clustering of green-bone fractures with stone tools is intriguing but should be interpreted cautiously. The Haynes references are misrepresented here. Those studies address both cut marks and green-bone (spiral) fractures, emphasizing that each can arise through non-hominin processes such as trampling, carcass collapse, and sediment loading. They do not treat green fractures as clearer evidence of butchery; in fact, they caution that such breakage patterns can occur naturally and even form clustered distributions in areas of repeated animal activity. The claim that these studies support spiral fractures as unambiguous indicators of hominin activity, or that natural breaks would be randomly distributed, is not accurate.
We would like to emphasize again that the Haynes references are not misrepresented here. See our extensive reply above. If R2 can provide evidence of natural breakage patterns, resulting from pre-mortem limb breaking or post-mortem trampling, in which all limb bones are affected by these processes and show smooth spiral breaks accompanied by extensive and overlapping scarring on the medullary surface, in conjunction with the other features described in our replies above, then we would be willing to reconsider. In the evidence reported to date, that combination does not occur simultaneously on specimens from studies of modern elephant bones.
R2 seems to contradict him- or herself here by saying that Haynes' studies show that cut marks are not reliable because they can also be reproduced via trampling. Until this point, R2 had been saying that only cut marks could demonstrate a functional link and support butchery. Haynes' studies do not deal experimentally with sediment loading.
(17) Line 424: This indicates that from lower Bed II (1.78 Ma) onwards, there is ample documented evidence of anthropogenic agency in the modification of proboscidean bones across the Olduvai paleolandscapes. The discovery of EAK constitutes, in this respect, the oldest evidence thereof at the gorge. The taphonomic evidence of dynamic proboscidean bone breaking across time and space supports, therefore, the inferences made by the spatial statistical analyses of bones and lithics at the site.
This conclusion is overstated. The claim of "ample documented evidence of anthropogenic agency" is too strong, given that the main support comes from indirect indicators like green-bone fractures and spatial clustering rather than clear butchery marks. It would be more accurate to say that the evidence suggests or is consistent with possible hominin involvement. The final sentence also conflates association with causation; spatial and taphonomic data can indicate a relationship, but do not confirm that the carcasses were butchered by hominins.
The evidence is based on spatial clustering (at a landscape scale) of tools and elephant (and other megafaunal) bones, in conjunction with a large number of green-broken elements. This interpretation is supported even more strongly when compared against modern referents. In the past few years, we have been conducting work on modern naturally dead elephant carcasses in Botswana and Zambia, and among the several carcasses that we have seen, we have not identified a single case of long bone shaft breaks like those described by Haynes as natural or like those we describe here as anthropogenic. This probably means that they are highly unlikely or marginal occurrences at a landscape scale. This seems to be supported by Haynes' work too. Out of the hundreds of elephant carcasses that he has monitored and studied over the years for different works, we have managed to identify only two instances where he described natural pre-mortem breaks. This certainly qualifies as extremely marginal.
Most of the Results section is clearly descriptive, but beginning with "The clustering of the elephant (and hippopotamus) carcasses..." the text shifts from reporting observations to drawing behavioral conclusions. From this point on, it interprets the data as evidence of hominin activity rather than simply describing the patterns. This part would be more appropriate for the Discussion, or should be rewritten in a neutral, descriptive way if it is meant to stay in the Results.
This is extensively discussed in the Discussion section, where the data presented in the Results are also interpreted, following a clear chain of argument.
(18) Line 433: A recent discovery of a couple of hippopotamus partial carcasses at the 3.0-2.6 Ma site of Nyayanga (Kenya), spatially concurrent with stone artifacts, has been argued to be causally linked by the presence of cut marks on some bones (Plummer et al., 2023). The only evidence published thereof is a series of bone surface modifications on a hippo rib and a tibial crest, which we suggest may be the byproduct of abiotic abrasive processes; the marks contrast noticeably with the well-defined cut marks found on smaller mammal bones (Plummer et al., 2023: Figure 3C, D) associated with the hippo remains (Plummer et al., 2023).
The authors suggest that the Nyayanga marks could result from abiotic abrasion, but this claim does not engage with the detailed evidence presented by Plummer et al. (2023). Plummer and colleagues documented well-defined, morphologically consistent cut marks and considered the sedimentary context in their interpretation. Raising abrasion as a general possibility without addressing that analysis gives the impression of selective skepticism rather than an evaluation grounded in the published data.
We disagree again on this matter. R2 does not clarify what he/she means by well-defined or morphologically consistent. We provide an alternative interpretation of those marks that fits their morphology and features and that Plummer et al. did not successfully exclude. We also emphasize that the interpretation of the Nyayanga marks was made descriptively, without any analytical approach and with a high degree of subjectivity on the part of the researcher. All of this disqualifies the approach as well defined and perpetuates an outdated view of modern taphonomy. Descriptive taphonomy is a thing of the 1980s. Today there is a plethora of analytical methods, from multivariate statistics to geometric morphometrics to AI computer vision (so far the most reliable), which represent how taphonomy (and more specifically, the analysis of bone surface modifications) should be conducted in the 21st century. These approaches would reinforce interpretations such as those preliminarily published by Plummer et al., provided they rejected alternative explanations like the ones we have provided.
(19) Line 459: It would have been essential to document that the FLK N6 tools associated with the elephant were either on the same depositional surface as the elephant bones and/or on the same vertical position. The ambiguity about the FLK N6 elephant renders EAK the oldest secure proboscidean butchery evidence at Olduvai, and also probably one of the oldest in the early Pleistocene elsewhere in Africa.
The concern about vertical mixing is fair, but the tone makes it sound like the association is definitely not real. It would be more accurate to say that the evidence is ambiguous, not that it should be dismissed altogether.
We have precisely done so. We do not dismiss it, but we cannot take it as anything solid, since we excavated the site and have shown how easily one could make functional associations when ignoring the third dimension. It is not a secure butchery site. This is what we said, and we stand by this statement.
(20) Line 479: In all cases, these wet environments must have been preferred places for water-dependent megafauna, like elephants and hippos, and their overlapping ecological niches are reflected in the spatial co-occurrence of their carcasses. Both types of megafauna show traces of hominin use through either cutmarked or percussed bones, green-broken bones, or both (Supplementary Information).
The environmental part is good, but the behavioral interpretation is too strong. Saying elephants and hippos "must have been" drawn to these areas is too certain, and claiming that both "show traces of hominin use" makes it sound like every carcass was modified. It should be clearer that only some have possible evidence of this.
The sentence only refers to both types of fauna taxonomically. No inference can therefore be drawn that all carcasses are modified.
(21) Line 496: In most green-broken limb bones, we document the presence of a medullary cavity, despite the continuous presence of trabecular bone tissue on its walls.
This sentence is confusing and doesn't seem to add anything meaningful. All limb bones naturally have a medullary cavity lined with trabecular bone, so it's unclear why this is noted as significant. The authors should clarify what they mean here or remove it if it's simply describing normal bone structure.
No. Modern elephant long bones do not have a hollow medullary cavity. All the medullary volume is composed of trabecular tissue. Some elephants in the past had hollow medullary cavities, which probably contained larger amounts of marrow and fat.
(22) Line 518: We are not confident that the artefacts reported by de la Torre et al are indeed tools.
While I generally agree with this statement, the paragraph reads as defensive rather than comparative. It would help if they briefly summarized what de la Torre et al. actually argued before explaining why they disagree.
We devote two full pages of the Discussion section to doing precisely that.
(23) Lines 518-574: They are similar to the green-broken specimens that we have reported here...
This part is very detailed but inconsistent. They argue that the T69 marks could come from natural processes, but they use similar evidence (green fractures, overlapping scars) to argue for human activity at EAK. If equifinality applies to one, it applies to both.
We are confused by this misinterpretation. Features like green fractures and overlapping scars (among others) can be used to detect anthropogenic agency in elephant bone breaking; that is, any given specimen can be determined to be an "artifact" (in the sense of a human-created item), but there is a large distance between that and interpreting an artifact as a tool. Whereas an artifact (something made by a human) can be created indirectly through several processes (for example, demarrowing a bone, resulting in long bone fragments), a tool suggests intentional manufacture, use, or both. That is the difference between de la Torre et al.'s interpretation and ours. We believe that they are showing anthropogenically-made items, but they have provided no proof that these were tools.
(24) Line 576: A final argument used by the authors to justify the intentional artifactual nature of their bone implements is that the bone tools were found in situ within a single stratigraphic horizon securely dated to 1.5 million years ago, indicating systematic production rather than episodic use. This is taphonomically unjustified.
The reasoning here feels uneven in how clustering evidence is used. At EAK, clustering of bones and artifacts is taken as meaningful evidence of hominin activity, but here the same pattern at T69 is treated as a natural by-product of butchery or carnivore activity. If clustering alone cannot distinguish between intentional and incidental association, the authors should clarify why it is interpreted as diagnostic in one case but not in the other.
Again, we are confused by this misinterpretation. It applies to two different scenarios/questions:
a) Is there a functional link between tools and bones at EAK and T69? We have statistically demonstrated this at EAK, and we think de la Torre et al. are trying to do the same for T69, albeit using a different method.
b) Are the purported tools at T69 tools? Are those that we report here tools? In this regard there is no evidence for either case, and given that several bones from T69 come from animals smaller than elephants, we do not rule out that carnivores might have been responsible for those, whereas hominin butchery might have been responsible for the intense long bone breaking at that site. It remains to be seen how many (if any) of those specimens were tools.
(25) Line 600: If such a bone implement was a tool, it would be the oldest bone tool documented to date (>1.7 Ma).
The comparison to prior studies is useful, and the point about missing use-wear traces is well taken. However, the last lines feel speculative. If no clear use evidence has been found, it's premature to suggest that one specimen "would be the oldest bone tool." That claim should be either removed or clearly stated as hypothetical.
It clearly reads as hypothetical.
(26) Line 606: Evidence documents that the oldest systematic anthropogenic exploitation of proboscidean carcasses are documented (at several paleolandscape scales) in the Middle Pleistocene sites of Neumark-Nord (Germany)(Gaudzinski-Windheuser et al., 2023a, 2023b).
This is the first and only mention of Neumark-Nord in the paper, and it appears without any prior discussion or connection to the rest of the study. If this site is being used for comparison or as part of a broader temporal framework, it needs to be introduced and contextualized earlier. As written, it feels out of place and disconnected from the rest of the argument.
This is a Late Pleistocene site and we do not see the need to present it earlier, given that the scope of this work is Early Pleistocene.
(27) Line 608: Evidence of at least episodic access to proboscidean remains goes back in time (see review in Agam and Barkai, 2018; Ben-Dor et al., 2011; Haynes, 2022).
The distinction between "systematic" and "episodic" exploitation is useful, but the authors should clarify what criteria define each. The phrase "episodic access...goes back in time" is vague and could be replaced with a clearer statement summarizing the nature of the earlier evidence.
It is self-explanatory
(28) Line 610: Redundant megafaunal exploitation is well documented at some early Pleistocene sites from Olduvai Gorge (Domínguez-Rodrigo et al., 2014a, 2014b; Organista et al., 2019, 2017, 2016).
The phrase "redundant megafaunal exploitation" needs clarification. "Redundant" is not standard terminology in this context. Does this mean repeated, consistent, or overlapping behaviors? Also, while these same Olduvai sites are mentioned earlier, this phrasing also introduces new interpretive language not used before and implies a broader behavioral generalization than what the data actually show.
Webster: Redundant means repetitive, occurring multiple times.
(29) Line 612: At the very same sites, the stone artifactual assemblages, as well as the site dimensions, are substantially larger than those documented in the Bed I Oldowan sites (Diez-Martín et al., 2024, 2017, 2014, 2009).
The placement and logic of this comparison are unclear. The discussion moves from Middle Pleistocene Neumark-Nord to early Pleistocene Olduvai sites, then to Bed I Oldowan contexts without clearly signaling the temporal or geographic transitions. If the intent is to contrast Acheulean vs. Oldowan site scale or organization, that connection needs to be made explicit. As written, it reads as a disjointed shift rather than a continuation of the argument.
We disagree. Here, we conclude by bringing in some more recent assemblages where hominin agency is not in question.
(30) Line 616: Here, we have reported a significant change in hominin foraging behaviors during Bed I and Bed II times, roughly coinciding with the replacement of Oldowan industries by Acheulian tool kits -although during Bed II, both industries co-existed for a substantial amount of time (Domínguez-Rodrigo et al., 2023; Uribelarrea et al., 2019, 2017).
This section should be restructured for flow. The reference to behavioral change during Bed I-II and the overlap of Oldowan and Acheulean industries is important, but feels buried after a long detour. Consider moving this earlier or rephrasing so the main conclusion (behavioral change across Beds I-II) is clearly stated first, followed by supporting examples.
It is not within the scope of this work and is properly described in the references mentioned.
(31) Line 620: The evidence presented here, together with that documented by de la Torre et al. (2025), represents the most geographically extensive documentation of repeated access to proboscidean and other megafaunal remains at a single fossil locality.
The phrase "most geographically extensive documentation of repeated access" overstates what has been demonstrated. The evidence presented is site-specific and does not justify such a broad superlative. This should be toned down or supported with comparative quantitative data.
We disagree. There is no other example where such an abundant record of green-broken elements from megafauna is documented. Neumark-Nord is the most similar because it shows extensive evidence of butchery, but not so much of degreasing.
(32) Line 623: The transition from Oldowan sites, where lithic and archaeofaunal assemblages are typically concentrated within 30-40 m2 clusters, to Acheulean sites that span hundreds or even over 1000 m2 (as in BK), with distinct internal spatial organization and redundancy in space use across multiple archaeological layers spanning meters of stratigraphic sequence (Domínguez-Rodrigo et al., 2014a, 2009b; Organista et al., 2017), reflects significant behavioral and technological shifts.
This sentence about site size and spatial organization repeats earlier claims without adding new insight. If it's meant as a synthesis, it should explicitly say how the spatial expansion relates to changes in behavior or mobility, not just describe the difference.
In the Conclusion section these correlations have been explained in more detail to add some causation.
(33) Line 628: This pattern likely signifies critical innovations in human evolution, coinciding with major anatomical and physiological transformations in early hominins (Dembitzer et al., 2022; Domínguez-Rodrigo et al., 2021, 2012).
The conclusion that this "signifies critical innovations in human evolution" is too sweeping, given the data presented. It introduces physiological and anatomical transformation without connecting it to any evidence in this paper. Either cite the relevant findings or limit the claim to behavioral implications.
The references cited elaborate on this at length. The revised version of the Conclusion section also elaborates on this.
Overall, the conclusions section reads as a loosely connected set of assertions rather than a focused synthesis. It introduces new interpretations and terminology not supported or developed earlier in the paper, and the argument jumps across temporal and geographic scales without clear transitions. The discussion should be restructured to summarize key results, clarify the scope of interpretation, and avoid speculative or overstated claims about evolutionary significance.
We have done so, supported by the references used, in addition to extending some of the arguments.
(34) Line 639: The systematic excavation of the stratigraphic layers involved a small crew.
This sentence is not necessary.
No comment
(35) Line 643: The orientation and inclination of the artifacts were recorded using a compass and an inclinometer, respectively.
What were these measurements used for (e.g., post-depositional movement analysis, spatial patterning)? A short note on the purpose would make this more meaningful.
Fabric analysis has been added to the revised version.
(36) Line 659: Restoration of the EAK elephant bones
This section could be streamlined and clarified. It includes procedural detail that doesn't contribute to scientific replicability (e.g., the texture of gauze, number of consolidant applications), while omitting some key information (such as how restoration may have affected analytical results). It also contains interpretive comments ("most of the assemblage has been successfully studied") that don't belong in Methods.
No comment
(37) Line 689: In the field laboratory, cleaning of the bone remains was carried out, along with adhesion of fragments and their consolidation when necessary.
Clarify whether cleaning or adhesion treatments might obscure or alter bone surface modifications, as this has analytical implications.
These protocols no longer affect bone surfaces in that way.
(38) Line 711: (b) Percussion Tools - Includes hammerstones or cobbles exhibiting diagnostic battering, pitting, and/or impact scars consistent with percussive activities.
Define how diagnostic features (battering, pitting) were identified - visual inspection, magnification, or quantitative criteria?
Both macroscopically and microscopically.
(39) Line 734: We conducted the analysis in three different ways after selecting the spatial window, i.e., the analysed excavated area (52.56 m2).
Clarify why the 52.56 m2 spatial window was chosen. Was this the total excavated area or a selected portion?
It was what was left of the elephant accumulation after erosion.
(40) Line 728: The spatial statistical analyses of EAK.
Adding one or two sentences at the start explaining the analytical objective, such as testing spatial association between faunal and lithic materials, would help readers understand how each analysis relates to the broader research questions.
This is well explained in the main text
(41) Line 782: An intensive survey seeking stratigraphically-associated megafaunal bones was carried out in the months of June 2023 and 2024.
It would help to specify whether the same areas were resurveyed in both field seasons or if different zones were covered each year. This information is important for understanding sampling consistency and potential spatial bias.
Both areas were surveyed in both field seasons. We were very consistent.
(42) Line 787: We focused on proboscidean bones and used hippopotamus bones, some of the most abundant in the megafaunal fossils, as a spatial control.
Clarify how the hippopotamus remains functional as a "spatial control." Are they used as a proxy for water-associated taxa to test habitat patterning, or as a baseline for comparing carcass distribution? The meaning of "control" in this context is ambiguous.
As a proxy for megafaunal distribution, given their greater abundance relative to any other megafaunal taxon.
(43) Line 789: Stratigraphic association was carried out by direct observation of the geological context and with the presence of a Quaternary geologist during the whole survey.
This is good methodological practice, but it would be helpful to describe how stratigraphic boundaries were identified in the field (for example, by reference to tuffs or marker beds). That information would make the geological framework more replicable.
This is basic geological work. Of course, both tuffs and marker beds were followed.
(44) Line 791: When fossils found were ambiguously associated with specific strata, these were excluded from the present analysis.
You might specify what proportion of the total finds were excluded due to uncertain stratigraphic association. Reporting this would indicate the strength of the stratigraphic control.
This was not quantified, but it was a very small amount compared to those whose stratigraphic provenience was certain.
(45) Line 799: The goals of this survey were: a) collect a spatial sample of proboscidean and megafaunal bones enabling us to understand if carcasses on the Olduvai paleolandscapes were randomly deposited or associated to specific habitats.
You might clarify how randomness or habitat association was tested.
Randomness was tested spatially and by comparing density according to ecotone; the same applies to habitat association.
(46) The Methods section provides detailed information about excavation, restoration, and spatial analyses but omits critical details about the zooarchaeological and taphonomic procedures. There is no explanation of how faunal remains were analyzed once recovered, including how cut marks, percussion marks, or green bone fractures were identified or what magnification or diagnostic criteria were used. The authors also do not specify the analytical unit used for faunal quantification (e.g., NISP, MNI, MNE, or other), making it unclear how specimen counts were generated for spatial or taphonomic analyses. Even if these details are provided in the Supplementary Information, the main text should include at least a concise summary describing the analytical framework, the criteria for identifying surface modifications and fracture morphology, and the quantification system employed. This information is essential for transparency, replicability, and proper evaluation of the behavioral interpretations.
See reply above. There is a new subsection on taphonomic methods now.
Supplementary information:
(47) The Supplementary Information includes a large number of green-broken proboscidean specimens from other Olduvai localities (BK, LAS, SC, FLK West), but it is never explained why these are shown or how they relate to the EAK study. The main analysis focuses entirely on the EAK elephant; including so much unrelated material without any stated purpose makes the supplement confusing. If these examples are meant only to illustrate the appearance of green fractures, that should be stated. Otherwise, the extensive inclusion of non-EAK material gives the impression that they were part of the analyzed assemblage when they were not.
This is stated in the opening paragraph to the section.
(48) Line 96: A small collection of green-broken elephant bones was retrieved from the lower and upper Bed II units.
It would help to clarify whether these specimens are part of the EAK assemblage or derive from other Bed II localities. As written, it is not clear whether this description refers to material analyzed in the main text or to comparative examples shown only in the Supplementary Information.
No; EAK occupies only the lower Bed II section. They belong to the Bed II paleolandscape units.
(49) Line 97: One of them, a proximal femoral shaft found within the LAS unit, has all the traces of having been used as a tool (Figure 6).
This says the bone tool in Figure 6 is from LAS, but the main text caption identifies it as from EAK. If I am not mistaken, EAK is a site at the base of Bed II, and LAS is a separate stratigraphic unit higher in the sequence, so the authors should clarify which is correct.
Our mistake. Its provenience is from LAS, in the vicinity of EAK.
(50) Line 186: Figure S20. Example of other megafaunal long bone shafts showing green breaks.
Not cited in text or SI narrative. No indication where these bones come from or why they are relevant.
It appears justified in the revised version.
(51) Line 474: Figure S28-S30. Hyena-ravaged giraffe bones from Chobe (Botswana).
These figures are not discussed in the text or SI, and their relevance to the study is unclear. The authors should explain why these modern comparative examples were included and how they inform interpretations of the Olduvai assemblages.
It appears justified in the revised version.
(52) Line 498: Figure S31. Bos/Bison bone from Bois Roche (France).
This figure is not mentioned in the text or Supplementary Information. The authors should specify why this specimen is shown and how it contributes to the study's taphonomic or behavioral comparisons.
It appears justified in the revised version.
(53) Line 504: Figure S32. Miocene Gomphotherium femur from Spain.
This figure is never referenced in the paper. The authors should clarify the purpose of including a Miocene specimen from outside Africa and explain what it adds to the interpretation of Bed II material.
It appears justified in the revised version.
(54) Line 508: Figure S33. Elephant femoral shaft from BK (Olduvai).
This figure appears to show comparative material but is not cited or discussed in the text. The authors should explain why the BK material is presented here and how it relates to EAK or the broader analysis.
There are two figures labeled S33.
It appears justified in the revised version.
(55) Line 515: Figure S33. Tibia fragment from a large medium-sized bovid displaying multiple overlapping scars on both breakage planes inflicted by carnivore damage.
Because this figure repeats the S33 label and is not cited or explained in the text, it is unclear why this specimen is included or how it contributes to the study. The authors should correct the duplicate numbering and clarify the purpose of this figure.
It appears justified in the revised version.
(56) Line 522: Same specimen as shown in Figure S30, viewed on its medial side.
This is not the same bone as S30. This figure is not discussed in the text or Supplementary Information. The authors should clarify why it is included and how it relates to the rest of the analysis.
It appears justified in the revised version.
Reviewer #1 (Public review):
Summary:
This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but their utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of the PPARg LBD.
Strengths:
(1) The authors employed a compelling integrated biochemical and biophysical approach.
(2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.
(3) Overall, the results support their conclusions.
(4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here.
Weaknesses:
All weaknesses were addressed by the Authors in revision.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but their utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of the PPARg LBD.
Strengths:
(1) The authors employed a compelling integrated biochemical and biophysical approach.
(2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.
(3) Overall, the results support their conclusions.
(4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here.
Weaknesses:
(1) The major weakness in this work is that it is hard to appreciate what these shifting allosteric ensembles actually look like on the protein structure. Additional graphical representations would really help convey the exciting results of this study.
We thank the reviewer for the comments. In response to the specific recommendations below, we added two new figures—Figure 1 and Figure 8 in this resubmission—that hopefully address the weakness identified by the reviewer.
Reviewer #2 (Public review):
Summary:
The authors use ligands (inverse agonists, partial agonists) for PPAR, and coactivators and corepressors, to investigate how ligands and cofactors interact in a complex manner to achieve functional outcomes (repressive vs. activating).
Strengths:
The data (mostly biophysical data) are compelling from well-designed experiments. Figures are clearly illustrated. The conclusions are supported by these compelling data. These results contribute to our fundamental understanding of the complex ligand-cofactor-receptor interactions.
Weaknesses:
This is not the weakness of this particular paper, but the general limitation in using simplified models to study a complex system.
We appreciate the reviewer’s comments. Breaking down a complex system into a simpler model system, when possible, provides a unique lens with which to probe it with mechanistic insight. While simplified models may not always explain the complexity of systems in cells, for example, our recently published work showed that a simplified model system — biochemical assays using reconstituted PPARγ ligand-binding domain (LBD) protein and peptides derived from coregulator proteins (similar to the assays in this current work) and protein NMR structural biology studies using the PPARγ LBD — can explain the activity of ligand-induced PPARγ activation and repression to a high degree (Pearson/Spearman correlation coefficients ~0.7-0.9):
MacTavish BS, Zhu D, Shang J, Shao Q, He Y, Yang ZJ, Kamenecka TM, Kojetin DJ. Ligand efficacy shifts a nuclear receptor conformational ensemble between transcriptionally active and repressive states. Nat Commun. 2025 Feb 28;16(1):2065. doi: 10.1038/s41467-025-57325-4. PMID: 40021712; PMCID: PMC11871303.
Recommendations for the authors
Reviewer #1 (Recommendations for the authors):
(1) More set-up is needed in the results section. The first paragraph is unclear on what is new to this study versus what was done previously. Likewise, a brief description of the assays used and the meaning behind differences in signals would help the general reader along.
We modified the last paragraph of the introduction and first results section to hopefully better set the stage for what was done previously vs. what is new/recollected in this study. In our results section, we also include more description about what the assays measure.
(2) Since this paper is building on previous work, additional figures are needed in the introduction and discussion. Graphical depictions of what was found in the first study on how these ligands uniquely influence PPARg LBD conformation. A new model/depiction in the discussion for what was learned and its context with the rest of the field.
Our revised manuscript includes a new Figure 1 describing the possible allosteric mechanism by which a covalent ligand inhibits binding of other non-covalent ligands that was inferred from our previous study; and a new Figure 8 with a model for what has been learned.
(3) It is stated that the results shown are representative data for at least two biological replicates. However, I do not see the other replicates shown in the supplementary information.
We appreciate the Reviewer’s emphasis on data reproducibility and rigor. We confirm that the biochemical and cellular assay data presented are indeed representative of consistent findings observed across two or more biological replicates. We show representative data in our figures, but not the extensive replicate data in the supplementary information, consistent with standard practice.
(4) Figure 1a could benefit from labels of antagonists, inverse agonist, etc., next to each chemical structure. Likewise, if any co-crystal or other models are available it would be helpful to include those for comparison.
We added the pharmacological labels to Figure 2a (old Figure 1a).
(5) The figure legends don't seem to match up completely with the figures. For example, Figure 2b states that fitted Ki values +/- standard deviation are stated in the legend, but it's shown as the log Ki.
We revised the figure legends to ensure they display the appropriate errors as reported from the data fitting.
(6) EC50, IC50, Ki, and Kd values alongside reported errors and R2 values for the fits should be reported in a table.
Our revised manuscript now includes a Source Data file (Figure 5—source data 1.xlsx) of the data (n=2) plotted in Figure 5 (old Figure 4) so that readers can regenerate the plots and calculate the errors and R2 values if desired. Otherwise, fitted values and errors are reported in figures when fitting in Prism permitted and reported errors; when Prism was unable to fit data or fit the error, n.d. (not determined) is specified.
(7) Statistical analysis is missing in some places, for example, Figure 1b.
We revised Figure 2b (old Figure 1b) to include statistical testing.
Reviewer #2 (Recommendations for the authors):
I suggest that the authors discuss the following points to broaden the significance of the results:
(1) The two partial agonists (MRL24 and nTZDpa) are "partial" in the coactivator and corepressor recruitment assays, but are "complete" in the TR-FRET ligand displacement assay (Figure 2). Please explain that a partial agonist is defined based on the functional outcome (cofactor recruitment in this study) and not on binding affinity/efficacy.
We added the following sentence to describe the partial agonist activity of these compounds: “These high affinity ligands are partial agonists as defined on their functional outcome in coregulator recruitment and cellular transcription; i.e., they are less efficacious than full agonists at recruiting peptides derived from coactivator proteins in biochemical assays (Chrisman et al., 2018; Shang et al., 2019; Shang and Kojetin, 2024) and increasing PPARγ-mediated transcription (Acton et al., 2005; Berger et al., 2003).“
(2) Will the discovery reported here be broadly applicable?
(a) Applicable if other partial agonists and inhibitors are used?
(b) Applicable if different coactivators/corepressors, or different segments of the same cofactor, are used?
(c) Applicable to other NRs (their AF-2 are similar but with sequence variation)?
(d) The term "allosteric" might mean different things to different people - many readers might think that it means a "distal and unrelated" binding pocket. It might be helpful to point out that in this study, the allosteric site is actually "proximal and related".
We expanded our introduction and/or discussion sections to expand upon these concepts; specific answers as follows:
(a) Orthosteric partial agonists? Yes, because helix 12 would clash with an orthosteric ligand. Other covalent inhibitors? It depends on whether the covalent inhibitor stabilizes helix 12 in the orthosteric pocket.
(b) Yes, with some nuanced exceptions where certain segments of the same coregulator protein bind with high affinity and others apparently do not bind or bind with low affinity.
(c) It is not yet clear whether other NRs share a ligand-induced conformational ensemble similar to that of PPARγ.
(d) we addressed this point in the 4th paragraph of the introduction “...the non-covalent ligand binding event we previously described at the alternate/allosteric site, which is proximal to the orthosteric ligand-binding pocket, …”
Reviewer #1 (Public review):
Summary:
This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation.
Strengths:
(1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription.
(2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community.
(3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications.
(4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies.
Weaknesses:
(1) Misrepresentation of prior literature
The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence.
(2) Incorrect statement regarding hexasome fragility
The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed.
(3) Inaccurate mechanistic interpretation of DRB
The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100×) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.
(4) Insufficient resolution of genome-wide correlations
Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.
(5) Conceptual framing
The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes) without direct biochemical or structural evidence. Such causality statements should be toned down.
Reviewer #2 (Public review):
Summary:
In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription.
Strengths:
The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation.
Weaknesses:
This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems. I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript. I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert. For example:
(1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2ph-PolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription.
(2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing.
(A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes?
(B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)?
(3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (pmid:982026) - this needs to be acknowledged.
(4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top?
Reviewer #3 (Public review):
Summary:
Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation.
Strengths:
This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription.
Weaknesses:
The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated.
Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation.
Strengths:
(1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription.
(2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community.
(3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications.
(4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies.
We really appreciate the reviewer's positive and affirmative comments.
Weaknesses:
(1) Misrepresentation of prior literature
The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence.
What we said is according to the original report in the publication (Bintu et al., Cell, 2012). Here is the citation from the report:
Page 739,(Bintu, L. et al., Cell, 2012)(PMID: 23141536)
“Overall transcription through tailless and acetylated nucleosomes is slightly faster than through unmodified nucleosomes (Figure 1C), with crossing times that are generally under 1 min (39.5 ± 5.7 and 45.3 ± 7.6 s, respectively). Both the removal and acetylation of the tails increase efficiency of NPS passage: 71% for tailless nucleosomes and 63% for acetylated nucleosomes (Figures 1C and S1), in agreement with results obtained using bulk assays of transcription (Ujvári et al., 2008).”
We will cite this original sentence in our revision.
(2) Incorrect statement regarding hexasome fragility
The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed.
The term “hexasome” was introduced in the transcription field four decades ago. Later, several groups claimed that hexasomes are fragile and could be generated during Pol II transcription elongation. However, their original definition was based on the in vivo detection of ~100 bp MNase-resistant DNA fragments by micrococcal nuclease sequencing (MNase-seq), which is the right length to wrap the hexasome histone assembly (two H3/H4 dimers and one H2A/H2B dimer) into a sub-nucleosomal particle. We should all agree that acetylation or phosphorylation of histone tails compromises the interaction between DNA and the histone subunits, rendering the intact naïve nucleosome fragile, easy to disassemble, and easy for MNase to access. Fragile nucleosomes give MNase better access to the DNA wrapped around the histone octamer, producing shorter DNA fragments (~100 bp instead of ~140 bp). In this regard, we believe that these ~100 bp fragments are the products of fragile nucleosomes (fragile nucleosome --> hexasome), rather than the other way around (hexasome --> fragile).
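The fragment-length argument above can be sketched numerically. The cutoffs below are illustrative assumptions only (full octamer protection ~147 bp, a sub-nucleosomal peak near ~100 bp), not thresholds from our pipeline or any cited study:

```python
# Illustrative classification of MNase-protected fragment lengths.
# Cutoffs are assumptions for this sketch, not published thresholds.
OCTAMER_BP = 147   # DNA protected by an intact histone octamer
SUBNUC_BP = 100    # ~100 bp fragments attributed to fragile/hexasome-sized particles

def classify_fragment(length_bp: int) -> str:
    """Assign an MNase-protected fragment to a rough particle class."""
    if length_bp >= OCTAMER_BP - 10:       # within ~10 bp of full octamer protection
        return "nucleosomal"
    if abs(length_bp - SUBNUC_BP) <= 15:   # near the ~100 bp sub-nucleosomal peak
        return "subnucleosomal"
    return "other"

# A fragile, partially unwrapped nucleosome yields the shorter protected fragment:
print(classify_fragment(145))  # nucleosomal
print(classify_fragment(102))  # subnucleosomal
```

Under this sketch, the same ~100 bp read is compatible with either interpretation (a pre-formed hexasome or a fragile nucleosome partially stripped by MNase), which is exactly the ambiguity discussed above.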
Actually, two early reports from Dr. David J. Clark's group at NIH raised questions about the existence of hexasomes in vivo (PMID: 28157509; PMID: 25348398).
In the report PMID: 35597239, depletion of INO80 reduces "hexasomes" for a group of genes, and the distribution of both "nucleosomes" and "hexasomes" within gene bodies becomes fuzzier (lower signal-to-noise). In a recent theoretical model (PMID: 41425263), the corresponding PI proposed that chromatin remodelers act as drivers of histone modification complexes, carrying out different modifications along gene bodies. In particular, INO80 could drive NuA3 (an H3 acetyltransferase) to carry out pan-acetylation of H3, and possibly H2B as well, during the later rounds of Pol II transcription for a group of (SAGA-dependent) genes. This suggests that depletion of INO80 reduces pan-acetylation of nucleosomes, leading to a drop in pan-acetylated fragile nucleosomes and, subsequently, in "hexasomes". This explains why INO80 depletion makes both the nucleosome and "hexasome" signals fuzzier in PMID: 35597239; that result could be a strong piece of evidence supporting the model proposed by the corresponding PI (PMID: 41425263).
A recent report (PMID: 40412388) claimed that FACT binds nucleosomes to generate "hexasomes", which are fragile enough for Pol II to overcome nucleosomal resistance. It is well established that FACT enhances the processivity of Pol II in vivo via its histone chaperone activity; however, its exact working mechanism remains ambiguous. A report from Dr. Cramer's group showed that FACT enhances the elongation of regular genes but has the opposite effect on pausing-regulated genes (PMID: 38810649). An excellent review by Drs. Tim Formosa and Fred Winston noted that FACT is not required for the survival of a group of differentiated cells (PMID: 33104782), suggesting that FACT is not always required for transcription. Moreover, according to early reports from the late Dr. Widom's group, it is quite tricky to generate naïve hexasomes in vitro. Most importantly, the Pol II elongation rate reported in PMID: 40412388 (at best ~27 bp/s on bare DNA) is much slower than the rate of Pol II in vivo: ~2.5 kb/min, or ~40 bp/s. In our recovery experiments (Fig. 4C, as mentioned by reviewer #3), all Pol II molecules move at a uniform speed of ~2.5 kb/min in vivo over the 20-minute window between 10 and 30 minutes. (Because Pol II remains active after cells are collected for CUT&Tag-seq, there is a substantial delay before Pol II fully stops during the CUT&Tag procedure, so the first time window, which yields an apparent rate of ~5 kb/min, does not reflect the true speed of Pol II.) Interestingly, a recent report from Dr. Shixin Liu's group (PMID: 41310264) showed that adding SPT4/5 to an in vitro transcription system on bare DNA brings the speed of Pol II to ~2.5 kb/min, exactly the rate we derived in vivo. As with the original report (PMID: 23141536), the conditions of PMID: 40412388 do not exactly mimic those in vivo.
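The rate comparison above is simple arithmetic; this sketch only makes the unit conversions explicit, using the figures quoted in the text:

```python
# Unit conversions behind the Pol II elongation-rate comparison.
# Input values are the ones quoted in the surrounding text.

def kb_per_min_to_bp_per_s(rate_kb_min: float) -> float:
    """Convert an elongation rate from kb/min to bp/s."""
    return rate_kb_min * 1000.0 / 60.0

in_vivo = kb_per_min_to_bp_per_s(2.5)  # ~41.7 bp/s, i.e. the "~40 bp/s" in vivo figure
in_vitro_bare_dna = 27.0               # best reported rate on bare DNA (bp/s)

print(f"in vivo:  {in_vivo:.1f} bp/s")   # 41.7 bp/s
print(f"in vitro: {in_vitro_bare_dna:.1f} bp/s")

# Distance covered during the 20-min recovery window at the uniform in vivo rate:
distance_kb = 2.5 * 20  # 50 kb
print(f"distance in 20 min: {distance_kb:.0f} kb")
```

So 2.5 kb/min works out to ~41.7 bp/s, consistent with the ~40 bp/s in vivo figure and roughly 1.5-fold faster than the ~27 bp/s bare-DNA rate reported in vitro.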
There is an urgent need to revisit the current definition of the "hexasome", which is claimed to be fragile and to be generated during Pol II elongation in vivo. MNase only works when its substrate is accessible. In inactive regions of the genome, tight chromatin packing keeps MNase away from individual nucleosomes within gene bodies or upstream of promoters, which is why we only see phased/spaced, clearly positioned nucleosomes at transcription start sites, while the pattern becomes fuzzy downstream or upstream of promoters. For fragile nucleosomes, by contrast, accessibility to MNase should increase dramatically, yielding the ~100 bp fragments. Based on the uniform rate (~2.5 kb/min) of Pol II across all genes in human HEK293T cells and the similar rate (~2.5 kb/min) of Pol II on bare DNA in vitro, it is unlikely that Pol II pauses in the middle of nucleosomes to generate "hexasomes" before continuing elongation along gene bodies. As with RNAPs in bacteria (no nucleosomes) and archaea (tailless nucleosomes), there should be no resistance when Pol II transcribes through the fragile nucleosomes within gene bodies in all eukaryotes, as we characterize in this manuscript.
(3) Inaccurate mechanistic interpretation of DRB
The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100×) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.
Yes, DRB is indeed an inhibitor of CDK9. However, as we and the current view in the field hold, phosphorylation of Ser5-CTD of Pol II marks transcription initiation for all Pol II-regulated genes in eukaryotes. CDK9 is only required to act on already Ser5-phosphorylated Pol II to release paused Pol II, which happens only in metazoans. A series of studies by us and others shows that CDK9 is unique to metazoans and required only for pausing-regulated genes, not for regular genes. We found that CDK9 acts on initiated Pol II (Ser5-CTD-phosphorylated Pol II) and generates a unique phosphorylation pattern on the CTD (Ser2ph-Ser2ph-Ser5ph-CTD), which is required to recruit JMJD5 (via its CID domain) to generate a tailless nucleosome at the +1 position from the TSS and release paused Pol II (PMID: 32747552). Interestingly, a report from Dr. Jesper Svejstrup's group (PMID: 40315851) showed that CDK9 generates a unique phosphorylation pattern (Ser2ph-Ser5ph-CTD of Pol II) that is not recognized by the popular 3E10 antibody, which recognizes single Ser2ph-CTD of Pol II. This interesting result is consistent with our earlier report that the unique Ser2ph-Ser2ph-Ser5ph-CTD pattern specifically generated by CDK9 in animals is not recognized by 3E10 either (PMID: 32747552). Indeed, an early report from Dr. Dick Eick's group (PMID: 26799765) showed differences in the CTD phosphorylation pattern of Pol II between animal and yeast cells. We have characterized how CDK9 is released from the 7SK snRNP and recruited onto paused Pol II via the coupling of JMJD6 and BRD4 (PMID: 32048991), published in eLife. It is well established that CDK9 acts after CDK7 or CDK8. From our PRO-seq data (Fig. 3) and CUT&Tag-seq data of active Pol II (Fig. 4), adding DRB completely shuts down all genes by inhibiting the initiation of Pol II (generation of Ser5ph-CTD of Pol II).
Because CDK9 is unique to metazoans, it is not required for the activation of CDK12 or CDK13 (the orthologs of yeast CTK1), as we demonstrated recently (PMID: 41377501). Instead, we found that CDK11/10, acting as the ortholog of the yeast Bur1 kinase, is essential for the phosphorylation of Spt5, of the linker of the Pol II CTD, and of CDK12 (PMID: 41377501).
(4) Insufficient resolution of genome-wide correlations
Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.
A popular view in the field is that the majority of the genome is inactive, since only a small portion contains the coding RNAs responsible for the ~20,000 protein candidates characterized in animals. However, our genome-wide characterization using the four histone modification marks, active Pol II, and RNA-seq tells a different story. Figure 1 shows that most of the HEK293T human genome is active, producing not only protein-coding RNAs but also, in the majority, non-coding RNAs. We believe that Figure 1 could change the current view of the activity of the entire genome and should be of great interest to general readers as well as genomics researchers. Furthermore, it is the basis for Figure 2, which is a zoom-in of Figure 1.
(5) Conceptual framing
The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes) without direct biochemical or structural evidence. Such causality statements should be toned down.
The reviewer is right; we should tone down the strong statements. However, we believe that our data are strong enough to support the general conclusion. The reviewer may agree that the transcription and epigenetics field has been stagnant in recent decades and that there is an urgent need for fresh ideas to change the current situation. Our discoveries, which certainly require additional supporting data, should open up a brand-new avenue for people to explore, and we believe a new era of transcription research will emerge from them. We hope that this manuscript will attract more people to these topics. As Reviewer #3 pointed out, this story establishes the connection between transcription and epigenetics in the field.
Reviewer #2 (Public review):
Summary:
In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription.
Strengths:
The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation.
Weaknesses:
This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems.
Actually, the pioneering establishment of in vitro transcription assays in Dr. Robert Roeder's group led to numerous groundbreaking discoveries in the transcription field. In vitro work was key to exploring the complexity of eukaryotic transcription in the early days and remains important today.
I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript. I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert.
Thanks for your suggestions. You are right that we have had trouble expressing our ideas clearly in this manuscript, which could confuse readers. We will make modifications accordingly per your suggestions.
For example:
(1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2ph-PolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription.
Figure 1 shows the overall genome-wide distribution of the four major histone modifications, active Pol II, and mRNA in human HEK293T cells. It tells general readers that the entire genome is far more active than predicted under the prevailing view that most of the genome is inactive because only a small portion expresses coding RNAs (~20,000 in animals). Figure 1 shows that the majority of the genome is active, expressing not only coding mRNAs but also non-coding RNAs. It also provides the basis for Figure 2, which is a zoom-in of Figure 1. However, discussing the non-coding RNAs is beyond the scope of this manuscript.
(2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing.
(A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes?
Figure 2 zooms in on individual genes, showing how histone modifications are coupled with Pol II activity at each gene. We will give a more detailed explanation of the figure per the reviewer's suggestions.
(B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)?
Good suggestions; we will revise the figure and its legend accordingly.
(3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (pmid:982026) - this needs to be acknowledged.
Thanks for the reference, which is indeed the first report showing that DRB inhibits initiation by Pol II in vivo. We will cite it in the revision.
(4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top?
Thanks for the suggestions, we will give a more detailed description of the Figures.
Reviewer #3 (Public review):
Summary:
Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation.
Strengths:
This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription.
Thanks for the insightful comments, which are exactly what we want to present in this manuscript.
Weaknesses:
The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated.
Thanks for the suggestions. What we intend to convey is that the generation of fragile nucleosomes precedes transcription, or more specifically, transcription elongation. The corresponding PI wrote a hypothetical model of how pan-acetylation is generated by the coupling of chromatin remodelers and acetyltransferase complexes along gene bodies, in which chromatin remodelers act as drivers that carry acetyltransferases along gene bodies to generate pan-acetylation of nucleosomes (PMID: 41425263). We have a series of studies showing how "tailless nucleosomes" at the +1 position from transcription start sites are generated to release paused Pol II in metazoans (PMID: 28847961; PMID: 29459673; PMID: 32747552; PMID: 32048991). We still do not know how pan-phosphorylation along gene bodies is generated; it will be one of the focuses of our future research.
Reviewer #1 (Public review):
This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.
The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.
Several aspects of data interpretation, however, require clarification or reanalysis: most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.
Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.
(1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.
(2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.
(3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.
(4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.
(5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.
(6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.
Reviewer #2 (Public review):
Summary:
Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.
How this anatomy, with soma and AP in distinct extracellular environments, relates to their multimodal CSF-sensing function remains unclear.
The authors confirm that in GATA3:GFP mice, in which these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP3) channels, non-selective cation channels with ~200 pS conductance modulated by protons and mechanical forces.
They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.
Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).
This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.
Strengths:
This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.
Weaknesses:
The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.
(1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?
(2) Since CSF-cNs also respond to changes in osmolarity (Orts-Del'Immagine 2013) and to mechanosensory stimulation in a PKD2L1-dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true for the role of PKD2L1 in the AP upon pressure application or changes in osmolarity.
In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs as partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:
a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?
b) In the trace from the outside-out AP patch, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?
(3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?
(4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption in the cilium or underlying actin/myosin cytoskeleton drastically alter the open probability of the channel?
(5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?
(6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.
Author response:
Public Reviews:
Reviewer #1 (Public review):
This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.
The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.
Several aspects of data interpretation, however, require clarification or reanalysis, most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.
Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.
(1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.
We agree with the reviewer that the Po analysis has strong limitations, particularly in experiments where the recording times are short, such as when extracellular pH is changed via photolysis (Figure 4D) or puff application (Figure 3Aa). To circumvent this problem and not rely solely on Po estimations, we used alternative methods, including an analysis of the total membrane charge (extensively used throughout the manuscript, as in Figures 3A and 4D) and an analysis of event latencies (Figure 4G). Nevertheless, single channel recordings contain information that is not included in the macroscopic current analysis. In the revised version, we intend to stress that the elementary current amplitude is conserved during manipulations such as pH changes, leaving the total number of channels (N) and the channel open probability (Po) as possible culprits for the current changes. Since these changes are rapid and reversible, it is likely that N remains constant while Po changes. To address the reviewer’s concern, we propose the following changes/reanalysis: (i) report in each condition the minimum N (based on the maximum number of simultaneously open channels; for example, in Figure 3Aa, the minimum N goes from 4-5 in control conditions to 1 during the puff of the pH 6.4 solution). Although imperfect, this method provides a tentative estimate of Po; (ii) report the fraction of time that the channels remain open; (iii) revise the text and figures to use the expression “apparent Po” instead of “Po”, acknowledging the limitations of the measurement in short recordings. We also acknowledge that some traces (Figure 3Aa, top) may appear confusing, as they seem to show macroscopic currents. We will modify these figures by including the amplitude histograms (as in Figure 1Bb) to clearly demonstrate that recordings from CSFcNs primarily reflect single-channel activity when challenged with pH changes.
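The "apparent Po" procedure the response outlines, using the maximum number of simultaneously open channels as a lower bound on N and reporting the fraction of time spent open, can be sketched as follows. This is a minimal illustration on a hypothetical idealized record; the function name and trace values are ours, not taken from the study.

```python
import numpy as np

def apparent_po(levels, n_min=None):
    """Estimate apparent open probability from an idealized single-channel
    record. `levels` holds the number of simultaneously open channels at
    each sample. The true channel number N is unknown; the maximum
    simultaneous-open count gives a lower bound on N, so the returned Po
    is an upper-bound 'apparent' value."""
    levels = np.asarray(levels)
    if n_min is None:
        n_min = levels.max()             # lower bound on channel number N
    if n_min == 0:
        return 0.0, 0, 0.0
    open_fraction = np.mean(levels > 0)  # fraction of time >= 1 channel open
    # mean open-channel count divided by the (minimum) channel number
    po_app = levels.mean() / n_min
    return float(po_app), int(n_min), float(open_fraction)

# Hypothetical idealized record: simultaneously open channels per sample
trace = [0, 0, 1, 2, 1, 0, 0, 3, 1, 0]
po, n, frac = apparent_po(trace)
```

Because n_min can only underestimate N, the apparent Po overestimates the true Po, which is why flagging it as "apparent" in the text, as the authors propose, is the appropriate hedge for short recordings.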
(2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.
We agree with the reviewer that the nomenclature for the polycystin protein family is confusing. In this manuscript, we have followed the nomenclature proposed in a recent comprehensive review on polycystin channels by Palomero, Larmore and DeCaen (Palomero et al. 2023), which refers to the channels by their gene names. As indicated in that review, the PKD2L1 channel corresponds to TRPP2 (previously known as TRPP3, see their Table 1). However, in another recent review on TRP channels, the PKD2L1 channel is referred to as TRPP3 (Zhang et al. 2023). To prevent any ambiguity, we will remove references to the TRPP nomenclature from the text and exclusively use the PKD2L1 acronym.
(3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.
Although largely used in the literature, we concur with the reviewer that the term “OFF response” is not very helpful from a biophysical perspective as it may imply the existence of a distinct current. Consequently, we will remove the terms “OFF response” and “OFF current” from the revised manuscript and replace them with the term “photolysis-evoked PKD2L1 current”. Furthermore, to improve the logical flow, we will condense the two sections (“The proton-induced current is an off-current” and “The off-current is mediated by the activation of PKD2L1 channels”) into a single, new section titled “The photolysis-induced current is mediated by PKD2L1 channels”. This consolidation will prevent the artificial separation of the description of this current. Finally, we will revise the discussion to better characterize this photolysis-evoked phenomenon as a recovery current.
(4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.
This point is partly addressed in Figure 3. The data indicate that PKD2L1 channels are highly sensitive to pH variations within the physiological range. To strengthen this conclusion, we will add the EC50 values derived from the curve fittings to the figure. Regarding ASIC-mediated currents, one of our main conclusions is that ASICs are not present in the apical process (ApPr), as the effects of proton photolysis in the ApPr are not blocked by ASIC antagonists. Our results suggest that PKD2L1 channels are the exclusive pH-sensitive channels in the ApPr. ASIC channels likely mediate acid sensitivity in the soma, although we have not investigated the latter in detail. We intend to modify the Discussion in order to provide a physiological framework linking channel activity with physiological and pathophysiological pH changes.
(5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.
Following the reviewer’s suggestions, an exhaustive rephrasing will be performed to unify terminology and data presentation and to correct the text.
(6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.
This is indeed a very interesting and currently unresolved point in the physiology of CSFcNs. Published data indicate that calcium influx through PKD2L1 channels is a key regulator of apical process (ApPr) physiology. These channels are calcium permeable yet are also inhibited by intracellular calcium (DeCaen et al. 2016). Additionally, ultrastructural data show that the ApPr is rich in mitochondria and tubulo-vesicular structures resembling the Golgi apparatus (Bruni and Reddy 1987; Bjugn et al. 1988; Nakamura et al. 2023), intracellular organelles critical for calcium homeostasis. Altogether, this evidence suggests that intra-ApPr calcium concentration must be finely regulated, both in space and time, for the ApPr to fulfill its physiological roles. Based on the existing literature, we can speculate that these calcium signals are decoded by several systems: (i) calcium may act as a second messenger, linking the activation of the multimodal PKD2L1 channels to changes in CSFcN excitability, which in turn regulates spinal neuronal networks controlling locomotor activity; (ii) calcium could initiate the neurosecretion of various molecules from the ApPr into the central canal, as proposed by the Wyart group in the zebrafish in the context of bacterial infections (Prendergast et al. 2023); (iii) calcium could activate the Hedgehog signaling pathway (as has been shown by Delling et al. 2013); (iv) calcium could modulate CSF flow by modulating ependymal cells' ciliary activity. Resolving these downstream pathways is essential to fully define the role of CSFcNs as integrators of cerebrospinal fluid homeostasis. We will expand on this topic in the Discussion section of the revised manuscript.
Reviewer #2 (Public review):
Summary:
Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.
How this anatomy, with the soma and AP in distinct extracellular environments, relates to their multimodal CSF-sensing function remains unclear.
The authors confirm that in GATA3:GFP mice, where these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP2) channels, non-selective cation channels with ~200 pS conductance modulated by protons and mechanical forces.
They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.
Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).
This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.
Strengths:
This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.
Weaknesses:
The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.
(1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?
Following the reviewer’s suggestion, we performed a Spearman’s rank correlation test. The analysis revealed a significant correlation between the changes in the apparent open probability and the holding current in paired experiments (control vs pH 6.4 pressure applications; p < 0.05, Spearman r = 0.72 and critical value = 0.67). The Pearson correlation coefficient calculated on the same data set was r = 0.63 (critical value = 0.632), indicating that the correlation is not linear. We thank the reviewer for raising this point and will add this analysis to the manuscript.
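The contrast between the two coefficients in this analysis can be illustrated with a minimal sketch on hypothetical data. Spearman's rho is the Pearson correlation of the ranks, so a perfectly monotonic but non-linear relation yields rho = 1 while the Pearson r stays below 1; the variable names and values below are ours, not data from the study.

```python
import numpy as np

def rank(x):
    # 1-based ranks (assumes no ties in the data)
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks
    return pearson(rank(x), rank(y))

# Hypothetical paired measurements: change in apparent Po vs change in
# holding current across cells, with a monotonic but non-linear relation
po_change = [0.05, 0.10, 0.18, 0.25, 0.40, 0.55, 0.70, 0.90]
i_hold = [v ** 3 for v in po_change]

rho = spearman(po_change, i_hold)  # 1.0: monotonicity is perfect
r = pearson(po_change, i_hold)     # < 1: the relation is not linear
```

This is the pattern reported above: a significant rank correlation alongside a Pearson coefficient that falls short of its critical value, consistent with a monotonic, non-linear relationship.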
(2) Since CSF-cNs also respond to changes in osmolarity (Orts Dell Immagine 2013) & mechanosensory stimulations in a PKD2L1-dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true on the role of PKD2L1 in the AP for pressure application or changes in osmolarity.
This is a very important point. As the reviewer notes, previous experimental evidence indicates that CSFcNs are also sensitive to osmolarity changes and mechanical stimulation in a PKD2L1-dependent manner. It is therefore reasonable to assume that, similar to pH sensitivity, osmotic and mechanical sensitivity depend on channels localized to the apical process (ApPr). Regarding mechanosensitivity, this spatial segregation could be tested by mechanically stimulating either the ApPr or the soma with a piezo-controlled blunt pipette (see, for example, Hao et al. 2013). Assessing sensitivity to osmotic changes, however, is more challenging, as pressure application lacks the spatial resolution to discriminate between compartments in such a compact cell. In theory, a highly localized osmotic jump could be achieved via photolysis, provided a caged compound that releases many osmotic particles simultaneously is used. In typical photolysis experiments, a localized osmotic change is produced, but its amplitude is very low (on the order of 1 to 2 mOsm).
In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs with partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:
(a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?
As the reviewer indicates, these partial and rapid openings are characteristic of PKD2L1 single-channel activity and appear to be conserved across species. However, estimating their precise contribution to the sustained current would require a detailed channel model, which is currently lacking. Indeed, the exact mechanism underlying this prominent sustained current in CSFcNs remains unknown and should definitely be addressed in future work.
(b) In the trace from the outside out AP, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?
The outside-out recordings from the apical process also show some partial openings (see the upper trace in Figure 4Db). We will specifically mention this important point in the revised version of the ms.
(3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?
We thank the reviewer for pointing out the presence of metabotropic glutamate receptors in CSFcNs. However, our evidence indicates that metabotropic receptors do not contribute to the response when uncaging MNI-glutamate. This conclusion is supported by two observations: (i) the response obtained when uncaging MNI-γLGG, which does not release glutamate (Figure 5Ab), and (ii) the response obtained when uncaging protons from DPNI-GABA (data not shown) (DPNI-GABA is a GABA cage with photochemistry similar to MNI cages that also releases a proton upon photolysis; Trigo et al. 2009), are the same. In both experiments (uncaging MNI-γLGG or DPNI-GABA) a clear photolysis-evoked PKD2L1 current is observed.
(4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption in the cilium or underlying actin/myosin cytoskeleton drastically alter the open probability of the channel?
The reviewer is correct in noting that the opening frequency of PKD2L1 channels appears lower in outside-out patches than in whole-ApPr recordings, although we have not quantified this. We interpreted this difference as reflecting a lower channel number. However, as the reviewer suggests, a plausible alternative explanation is that the channel's biophysical properties are altered when removed from its native ionic environment or when it loses interactions with regulatory proteins. We will address this point in the Discussion.
(5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?
As described in the manuscript, we performed experiments with ASIC antagonists, although we did not attempt to characterize the specific ASIC subtype mediating the somatic response. Based on the published literature, we used both psalmotoxin-1, which blocks ASIC1 channels, and APETx2, which blocks ASIC3 channels. The presence of ASIC1 in mouse CSFcNs has been demonstrated previously (Orts-Del’immagine et al. 2012; Orts-Del’Immagine et al. 2016), while ASIC3 has been identified in lamprey CSFcNs (Jalalvand et al. 2016). When applying an acidic solution to the soma, we recorded an inward current that was substantially blocked by psalmotoxin-1, although a small residual component persisted, consistent with the earlier findings of Orts-Del’Immagine et al. We did not attempt to block this remaining psalmotoxin-1-insensitive component.
(6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.
We have never observed this kind of opening in our recordings (when the channel is closed or in the presence of dibucaine).
References
Bjugn, R., H. K. Haugland, and P. R. Flood. 1988. “Ultrastructure of the Mouse Spinal Cord Ependyma”. Journal of Anatomy 160 (October): 117-25.
Bruni, J. E., and K. Reddy. 1987. “Ependyma of the Central Canal of the Rat Spinal Cord: A Light and Transmission Electron Microscopic Study”. Journal of Anatomy 152 (June): 55-70.
Delling, Markus, Paul G. DeCaen, Julia F. Doerner, Sebastien Febvay, and David E. Clapham. 2013. “Primary Cilia Are Specialized Calcium Signaling Organelles”. Nature 504 (7479): 311-14. https://doi.org/10.1038/nature12833.
Hao, Jizhe, Jérôme Ruel, Bertrand Coste, Yann Roudaut, Marcel Crest, and Patrick Delmas. 2013. “Piezo-Electrically Driven Mechanical Stimulation of Sensory Neurons”. In Ion Channels, edited by Nikita Gamper, vol. 998. Methods in Molecular Biology. Humana Press. https://doi.org/10.1007/978-1-62703-351-0_12.
Jalalvand, Elham, Brita Robertson, Hervé Tostivint, Peter Wallén, and Sten Grillner. 2016. “The Spinal Cord Has an Intrinsic System for the Control of pH”. Current Biology: CB 26 (10): 1346-51. https://doi.org/10.1016/j.cub.2016.03.048.
Nakamura, Yuka, Miyuki Kurabe, Mami Matsumoto, et al. 2023. “Cerebrospinal Fluid-Contacting Neuron Tracing Reveals Structural and Functional Connectivity for Locomotion in the Mouse Spinal Cord”. eLife 12 (February): e83108. https://doi.org/10.7554/eLife.83108.
Orts-Del’Immagine, Adeline, Riad Seddik, Fabien Tell, et al. 2016. “A Single Polycystic Kidney Disease 2-like 1 Channel Opening Acts as a Spike Generator in Cerebrospinal Fluid-Contacting Neurons of Adult Mouse Brainstem”. Neuropharmacology 101 (February): 549-65. https://doi.org/10.1016/j.neuropharm.2015.07.030.
Orts-Del’immagine, Adeline, Nicolas Wanaverbecq, Catherine Tardivel, Vanessa Tillement, Michel Dallaporta, and Jérôme Trouslard. 2012. “Properties of Subependymal Cerebrospinal Fluid Contacting Neurones in the Dorsal Vagal Complex of the Mouse Brainstem”. The Journal of Physiology 590 (16): 3719-41. https://doi.org/10.1113/jphysiol.2012.227959.
Prendergast, Andrew E., Kin Ki Jim, Hugo Marnas, et al. 2023. “CSF-Contacting Neurons Respond to Streptococcus Pneumoniae and Promote Host Survival during Central Nervous System Infection”. Current Biology 33 (5): 940-956.e10. https://doi.org/10.1016/j.cub.2023.01.039.
Trigo, Federico F., George Papageorgiou, John E. T. Corrie, and David Ogden. 2009. “Laser Photolysis of DPNI-GABA, a Tool for Investigating the Properties and Distribution of GABA Receptors and for Silencing Neurons in Situ”. Journal of Neuroscience Methods 181 (2): 159-69. https://doi.org/10.1016/j.jneumeth.2009.04.022.
Reviewer #2 (Public review):
This manuscript by Hisler, Rees, and colleagues examines the cardiac regenerative ability of two livebearer species, the platyfish and swordtail. Unlike zebrafish, these species lack cortical myocardium and coronary vasculature. Cryoinjury to their hearts caused persistent scarring at 60 and 90 days post-injury and prevented most of the myocardium from regenerating. Although the wound size progressively shrinks and fibronectin content decreases, the myocardial wall does not recover. Transcriptomic profiling at 7 dpi revealed significant differences between zebrafish and platyfish, including alterations in ECM deposition, immune regulation, and signaling pathways involved in regeneration, such as TGFβ, mTOR, and Erbb2. Platyfish exhibit a delayed but chronic immune response, and although some cardiomyocyte proliferation is observed, it does not appear to contribute to myocardial recovery significantly.
Overall, this is an excellent manuscript that tackles a crucial question: do different fish lineages have the ability to regenerate hearts, or is this capability limited to a few groups? Therefore, this work is relevant to the fields of cardiac regeneration and comparative regenerative biology for a broad audience. I am very enthusiastic about expanding the list of species tested for their heart regeneration abilities, and this study is detailed and rigorous, providing a solid foundation for future comparative research. However, there are several aspects where additional work could significantly strengthen the manuscript.
Major comments
(1) Title selection
The title the authors chose suggests that platyfish and swordtails "partially regenerate," but I do wonder how much these animals truly regenerate. This may be a semantic discussion and a matter of personal preference. Still, based on other significant work on regenerative capacity (see, for example, the landmark cavefish regeneration paper PMID: 30462998 or work on medaka PMID: 24947076), the persistence of such a prominent fibrotic scar would be considered a minimal regenerative capacity. Measuring this "partial regeneration" more precisely by comparing zebrafish with platyfish and swordtails would also greatly strengthen the comparisons made here - see below.
The same can be said about line 152-153 - do these hearts "regenerate" with deformation and partial scarring, or would it be more fair to say that they are "healed" or "repaired" with a process that involves fibrosis?
(2) Cross-species comparisons
Having two species of livebearers strengthens the findings of this paper, but the presentation of results from both species is inconsistent. For example, the reader should not be asked to assume that the architecture of the swordtail ventricle is similar to that of the platyfish (line 125). The same applies to the presence or absence of coronary vessels (Figure 1), the reduction in wound area over time (Figure 3), and the immune system's response (Figure 5). Most importantly, the authors miss an opportunity to move from qualitative observations to quantifying the "partial regeneration" phenotype they observe. Specifically, providing a side-by-side comparison between these new species and zebrafish would help define the extent of differences in regeneration potential. For instance, in Figure 6, while the authors provide excellent quantification of PCNA staining in platyfish, these data are less meaningful without a direct comparison with zebrafish results. The same applies to Figures 6E and 6F - although differences are noted, quantifying these results would enable a more rigorous assessment of the process.
(3) Lack of coronary vasculature
There is a growing body of evidence highlighting the importance of the coronary vessels during zebrafish heart regeneration (PMIDs: 27647901, 31743664). Surprisingly, this finding has not been integrated or discussed in the context of this literature.
The results of the alkaline phosphatase assay and anti-podocalyxin-2 staining appear inconsistent. Specifically, in Supplementary Figure 1L-M, we can see some vessels covering the bulbus arteriosus and also what appears to be a signal in the ventricle. However, in Figures 1K and 1L, we cannot see any vessels, even in the bulbus. The authors should also be more rigorous and add a description of how many animals were analyzed, their ages, and sizes. In zebrafish, the formation of the coronary arteries appears to depend on animal size and age. With the data provided, we cannot say whether this is a one-time observation or a consistent finding across many animals at different ages and across both species.
The link between livebearers' responses and pseudoaneurysms is overstated. This work is already extremely relevant without trying to make it medically oriented.
Author response:
Reviewer #1:
Minor Weaknesses:
"Transcriptomic analysis was only done for one time point. Different time points could be included to validate whether some processes occur at different time points. But this can be done in the future for more detailed studies."
Our response regarding time points of transcriptomic analysis:
We appreciate this constructive suggestion. We fully agree that performing RNA-seq at multiple time points would provide valuable insights into the temporal dynamics of molecular pathways during cardiac regeneration. However, given that our study represents the first comprehensive characterization of cardiac regeneration in poeciliids, we deliberately focused our resources on establishing the foundational framework, including morphological, cellular, and initial transcriptomic analyses between zebrafish and platyfish. Expanding to multiple time points would constitute a substantial additional study that, while scientifically valuable, would extend beyond the scope of this initial characterization.
We will acknowledge this limitation in the Discussion and indicate that temporal transcriptomic profiling is an important direction for future investigation.
Reviewer #2:
(1) Title selection
Our response regarding the use of the term “partially regenerate” in the title and results:
We thank Reviewer 2 for this important point regarding the terminology used to describe the cardiac response in platyfish and swordtails. We agree that the term "partially regenerate" may overstate the regenerative capacity of these species, particularly given the persistence of a substantial collagenous scar at the injury site. The reviewer is correct that, based on established criteria in the field, including the landmark studies on cavefish (PMID: 30462998) and medaka (PMID: 24947076), the presence of such prominent fibrotic scarring would be more appropriately characterized as limited or minimal regenerative capacity rather than partial regeneration.
While we observe a significant reduction in wound volume at 30 dpci and some degree of tissue remodeling, we acknowledge that the persistent scarring and incomplete myocardial recovery more accurately reflect a healing or repair process rather than true regeneration. We therefore agree with the reviewer's suggestion to revise our terminology throughout the manuscript.
We will revise the title to: "The livebearers platyfish and swordtails heal their hearts with persistent scarring." We will also modify other relevant sections of the Results and Discussion to consistently describe these processes as "healing" or "repair" rather than "regeneration", while still acknowledging the biological changes that do occur (wound contraction, remodeling, limited cardiomyocyte proliferation). This revised framing better aligns our work with the established terminology in the comparative cardiac regeneration literature and more accurately represents the phenotype we observe.
We believe this change will strengthen the manuscript by providing a more precise characterization of the cardiac response in these species and facilitating clearer comparisons with other model systems.
(2) Cross-species comparisons
Our response regarding the inconsistent presentation of results for different species:
We thank the reviewer for recognizing that our conclusions regarding the regenerative capacity of livebearers are strengthened by including two poeciliid species, platyfish and swordtails. We agree that presenting results more consistently across both species will significantly improve the manuscript. We acknowledge that our current presentation creates a burden on the reader by asking them to assume similarities between species without providing supporting data. While we initially focused primarily on platyfish due to its superior genome annotation (critical for our transcriptomic analyses), we recognize that this approach left important gaps in the manuscript.
We will address this by generating comprehensive supplementary figures that present swordtail data alongside platyfish for key findings. Specifically, we will add a complete anatomical characterization of swordtail ventricle architecture, demonstrating the structural similarities to platyfish that underpin our comparative conclusions. We will also perform quantification of wound area reduction and immune response dynamics over time in swordtails, allowing direct comparison between species.
We clarify that we did perform detailed analyses of swordtail heart anatomy during our initial studies, which revealed remarkable similarity to platyfish. However, space constraints in Figures 1 and S1 (which already span full pages with zebrafish-platyfish comparisons) prevented us from including these data in the original submission. We now recognize that explicitly presenting these data is essential for the reader to evaluate our conclusions.
Our response regarding quantification and comparison with zebrafish:
We appreciate the reviewer's suggestion to move beyond qualitative observations toward rigorous quantification of the "partial regeneration" phenotype. As suggested by the reviewer for the PCNA analysis, we will provide direct quantitative comparisons with published zebrafish regeneration studies, including data from several relevant studies and our own lab's work. This comparison will delineate the extent of differences in proliferative response between complete regenerators (zebrafish) and limited regenerators (poeciliids).
These additions will transform our descriptive observations into quantitative assessments that rigorously define the incomplete healing phenotype in poeciliids relative to complete regeneration in zebrafish. We believe these changes will substantially strengthen the manuscript and address the reviewer's concerns about comparative rigor.
(3) Lack of coronary vasculature
Our response regarding inconsistencies in vascularization data:
We thank the reviewer for his/her comment regarding our data on the absence of coronary vasculature in the platyfish heart. The reviewer noted differences between alkaline phosphatase (AP) enzymatic staining and anti-Podocalyxin-2 immunofluorescence staining. We would like to clarify that these observed differences are not inconsistencies but rather reflect the distinct specificities of these two complementary approaches.
Alkaline phosphatase staining is selective for arterial branches and capillaries in the heart (PMID: 13982613; PMID: 9477306; PMID: 8245430; PMID: 3562789; PMID: 29023576; PMID: 28632131) and revealed a typical vascular pattern in the bulbus arteriosus and ventricle in zebrafish but not in platyfish. Anti-Podocalyxin-2 staining displayed a vessel-like pattern in zebrafish but not in platyfish. However, in both species Podocalyxin staining also labeled other types of non-vascular structures. This is expected given that Podocalyxin is a cell surface sialomucin with broader expression beyond blood vessels, including the endocardium (PMID: 19142011) and certain neuronal populations, in addition to other non-cardiac tissue types (PMID: 19578008; PMID: 3511072; PMID: 34201212).
We will revise the manuscript to emphasize this distinction and clarify our rationale: we deliberately employed Podocalyxin-2 staining as a complementary, less selective approach to corroborate our alkaline phosphatase findings. In platyfish, the convergent evidence from both methods (the absence of typical vascular structures with a selective AP staining and the detection of only non-vascular patterns with the broader marker Podocalyxin-2) strengthens our conclusion that platyfish hearts lack a conventional coronary vascular network.
Our response regarding reproducibility:
The assays were performed independently by two researchers at different stages of the study using two different batches of adult platyfish. The results were consistent in both assays, and we are therefore confident in the reproducibility of our findings.
Our response regarding citations of references on revascularization:
We thank the reviewer for recommending the studies PMID: 27647901 and PMID: 31743664 that revealed the importance of rapid revascularization during heart regeneration in zebrafish. We will be pleased to integrate these works to present our data in the appropriate context of current knowledge.
Our response regarding a link to pseudoaneurysms:
We appreciate the reviewer's feedback regarding the link to pseudoaneurysm. We agree that the primary contributions of our work stand on their own merit, and we will revise the text to present the livebearer findings more cautiously without overstating their potential medical relevance. We will focus on the intrinsic biological significance of our findings.
Author response:
The following is the authors’ response to the previous reviews.
Public Reviews:
Reviewer #1 (Public review)
Mitochondrial staining difference is convincing, but the status of the mitochondria (fused vs. fragmented, elongated vs. spherical) does not seem convincing. Given the density of mito staining in CySCs, it is difficult to tell whether a given signal is an elongated or fused mito or the overlap of several smaller mitos.
To address this, we have now removed the statements regarding differences in the shape of mitochondria among the stem cell populations. We have limited our statements to stating that the CySCs are more mitochondria-dense compared to the neighbouring GSCs.
The quantification and conclusions about the gstD1 staining in CySCs vs. GSCs are just not convincing: I cannot see how they were able to distinguish the relevant signals to quantify one cell type vs. the other.
We appreciate the reviewer’s concern. To address this, we have included new images along with z-stack reconstructions (Fig 1G-P and S1C-D’’’), which now provide a clearer distinction of gstD1 staining between CySCs and GSCs and improve the accuracy of quantification. The intensity of gstD1 staining overlapping with that of the Vasa+ zone was quantified as the ROS level for GSCs. Similarly, the cytoplasmic area of gstD1 stain bounded by Dlg and Tj+ nuclei was quantified as the ROS level for CySCs.
Images do not appear to show a reduced number of GSCs. If they counted GSCs at the niche, then that is the correct way to do it, but it's odd that they chose images that do not show the phenotype. Further, their conclusion of a reduced germline overall, e.g. by Vasa staining, does not appear to be true in the images they present, and their implication that lower Vasa equals fewer GSCs is invalid, since all the early germline expresses Vasa.
We have replaced the figure with images where the GSC rosette is clearly visible, ensuring that the counted GSCs at the niche accurately reflect the phenotype (Fig. 2 C’’, D’’). We agree that Vasa is expressed in all early germline cells. The overall reduced Vasa signal intensity in our western blot analysis for Sod1RNAi reflects a general reduction in the germline population, not just the GSCs. We have modified our statements in the Results appropriately.
However, the effect on germline differentiation is less clear: the images shown do not really demonstrate any change in Bam expression that I can tell, which is even more confusing given the clear effect on cyst cell differentiation.
We appreciate the reviewer’s observation. To clarify this point, we have now included z-stack projection images of Bam expression in the revised version (Fig 3E’’-F’’).
These images more clearly demonstrate the difference in Bam expression, thereby highlighting the effect on germline differentiation. Moreover, Bam-expressing cells are present closer to the hub in the Sod1RNAi condition, indicating early differentiation.
For the last figure, any effect of SOD OE in the germline on the germline itself is apparently very subtle and is within the range observed between different "wt" genetic backgrounds.
We acknowledge that the effect of SOD overexpression on the germline is not very pronounced. The germline cells already carry a modest ROS load, and it is well established that they possess a robust antioxidant defence machinery to protect the genome. Therefore, elevating the levels of antioxidant enzymes such as Sod1 does not translate into a major change, and the effects observed are generally subtle.
Reviewer #3 (Public review)
In Fig. 1N (tj-SODi), one can see that all of gst-GFP resides within the differentiating somatic cells and none is in the germ cells. Furthermore, the information provided in the materials and methods about quantification of gst-GFP is not sufficient. Focusing on Dlg staining is not sufficient. They need to quantify the overlap of Vasa (a cytoplasmic protein in GSCs) with GFP.
In our analysis, we quantified the GFP intensity in the area of overlap between gstD1-GFP and the Vasa-positive zone in germ cells in direct contact with the hub, in order to accurately measure the ROS reporter signal within the germline compartment. Further, to ensure accurate cell-boundary demarcation, we used Dlg staining as an additional parameter. While Dlg staining alone was included in the figure panels for clarity of visualization, the actual quantification was performed by considering both Vasa (for germ cell cytoplasm) and Dlg (for cellular boundaries). This has been clarified in the Materials and Methods.
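For concreteness, the mask-restricted measurement described above (mean reporter intensity over pixels inside a compartment mask) reduces to a few lines of plain Python. This is a generic illustration, not the authors' actual image-analysis pipeline; the toy intensities, the threshold, and the function name are invented for the example.

```python
def mean_intensity_in_mask(reporter, mask):
    """Mean reporter intensity (e.g. gstD1-GFP) over pixels lying inside a
    binary compartment mask (e.g. a thresholded Vasa channel)."""
    vals = [v for row_v, row_m in zip(reporter, mask)
            for v, inside in zip(row_v, row_m) if inside]
    if not vals:
        raise ValueError("empty mask: no pixels fall in the compartment")
    return sum(vals) / len(vals)

# Toy 3x3 "image": reporter intensities and a mask of the Vasa-positive zone.
reporter = [[10, 10, 0],
            [10, 50, 50],
            [0, 50, 50]]
vasa_mask = [[v >= 50 for v in row] for row in reporter]  # stand-in threshold
print(mean_intensity_in_mask(reporter, vasa_mask))  # 50.0
```

In practice the mask would come from segmenting the Vasa channel (with Dlg marking cell boundaries), and the same operation would be repeated per cell rather than over the whole field.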
Additionally, since Tj-gal4 is active in hub cells, it is not clear whether the effects of SOD depletion also arise from perturbation of niche cells.
We acknowledge that Tj-Gal4 also shows minimal activity in hub cells. To address this, we tested C587-Gal4 and observed similar effects on niche architecture, though weaker than with Tj-Gal4, underscoring the effect of ROS originating from CySCs.
First, the authors are studying a developmental effect, rather than an adult phenotype. Second, the characterization of the somatic lineage is incomplete. It appears that high ROS in the somatic lineage autonomously decreases MAP kinase signaling and increases Hh signaling. They assume that the MAPK signaling is due to changes in Egfr activity, but there are other tyrosine kinases active in CySCs, including PVR/VEGFR (PMID: 36400422), that impinge on MAPK. In any event, their results are puzzling because lower Egfr should reduce CySC self-renewal and CySC number (Amoyel, 2016) and the ability of cyst cells to encapsulate gonialblasts (Lenhart Dev Cell 2015). The increased Hh should increase CySC number and the ability of CySCs to outcompete GSCs. The fact that the average total number of GSCs declines in tj>SODi testes suggests that high-ROS CySCs are indeed outcompeting GSCs. However, as I wrote in my first critique, the characterization of the high-ROS soma is incomplete. And the role of high ROS in the hub cells is acknowledged but not investigated.
We acknowledge the reviewer’s concern that our study primarily examines a developmental effect. Our rationale was that redox imbalance during early stages can set long-term trajectories for stem cell behavior and niche organization, which ultimately manifest in adult testes.
We agree that evaluating Erk levels alone may not reflect the actual status of EGFR signalling, and there is an apparently contradictory observation of low Erk alongside high CySC self-renewal. We believe this ROS-mediated change in Erk status, resulting in high CySC proliferation, might be the outcome of an interplay among other RTKs beyond EGFR. While the expansion of CySCs is primarily governed by Hh, a detailed dissection of these pathways under an altered redox environment will be interesting work to develop in future. Regarding GSC number, it cannot be definitively stated that high-ROS CySCs are outcompeting the GSCs, but that possibility exists in parallel. In the present case, however, there is no denying that ROS levels in GSCs are indeed high under the high CySC-ROS condition. It is known that ROS imbalance in GSCs promotes their differentiation, which was also observed in the present study through Bam staining. Therefore, a redox-mediated reduction in GSC number cannot be completely ruled out. We have discussed these points in the revised manuscript and suggest possible non-canonical effects of ROS on signal integration within CySCs that might reconcile these findings. Further, in the present study, we have focused on the redox interplay between the two stem cell populations (GSCs and CySCs) of the niche. Hence, we have not covered redox profiling of the hub in detail.
The paragraph in the introduction (lines 62-76) mentions autonomous ROS levels in stem cells, not the transfer of ROS from one cell to another. And this paragraph is confusing because it starts with the (inaccurate) statement that all stem cells have low ROS, and then discusses ISCs, which have high ROS.
We have revised the paragraph for clarity. It now distinguishes between stem cell types with low versus relatively high ROS requirements (e.g., ISCs, HSCs, NSCs) and includes recent evidence of non-autonomous ROS signaling, such as paracrine ROS action from pericardial cells to cardiomyocytes and gap-junction–mediated ROS waves in cardiomyocyte monolayers. This resolves the ambiguity and presents a balanced view of autonomous and nonautonomous ROS regulation.
While there has been an improvement in the scholarship of the testis, there are still places where the correct paper is not cited and issues with the text.
All concerns regarding missing or incorrect citations and textual issues have now been carefully addressed and corrected. Relevant references have been added in the appropriate places to ensure accuracy.
The authors are encouraged to more completely characterize the phenotype of high ROS in hub and CySCs.
We have now included improved images showing the respective ROS profiles of GSCs, CySCs, and the hub. As mentioned in the earlier response, this work focuses on the redox interplay between GSCs and CySCs; hence, we have not included a detailed analysis of the hub. However, we agree with the reviewer that hub contributions should also be evaluated as a future direction.
Reviewer #2 (Public review):
Summary:
This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).
Strengths:
This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.
Weaknesses:
(1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.
(2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?
(3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?
(4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.
(5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.
Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.
Strengths:
The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.
Weaknesses:
The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.
In the revised version we will highlight the human relevance of our work in the discussion section. Additionally, the data and code are now publicly available on the Single Cell Portal and GEO; the accession numbers have been updated.
Reviewer #2 (Public review):
Summary:
This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).
Strengths:
This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.
Weaknesses:
(1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) into BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.
(2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?
(3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?
(4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.
(5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.
(1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations and we will qualify our interpretation accordingly in the text. We will also add clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, VP-BECs were mislabeled as BECs, which led to the confusion.
(2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).
Comparing proportions of cells at different ages, as the eye grows, needs to be done cautiously. Notwithstanding these limitations, the proportions of the TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, since the proportions at P14 are similar to those in the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14 we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in adults) and so are located closer to the corneal cut during dissection of P21 eyes (which, despite being larger than at younger ages, are still small and more delicate to dissect accurately than at P60) and are therefore more likely to be lost. Additional details are provided in the Methods section.
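The per-timepoint cluster proportions being compared here are, in general, just normalized counts of cells per (timepoint, cluster) pair. A minimal stdlib sketch (the timepoint and cluster labels and counts below are illustrative only, not the study's data):

```python
from collections import Counter, defaultdict

def cluster_proportions(cells):
    """cells: iterable of (timepoint, cluster) pairs.
    Returns {timepoint: {cluster: fraction of that timepoint's cells}}."""
    by_tp = defaultdict(Counter)
    for timepoint, cluster in cells:
        by_tp[timepoint][cluster] += 1
    return {tp: {cl: n / sum(counts.values()) for cl, n in counts.items()}
            for tp, counts in by_tp.items()}

# Toy per-cell metadata (invented values).
cells = [("P14", "TM1"), ("P14", "TM2"), ("P14", "TM3"),
         ("P21", "TM1"), ("P21", "TM1"), ("P21", "TM2"), ("P21", "TM3")]
props = cluster_proportions(cells)
print(props["P21"]["TM1"])  # 0.5
```

As the response notes, such proportions are only comparable across ages if the dissection captures the same tissue compartments at each stage.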
(3) For Figure 3a and b, we will work to add clarity by providing additional annotations and an additional illustration.
(4) We will include a table listing potential new markers for developing SC and TM populations.
(5) We will annotate the genes associated with DG in Figure S20.
Universal pre-school education to support school readiness before first grade
Universal pre-school education aims to ensure that all children enter Grade 1 with basic language, cognitive, and social readiness. In India, this is feasible by strengthening and integrating the existing Anganwadi system with structured early childhood curricula and teacher training. Global examples from Finland, France, and the UK show that universal early childhood education reduces early learning gaps and improves long-term educational outcomes, especially for disadvantaged children.
Why are Anganwadi reforms challenging? 1. Anganwadis were designed for nutrition and care, not education. Hence, the centers are not properly equipped, nor are the staff trained for instruction. And if early schooling is done badly, it can do major damage to a child's curiosity.
An Anganwadi worker today is already responsible for nutrition, health, and care duties.
Now we expect them to:
* Teach early literacy
* Build number sense
* Do classroom management
* Track learning progress
Without:
* Deep training
* Time
* Support staff
This isn’t resistance - it’s a capacity mismatch.
Early childhood pedagogy is deceptively hard. The biggest misconception is that teaching young children is easy. In reality, it involves knowing how a child's brain develops, designing play that quietly builds skills, managing short attention spans, and scaffolding language through conversation. (Language scaffolding through conversation means helping a child develop language step by step by talking with them in a guided way, instead of just teaching words or letters directly.)
Coordination problem - Anganwadis operate at the intersection of the women and child development, health, and education systems, but are governed primarily as welfare units rather than educational institutions. As a result, there is no clear ownership or accountability for learning outcomes.
Security
This section should not exist; it is a generation error.
Briefing: Educating for Career Guidance, from Primary School to Parcoursup
This document summarizes the key perspectives and strategies concerning career guidance education in the French school system, from primary school through higher education.
It emerges that guidance is a continuous, complex process that must begin at a very young age to be effective.
The main challenge is to combat the social, geographic, and gender determinisms that take shape very early.
Teachers play a central role, not merely an informational one, but as major actors enabling students to broaden their horizons and envision a concrete future.
• Guidance is a process, not a one-off act: it must not be reduced to the final choice on a platform such as Parcoursup, but be a supported journey throughout a student's schooling.
• Early intervention is essential: actions carried out from primary school onward are fundamental to "opening the field of possibilities" before stereotypes crystallize.
• Combating determinisms is the overriding objective: the school's mission is to enable every young person to free themselves from their gender, social origin, or territory and build their own path.
• A collaborative approach is indispensable: successful support rests on coordination among the various actors: teachers, parents, national education psychologists (PsyEN), and external partners such as the Régions.
• Developing transversal skills is central: beyond discovering occupations, the aim is to develop the student's self-knowledge, critical thinking, and ability to gather information and project themselves into an uncertain world.
Guidance is a polysemous concept, perceived differently by each actor involved in the student's journey. Understanding these perspectives is essential to building coherent support.
• For parents: guidance is often defined by a final placement. Their main interest is seeing their child thrive in an environment of their choosing.
• For teachers: the objective is to ensure the student's success along their path. They observe the adolescent's skills in order to steer them toward programs where they can grow their potential and succeed.
• For specialists (PsyEN): the emphasis is on psycho-cognitive development. They support students in projecting themselves, which involves accepting losses and giving up certain options, a process that can be complex during adolescence.
These actors, joined by the Région, which holds the responsibility for information on occupations and training programs, must coordinate to offer comprehensive support.
Yoril Baudoin, DRAIO: "The various guidance actors in the education system must work in concert to support the young person in their plans and in the fulfilment of their dreams."
The "Plan Avenir", and more specifically its "multi-year career guidance education plan", formalizes the need to view guidance as a continuous trajectory.
• A pathway vision: the plan aims to create a fluid articulation between primary school, collège, lycée, and higher education. It emphasizes the "pathway" rather than the one-off "project".
• The importance of primary school: the involvement of the primary level, although recent, is considered crucial.
Estelle Blanchard, a primary school principal, stresses that primary students already ask themselves questions about their future but lack reference points.
The goal is therefore to "open the field of possibilities" from preschool onward, first by discovering the parents' occupations and then by progressively broadening horizons.
• The strengthened role of teachers: the plan recalls that teachers are the first in contact with students and have a significant impact on their trajectories.
Their mission is to act actively against determinisms so that every student can say to themselves: "And why not me?"
Yoril Baudoin, DRAIO: "[The Plan Avenir means] telling teachers: you have a very strong role to play in tackling determinisms of any kind, and in allowing young people to say to themselves, 'Well, school can serve as my social elevator, or at least pull me out of a preassigned place, provided that is my choice.'"
One of the central objectives of guidance education is to correct the inequalities that constrain students' choices. These take several forms.
Particularly acute in rural and isolated territories, geographic determinism manifests itself in:
• Default choices: students with good academic results turn to vocational tracks out of resignation, simply because the nearest general lycée is too far away.
• Fear of the unknown: families and students are apprehensive about "big cities", which is a major brake on ambitious plans.
• Concrete solutions put in place:
◦ Lycée visits from CM1-CM2 onward: to help students project themselves and discover the existing programs.
◦ Involving families: inviting parents to take part in these visits to "take the drama out of the move to lycée".
◦ Strategic school outings: organizing trips to large cities (e.g. Lille) using public transport (train, metro) to familiarize students and parents with that environment.
This bias affects not only students but also educators' perceptions.
• Self-censorship by those providing support: faced with a student from a modest background, the tendency can be to propose a "small-steps project" (e.g. Bac Pro then BTS) in anticipation of financial difficulties, an approach that would not be taken with a student from a privileged background.
• Proposed solutions:
◦ Informing about rights: communicating explicitly about financial aid and support schemes (grants, CROUS services) to remove financial barriers.
◦ Involving social services in key guidance moments, such as careers fairs.
Gender stereotypes take root from the earliest age and strongly influence professional projections.
• Stereotyped projections: in preschool, choices are highly gendered ("princess", "knight").
The work consists in deconstructing these clichés (e.g. princesses can be strong and save themselves).
• The risk of reinforcing stereotypes: interventions by professionals can be counterproductive.
A company director asserting that industry needs girls because they are "meticulous and more serious" entrenches the stereotype instead of combating it.
• The limits of "role models": presenting figures with exceptional trajectories can be intimidating and perceived as out of reach, rather than inspiring.
The ideal is to present closer models, within a "zone of proximal development", with whom young people can identify.
Students' accounts reveal the weight of those around them in final decisions.
• The dilemma of whom to listen to: a final-year (terminale) student describes being torn between her teachers' advice (based on her grades) and her parents' (based on her personality and how she handles stress).
• Choices made "for" the parents: one student recounts having chosen a general lycée "for his mother", when he aspired to a hands-on vocational track.
This path, although constrained, allowed him to mature his plan and argue for it better afterwards.
Guidance education is not limited to providing information; it aims to equip students with a set of skills for navigating their path.
| Strategy | Description | Objectives |
| --- | --- | --- |
| The skills framework (Référentiel de Compétences) | Official framework for collège and lycée that maps out the skills to develop (e.g. gathering information, projecting oneself). | Embed guidance within every subject rather than treating it as a side activity. Develop critical thinking, particularly toward online information. |
| Introspection and self-knowledge | Work on psychosocial skills: self-esteem, confidence, interpersonal skills, learning to learn. | Enable students to know their strengths and weaknesses and become aware of their abilities through tools such as self-assessment. |
| Immersion and experience | Concrete hands-on situations: open days at lycées, internships, "classe en entreprise" (a class relocated once a week to a local company). | Make learning concrete by "working through doing". Facilitate projection and demystify the professional world. Peer interaction is a powerful lever. |
| Managing "big dreams" | Supporting students who aspire to highly selective careers (footballer, influencer) without crushing their ambitions. | Dreams are an essential driver. The educator's role is to help the student develop the necessary skills, and the reflexivity to adjust the plan if needed. |
| Artificial intelligence (AI) | Using tools such as chatbots (e.g. Orian) to provide information. | The risk is short-circuiting the human journey. The opportunity is to use AI for documentary research, freeing up time for quality human support. |
One student's account of the anxiety generated by the final choices on Parcoursup illustrates the failure of a last-minute approach.
Analysis by Yoril Baudoin: "What she is describing is that, at some point, guidance gets reduced to an administrative act in which a choice must be made, and suddenly one is not ready [...] whereas we want to work collegially to show that it is a journey."
Reviewer #3 (Public review):
Summary:
This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).
Strengths:
The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.
Weaknesses:
The gain in interpretability is only mentioned, not elaborated upon or leveraged for insight. The following aspects could have been better documented: the hyperparameter search used to establish the optimal model, and the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.
Reviewer #3 (Public review):
In this manuscript, Moss et al. demonstrate that Hsp70 phosphorylation at a conserved threonine residue integrates DNA damage responses with cell-cycle control. The authors present unbiased biochemical, cell-based, and yeast genetic analyses showing that phosphorylation of human Hsp70 at T495 (and the analogous Ssa1 T492 in yeast) is triggered by base-excision-repair intermediates and downstream DDR kinase activity, leading to delayed G1/S progression after DNA damage. They used orthogonal approaches such as ATPase assays, phospho-specific detection, kinase-inhibition studies, synchronization experiments, and phenotypic analyses of phosphomutants. They presented robust data that collectively supported the conclusion that dynamic Hsp70 phosphorylation functions as a conserved "molecular brake" to prevent inappropriate S-phase entry under genotoxic stress. However, there are a few minor questions and clarifications that the authors are well-positioned to address.
Reviewer #3 (Public review):
Summary:
This study evaluates the contributions of the mammalian PG-binding protein PGLYRP1 to Bordetella infection. The authors find potential roles for PGLYRP1 in both bacterial killing (canonical) and regulation of inflammation (non-canonical). While these are interesting findings, and the idea that PG fragment release has differential impacts on infection depending on fragment structure is intriguing, the study is limited by the lack of connection between the in vivo and in vitro experiments, and determining the precise mechanism of how PGLYRP1 regulates host responses and bacterial fitness during infection requires further study.
Strengths:
(1) The combination of scRNAseq with in vitro and in vivo assays provides complementary views of PGLYRP1 function during infection.
(2) The use of TCT-deficient B. pertussis provides a useful control and perturbation in the in vitro assays.
Weaknesses:
(1) The study does not ultimately resolve the initial early versus late phenotype divergence. While the in vitro assays suggest explanations for their in vivo observations, further mechanistic links are lacking and necessary for the authors' conclusions throughout. To give one example, what is the early and late infection phenotype of TCT- Bp in mice lacking PGLYRP1? RNAseq data are reported from these mice, but there are no burden or pathology studies. Furthermore, what are the neutrophil phenotypes (NOD-1/TREM-1 activation) in vivo? And are they dependent on PGLYRP1 and/or TCT?
(2) It is unclear whether or how the NOD1 and TREM-1 pathways interact.
(3) Many of the study's conclusions rely on the use of HEK293 reporter lines in the absence of bacterial infection, which may not be physiologically representative.
(4) The methods lack detail overall, and the experimental procedures should be described more concretely, especially for the scRNAseq datasets.
Reviewer #3 (Public review):
The paper is well written and well presented. The topic is important, and its significance is explained succinctly and accurately. I am only capable of reviewing the clinical aspects of this work, which is very largely technical in nature. Several clinical points are worth considering:
(1) Tendons typically display large magic angle effects as a result of their highly ordered collagen structure (cortical bone much less so), and so it would have been of interest to know what orientation the tendons had to B0 (in vitro and in vivo). This could affect the signal level at the longer echo time and thus the signal on the subtracted images.
(2) The in vivo transverse image looks about mid-forearm, where tendons are not prominent. A transverse image of the lower forearm, where there is an abundance of tendons, might have been preferable.
(3) The in vivo images show the interosseous membrane as a high signal on both the shorter and longer TE images. The structure contains ordered collagen with fibres at different oblique angles to the radius and ulna, and thus potentially to B0. Collagen fibres may have been at an orientation towards the magic angle, and this may account for the high signal on the longer TE image and the low signal on the subtracted image.
(4) Some of the signals attributed to the muscle may be from an attachment of the muscle to the aponeurosis.
(5) There is significant collagen in subcutaneous tissues, so the designation "skin" may more correctly be "skin and subcutaneous tissue".
(6) Cortical bone is very heterogeneous, with boundaries between hard bone and soft tissue with significant susceptibility differences between the two across a small distance. This might be another mechanism for ultrashort T2* tissue values in addition to the presence of collagen. The two effects might be distinguished by also including a longer TE spin echo acquisition. Solid cortical bone may also have an ultrashort T2* in its own right.
(7) It may be worth noting that in disease T2* may be increased. As a result, the subtraction image may make abnormal tissue less obvious than normal tissue. Magic angle effects may also produce this appearance.
(8) It may be worth distinguishing fibrous connective tissue (loose or dense), which may be normal or abnormal, from fibrosis, which is an abnormal accumulation of fibrous connective tissue in damaged tissue. Fibrosis typically has a longer T2 initially and decreases its T2* over time. In places, the context suggests that fibrous connective tissue may be more appropriate than fibrosis.
Overall, the paper appears very well constructed and describes thoughtful and important work.
Reviewer #1 (Public review):
Summary:
This manuscript analyzes a large dataset of [NiFe]-CODHs with a focus on genomic context and operon organization. Beyond earlier phylogenetic and biochemical studies, it addresses CODH-HCP co-occurrence, clade-specific gene neighborhoods, and operon-level variation, offering new perspectives on functional diversification and adaptation.
Strengths:
The study has a valuable approach.
Weaknesses:
Several points should be addressed.
(1) The rationale for excluding clades G and H should be clarified. Inoue et al. (Extremophiles 26:9, 2022) defined [NiFe]-CODH phylogenetic clades A-H. In the present manuscript, clades A-H are depicted, yet the analyses and discussion focus only on clades A-F. If clades G and H were deliberately excluded (e.g., due to limited sequence data or lack of biochemical evidence), the rationale should be clearly stated. Providing even a brief explanation of their status or the reason for omission would help readers understand the scope and limitations of the study. In addition, although Figure 1 shows clades A-H and cites Inoue et al. (2022), the manuscript does not explicitly state how these clades are defined. An explicit acknowledgement of the clade framework would improve clarity and ensure that readers fully understand the basis for subsequent analyses.
(2) The co-occurrence data would benefit from clearer presentation in the supplementary material. At present, the supplementary data largely consist of raw values, making interpretation difficult. For example, in Figure 3b, the co-occurrence frequencies are hard to reconcile with the text: clade A shows no co-occurrence with clade B and even lower tendencies than clades E or F, while clade E appears relatively high. Similarly, the claim that clades C and D "more often co-occur, especially with A, E, and F" does not align with the numerical trends, where D and E show stronger co-occurrence but C does not. A concise, well-organized summary table would greatly improve clarity and prevent such misunderstandings.
(3) The rationale for analyzing gene neighborhoods at the single-operon level needs clarification. Many microorganisms encode more than one CODH operon, yet the analysis was carried out at the level of individual operons. The authors should clarify the biological rationale for this choice and discuss how focusing on single operons rather than considering the full complement per organism might affect the interpretation of genomic context.
when you join the self care studio
Can you turn this section into a preview of what's ahead? As soon as you join... after a month... after 3 months
Post #200 – The Corona 3 tripod<br /> by [[Scott K]] in The Filthy Platen on 2014-12-21<br /> accessed on 2025-12-30T15:29:02
New in the collection, pt. 3: Canon PW-10/15/30<br /> by [[Marcin Wichary]] in Shift Happens newsletter<br /> accessed on 2025-12-30T15:23:12
Client Portal | React, Leaflet, MUI, Express.js, MySQL, Auth0, AWS, H3 Sept. 2025 – Dec. 2025
i do like the way you have the project broken down into sub-bullets, but i'd say take maybe the top 2-3 that had the most impact/sound most impressive and leave the rest for interview talking points. if you have any metrics you could throw in here for impact like time/money saved, that would be great too but no need to force it if you can't think of anything
We mutated these lysine residues to glutamines, and expressed the mutants (Arf1/6PAML-3KQ and Arf1/6IQ-4KQ) in mammalian cells.
This is the first section of the text where '3KQ' and '4KQ' are explained, both in what they mean and why the mutations were carried out. However, these first come up in Figure 3. For that reason, I would move up the mention of these definitions in the text, or add it into the figure description.
The key participants in the MCP architecture are: MCP Host: The AI application that coordinates and manages one or multiple MCP clients MCP Client: A component that maintains a connection to an MCP server and obtains context from an MCP server for the MCP host to use MCP Server: A program that provides context to MCP clients
The MCP architecture has three pieces: the host (the application, AI-powered or not, that coordinates the interaction with MCP clients); the MCP client, which interacts with a single server; and the MCP server, which provides the context, i.e. abstracts access to other sources (filesystem, database, API, etc.). A server can serve one or multiple clients.
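The three roles above can be sketched as JSON-RPC 2.0 messages, which is the wire format MCP uses. This is a minimal illustration under assumptions, not a real MCP SDK: the method names follow the spec, but `demo-host` and the params shown are placeholder values.

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request, as an MCP client sends to its one server."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# The host spins up one client per server; each client first negotiates
# capabilities with its server...
init = make_request(1, "initialize", {"clientInfo": {"name": "demo-host"}})
# ...then asks what context the server exposes (tools, resources, prompts).
list_tools = make_request(2, "tools/list")

wire = json.dumps(init)  # what actually goes over stdio or HTTP
```

The one-client-one-server pairing is what lets a single host fan out to many servers while each connection stays simple.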
A comparison between VS Code and Obsidian. Doesn't state the obvious: any text editor can do this. The tools are just viewers and do not contain the data, which is part of your filesystem. Cf. [[3 Distributed Eigenschappen 20180703150724]]
Reviewer #1 (Public review):
Summary:
This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC)-a sensory processing area-but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.
Strengths:
(1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.
(2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.
(3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.
Weaknesses:
(1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age. Authors pointed out that a similar sample size has been used in previous iEEG studies, but the cited works focus on adults and do not look at the developmental perspectives. Similar work looking at developmental changes in iEEG signals usually includes many more subjects (e.g., n = 101 children from Cross ZR et al., Nature Human Behavior, 2025) to account for inter-subject variabilities.
(2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, making the conclusion regarding the different developmental changes between DLPFC and pSTC hard to interpret (related to point 3 below). It is understood that it is rare to have such iEEG data collected in this age group, and the electrode location is only determined by clinical needs. However, the scientific rigor should not be compromised by the limited data access. It's the authors' decision whether such an approach is valid and appropriate to address the scientific questions, here the developmental changes in the brain, given all the advantages and constraints of the data modality.
(3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories. Also, see comments in point 2.
(4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing. Agree that this might be beyond the scope of this paper, but a discussion section might be insightful.
(5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed. The validation of AI model's emotional output needs to be tested. It is understood that we cannot collect these behavioral data retrospectively within the recorded subjects. Maybe potential post-hoc experiments and analyses could be done, e.g., collect behavioral, emotional perception data from age-matched healthy subjects.
(6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses. At least some behavioral measures between the patient population and the healthy groups should be done to ensure the perception of emotions is similar.
(7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were downsampled and averaged in 500-ms windows. It seems like the authors are trying to compromise the iEEG data analyses to match up with the AI's output resolution, which is 2Hz. It is not clear then why not directly use fMRI, which is non-invasive and seems to meet the needs here already. The advantages of using iEEG in this study are missing here.
(8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to. Related to point 5 as well.
Comments on revisions:
A behavioral measurement will help address a lot of these questions. If the data continues collecting, additional subjects with iEEG recording and also behavioral measurements would be valuable.
Reviewer #2 (Public review):
Summary:
In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.
Strengths:
(1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.
(2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.
(3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.
(4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.
Weaknesses:
(1) While the authors leverage Hume AI, a tool pre-trained on a large dataset, its specific performance on the stimuli used in this study remains unverified. To strengthen the foundation of the analysis, it would be important to confirm that Hume AI's emotional classifications align with human perception for these particular videos. A straightforward way to address this would be to recruit human raters to evaluate the emotional content of the stimuli and compare their ratings to the model's outputs.
(2) Although the study includes data from four children with pSTC coverage-an increase from the initial submission-the sample size remains modest compared to recent iEEG studies in the field.
(3) The "post-childhood" group (ages 13-55) conflates several distinct neurodevelopmental periods, including adolescence, young adulthood, and middle adulthood. As a finer age stratification is likely not feasible with the current sample size, I would suggest authors temper their developmental conclusions.
(4) The analysis of DLPFC-pSTC directional connectivity would be significantly strengthened by modeling it as a continuous function of age across all participants, rather than relying on an unbalanced comparison between a single child and a (N=7) post-childhood group. This continuous approach would provide a more powerful and nuanced view of the developmental trajectory. I would also suggest including the result in the main text.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This study examines a valuable question regarding the developmental trajectory of neural mechanisms supporting facial expression processing. Leveraging a rare intracranial EEG (iEEG) dataset including both children and adults, the authors reported that facial expression recognition mainly engaged the posterior superior temporal cortex (pSTC) among children, while both pSTC and the prefrontal cortex were engaged among adults. However, the sample size is relatively small, with analyses appearing incomplete to fully support the primary claims.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study investigates how the brain processes facial expressions across development by analyzing intracranial EEG (iEEG) data from children (ages 5-10) and post-childhood individuals (ages 13-55). The researchers used a short film containing emotional facial expressions and applied AI-based models to decode brain responses to facial emotions. They found that in children, facial emotion information is represented primarily in the posterior superior temporal cortex (pSTC) - a sensory processing area - but not in the dorsolateral prefrontal cortex (DLPFC), which is involved in higher-level social cognition. In contrast, post-childhood individuals showed emotion encoding in both regions. Importantly, the complexity of emotions encoded in the pSTC increased with age, particularly for socially nuanced emotions like embarrassment, guilt, and pride. The authors claim that these findings suggest that emotion recognition matures through increasing involvement of the prefrontal cortex, supporting a developmental trajectory where top-down modulation enhances understanding of complex emotions as children grow older.
Strengths:
(1) The inclusion of pediatric iEEG makes this study uniquely positioned to offer high-resolution temporal and spatial insights into neural development compared to non-invasive approaches, e.g., fMRI, scalp EEG, etc.
(2) Using a naturalistic film paradigm enhances ecological validity compared to static image tasks often used in emotion studies.
(3) The idea of using state-of-the-art AI models to extract facial emotion features allows for high-dimensional and dynamic emotion labeling in real time.
Weaknesses:
(1) The study has notable limitations that constrain the generalizability and depth of its conclusions. The sample size was very small, with only nine children included and just two having sufficient electrode coverage in the posterior superior temporal cortex (pSTC), which weakens the reliability and statistical power of the findings, especially for analyses involving age
We appreciated the reviewer’s point regarding the constrained sample size.
As an invasive method, iEEG recordings can only be obtained from patients undergoing electrode implantation for clinical purposes. Thus, iEEG data from young children are extremely rare, and rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our main conclusions. Specifically, 8 children (53 recording contacts in total) and 13 control participants (99 recording contacts in total) with electrode coverage in the DLPFC are included in our DLPFC analysis. This sample size is comparable to other iEEG studies with similar experimental designs [1-3].
For pSTC, we returned to the data set and found another two children who had pSTC coverage. After including these children's data, the group-level analysis using a permutation test showed that children's pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Notably, the two new children's (S33 and S49) responses were highly consistent with our previous observations. Moreover, the average prediction accuracy in children's pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).
(1) Zheng, J. et al. Multiplexing of Theta and Alpha Rhythms in the Amygdala-Hippocampal Circuit Supports Pattern Separation of Emotional Information. Neuron 102, 887-898.e5 (2019).
(2) Diamond, J. M. et al. Focal seizures induce spatiotemporally organized spiking activity in the human cortex. Nat. Commun. 15, 7075 (2024).
(3) Schrouff, J. et al. Fast temporal dynamics and causal relevance of face processing in the human temporal cortex. Nat. Commun. 11, 656 (2020).
(2) Electrode coverage was also uneven across brain regions, with not all participants having electrodes in both the dorsolateral prefrontal cortex (DLPFC) and pSTC, and most coverage limited to the left hemisphere-hindering within-subject comparisons and limiting insights into lateralization.
The electrode coverage in each patient is determined entirely by clinical needs. Only a few patients have electrodes in both the DLPFC and pSTC because these two regions are far apart, so it is rare for a single patient's suspected seizure network to span such a large territory. However, this does not affect our results, as most iEEG studies combine data from multiple patients to achieve sufficient electrode coverage in each target brain area. As our data are mainly from the left hemisphere (due to clinical needs), this study was not designed to examine whether emotion encoding differs between hemispheres. Nevertheless, lateralization remains an interesting question that should be addressed in future research, and we have noted this limitation in the Discussion (Page 8, in the last paragraph of the Discussion).
(3) The developmental differences observed were based on cross-sectional comparisons rather than longitudinal data, reducing the ability to draw causal conclusions about developmental trajectories.
In the context of pediatric intracranial EEG, longitudinal data collection is not feasible due to the invasive nature of electrode implantation. We have added this point to the Discussion to acknowledge that while our results reveal robust age-related differences in the cortical encoding of facial emotions, longitudinal studies using non-invasive methods will be essential to directly track developmental trajectories (Page 8, in the last paragraph of the Discussion). In addition, we revised our manuscript to avoid emphasizing causal conclusions about developmental trajectories in the current study (for example, we use "imply" instead of "suggest" in the fifth paragraph of the Discussion).
(4) Moreover, the analysis focused narrowly on DLPFC, neglecting other relevant prefrontal areas such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), which play key roles in emotion and social processing.
We agree that both OFC and ACC are critically involved in emotion and social processing. However, we have no recordings from these areas because ECoG rarely covers the ACC or OFC due to technical constraints. We have noted this limitation in the Discussion (Page 8, in the last paragraph of the Discussion). Future follow-up studies using sEEG or non-invasive imaging methods could be used to examine developmental patterns in these regions.
(5) Although the use of a naturalistic film stimulus enhances ecological validity, it comes at the cost of experimental control, with no behavioral confirmation of the emotions perceived by participants and uncertain model validity for complex emotional expressions in children. A non-facial music block that could have served as a control was available but not analyzed.
The facial emotion features used in our encoding models were extracted by Hume AI models, which were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data[1-2]. Thus, the outputs of Hume AI model reflect what typical facial expressions convey, that is, the presented facial emotion. Our goal of the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that children’s interpretation of complex emotions may differ from that of adults, resulting in different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added a paragraph in the Discussion (see Page 8) to explicitly note that our study focused on the encoding of presented emotion.
We appreciated the reviewer’s point regarding the value of non-facial music blocks. However, although there are segments in the music condition with no faces presented, these cannot be used as a control condition to test whether the encoding model’s prediction accuracy in pSTC or DLPFC drops to chance when no facial emotion is present. This is because, in the absence of faces, no extracted emotion features are available for the construction of the encoding model (see Author response image 1 below). Thus, we chose a different control analysis for the present work. For children’s pSTC, we shuffled the facial emotion features in time to generate a null distribution, which was then used to test the statistical significance of the encoding models (see Methods/Encoding model fitting for details).
(1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).
(2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).
Author response image 1.
Time courses of Hume AI extracted facial expression features for the first block of the music condition. Only the top 5 facial expressions are shown here due to space limitations.
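The shuffle-in-time control described in the response above can be sketched as a permutation test. This is a hedged illustration, not the authors' actual pipeline: the names `features` and `neural` are placeholders, and the circular shift is one common way to break the feature-response alignment while preserving autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoding_accuracy(features, neural):
    """Pearson r between a linear encoding-model prediction and the neural trace."""
    beta = np.linalg.lstsq(features, neural, rcond=None)[0]
    return np.corrcoef(features @ beta, neural)[0, 1]

def permutation_p(features, neural, n_perm=1000):
    """Observed accuracy and its p-value under a circular-shift null distribution."""
    observed = encoding_accuracy(features, neural)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shifting the features in time destroys their alignment with the
        # neural response while keeping each series' temporal structure intact.
        shift = int(rng.integers(1, len(neural)))
        null[i] = encoding_accuracy(np.roll(features, shift, axis=0), neural)
    return observed, float((null >= observed).mean())
```

When the features genuinely predict the neural trace, the observed r sits far in the tail of the shifted-null distribution and the p-value is small.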
(6) Generalizability is further limited by the fact that all participants were neurosurgical patients, potentially with neurological conditions such as epilepsy that may influence brain responses.
We appreciated the reviewer’s point. However, iEEG data can only be obtained from clinical populations (usually epilepsy patients) who have electrodes implanted. Given current knowledge about focal epilepsy and its potential effects on brain activity, researchers believe that epilepsy-affected brains can serve as a reasonable proxy for normal human brains when confounding influences are minimized through rigorous procedures [1]. In our study, we took several steps to ensure data quality: (1) all data segments containing epileptiform discharges were identified and removed at the very beginning of preprocessing, and (2) patients participated in the experiment several hours outside the window of seizures. Please see the Methods for the data quality check description (Page 9/Experimental procedures and iEEG data processing).
(1) Parvizi J, Kastner S. 2018. Promises and limitations of human intracranial electroencephalography. Nat Neurosci 21:474–483. doi:10.1038/s41593-018-0108-2
(7) Additionally, the high temporal resolution of intracranial EEG was not fully utilized, as data were down-sampled and averaged in 500-ms windows.
We agree that one of the major advantages of iEEG is its millisecond-level temporal resolution. In our case, the main reason for down-sampling was that the time series of facial emotion features extracted from the videos, which were used for modelling the neural responses, had a temporal resolution of 2 Hz. In naturalistic contexts, facial emotion features do not change on a millisecond timescale, so a 500 ms window is sufficient to capture the relevant dynamics. Another advantage of iEEG is its tolerance to motion, which is excessive in young children (e.g., 5-year-olds). This makes our dataset uniquely valuable, revealing robust representation in the pSTC but not in the DLPFC in young children. Moreover, since our methodological framework (Figure 1) does not rely on high temporal resolution, it can be transferred to non-invasive modalities such as fMRI, enabling future studies to test these developmental patterns in larger populations.
(8) Finally, the absence of behavioral measures or eye-tracking data makes it difficult to directly link neural activity to emotional understanding or determine which facial features participants attended to.
We appreciated this point. Part of our rationale is presented in our response to (5) for the absence of behavioral measures. Following the same rationale, identifying which facial features participants attended to is not necessary for testing our main hypotheses because our analyses examined responses to the overall emotional content of the faces. However, we agree and recommend future studies use eye-tracking and corresponding behavioral measures in studies of subjective emotional understanding.
Reviewer #2 (Public review):
Summary:
In this paper, Fan et al. aim to characterize how neural representations of facial emotions evolve from childhood to adulthood. Using intracranial EEG recordings from participants aged 5 to 55, the authors assess the encoding of emotional content in high-level cortical regions. They report that while both the posterior superior temporal cortex (pSTC) and dorsolateral prefrontal cortex (DLPFC) are involved in representing facial emotions in older individuals, only the pSTC shows significant encoding in children. Moreover, the encoding of complex emotions in the pSTC appears to strengthen with age. These findings lead the authors to suggest that young children rely more on low-level sensory areas and propose a developmental shift from reliance on lower-level sensory areas in early childhood to increased top-down modulation by the prefrontal cortex as individuals mature.
Strengths:
(1) Rare and valuable dataset: The use of intracranial EEG recordings in a developmental sample is highly unusual and provides a unique opportunity to investigate neural dynamics with both high spatial and temporal resolution.
(2) Developmentally relevant design: The broad age range and cross-sectional design are well-suited to explore age-related changes in neural representations.
(3) Ecological validity: The use of naturalistic stimuli (movie clips) increases the ecological relevance of the findings.
(4) Feature-based analysis: The authors employ AI-based tools to extract emotion-related features from naturalistic stimuli, which enables a data-driven approach to decoding neural representations of emotional content. This method allows for a more fine-grained analysis of emotion processing beyond traditional categorical labels.
Weaknesses:
(1) The emotional stimuli included facial expressions embedded in speech or music, making it difficult to isolate neural responses to facial emotion per se from those related to speech content or music-induced emotion.
We thank the reviewer for raising this important point. We agree that in naturalistic settings, faces often co-occur with speech, and that these sources of emotion can overlap. However, music-induced emotions have distinct temporal dynamics that are separable from facial emotion (see Author response image 2 (A) and (B) below). In addition, faces can convey a wide range of emotions (48 categories in the Hume AI model), whereas music conveys far fewer (13 categories reported by a recent study [1]). Thus, when using facial emotion feature time series as regressors (with 48 emotion categories and rapid temporal dynamics), model performance in the music condition will reflect neural encoding of facial emotion rather than the slower and lower-dimensional emotion conveyed by music.
For the speech condition, we acknowledge that it is difficult to fully isolate neural responses to facial emotion from those to speech when the emotional content of faces and speech highly overlaps. However, in our study, (1) the time courses of emotion features from face and voice still differ (Author response image 2 (C) and (D)), and (2) our main finding that the DLPFC encodes facial expression information in post-childhood individuals but not in young children held in both the speech and music conditions (Figure 2B and 2C). In the music condition, neural responses to facial emotion are not affected by speech. Thus, we have included the DLPFC results from the music condition in the revised manuscript (Figure 2C), and we acknowledge that this issue should be carefully considered in future studies using videos with speech, as indicated in the future directions in the last paragraph of the Discussion.
(1) Cowen, A. S., Fang, X., Sauter, D. & Keltner, D. What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures. Proc Natl Acad Sci USA 117, 1924–1934 (2020).
Author response image 2.
Time courses of amusement. (A) and (B) Amusement conveyed by face or music in a 30-s music block. Facial emotion features are extracted by Hume AI. For emotion from music, we approximated the amusement time course using a weighted combination of low-level acoustic features (RMS energy, spectral centroid, MFCCs), which capture the intensity, brightness, and timbre cues linked to amusement. Note that the music continues when no faces are presented. (C) and (D) Amusement conveyed by face or voice in a 30-s speech block. From 0 to 5 seconds, a girl is introducing her friend to a stranger. The camera focuses on the friend, who appears nervous, while the girl’s voice sounds cheerful. This mismatch explains why the shapes of the two time series differ at the beginning. Such situations occur frequently in naturalistic movies.
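The regressor-based encoding analysis referred to in this response, predicting neural responses from facial emotion feature time series, can be sketched generically as follows. This is a cross-validated ridge-regression illustration; the function name, fold scheme, and regularization are assumptions, not the authors' actual code:

```python
import numpy as np

def encoding_accuracy(features, neural, n_folds=5, alpha=1.0):
    """Ridge-regress neural responses onto emotion-feature time series and
    return the mean held-out Pearson r (a common prediction-accuracy metric).

    features: (T, F) emotion feature time series; neural: (T,) one
    electrode's activity. Illustrative sketch only.
    """
    T = len(neural)
    folds = np.array_split(np.arange(T), n_folds)
    rs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(T), test_idx)
        X, y = features[train_idx], neural[train_idx]
        # closed-form ridge: w = (X'X + alpha*I)^-1 X'y
        w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
        pred = features[test_idx] @ w
        rs.append(np.corrcoef(pred, neural[test_idx])[0, 1])
    return float(np.mean(rs))
```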
(2) While the authors leveraged Hume AI to extract facial expression features from the video stimuli, they did not provide any validation of the tool's accuracy or reliability in the context of their dataset. It remains unclear how well the AI-derived emotion ratings align with human perception, particularly given the complexity and variability of naturalistic stimuli. Without such validation, it is difficult to assess the interpretability and robustness of the decoding results based on these features.
Hume AI models were trained and validated on human intensity ratings of large-scale, experimentally controlled emotional expression data [1-2]. The training process used both manual annotations from human raters and deep neural networks. Over 3000 human raters categorized facial expressions into emotion categories and rated each on a 1-100 intensity scale. Thus, the outputs of the Hume AI model reflect what typical facial expressions convey (based on how people actually interpret them), that is, the presented facial emotion. Our goal in the present study was to examine how facial emotions presented in the videos are encoded in the human brain at different developmental stages. We agree that the interpretation of facial emotions may differ across individual participants, resulting in different perceived emotion (i.e., the emotion that the observer subjectively interprets). Behavioral ratings are necessary to study the encoding of subjectively perceived emotion, which is a very interesting direction but beyond the scope of the present work. We have added text to the Discussion to explicitly note that our study focused on the encoding of presented emotion (second paragraph on Page 8).
(1) Brooks, J. A. et al. Deep learning reveals what facial expressions mean to people in different cultures. iScience 27, 109175 (2024).
(2) Brooks, J. A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat. Hum. Behav. 7, 240–250 (2023).
(3) Only two children had relevant pSTC coverage, severely limiting the reliability and generalizability of results.
We appreciate this point and agree with both reviewers who raised it as a significant concern. As described in the response to reviewer 1 (comment 1), we have added data from another two children with pSTC coverage. A group-level permutation test showed that children’s pSTC significantly encodes facial emotion in naturalistic contexts (Figure 3B). Because iEEG data from young children are extremely rare, rapidly increasing the sample size within a few years is not feasible. However, we are confident in the reliability of our conclusion that children’s pSTC can encode facial emotion. First, the two new children’s pSTC responses (S33 and S49) were highly consistent with our previous observations (see individual data in Figure 3B). Second, the averaged prediction accuracy in children’s pSTC (r<sub>speech</sub>=0.1565) was highly comparable to that in the post-childhood group (r<sub>speech</sub>=0.1515).
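The logic of a group-level permutation test can be conveyed with a minimal sign-flip sketch. The paper's exact permutation scheme may differ; this only illustrates comparing an observed mean accuracy against a null distribution:

```python
import random

def permutation_pvalue(observed_rs, n_perm=2000, seed=0):
    """One-sided sign-flip permutation test on per-electrode prediction
    accuracies (null hypothesis: accuracies are symmetric around zero).
    Illustrative sketch only, not the authors' exact procedure."""
    rng = random.Random(seed)
    obs = sum(observed_rs) / len(observed_rs)
    count = 0
    for _ in range(n_perm):
        perm = [r * rng.choice((-1, 1)) for r in observed_rs]
        if sum(perm) / len(perm) >= obs:
            count += 1
    # add-one correction keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)
```

With consistently positive accuracies the p-value is small; with accuracies scattered around zero it stays large.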
(4) The rationale for focusing exclusively on high-frequency activity for decoding emotion representations is not provided, nor are results from other frequency bands explored.
We focused on high-frequency broadband (HFB) activity because it is widely considered to reflect the responses of local neuronal populations near the recording electrode, whereas low-frequency oscillations in the theta, alpha, and beta ranges are thought to serve as carrier frequencies for long-range communication across distributed networks[1-2]. Since our study aimed to examine the representation of facial emotion in localized cortical regions (DLPFC and pSTC), HFB activity provides the most direct measure of the relevant neural responses. We have added this rationale to the manuscript (Page 3).
(1) Parvizi, J. & Kastner, S. Promises and limitations of human intracranial electroencephalography. Nat. Neurosci. 21, 474–483 (2018).
(2) Buzsaki, G. Rhythms of the Brain. (Oxford University Press, Oxford, 2006).
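For readers unfamiliar with how high-frequency broadband (HFB) amplitude is obtained, here is a simplified sketch (a brick-wall FFT filter plus an FFT-based Hilbert transform; real pipelines typically use FIR filters, artifact rejection, and normalization):

```python
import numpy as np

def hfb_envelope(signal, fs, band=(70.0, 150.0)):
    """High-frequency broadband amplitude: band-limit the signal in the
    frequency domain, then take the magnitude of the analytic signal.
    Simplified illustration, not the authors' exact preprocessing."""
    n = len(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spec = np.fft.fft(signal)
    # brick-wall band-pass: zero everything outside the HFB range
    spec[(np.abs(freqs) < band[0]) | (np.abs(freqs) > band[1])] = 0.0
    # analytic signal: zero negative frequencies, double positive ones
    spec[freqs < 0] = 0.0
    spec[freqs > 0] *= 2.0
    return np.abs(np.fft.ifft(spec))
```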
(5) The hypothesis of developmental emergence of top-down prefrontal modulation is not directly tested. No connectivity or co-activation analyses are reported, and the number of participants with simultaneous coverage of pSTC and DLPFC is not specified.
Directional connectivity results were not shown because only one child has simultaneous coverage of pSTC and DLPFC. However, the Granger causality results from the post-childhood group (N=7) clearly showed that the alpha/beta-band influence from DLPFC to pSTC (top-down) gradually increased after the onset of face presentation (Author response image 3, below left, plotted in red). By comparison, the alpha/beta-band influence from pSTC to DLPFC (bottom-up) gradually decreased after the onset of face presentation (Author response image 3, below left, blue curve). The alpha/beta-band influence from DLPFC to pSTC was significantly increased at 750 and 1250 ms after face presentation (face vs nonface, paired t-test, Bonferroni-corrected P=0.005, 0.006), suggesting enhanced top-down modulation in the post-childhood group while watching emotional faces. Interestingly, this top-down influence appears very different in the 8-year-old child at 1250 ms after face presentation (Author response image 3, below left, black curve).
As we cannot draw direct conclusions from the single-subject sample presented here, the top-down hypothesis is introduced only as a possible explanation for our current results. We have removed potentially misleading statements, and we plan to test this hypothesis directly using MEG in the future.
Author response image 3.
Difference of Granger causality indices (face – nonface) in the alpha/beta and gamma bands for both directions. We identified a series of face onsets in the movie that participants watched. Each trial was defined as -0.1 to 1.5 s relative to the onset. Non-face control trials used houses, animals, and scenes. Granger causality was calculated for the 0-0.5 s, 0.5-1 s, and 1-1.5 s time windows. For the post-childhood group, GC indices were averaged across participants. Error bars indicate SEM.
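The directional-influence idea behind Granger causality can be conveyed with a minimal single-lag sketch. The analysis above used band-limited (alpha/beta, gamma), windowed estimates, so this is illustrative only; names and the lag choice are assumptions:

```python
import numpy as np

def granger_index(x, y, lag=1):
    """Directional influence y -> x: log ratio of the residual variance of
    an autoregression of x on its own past vs. on the past of both x and y.
    A larger value means y's past improves prediction of x."""
    xt, xp, yp = x[lag:], x[:-lag], y[:-lag]
    # restricted model: x_t ~ x_{t-1} (+ intercept)
    b = np.polyfit(xp, xt, 1)
    var_r = np.var(xt - np.polyval(b, xp))
    # full model: x_t ~ x_{t-1} + y_{t-1} (+ intercept)
    A = np.column_stack([xp, yp, np.ones_like(xp)])
    w, *_ = np.linalg.lstsq(A, xt, rcond=None)
    var_f = np.var(xt - A @ w)
    return float(np.log(var_r / var_f))
```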
(6) The "post-childhood" group spans ages 13-55, conflating adolescence, young adulthood, and middle age. Developmental conclusions would benefit from finer age stratification.
We appreciate this insightful comment. Our current sample size does not allow such stratification, but we plan to address this important issue in future MEG studies with larger cohorts.
(7) The so-called "complex emotions" (e.g., embarrassment, pride, guilt, interest) used in the study often require contextual information, such as speech or narrative cues, for accurate interpretation, and are not typically discernible from facial expressions alone. As such, the observed age-related increase in neural encoding of these emotions may reflect not solely the maturation of facial emotion perception, but rather the development of integrative processing that combines facial, linguistic, and contextual cues. This raises the possibility that the reported effects are driven in part by language comprehension or broader social-cognitive integration, rather than by changes in facial expression processing per se.
We agree with this interpretation. Indeed, our results already show that speech influences the encoding of facial emotion in the DLPFC differently in the childhood and post-childhood groups (Figure 2D), suggesting that children’s ability to integrate multiple cues is still developing. Future studies are needed to systematically examine how linguistic cues and prior experiences contribute to the understanding of complex emotions from faces, which we have added to our future directions (last paragraph of the Discussion, Pages 8-9).
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
In the introduction: "These neuroimaging data imply that social and emotional experiences shape the prefrontal cortex's involvement in processing the emotional meaning of faces throughout development, probably through top-down modulation of early sensory areas." Aren't these supposed to be iEEG data instead of neuroimaging?
Corrected.
Reviewer #2 (Recommendations for the authors):
This manuscript would benefit from several improvements to strengthen the validity and interpretability of the findings:
(1) Increase the sample size, especially for children with pSTC coverage.
We added data from another two children who have pSTC coverage. Please see our response to reviewer 2’s comment 3 and reviewer 1’s comment 1.
(2) Include directional connectivity analyses to test the proposed top-down modulation from DLPFC to pSTC.
Thanks for the suggestion. Please see our response to reviewer 2’s comment 5.
(3) Use controlled stimuli in an additional experiment to separate the effects of facial expression, speech, and music.
This is an excellent point. However, iEEG data collection from children is an exceptionally rare opportunity and typically requires many years, so we are unable to add a controlled-stimulus experiment to the current study. We plan to use controlled stimuli to study the processing of complex emotion with non-invasive methods in the future. In addition, please see our response to reviewer 2’s comment 1 for a description of how neural responses to facial expression and music are separated in our study.
Author response:
The following is the authors’ response to the original reviews.
Reviewer #1:
In this well-written and timely manuscript, Rieger et al. introduce Squidly, a new deep learning framework for catalytic residue prediction. The novelty of the work lies in integrating per-residue embeddings from large protein language models (ESM2) with a biology-informed contrastive learning scheme that leverages enzyme class information to rationally mine hard positive/negative pairs. Importantly, the method avoids reliance on predicted 3D structures, enabling scalability, speed, and broad applicability. The authors show that Squidly outperforms existing ML-based tools and even BLAST in certain settings, while an ensemble with BLAST achieves state-of-the-art performance across multiple benchmarks. Additionally, the introduction of the CataloDB benchmark, designed to test generalization at low sequence and structural identity, represents another important contribution of this work.
We thank the reviewer for their constructive and encouraging assessment of the manuscript. We appreciate the recognition of Squidly’s biology-informed contrastive learning framework with ESM2 embeddings, its scalability through the avoidance of predicted 3D structures, and the contribution of the CataloDB benchmark. We are pleased that the reviewer finds these aspects to be of value, and their comments will help us in further clarifying the strengths and scope of the work.
The manuscript acknowledges biases in EC class representation, particularly the enrichment for hydrolases. While CataloDB addresses some of these issues, the strong imbalance across enzyme classes may still limit conclusions about generalization. Could the authors provide per-class performance metrics, especially for underrepresented EC classes?
We thank the reviewer for raising this point. We agree that per-class performance metrics provide important insight into generalizability across underrepresented EC classes. In response, we have updated Figure 3 to include two additional panels: (i) per-EC F1, precision and recall scores, and (ii) a relative display of true positives against the total number of predictable catalytic residues. These additions allow the class imbalance to be more directly interpretable. We have also revised the text between lines 316-321 to better contextualize our generalizability claims in light of these results.
An ablation analysis would be valuable to demonstrate how specific design choices in the algorithm contribute to capturing catalytic residue patterns in enzymes.
We agree that an ablation analysis is beneficial to show the benefits of a specific approach. We consider the main design choice in Squidly to be how we select the training pairs; hence we chose a standard design for the contrastive learning model. We tested the effect of different pair schemes on performance and report the results in Figure 2A and lines 244-258. These results are a targeted ablation in which we evaluate Squidly against AEGAN using the AEGAN training and test datasets, while systematically varying the ESM2 model size and pair-mining scheme. As baselines, we included the LSTM trained directly on ESM2 embeddings and random pair selection. We showed that the choice of pairs indeed has a large impact on performance, which is significantly improved compared to naïve pairing. This comparison suggests that the performance gains are attributable to reaction-informed pair-mining strategies. We recognize that the way these results were originally presented made this ablation less clear. We have revised the wording in the Results section (lines 244-247) and updated the caption of Figure 2A to emphasize the purpose of this section of the paper.
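Reaction-informed pair mining could look roughly like the toy sketch below. The pairing rules here are illustrative guesses based on the description in this response (residue identity plus EC class), not Squidly's actual scheme:

```python
import itertools

def mine_pairs(residues):
    """Toy reaction-informed contrastive pair mining.

    Each residue is (amino_acid, ec_class, is_catalytic). In this
    hypothetical scheme, positives are catalytic residues of the same
    amino acid from different EC classes; hard negatives are same-amino-
    acid, same-EC pairs where only one residue is catalytic."""
    pos, neg = [], []
    for a, b in itertools.combinations(residues, 2):
        if a[0] != b[0]:
            continue  # only pair identical amino acids
        if a[2] and b[2] and a[1] != b[1]:
            pos.append((a, b))
        elif a[2] != b[2] and a[1] == b[1]:
            neg.append((a, b))
    return pos, neg
```

The point of such rules is that pairs sharing amino-acid identity are "hard": the model must learn context, not residue type, to separate them.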
The statement that users can optionally use uncertainty to filter predictions is promising but underdeveloped. How should predictive entropy values be interpreted in practice? Is there an empirical threshold that separates high- from low-confidence predictions? A demonstration of how uncertainty filtering shifts the trade-off between false positives and false negatives would clarify the practical utility of this feature.
Thank you for the suggestion. Your comment prompted us to consider the best way to represent the uncertainty, the best metric to return to users, and how to visualize the results. Based on this, we included several new figures (Figure 3H and Supplementary Figures S3-5). We used these figures to select the cutoffs (mean prediction of 0.6, and variance < 0.225), which were then set as the defaults in Squidly and used in all subsequent analyses. The effect of these cutoffs is most evident in the trade-off between precision and recall. Hence, users may opt to select their own filters based on the mean prediction and the variance across predictions, and these cutoffs can be passed as command-line parameters to Squidly. The choice to use a consistent default cutoff, selected using the Uni3175 benchmark, slightly improved the reported performance in the benchmarks shown in Table 1 and Figure 3C. However, our interpretation remains the same.
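The default uncertainty filter described above (mean prediction > 0.6, variance across the ensemble < 0.225) might be applied as in this hypothetical sketch; the function name and input format are assumptions, not Squidly's interface:

```python
def filter_predictions(preds, mean_cut=0.6, var_cut=0.225):
    """Keep residues whose ensemble mean prediction exceeds mean_cut and
    whose across-model variance stays below var_cut (the defaults the
    response reports adopting). preds: {residue_id: [p_model1, ...]}."""
    kept = {}
    for rid, ps in preds.items():
        m = sum(ps) / len(ps)
        v = sum((p - m) ** 2 for p in ps) / len(ps)
        if m > mean_cut and v < var_cut:
            kept[rid] = m
    return kept
```

Raising `mean_cut` or lowering `var_cut` trades recall for precision, which is the trade-off the response describes.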
The excerpt highlights computational efficiency, reporting substantial runtime improvements (e.g., 108 s vs. 5757 s). However, the comparison lacks details on dataset size, hardware/software environment, and reproducibility conditions. Without these details, the speedup claim is difficult to evaluate. Furthermore, it remains unclear whether the reported efficiency gains come at the expense of predictive performance.
Thank you for pointing out this limitation in how we presented the runtime results. We have rerun the tests and updated the table. An additional comment underneath details the hardware/software environment used to run both tools, and notes that the Squidly model is the ensemble version. Regarding the relationship between efficiency gains and predictive performance, the 3B and 15B models are benchmarked side by side throughout the paper.
Compared to the tools we were able to comprehensively benchmark, it does not come at a cost. However, we note that the claimed runtime benefits assume that a structure must be folded, which is not the case for enzymes already present in the PDB. Such enzymes are likely already annotated, and in those cases we recommend using BLAST, which is superior in run time to either Squidly or a structure-based tool and highly accurate for homologous or annotated sequences.
Given the well-known biases in public enzyme databases, the dataset is likely enriched for model organisms (e.g., E. coli, yeast, human enzymes) and underrepresents enzymes from archaea, extremophiles, and diverse microbial taxa. Would this limit conclusions about Squidly's generalizability to less-studied lineages?
The enrichment for model organisms in public enzyme databases may indeed affect both ESM2 and Squidly when applied to underrepresented lineages such as archaea, extremophiles, and diverse microbial taxa. We agree that this limitation is significant and have adjusted and expanded the previous discussion of benchmarking limitations accordingly (lines 358, 369). We thank the reviewer for highlighting this issue, which has helped us to improve the transparency and balance of the manuscript.
Reviewer #2:
The authors aim to develop Squidly, a sequence-only catalytic residue prediction method. By combining protein language model (ESM2) embedding with a biologically inspired contrastive learning pairing strategy, they achieve efficient and scalable predictions without relying on three-dimensional structure. Overall, the authors largely achieved their stated objectives, and the results generally support their conclusions. This research has the potential to advance the fields of enzyme functional annotation and protein design, particularly in the context of screening large-scale sequence databases and unstructured data. However, the data and methods are still limited by the biases of current public databases, so the interpretation of predictions requires specific biological context and experimental validation.
Strengths:
The strengths of this work include the innovative methodological incorporation of EC classification information for "reaction-informed" sample pairing, thereby enhancing the discriminative power of contrastive learning. Results demonstrate that Squidly outperforms existing machine learning methods on multiple benchmarks and is significantly faster than structure prediction tools, demonstrating its practicality.
Weaknesses:
Disadvantages include the lack of a systematic evaluation of the impact of each strategy on model performance. Furthermore, some analyses, such as PCA visualization, exhibit low explained variance, which undermines the strength of the conclusions.
We thank the reviewer for their comments and feedback.
The authors state that "Notably, the multiclass classification objective and benchmarks used to evaluate EasIFA made it infeasible to compare performance for the binary catalytic residue prediction task." However, EasIFA has also released a model specifically for binary catalytic site classification. The authors should include EasIFA in their comparisons in order to provide a more comprehensive evaluation of Squidly's performance.
We thank the reviewer for raising this point. EasIFA’s binary classification task includes catalytic, binding, and “other” residues, which differs from Squidly’s strict catalytic residue prediction. This makes direct comparison non-trivial, which is why we originally opted not to benchmark against EasIFA and instead highlighted it in our discussion.
Given your comment, we did our best to include a benchmark that could give an indication of a comparison between the two tools. To do this, we filtered EasIFA’s multiclass classification test dataset for a non-overlapping subset with Squidly and AEGAN training data and <40% sequence identity to all training sets. This left only 66 catalytic residue–containing sequences that we could use as a held-out test set for both tools. We note the comparison is not directly equal, as Squidly and AEGAN had lower average identity to this subset (8.2%) than EasIFA (23.8%), placing them at a relative disadvantage.
We also identified a potential limitation in EasIFA’s original recall calculation, in which sequences lacking catalytic residues were assigned a recall of 0. We adapted this to consider only sequences that do contain catalytic residues, which increased recall across all models. With the updated evaluation, EasIFA continues to show strong performance, consistent with it being SOTA when structural inputs are available. Squidly remains competitive given that it operates solely from sequence and has lower sequence identity to this specific test set.
Due to the small and imbalanced benchmark size, differences in training data overlap, and differences in our analysis compared with the original EasIFA analysis, we present this comparison in a new section (A.4) of the supplementary information rather than in the main text. References to this section have been added in the manuscript at lines 265-268. Additionally, we do update the discussion and emphasize the potential benefits of using EasIFA at lines (353-356).
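The adapted recall calculation, averaging only over sequences that actually contain catalytic residues, can be sketched as follows (names and input format are illustrative, not the evaluation code):

```python
def macro_recall(per_seq):
    """Per-sequence recall averaged only over sequences with catalytic
    residues, mirroring the adapted evaluation described above (the
    original scored residue-free sequences as recall 0).

    per_seq: list of (true_positives, actual_catalytic_residue_count)."""
    recalls = [tp / n for tp, n in per_seq if n > 0]
    return sum(recalls) / len(recalls)

# one sequence with 2/4 residues found, one with 1/1, one with none present
print(macro_recall([(2, 4), (1, 1), (0, 0)]))  # -> 0.75
```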
The manuscript proposes three schemes for constructing positive and negative sample pairs to reduce dataset size and accelerate training, with Schemes 2 and 3 guided by reaction information (EC numbers) and residue identity. However, two issues remain:
(a) The authors do not systematically evaluate the impact of each scheme on model performance.
(b) In the benchmarking results, it is not explicitly stated which scheme was used for comparison with other models (e.g., Table 1, Figure 6, Figure 8). This lack of clarity makes it difficult to interpret the results and assess reproducibility.
(c) Regarding the negative samples in Scheme 3 in Figure 1, no sampling patterns are shown for residue pairs with the same amino acid, different EC numbers, and both being catalytic residues.
We thank the reviewer for these suggestions, which enabled us to improve the clarity and presentation of the manuscript. Please find our point by point response:
(a) We thank the reviewer for highlighting the lack of clarity in the way we presented our evaluation in the section describing the Uni3175 benchmark. We aimed to systematically evaluate the impact of each scheme using the Uni3175 benchmark and refer to these results at lines 244-258. Additionally, we have adjusted the presentation of this section at lines 244-247, in line with related comments from reviewer 1, to clarify that the intention of this section and benchmark is to allow a comparison of each scheme against baseline models and AEGAN. These results led us to use Scheme 3 in both models for the other benchmarks in Figures 2 and 3. Please let us know if there is anything we can do to further improve the interpretability of Squidly’s performance.
(b) We thank the reviewer for highlighting this issue and improving the clarity of our manuscript. We agree that after the Uni3175 benchmark was used to evaluate the schemes, we did not clearly state in the other benchmarks that Scheme 3 was chosen for both the 3B and 15B models. We have made changes in Table 1 and the figure legends of Figures 2 and 3 to state that Scheme 3 was used. In addition, we integrated related results into panel figures (e.g., Figures 2 and 3 now show models trained and tested on consistent benchmark datasets) and standardized figure colors and legend formatting throughout. Furthermore, we suspect that the switch between the individual and ensembled Squidly models over the course of the paper was not well indicated and likely to confuse the reader. We therefore decided to consistently report the ensembled Squidly models for all benchmarks except the ablation study (Figure 2A). In line with this, we altered the overview Figure 1A so that it is clearer that the default and intended version of Squidly is the ensemble.
(c) We appreciate the reviewer pointing this out. You’re correct: we explicitly did not sample the negatives described by the reviewer in Scheme 3, as our focus was on the hard negatives that relate most to the binary objective. We do think this is a great idea and that it would be worth exploring in future versions of Squidly, where we will expand the label space used for hard-negative sampling and include binding sites in our prediction. We have updated the discussion at lines 395-396 to highlight this potential direction.
The PCA visualization (Figure 3) explains very little variance (~5% + 1.8%), but its use to illustrate the separability of embedding and catalytic residues may overinterpret the meaning of the low-dimensional projection. We question whether this figure is appropriate for inclusion in the main text and suggest that it be moved to the Supporting Information.
We thank the reviewer for this suggestion. We had discussed this as well, and in the end decided to include it in the main manuscript. We agree that the explained variance is low. However, when we first saw the PCA we were surprised that there was any separation at all. This prompted us to investigate further, so we kept it in the manuscript to be true to the scientific story. That said, we agree that our discussion could be read as overly conclusive given the minimal variance explained by the top 2 PCs. We therefore agree that the figure, alongside the accompanying results section, is more appropriately placed in the supplementary information. We moved this section (A.1) to the appendix to still document the exploratory data analysis process that we used to tackle this problem, so that the general thought process behind Squidly is available for further reading.
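The explained-variance check at issue here can be reproduced with a short SVD-based sketch, useful as a sanity check on how much a 2-D PCA plot of residue embeddings can actually show:

```python
import numpy as np

def pca_explained_variance(X, k=2):
    """Fraction of total variance captured by the top-k principal
    components, via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return float(var[:k].sum() / var.sum())
```

For isotropic data the top-2 fraction stays near k / n_dims, whereas data dominated by two directions yields a fraction near 1; a value around 7% (as in the figure under discussion) means the 2-D projection shows only a small slice of the embedding structure.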
Minor Comments:
(1) Figure quality and legends: (a) In Figure 4, the legend is confusing: "Schemes 2 and 3 (S1 and S2) ..." appears inconsistent, and the reference to Scheme 3 (S3) is not clearly indicated.
(b) In Figure 6, the legend overlaps with the y-axis labels, reducing readability. The authors should revise the figures to improve clarity and ensure consistent notation.
The reviewer correctly notes inconsistencies in figure presentation. We have revised the legend of Figure 4 (now 2A) to ensure schemes are referred to consistently and Scheme 3 (S3) is clearly indicated. We also adjusted Figure 6 (now 2C) to remove the overlap between the legend and y-axis labels.
Conclusion
We thank the reviewers and editor again for their constructive input. We believe the revisions and clarifications substantially strengthened the manuscript and the resource.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study used explicit-solvent simulations and coarse-grained models to identify the mechanistic features that allow for the unidirectional motion of SMC on DNA. Shorter explicit-solvent simulations describe the relevant hydrogen-bond energetics, which were then encoded in a coarse-grained structure-based model. In the structure-based model, the authors mimic chemical reactions as signaling changes in the energy landscape of the assembly. By cycling through the chemical cycle repeatedly, the authors show how these time-dependent energetic shifts naturally lead SMC to undergo translocation steps along DNA on a length scale consistent with that identified experimentally.
Strengths:
Simulating large-scale conformational changes in complex assemblies is extremely challenging. This study utilizes highly detailed models to parameterize a coarse-grained model, thereby allowing the simulations to connect the dynamics of precise atomistic-level interactions with a large-scale conformational rearrangement. This study serves as an excellent example of this overall methodology, where future studies may further extend this approach to investigate any number of complex molecular assemblies.
We thank the reviewer for their careful reading of our manuscript and for highlighting the value of our bottom-up multiscale simulation approach.
Weaknesses:
The only relative weakness is that the text does not always clearly communicate which aspects of the dynamics are expected to be robust. That is, which aspects of the dynamics/energetics are less precisely described by this model? Where are the limits of the models, and why should the results be considered within the range of applicability of the models?
We appreciate this insightful comment and agree that it is important to more explicitly describe the robustness and limitations of the simulation model used in this study. In response to this comment, we have revised the Discussion section of our manuscript.
First, to clarify the robust aspects of our model, we have added a new subsection titled “Parametric choices and robustness of simulation model” to the Discussion, which is as follows:
“The switching Gō approach adopted in this study is a powerful tool for providing the relationship between known large-scale conformational changes and the resulting functional and mechanical dynamics of the molecular machine (Brandani and Takada, 2018b; Koga and Takada, 2006b; Nagae et al., 2025). In this study, we mimic conformational change induced by ATP binding and hydrolysis events by instantaneously switching the potential energy function from one that stabilized a given conformation to another that stabilized a different conformation. This drives the protein to undergo a conformational transition toward the minimum of the new energy landscape.
This approach is particularly well suited to investigate whether a given conformational change in a subunit of a molecular machine can produce the overall motion observed, and whether this process is mechanically feasible. Therefore, the fundamental mechanisms identified in this study, i.e., DNA segment capture mechanism, the correlation between step size and loop length, and the unidirectional translocation mechanism originating from the asymmetric kleisin path, can be considered as robust, as they emerge directly from the structural and topological constraints of the SMC-kleisin architecture rather than from tuned parameters.”
Additionally, to more clearly define the limits of our model, we have expanded the "Limitations in current simulations" subsection. Specifically, we have added a detailed discussion regarding the energetics and transition pathways inherent to the switching Gō approach, which is as follows:
“First, the use of switching potentials to trigger conformational changes imposes a limitation on predictive power for energetics and transition pathways. The switching of potentials is akin to a “vertical excitation” from one energy landscape to another, rather than a thermally activated crossing of an energy barrier. Consequently, the model cannot provide quantitative predictions of the transition rates or the free energy barriers associated with these changes. Furthermore, while the subsequent relaxation follows the new potential landscape, it is not guaranteed to reproduce the unique, physically correct transition pathway. Nevertheless, this simplification is justified because conformational changes within the protein are expected to occur on a much faster timescale than the large-scale motion of the DNA. Thus, it has a limited impact on our main conclusions regarding the functional DNA dynamics driven by these large-scale conformational changes.”
We have not made any additions regarding the timescale and dwell times for each ATP state, as these were already discussed in the original manuscript.
Reviewer #2 (Public review):
Summary:
The authors perform coarse grained and all atom simulations to provide a mechanism for loop extrusion that is involved in genome compaction.
Strengths:
The simulations are very thoughtful. They provide insights into the translocation process, which is only one of the mechanisms. Much of the analysis is very good. Overall, the study advances the use of simulations in these complicated systems.
We sincerely thank the reviewer for their thoughtful and encouraging comments.
Weaknesses:
Even the authors point out several limitations, which cannot be easily overcome in the paper because of the paucity of experimental data. Nevertheless, the authors could have done more to illustrate the main assertion that loop extrusion occurs by the motor translocating on DNA. They should mention more clearly that there are alternative theories that have accounted for a number of experimental data.
We thank the reviewer for these constructive suggestions. As the reviewer pointed out, it is important to state more explicitly how the unidirectional DNA translocation revealed in this study relates to the widely recognized loop-extrusion hypothesis of genome organization and to situate our findings within the context of major alternative theories.
To address this, we first clarify the relationship between the translocation mechanism we observed and the phenomenon of loop extrusion. We emphasize that our simulations were designed to elucidate the core motor activity of the SMC complex, and we explicitly state our view that loop extrusion is a functional consequence of this motor activity when the complex is anchored to DNA.
Second, as the reviewer also suggested, we addressed alternative models of loop extrusion that also have experimental support in more detail. We have revised the Discussion accordingly to provide a more balanced and comprehensive context. Further details are provided in our separate response to the comment below.
Reviewer #3 (Public review):
Summary:
In this manuscript, Yamauchi and colleagues combine all-atom and coarse-grained MD simulations to investigate the mechanism of DNA translocation by prokaryotic SMC complexes. Their multiscale approach is well-justified and supports a segment-capture model in which ATP-dependent conformational changes lead to the unidirectional translocation of DNA. A key insight from the study is that asymmetry in the kleisin path enforces directionality. The work introduces an innovative computational framework that captures key features of SMC motor action, including DNA binding, conformational switching, and translocation.
This work is well executed and timely, and the methodology offers a promising route for probing other large molecular machines where ATP activity is essential.
Strengths:
This manuscript introduces an innovative yet simple method that merges all-atom and coarse-grained, purely equilibrium, MD simulations to investigate DNA translocation by SMC complexes, which is triggered by activated ATP processes. Investigating the impact of ATP on large molecular motors like SMC complexes is extremely challenging, as ATP catalyses a series of chemical reactions that take and keep the system out of equilibrium. The authors simulate the ATP cycle by cycling through distinct equilibrium simulations where the force field changes according to whether the system is assumed to be in the disengaged, engaged, and V-shaped states; this is very clever as it avoids attempting to model the non-equilibrium process of ATP hydrolysis explicitly. This equilibrium switching approach is shown to be an effective way to probe the mechanistic consequences of ATP binding and hydrolysis in the SMC complex system.
The simulations reveal several important features of the translocation mechanism. These include identifying that a DNA segment of ~200 bp is captured in the engaged state and pumped forward via coordinated conformational transitions, yielding a translocation step size in good agreement with experimental estimates. Hydrogen bonding between DNA and the top of the ATPase heads is shown to be critical for segment capture, as without it, translocation is shown to fail. Finally, asymmetry in the kleisin subunit path is shown to be responsible for unidirectionality.
This work highlights how molecular simulations are an excellent complement to experiments, as they can exploit experimental findings to provide high-resolution mechanistic views currently inaccessible to experiments. The findings of these simulations are plausible and expand our understanding of how ATP hydrolysis induces directional motion of the SMC complex.
We thank the reviewer for the thoughtful and encouraging assessment of our work. We appreciate the reviewer’s summary of our key contributions, especially our switching Gō strategy, the segment-capture mechanism of SMC translocation, and the role of kleisin-path asymmetry in ensuring unidirectionality.
Weaknesses:
There are aspects of the methodology and modelling assumptions that are not clear and could be better justified. The major ones are listed below:
(1) The all-atom MD simulations involve a 47-bp DNA duplex interacting with the ATPase heads, from which key residues involved in hydrogen bonding are identified. However, DNA mechanics-including flexibility and hydrogen bond formation-are known to be sequence-dependent. The manuscript uses a single arbitrary sequence but does not discuss potential biases. Could the authors comment on how sequence variability might affect binding geometry or the number of hydrogen bonds observed?
We thank the reviewer for this insightful comment regarding the potential effects of DNA sequence.
The primary biological role of the SMC complex is to organize genome architecture on a global scale; as such, its fundamental interaction with DNA is considered not to be sequence-specific. Our all-atom MD simulations and analysis pipeline were designed to probe the nature of this general interaction. Our approach confirms this rationale: the analysis exclusively identified hydrogen bonds formed between amino acid residues and the phosphate groups of the DNA's sugar-phosphate backbone. As shown in Figs. 1B and 1C, the results confirm that the key stabilizing interactions occur between basic residues on the SMC head surface and the DNA backbone. Since the backbone is chemically uniform, the stable binding mode we characterized is inherently sequence-independent.
While the final bound state is likely sequence-independent, we agree that sequence-dependent properties such as local DNA flexibility or intrinsic curvature could influence the kinetics of the binding process. For example, the rate of initial recognition or the ease of DNA bending on the head surface might vary between AT-rich and GC-rich regions. However, once the DNA is bound, we expect the stable binding geometry and the identity of the key interacting residues to be conserved across different sequences.
Therefore, we are confident that using a single, representative DNA sequence is a valid approach for elucidating the fundamental, non-sequence-specific aspects of SMC-DNA interaction and does not alter the general validity of the translocation mechanism proposed in this work.
(2) A key feature of the coarse-grained model is the inclusion of a specific hydrogen-bonding potential between DNA and residues on the ATPase heads. The authors select the top 15 hydrogen-bond-forming residues from the all-atom simulations (with contact probability > 0.05), but the rationale for this cutoff is not explained. Also, the strength of hydrogen bonds in coarse-grained models can be sensitive to context. How did the authors calibrate the strength of this interaction relative to electrostatics, and did they test its robustness (e.g., by varying epsilon or residue set)? Could this interaction be too strong or too weak under certain ionic conditions? What happens when salt is changed?
Thank you for these comments. We provide our rationale for the parameter choices below.
The contact probability cutoff of 0.05 was chosen to create a comprehensive set of residues that form physically robust interactions with DNA. To establish this robustness, we performed a parallel set of all-atom simulations using a different force field (see Fig. S2). This cross-validation revealed two key points. First, the top six residues (Arg120, Arg123, Ile63, Arg111, Arg62, and Lys56), which include experimentally confirmed DNA-binding sites, consistently exhibited the highest contact probabilities in both force fields, confirming the reliability of our identification. Second, and just as importantly, many residues with lower contact probabilities (e.g., Trp115, Tyr107, Arg105, Ser124, and Ser54) were also consistently detected across both simulations. This reproducibility suggests that these interactions are physically robust and not artifacts of a specific force field. We therefore concluded that a 0.05 cutoff is a well-balanced threshold that ensures the inclusion of not only the primary anchor residues but also the secondary, moderately interacting residues that are crucial for cooperatively stabilizing the DNA. We have discussed this point in the Methods section of the revised manuscript, as follows:
“The rationale for this cutoff is the physical robustness of the identified interactions; all-atom simulations using a different force field confirmed that the same set of key interacting residues, including both strong and moderate binders, was consistently identified (Fig. S2).”
The strength of the hydrogen bond potential was set to ϵ = 4.0 kT (≈2.4 kcal/mol), a physically plausible value corresponding to an ideal hydrogen bond. To test the robustness of this parameterization, we performed preliminary simulations where we varied these parameters by (i) reducing the value of ϵ and (ii) restricting the interaction to only the top six anchor residues. In both test cases, while a short DNA duplex (47 bp) could still bind to the ATPase heads, simulations with a long DNA (800 bp) failed to form a stable DNA loop after initial docking. These tests demonstrated that a larger set of cooperative interactions with a physically realistic strength was necessary for the full segment capture mechanism. Our final parameter set (15 residues at ϵ = 4.0 kT) was thus chosen because it captures both the initial anchoring of DNA and the subsequent cooperative stabilization of the captured loop.
As correctly pointed out, ionic conditions are a critical factor. Our simulations revealed that the salt concentration had a more pronounced effect on the kinetics of the DNA finding its correct binding site than on the thermodynamic stability of the final bound state. During our parameter tuning, we found that at physiological salt conditions (150 mM), long-range electrostatic interactions become dominant. This caused the DNA to be non-specifically captured by positively charged patches on the sides of the heads, which are not the functional binding sites. This off-pathway trapping kinetically prevented the DNA from reaching its proper location within the simulation timeframe. In contrast, the high-salt conditions (300 mM) used in this study screen these long-range interactions, suppressing non-specific trapping and allowing the DNA to efficiently explore the protein surface. This enables the correct binding to be established via the specific, short-range hydrogen bonds. Therefore, the ion concentration in our model serves primarily as a kinetic control factor that reproduces the correct binding pathway within a realistic simulation timeframe. This point is discussed in the new subsection entitled “Parametric choices and robustness of simulation model”.
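To make the screening argument above concrete, the salt dependence can be sketched with the textbook Debye-Hückel expressions (our own back-of-the-envelope illustration, not the coarse-grained force field used in the manuscript; the 0.304 nm·M^1/2 prefactor assumes monovalent salt in water at ~25 °C):

```python
import math

def debye_length_nm(ionic_strength_molar):
    """Debye screening length (nm) in water at ~25 C for monovalent salt,
    using the standard approximation lambda_D ~ 0.304 / sqrt(I)."""
    return 0.304 / math.sqrt(ionic_strength_molar)

def screened_coulomb(r_nm, ionic_strength_molar):
    """Relative Debye-Huckel interaction strength, exp(-r / lambda_D) / r.
    Prefactors are dropped; only the salt dependence matters here."""
    lam = debye_length_nm(ionic_strength_molar)
    return math.exp(-r_nm / lam) / r_nm

# At a 2 nm protein-DNA separation, raising the salt from 150 mM to 300 mM
# weakens the long-range electrostatic attraction roughly threefold,
# which is the screening effect that suppresses off-pathway trapping.
ratio = screened_coulomb(2.0, 0.150) / screened_coulomb(2.0, 0.300)
```

The Debye length shrinks from about 0.79 nm at 150 mM to about 0.56 nm at 300 mM, so interactions beyond a nanometer or two are strongly attenuated at the higher salt, consistent with the kinetic-control argument above.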
(3) To enhance sampling, the translocation simulations are run at 300 mM monovalent salt. While this is argued to be physiological for Pyrococcus yayanosii, such a concentration also significantly screens electrostatics, possibly altering the interaction landscape between DNA and protein or among protein domains. This may significantly impact the results of the simulations. Why did the authors not use enhanced sampling methods to sample rare events instead of relying on a high-salt regime to accelerate dynamics?
We agree that enhanced sampling methods are powerful for exploring rare events. However, many of these techniques require the pre-definition of a suitable, low-dimensional reaction coordinate (RC) to guide the simulation. The primary goal of our study was to discover the DNA translocation mechanism as it emerges naturally from fundamental physical interactions, without imposing a priori assumptions about the specific pathway.
The DNA segment capture process is complex, involving the coordinated motion of a long DNA polymer and multiple protein domains. Defining a simple RC in advance was not feasible and would have carried a significant risk of biasing the system toward an artificial pathway. Therefore, to avoid such bias, we chose to perform direct, unbiased molecular dynamics simulations. Using a physiologically relevant high-salt concentration (300 mM) for Pyrococcus yayanosii was a strategy to accelerate the system's natural dynamics, allowing us to observe these unbiased trajectories within a feasible computational timescale.
Because our current work has elucidated the fundamental steps of this mechanism, we agree that this work provides a foundation for more quantitative analyses. As suggested, future studies using methods like Markov State Model analysis or enhanced sampling techniques, guided by more sophisticated RCs defined from the insights of this work, would be a valuable next step for characterizing the free-energy landscape of the process or longer time scale dynamics.
(4) Only a small fraction of the simulated trajectories complete successful translocation (e.g., 45 of 770 in one set), and this is attributed to insufficient simulation time. While the authors are transparent about this, it raises questions about the reliability of inferred success rates and about possible artefacts (e.g., DNA trapping in coiled-coil arms). Could the authors explore or at least discuss whether alternative sampling strategies (e.g., Markov State Models, transition path sampling) might address this limitation more systematically?
We thank the reviewer for raising this point that is crucial for considering limitations and future directions of our study.
As we noted in a previous response, the primary reason we did not employ such enhanced sampling methods was the limited prior knowledge available to define the previously uncharacterized DNA translocation process. Therefore, we first sought to define the key conformational states and transitions without the potential bias of a predefined model or reaction coordinate. This approach was successful, as it allowed us to identify critical on-pathway states like “DNA segment capture” and significant off-pathway or kinetically trapped states such as “DNA trapping” between the coiled-coil arms.
We fully agree that the low success rate observed is a key finding that points to significant kinetic bottlenecks, and that a more systematic analysis is required. Having identified the essential states, applying techniques such as Markov State Models (MSMs) or transition path sampling represents a powerful and logical next step. These methods, using a state-space definition based on our findings, will enable a quantitative characterization of the free-energy landscape and the transition rates between states. This will provide a rigorous understanding of the kinetic factors, such as the depth of the trapped-state energy well, that underlie the low translocation efficiency.
In the revised manuscript, we discuss the application of these advanced sampling methods as a feasible and promising future direction, which is as follows:
“Future studies can leverage the insights from this work to overcome the current timescale limitations. Techniques such as Markov state modeling (Husic and Pande, 2018; Prinz et al., 2011) or enhanced sampling methods (Hénin et al., 2022) may be employed to quantitatively characterize the free-energy landscape and transition rates. Such an approach would provide a rigorous understanding of the kinetic barriers, such as the stability of the trapped state, that govern the efficiency of SMC translocation.”
Reviewer #1 (Recommendations for the authors):
As noted in the public review, there could be a more systematic description of the limits of the model. The model appears to be carefully crafted, though every model has limits. It could be helpful for the general readership to give some idea of which parametric choices are more critical, and which mechanistic features should be robust to minor changes in parameters.
We sincerely thank the reviewer for this constructive comment. We agree that clarifying which aspects of our model are robust and which are sensitive to specific parameter choices is crucial for the reader's understanding.
We have expanded the Discussion to clarify how specific simulation parameters affect the efficiency and success rate of DNA translocation in our coarse-grained simulations. In particular, we have added a description of the parametric choices for (i) selection and strength of hydrogen bonds, (ii) ionic strength, and (iii) interaction strength between the coiled-coil arms. The discussion can be found in the subsection entitled “Parametric choices and robustness of simulation model” in the Discussion, which is as follows:
“On the other hand, the efficiency and success rate of DNA translocation in our simulations are more sensitive to certain parametric choices. For instance, the selection and strength of hydrogen bond-like interactions are a key factor. Our model incorporates specific hydrogen bonds between the upper surface of the ATPase heads and DNA, based on all-atom simulations. These interactions are essential for initiating segment capture; without them, DNA fails to migrate to the correct binding surface. While the identification of these key residues is a robust finding—persisting across different all-atom force fields (Fig. S2)—their strength and number in the coarse-grained potential are critical parameters that directly influence the probability and kinetics of DNA capture. Another critical parameter is the ionic strength. We performed translocation simulations at an ionic strength of 300 mM to accelerate DNA dynamics. At lower concentrations, non-specific electrostatic interactions between DNA and positively charged patches on the sides of the ATPase heads or coiled-coil arm became dominant, hindering the efficient migration of DNA to its functional binding site. Using a higher-than-physiological ionic strength is a justified practice in coarse-grained simulations employing the Debye-Hückel approximation, as it serves as a first-order correction to mimic the strong local charge screening by condensed counterions that is not explicitly captured by the mean-field model (Brandani et al., 2021; Niina et al., 2017b). Finally, the interaction strength between the coiled-coil arms is also important. In our model, once the arms closed during the transition from the V-shaped to the disengaged state, they remained closed on the simulated timescale, frequently trapping DNA pushed from the hinge and thereby leading to failed translocation. This behavior suggests that the arm–arm interactions may be overestimated.
A parameterization that allows for more frequent, transient opening of the arms could increase the success rate of DNA pumping.”
Reviewer #2 (Recommendations for the authors):
This paper reports simulations (all atom and coarse grained) to provide molecular details of loop extrusion. In general, it is a well done paper. There are a few issues that the authors should address.
(1) The study supposes that loop extrusion occurs by translocation. Although they point out alternate models like scrunching (C Dekker; the theory by Takaki is also based on the scrunching model that the authors should mention), they should discuss this further. After all, the Takaki theory does predict several experimental outcomes very accurately. The precise mechanism has not been nailed down - The paper by Terakawa in Science suggests the extrusion is by translocation, but the evidence is not clear.
We thank the reviewer for this insightful comment. We agree that our discussion should briefly acknowledge alternative models such as scrunching. We have therefore revised the manuscript to mention the theory by Takaki et al. (Nat. Commun., 2021), which reproduces several experimental outcomes.
Because our present work specifically addresses the translocation mechanism based on DNA segment capture, we now state that scrunching and related models represent alternative proposals for loop extrusion.
In this revision, we have added discussion to the end of the subsection titled "DNA segment capture as the mechanism of the DNA translocation by SMC complexes." in the Discussion section, which is as follows:
“Turning to loop extrusion, alternative mechanisms have been proposed in addition to the DNA-segment capture model. For example, Takaki et al. developed a scrunching-based theory that quantitatively accounts for several experimental observations, including force-velocity relationships and step-size distributions. While our present study focuses on the DNA translocation mechanism via segment capture, it is important to note that scrunching and other models remain plausible alternatives for loop extrusion. The precise mechanism may depend on the specific SMC complex and its subunits and remains to be fully resolved.”
(2) It is unclear how one can say from Figure 4I and J that translocation has taken place. These panels show that the base pair length increases. This should be explained more clearly. They should also simultaneously plot the location of the heads (2D plot).
Thank you for this valuable suggestion. In response to the comment on how translocation is presented in Fig. 4I and J, we have revised the text to make it clear that the SMC complex moves along DNA in subsection entitled “DNA translocation via DNA-segment capture”, as follows:
“Fig. 4I represents the one-dimensional contour coordinate of the DNA molecule, indexed by base pairs (1-800). In this plot, translocation is visualized as a discontinuous shift in the range of base-pair indices that the SMC complex contacts over one complete ATP cycle.”
“This translocation is recorded in Fig. 4I as the average coordinate of the kleisin contact region (red dots) jumps from ~400 bp before the cycle to ~600 bp after, which corresponds to a translocation event of ~200 bp.”
We believe that adding this explanation makes it clearer to readers that Fig. 4I and 4J provide direct evidence for unidirectional translocation of the SMC complex.
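As an illustration of how such a step size is read off from contact data, the calculation reduces to the shift in the mean contacted base-pair index across one ATP cycle. The following sketch uses made-up contact indices (the manuscript's actual analysis pipeline may differ in detail):

```python
def translocation_step(contacts_before, contacts_after):
    """Estimate the translocation step (in bp) as the shift in the average
    base-pair index of the kleisin-DNA contact region across one ATP cycle.
    This mirrors how the jump of the red dots in Fig. 4I is interpreted."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(contacts_after) - mean(contacts_before)

# Hypothetical contact indices centered near ~400 bp before the cycle
# and ~600 bp after it:
step = translocation_step([390, 400, 410], [590, 600, 610])
# step == 200.0, i.e. a ~200 bp translocation event
```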
(3) The transitions between the states are very abrupt (see Figure 2). Please explain. Also, in which state does extrusion take place? What is the role of the V-shape - is it part of the ATPase cycle?
We thank the reviewer for raising these questions.
In our simulation, we implemented ATP-binding state changes by instantaneously switching the structure-based (Gō-type) potential between reference conformations for the disengaged (apo), engaged (ATP-bound), and V-shaped (ADP-bound) states at predetermined times. The system rapidly relaxes along the new funnel-shaped potential energy surface toward its minimum. This rapid relaxation is why the transitions appear abrupt in metrics such as the Q-score in Fig. 2.
The V-shaped state corresponds to a key ADP-bound intermediate within the ATP hydrolysis cycle. Its primary role in our model is preparatory; it establishes the necessary open geometry that allows for the subsequent "zipping" of the coiled-coil arms. Crucially, unidirectional pumping motion is generated during the transition from the V-shaped state to the disengaged state. That is, the zipping motion of the coiled-coil arm pushes the captured DNA segment forward, resulting in a net translocation along the DNA.
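The abruptness described above can be reproduced with a minimal toy model (our own illustration, not the manuscript's simulation code): a one-dimensional overdamped Langevin particle in a harmonic funnel whose minimum is switched instantaneously, standing in for the switch between reference conformations. All parameter values below are arbitrary choices for the sketch.

```python
import random

def simulate_switch(n_steps=2000, switch_at=1000, k=5.0, dt=0.01, temp=1.0, seed=0):
    """Overdamped Langevin dynamics in a harmonic well V(x) = (k/2)(x - x0)^2.
    At step `switch_at`, the reference minimum x0 jumps from 0 to 1, mimicking
    the instantaneous potential switch ("vertical excitation") of the
    switching-Go approach; the particle then relaxes toward the new minimum."""
    rng = random.Random(seed)
    x, traj = 0.0, []
    for step in range(n_steps):
        x0 = 0.0 if step < switch_at else 1.0  # switched reference conformation
        force = -k * (x - x0)                  # gradient of the funnel potential
        noise = rng.gauss(0.0, (2.0 * temp * dt) ** 0.5)  # thermal kicks
        x += force * dt + noise                # Euler-Maruyama update
        traj.append(x)
    return traj

traj = simulate_switch()
# The relaxation time, ~1/k, spans only tens of steps, so on the scale of
# the full trajectory the transition looks abrupt, much as the Q-scores do.
```

Because the relaxation after the switch is far faster than the interval between switches, the time series shows a step-like jump rather than a gradual barrier crossing, which is the same reason the Q-score transitions in Fig. 2 appear abrupt.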
(4) It appears the heads do not move between the disengaged to engaged states. Why not in their model?
Thank you for pointing out the lack of clarity in explanation of the SMC head movement in our simulations.
In our model, the transition from the disengaged to the engaged state involves a dynamic rearrangement of the SMC heads. Specifically, one ATPase head slides (~10 Å) and rotates (~85°) relative to the other ATPase head to re-associate at a new dimer interface. This movement drives the global conformational change of the complex from a rod-like shape to an open ring, a mechanism proposed in a previous structural study (Diebold-Durand et al., Mol. Cell, 2017).
As the reviewer's comment indicates, this crucial motion, which is reflected in the changing head-head distance and hinge angle in Fig. 2A, was not sufficiently highlighted in the text. We have therefore revised the manuscript to explicitly describe this head rearrangement to improve clarity, which is as follows:
“Upon transition to the engaged state, the two ATPase heads were quickly rearranged to form the new inter-subunit contacts. Specifically, this rearrangement involves one ATPase head sliding by approximately 10 Å and rotating by 85° relative to the other, allowing it to associate through a different interface (Diebold-Durand et al., 2017b). The fractions of formed contacts, Q-scores, that exist at the disengaged (engaged) states quickly decreased (increased) (Fig. 2A, top two plots).”
(5) What is pumping - it has been used in Marko NAR in the DNA capture model. How is that illustrated in the simulations?
We thank the reviewer for raising this point. In the context of the DNA segment-capture model by Marko et al. (NAR, 2019), "pumping" refers to the conceptual process where a DNA loop, captured in an upper compartment of the SMC ring, is transferred to a lower compartment, resulting in net translocation.
Our simulations provide a direct, molecular-resolution visualization of the physical mechanism underlying this concept. We illustrate that the "pumping" action is not a passive transfer but an active, mechanical process driven by a specific conformational change. This occurs during the transition from the V-shaped (ADP-bound) to the disengaged state. As shown in our trajectories, the two coiled-coil arms close in a zipper-like manner, beginning from the hinge and progressing toward the ATPase heads. This zipping motion physically pushes the captured DNA segment from the hinge region toward the kleisin ring.
This process is visualized in our simulations as a clear, unidirectional translocation step (see Figs. 4B–D, 4I, and S6). The result is a net forward movement of the DNA by a distance that corresponds to the length of the initially captured loop, a key prediction of Marko's model that we quantify in our step-size analysis (Figs. 4K–L and S8).
To make this point clearer for the reader, we have revised the manuscript. We have explicitly defined this "zipping and pushing" action as the physical basis for the "pumping" mechanism in the subsection titled "Zipping motion of coiled-coil arms pushes the DNA from hinge domain toward kleisin ring", which is as follows:
“This active, mechanical pushing of the DNA loop, driven by the sequential closing of the coiled-coil arm, constitutes the physical basis of the “pumping” mechanism that drives unidirectional translocation. Our simulations thus provide a concrete, molecular-level visualization for this key step in the DNA segment-capture model.”
(6) The length of DNA simulated is small for understandable reasons. Both experiments and theory show that loop extrusion sizes can be very large, far exceeding the sizes of the SMA complex. Could the small size of DNA be affecting the results?
We thank the reviewer for this important comment. The relationship between our simulated system size and the large-scale phenomena observed experimentally is a key point.
Our study was specifically designed to elucidate the fundamental mechanism of the elementary, single-cycle translocation step at near-atomic resolution. For this purpose, the 800 bp DNA length was sufficient. The observed translocation step size per cycle was 216 ± 71 bp, which is substantially smaller than the total length of the simulated DNA. This confirms that the boundaries of our system did not artificially constrain the core translocation process we aimed to investigate. Therefore, we think that the DNA length used in this study did not systematically bias our main findings regarding the motor mechanism itself.
On the other hand, as the reviewer pointed out, our current setup cannot reproduce the formation of kilobase-scale loops. We hypothesize that these large-scale events are intrinsically linked to the stochastic nature of the ATP hydrolysis cycle, which was simplified in our simulation model. For computational feasibility, we used fixed durations for each state. In a more realistic scenario, a stochastically prolonged engaged state would give a captured DNA loop more time to grow via thermal diffusion. This could lead to occasional, much larger translocation steps upon ATP hydrolysis, contributing to the large loop sizes seen experimentally.
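This hypothesis can be illustrated with a toy calculation. In the sketch below, which uses purely illustrative parameters rather than values from our simulations, replacing fixed state durations with exponentially distributed ones broadens the step-size distribution while leaving the mean step comparable:

```python
import numpy as np

rng = np.random.default_rng(0)

def step_sizes(n_cycles, mean_dwell=1.0, sigma_per_sqrt_time=200.0, fixed=False):
    """Toy model: during each engaged state a captured DNA loop grows by
    thermal diffusion, so the loop length at hydrolysis (i.e. the step size,
    in bp) scales like sqrt(dwell time). All parameters are illustrative."""
    if fixed:
        dwell = np.full(n_cycles, mean_dwell)          # fixed state durations
    else:
        dwell = rng.exponential(mean_dwell, n_cycles)  # stochastic hydrolysis cycle
    return np.abs(rng.normal(0.0, sigma_per_sqrt_time * np.sqrt(dwell)))

fixed_steps = step_sizes(100_000, fixed=True)
stoch_steps = step_sizes(100_000, fixed=False)
# Stochastic dwell times leave the mean step comparable but broaden the
# distribution, producing the occasional much larger steps discussed above.
```

The heavier tail arises because a long engaged state gives the loop more time to diffuse before hydrolysis, exactly the scenario described above.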
(7) Minor point: The first CG model using three sites was introduced in PNAS vol 102, 6789 2005. The authors should consider citing it.
Thank you for this suggestion. We have now cited the paper the reviewer recommended; please see the subsection entitled "Coarse-grained simulations" in Materials and Methods.
Reviewer #1 (Public review):
Summary:
Press et al test, in three experiments, whether responses in a speeded response task reflect people's expectations, and whether these expectations are best explained by the objective statistics of the experimental context (e.g., stimulus probabilities) or by participants' mental representation of these probabilities. The studies use a classical response time and accuracy task, in which people are (1) asked to make a response (with one hand), this response then (2) triggers the presentation of one of several stimuli (with different probabilities depending on the response), and participants (3) then make a speeded response to identify this stimulus (with the other hand). In Experiment 1, participants are asked to rate, after the experiment, the subjective probabilities of the different stimuli. In Experiments 2 and 3, they rated, after each trial, to what extent the stimulus was expected (Experiment 2), or whether they were surprised by the stimulus (Experiment 3). The authors test (using linear models) whether the subjective ratings in each experiment predict stimulus identification times and accuracies better than objective stimulus probabilities (Experiment 1), or than their objective probability derived from a Rescorla-Wagner model of prior stimulus history (Experiment 2 and 3). Across all three experiments, the results are identical. Response times are best described by contributions from both subjective and objective probabilities. Accuracy is best described by subjective probability.
Strengths:
This is an exciting series of studies that tests an assumption that is implicit in predictive theories of response preparation (i.e., that response speed/accuracy tracks subjective expectancies), but has not been properly tested so far, to my knowledge. I find the idea of measuring subjective expectancy and surprise in the same trials as the response very clever. The manuscript is extremely well written. The experiments are well thought-out, preregistered, and the results seem highly robust and replicable across studies.
Weaknesses:
In my assessment, this is a well-designed, implemented, and analysed series of studies. I have one substantial concern that I would like to see addressed, and two more minor ones.
(1) The key measure of the relationship between subjective ratings and response times/accuracy is inherently correlational. The causal relationship between both variables is therefore by definition ambiguous. I worry that the results don't reveal an influence of subjective expectancy of response times/accuracies, but the reverse: an influence of response times/accuracies on subjective expectancy ratings.
This potential issue is most prominent in Experiments 2 and 3, where people rate their expectations in a given trial directly after they made their response. We can assume that participants have at least some insight into whether their response in the current trial was correct/erroneous or fast/slow. I therefore wonder if the pattern of results can simply be explained by participants noticing they made an error (or that they responded very slowly) and subsequently being more inclined to rate that they did not expect this stimulus (in Experiment 2) or that they were surprised by it (in Experiment 3).
The specific pattern across the two response measures might support this interpretation. Typically, participants are more aware of the errors they make than of their response speed. From the above perspective, it would therefore be not surprising that all experiments show stronger associations between accuracy and subjective ratings than between response times and subjective ratings -- exactly as the three studies found.
I acknowledge that this problem is less strong in Experiment 1, where participants do not rate expectancy or surprise after each response, but make subjective estimates of stimulus probabilities after the experiment. Still, even here, the flow of information might be opposite to what the authors suggest. Participants might not have made more errors for stimuli that they thought were least likely, but instead might have used the number of times they identified a given stimulus as a proxy for rating its likelihood. For example, if they identify a square as a square 25% of the time, even though 5% of these responses were in error, it is perhaps no surprise if their rating of the stimulus likelihood better tracks the proportion of trials on which they identified it as a square (25%) than the actual stimulus likelihood (20%).
This potential reverse direction of effects would need to be ruled out to fully support the authors' claims.
(2) My second, more minor concern, is whether the Rescorla-Wagner model is truly the best approximation of objective stimulus statistics. It is traditionally a model of how people learn. Isn't it, therefore, already a model of subjective stimulus statistics, derived from the trial history, instead of objective ones? If this is correct, my interpretation of Experiments 2 and 3 would be (given my point 1 above is resolved) that subjective expectancy ratings predict responses better than this particular model of learning, meaning that it is not a good model of learning in this task. Comparing results against Rescorla-Wagner may even seem like a stronger test than comparing them against objective stimulus statistics - i.e., they show that subjective ratings capture expectancies better even than this model of learning. The authors already touch upon this point in the General Discussion, but I would like to see this expanded, and - ideally - comparisons against objective stimulus statistics (perhaps up to the current trial) to be included, so that the authors can truly support the claim that it is not the objective stimulus statistics that determine response speed and accuracy.
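For context, the Rescorla-Wagner prediction is a simple delta rule computed from the trial history, which is why it can be read as a model of subjective rather than objective statistics. A minimal sketch (learning rate, prior, and simulated 75% stimulus stream are all illustrative):

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.5):
    """Delta-rule update: V is a recency-weighted running estimate of the
    outcome probability, derived purely from the trial history."""
    v, trajectory = v0, []
    for outcome in outcomes:        # 1 if the stimulus occurred, else 0
        trajectory.append(v)        # expectation *before* seeing the outcome
        v += alpha * (outcome - v)  # prediction error drives the update
    return trajectory

import random
random.seed(1)
# With a stationary 75% stimulus, V drifts from its prior toward 0.75:
traj = rescorla_wagner([1 if random.random() < 0.75 else 0 for _ in range(2000)])
```

Because V is built entirely from past trials, comparing it against subjective ratings is a comparison between two history-derived estimates, not between a subjective estimate and the generative probabilities.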
(3) There is a long history of research trying to link response times to subjective expectancies. For example, Simon and Craft (1989, Memory & Cognition) reported that stimuli of equal probability were identified more rapidly when participants had explicitly indicated they expect this stimulus to occur in the given trial, and there's similar more recent work trying to dissociate stimulus statistics and explicit expectations (e.g., Umbach et al., 2012, Frontiers; for a somewhat recent review, see Gaschler et al., 2014, Neuroscience & Biobehavioral Reviews). It has not become clear to me how the current results relate to this literature base. How do they impact this discussion, and how do they differ from what is already known?
Reviewer #2 (Public review):
Summary:
This work by Clarke, Rittershofer, and colleagues used categorization and discrimination tasks with subjective reports of task regularities. In three behavioral experiments, they found that these subjective reports explain task accuracy and response times at least as well as, and sometimes better than, objective measures. They conclude that subjective experience may play a role in predictive processing.
Strengths:
This set of behavioral studies addresses an important question. The results are replicated three times with a different experimental design, which strengthens the claims. The design is preregistered, which further strengthens the results. The findings could inspire many studies in decision-making.
Weaknesses:
It seems to me that it is important, but difficult, to distinguish whether the objective and subjective measures stem from reasonably different mechanisms contributing to behavior, or whether they are simply two noisy proxies for the same mechanism, in which case it is not so surprising that both contribute to the explained variance. The authors acknowledge in the discussion that the type of objective measure that is chosen is crucial.
For instance, the RW model's learning rates were not fitted to participants but to the sequence of stimuli, so they represent the optimal parameter values, not the true ones that participants are using. Is the subjective measure just a readout of the RW model's true state when using the participants' parameters? Relatedly, would the authors consider the RW predictions from participants using a sub-optimal alpha to be a subjective or an objective measure? Do the results truly show the importance of subjective measures, or is it another way of saying that humans are sub-optimal (Rahnev & Denison, 2018, BBS) ... or optimal for other goals. I see the difficulty of avoiding double-dipping on accuracy, but this seems essential to address. This relates to a more general question about the underlying mechanisms of subjective versus objective measures, which is alluded to in the discussion but could be interesting to develop a bit further.
In terms of methods, I did not fully understand the 'RW model expectedness' objective metric in Experiments 2 and 3. V_T is defined as the model's expectation for the given tone T. A (signed?) prediction error is defined for the expectation update, but it seems that the RW model expectedness used in the figures and statistical models is V_T, sign-inverted for unexpected stimuli. So how do we interpret negative values, and how often do they occur? Shouldn't it be the unsigned value that is taken as objective surprise? This could be explained in a bit more detail. Could this be related to the quadratic effect that one can see in Figures 4E and 5E, which is not taken into account in the statistical model? Figures 4A and 5A also seem to show a combination of linear and quadratic effects. A more complete description of the objective measure could help determine whether this is a serious issue or just noise in the data.
Gabor patches in Experiments 2 and 3 seemed to have been presented at quite a sharp contrast (I did not find this info), and accuracy seems to saturate at 100%. What was the distribution of error rates, i.e., how many participants were so close to 100% that there was no point in including them in the analysis?
In the second preregistration, the authors announced that BIC comparisons between the full model and the objective model would test whether subjective measures capture additional variance [...] beyond objective prediction error. This is also the conclusion reached in sections 3.3 and 4.3. The model comparison, however, is performed by selecting the best of three models, excluding the null model. It seems that the full model still wins over the objective model, but sometimes quite marginally. Could the authors not test the significance of the model comparison directly, since the models are nested?
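As an illustration of what such a significance test could look like, nested Gaussian linear models can be compared with a likelihood-ratio test. The sketch below uses simulated data with invented effect sizes and variable names; it is not the authors' analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
objective = rng.normal(size=n)                     # e.g. model-derived expectedness
subjective = 0.6 * objective + rng.normal(size=n)  # correlated subjective rating
rt = -0.3 * objective - 0.2 * subjective + rng.normal(scale=0.5, size=n)

def gaussian_loglik(y, X):
    """OLS fit; Gaussian log-likelihood at the MLE of the residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)

X_obj = np.column_stack([np.ones(n), objective])
X_full = np.column_stack([np.ones(n), objective, subjective])

# The objective-only model is nested in the full model, so
# 2 * (LL_full - LL_obj) is asymptotically chi-squared with df = 1
# (one added predictor), giving a p-value to accompany the BIC comparison.
lr_stat = 2 * (gaussian_loglik(rt, X_full) - gaussian_loglik(rt, X_obj))
p_value = stats.chi2.sf(lr_stat, df=1)
```

A significant likelihood-ratio statistic would directly quantify whether the marginal BIC advantage of the full model reflects real additional variance explained by the subjective predictor.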
Reviewer #3 (Public review):
Summary:
Clarke et al. investigate the role of subjective representations of task-based statistical structure on choice accuracy and reaction times during perceptual decision-making. Subjective representations of objective statistical structure are often overlooked in studies of predictive processing and, consequently, little is known about their role in predictive phenomena. By gauging the subjective experience of stimulus probability, expectedness, and surprise in tasks with fixed cue-stimulus contingencies, the authors aimed to separate subjective and objective (task-induced) contributions to predictive effects on behaviour.
Across three different experiments, subjective and objective contributions to predictions were found to explain unique portions of variance in reaction time data. In addition, choice accuracy was best predicted by subjective representations of statistical structure in isolation. These findings reveal that the subjective experience of statistical regularities may play a key role in the predictive processes that shape perception and cognition.
Strengths:
This study combines careful and thorough behavioral experimentation with an innovative focus on subjective experience in predictive processing. By collecting three independent datasets with different perceptual decision-making paradigms, the authors provide converging evidence that subjective representations of statistical structure explain unique variance in behavior beyond objective task structure. The analysis strategy, which directly contrasts the contributions of subjective and objective predictors, is conceptually rigorous and allows clear insight into how subjective and objective influences shape behavior. The methods are consistently applied across all three datasets and produce coherent results, lending strong support to the authors' conclusions. The study emphasizes the critical role of subjective experience in predictive processing, with implications for understanding learning, perception, and decision-making.
Weaknesses:
Despite these strengths, there are several conceptual and technical issues that should be addressed. In Experiments 2 and 3, the authors use a Rescorla-Wagner (RW) learning model to estimate trialwise expectedness (Experiment 2) and surprise (Experiment 3). While the RW model is a well-established model for explaining learning behaviour, it does not represent the objective 'ground truth' statistical structure of the environment, and treating RW trajectories as such imposes assumptions about learning that may not match participants' actual behavior. This assumption could strongly affect the comparison between subjective and 'objective' predictors. It would strengthen the primary conclusions of the manuscript if other implementations of the objective statistical structure, such as the true task-defined probabilities (i.e., 25% or 75%), were considered to provide a complementary 'ground truth' perspective.
Additionally, because objective statistical structure was predictive of subjective ratings in all three experiments, these predictors are likely collinear in the full model. Collinearity can lead to inflated standard errors and unstable coefficient estimates, even if the models converge. Currently, this potential critical problem of the applied statistical models is not assessed, reported on, or controlled for (e.g., by residualizing predictors). RW trajectories are also not reported in the manuscript, limiting the ability to assess how the model evolves over time and whether it maps onto the task-induced probabilities in a sensible way. This is particularly relevant because participants' subjective estimates of the task-induced probabilities seem to converge to the ground truth after just a few trials, especially for the 75% stimuli (Figure 3C).
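One simple way to control for the collinearity mentioned above is residualization. The sketch below, on simulated data with invented names and effect sizes, shows that the residualized subjective predictor is orthogonal to the objective one by construction:

```python
import numpy as np

def residualize(predictor, covariate):
    """Return the part of `predictor` not linearly explained by `covariate`;
    the residual is orthogonal to `covariate` by construction."""
    X = np.column_stack([np.ones(len(covariate)), covariate])
    beta, *_ = np.linalg.lstsq(X, predictor, rcond=None)
    return predictor - X @ beta

rng = np.random.default_rng(2)
objective = rng.normal(size=1000)                     # e.g. RW-model prediction
subjective = 0.7 * objective + rng.normal(size=1000)  # collinear subjective rating

subjective_unique = residualize(subjective, objective)
raw_r = np.corrcoef(objective, subjective)[0, 1]           # substantial collinearity
clean_r = np.corrcoef(objective, subjective_unique)[0, 1]  # ~0 after residualizing
```

Entering the residualized predictor into the full model would make its coefficient interpretable as variance uniquely attributable to the subjective ratings.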
Reviewer #1 (Public review):
This is an interesting manuscript aimed at improving the transcriptome characterization of 52 C. elegans neuron classes. Previous single-cell RNA seq studies already uncovered transcriptomes for these, but the data are incomplete, with a bias against genes with lower expression levels. Here, the authors use cell-specific reporter combinations to FACS purify neurons and use bulk RNA sequencing to obtain better sequencing depth. This reveals more rare transcripts, as well as non-coding RNAs, pseudogenes, etc. The authors develop computational approaches to combine the bulk and scRNA transcriptome results to obtain more definitive gene lists for the neurons examined.
To ultimately understand features of any cell, from morphology to function, an understanding of the full complement of the genes it expresses is a prerequisite. This paper gets us a step closer to this goal, assembling a current "definitive list" of genes for a large proportion of C. elegans neurons. The computational approaches used to generate the list are based on reasonable assumptions, the data appear to have been treated appropriately statistically, and the conclusions are generally warranted. I have a few issues that the authors may choose to address:
(1) As part of getting rid of cross contamination in the bulk data, the authors model the scRNA data, extrapolate it to the bulk data and subtract out "contaminant" cell types. One wonders, however, given that low expressed genes are not represented in the scRNA data, whether the assignment of a gene to one or another cell type can really be made definitive. Indeed, it's possible that a gene is expressed at low levels in one cell, and at high levels in another, and would therefore be considered a contaminant. The result would be to throw out genes that actually are expressed in a given cell type. The definitive list would therefore be a conservative estimate, and not necessarily the correct estimate.
(2) It would be quite useful to have tested some genes with lower expression levels using in vivo gene-fusion reporters to assess whether the expression assignments hold up as predicted, i.e., to provide another, non-computational avenue of experimentation to confirm that the decontamination algorithm works.
(3) In many cases, each cell class would be composed of at least 2 if not more neurons. Is it possible that differences between members of a single class would be missed by applying the cleanup algorithms? Such transcripts would be represented only in a fraction of the cells isolated by scRNAseq, and might then be considered not real?
(4) I didn't quite catch whether the precise staging of animals was matched between the bulk and scRNAseq datasets. Importantly, there are many genes whose expression is highly stage-specific or age-specific, so that even slight temporal differences might yield different sets of gene expression.
(5) To what extent does FACS sorting affect gene expression? Can the authors provide some controls?
Comments on revisions:
The authors have made reasonable arguments in response to my questions, and have done some additional experiments. I believe that although they did not do so, they could have generated additional reporters for the lower expressed genes, that would have validated their method of data integration. Nonetheless, I think the paper is rigorous and will be of use to the community.
Reviewer #3 (Public review):
Summary
This study aims to overcome key limitations of single-cell RNA-seq in C. elegans neurons (especially the under-detection of lowly expressed and non-polyadenylated transcripts, and residual contamination) by integrating bulk RNA-seq from FACS-isolated neuron types with an existing scRNA-seq atlas. The authors introduce LittleBites, an iterative, reference-guided decontamination algorithm that uses a single-cell reference together with ground-truth reporter datasets to optimize subtraction of contaminating signal from bulk profiles. They then generate an "Integrated" dataset that combines the sensitivity of bulk data with the specificity of scRNA-seq and use it to call neuron-specific expression for protein-coding genes, "rescued" genes not detected in scRNA-seq, and multiple classes of non-coding RNAs across 53 neuron classes. All data, code, and thresholded matrices are made publicly available to enable community reuse.
Strengths
(1) Conceptual advance and useful resource. The work demonstrates in a concrete way how bulk and single-cell datasets can be combined to overcome the weaknesses of each approach, and delivers a high-resolution transcriptomic resource for a substantial fraction of C. elegans neuron classes. The integrated matrices, thresholded expression calls, and non-coding RNA catalog will be useful both for basic neurobiology and for method developers.
(2) Careful benchmarking and transparency. The revised manuscript includes extensive benchmarking of LittleBites and the Integrated dataset against multiple independent "ground-truth" sets: neuron-specific reporter lines, curated non-neuronal markers, and ubiquitous genes. The authors evaluate AUROCs over a wide range of thresholds, explain ROC/AUROC metrics for non-specialists, and quantify how integration affects both sensitivity and specificity relative to scRNA-seq alone.
(3) Improved methodological clarity. In response to review, the authors now provide a much more intuitive description of the LittleBites algorithm, including a stepwise explanation of (1) contamination estimation via NNLS using single-cell references, (2) weighted subtraction tuned by a learning-rate parameter, and (3) performance optimization based on AUROC against ground-truth genes. This makes the approach accessible to readers who are not computational specialists and will facilitate re-implementation.
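To make the algorithm's logic concrete for non-specialist readers, the first two steps of that description can be sketched on synthetic data. This is only an illustration of my reading of the method, not the authors' implementation: ordinary least squares with a non-negativity clip stands in for NNLS, the AUROC-based optimization of the learning rate is omitted, and all profiles and mixing proportions are invented.

```python
import numpy as np

def littlebites_step(bulk, reference, learning_rate=0.5):
    """Schematic single iteration: estimate mixing proportions of the
    reference cell types in the bulk profile (least squares with a
    non-negativity clip standing in for NNLS), then subtract a fraction
    of the inferred contaminant signal. Column 0 of `reference` is taken
    to be the target neuron type; all other columns are contaminants."""
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    contamination = reference[:, 1:] @ coef[1:]
    return np.clip(bulk - learning_rate * contamination, 0.0, None)

rng = np.random.default_rng(3)
genes, n_types = 200, 4
reference = rng.gamma(2.0, 1.0, size=(genes, n_types))  # synthetic sc profiles
true_mix = np.array([0.8, 0.1, 0.07, 0.03])             # mostly target cells
bulk = reference @ true_mix + rng.normal(scale=0.05, size=genes)

cleaned = littlebites_step(bulk, reference)
target = reference[:, 0]
# After one subtraction step the profile tracks the target cell type better:
r_before = np.corrcoef(target, bulk)[0, 1]
r_after = np.corrcoef(target, cleaned)[0, 1]
```

In the full method as I understand it, this subtraction would be iterated and the learning rate tuned against the ground-truth gene sets via AUROC.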
(4) Systematic analysis of reference dependence. The authors explicitly address the concern that LittleBites depends on the completeness and accuracy of the scRNA-seq reference. They examine how performance varies with cluster size and by simulated degradation of the reference (e.g., reducing the number of cells per cluster), and show that AUROCs remain robust, but that gene-level assignments are more variable for clusters represented by fewer cells. This is an important and honest characterization of when the method is reliable and when users should be cautious.
(5) Additional biological context. The manuscript now more clearly situates the dataset in the context of previous and ongoing work. In particular, the authors highlight that other groups have already used these bulk data to discover and validate cell-type-specific alternative splicing events, strengthening the case that the data are biologically meaningful beyond the immediate analyses presented here. The expanded analysis of non-coding RNAs and GPCR pseudogenes also adds biological interest.
(6) Improved handling and documentation of "unexpressed" genes. The authors have trimmed the original list of 4,440 genes called "unexpressed" in scRNA-seq to a higher-confidence subset and provide new supplementary tables that include gene identities and tissue annotations. They also use a curated set of non-neuronal markers to estimate residual contamination and show that most such markers are not detected in the integrated data, with only a small number of apparent false positives remaining.
Weaknesses
(1) Novel assignments remain predictive rather than experimentally validated. Although the authors have strengthened their benchmarking and refer to external work that validates some splicing patterns from these data, the large sets of newly assigned lowly expressed genes and non-coding RNAs, particularly those rescued from the "unexpressed" gene pool, are still inferred from computational criteria (thresholding plus correlation-based decontamination) rather than direct orthogonal assays (e.g., smFISH, in situ hybridization, or reporter lines). This is understandable given scale and cost, but it means that many of these calls should be interpreted as well-supported predictions, not definitive expression maps. The revised manuscript acknowledges this, and a dedicated "Limitations of this study" subsection will further clarify this point for readers.
(2) Reduced stability for neuron types with sparse single-cell representation. The authors' new analyses show that while integration improves overall correlation and AUROC across a wide range of neuron types, gene-level assignments are less stable for neuron classes represented by relatively few cells in the scRNA-seq reference. For such neuron types, both false negatives and false positives are more likely, and users should be cautious when interpreting cell-type-specific expression differences based solely on these calls.
(3) Residual contamination and misclassification are not completely eliminated. Despite the careful design of LittleBites and the additional correlation-based decontamination of "unexpressed" genes, the authors' benchmarking against curated non-neuronal markers shows that a small fraction of putative non-neuronal genes remains detectable even at stricter thresholds, and some bona fide neuronal genes are removed as likely contaminants. The new supplementary tables documenting "unexpressed" genes and their tissue annotations, together with explicit statements about residual error rates and the predictive nature of these classifications, help users to judge the reliability of specific genes, but they also underscore that the dataset is not a perfect ground truth.
(4) Scope and coverage remain incomplete. As the authors note, the dataset covers 53 neuron classes and does not fully represent all 302 neurons or all known neuron subtypes. In addition, bulk samples represent pools of neurons, and so the approach cannot resolve within-class heterogeneity or subtype-specific expression within those pools. These are inherent limitations of the current experimental design rather than flaws in the analysis, but they are important for readers to keep in mind when using the resource.
Overall, the revised manuscript presents solid evidence for the main methodological and resource claims, with clearly articulated limitations. The work is likely to have valuable impact on the C. elegans community and provides a template for integrating bulk and single-cell data in other systems.
Author response:
The following is the authors’ response to the original reviews
Public Reviews:
Reviewer #1 (Public review):
(1) As part of getting rid of cross-contamination in the bulk data, the authors model the scRNA data, extrapolate it to the bulk data and subtract out "contaminant" cell types. One wonders, however, given that low expressed genes are not represented in the scRNA data, whether the assignment of a gene to one or another cell type can really be made definitive. Indeed, it's possible that a gene is expressed at low levels in one cell, and high levels in another, and would therefore be considered a contaminant. The result would be to throw out genes that actually are expressed in a given cell type. The definitive list would therefore be a conservative estimate, and not necessarily the correct estimate.
We agree that the various strategies we employ do not result in perfect annotation of gene expression. However, despite their limitations, they are significantly better than either the single cell or the bulk data alone. We represent these strengths and shortcomings throughout the manuscript (for example, in ROC curves).
(2) It would be quite useful to have tested some genes with lower expression levels using in vivo gene-fusion reporters to assess whether the expression assignments hold up as predicted, i.e., to provide another, non-computational avenue of experimentation to confirm that the decontamination algorithm works.
We agree that evaluating only highly-expressed genes might introduce bias. We used a large battery of in vivo reporters, made with best-available technology (CRISPR insertion of the fluorophore into the endogenous locus) to evaluate our approaches. These reporters were constructed without bias in terms of gene expression and therefore represent both high and low expression levels. These data are represented throughout the manuscript (for example, in ROC curves). Details about the battery of reporters may be found in Taylor et al 2021. In addition to these reporters, this manuscript also generates and analyzes two other types of gene sets: non-neuronal and ubiquitous genes. Again, these genes are selected without bias toward gene expression, and the techniques presented here are benchmarked against them as well, with positive results.
(3) In many cases, each cell class would be composed of at least 2 if not more neurons. Is it possible that differences between members of a single class would be missed by applying the cleanup algorithms? Such transcripts would be represented only in a fraction of the cells isolated by scRNAseq, and might then be considered not real.
For the data set presented in this manuscript, all cells of a single neuron type were labeled and isolated together by FACS, and sequencing libraries were constructed from this pool of cells. Thus, potential subtypes within a particular type (when that type includes more than one cell) cannot be resolved by this method. By contrast, such subtypes were in some cases resolved in the single cell approach. To make the two data sets compatible with each other, for the single cell data we combined any subtypes together. We state in the Methods:
“For this work, single cell clusters of neuron subtypes were collapsed to the resolution of the bulk replicates (example: VB and VB1 clusters in the single cell data were treated as one VB cluster).”
(4) I didn't quite catch whether the precise staging of animals was matched between the bulk and scRNAseq datasets. Importantly, there are many genes whose expression is highly stage-specific or age-specific so even slight temporal differences might yield different sets of gene expression.
We agree that accurate staging is critically important for valid comparisons between data sets and have included an additional supplemental table with staging metadata for each sample. The staging protocol used for the bulk data set was initially employed to generate scRNA seq data and should be comparable. An additional description of our approach is now included in Methods:
“Populations of synchronized L1s were grown at 23°C until reaching the L4 stage on 150 mm 8P plates inoculated with Na22. The time in culture to reach the L4 stage varied (40.5-49 h) and was determined for each strain. 50-100 animals were inspected with a 40X DIC objective to determine developmental stage as scored by vulval morphology (Mok et al., 2025). Cultures were predominantly composed of L4 larvae but also typically included varying fractions of L3 larvae and adults.”
We have also updated supplementary table 1 to include additional information about each sort including the observed developmental stages and their proportions when available, the temperature the worms were grown at, the genotype of each experiment, and the number of cells collected in FACS.
(5) To what extent does FACS sorting affect gene expression? Can the authors provide some controls?
We appreciate this suggestion. We agree that FACS sorting (and also dissociation of the animals prior to sorting) might affect gene expression, particularly of stress-related transcripts. We note that dissociation and FACS sorting was also used to collect cells for our single cell data set (Taylor et al 2021). We would note that clean controls for this approach can be prohibitively difficult to collect, as the process of dissociation and FACS will inevitably change the proportion of cell types present in the sample, and for bulk sequencing efforts it is difficult even with deconvolution approaches to accurately account for changes in gene expression that result from dissociation and FACS, versus changes in gene expression that result from differences in cell type composition. We regrettably omitted a discussion of these issues in the manuscript. We now write in the Results:
“The dissociation and FACS steps used to isolate neuron types induce cellular stress-responsive pathways (van den Brink et al., 2017; Kaletsky et al., 2016; Taylor et al., 2021). Genes associated with this stress response (Taylor et al., 2021) were not removed from downstream analyses, but should be viewed with caution.”
Reviewer #2 (Public review):
The bulk RNA-seq data collected by the authors has high levels of contamination and, in some cases, is based on very few cells. The methodology to remove contamination partly makes up for this shortcoming, but the high background levels of contaminating RNA in the FACS-isolated neurons limit the confidence in cell-specific transcripts.
We agree that these are the limitations of the source data. One of the manuscript’s main goals is to analyze and refine these source data, reducing these limitations and quantifying the results.
The study does not experimentally validate any of the refined gene expression predictions, which was one of the main strengths of the initial CenGEN publication (Taylor et al, 2021). No validation experiments (e.g., fluorescence reporters or single molecule FISH) were performed for protein-coding or non-coding genes, which makes it difficult for the reader to assess how much gene predictions are improved, other than for the gold standard set, which may have specific characteristics (e.g., bias toward high expression as they were primarily identified in fluorescence reporter experiments).
We agree that evaluating only highly-expressed genes might introduce bias. We used a large battery of in vivo reporters, made with best-available technology (CRISPR insertion of the fluorophore into the endogenous locus) to evaluate our approaches. These reporters were constructed without bias in terms of gene expression and therefore represent both high and low expression levels. These data are represented throughout the manuscript (for example, in ROC curves). Details about the battery of reporters may be found in Taylor et al 2021. In addition to these reporters, this manuscript also generates and analyzes two other types of gene sets: non-neuronal and ubiquitous genes. Again, these genes are selected without bias toward gene expression, and the techniques presented here are benchmarked against them as well, with positive results.
The study notes that bulk RNA-seq data, in contrast to scRNA-seq data, can be used to identify which isoforms are expressed in a given cell. However, no analysis or genome browser tracks were supplied in the study to take advantage of this important information. For the community, isoform-specific expression could guide the design of cell-specific expression constructs or for predictive modeling of gene expression based on machine learning.
We strongly agree that these datasets allow for new discoveries in neuronal splicing patterns and regulators, which is explored further in other publications from our group and other research groups in the field. We did not sufficiently highlight these works in the body of our text, and have added a reference in the discussion. “In addition, the bulk RNA-seq dataset contains transcript information across the gene body, which parallel efforts have used to identify mRNA splicing patterns that are not found in the scRNA-seq dataset.” These works can be found in references 26 and 27.
(1) The study relies on thresholding to determine whether a gene is expressed or not. While this is a common practice, the choice of threshold is not thoroughly justified. In particular, the choice of two uniform cutoffs across protein-encoding RNAs and of one distinct threshold for non-coding RNAs is somewhat arbitrary and has several limitations. This reviewer recommends the authors attempt to use adaptive threshold-methods that define gene expression thresholds on a per-gene basis. Some of these methods include GiniClust2, Brennecke's variance modeling, HVG in Seurat, BASiCS, and/or MAST Hurdle model for dropout correction.
We appreciate the reviewer’s suggestion, and note that the integrated data already incorporate some gene-specific weighting to identify gene expression patterns: the single-cell data are weighted by maximum expression for each gene prior to integration with the LittleBites-cleaned data. This gene-level normalization markedly improved gene detection accuracy and is discussed in depth in our 2021 paper, “Molecular topography of an entire nervous system”. We previously explored several methods for setting gene-specific thresholds for identifying gene expression patterns in the integrated dataset. Unfortunately, none of the tested methods outperformed “static” thresholds applied uniformly across all genes, and they tended to increase false positive rates for some low-abundance genes, because gene-specific thresholding tends to call a gene expressed in at least one cell type even when it is not expressed in any cell type present. These methods are likely to give better results for expanded datasets that cover all tissue types, where one might reasonably expect a gene to be expressed in at least one sample.
(2) Most importantly, the study lacks independent experimental validation (e.g., qPCR, smFISH, or in situ hybridization) to confirm the expression of newly detected lowly expressed genes and non-coding RNAs. This is particularly important for validating novel neuronal non-coding RNAs, which are primarily inferred from computational approaches.
We agree that smFISH and related in situ validation methods would be an asset to this analysis. Unfortunately, because most ncRNAs are very short, they are prohibitively difficult to measure accurately with smFISH. Many of the ncRNAs we attempted to assay with smFISH can bind at most three fluorescent probes, a signal that was not reliably distinguishable from background autofluorescence in the worm. Many published methods for smFISH signal amplification have not been optimized for C. elegans, and the tough cuticle is a major barrier to those efforts.
(3) The novel biology is somewhat limited. One potential area of exploration would be to look at cell-type specific alternative splicing events.
We appreciate this suggestion. Indeed, as we put our source data online prior to publishing this manuscript, two published papers already use this source data set to analyze alternative splicing. Further, these works include validation of splicing patterns observed in this source data, indicating the biological relevance of these data sets.
(4) The integration method disproportionately benefits neuron types with limited representation in scRNA-seq, meaning well-sampled neuron types may not show significant improvement. The authors should quantify the impact of this bias on the final dataset.
We agree that cell types that are well represented in the single-cell dataset tend to have fewer new genes identified in the integrated dataset than “rare” cell types in the single-cell data. However, we note that cell types that are highly abundant in the single-cell data appear increasingly vulnerable to non-neuronal false positives, and that integration’s primary effect in high-abundance cell types appears to be reducing the false positive rate for non-neuronal genes. Thus we suggest that integration benefits all cell types across the spectrum of single-cell abundance. The false positives are likely a side-effect of normalization steps in the single-cell dataset, which is moderated by using the LittleBites-cleaned bulk samples as an orthogonal measurement. The benefit of integration for cell types with low abundance in the single-cell dataset is now quantified, and the benefits of integration for low- and high-abundance cell types are described in the following section (p. 13):
“To test the stability of LittleBites cleanup across different single-cell reference dataset qualities, we ran the algorithm on a set of bulk samples by first subsetting the corresponding single-cell cluster’s population to 10, 50, 100, or 500 cells. We performed this process 500 times for each subsampling rate for each sample (2000 total runs per sample). We found that testing gene AUROC values are stable across reference cluster sizes (Fig. 2D), suggesting that even if the target cell type is rarely represented in a single cell reference, accurate cleaning is still possible. However, comparing gene level stability across target cluster population levels reveals that low abundance references have higher gene level variance (Fig. 2E), lower purity estimates (Fig. S2F), higher variance in the mean expression across genes (Fig. S2G), and they tend to have lower overall expression (suggesting more aggressive subtraction) (Fig. S2H). This indicates that while binary gene calling is improved even if the reference cluster is small, users should be cautious when using fewer than 100 cells in their single cell reference cluster as the resulting cleanup is less stable.”
(5) The authors employ a logit transformation to model single-cell proportions into count space, but they need to clarify its assumptions and potential pitfalls (e.g., how it handles rare cell types).
We agree that the assumptions and pitfalls of the logit model are key for evaluating its usefulness, especially for cell types that are rarely captured in the single-cell dataset. The assumptions and pitfalls are described in the methods section, but we regretfully omitted any mention of those pitfalls in the results, which we have now rectified.
The description in the methods section is: “We applied this formula to our real single cell dataset and used this equation to transform proportion measures of gene expression into a count space to generate the Prop2Count dataset for downstream analysis and integration with bulk datasets. This procedure allows for proportions data to be used in downstream analyses that work with counts datasets. However, it does limit the range of potential values that each gene can have, with the potential values set as: 
As n approaches 0, the number of potential values decreases, which can be incompatible with some downstream models. Thus, caution should be used when applying this transformation to datasets with few cells.”
The new mention in the results is: “However, caution should be taken when using this approach in scRNAseq cases where all replicates of a cell type contain few cells. scProp2Count values are limited to the space of possible proportion values, and so replicates with low numbers of cells will have fewer potential expression “levels” which may break some model assumptions in downstream applications (see Methods).”
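The limited value space described above holds for any monotone transform of proportions. A minimal sketch can make it concrete; note that the odds-based mapping `n * p / (1 - p)` used here is an illustrative stand-in, not necessarily the authors' exact logit formula:

```python
import numpy as np

def possible_values(n_cells):
    """Distinct proportion values a gene can take in a cluster of n_cells cells,
    mapped to a count-like scale via the odds p / (1 - p).

    Illustrative stand-in for the paper's logit-based transform; the endpoint
    p = 1 is dropped because the odds are undefined there.
    """
    p = np.arange(n_cells) / n_cells      # proportions 0, 1/n, ..., (n-1)/n
    return n_cells * p / (1 - p)          # odds scaled by cluster size

# A cluster of 5 cells admits only 5 distinct "count" levels; 500 cells admit 500.
few, many = possible_values(5), possible_values(500)
```

With only a handful of cells per replicate, the transformed values are restricted to a few discrete levels, which is the downstream-model pitfall the quoted text warns about.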
(6) The LittleBites approach is highly dependent on the accuracy of existing single-cell references. If the scRNA-seq dataset is incomplete or contains classification biases, this could propagate errors into the bulk RNA-seq data. The authors may want to discuss potential limitations and sensitivity to errors in the single-cell dataset, and it is critical to define minimum quality parameters (e.g. via modeling) for the scRNAseq dataset used as reference.
We appreciate this suggestion, and agree that the manuscript would benefit from a description of where the LittleBites method can give poor results. To this end, we subset our single-cell reference for individual neurons of interest to 10, 50, 100, or 500 cells (500 iterations per sampling rate), ran LittleBites, and compared metrics of gene expression stability, sample composition estimates, and AUROC performance on test genes. When fewer than 100 cells for the target cell type are included in the single-cell reference, gene expression stability drops (variance between subsampling iterations is much higher). AUROC values, however, remain consistently high regardless of how many reference cells are included, but this stability is paired with lower overall counts after cleanup in samples with <100 reference cells. This indicates that when few reference cells are present, higher AUROC values might be achieved by more aggressive subtraction, which is attenuated when the reference model is more complete. This analysis is shown in Figure 2 and Figure S2, and described in the results section, recreated here.
“To test the stability of LittleBites cleanup across different single-cell reference dataset qualities, we ran the algorithm on a set of bulk samples by first subsetting the corresponding single-cell cluster’s population to 10, 50, 100, or 500 cells. We performed this process 500 times for each subsampling rate for each sample (2000 total runs per sample). We found that testing gene AUROC values are stable across reference cluster sizes (Fig. 2D), suggesting that even if the target cell type is rarely represented in a single cell reference, accurate cleaning is still possible. However, comparing gene level stability across target cluster population levels reveals that low population references have higher gene level variance (Fig. 2E), lower purity estimates (Fig. S2F), higher variance in the mean expression across genes (Fig. S2G), and they tend to have lower overall expression (suggesting more aggressive subtraction) (Fig. S2H). This suggests that while binary gene calling is improved similarly even if the reference cluster is small, users should be cautious when using fewer than 100 cells in their single cell reference cluster as the resulting cleanup is less stable.”
(7) Also very important, the LittleBites method could benefit from a more intuitive explanation and schematic to improve accessibility for non-computational readers. A supplementary step-by-step breakdown of the subtraction process would be useful.
We appreciate this suggestion and implemented a step-by-step breakdown of the subtraction process in the methods section, also copied below. We also updated the graphic representation in figure 2A.
“LittleBites Subtraction algorithm
LittleBites is an iterative algorithm that improves the accuracy of cell-type-specific bulk RNA-seq sample profiles by removing counts from non-target contaminants (e.g., ambient RNA from dead cells, or carry-over of non-target cells due to imperfect FACS gating). This method leverages single-cell reference datasets and ground-truth expression information to guide iterative and conservative subtraction that enriches for true target cell-type expression. Using this approach, LittleBites balances subtraction by optimizing against both a single-cell reference and an orthogonal ground-truth reference, moderating the biases inherent to either reference.
This algorithm first calculates gene-level specificity weights in a single-cell reference dataset using SPM (Specificity Preservation Method) (references 22 and 23). SPM assigns high weights (approaching 1) to genes expressed in single cell types while applying conservative weights to genes with broader expression patterns, which helps to reduce inappropriate subtraction.
The algorithm proceeds in a loop of three steps:
Step 1: Estimate Contamination. Each bulk sample is modeled as a linear combination of single-cell profiles (target cell type and likely contaminants) using non-negative least squares (NNLS). The resulting coefficients estimate how much of the sample’s counts come from the target cell type and how much from each contaminant cell type.
Step 2: Weighted Subtraction. Each bulk sample is cleaned by subtracting the weighted sum of contaminant single-cell profiles. This subtraction is attempted separately across a series of learning-rate weights (usually ranging from 0 to 1), which moderate the size of the subtraction step (Equation 1). This produces a range of possible “cleaned” sample options for evaluation.
Step 3: Performance Optimization. For each learning rate, the cleaned result is evaluated against a set of ground-truth genes by calculating the area under the receiver operating characteristic curve (AUROC). The learning rate that optimizes the AUROC is then selected. When multiple learning rates yield equivalent AUROC values, the lowest learning rate is chosen to minimize subtraction.
If the optimal learning rate at Step 3 is 0 (no subtraction option beats the baseline), the loop halts. Otherwise, the cleaned bulk profile is fed back into Step 1, and the loop continues until the AUROC cannot be improved upon using the single-cell reference modeling.”
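The three-step loop described above can be sketched in a few lines of Python. This is an illustrative reimplementation of the loop structure with toy inputs, not the authors' released code; the SPM specificity weights are omitted for brevity:

```python
import numpy as np
from scipy.optimize import nnls

def auroc(scores, labels):
    """Rank-based AUROC: probability a truly expressed gene outscores an unexpressed one."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]          # compare every positive against every negative
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def littlebites(bulk, profiles, target_idx, truth_labels, rates=np.linspace(0, 1, 11)):
    """Iteratively subtract contaminant profiles from one bulk sample.

    bulk         -- (genes,) count vector for the sorted sample
    profiles     -- (genes, cell_types) single-cell reference profiles
    target_idx   -- column index of the target cell type
    truth_labels -- (genes,) 1 = known expressed, 0 = known unexpressed
    """
    cleaned = bulk.astype(float).copy()
    while True:
        # Step 1: estimate sample composition with non-negative least squares.
        coefs, _ = nnls(profiles, cleaned)
        contaminants = [i for i in range(profiles.shape[1]) if i != target_idx]
        contam_profile = profiles[:, contaminants] @ coefs[contaminants]
        # Step 2: attempt the subtraction across a grid of learning rates.
        candidates = [np.clip(cleaned - r * contam_profile, 0, None) for r in rates]
        # Step 3: keep the lowest rate that maximizes AUROC on the ground-truth genes.
        scores = [auroc(c, truth_labels) for c in candidates]
        best = int(np.argmax(scores))           # argmax takes the first (lowest) rate on ties
        if rates[best] == 0:                    # no subtraction beats the baseline: halt
            return cleaned
        cleaned = candidates[best]
```

With a toy sample containing equal parts target and contaminant signal, the loop removes the contaminant genes in one pass and then halts at learning rate 0.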
(8) In the same vein, the ROC curves and AUROC comparisons should have clearer annotations to make results more interpretable for readers unfamiliar with these metrics.
We agree that the ROC and AUROC metrics need a clearer explanation to make their use and interpretations clearer. We included a description of both metrics, and a suggestion for how to interpret them in the results section, copied below.
“To evaluate the post-subtraction dataset’s accuracy we used the area under the Receiver-Operator Characteristic (AUROC) score. In brief, we set a wide range of thresholds to call genes expressed or unexpressed, and then compared the calls to expected expression from a set of ground-truth genes. This comparison produces a true positive rate (TPR, the percentage of truly expressed genes that are called expressed), a false positive rate (FPR, the percentage of truly unexpressed genes that are called expressed), and a false discovery rate (FDR, the percentage of genes called expressed that are truly unexpressed). The Receiver-Operator Characteristic (ROC) is the curve traced by the TPR and FPR values across the range of thresholds tested, and the AUROC is the area under that curve. A “random” model of gene expression is expected to have an AUROC value of 0.5, and a “perfect” model an AUROC value of 1. Thus, AUROCs below 0.5 are worse than a random guess, and values closer to 1 indicate higher accuracy.”
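The threshold-sweep construction described in the quoted text can be shown directly. The expression values and ground-truth labels below are invented for illustration:

```python
import numpy as np

def roc_curve(expr, truth):
    """Sweep expression thresholds; at each, call genes >= threshold 'expressed'.

    expr  -- (genes,) expression values
    truth -- (genes,) 1 = ground-truth expressed, 0 = ground-truth unexpressed
    Returns FPR and TPR arrays tracing the ROC from (0, 0) to (1, 1).
    """
    thresholds = np.concatenate(([np.inf], np.sort(expr)[::-1]))
    tpr = np.array([((expr >= t) & (truth == 1)).sum() / (truth == 1).sum()
                    for t in thresholds])
    fpr = np.array([((expr >= t) & (truth == 0)).sum() / (truth == 0).sum()
                    for t in thresholds])
    return fpr, tpr

# Toy data: expressed genes tend to score higher, with one overlapping gene.
expr  = np.array([9.0, 7.0, 5.0, 6.0, 2.0, 1.0])
truth = np.array([1,   1,   1,   0,   0,   0])

fpr, tpr = roc_curve(expr, truth)
# Area under the ROC by the trapezoid rule; 0.5 = random, 1.0 = perfect.
auroc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)
```

Here one unexpressed gene (value 6.0) outscores an expressed one (5.0), so the AUROC falls just short of 1.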
(9) Finally, after the correlation-based decontamination of the 4,440 'unexpressed' genes, how many were ultimately discarded as non-neuronal?
a) Among these non-neuronal genes, how many were actually known neuronal genes or components of neuronal pathways (e.g., genes involved in serotonin synthesis, synaptic function, or axon guidance)?
b) Conversely, among the "unexpressed" genes classified as neuronal, how many were likely not neuron-specific (e.g., housekeeping genes) or even clearly non-neuronal (e.g., myosin or other muscle-specific markers)?
Combined with point 10, see below.
(10) To increase transparency and allow readers to probe false positives and false negatives, I suggest the inclusion of:
a) The full list of all 4,440 'unexpressed' genes and their classification at each refinement step. In that list flag the subsets of genes potentially misclassified, including:
- Neuronal genes wrongly discarded as non-neuronal.
- Non-neuronal genes wrongly retained as neuronal.
b) Add a certainty or likelihood ranking that quantifies confidence in each classification decision, helping readers validate neuronal vs. non-neuronal RNA assignments.
This addition would enhance transparency, reproducibility, and community engagement, ensuring that key neuronal genes are not erroneously discarded while minimizing false positives from contaminant-derived transcripts.
We agree that the genes called “unexpressed” in the single-cell data need more context and clarity. First, we trimmed the list to include only the 2,333 genes of highest confidence. Second, for those genes we identified any with published neuronal expression patterns. Identifying genes that were retained as neuronal but are likely non-neuronal in origin is difficult, as many markers are expressed in a mixture of neuronal and non-neuronal cell types. However, we used a curated list of putative non-neuronal markers to assess the accuracy of the integrated data (see supplementary table 4) and established that most non-neuronal markers are undetected in the integrated data, with the number of detected genes decreasing as threshold stringency increases. Of note, a few putative non-neuronal genes remain detected even at high thresholds, indicating that our dataset still retains a small percentage of neuronal false positives. This result is collected in the new supplementary figure 4F and addressed in the following text in the results section: “Testing against a curated list of non-neuronal genes from fluorescent reporters and genomic enrichment studies, we found that of 445 non-neuronal markers, each gene was detected in an average of 12.5 cells or a median of 3 cells in the single-cell dataset, and an average of 8.7 cells or a median of 1 cell in the integrated dataset, at a 14% FDR threshold.”
We also included a list of “unexpressed” gene identities and tissue annotations as new supplementary tables 16 and 17.
Reviewer #2 (Recommendations for the authors):
The utility of the bulk RNA-seq data would be significantly increased if the authors were to analyze which isoforms are expressed in individual neurons. Also, it would be very useful to know if there are instances where a gene is expressed in several neurons, but different isoforms are specific to individual neurons.
We appreciate this suggestion. Indeed, as we put our source data online prior to publishing this manuscript, two published papers already use this source data set to analyze alternative splicing. Further, these works include validation of splicing patterns observed in this source data, indicating the biological relevance of these data sets. This is now noted in our discussion section “In addition, the bulk RNA-seq dataset contains transcript information across the gene body, which parallel efforts have used to identify mRNA splicing patterns that are not found in the scRNA-seq dataset.” These works can be found in references 26 and 27.
Reviewer #3 (Recommendations for the authors):
(1) Describe the number of L4 animals processed to obtain good-quality bulk RNAseq libraries from the different neuronal types. If the number of worms would be different for different neuronal types, then please make a supplementary table listing the minimum number of worms needed for each neuronal type.
We appreciate the reviewer’s recommendation, and agree that it would be a useful resource to provide suggestions for how many worms are needed per experiment. Unfortunately, we did not track the total number of animals for each sample. We aimed to start with 200-300 µl of packed worms for each strain, generally equating to >500,000 worms, but yields of FACS-isolated cells varied among cell types. Because replicate numbers for specific neuron types were also variable in some instances (see additions to supplemental Table 1), yields likely depend on multiple factors. We have previously noted (Taylor et al., 2021), for example, that some cell types (e.g., pharyngeal neurons) were under-represented in scRNA-seq data relative to their in vivo abundance, presumably due to difficulty of isolation or sub-viability in the cell dissociation-FACS protocol.
(2) List the thresholds for the parameters used during the FASTQC quality control and the threshold number of reads that would make a sample not useful.
We now include parameters for sample exclusion in the methods section: “Samples were excluded after sequencing if they had: fewer than 1 million read pairs, <1% of reads uniquely mapping to the C. elegans genome, >50% duplicate reads (low UMI diversity), or failed deduplication steps in the nudup package.”
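The stated exclusion criteria amount to a simple per-sample filter. A minimal sketch, with invented field names standing in for whatever the QC pipeline actually reports:

```python
def passes_qc(sample):
    """Apply the stated post-sequencing exclusion criteria to one sample.

    `sample` is a dict with illustrative (invented) keys; `dedup_ok` stands in
    for successful completion of the nudup deduplication step.
    """
    return (
        sample["read_pairs"] >= 1_000_000       # fewer than 1 M read pairs -> exclude
        and sample["unique_map_frac"] >= 0.01   # <1% uniquely mapping reads -> exclude
        and sample["duplicate_frac"] <= 0.50    # >50% duplicates (low UMI diversity) -> exclude
        and sample["dedup_ok"]                  # failed nudup deduplication -> exclude
    )
```

A sample failing any single criterion is dropped; passing requires all four.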
(3) In Figure 5C, include an overlapping bar that shows the total number of genes in each cell type. You may need to use a log scale to see both (new and all) numbers of genes in the same graph. Add supplementary tables with the names of all new genes assigned to each neuronal type.
We agree that this figure panel needed additional context. On further reflection we concluded that figure 5 was not sufficiently distinct from figure 4 to warrant separation, and incorporated some key findings from figure 5 into figure S4.
Reviewer #2 (Public review):
This study conducted by Lu et al. explores the molecular underpinnings of sexual dimorphism in antiviral immunity in zebrafish, with a particular emphasis on the male-biased gene cyp17a2. The authors demonstrate that male zebrafish exhibit stronger antiviral responses than females, and they identify a teleost-specific gene cyp17a2 as a key regulator of this dimorphism. Utilizing a combination of in vivo and in vitro methodologies, they demonstrate that Cyp17a2 potentiates IFN responses by stabilizing STING via K33-linked polyubiquitination and directly degrades the viral P protein via USP8-mediated deubiquitination. The work challenges conventional views of sex-based immunity and proposes a novel, hormone- and sex chromosome-independent mechanism.
Strengths:
(1) The study advances a novel concept: the demonstration that sexual dimorphism in immunity can be driven by an autosomal gene, rather than by sex chromosomes or hormones, represents a significant advance in the field and offers a more comprehensive understanding of immune evolution.
(2) The present study provides a comprehensive molecular pathway, from gene expression to protein-protein interactions and post-translational modifications, thereby establishing a link between Cyp17a2 and both host immune enhancement (via STING) and direct antiviral activity (via viral protein degradation).
(3) To substantiate their claims, the authors use a wide range of techniques, including transcriptomics, Co-IP, ubiquitination assays, confocal microscopy, and knockout models.
(4) The choice of model is apt. Zebrafish, which lack sex chromosomes, offer a clear genetic background for dissecting autosomal contributions to sexual dimorphism.
Author response:
The following is the authors’ response to the previous reviews
Public Reviews:
Reviewer #1 (Public review):
Weaknesses:
(1) Weaknesses of this study include a proposed mechanism underlying the sexual dimorphism phenotype based on experimentation in only males, and widespread reliance on over-expression when investigating protein-protein interaction and localization. Additionally, a minor weakness is that the text describing the identification of cyp17a2 as a candidate contains errors that are confusing.
We thank the reviewer for these insightful comments, which have helped us improve the manuscript.
(1) Experimentation in males. We focused on male zebrafish for our mechanistic studies to preclude potential confounding effects from female hormones and to directly interrogate the basis of the observed male-biased resistance. As confirmed in the manuscript (lines 151-153), both wild-type and cyp17a2⁻/⁻ males developed normal male sex organs and exhibited comparable androgen levels. This crucial control gives us confidence that the differences in antiviral immunity we observed are a direct consequence of Cyp17a2 loss-of-function, rather than secondary to developmental or hormonal abnormalities. We fully agree that elucidating the mechanism in females represents a valuable and interesting direction for future research.
(2) Over-expression studies. We acknowledge that overexpression approaches can have inherent limitations. To mitigate this and strengthen our conclusions, we complemented these experiments with loss-of-function data from both knockout zebrafish and knockdown cells, as well as validation at the endogenous level (e.g., Fig. 4J and S4C). The consistent results obtained across these diverse experimental models collectively reinforce our conclusion that Cyp17a2 interacts with and stabilizes STING.
(3) We thank the reviewer for pointing out the lack of clarity in the text regarding the selection process of Cyp17a2. We have thoroughly revised the manuscript to provide a precise and accurate description of our methodology. The relevant text is now as follows: “Differential expression analysis identified 1511 upregulated and 1117 downregulated genes (Fig. 2A and Table S2). We then focused on a subset of known or putative sex-related genes. Among these eight candidates, cyp17a2 exhibited the most significant male-biased upregulation, a finding that was subsequently confirmed by qPCR (Fig. 2B and S1A)” (lines 142-144).
(2) Lines 139-140 describe the data for Figure 2 as deriving from "healthy hermaphroditic adult zebrafish". This appears to be a language error and should be corrected to something that specifies that the comparison made is between healthy adult male and female kidneys.
We thank the reviewer for pointing out this inaccuracy. This was a terminological error, and we have corrected the text to accurately state “transcriptome sequencing was performed on head-kidney tissues from healthy adult male and female zebrafish” (lines 139-140). We have carefully reviewed the manuscript to ensure no similar errors are present.
(3) In Figure 2A and associated text cyp17a2 is highlighted but the volcano plot does not indicate why this was an obvious choice. For example, many other genes are also highly induced in male vs female kidneys. Figure 2B and line 143 describe a subset of "eight sex-related genes" but it is not clear how these relate to Figure 2A. The narrative could be improved to clarify how cyp17a2 was selected from Figure 2A and it seems that the authors made an attempt to do this with Figure 2B but it is not clear how these are related. This is important because the available data do not rule out the possibility that other factors also mediate the sexual dimorphism they observed either in combination, in a redundant fashion, or in a more complex genetic fashion. The narrative of the text and title suggests that they consider this to be a monogenic trait but more evidence is needed.
We thank the reviewer for raising these important points. We have revised the manuscript to clarify the candidate gene selection process and to avoid any implication that the trait is monogenic.
The selection of cyp17a2 was not based solely on its position in the volcano plot (Fig. 2A), but on a multi-faceted rationale. We first prioritized genes with known or putative sex-related functions from the pool of differentially expressed genes. From this subset, cyp17a2 emerged as the lead candidate due to a combination of unique attributes: it exhibited the most significant and consistent male-biased upregulation among the validated candidates (Fig. 2B and S1A); it is a teleost-specific autosomal gene, suggesting a novel mechanism for sexual dimorphism independent of canonical sex chromosomes; and it showed conserved male-biased expression across multiple tissues (Fig. 2C and 2D). Regarding its representation in the volcano plot, cyp17a2 was included in the underlying dataset but was not explicitly labeled in the revised Figure 2A to maintain visual clarity, as the plot aimed to illustrate the global transcriptomic landscape rather than highlight individual genes.
We agree with the reviewer that other genetic factors may contribute to the observed sexual dimorphism. Accordingly, we have modified the text throughout the manuscript to remove any suggestion of a purely monogenic trait. Our functional data position cyp17a2 as a key and sufficient factor, as its knockout in males was sufficient to ablate the antiviral resistance phenotype (Fig. 2E-G), demonstrating a major, nonredundant role without precluding potential contributions from other genes.
The following specific changes have been made to the text.
(1) The title has been revised by replacing “governs” with “orchestrates.” (line 1)
(2) The abstract now states “the male-biased gene cyp17a2 as a critical mediator of this enhanced response” instead of “which are driven by the male-biased gene Cyp17a2 rather than by hormones or sex chromosomes.” (lines 33-34)
(3) The discussion now states “Our study leverages this unique context to demonstrate that enhanced antiviral immunity in males is mediated by the male-biased expression of the autosomal gene cyp17a2,” removing the comparative phrasing regarding hormones or sex chromosomes. (lines 364-366)
Reviewer #1 (Public review):
Summary:
This study employed a saccade-shifting sequential working memory paradigm, manipulating whether a saccade occurred after each memory array to directly compare retinotopic and transsaccadic working memory for both spatial location and color. Across four participant groups (young and older healthy adults, and patients with Parkinson's disease and Alzheimer's disease), the authors found a consistent saccade-related cost specifically for spatial memory, but not for color, regardless of differences in memory precision. Using computational modeling, they demonstrate that data from healthy participants are best explained by a complex saccade-based updating model that incorporates distractor interference. Applying this model to the patient groups further elucidates the sources of spatial memory deficits in PD and AD. The authors then extend the model to explain copying deficits in these patient groups, providing evidence for the ecological validity of the proposed saccade-updating retinotopic mechanism.
Strengths:
Overall, the manuscript is well written, and the experimental design is both novel and appropriate for addressing the authors' key research questions. I found the study to be particularly comprehensive: it first characterizes saccade-related costs in healthy young adults, then replicates these findings in healthy older adults, demonstrating how this "remapping" cost in spatial working memory is age-independent. After establishing and validating the best-fitting model using data from both healthy groups, the authors apply this model to clinical populations to identify potential mechanisms underlying their spatial memory impairments. The computational modeling results offer a clearer framework for interpreting ambiguities between allocentric and retinotopic spatial representations, providing valuable insight into how the brain represents and updates visual information across saccades. Moreover, the findings from the older adult and patient groups highlight factors that may contribute to spatial working memory deficits in aging and neurological disease, underscoring the broader translational significance of this work.
Weaknesses:
Several concerns should be addressed to enhance the clarity of the manuscript:
(1) Relevance of the figure-copy results (pp. 13-15).
Is it necessary to include the figure-copy task results within the main text? The manuscript already presents a clear and coherent narrative without this section. The figure-copy task represents a substantial shift from the LOCUS paradigm to an entirely different task that does not measure the same construct. Moreover, the ROCF findings are not fully consistent with the LOCUS results, which introduces confusion and weakens the manuscript's coherence. While I understand the authors' intention to assess the ecological validity of their model, this section does not effectively strengthen the manuscript and may be better removed or placed in the Supplementary Materials.
(2) Model fitting across age groups (p. 9).
It is unclear whether it is appropriate to fit healthy young and healthy elderly participants' data to the same model simultaneously. If the goal of the model fitting is to account for behavioral performance across all conditions, combining these groups may be problematic, as the groups differ significantly in overall performance despite showing similar remapping costs. This suggests that model performance might differ meaningfully between age groups. For example, in Figure 4A, participants 22-42 (presumably the elderly group) show the best fit for the Dual (Saccade) model, implying that the Interference component may contribute less to explaining elderly performance.
Furthermore, although the most complex model emerges as the best-fitting model, the manuscript should explain how model complexity is penalized or balanced in the model comparison procedure. Additionally, are Fixation Decay and Saccade Update necessarily alternative mechanisms? Could both contribute simultaneously to spatial memory representation? A model that includes both mechanisms, e.g., Dual (Fixation) + Dual (Saccade) + Interference, could be tested to determine whether it outperforms Model 7, ruling out the sole contribution of complexity.
Minor point: On p. 9, line 336, Figure 4A does not appear to include the red dashed vertical line that is mentioned as separating the age groups.
(3) Clarification of conceptual terminology.
Some conceptual distinctions are unclear. For example, the relationship between "retinal memory" and "transsaccadic memory," as well as between "allocentric map" and "retinotopic representation," is not fully explained. Are these constructs related or distinct? Additionally, the manuscript uses terms such as "allocentric map," "retinotopic representation," and "reference frame" interchangeably, which creates ambiguity. It would be helpful for the authors to clarify the relationships among these terms and apply them consistently.
(4) Rationale for the selective disruption hypothesis (p. 4, lines 153-154).
The authors hypothesize that "saccades would selectively disrupt location memory while leaving colour memory intact." Providing theoretical or empirical justification for this prediction would strengthen the argument.
(5) Relationship between saccade cost and individual memory performance (p. 4, last paragraph).
The authors report that larger saccades were associated with greater spatial memory disruption. It would be informative to examine whether individual differences in the magnitude of saccade cost correlate with participants' overall/baseline memory performance (e.g. their memory precision in the no-saccade condition). Such analyses might offer insights into how memory capacity/ability relates to resilience against saccade-induced updating.
(6) Model fitting for the healthy elderly group to reveal memory-deficit factors (pp. 11-12).
The manuscript discusses model-based insights into components that contribute to spatial memory deficits in AD and PD, but does not discuss components that contribute to spatial memory deficits in the healthy elderly group. Given that the EC group also shows impairments in certain parameters, explaining and discussing these outcomes of the EC group could provide additional insights into age-related memory decline, which would strengthen the study's broader conclusions.
(7) Presentation of saccade conditions in Figure 5 (p. 11).
In Figure 5, it may be clearer to group the four saccade conditions together within each patient group. Since the main point is that saccadic interference on spatial memory remains robust across patient groups, grouping conditions by patient type rather than intermixing conditions would emphasize this interpretation.
Reviewer #3 (Public review):
Summary:
The manuscript introduces a visual paradigm aimed at studying trans-saccadic memory.
The authors observe how memory of object location is selectively impaired across eye movements, whereas object colour memory is relatively immune to intervening eye movements. Results are reported for young and elderly healthy controls, as well as PD and AD participants.
A computational model is introduced to account for these results, indicating how early differences in memory encoding and decay (but not trans-saccadic updating per se) can account for the observed differences between healthy controls and clinical groups.
Strengths:
The data presented encompass young and elderly healthy controls, as well as clinical groups.
The authors introduce an interesting modelling strategy, aimed at isolating and identifying the main components behind the observed pattern of results.
Weaknesses:
The models tested differ in terms of the number of parameters. In general, a larger number of parameters leads to a better goodness of fit. It is not clear how the difference in the number of parameters between the models was taken into account.
It is not clear whether the modelling results could be influenced by overfitting (it is not clear how well the model can generalize to new observations).
Results specificity: it is not clear how specific the modelling results are with respect to constructional ability (measured via the Rey-Osterrieth Complex Figure test). As with any cognitive test, performance can also be influenced by general, non-specific abilities that contribute broadly to test success.
Author response:
(1) About ROCF figure-copy results
Reviewer #1 queried the necessity of including the Rey-Osterrieth Complex Figure (ROCF) results in the main text. We appreciate the reviewer’s perspective on the narrative flow and the transition between the LOCUS paradigm and the ROCF results. However, we remain keen to retain these findings in the main text, as they provide critical ecological and clinical validation for the computational mechanisms identified in our study.
We argue that the following points support the retention of these results:
(1) The ROCF we used is a standard neuropsychological tool for identifying constructional apraxia. Our results bridge the gap between basic cognitive neuroscience and clinical application by demonstrating that specific remapping parameters—rather than general memory precision—predict real-world deficits in patients.
(2) The finding that our winning model explains approximately 62% of the variance in ROCF copy scores across all diagnostic groups further indicates that these parameters from the LOCUS task represent core computational phenotypes that underpin complex, real-life visuospatial construction (copying drawings).
(3) Previous research has often observed only a weak or indirect link between drawing ability and traditional working memory measures, such as digit span (Senese et al., 2020). This was previously attributed to “deictic” strategies—like frequent eye movements—that minimise the need to hold large amounts of information in memory (Ballard et al., 1995; Cohen, 2005; Draschkow et al., 2021). While our study was not exclusively designed to catalogue all cognitive contributions to drawing, our findings provide significant and novel evidence indicating that transsaccadic integration is a critical driver of constructional (drawing-copying) ability. By demonstrating this link, we offer a new direction for future research, shifting the focus from general memory capacity toward the precision of spatial updating across eye movements.
By including the ROCF results in the main text, we provide evidence for a functional role for spatial remapping that extends beyond perceptual stability into the domain of complex visuomotor control. We will expand on these points in the Discussion in our revised manuscript.
(2) Model complexity and overfitting
We would like to clarify that the Bayesian model selection (BMS) procedure utilised in this manuscript inherently balances model fit with parsimony. Unlike maximum likelihood inference, where overfitting is a primary concern often requiring cross-validation via out-of-sample prediction, our approach depends upon the comparison of marginal likelihoods. This method directly penalises model complexity — a principle often described as the “Bayesian Occam’s Razor” (Rasmussen and Ghahramani, 2000). This means that a model is only favoured if the improvement in fit justifies the additional parameter space. If a parameter were redundant, it would lower the model's evidence by “diluting” the probability mass over the parameter space. The emergence of the “Dual (Saccade) + Interference” model as the winning candidate suggests it offers the most plausible generative account of the data while maintaining necessary parsimony. We would be happy to point toward literature that discusses how these marginal likelihood approximations provide a more robust guard against overfitting than standard metrics like BIC or AIC (MacKay, 2003; Murray and Ghahramani, 2005; Penny, 2012).
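For readers unfamiliar with this property, the penalty can be seen in a minimal toy sketch (hypothetical Gaussian models, unrelated to the specific models fitted in our manuscript): a redundant free parameter lowers the marginal likelihood even though it cannot worsen the best-case fit.

```python
import numpy as np

# Toy illustration of the Bayesian Occam's razor; the models below are
# hypothetical and are NOT the working-memory models from the manuscript.
data = np.linspace(-1.0, 1.0, 50)          # sample mean is exactly 0

def log_lik(x, mu):
    """Log-likelihood of x under N(mu, 1)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2)

# Model 1: mu fixed at 0 (no free parameters) -> evidence is the likelihood.
log_ev_m1 = log_lik(data, 0.0)

# Model 2: mu free, uniform prior on [-10, 10]; the evidence integrates the
# likelihood over the prior (grid approximation, shifted for stability).
mus = np.linspace(-10.0, 10.0, 2001)
dmu = mus[1] - mus[0]
lls = np.array([log_lik(data, m) for m in mus])
log_ev_m2 = lls.max() + np.log(np.sum(np.exp(lls - lls.max())) * (1 / 20) * dmu)

# The redundant parameter "dilutes" probability mass over the prior range,
# so the simpler model has higher evidence despite fitting no worse at its
# best mu.
print(log_ev_m1 > log_ev_m2)
```

Because the wide prior spreads Model 2's probability mass thinly, its evidence falls below the simple model's likelihood even though both peak at the same value; this is the mechanism by which the marginal likelihood balances fit against parsimony without any explicit parameter-count penalty.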
(3) On model fitting across age groups
This approach is primarily supported by our empirical findings: there was no significant interaction between age group and saccade condition for either location or colour memory. While older adults demonstrated lower baseline precision, the specific disruptive effect of saccades (the “saccade cost”) was remarkably consistent across cohorts. This justifies the use of a common generative model to assess quantitative differences in parameter estimates.
This approach does implicitly assume that participants perform the task in a qualitatively similar way. However, this assumption is mitigated by the fact that our winning model nests simpler models as special cases, which supports the assessment of group differences in parameters that play consistent mechanistic roles. This flexibility allows the model to naturally accommodate groups where certain components—such as interference—may play a reduced role, while remaining sensitive to the specific mechanistic failures that differentiate healthy aging from neurodegeneration.
(4) Conceptual terminology and patient group descriptions
We will clarify our conceptual terminology, explicitly defining the relationships between retinotopic (eye-centred), transsaccadic (across-saccade), and spatiotopic (world-centred) representations.
Regarding the demographics of the clinical cohorts, we apologise for any lack of clarity in our initial presentation. The patient demographics for both the Parkinson’s disease (PD) and Alzheimer’s disease (AD) groups—including age, gender, education, and ACE-III scores—are currently detailed alongside the healthy control data (two groups: Young Healthy Controls and Elderly Healthy Controls) in the table within the Participants section of the Materials and Methods. In our revision, we will ensure that this table is correctly labelled as Table 2 and will provide more comprehensive recruitment and characterisation details for both patient groups within the main text. Finally, we will include a detailed discussion in the Supplementary Materials regarding eye-tracking data quality across all cohorts, specifically comparing calibration accuracy, trace stability, and trial rejection rates to demonstrate that our findings are not confounded by differences in recording quality between healthy and clinical populations.
References
Ballard DH, Hayhoe MM, Pelz JB. 1995. Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience 7:66–80. DOI: https://doi.org/10.1162/jocn.1995.7.1.66
Cohen DJ. 2005. Look little, look often: The influence of gaze frequency on drawing accuracy. Perception & Psychophysics 67:997–1009. DOI: https://doi.org/10.3758/BF03193626
Draschkow D, Kallmayer M, Nobre AC. 2021. When Natural Behavior Engages Working Memory. Current Biology 31:869-874.e5. DOI: https://doi.org/10.1016/j.cub.2020.11.013, PMID: 33278355
MacKay DJC. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Murray I, Ghahramani Z. 2005. A note on the evidence and Bayesian Occam’s razor (Technical report No. GCNU TR 2005-003). Gatsby Unit.
Penny WD. 2012. Comparing Dynamic Causal Models using AIC, BIC and Free Energy. Neuroimage 59:319–330. DOI: https://doi.org/10.1016/j.neuroimage.2011.07.039, PMID: 21864690
Rasmussen C, Ghahramani Z. 2000. Occam’s Razor. Advances in Neural Information Processing Systems. MIT Press.
Senese VP, Zappullo I, Baiano C, Zoccolotti P, Monaco M, Conson M. 2020. Identifying neuropsychological predictors of drawing skills in elementary school children. Child Neuropsychology 26:345–361. DOI: https://doi.org/10.1080/09297049.2019.1651834, PMID: 31390949
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This manuscript describes the use of computational tools to design a mimetic of the interleukin-7 (IL-7) cytokine with superior stability and receptor binding activity compared to the naturally occurring molecule. The authors focused their engineering efforts on the loop regions to preserve receptor interfaces while remediating structural irregularities that destabilize the protein. They demonstrated the enhanced thermostability, production yield, and bioactivity of the resulting molecule through biophysical and functional studies. Overall, the manuscript is well written, novel, and of high interest to the fields of molecular engineering, immunology, biophysics, and protein therapeutic design. The experimental methodologies used are convincing; however, the article would benefit from more quantitative comparisons of bioactivity through titrations.
Reviewer #2 (Public review):
Summary:
This manuscript presents the computational design and experimental validation of Neo-7, an engineered variant of interleukin-7 (IL-7) with improved folding efficiency, expression yield, and therapeutic activity. The authors employed a rational protein design approach using Rosetta loop remodeling to reconnect IL-7's functional helices through shorter, more efficient loops, resulting in a protein with superior stability and binding affinity compared to wild-type IL-7. The work demonstrates promising translational potential for cancer immunotherapy applications.
Strengths:
(1) The integration of Rosetta loop remodeling with AlphaFold validation represents an established computational pipeline for rational protein design. The iterative refinement process, using both single-sequence and multimer AlphaFold predictions, is methodologically sound.
(2) The authors provide thorough characterization across multiple platforms (yeast display, bacterial expression, mammalian cell expression) and assays (binding kinetics, thermostability, bioactivity), strengthening the robustness of their findings.
(3) The identification of the critical helix 1 kink stabilized by disulfide bonding and its recreation through G4C/L96C mutations demonstrates deep structural understanding and successful problem-solving.
(4) The MC38 tumor model results show clear therapeutic advantages of Neo-7 variants, with compelling immune profiling data supporting CD8+ T cell-mediated anti-tumor mechanisms.
(5) The transcriptomic profiling provides valuable mechanistic insights into T cell activation states and suggests reduced exhaustion markers, which are clinically relevant.
Weaknesses:
(1) While computational predictions are extensive, the manuscript lacks experimental structural validation of the designed Neo-7 variants. The term "Structural Validation" should not be used in the header.
We thank the reviewer for this constructive comment. To better reflect the work conducted, we have revised the section title from “Structural Validation of Neo-7 in AlphaFold single sequence mode” to “Structural Modeling of Neo-7 in AlphaFold single sequence mode.” This change clarifies that our study employed in silico modeling approaches rather than experimental structural validation.
We thank the reviewer for this insightful comment. We speculate that the slower off-rate observed for Neo-7 variants is primarily attributable to their enhanced structural stability, which promotes the formation of a more stable cytokine–receptor complex. This is consistent with prior observations in other engineered cytokines, such as IL-2 mimetics (Neo-2/15).
In terms of biological consequences, we believe the slower off-rate is unlikely to result in signaling bias or qualitatively distinct pathways for several reasons:
(1) IL-7’s mechanism of action is inherently regulated to prevent over-signaling: T cells downregulate IL7R-α expression upon IL-7 stimulation, providing a built-in negative feedback mechanism.
(2) IL-7 signaling is dominated by STAT5 activation, without the signaling plasticity observed in cytokines such as IL-21 or IL-22, which can bias toward STAT1/3 and drive divergent functional outcomes.
(3) Our RNA-seq data support this interpretation: Neo-7–treated CD8⁺ T cells exhibited transcriptional profiles highly similar to those induced by WT-IL-7, with an enhanced magnitude of response rather than novel pathway engagement.
Taken together, we infer that the slower off-rate of Neo-7 enhances the potency and durability of IL-7 signaling without altering its downstream specificity, thereby strengthening the magnitude of immune responses while maintaining the canonical STAT5-driven biology of IL-7.
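The kinetic intuition can be made concrete with a short sketch using purely hypothetical rate constants (these are not the measured SPR parameters for WT-IL-7 or Neo-7): at a fixed on-rate, a slower off-rate both lowers the equilibrium dissociation constant K_D = k_off / k_on and lengthens the mean receptor residence time tau = 1 / k_off.

```python
# Hypothetical kinetic constants for illustration only; these are NOT the
# measured SPR values for WT-IL-7 or the Neo-7 variants.
def kinetics(k_on, k_off):
    """k_on in 1/(M*s), k_off in 1/s -> (K_D in M, residence time in s)."""
    return k_off / k_on, 1.0 / k_off

kd_wt, tau_wt = kinetics(k_on=1e5, k_off=1e-2)    # hypothetical wild type
kd_neo, tau_neo = kinetics(k_on=1e5, k_off=1e-3)  # hypothetical slower off-rate

# A ten-fold slower k_off at the same k_on gives a ten-fold lower K_D and a
# ten-fold longer mean residence time on the receptor.
```

Because the same occupancy-duration logic applies regardless of which STAT pathway is engaged, a longer residence time increases the magnitude and durability of signaling without, by itself, changing pathway selectivity.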
(3) While computational immunogenicity prediction is provided, these methods are very limited.
We fully agree with the reviewer that current in silico immunogenicity prediction tools are limited and cannot be considered definitive. Indeed, to date, none of these algorithms has demonstrated a strong correlation with clinical immunogenicity outcomes of biologics. For example, the presence of anti-drug antibodies (ADA) in murine or non-human primate models often does not translate into ADA induction in human clinical trials. This disconnect underscores the inherent challenges of predicting immunogenicity based solely on computational or preclinical models.
Our strategy to mitigate potential immunogenicity was therefore not to rely exclusively on prediction software, but instead to apply a conservative design principle: preserving the vast majority of the parental IL-7 sequence while introducing only the minimal number of amino acid substitutions required to achieve our engineering objectives. By maintaining sequence continuity with the native cytokine, we aim to minimize the risk of introducing novel epitopes while improving stability and developability. We acknowledge that definitive immunogenicity assessment can only be addressed in future clinical studies.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Specific Points:
(1) The authors should describe the molecular composition of CYT-107.
We thank the reviewer for this suggestion and have added clarification regarding the molecular composition of CYT-107. CYT-107 is a recombinant form of wild-type human interleukin-7 (IL-7) expressed in eukaryotic cells, which introduces N-linked glycosylation modifications to the protein. As a glycosylated recombinant IL-7, CYT-107 more closely mimics the natural human cytokine compared to bacterial expression systems that produce non-glycosylated IL-7. This feature contributes to its stability and bioavailability in clinical applications.
(Reference: U.S. National Center for Advancing Translational Sciences, GSRS record for IL-7, https://gsrs.ncats.nih.gov/ginas/app/ui/substances/46bd8013-1e2d-4b6e-afcf-340f447e8710)
(2) The authors should indicate the receptor layout for IL-7 in the introduction and indicate available structural data. Also, in line 93, the authors should indicate that IL-7Ra is one subunit of the heterodimeric receptor complex.
We thank the reviewer for this insightful suggestion. However, due to page limitations, we have chosen to orient the introduction around the design rationale, computational workflow, and biological functionality of IL-7. To address the reviewer’s point while maintaining brevity, we have now included a concise description of the IL-7 receptor layout and its available structural data in the main text. Specifically, in line 93 we revised the sentence to read: “We began by examining the crystal structure of IL-7 bound to its receptor, IL7R-α (interleukin-7 receptor alpha; PDB ID: 3DI2), which recruits IL-2Rγ to form a heterodimeric receptor complex essential for downstream signaling.”
(3) The abbreviation IL-7Ra should be defined at first use.
We thank the reviewer for the comment. The abbreviation has now been defined at its first appearance in the manuscript. Specifically, at Line 93 we revised the sentence as follows:
“We began by examining the crystal structure of IL-7 bound to its receptor, IL7R-α (interleukin-7 receptor alpha; PDB ID: 3DI2), which recruits IL-2Rγ to form a heterodimeric receptor complex essential for downstream signaling.”
(4) The authors need to clarify whether the human or murine IL-7Ra is being used in each experiment mentioned in the results text.
We thank the reviewer for this important point. We have now specified in the main text and corresponding subsection titles whether human or murine IL-7Rα was used in each experiment.
(5) The authors sometimes use a dash in IL7Ra and IL2Rg and sometimes do not. This should be standardized.
We appreciate the reviewer’s observation. We have standardized the terminology throughout the manuscript to “IL7Rα” and “IL2Rγ” to maintain consistency.
(6) In Figure 3E, the authors left out the v in "Neo7-LDv1".
We have corrected the omission of “v” and updated the label to read Neo7-LDv1.
(7) In Figure 3E, the authors must indicate in the bottom row that they are visualizing sequential binding to IL-2Rg following incubation with IL-7Ra. This should be stated in the results text and the figure caption as well.
We have revised the results text and figure caption to clearly state that the bottom row illustrates sequential binding to IL-2Rγ following incubation with IL-7Rα.
“for detection of IL-2Rγ binding, yeast cells were first incubated with recombinant IL-7Rα, washed, and subsequently incubated with IL-2Rγ”
(8) In Figure 3E, "IL-7Rg" should be corrected to "IL-2Rg".
We have corrected “IL-7Rγ” to “IL-2Rγ” in Figure 3E for accuracy and consistency.
(9) In line 140, the authors claim that Neo7-LDv1 is partially folded based on the binding to the heterodimeric receptor complex. However, the data are insufficient to support this conclusion.
We understand the reviewer’s concern and have rephrased the sentence for clarity: “A degree of binding to IL2Rγ was detected, possibly reflecting partial folding of the displayed protein in the yeast display platform.” While we do not claim the protein to be fully or uniformly folded, this deduction is supported by the yeast display data and further corroborated by AlphaFold structural predictions.
(10) In lines 185-186, the authors claim that the binding affinity for IL-2Rg is improved, but this is not shown in Figure 3, which looks only at a single concentration and shows comparable binding between WT-IL7 and Neo7-LDv2.
We thank the reviewer for this valuable observation. Our original wording was ambiguous and may have implied a direct comparison with WT-IL7, which was not intended. The sentence was meant to highlight that within the Neo-7 variant series, Neo7-LDv2 displayed stronger binding to both IL-7Rα and IL-2Rγ compared to other Neo-7 variants. To avoid misinterpretation, we have revised the text as follows:
“Importantly, the enhanced binding affinity towards IL7Rα also led to improved binding towards the common IL2Rγ, relative to other variants in the Neo-7 series.”
(11) Lines 202-203 appear to be an error.
We thank the reviewer for pointing this out. The lines in question were indeed an error and have now been removed from the manuscript.
(12) In yeast display validation, negative controls showing binding to the fluorescent antibody only and an irrelevant control protein should be shown for all constructs in order to evaluate nonspecific interactions.
We agree with the reviewer that appropriate negative controls are important to validate specificity. To address this, we will include yeast display data with negative controls—native yeast (EBY100) stained with the corresponding fluorescent antibody in the Supplementary Information. This addition will provide clearer validation of binding specificity and reduce concerns regarding nonspecific interactions.
(13) For yeast display studies, titrations rather than single concentrations should be used to compare constructs (Figures 3 and 4). The claim that any of the constructs has a higher affinity than any other construct must be supported by performing titrations.
We thank the reviewer for this comment. We respectfully note that yeast display titrations provide relative rather than absolute estimates of binding affinity. In our study, constructs were compared under identical antigen concentrations, where the observed fluorescence intensity reflected their relative binding strength. These yeast display results served as an initial screening strategy, which we subsequently validated using surface plasmon resonance (SPR). SPR provided quantitative binding parameters and confirmed the binding differences observed in yeast display. Thus, while yeast titrations were not performed, the combination of side-by-side yeast display comparisons and orthogonal validation by SPR supports our affinity claims with both qualitative and quantitative evidence.
(14) The acronym SPR needs to be defined, and the authors should mention that this technique was used for quantitative binding studies in line 259.
We thank the reviewer for this suggestion. The acronym has now been defined in the main text at its first use, and we have clarified its role in the study. The revised text reads:
“We then characterized the binding affinities of Neo-7 variants to mouse IL-7 receptor alpha (mIL-7Rα) in a quantitative manner using surface plasmon resonance (SPR).”
(15) A titration of 2E8 cell proliferation versus concentration should be presented for IL-7 versus Neo-7 variants to directly compare EC50 values and make claims regarding potency in Figure 5H. Also, the authors should clarify whether a proliferation or viability assay was performed.
We thank the reviewer for the helpful comment regarding the use of EC₅₀ values when discussing potency. In response, we have revised the manuscript to avoid overinterpreting the data. Specifically, we replaced the term potency with ability to stimulate, as the 2E8 cell assay was designed to validate whether receptor binding by IL-7 and Neo-7 variants translates into biological function—namely, supporting immune cell viability and proliferation under limiting cytokine conditions. The assay was not optimized to determine formal EC₅₀ values, but rather to demonstrate functional activity consistent with IL-7 receptor engagement.
We have also clarified in the text that the experiment was a proliferation assay, with cell viability assessed as part of the readout. This revision better reflects the scope of the assay while aligning our claims with the data presented.
(16) Isotype control is not an appropriate name for the Fc-Only construct. This should be denoted as Fc Only.
We thank the reviewer for this comment. We have revised the terminology throughout the manuscript, changing isotype control to Fc control.
(17) A titration of mouse splenocyte proliferation versus concentration should be presented for IL-7 versus Neo-7 variants to directly compare EC50 values and make claims regarding potency in Figure 6.
We thank the reviewer for this insightful suggestion regarding EC₅₀ analysis. In this study, the splenocyte proliferation assay was designed as a preliminary in vitro screen to confirm the biological activity of Neo-7 variants relative to wild-type IL-7 prior to in vivo testing. The assay was not optimized for quantitative potency determination, but rather to provide an initial functional validation of the constructs. We have therefore revised the manuscript wording to avoid overinterpreting the data and refrained from making claims regarding EC₅₀-based potency. Instead, we emphasize that the in vivo tumor model provides a more physiologically relevant and rigorous platform for assessing cytokine functionality, including proliferation and immunomodulation.
(18) The legends in Figure 6 should indicate the colors used for each construct.
We thank the reviewer for pointing this out. We have revised the legend for Figure 6 to include the color codes corresponding to each construct.
(19) Metabolism should be singular in lines 433 and 435.
We have corrected the wording so that “metabolism” is consistently used in the singular form.
(20) In Figure 8D, "cycling" should be changed to "cycle".
The word “cycling” has been corrected to “cycle” in Figure 8D.
(21) The treatments need to be indicated in Figure 8D. Also, a color scale is needed.
We agree with the reviewer, and a color scale description has now been included in the Figure legend to aid interpretation: “The gene expression heatmap is derived from Z-scores calculated from the RNA sequencing data, with expression levels color-coded from high (red) to low (blue).”
(22) More comparisons between RNASeq data for Fc-WTIL7 versus Fc-Neo7 (Figure 8) should be presented in the results section.
We thank the reviewer for this suggestion. Due to space limitations in the main manuscript, we are unable to include an expanded description of all RNA-Seq comparisons. However, we will provide a more detailed analysis of Fc-WT-IL7 versus Fc-Neo7 in the supplementary section, including expanded differential gene expression comparisons and pathway enrichment analyses. This will allow readers to fully appreciate the differences while maintaining focus in the main text.
(23) The strikethrough in line 464 needs to be corrected.
We have corrected the strikethrough error in line 464.
(24) It is unclear how stabilizing IL-7 improves its toxicity or half-life. The authors should indicate more clearly which limitations of IL-7 were addressed by their molecule in the abstract, introduction, and discussion.
Native IL-7 demonstrates an excellent safety profile but faces two major challenges in clinical application: (1) short plasma half-life and (2) suboptimal developability due to poor stability. The short half-life is typically addressed through Fc-fusion strategies, which extend systemic exposure via FcRn recycling. However, wild-type IL-7 exhibits a strong aggregation tendency when fused to Fc, rendering the fusion protein poorly developable. By redesigning IL-7 into the more stable Neo-7 format, we substantially improved the folding efficiency and purity of the Fc-fusion protein after affinity purification, thereby enabling its advancement as a recombinant biologic candidate.
We do not intend to claim that increased stability directly reduces in vivo toxicity. The favorable safety profile of IL-7 arises primarily from its intrinsic biology (mechanism of action and downstream signaling), rather than from its structural stability. That said, improved stability and reduced aggregation propensity could potentially lower the immunogenicity risk of protein biologics. Nevertheless, there are currently no validated in vitro or in vivo assays that reliably correlate protein stability or aggregation with clinical immunogenicity outcomes.
(25) The acronym MSA needs to be defined.
We have defined the acronym MSA (Multiple Sequence Alignment) on page 7, line 142.
(26) The acronym CPD needs to be defined.
We have defined the acronym CPD (Computational Protein Design) on page 23, line 468.
Reviewer #2 (Recommendations for the authors):
Any experimental structural data would be good to have.
We plan to pursue X-ray crystallography of Neo-7 in future studies to obtain high-resolution structural confirmation. However, we emphasize that such experiments require significant time and resources, and the results would not alter the biological claims made in this study. Our focus here is to demonstrate that with recent advances in in silico protein structure prediction algorithms, such as AlphaFold2, it is now feasible to redesign therapeutic proteins with sufficient accuracy to achieve improved developability and biological performance. This study highlights how computational approaches can streamline protein drug engineering, reducing reliance on labor-intensive structural studies during the early stages of therapeutic development.
Please add details of how the changed kinetics might affect downstream pathways.
We appreciate the reviewer’s suggestion to elaborate on the biological implications of the altered binding kinetics.
Our data show that Neo-7 variants display a slower off-rate relative to WT-IL-7, which likely reflects enhanced stabilization of the cytokine–receptor complex. In principle, this could prolong receptor occupancy and modestly extend downstream signaling duration. However, several biological features of IL-7 constrain the risk of excessive or aberrant signaling:
Receptor Regulation: IL-7 signaling induces rapid downregulation of IL7Rα on T cells, serving as a feedback mechanism to prevent sustained or uncontrolled activation. This "hardwired" receptor regulation reduces the likelihood that a slower off-rate translates into pathological over-signaling.
Pathway Specificity: IL-7 primarily signals through the JAK/STAT5 axis, with little evidence of signaling bias. Unlike other cytokines (e.g., IL-21, IL-22) that can activate STAT1 or STAT3 and drive distinct functional outcomes, IL-7’s pathway specificity minimizes concerns about altered signaling directionality.
Transcriptional Evidence: Our RNA-seq analysis further supports this, showing that Neo-7 and WT-IL-7 activate similar transcriptional programs. The differences we observed were in the magnitude of response, not in the qualitative nature of the pathways engaged. This suggests that Neo-7 variants enhance the intensity of canonical IL-7 signaling rather than redirecting it toward alternative or unintended pathways.
Together, these findings support the interpretation that the slower off-rate of Neo-7 variants likely contributes to stronger or more sustained activation of IL-7’s canonical STAT5 pathway, while intrinsic regulatory mechanisms and pathway fidelity safeguard against inappropriate signaling outcomes.
Minor:
(1) The Figure 3 text is hard to read.
We acknowledge the reviewer’s concern regarding the readability of Figure 3. In the revised manuscript, we will provide a higher-resolution version of the figure to ensure that all labels and text are clearly visible upon magnification.
(2) The manuscript switches between "Neo-7" and "Neo7".
We agree with the reviewer’s observation. To maintain consistency throughout the manuscript, all references have been standardized to Neo-7.
Reviewer #1 (Public review):
Summary:
This manuscript reports a prospective longitudinal study examining whether infants with high likelihood (HL) for autism differ from low-likelihood (LL) infants in two levels of word learning: brain-to-speech cortical entrainment and implicit word segmentation. The authors report reduced syllable tracking and post-learning word recognition in the HL group relative to the LL group. Importantly, both the syllable-tracking entrainment measure and the word recognition ERP measure are positively associated with verbal outcomes at 18-20 months, as indexed by the Mullen Verbal Developmental Quotient. Overall, I found this to be a thoughtfully designed and carefully executed study that tackles a difficult and important set of questions. With some clarifications and modest additional analyses or discussion on the points below, the manuscript has strong potential to make a substantial contribution to the literature on early language development and autism.
Strengths:
This is an important study that addresses a central question in developmental cognitive neuroscience: what mechanisms underlie variability in language learning, and what are the early neural correlates of these individual differences? While language development has a relatively well-defined sensitive period in typical development, the mechanisms of variability - particularly in the context of neurodevelopmental conditions - remain poorly understood, in part because longitudinal work in very young infants and toddlers is rare. The present study makes a valuable contribution by directly targeting this gap and by grounding the work in a strong theoretical tradition on statistical learning as a foundational mechanism for early language acquisition.
I especially appreciate the authors' meticulous approach to data quality and their clear, transparent description of the methods. The choice of partial least squares correlation (PLS-c) is well motivated, given the multidimensional nature of the data and collinearity among variables, and the manuscript does a commendable job explaining this technique to readers who may be less familiar with it.
The results reveal interesting developmental changes in syllable tracking and word segmentation from birth to 2 years in both HL and LL infants. Simply mapping these trajectories in both groups is highly valuable. Moreover, the associations between neural indices of brain-to-speech entrainment and word segmentation with later verbal outcomes in the LL group support a critical role for speech perception and statistical learning in early language development, with clear implications for understanding autism. Overall, this is a rich dataset with substantial potential to inform theory.
Weaknesses:
(1) Clarifying longitudinal vs. concurrent associations
Because the current analytical approach incorporates all time points, including the final visit, it is challenging to determine to what extent the brain-language associations are driven by longitudinal relationships vs. concurrent correlations at the last time point. This does not undermine the main findings, but clarifying this issue could significantly enhance the impact of the individual-differences results. If feasible, the authors might consider (a) showing that a model excluding the final visit still predicts verbal outcomes at the last visit in a similar way, or (b) more explicitly acknowledging in the discussion that the observed associations may be partly or largely driven by concurrent correlations. Either approach would help readers interpret the strength and nature of the longitudinal claims.
(2) Incorporating sleep status into longitudinal models
Sleep status changes systematically across developmental stages in this cohort. Given that some of the papers cited to justify the paradigm also note limitations in speech entrainment and word segmentation during sleep or in patients with impaired consciousness, it would be helpful to account for sleep more directly. Including sleep status as a factor or covariate in the longitudinal models, or at least elaborating more fully on its potential role and limitations, would further strengthen the conclusions and reassure readers that these effects are not primarily driven by differences in sleep-wake state.
(3) Use of PLS-c and potential group × condition interactions
I am relatively new to PLS-c. One question that arose is whether PLS-c could be extended to handle a two-way interaction between group and condition contrasts (STR vs. RND). If so, some of the more complex supplementary models testing developmental trajectories within each group (Page 8, Lines 258-265) might be more directly captured within a single, unified framework. Even a brief comment in the methods or discussion about the feasibility (or limitations) of modeling such interactions within PLS-c would be informative for readers and could streamline the analytic narrative.
(4) STR-only analyses and the role of RND
Page 8, Lines 241-245: This analysis is conducted only within the STR condition. The lack of group difference observed here appears consistent with the lack of group difference in word-level entrainment (Page 9, Lines 292-294), suggesting that HL and LL groups may not differ in statistical learning per se, but rather in syllabic-level entrainment. As a useful sanity check and potential extension, it might be informative to explore whether syllable-level entrainment in the RND condition differs between groups to a similar extent as in Figure 2C-D. In other work (e.g., adults vs. children; Moreau et al., 2022), group differences can be more pronounced for syllable-level than for word-level entrainment. Figure S6 seems to hint that a similar pattern may exist here. If feasible, including or briefly reporting such an analysis could help clarify the asymmetry between the two learning measures and further support the interpretation of syllabic-level differences.
(5) Multi-speaker input and voice perception (Page 15, Lines 475-483)
The multi-speaker nature of the speech input is an interesting and ecologically relevant feature of the design, but it does add interpretive complexity. The literature on voice perception in autism is still mixed: for example, Boucher et al. (2000) reported no differences in voice recognition and discrimination between children with autism and language-matched non-autistic peers, whereas behavioral work in autistic adults suggests atypical voice perception (e.g., Schelinski et al., 2016; Lin et al., 2015). I found the current interpretation in this paragraph somewhat difficult to follow, partly because the data do not directly test how HL and LL infants integrate or suppress voice information. I think the authors could strengthen this section by slightly softening and clarifying the claims.
(6) Asymmetry between EEG learning measures
Page 16, Lines 502-507 touches on the asymmetry between the two EEG learning measures but leaves some questions for the reader. The presence of word recognition ERPs in the LL group suggests that a failure to suppress voice information during learning did not prevent successful word learning. At the same time, there is an interesting complementary pattern in the HL group, who show LL-like word-level entrainment but do not exhibit robust word recognition. Explicitly discussing this asymmetry - why HL infants might show relatively preserved word-level entrainment yet reduced word recognition ERPs, whereas LL infants show both - would enrich the theoretical contribution of the manuscript.
References:
(1) Moreau, C. N., Joanisse, M. F., Mulgrew, J., & Batterink, L. J. (2022). No statistical learning advantage in children over adults: Evidence from behaviour and neural entrainment. Developmental Cognitive Neuroscience, 57, 101154. https://doi.org/10.1016/j.dcn.2022.101154
(2) Boucher, J., Lewis, V., & Collis, G. M. (2000). Voice processing abilities in children with autism, children with specific language impairments, and young typically developing children. Journal of Child Psychology and Psychiatry, 41(7), 847-857. https://doi.org/10.1111/1469-7610.00672
(3) Schelinski, S., Borowiak, K., & von Kriegstein, K. (2016). Temporal voice areas exist in autism spectrum disorder but are dysfunctional for voice identity recognition. Social Cognitive and Affective Neuroscience, 11(11), 1812-1822. https://doi.org/10.1093/scan/nsw089
(4) Lin, I.-F., Yamada, T., Komine, Y., Kato, N., Kato, M., & Kashino, M. (2015). Vocal identity recognition in autism spectrum disorder. PLOS ONE, 10(6), e0129451. https://doi.org/10.1371/journal.pone.0129451
Reviewer #3 (Public review):
Summary:
This study provides novel insights into how individuals regulate the speed of their movements both alone and in pairs, highlighting consistent differences in movement vigor across people and showing that these differences can adapt in dyadic contexts. The findings are significant because they reveal stable individual patterns of action that are flexible when interacting with others, and they suggest that multiple factors, beyond reward sensitivity, may contribute to these idiosyncrasies. The evidence is generally strong, supported by careful behavioral measurements and appropriate modeling, though clarifying some statistical choices and including additional measures of accuracy and smoothness would further strengthen the support for the conclusions.
Major Comments:
(1) Given the idiosyncrasies in individual vigor, would linear mixed models (LMMs) be more appropriate than ANOVAs in some analyses (e.g., in the section "Solo session"), as they can account for random intercepts and slopes on vigor measures? Some figures (e.g., Figure 2.B and 3.E) indeed seem to show that some aspects of behaviour may present variability in slopes and intercepts across participants. In fact, I now realize that LMMs are used in the "Emergence of dyadic vigor from the partners' individual vigor" section, so could the authors clarify why different statistical approaches were applied depending on the sections?
(2) If I understand correctly, the introduction suggests that idiosyncrasies in movement vigor may be driven by inter-individual differences in reward sensitivity. However, the current task does not involve any explicit rewards, yet the authors still observe idiosyncrasies in vigor, which is interesting. Could this indicate that other factors contribute to these consistent individual differences? For example, could sensitivity to temporal costs or physical effort explain the slow versus fast subgrouping? Specifically, might individuals more sensitive to temporal costs move faster to minimize opportunity costs, and might those less sensitive to effort costs also move faster? Along the same lines, could the two subgroups (slow vs. fast) be characterized in terms of underlying computational "phenotypes," such as their sensitivities to time and effort? If this is not feasible with the current dataset, it would still be valuable to discuss whether these factors could plausibly account for the observed patterns, based on existing literature.
(3) The observation that dyads did not lose accuracy or smoothness despite changes in vigor is interesting and suggests a shift in the speed-accuracy tradeoff. Could the authors include accuracy and smoothness measures in the main figures rather than only in supplementary materials? I think it would make the manuscript more complete.
(4) It is a bit unclear to me whether the variance assumptions for ANOVAs were checked, for instance, in Figure 3H.
Reviewer #1 (Public review):
Summary
The strength of this manuscript lies in the behavior: mice use a continuous auditory background (pink vs brown noise) to set a rule for interpreting an identical single-whisker deflection (lick in W+ and withhold in W− contexts) while always licking to a brief 10 kHz tone. Behaviorally, animals acquire the rule and switch rapidly at block transitions and take a few trials to fully integrate the context cue. What's nice about this behavior is the separate auditory cue, which shows the animals remain engaged in the task, so it's not just that the mice check out (i.e., become disengaged in the W- context). The authors then use optical tools, combining cortex-wide optogenetic inactivation (using localized inhibition in a grid-like fashion) with widefield calcium imaging to map what regions are necessary for the task and what the local and global dynamics are. Classic whisker sensorimotor nodes (wS1/wS2/wM/ALM) behave as expected with silencing reducing whisker-evoked licking. Retrosplenial cortex (RSC) emerges as a somewhat unexpected, context-specific node: silencing RSC (and tjS1) increases licking selectively in W−, arguing that these regions contribute to applying the "don't lick" policy in that context. I say somewhat because work from the Delamater group points to this possibility, albeit in a Pavlovian conditioning task and without neural data. I would still recommend the authors of the current manuscript review that work to see whether there is a relevant framework or concept (Castiello, Zhang, Delamater, 'The retrosplenial cortex as a possible 'sensory integration' area: a neural network modeling approach of the differential outcomes effect of negative patterning', 2021, Neurobiology of Learning and Memory).
The widefield imaging shows that RSC is the earliest dorsal cortical area to show W+ vs W− divergence after the whisker stimulus, preceding whisker motor cortex, consistent with RSC injecting context into the sensorimotor flow. A "Context Off" control (continuous white noise; same block structure) impairs context discrimination, indicating the continuous background is actually used to set the rule (an important addition!) Pre-stimulus functional-connectivity analyses suggest that there is some activity correlation that maps to the context presumably due to the continuous background auditory context. Simultaneous opto+imaging projects perturbations into a low-dimensional subspace that separates lick vs no-lick trajectories in an interpretable way.
In my view, this is a clear, rigorous systems-level study that identifies an important role for RSC in context-dependent sensorimotor transformation, thereby expanding RSC's involvement beyond navigation/memory into active sensing and action selection. The behavioral paradigm is thoughtfully designed, the claims related to the imaging are well defended, and the causal mapping is strong. I have a few suggestions for clarity that may require a bit of data analysis. I also outline one key limitation that should be discussed, but is likely beyond the scope of this manuscript.
Major strengths
(1) The task is a major strength. It asks the animal to generate differential motor output to the same sensory stimulus, does so in a block-based manner, and the Context-Off condition convincingly shows that the continuous contextual cue is necessary. The auditory tone control ensures this is more than a 'motivational' context but is decision-related. In fact, the slightly higher bias to lick on the catch trials in the W+ context is further evidence for this.
(2) The dorsal-cortex optogenetic grid avoids a 'look-where-we-expect' approach and lets RSC fall out as a key node. The authors then follow this up with pharmacology and latency analyses to rule out simple motor confounds. Overall, this is rigorous and thoughtfully done.
(3) While the mesoscale imaging doesn't allow for cellular resolution, it allows for mapping of the flow of information. It places RSC early in the context-specific divergence after whisker onset, a valuable piece that complements prior work.
(4) The baseline (pre-stim) functional connectivity and the opto-perturbation projections into a task subspace increase the significance of the work by moving beyond local correlates.
Key limitation
The current optogenetic window begins ~10 ms before the sensory cue and extends 1s after, which is ideal for perturbing within-trial dynamics but cannot isolate whether RSC is required to maintain the context-specific rule during the baseline. Because context is continuously available, it makes me wonder whether RSC is the locus maintaining or, instead, gating the context signal. The paper's results are fully consistent with that possibility, but causality in the pre-stimulus window remains an open question. (As a pointer for future work, pre-stimulus-only inactivation, silencing around block switches, or context-omission probe trials (e.g., removing the background noise unexpectedly within a W+ or W- context block), could help separate 'holding' from 'gating' of the rule. But I'm not suggesting these are needed for this manuscript, but would be interesting for future studies.)
If B is a unit vector, the operation A⋅B yields ||A||cos(θ), signifying the magnitude of A's projection in B's direction, with a negative sign if the direction is reversed. This is termed the scalar projection of A onto B.
Chat with Gemini:<br /> Lee:
I'm taking an online course of 3D graphics. Geometry (as it pertains to 3D graphics on computers) is the current focus. The text just said:
If B is a unit vector, the operation A⋅B yields ∥A∥cos(θ), signifying the magnitude of A's projection in B's direction, with a negative sign if the direction is reversed. This is termed the scalar projection of A onto B. Can you explain this better to me? What does "magnitude of A's (a non-unit vector) projection in B's (a unit vector) direction " mean?
Gemini:<br /> This is one of the most fundamental concepts in 3D graphics, but the math jargon can sometimes obscure the simple visual reality.
To understand "the magnitude of A's projection in B's direction," the best tool is the Shadow Analogy.
Imagine Vector A is a stick floating in space. Vector B represents the ground (or a specific direction along the ground).
If you were to shine a light directly from above (perpendicular to the ground B), the stick (A) would cast a shadow onto the ground.
So, when the text says "magnitude of A's projection in B's direction," it is simply asking: "How much of Vector A is pointing in the same direction as Vector B?"
This is the crucial part that often trips people up.
In 3D graphics, a Unit Vector has a length of exactly 1. It is used purely to point the way; it contains direction information but no "scale" or "weight."
The formula for the dot product is:
A⋅B = ∥A∥∥B∥cos(θ)

If B is a unit vector, then ∥B∥ = 1. The formula simplifies to:

A⋅B = ∥A∥cos(θ)
If B wasn't a unit vector (say, it was 5 units long), the result would be the length of the shadow multiplied by 5. That gives you a useless number if you just wanted to measure the shadow. By ensuring B is length 1, the math gives you the exact, unscaled length of the shadow of A.
The text mentioned a "negative sign if the direction is reversed." This happens based on the angle (θ) between the two vectors.
| Concept | The Visual | The Math |
| ----------------- | ---------------------------------------- | ------------- |
| Vector A | The object casting the shadow. | ∥A∥ |
| Vector B (Unit) | The ground/direction we measure against. | ∥B∥=1 |
| Dot Product (A⋅B) | The length of the shadow. | A⋅B=∥A∥cos(θ) |
You will use this constantly in graphics programming.
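The shadow analogy above can be sketched in a few lines of Python (a sketch added for illustration, not part of the original chat; `scalar_projection` and the tuple-based vectors are illustrative choices):

```python
import math

def dot(a, b):
    """Component-wise dot product of two same-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length (magnitude) of a vector: sqrt(a . a)."""
    return math.sqrt(dot(a, a))

def scalar_projection(a, b):
    """Signed length of a's 'shadow' along b's direction.
    b is normalized first, so only its direction matters."""
    b_len = norm(b)
    b_hat = tuple(x / b_len for x in b)  # force ∥B∥ = 1
    return dot(a, b_hat)

A = (3.0, 4.0, 0.0)    # the "stick", length 5
B = (10.0, 0.0, 0.0)   # any vector along +X; direction is all that matters

print(scalar_projection(A, B))               # 3.0  -> A's shadow on the X axis
print(scalar_projection(A, (-1.0, 0.0, 0.0)))  # -3.0 -> negative when reversed
```

Note that `B` is deliberately not unit length here: normalizing inside the helper is what makes the result the true shadow length rather than a scaled one.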
Lee:
So if B was pointing directly along the +X axis (X=1, Y=0, Z=0) and A was angled up from that and longer (X=2, Y=2, Z=0), A's magnitude would be ~2.83. But A⋅B would be 2 (its X coordinate in this specific scenario). If you made a triangle with A being one side and B specifying the direction of the adjacent side, and drew a straight line from the end of A straight down to and perpendicular to the direction B, the length of that "B" side would be A⋅B assuming ∥B∥ was 1. Does that sound correct?
Gemini:<br /> Yes, that is exactly correct. You have perfectly grasped the geometric interpretation of the dot product.
Your example scenario is spot on. Here is the breakdown of why your visualization works, mapping your description to the standard trigonometry terms.
Using your numbers: A = (2, 2, 0), B = (1, 0, 0), and ∥A∥ = √(2² + 2²) = √8 ≈ 2.83.

The calculation is:

A⋅B = (2×1) + (2×0) + (0×0) = 2
You correctly identified that because B is aligned perfectly with the X-axis, the "shadow" of A is simply its X-coordinate (2).
You described making a triangle. This is the exact geometric definition used in physics and graphics.
In trigonometry terms: Vector A is the hypotenuse, the shadow (A⋅B) is the adjacent side, and the perpendicular line you drew is the opposite side.
You added the condition: "assuming ∥B∥ was 1."
This is the most important part of your insight. If Vector B had been (2, 0, 0) (length of 2), the dot product would have been:
(2×2) + (2×0) + (0×0) = 4
The math would return 4, but the "shadow" is physically still only 2 units long. The dot product scales the result by the magnitude of B. That is why we must normalize B (make it a unit vector) before using it as a "measuring stick."
You have the concept of Scalar Projection (the length of the shadow) down cold.
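Lee's own numbers make the normalization point easy to verify numerically (a sketch added for illustration, not from the original chat):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

A = (2.0, 2.0, 0.0)        # Lee's example vector, length sqrt(8) ~ 2.83

B_unit = (1.0, 0.0, 0.0)   # unit-length measuring stick
B_long = (2.0, 0.0, 0.0)   # same direction, but length 2

print(dot(A, B_unit))           # 2.0 -> the true shadow length
print(dot(A, B_long))           # 4.0 -> shadow scaled by ∥B∥ = 2
print(math.sqrt(dot(A, A)))     # ~2.828 -> Lee's ∥A∥
```

The second result (4) is exactly the "useless number" the chat warns about: the physical shadow is still 2 units long, but the dot product has multiplied it by the length of the measuring stick.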
When both vectors are normalized, the arc cosine (cos⁻¹) of their dot product reveals the angle θ between them: θ = cos⁻¹((A⋅B)/(∥A∥∥B∥)) or θ = cos⁻¹(Â⋅B̂), where cos⁻¹ denotes the inverse cosine function, commonly represented as acos() in programming languages.
Chat with Gemini:<br /> Lee:
Here's another block of text in that same lesson. Can you help me visualize what it means?
When both vectors are normalized, the arc cosine (cos⁻¹) of their dot product reveals the angle θ between them: θ=cos⁻¹((A⋅B)/(∥A∥ ∥B∥)) or cos⁻¹(Â⋅B̂), where cos⁻¹ denotes the inverse cosine function, commonly represented as acos() in programming languages.
Gemini:<br /> This block of text is explaining how to "reverse-engineer" an angle using the dot product. In 3D graphics, you often know the coordinates of two objects (like a player and an enemy), but you don't know the angle between them. This formula is how you find it.
Think of the dot product formula we just discussed:
A⋅B = ∥A∥∥B∥cos(θ)
The text is simply rearranging that equation to solve for θ.
cos(θ) = (A⋅B) / (∥A∥∥B∥)

θ = arccos( (A⋅B) / (∥A∥∥B∥) )

The text mentions: θ = cos⁻¹(Â⋅B̂). (The "hat" symbol Â means it is a normalized unit vector.)
This is the "pro-gamer move" in graphics programming. If you normalize both vectors first (meaning you force their lengths to be 1), then ∥A∥ and ∥B∥ both become 1. The denominator in our formula disappears!
A practical use of acos(): Imagine you are coding a stealth game. You have a Guard and a Player.
If you normalize both vectors and calculate acos(A⋅B), you get the exact angle (e.g., 30°). If the Guard's peripheral vision is only 45°, you now know the Guard can see the player!
When you use acos() in languages like C#, C++, or Python, the result is usually in radians, not degrees.
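Recovering the angle from Lee's vectors shows both the normalization step and the radians-to-degrees conversion (a sketch added for illustration, not from the original chat):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

A_hat = normalize((2.0, 2.0, 0.0))  # Lee's vector, forced to length 1
B_hat = (1.0, 0.0, 0.0)             # already unit length

theta_rad = math.acos(dot(A_hat, B_hat))
print(theta_rad)                    # ~0.7854 radians
print(math.degrees(theta_rad))      # ~45.0 degrees
```

As the chat notes, `acos()` returns radians; `math.degrees()` (or multiplying by 180/π) does the conversion.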
Lee:
You said
"If you normalize both vectors and calculate acos(A⋅B), you get the exact angle (e.g., 30°). If the Guard's peripheral vision is only 45°, you now know the Guard can see the player!"

By "only 45°", is that from left to right of the guard's visibility? So, from straight ahead to the far right of his vision would only be 22.5°, thus the player is not visible at a 30° angle from the guard? Or is that 45° from straight ahead to the far right of the guard's vision (for a total viewable angle of 90°)?
Gemini:<br /> That is a fantastic catch. In game development and 3D graphics, this distinction is a common source of bugs!
When you use the dot product to check visibility, the angle θ you calculate is the offset from the center.
The dot product measures the angle between two specific lines: where the guard is looking (the "forward vector") and where the player is. It doesn't know about "left" or "right"; it only knows "how far away from my nose is this object?"
In your specific scenario: the 30° you calculated is measured from the guard's forward vector to the player. If the 45° is the half-angle from center (a 90° total FOV), then 30° < 45° and the player is visible. If the 45° is the total FOV (only 22.5° to each side), then 30° > 22.5° and the player is not visible.
To avoid doing the math every single frame, programmers usually define the guard's "View Cone" as a threshold value.
If the guard can see 45° to the left and 45° to the right (a 90° total FOV), the threshold is:<br />
cos(45°) ≈ 0.707
The code then looks like this:
```csharp
// If the dot product is greater than 0.707, the player is within the 90-degree cone
if (DotProduct(GuardForward, DirectionToPlayer) > 0.707)
{
    // I see you!
}
```
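To make the check above concrete, here is a minimal, self-contained Python sketch of the whole visibility test. The helper names (`normalize`, `can_see`) are ours for illustration, not from any particular engine:

```python
import math

def normalize(v):
    """Scale a 2D vector to length 1."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def can_see(guard_forward, guard_pos, player_pos, fov_degrees):
    """True if the player is inside the guard's view cone.

    fov_degrees is the TOTAL field of view, so the threshold is
    the cosine of half that angle (the offset from center).
    """
    threshold = math.cos(math.radians(fov_degrees / 2))
    to_player = normalize((player_pos[0] - guard_pos[0],
                           player_pos[1] - guard_pos[1]))
    return dot(normalize(guard_forward), to_player) > threshold

# Guard looks along +x with a 90-degree total FOV (45 degrees each side).
print(can_see((1, 0), (0, 0), (10, 5), 90))   # player ~26.6 deg off-center -> True
print(can_see((1, 0), (0, 0), (10, 30), 90))  # player ~71.6 deg off-center -> False
```

Note that only the direction to the player matters here, not the distance; a real game would usually add a range check and a line-of-sight (raycast) test on top.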
One interesting thing to visualize: because the dot product only cares about the angle between two vectors, the "visibility" it creates is actually a cone, not a flat triangle.
If you say "anything within 30° of my forward vector is visible," that includes 30° up, 30° down, 30° left, and 30° right. It creates a circular spotlight of vision.
| Total FOV | Angle from Center | Dot Product Threshold (Approx) |
| ------------------ | ----------------- | ------------------------------ |
| 180° (Flat wall) | 90° | 0.0 |
| 90° (Standard) | 45° | 0.707 |
| 60° (Narrow) | 30° | 0.866 |
| 2° (Sniper Scope) | 1° | 0.999 |
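The threshold column is just the cosine of the half-angle (the offset from center), so a few lines of Python reproduce the table:

```python
import math

# Total FOV and the corresponding angle from center (half the FOV)
for total_fov, half in [(180, 90), (90, 45), (60, 30), (2, 1)]:
    threshold = math.cos(math.radians(half))
    print(f"{total_fov:>3} deg FOV -> {half} deg from center, threshold {threshold:.4f}")
```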
Reviewer #2 (Public review):
In this manuscript, the authors built upon the Connectome Model literature and proposed SynaptoGen, a differentiable model that explicitly takes into account multiplicity and conductance in neural connectivity. The authors evaluated SynaptoGen through simulated reinforcement learning tasks and established its performance as often superior to two considered baselines. This work is a valuable addition to the field, supported by a solid methodology with some details and limitations missing.
Major points:
(1) The genetic features in the X and Y matrices in the CM were originally introduced as combinatorial gene expression patterns that correspond to the presence or even absence of a subset of genes. The authors oversimplify this original scope by only considering single-gene expression features. While this was arguably a reasonable first approximation for a case study of gap junctions in C. elegans, it is by no means a plausible assumption for chemical synapses. As the authors appear to motivate their model by chemical synapses that have polarities, they should either consider combinatorial rules in the model or at least present this explicitly as a key limitation of the model. Omitting combinatorial effects also renders the presented "bioplausible" baseline much less bioplausible, likely calling for a different name.
(2) It is not fully explained how Equation (11) is obtained, even conceptually. It is unclear why \bar{B} and \bar{G} should be element-wise multiplied together, both already being expected values. Moreover, the authors acknowledged in lines 147-149 that the components of \bar{G} actually depend on gene expression X, which is a component in \bar{B}, so the logic here seems circular.
(3) The authors considered two baselines, namely SNES and a bioplausible control. However, it would be of interest to also investigate: a) a vanilla DQN trained on the same-sized MLP, to judge whether the biological insights behind the SynaptoGen parameterization add value to performance; b) using Equation (7) instead of Equation (11) to construct the weight matrices, to judge whether incorporating the conductance adds value to performance.
Reviewer #3 (Public review):
Summary
Boccato et al. present an ambitious and thoughtfully developed framework, SynaptoGen, which proposes a differentiable model of synaptogenesis grounded in gene-expression vectors, protein interaction probabilities, and conductance rules. The authors aim to bridge the gap between computational connectomics and synthetic biological intelligence by enabling gradient-based optimization of genetically encoded circuit architectures. They support this goal with mathematical derivations, simulation experiments across several RL benchmarks, and a biologically grounded validation using C. elegans adhesion-molecule co-expression data. The paper is timely and conceptually compelling, offering a unified formulation of synaptic multiplicity and synaptic weight formation that can be integrated directly into learning systems.
Strengths
(1) Well-motivated framework with clear conceptual contributions.
(2) Rigorous mathematical development.
(3) Compelling empirical validation.
(4) Excellent framing and discussion of future impact.
Weaknesses
(1) Overstated claims in the abstract and discussion.
(2) Ambiguity in "first of its kind" assertions.
# To Gen or Not To Gen: The Ethical Use of Generative AI

*33 minute read*

This blog entry started out as a translation of an article that my colleague Jakob and I wrote for a German magazine. After that we added more material and enriched it with additional references and sources. We aim to give an overview of many - but not all - aspects of GenAI that we have learned about and consider relevant for an informed ethical opinion. As for depth of information, we are just scratching the surface; hopefully, the many references can lead you to dive in deeper wherever you want. Since we are both software developers, our views are biased and distorted. Keep in mind that any writing about a "hot" topic like this is nothing but a snapshot of what we think we know today. By the time you read it, the authors' knowledge and opinions will already have changed.

Last Update: December 8, 2025.

## Abstract

ChatGPT, Gemini, Copilot. The number of generative AI applications (GenAI) and models is growing every day. In the field of software development in particular, code generation, coding assistants and vibe coding are on everyone's lips. Like any technology, GenAI has two sides.
The great promises are offset by numerous disadvantages: immense energy consumption, mountains of electronic waste, the proliferation of misinformation on the internet and the dubious handling of intellectual property are just a few of the many negative aspects. Ethically responsible behaviour requires us to look at all the advantages, disadvantages and collateral damage of a technology before we use it or recommend its use to others. In this article, we examine both sides and eventually arrive at our personal and naturally subjective answer to whether and how GenAI can be used in an ethical manner.

## About us

### Johannes Link

… has been programming for over 40 years, 30 of them professionally. Since the end of the last century, extreme programming and other human-centred software development approaches have been at the heart of his work. The meaningful and ethical conduct of his private and professional life has been his driving force for years. He has been involved with GenAI since the early days of OpenAI's GPT language models. More about Johannes can be found at https://johanneslink.net.

### Jakob Schnell

… studied mathematics and computer science and has been working as a software developer for 5 years. He works as a lecturer and course director in university and non-university settings. As a youth leader, he also comes into regular contact with the lives of children and young people. In all these environments, he observes the growing use of GenAI and its impact on people.

## Introduction

### Ethics, what does that even mean?

Ethical behaviour sounds like the title of a boring university seminar. However, if you look at the Wikipedia article on the term 1, you will find that 'how individuals behave when confronted with ethical dilemmas' is at the heart of the definition. So it's about us as humans taking responsibility and weighing up whether and how we do or don't do certain things based on our values.
We have to consider ethical questions in our work because all the technologies we use and promote have an impact on us and on others. Therefore, they are neither neutral nor without alternative. It is about weighing up the advantages and potential against the damage and risks; and that applies to everyone, not just us personally. Because often those who benefit from a development are not the ones who suffer the consequences. As individuals and as a society, we have the right to decide whether and how we want to use technologies. Ideally, this should be in a way that benefits us all; but under no circumstances should it be in a way that benefits a small group and harms the majority. The crux of the matter is that ethical behaviour does not come for free. Ethics is neither efficient nor does it enhance your economic profit. That means that by acting according to your values you will, at some point, have to give something up. If you're not willing to do that, you don't have values - just opinions.

### Clarification of terms

When we write 'generative AI' (GenAI), we are referring to a very specific subset of the many techniques and approaches that fall under the term 'artificial intelligence'. Strictly speaking, these are a variety of very different approaches, ranging from symbolic logic over automated planning to the broad field of machine learning (ML). Nowadays most effort, hype and money goes into deep learning (DL): a subfield of ML that uses multi-layered artificial neural networks to discover statistical correlations (aka patterns) in very large amounts of training data in order to reproduce those patterns later. Large language models (LLMs) and related methods for generating images, videos and speech now make it possible to apply this idea to completely unstructured data. While traditional ML methods often managed with a few dozen parameters, these models now work with several trillion (10^12) parameters.
In order for this to produce the desired results, both the amount of training data and the training duration must be increased by several orders of magnitude. This brings us to the definition of what we mean by 'GenAI' in this article: hyperscaled models that can only be developed, trained and deployed by a handful of companies in the world. These are primarily the GenAI services provided by OpenAI, Anthropic, Google and Microsoft, or services based on them. We also focus primarily on language models; the generation of images, videos, speech and music plays only a minor role in this article. Our focus on hyperscale services does not mean that other ML methods are free of ethical problems; however, we are dealing with a completely different order of magnitude of damage and risk here. For example, there do exist variations of GenAI that use the same or similar techniques, but on a much smaller scale and in restricted domains (e.g. AlphaFold 2). These approaches tend to bring more value with fewer downsides.

## Basics

GenAI models are designed to interpolate and extrapolate 3, i.e. to fill in the gaps between training data and speculate beyond the limits of the training data. Together with the stochastic nature of the training data, this results in some interesting properties:

- GenAI models 'invent' answers; with LLMs, we like to refer to this as 'hallucinations'.
- GenAI models do not know what is true or false, good or bad, efficient or effective, only what is statistically probable or improbable in relation to training data, context and query (aka prompt).
- GenAI models cannot explain their output; they have no capability of introspection. What is sold as introspection is just more output, with the previous output re-injected.
- GenAI models do not learn from you; they only draw from their training material. The learning experience is faked by re-injecting prior input into a conversation's context 4.
- The context, i.e. the set of input parameters provided, is decisive for the accuracy of the generated result, but can also steer the model in the wrong direction. Increasing the context window makes a query much more computation-intensive - likely in a quadratic way. Therefore, the promised increase of "maximum context window" in many models is mostly fake 5.
- The reliability of LLMs cannot be fundamentally increased by even greater scaling 6.

### Can LLMs think?

Proponents of the language-of-thought hypothesis 7 believe it is possible for purely language-based models to acquire the capabilities of the human brain - reasoning, modelling, abstraction and much more. Some enthusiasts even claim that today's models have already acquired these capabilities. However, recent studies 8 9 show that today's models are neither capable of genuine reasoning nor do they build internal models of the world. Moreover, "…according to current neuroscience, human thinking is largely independent of human language 10", and there is fundamental scientific doubt that human cognition can be achieved through computation in practice, let alone by scaling up the training of deep networks 11.

An example of a lack of understanding of the world is the prompt 'Give me a random number between 0 and 50'. The typical GenAI response to this is '27', and it comes up significantly more often than true randomness would allow. (If you don't believe it, just try it out!) This is because 27 is the most likely answer in the GenAI training data - and not because the model understands what 'random' means.

'Chain of Thought (CoT)' approaches and 'reasoning models' attempt to improve reasoning by breaking down a prompt, the query to the model, into individual (logical) steps and then delegating these individual steps back to the LLM. This allows some well-known reasoning benchmarks to be met, but it also multiplies the necessary computational effort by a factor of between 30 and 700 12.
In addition, multistep reasoning lets individual errors compound into large errors. And yet, CoT models do not seem to possess any real reasoning abilities 13 14 and improve the overall accuracy of LLMs only marginally 15. The following thought experiment from 16 underscores the lack of real "thinking" capabilities: LLMs have simultaneous access to significantly more knowledge than humans. Together with the postulated ability of LLMs to think logically and draw conclusions, new insights should just fall from the sky. But they don't. Getting new insights from LLMs would require those insights to be already encoded in the existing training material, and to be decoded and extracted by purely statistical means.

### What LLMs are good at

Undoubtedly, LLMs represent a major qualitative advance when it comes to extracting information from texts, generating texts in natural and artificial languages, and machine translation. But even here, the error rate, and above all the type of error ('hallucinations'), is so high that autonomous, unsupervised use in serious applications must be considered highly negligent.

### GenAI as a knowledge source

As we have pointed out above, an LLM cannot differentiate between true and false - regardless of the training material. It does not answer the question "What is XYZ?" but the question "What would an answer to the question 'What is XYZ?' look like?". Nevertheless, many people claim that the answers that ChatGPT and the like provide for typical what-how-when-who queries are good enough and often better than what a "normal" web search would have given us. Arguably, this is the most prevalent use case for "AI" bots today. The problem is that most of the time we will never learn about the inaccuracies, omissions, distortions and biases that the answer contained - unless we re-check everything, which defeats the whole purpose of speeding up knowledge retrieval.
The less we already know, the better the "AI's" answer looks to us, but the less equipped we are to spot the problems. A recent study by the BBC and 22 public service media organizations shows that 45% of all "AI" assistants' answers to questions about news and current affairs contain significant errors 17. Moreover, LLMs are easy prey for manipulation - either by the organization providing the service or by third parties. A recent study claims that even multi-billion-parameter models can be "poisoned" by injecting just a few corrupted documents 18. So, if anything is at stake, all output from LLMs must be carefully validated. Doing that, however, would contradict the whole point of using "AI" to speed up knowledge acquisition.

### GenAI in software development

The creation and modification of computer programs is considered a prime domain for the use of LLMs. This is partly because programming languages have less linguistic variance and ambiguity than natural languages. Moreover, there are many methods for automatically checking generated source code, such as compiling, static code analysis and automated testing. This simplifies the validation of generated code and thereby gives an additional feeling of trust. Nevertheless, individual reports on the success of coding assistants such as Copilot, Cursor, etc. vary greatly. They range from 'completely replacing me as a developer' to 'significantly hindering my work'. Some argue that coding agents considerably reduce the time they have to invest in "boilerplate" work, like writing tests, creating data transfer objects or connecting domain code to external libraries. Others counter that delegating these drudgeries to GenAI makes you miss opportunities to get rid of them, e.g. by introducing a new abstraction or automating parts of your pipeline, and to learn about the intricacies and failure modes of the external library.
Unlike old-school code generation or code libraries, prompting a coding agent is not "just another layer of abstraction". It misses several crucial properties of a useful abstraction:

- Its output is not deterministic. You cannot rely on an agent producing the same code the next time you feed it the same prompt.
- The agent does not hide the implementation details, nor does it allow you to reliably change those details if the previous implementation turns out to be inadequate. Code that is output by an LLM, even if it is generated "for free", has to be understood and maintained each time you touch the related logic or feature.
- The agent does not tell you whether the amount of detail you give in your prompt is sufficient for figuring out an adequate implementation. On the contrary, the LLM will always fill the specification holes with statistically derived assumptions.

Sadly, serious studies on the actual benefits of GenAI in software development are rare. The randomised trial by Metr 19 provides an initial indication, measuring a decline in development speed for experienced developers. An informal study by ThoughtWorks estimates the potential productivity gain from using GenAI in software development at around 5-15% 20. If "AI coding" were increasing programmers' productivity by any big number, we would see a measurable growth of new software in app stores and OSS repositories. But we don't; the numbers are flat at best 21 22.

But even if we assume a productivity increase in coding through GenAI, there are still two points that further diminish this postulated efficiency gain: Firstly, the results of the generation must still be cross-checked by human developers. However, it is well known that humans are poor checkers and lose both attention and enjoyment in the process. Secondly, software development is only to a small extent about writing and changing code.
The most important part is discovering solutions and learning about the use of these solutions in their context. Peter Naur calls this 'programming as theory building' 23. Even the perfect coding assistant can therefore only take over the coding part of software development. For the essential rest, we still need humans. If we now also consider the finding that using AI can relatively quickly lead to a loss of problem-solving skills 24, or that these skills are not acquired at all, then the overall benefit of using GenAI in professional software development is more than questionable. As long as programming - and every technicality that comes with it - is not fully replaced by some kind of AI, we will still need expert developers who can program, maintain and debug code to the finest level of detail. Where, we wonder, will those senior developers come from when companies replace their junior staff with coding agents?

### Actual vs. promised benefits

If you read testimonials about uses of GenAI that people perceive as successful, you will mostly encounter scenarios in which 'AI' helps to make tasks that are perceived as boring, unnecessarily time-consuming or actually pointless faster or more pleasant. So it's mainly about personal convenience and perceived efficiency. Entertainment also plays a major role: the poem for Grandma's birthday, the funny song for the company anniversary or the humorous image for the presentation are quickly and supposedly inexpensively generated by 'AI'. However, the promises made by the dominant GenAI companies are quite different: solving the climate crisis, providing the best medical advice for everyone, revolutionising science, 'democratising' education and much more.
GPT-5, for example, is touted by Sam Altman, CEO of OpenAI, as follows: 'With GPT-5, it's now like talking to an expert — a legitimate PhD-level expert in any area you need […] they can help you with whatever your goals are.' 25 However, to date, there is still no actual use case that provides a real qualitative benefit for humanity or at least for larger groups. The question 'What significant problem (for us as a society) does GenAI solve?' remains unanswered. On the contrary: while machine learning and deep learning methods certainly have useful applications, the most profitable area of application for 'AI' at present is the discovery and development of new oil and gas fields 26.

## Harmful aspects of GenAI

But regardless of how one assesses the benefits of this technology, we must also consider the downsides, because only then can we ultimately make an informed and fair assessment. In fact, the range of negative effects of hyperscaled generative AI that can already be observed is vast. Added to this are numerous risks that have the potential to cause great social harm. Let's take a look at what we consider to be the biggest threats.

### GenAI is an ecological disaster

#### Power

The data centres required for training and operating large generative models 27 far exceed today's dimensions in terms of both number and size. The projected data centre energy demand in the USA is predicted to grow from 4.4% of total electricity consumption in 2023 to 22% in 2028 28. In addition, the typical data centre electricity mix is more CO2-intensive than the average mix. Estimates see a rise of ~11 percent in coal-generated electricity in the US, as well as tripled greenhouse gas emissions worldwide by 2030 - compared to the scenario without GenAI technology 29. Just recently, Sam Altman of OpenAI blogged some numbers about the energy and water usage of ChatGPT for "the average query" 30.
On the one hand, an average is rather meaningless when a distribution is heavily asymmetric; the numbers for queries with large contexts or "chain of reasoning" computations would be orders of magnitude higher. Thus, the potential efficiency gains from more economical language models are more than offset by the proliferation of use, e.g. through CoT approaches and 'agent systems'. On the other hand, big tech's disclosure of energy consumption (e.g. by Google 31) is intentionally selective. Ketan Joshi goes into quite some detail about why experts think the AI industry is hiding the full picture 32. Since building new power plants - even coal- or gas-fuelled ones - takes a lot of time, data centre companies are even reviving old jet engines to power their new hyperscalers 33. You have to be aware that those engines are not only much noisier than other power plants but also pump out nitrous oxide, one of the main chemicals responsible for acid rain 34.

#### Water

Another problem is the immensely high water consumption of these data centres 35. Cooling requires clean water of drinking quality in order not to contaminate or clog the cooling pipes and pumps. Already today, new data centre locations are competing with human consumption of drinking water. According to Bloomberg News, about two-thirds of the data centres built or developed in 2022 are located in areas that are already under "water stress" 36. In the US alone, "AI servers […] could generate an annual water footprint ranging from 731 to 1,125 million m3" 37. It's not only an American problem, though. In other areas of the world, water-thirsty data centres also compete with the drinking water supply for humans 38.

#### Electronic Waste

Another ecological problem is being noticeably exacerbated by 'AI': the amount of electronic waste (e-waste) that we ship mainly to "Third World" countries and which is responsible for soil contamination there.
Efficient training and querying of very large neural networks requires very large quantities of specialised chips (GPUs). These chips often have to be replaced and disposed of within two years. The typical data centre might not last longer than 3 to 5 years before it has to be rebuilt in large parts 39. In summary, it can be said that GenAI is at least an accelerator of the ecological catastrophe that threatens the earth. And it is the argument for Google, Amazon and Microsoft to completely abandon their zero-CO2 targets 40 and replace them with investments of several hundred billion dollars in new data centres.

### GenAI threatens education and science

People often try to use GenAI in areas where they feel overloaded and overwhelmed: training, studying, nursing, psychotherapeutic care, etc. The fields of application for 'AI' are therefore a good indication of socially neglected and underfunded areas. The fact that LLMs are very good at conveying the impression of genuine knowledge and competence makes their use particularly attractive in these areas. A teacher under the simultaneous pressure of lesson preparation, corrections and covering for sick colleagues turns to ChatGPT to quickly create an exercise sheet. A student under pressure to get good grades has their English essay corrected by 'AI'. The researcher under pressure to publish will 'save' research time by reading the AI-generated summary of relevant papers - even if it is completely wrong in terms of content 41. Tech companies like OpenAI and Microsoft play on this situation by offering their 'AI' for free or for little money to students and universities. The goal is obvious: students who get hooked on outsourcing some of their "tedious" tasks to a service will continue to use - and eventually buy - this service after graduation.
What falls by the wayside are problem-solving skills, engagement with complex sources, and the generation of knowledge through understanding and supplementing existing knowledge. Some even argue that AI is destroying critical education and learning itself 42: students aren't just learning less; their brains are learning not to learn. The training cycle of schools and universities is fast. Teachers are already reporting that pupils and students have acquired noticeably less competence in recent years, and have instead become dependent on unreliable 'tools' 43. The real problem with using GenAI to do assignments is not cheating, but that students "are not just undermining their ability to learn, but to someday lead." 44

### GenAI is destroying the free internet

The fight against bots on the internet is almost as old as the internet itself - and has been quite successful so far. Multifactor authentication, reCAPTCHA, honeypots and browser fingerprinting are just a few of the tools that help protect against automated abuse. However, GenAI takes this problem to a new level - in two ways. To make 'the internet' usable as the main source for training LLMs, AI companies use so-called 'crawlers'. These essentially behave like DDoS attackers: they send tens of thousands of requests at once, from several hundred IPs, in a very short time. robots.txt files are ignored; instead, the source IP and user agent are obscured 45. These practices have massive disadvantages for providers of genuine content:

- Costs for additional bandwidth.
- Lost advertising revenue, as search engines now offer LLM-generated summaries instead of links to the sources. This threatens the existence of the remaining independent journalism in particular 46.
- Misuse of their own content by AI-supported competition.
If the place where knowledge is generated is separated from the place where it is consumed, and if this makes the act of generation even more opaque than before, the motivation to continue generating knowledge also declines. For projects such as Wikipedia, this means fewer donors and fewer contributors. Open communities often have no other option but to shut themselves off. Another aspect is the flooding of the internet with generated content that cannot be automatically distinguished from non-generated content. This content overwhelms the maintainers of open source software and of portals such as Wikipedia 47. If this content is then also submitted by humans - often in the belief that they are doing good - it is no longer possible to take action against it methodically. In the long run, less and less authentic training material will lead to increasingly poor results from the models. Last but not least, autonomously acting agents make the already dire state of internet security much worse 48. Think of handing all your personal data and credentials to a robot that distributes and uses that data across the web, wherever and whenever it deems it necessary for reaching some goal; a robot that

- is controlled by LLMs, which are vulnerable to all kinds of prompt injection attacks 49;
- is controlled by and reporting to companies that do not have your best interest in mind;
- has no awareness or knowledge of the implications of its actions;
- is acting on your behalf and thereby making you accountable.

### GenAI is a danger to democracy

The manipulation of public opinion through social media precedes the arrival of LLMs. However, this technology gives the manipulators much more leverage. By flooding the web with fake news, fake videos and fake everything, undemocratic (or just criminal) parties make it harder and harder for any serious media and journalism to get the attention of the public.
People no longer have a common factual basis, which is necessary for all social negotiation. If you don't agree on at least some basic facts, arguing about policies and measures to take is pointless. Without negotiation, democracy will die; in many parts of the world it already is dying.

### GenAI versus human creativity

Art and creativity are also threatened by generative AI. The impact on artists' incomes of logos, images and illustrations now being easily and quickly created by AI prompts is obvious. A similar effect can also be observed in other areas. Studies show that poems written by LLMs are indistinguishable from those written by humans and that generative AI products are often rated more highly 50. This can be explained by a trend towards the middle and the average, which can also be observed in the music and film scenes: due to its basic function, GenAI cannot create anything fundamentally new, but replicates familiar patterns, which is precisely why it is so well received by the public. Ironically, 'AI' draws its 'creativity' from the content of those it seeks to replace. Much of this content was used as training material against the will of the rights holders. Whether this constitutes copyright infringement has not yet been decided; morally, the situation seems clear. The creative community is the first to be seriously threatened in its livelihood by GenAI 51. It's not a coincidence that a big part of GenAI efforts is targeted at "democratizing art". This framing is completely upside down. Art has been one of the most democratic activities for a very long time. Everybody can do it; but not everybody wants to put in the effort, the practice time and the soul. Real art is not about the product but about the process, which requires real humans. Generating art without the friction is about getting rid of the humans in the loop - and still making money.
Digital colonialism

The huge amount of data required by hyperscaled AI approaches makes it impossible to completely curate the learning content. And yet, one would like to avoid the reproduction of racist, inhuman and criminal content. Attempts are being made to get the problem under control by subsequently adapting the models to human preferences and local laws through additional ‘reinforcement learning from human feedback’ (RLHF) [52]. The cheap labour for this very costly process can be found in the Global South. There, people are exposed to hours of hate speech, child abuse, domestic violence and other horrific scenarios in their poorly paid jobs in order to filter them out of the training material of large AI companies [53]. Many emerge from these activities traumatised. However, it is not only people who are exploited in the less developed regions of the world, but also nature: the poisoning of the soil with chemicals during the extraction of raw materials for digital chips, as well as the contamination caused by our electronic waste and its improper disposal, are collateral damage that we willingly accept and whose long-term consequences are currently extremely difficult to assess. Here, too, the “developed” world profits, whereas the negative aspects are outsourced to the former colonies and other poor regions of the world.

Political aspects

As software developers, we would like to ‘leave politics out of it’ and instead focus entirely on the cool tech. However, this is impossible when the advocates of this technology pursue strong political and ideological goals. In the case of GenAI, we can clearly see that the US corporations behind it (OpenAI, Google, Meta, Microsoft, etc.) have no problem with the current authoritarian – some say fascist – US government [54].
In concrete terms, this means, among other things, that the models are explicitly manipulated to be less liberal or simply not to generate any output that could upset the CEO or the president [55]. Even more serious is the fact that many of the leading minds behind these corporations and their financiers adhere to beliefs that can be broadly described as digital fascism. These include Peter Thiel, Marc Andreessen, Alex Karp, JD Vance, Elon Musk and many others on “The Authoritarian Stack” [56]. Their ideologies, disguised as rational theories, are called longtermism and effective altruism. What they have in common is that they consider democracy and the state to be obsolete models, compassion to be ‘woke’, and the current problems of humanity to be insignificant, as our future supposedly lies in the colonisation of space and the merging of humans with artificial superintelligence [57]. Do we want to give people who adhere to these ideologies (even) more power, money and influence by using and paying for their products? Do we want to feed their computer systems with our data? Do we really want to expose ourselves and our children to the answers from chatbots which they have manipulated? Not quite as abstruse, but similarly misanthropic, is the imminent displacement of many jobs by AI, as postulated by the same corporations, which use this claim to put pressure on employees. Demanding a large salary? Insisting on your legal rights? Complaining about too much workload? Doubts about the company’s goals? Then we’ll just replace you with cheap and uncomplaining AI! Whichever way you look at it, AI and GenAI are already being used politically. If we go along without resistance, we are endorsing this approach and supporting it with our time, our attention and our money.

Conclusion

Ideally, we would like to quantify our assessment by adding up the advantages, adding up the disadvantages and finally checking whether the balance is positive or negative.
Unfortunately, in our specific case, neither the benefits nor the harm are easily quantifiable; we must therefore consult our social and personal values. Discussions about GenAI usually revolve purely around its benefits. Often, the capabilities of all ‘AI’ technologies (e.g. protein folding with AlphaFold 2) are lumped together, even though they have little in common with hyperscaled GenAI. However, if we consider the consequences and do not ignore the problems this technology entails – i.e. if we consider both sides in terms of ethics – the assessment changes. Convenience, speed and entertainment are then weighed against numerous damages and risks to the environment, the state and humanity. In this sense, the ethical use and further expansion of GenAI in its current form is not possible.

Can there be ethical GenAI?

If the use of GenAI is not ethical today, what would have to change? Which negative effects of GenAI would have to disappear, or at least be greatly reduced, in order to tip the balance between benefits and harms in the other direction?

- The models would have to be trained exclusively with publicly known content whose original creators consent to its use in training AI models.
- The environmental damage would have to be reduced to such an extent that it does not further fuel the climate crisis.
- Society would have to get full access to the training and operation of the models in order to rule out manipulation by third parties and to restrict their use to beneficial purposes. This would require democratic processes, good regulation and oversight by judges and courts.
- The misuse and harming of others, e.g. through copyright theft or digital colonialism, would have to be prevented.

Is such a change conceivable? Perhaps. Is it likely, given the interest groups and political aspects involved? Probably not.
All these factors are achievable, I think, or will be soonish: smaller models, better-sourced data sets, niche models, etc. But not with the current actors mentioned at the end.
Reviewer #2 (Public review):
Summary:
This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.
Strengths:
(1) Rigorous model comparison and parameter recovery procedure.
(2) Conceptually comprehensive model space.
(3) Well-powered samples.
Weaknesses:
A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of the learning rate.
This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.
Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.
Finally, the two age groups compared – adolescents (high school students) and adults (university students) – differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogeneous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.
Comments on revisions:
The authors have adequately addressed my previous comments.
Author response:
The following is the authors’ response to the original reviews.
Public reviews:
Reviewer #1 (Public review):
Summary:
Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.
Strengths:
The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature and make clear the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.
We thank the reviewer for recognizing the strengths of our work.
Weaknesses:
(1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.
We thank the reviewer for this suggestion. Following the comment, we added a hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).
We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis. These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.
(2) There was also little discussion given to the structure of the Prisoner's Dilemma and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game; i.e., they may seem less cooperative simply because they want to play the dominant strategy, rather than having a lower preference for cooperation, all else being equal.
We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma.
However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.
Besides, if adolescents’ lower cooperation reflected a strategic response to the payoff structure, with defection adopted as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.
“Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”
Appraisal & Discussion:
(3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.
We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to (1).
Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:
“The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”
(4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.
We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:
“Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.
(5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.
We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to 2).
(6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.
We thank the reviewer for the professional comments, which have helped us improve our work.
Reviewer #2 (Public review):
Summary:
This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.
Strengths:
(1) Rigorous model comparison and parameter recovery procedure.
(2) Conceptually comprehensive model space.
(3) Well-powered samples.
We thank the reviewer for highlighting the strengths of our work.
Weaknesses:
A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of the learning rate.
We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.
Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.
This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.
We thank the reviewer for the suggestion. Following the comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9: Social reward model with Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule with a learning rate dynamically modulated by the Pearce–Hall (PH) Error Learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
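For concreteness, the Pearce–Hall-modulated update described above can be sketched as follows. This is a minimal illustration with hypothetical parameter names (`eta`, `kappa`); the manuscript's exact parameterization of Model 9 may differ.

```python
def ph_update(p, outcome, alpha, eta=0.5, kappa=1.0):
    """One belief update with a Pearce-Hall dynamic learning rate.

    p       : current belief that the partner will cooperate (0..1)
    outcome : observed partner action (1 = cooperate, 0 = defect)
    alpha   : current associability (the dynamic learning rate)
    eta     : weight of the latest surprise in the associability update
    kappa   : fixed scaling of the effective learning rate
    """
    delta = outcome - p                       # prediction error
    p_new = p + kappa * alpha * delta         # Rescorla-Wagner step
    # Surprising outcomes (large |delta|) raise the associability;
    # well-predicted outcomes let it decay towards recent surprise levels.
    alpha_new = eta * abs(delta) + (1 - eta) * alpha
    return p_new, alpha_new
```

The key property, as stated in the response, is that the learning rate increases after surprising outcomes and decreases as expectations stabilize.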
The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has strengthened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (see Figure supplement 6) and expanded the Discussion to acknowledge this modeling extension and further discuss its implications.
Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.
We thank the reviewer for this professional comment. In addition to the linear analyses, we further conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs for each of the four parameters as outcomes (α+, α−, β, and ω). The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models: ΔBIC (quadratic − linear) = 5.09 for α+, 3.04 for α−, 3.9 for β, and 0 for ω. Moreover, the quadratic age term was not significant for α+, α−, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval. As shown, the interval widens at older ages (≥ 26 years) due to fewer participants in that range, which limits the robustness of the inferred quadratic effect. In consideration of the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.
Author response image 1.
Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.
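The linear-versus-quadratic ΔBIC logic can be illustrated with a plain ordinary-least-squares version that omits the random-effects structure of the full LMM. The data here are entirely synthetic toy values, not the study's, and the helper is ours.

```python
import numpy as np

def ols_bic(X, y):
    """BIC of an OLS fit under a Gaussian likelihood (comparison use only;
    the sigma^2 parameter adds the same ln(n) to every model's BIC)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                     # ML variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * loglik              # BIC = k ln(n) - 2 lnL

# Toy inverted-U relation between age and a parameter, plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(14, 30, 200)
omega = 1.4 * age - 0.1 * age**2 + rng.normal(0, 2, 200)

X_lin = np.column_stack([np.ones_like(age), age])
X_quad = np.column_stack([np.ones_like(age), age, age**2])
# Negative delta favours the quadratic model; positive favours the linear one.
delta_bic = ols_bic(X_quad, omega) - ols_bic(X_lin, omega)
```

In the authors' analysis the reported ΔBIC (quadratic − linear) values were positive or zero, i.e., the quadratic terms did not earn their extra parameter; in this toy example the data are genuinely quadratic, so the sign comes out negative.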

Finally, the two age groups compared – adolescents (high school students) and adults (university students) – differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogeneous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.
We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure.
Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement: “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”
Reviewer #3 (Public review):
Summary:
Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.
To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.
Strengths:
The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.
We thank the reviewer for recognizing the strengths of our work.
Weaknesses:
Some of the claims made are not fully supported by the data:
The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.
We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p × ω − 2. Given participants’ inferred beliefs p, Δ was negative for most trials (p × ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.
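The effective-advantage computation above can be made concrete. The +2 defection bonus follows the rPDG payoff structure described in the response; the function name itself is ours, for illustration only.

```python
def cooperation_advantage(p, omega, defect_bonus=2.0):
    """Subjective advantage of cooperating over defecting.

    p            : believed probability that the partner cooperates
    omega        : social reward attached to mutual cooperation
    defect_bonus : guaranteed monetary gain from defecting (+2 in the rPDG)
    """
    return p * omega - defect_bonus

# Even with moderate expectations, a small omega leaves defection ahead:
#   cooperation_advantage(0.5, 2.0) -> -1.0
# whereas a sufficiently large omega flips the sign:
#   cooperation_advantage(0.5, 5.0) ->  0.5
```

This makes visible why a positive ω does not by itself imply cooperation: cooperation only becomes attractive once p × ω exceeds the defection bonus.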
In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.
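The effective-advantage computation described above can be sketched as follows (the numerical examples are hypothetical; in the actual analysis, ω and p are per-participant model estimates):

```python
def effective_advantage(p, omega, defect_bonus=2.0):
    """Effective advantage of cooperation: Delta = p * omega - defect_bonus.

    p            -- inferred probability that the partner cooperates
    omega        -- estimated social reward for mutual cooperation
    defect_bonus -- fixed monetary advantage of defection (+2 in the rPDG)
    """
    return p * omega - defect_bonus

# Hypothetical values: even with a positive omega, Delta can be negative
# when the expected social reward p * omega falls short of the +2 bonus.
assert effective_advantage(p=0.5, omega=3.0) == -0.5   # 1.5 - 2
assert effective_advantage(p=0.9, omega=4.0) > 0       # 3.6 - 2
```

This makes the point in the response concrete: a positive ω does not by itself imply that cooperation is the favored option on a given trial.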
It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.
We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.
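The asymmetric belief update described above can be sketched as follows (a minimal illustration with our own variable names, not the authors' exact implementation):

```python
def update_belief(p, partner_cooperated, alpha_pos, alpha_neg):
    """Asymmetric RL update of the believed partner-cooperation probability.

    Positive prediction errors (partner cooperated more than expected) are
    weighted by alpha_pos; negative prediction errors by alpha_neg.
    """
    outcome = 1.0 if partner_cooperated else 0.0
    pe = outcome - p                              # prediction error
    alpha = alpha_pos if pe >= 0 else alpha_neg   # asymmetric learning rate
    return p + alpha * pe

# With alpha_pos > alpha_neg, beliefs rise after cooperation faster than
# they fall after defection (hypothetical parameter values).
up = update_belief(0.5, True, alpha_pos=0.4, alpha_neg=0.2)    # -> ~0.70
down = update_belief(0.5, False, alpha_pos=0.4, alpha_neg=0.2)  # -> ~0.40
assert up - 0.5 > 0.5 - down
```

Because the update is a convex combination of the old belief and the observed outcome, p stays within [0, 1].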
Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.
Author response image 2.
Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±:0.1–0.9; ω, β:1–9) while other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.
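A simplified sketch of such a one-parameter-at-a-time sweep is shown below. This is our own toy agent, not the authors' exact simulation: the partner is reduced to a fixed cooperation probability, and the choice rule is a logistic comparison of the social reward p × ω against the +2 defection advantage.

```python
import math
import random

def simulate_coop(alpha_pos, alpha_neg, omega, beta, partner_p=0.8,
                  n_trials=120, seed=0):
    """Mean cooperation probability of a simulated asymmetric-RL agent."""
    rng = random.Random(seed)
    p, probs = 0.5, []
    for _ in range(n_trials):
        # Choice rule: social reward p * omega vs. the +2 defection advantage.
        coop_prob = 1.0 / (1.0 + math.exp(-beta * (p * omega - 2.0)))
        probs.append(coop_prob)
        # Simplified partner: cooperates with fixed probability partner_p.
        partner_coop = rng.random() < partner_p
        pe = (1.0 if partner_coop else 0.0) - p
        p += (alpha_pos if pe >= 0 else alpha_neg) * pe
    return sum(probs) / len(probs)

# Holding the other parameters fixed, a smaller omega yields markedly less
# cooperation, illustrating the logic of the sweep (hypothetical values).
low = simulate_coop(alpha_pos=0.4, alpha_neg=0.3, omega=2.0, beta=3.0)
high = simulate_coop(alpha_pos=0.4, alpha_neg=0.3, omega=5.0, beta=3.0)
assert low < high
```

In the full analysis each parameter is swept over its tested range and the resulting adult-minus-adolescent difference in cooperation probability is plotted per parameter.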
Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.
We thank the reviewer for this thoughtful comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including a stable vs. changing phase factor, owing to concerns about multicollinearity: in our design, phases occupy specific temporal segments, which introduces strong collinearity with trial order, and in multi-round interactions order effects also capture variance related to phase transitions.
Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.
The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.
We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.
“Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”
We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.
Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.
We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF10 for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section:
“Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”
Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.
We are grateful for the reviewer’s generous appraisal and insightful comments.
Recommendations for the authors
Reviewer #1 (Recommendations for the authors):
I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.
We thank the reviewer for commending the organization of our paper.
Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.
We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential trust and reputation dynamics inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).
Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:
“The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005).”
Methods:
Did the participants know how long the PD would go on for?
Were the participants informed that the partner was real/simulated?
Were the participants informed that the partner was going to be the same for all rounds?
We thank the reviewer for this meticulous review, which helped us present the experimental design and reporting details more clearly. We provide the following clarifications: I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects. II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons. III. Participants were informed that they would interact with the same partner across all rounds, aligning with the essence of a multi-round interaction paradigm and stabilizing partner-related expectations. For transparency, we have clarified these points in the Methods and Materials section:
“Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”
The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?
We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.
Why was AIC chosen rather an BIC to compare model dominance?
We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978) are information-theoretic criteria for model comparison, neither of which requires the compared models to be nested within one another (Burnham et al., 2002). We have added the following clarification to the Methods.
“We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects the model with the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample sizes (Vrieze, 2012).”
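For reference, the small-sample-corrected criterion used above follows the standard formula AICc = AIC + 2k(k + 1)/(n − k − 1), which can be sketched as:

```python
def aicc(log_lik, k, n):
    """Corrected Akaike Information Criterion (AICc).

    log_lik -- maximized log-likelihood of the fitted model
    k       -- number of free parameters
    n       -- number of observations (must exceed k + 1)
    """
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1.0)) / (n - k - 1.0)

# The correction term vanishes as n grows, so AICc converges to AIC.
assert abs(aicc(-100.0, 4, 10**6) - (2 * 100 + 2 * 4)) < 1e-3
```

Lower values indicate better expected predictive performance; for small n relative to k, the correction penalizes extra parameters more strongly than plain AIC.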
I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in α+ and ω+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis; see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).
We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which effectively mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer#1 Q1 for the full description of this analysis and results.
Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.
The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.
Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both α+ and α− are around 0.35–0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (β), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).
We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation. Based on this comment, we have revised the Discussion to include this developmental interpretation.
We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.
“Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”
A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g., α+’s relation to β).
We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added the visually presented results to the revised manuscript as Figure 2-figure supplement 4.
It would be helpful to have Bayes Factors reported with each statistical tests given that several p-values fall within the 0.01 and 0.10.
We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF10 for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section:
“Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”
Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).
We thank the reviewer for the comments. Please see our response to Reviewer 1 (Appraisal & Discussion section) for details.
Reviewer #2 (Recommendations for the authors):
The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).
We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. We already introduce the winning model before the Results section; specifically, in the penultimate paragraph of the Introduction we state:
“We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”
Reviewer #3 (Recommendations for the authors):
In addition to the points mentioned above, I suggest the following:
(1) Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs. 1,2,3 were not immediately understandable.
We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the figure captions (Figure 1 and 4) to explicitly define these terms more clearly:
“The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”
It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.
We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.
The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.
We thank the reviewer for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.
Interpretation of parameters should closely reflect what they specifically measure.
We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.
Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.
We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect proxy for belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we further conducted an exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.
More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.
We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.
I hope these comments are helpful and support the authors in further strengthening their manuscript.
We thank the three reviewers for their comments, which have been helpful in strengthening this work.
Reference
(1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111
(2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581
(3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233– 1235. https://doi.org/10.1038/nn.4080
(4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315
(5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717
(6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458
(7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8
(8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062
(9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395-405. https://doi.org/10.1016/s0896-6273(02)00755-9
(10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015
(11) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
(12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
(13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636
(14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x
(15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018
(16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
Zhang and colleagues examine neural representations underlying abstract navigation in the entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors claim to identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. While the wide array of techniques used is impressive and their creativity in analysis is admirable, overall, I found the paper a bit confusing and unconvincing. I recommend a significant rewrite of their paper to motivate their methods and clarify what they actually did and why. The claim of three-fold modulation in HC, while potentially highly interesting to the community, needs more background to motivate why they did the analysis in the first place, more interpretation as to why this would emerge in biology, and more care taken to consider alternative hypotheses steeped in existing models of HC function. I think this paper does have potential to be interesting and impactful, but I would like to see these issues improved first.
General comments:
(1) Some of the terminology used does not match the terminology used in previous relevant literature (e.g., sinusoidal analysis, 1D directional domain).
We thank the reviewer for this valuable suggestion, which helps to improve the consistency of our terminology with previous literature and to reduce potential ambiguity. Accordingly, we have replaced “sinusoidal analysis” with “sinusoidal modulation” (Doeller et al., 2010; Bao et al., 2019; Raithel et al., 2023) and “1D directional domain” with “angular domain of path directions” throughout the manuscript.
(2) Throughout the paper, novel methods and ideas are introduced without adequate explanation (e.g., the spectral analysis and three-fold periodicity of HC).
We thank the reviewer for raising this important point. In the revised manuscript, we have substantially extended the Introduction (paragraphs 2–4) to clarify our hypothesis, explicitly explaining why the three primary axes of the hexagonal grid cell code may manifest as vector fields. We have also revised the first paragraph of the “3-fold periodicity in the HPC” section in the Results to clarify the rationale for using spectral analysis. Please refer to our responses to comment 2 and 3 below for details.
Reviewer #2 (Public review):
The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses are thoroughly done, and the results and simulations are very interesting.
We sincerely thank the reviewer for the positive and encouraging comments on our study.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
(1) This paper has quite a few spelling and grammatical mistakes, making it difficult to understand at times.
We apologize for the wording and grammatical errors. We have thoroughly re-read and carefully edited the entire manuscript to correct typographical and grammatical errors, ensuring improved clarity and readability.
(2) Introduction - It's not clear why the three primary axes of hexagonal grid cell code would manifest as vector fields.
We thank the reviewer for raising this important point. In the revised Introduction (paragraphs 2, 3, and 4), we now explicitly explain the rationale behind our hypothesis that the three primary axes of the hexagonal grid cell code manifest as vector fields.
In paragraph 2, we present empirical evidence from rodent, bat, and human studies demonstrating that mental simulation of prospective paths relies on vectorial representations in the hippocampus (Sarel et al., 2017; Ormond and O’Keefe, 2022; Muhle-Karbe et al., 2023).
In paragraphs 3 and 4, we introduce our central hypothesis: vectorial representations may originate from population-level projections of entorhinal grid cell activity, based on three key considerations:
(1) The EC serves as the major source of hippocampal input (Witter and Amaral, 1991; van Groen et al., 2003; Garcia and Buffalo, 2020).
(2) Grid codes exhibit nearly invariant spatial orientations (Hafting et al., 2005; Gardner et al., 2022), which makes it plausible that their spatially periodic activity can be detected using fMRI.
(3) A model-based inference: for example, in the simplest case, when one mentally simulates a straight pathway aligned with the grid orientation, a subpopulation of grid cells would be activated. The resulting population activity would form a near-perfect vectorial representation, with constant activation strength along the path. In contrast, if the simulated path is misaligned with the grid orientation, the population response becomes a distorted vectorial code. Consequently, simulating all possible straight paths spanning 0°–360° results in 3-fold periodicity in the activity patterns—due to the 180° rotational symmetry of the hexagonal grid, orientations separated by 180° are indistinguishable.
We therefore speculate that vectorial representations embedded in grid cell activity exhibit 3-fold periodicity across spatial orientations and serve as a periodic structure to represent spatial direction. Supporting this view, reorientation paradigms in both rodents and young children have shown that subjects search equally in two opposite directions, reflecting successful orientation encoding but a failure to integrate absolute spatial direction (Hermer and Spelke, 1994; Julian et al., 2015; Gallistel, 2017; Julian et al., 2018).
(3) It took me a few reads to understand what the spectral analysis was. After understanding, I do think this is quite clever. However, this paper needs more motivation to understand why you are performing this analysis. E.g., why not just take the average regressor at the 10º, 70º, etc. bins and compare it to the average regressor at 40º, 100º bins? What does the Fourier transform buy you?
We are sorry for the confusion. Below, we outline the rationale for employing Fast Fourier Transform (FFT) analysis to identify neural periodicity. In the revised manuscript, we have added these clarifications to the first paragraph of the “3-fold periodicity in the HPC” subsection in the Results.
First, FFT serves as an independent approach to cross-validate the sinusoidal modulation results, providing complementary evidence for the 6-fold periodicity in EC and the 3-fold periodicity in HPC.
Second, FFT enables unbiased detection of multiple candidate periodicities (e.g., 3–7-fold) simultaneously without requiring prior assumptions about spatial phase (orientation). By contrast, directly comparing “aligned” versus “misaligned” angular bins (e.g., 10°/70° vs. 40°/100°) would implicitly assume knowledge of the phase offset, which was not known a priori.
Finally, FFT uniquely allows periodicity analysis of behavioral performance, which is not feasible with standard sinusoidal GLM approaches. This methodological consistency makes it possible to directly compare periodicities across neural and behavioral domains.
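To make this rationale concrete, the phase-free detection of angular periodicity can be sketched on synthetic data. This is a hypothetical illustration (not the actual analysis pipeline): the 36-bin layout follows the 10° binning described in the Methods, and the tuning curve, its phase, and its noise level are all invented for the example.

```python
import numpy as np

# Hypothetical 36-bin directional tuning curve (10-degree bins over
# 0-360 degrees) containing a 6-fold component at an unknown phase
rng = np.random.default_rng(0)
angles = np.deg2rad(np.arange(0, 360, 10))
phase = np.deg2rad(17.0)                      # arbitrary; not known a priori
tuning = np.cos(6 * (angles - phase)) + 0.2 * rng.standard_normal(36)

# FFT over the angular axis: the magnitude at index k reflects k-fold
# periodicity across 360 degrees, irrespective of the phase offset
spectrum = np.abs(np.fft.rfft(tuning)) / len(tuning)
best_fold = int(np.argmax(spectrum[1:]) + 1)  # skip the DC component
```

Because the FFT magnitude is invariant to the phase offset, no prior estimate of grid orientation is needed, in contrast to an aligned-versus-misaligned bin comparison.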
(4) A more minor point: at one point, you say it’s a spectral analysis of the BOLD signals, but the methods description makes it sound like you estimated regressors at each of the bins before performing FFT. Please clarify.
We apologize for the confusion. In our manuscript, we use the term spectral analysis to distinguish this approach from sinusoidal modulation analysis. Conceptually, our spectral analysis involves a three-level procedure:
(1) First level: We estimated direction-dependent activity maps using a general linear model (GLM), which included 36 regressors corresponding to path directions, down-sampled in 10° increments.
(2) Second level: We applied a Fast Fourier Transform (FFT) to the direction-dependent activity maps derived from the GLM to examine the spectral magnitude of potential spatial periodicities.
(3) Third level: We conducted group-level statistical analyses across participants to assess the consistency of the observed periodicities.
We have revised the “Spectral analysis of MRI BOLD signals” subsection in the Methods to clarify this multi-level procedure.
(5) Figure 4a:
Why do the phases go all the way to 2*pi if periodicity is either three-fold or six-fold?
When performing correlation between phases, you should perform a circular-circular correlation instead of a Pearson's correlation.
We thank the reviewer for raising this important point. In the original Figure 4a, both EC and HPC phases spanned 0–2π because their sinusoidal phase estimates were projected into a common angular space by scaling them according to their symmetry factors (i.e., multiplying the 3-fold phase by 3 and the 6-fold phase by 6), followed by taking the modulo 2π. However, this projection forced signals with distinct intrinsic periodicities (120° vs. 60° cycles) into a shared 360° space, thereby distorting their relative angular distances and disrupting the one-to-one correspondence between physical directions and phase values. Consequently, this transformation could bias the estimation of their phase relationship.
In the revised analysis and Figure 4a, we retained the original phase estimates derived from the sinusoidal modulation within their native periodic ranges (0–120° for 3-fold and 0–60° for 6-fold) by applying modulo operations directly. Following your suggestion, the relationship between EC and HPC phases was then quantified using circular–circular correlation (Jammalamadaka & Sengupta, 2001), as implemented in the CircStat MATLAB toolbox. This updated analysis avoids the rescaling artifact and provides a statistically stronger and conceptually clearer characterization of the phase correspondence between EC and HPC.
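As a sketch of the statistic involved, the circular–circular correlation of Jammalamadaka & Sengupta can be written in a few lines. The implementation below is a hypothetical Python analogue of CircStat's `circ_corrcc`; the input phases are synthetic stand-ins, not the actual participant estimates.

```python
import numpy as np

def circ_mean(a):
    # Circular mean as the angle of the resultant vector
    return np.angle(np.sum(np.exp(1j * a)))

def circ_corr(a, b):
    # Circular-circular correlation coefficient
    # (Jammalamadaka & Sengupta, 2001), analogous to CircStat's circ_corrcc
    sa = np.sin(a - circ_mean(a))
    sb = np.sin(b - circ_mean(b))
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))

# Hypothetical use with synthetic phases (radians); in the actual analysis
# the inputs would be per-participant EC and HPC phase estimates within
# their native periodic ranges
rng = np.random.default_rng(1)
ec_phase = rng.uniform(0, 2 * np.pi, 33)
hpc_phase = np.mod(ec_phase + 0.3 * rng.standard_normal(33), 2 * np.pi)
r = circ_corr(ec_phase, hpc_phase)
```

Unlike a Pearson correlation on raw angles, this statistic is invariant to where the circular scale is cut, which is precisely the property needed for phases kept in their native periodic ranges.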
(6) Figure 4d needs additional clarification:
Phase-locking is typically used to describe data with a high temporal precision. I understand you adopted an EEG analysis technique to this reconstructed fMRI time-series data, but it should be described differently to avoid confusion. This needs additional control analyses (especially given that 3 is a multiple of 6) to confirm that this result is specific to the periodicities found in the paper.
We thank the reviewer for this insightful comment. We have extensively revised the description of Figure 4 to avoid confusion with EEG-based phase-locking techniques. The revised text now explicitly clarifies that our approach quantifies spatial-domain periodic coupling across path directions, rather than temporal synchronization of neural signals.
To further address the reviewer’s concern about potential effects of the integer multiple relationship between the 3-fold HPC and 6-fold EC periodicities, we additionally performed two control analyses using the 9-fold and 12-fold EC components, both of which are also integer multiples of the 3-fold HPC periodicity. Neither control analysis showed significant coupling (p > 0.05), confirming that the observed 3-fold–6-fold coupling was specific and not driven by their harmonic relationship.
The description of the revised Figure 4 has been updated in the “Phase Synchronization Between HPC and EC Activity” subsection of the Results.
(7) Figure 5a is misleading. In the text, you say you test for propagation to egocentric cortical areas, but I don’t see any analyses done that test this. This feels more like a possible extension/future direction of your work that may be better placed in the discussion.
We are sorry for the confusion. Figure 5a was intended as a hypothesis-driven illustration to motivate our analysis of behavioral periodicity based on participants’ task performance. However, we agree with the reviewer that, on its own, Figure 5a could be misleading, as it does not directly present supporting analyses.
To provide empirical support for the interpretation depicted in Figure 5a, we conducted a whole-brain analysis (Figure S8), which revealed significant 3-fold periodic signals in egocentric cortical regions, including the parietal cortex (PC), precuneus (PCu), and motor regions.
To avoid potential misinterpretation, we have revised the main text to include these results and explicitly referenced Figure S8 in connection with Figure 5a.
The updated description in the “3-fold periodicity in human behavior” subsection in the Results is as follows:
“Considering the reciprocal connectivity between the medial temporal lobe (MTL), where the EC and HPC reside, and the parietal cortex implicated in visuospatial perception and action, together with the observed 3-fold periodicity within the DMN (including the PC and PCu; Fig. S8), we hypothesized that the 3-fold periodic representations of path directions extend beyond the MTL to the egocentric cortical areas, such as the PC, thereby influencing participants' visuospatial task performance (Fig. 5a)”.
Additionally, Figure 5a has been modified to more clearly highlight the hypothesized link between activity periodicity and behavioral periodicity, rather than suggesting a direct anatomical pathway.
(8) PhaseSync model: I am not an expert in this type of modeling, so please put a lower weight on this comment (especially compared to some of the other reviewers). While the PhaseSync model seems interesting, it’s not clear from the discussion how this compares to current models. E.g., Does it support them by adding the three-fold HC periodicity? Does it demonstrate that some of them can't be correct because they don't include this three-fold periodicity?
We thank the reviewer for the insightful comment regarding the PhaseSync model. We agree that further clarifying its relationship to existing computational frameworks is important.
The EC–HPC PhaseSync model is not intended to replace or contradict existing grid–place cell models of navigation (e.g., Bicanski and Burgess, 2019; Whittington et al., 2020; Edvardsen et al., 2020). Instead, it offers a hierarchical extension by proposing that vectorial representations in the hippocampus emerge from the projections of periodic grid codes in the entorhinal cortex. Specifically, the model suggests that grid cell populations encode integrated path information, forming a vectorial gradient toward goal locations.
To simplify the theoretical account, our model was implemented in an idealized square layout. In more complex real-world environments, hippocampal 3-fold periodicity may interact with additional spatial variables, such as distance, movement speed, and environmental boundaries.
We have revised the final two paragraphs of the Discussion to clarify this conceptual framework and emphasize the importance of future studies in exploring how periodic activity in the EC–HPC circuit interacts with environmental features to support navigation.
Reviewer #2 (Recommendations for the authors):
(1) Please show a histogram of movement direction sampling for each participant.
We thank the reviewer for this helpful suggestion. We have added a new supplementary figure (Figure S2) showing histograms of path direction sampling for each participant (36 bins of 10°). The figure is also included. Rayleigh tests for circular uniformity revealed no significant deviations from uniformity (all ps > 0.05, Bonferroni-corrected across participants), confirming that path directions were sampled evenly across 0°–360°.
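For reference, the uniformity check can be sketched as below. This is a hypothetical Python analogue of the Rayleigh test (p-value via the standard Zar approximation, as in CircStat's `circ_rtest`), applied to idealized direction samples rather than the actual participant histograms.

```python
import numpy as np

def rayleigh_test(angles):
    # Rayleigh test for circular uniformity; p-value via Zar's (1999)
    # approximation, analogous to CircStat's circ_rtest
    n = len(angles)
    r = np.abs(np.mean(np.exp(1j * angles)))   # mean resultant length
    z = n * r ** 2
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n ** 2 - (n * r) ** 2)) - (1 + 2 * n))
    return z, min(p, 1.0)

# Perfectly even sampling of 36 directions should not reject uniformity
sampled = np.deg2rad(np.arange(0, 360, 10))
z_uniform, p_uniform = rayleigh_test(sampled)

# A concentrated sample, by contrast, strongly rejects uniformity
z_conc, p_conc = rayleigh_test(np.full(36, 0.5))
```

In the actual analysis the per-participant p-values would then be Bonferroni-corrected across participants, as stated above.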
(2) Why didn’t you use participants’ original trajectories (instead of the trajectories inferred from the movement start and end points) for the hexadirectional analyses?
In our paradigm, participants used two MRI-compatible 2-button response boxes (one for each hand) to adjust the two features of the greebles. As a result, the raw adjustment path contained only four cardinal directions (up, down, left, right). If we were to use the raw stepwise trajectories, the analysis would be restricted to these four directions, which would severely limit the angular resolution. By instead defining direction as the vector from the start to the end position in feature space, we can expand the effective range of directions to the full 0–360°. This approach follows previous literature on abstract grid-like coding in humans (e.g., Constantinescu et al., 2016), where direction was similarly defined by the relative change between two feature dimensions rather than the literal stepwise path. We have added this clarification in the “Sinusoidal modulation” subsection of the revised Methods.
(3) Legend of Figure 2: the statement "localizing grid cell activity" seems too strong because it is still not clear whether hexadirectional signals indeed result from grid-cell activity (e.g., Bin Khalid et al., eLife, 2024). I would suggest rephrasing this statement (here and elsewhere).
Thank you for this helpful suggestion. We have removed the statement “localizing grid cell activity” to avoid ambiguity and revised the legend of Figure 2a to more explicitly highlight its main purpose—defining how path directions and the aligned/misaligned conditions were constructed in the 6-fold modulation. We have also modified similar expressions throughout the manuscript to ensure consistency and clarity.
(4) Legend of Figure 2: “cluster-based SVC correction for multiple comparisons” - what is the small volume you are using for the correction? Bilateral EC?
For both Figure 2 and Figure 3, the anatomical mask of the bilateral medial temporal lobe (MTL), as defined by the AAL atlas, was used as the small volume for correction. This has been clarified in the revised Statistical Analysis section of the Methods as “… with small-volume correction (SVC) applied within the bilateral MTL”.
(5) Legend of Figure 2: "ROI-based analysis" - what kind of ROI are you using? "corrected for multiple comparisons" - which comparisons are you referring to? Different symmetries and also the right/left hemisphere?
In Figure 2b, the ROI was defined as a functional mask derived from the significant activation cluster in the right entorhinal cortex (EC). Since no robust clusters were observed in the left EC, the functional ROI was restricted to the right hemisphere. We indeed included Figure 2c to illustrate this point; however, we recognize that our description in the text was not sufficiently clear.
Regarding the correction for multiple comparisons, this refers specifically to the comparisons across different rotational symmetries (3-, 4-, 5-, 6-, and 7-fold). Only the 6-fold symmetry survived correction, whereas no significant effects were detected for the other symmetries.
We have clarified these points in the “6-fold periodicity in the EC” subsection of the Results as “… The ROI was defined as a functional mask of the right EC identified in the voxel-based analysis and further restricted within the anatomical EC. These analyses revealed significant periodic modulation only at 6-fold (Figure 2c; t(32) = 3.56, p = 0.006, two-tailed, corrected for multiple comparisons across rotational symmetries; Cohen’s d = 0.62) …”.
We have also revised the “3-fold periodicity in the HPC” subsection of the Results as “… ROI analysis, using a functional mask of the HPC identified in the spectral analysis and further restricted within the anatomical HPC, indicated that HPC activity selectively fluctuated at 3-fold periodicity (Figure 3e; t(32) = 3.94, p = 0.002, corrected for multiple comparisons across rotational symmetries; Cohen’s d = 0.70) …”.
(6) Figure 2d: Did you rotationally align 0{degree sign} across participants? Please state explicitly whether (or not) 0{degree sign} aligns with the x-axis in Greeble space.
We thank the reviewer for this helpful question. Yes, before reconstructing the directional tuning curve in Figure 2d, path directions were rotationally aligned for each participant by subtracting the participant-specific grid orientation (ϕ) estimated from the independent dataset (odd sessions). We have now made this description explicit in the revised manuscript in the “6-fold periodicity in the EC” subsection of the Results, stating “… To account for individual differences in spatial phase, path directions were calibrated by subtracting the participant-specific grid orientation estimated from the odd sessions ...”.
(7) Clustering of grid orientations in 30 participants: What does “Bonferroni corrected” refer to? Also, the Rayleigh test is sensitive to the number of voxels - do you obtain the same results when using pair-wise phase consistency?
“Bonferroni corrected” here refers to correction across participants. We have clarified this in the first paragraph of the “6-fold periodicity in the EC” subsection of the Results and in the legend of Supplementary Figure S5 as “Bonferroni-corrected across participants.”
To examine whether our findings were sensitive to the number of voxels, we followed the reviewer’s guidance to compute pairwise phase consistency (PPC; Vinck et al., 2010) for each participant. The PPC results replicated those obtained with the Rayleigh test. We have updated the new results into the Supplementary Figure S5. We also updated the “Statistical Analysis” subsection of the Methods to describe PPC as “For the PPC (Vinck et al., 2010), significance was tested using 5,000 permutations of uniformly distributed random phases (0–2π) to generate a null distribution for comparison with the observed PPC”.
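The PPC statistic itself is compact enough to sketch. The following is a hypothetical Python rendering of the Vinck et al. (2010) measure with a permutation null; the voxel phases, their concentration, and the (reduced, for brevity) permutation count are all invented for illustration.

```python
import numpy as np

def ppc(phases):
    # Pairwise phase consistency (Vinck et al., 2010): the mean cosine of
    # all pairwise phase differences; unlike the Rayleigh statistic, its
    # expected value does not grow with the number of samples (voxels)
    d = phases[:, None] - phases[None, :]
    iu = np.triu_indices(len(phases), k=1)
    return np.mean(np.cos(d[iu]))

# Hypothetical voxel-wise grid orientations (radians) clustered around a
# common value; in the actual analysis, significance was assessed against
# 5,000 permutations of uniformly distributed random phases
rng = np.random.default_rng(3)
voxel_phases = 0.8 + 0.2 * rng.standard_normal(200)
observed_ppc = ppc(voxel_phases)

# Reduced permutation count here, purely to keep the sketch fast
null = np.array([ppc(rng.uniform(0, 2 * np.pi, 200)) for _ in range(1000)])
p_value = np.mean(null >= observed_ppc)
```

Because PPC averages over pairs rather than summing resultant vectors, its expectation under clustering is unaffected by voxel count, which is the property that addresses the reviewer's concern.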
(8) 6-fold periodicity in the EC: Do you compute an average grid orientation across all EC voxels, or do you compute voxel-specific grid orientations?
Following the protocol originally described by Doeller et al. (2010), we estimated voxel-wise grid orientations within the EC and then obtained a participant-specific orientation by averaging across voxels within a hand-drawn bilateral EC mask. The procedure is described in detail in the “Sinusoidal modulation” subsection of the Methods.
(9) Hand-drawn bilateral EC mask: What was your procedure for drawing this mask? What results do you get with a standard mask, for example, from Freesurfer or SPM? Why do you perform this analysis bilaterally, given that the earlier analysis identified 6-fold symmetry only in the right EC? What do you mean by "permutation corrected for multiple comparisons"?
We thank the reviewer for raising these important methodological points. To our knowledge, no standard volumetric atlas provides an anatomically defined entorhinal cortex (EC) mask. For example, the built-in Harvard–Oxford cortical structural atlas in FSL contains only a parahippocampal region that encompasses, but does not isolate, the EC. The AAL atlas likewise does not contain an EC region. In FreeSurfer, an EC label is available, but only in the fsaverage surface space, which is not directly compatible with MNI-based volumetric group-level analyses.
Therefore, we constructed a bilateral EC mask by manually delineating the EC according to the detailed anatomical landmarks described by Insausti et al. (1998). Masks were created using ITK-SNAP (Version 3.8, www.itksnap.org). For transparency and reproducibility, the mask has been made publicly available at the Science Data Bank (link: https://www.scidb.cn/s/NBriAn), as indicated in the revised Data and Code availability section.
Regarding the use of a bilateral EC mask despite voxel-wise effects being strongest in the right EC, two considerations apply. First, we did not have any a priori hypothesis regarding the laterality of EC involvement before performing the analyses. Second, previous studies estimated grid orientation using a bilateral EC mask in their sinusoidal analyses (Doeller et al., 2010; Constantinescu et al., 2016; Bao et al., 2019; Wagner et al., 2023; Raithel et al., 2023). We therefore followed this established approach to estimate grid orientation.
By “permutation corrected for multiple comparisons” we refer to the family-wise error correction applied to the reconstructed directional tuning curves (Figure 2d for the EC, Figure 3f for the HPC). Specifically, directional labels were randomly shuffled 5,000 times, and an FFT was applied to each shuffled dataset to compute spectral power at each fold. This procedure generated null distributions of spectral power for each symmetry. For each fold, the 95th percentile of the maximal power across permutations was used as the uncorrected threshold. To correct across folds, the 95th percentile of the maximal suprathreshold power across all symmetries was taken as the family-wise error–corrected threshold. We have clarified this procedure in the revised “Statistical Analysis” subsection of the Methods.
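A simplified max-statistic variant of this permutation scheme can be sketched as follows. This hypothetical Python example collapses the two-stage thresholding described above into the standard single-stage max-across-folds correction, and uses an invented tuning curve and noise level; it illustrates the logic rather than reproducing the exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

def fold_power(tuning):
    # Spectral power at the 3- to 7-fold periodicities of a 36-bin curve
    spec = np.abs(np.fft.rfft(tuning)) / len(tuning)
    return spec[3:8]

# Hypothetical directional tuning curve carrying a 6-fold signal plus noise
angles = np.deg2rad(np.arange(0, 360, 10))
tuning = np.cos(6 * angles) + 0.3 * rng.standard_normal(36)
observed = fold_power(tuning)

# Null distribution: shuffle the directional labels and, per permutation,
# keep the MAXIMUM power across folds, giving family-wise error control
# across the candidate symmetries
n_perm = 5000
max_null = np.empty(n_perm)
for i in range(n_perm):
    max_null[i] = fold_power(rng.permutation(tuning)).max()
threshold = np.quantile(max_null, 0.95)   # FWE-corrected threshold

significant = observed > threshold        # index 3 corresponds to 6-fold
```

Taking the maximum across folds inside each permutation is what makes the 95th-percentile threshold family-wise rather than per-fold.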
(10) Figures 3b and 3d: Why do different hippocampal voxels show significance for the sinusoidal versus spectral analysis? Shouldn’t the analyses be redundant and, thus, identify the same significant voxels?
We thank the reviewer for this insightful question. Although both sinusoidal modulation and spectral analysis aim to detect periodic neural activity, the two approaches are methodologically distinct and are therefore not expected to identify exactly the same significant voxels.
Sinusoidal modulation relies on a GLM with sine and cosine regressors to test for phase-aligned periodicity (e.g., 3-fold or 6-fold), calibrated according to the estimated grid orientation. This approach is highly specific but critically depends on accurate orientation estimation. In contrast, spectral analysis applies Fourier decomposition to the directional tuning profile, enabling the detection of periodic components without requiring orientation calibration.
Accordingly, the two analyses are not redundant but complementary. The FFT approach allows for an unbiased exploration of multiple candidate periodicities (e.g., 3–7-fold) without predefined assumptions, thereby providing a critical cross-validation of the sinusoidal GLM results. This strengthens the evidence for 6-fold periodicity in EC and 3-fold periodicity in HPC. Furthermore, FFT uniquely facilitates the analysis of periodicities in behavioral performance data, which is not feasible with standard sinusoidal GLM approaches. This methodological consistency enables direct comparison of periodicities across neural and behavioral domains.
Additionally, the anatomical distributions of the HPC clusters appear more similar between Figure 3b and Figure 3d after re-plotting Figure 3d using the peak voxel coordinates (x = –24, y = –18), which are closer to those used for Figure 3b (x = –24, y = –20), as shown in the revised Figure 3.
Taken together, the two analyses serve distinct but complementary purposes.
(11) 3-fold sinusoidal analysis in hippocampus: What kind of small volume are you using to correct for multiple comparisons?
We thank the reviewer for this comment. The same small volume correction procedure was applied as described in R4. Specifically, the anatomical mask of the bilateral medial temporal lobe (MTL), as defined by the AAL atlas, was used as the small volume for correction. This procedure has been clarified in the revised Statistical Analysis section of the Methods as follows: “… with small-volume correction (SVC) applied within the bilateral MTL.”
(12) Figure S5: “right HPC” – isn’t the cluster in the left hippocampus?
We are sorry for the confusion. The brain image was presented in radiological orientation (i.e., the left and right orientations are flipped). We also checked the figure and confirmed that the cluster shown in the original Figure S5 (i.e., Figure S6 in the revised manuscript) is correctly labeled as the right hippocampus, as indicated by the MNI coordinate (x = 22), where positive x values denote the right hemisphere. To avoid potential confusion, we have explicitly added the statement “Volumetric results are displayed in radiological orientation” to the figure legends of all volume-based results.
(13) Figure S5: Why are the significant voxels different from the 3-fold symmetry analysis using 10{degree sign} bins?
As shown in R10, the apparent differences largely reflect variation in MNI coordinates. After adjusting for display coordinates, the anatomical locations of the significant clusters are in fact highly similar between the 10°-binned (Figure 3d, shown above) and the 20°-binned results (Figure S6).
Although both analyses rely on sinusoidal modulation, they differ in the resolution of the input angular bins (10° vs. 20°). Combined with the inherent noise in fMRI data, this makes it unlikely that the two approaches would yield exactly the same set of significant voxels. Importantly, both analyses consistently reveal robust 3-fold periodicity in the hippocampus, indicating that the observed effect is not dependent on angular bin size.
(14) Figure 4a and corresponding text: What is the unit? Phase at which frequency? Are you using a circular-circular correlation to test for the relationship?
We thank the reviewer for raising this important point. In the revised manuscript, we have clarified that the unit of the phase values is radians, corresponding to the 6-fold periodic component in the EC and the 3-fold periodic component in the HPC. In the original Figure 4a, both EC and HPC phases (estimated from sinusoidal modulation) were analyzed using Pearson correlation. We have since realized issues with this approach, as also noted in our response to R5 of Reviewer #1.
In the revised analysis and Figure 4a (as shown above), we re-evaluated the relationship between EC and HPC phases using a circular–circular correlation (Jammalamadaka & Sengupta, 2001), implemented in the CircStat MATLAB toolbox. The “Phase synchronization between the HPC and EC activity” subsection of the Results has been updated accordingly as follows:
“To examine whether the spatial phase structure in one region could predict that in another, we tested whether the orientations of the 6-fold EC and 3-fold HPC periodic activities, estimated from odd-numbered sessions using sinusoidal modulation with rotationally symmetric parameters (in radians), were correlated across participants. A cross-participant circular–circular correlation was conducted between the spatial phases of the two areas to quantify the spatial correspondence of their activity patterns (EC: purple dots; HPC: green dots) (Jammalamadaka & Sengupta, 2001). The analysis revealed a significant circular correlation (Figure 4a; r = 0.42, p < 0.001) …”.
In the “Statistical analysis” subsection of the Methods:
“… The relationship between EC and HPC phases was evaluated using the circular–circular correlation (Jammalamadaka & Sengupta, 2001) implemented in the CircStat MATLAB toolbox …”.
(15) Paragraph following “We further examined amplitude-phase coupling...” - please clarify what data goes into this analysis.
We thank the reviewer for this helpful comment. In this analysis, the input data consisted of hippocampal (HPC) phase and entorhinal (EC) amplitude, both extracted using the Hilbert transform from the reconstructed BOLD signals of the EC and HPC derived through sinusoidal modulation. We have substantially revised the description of the amplitude–phase coupling analysis in the third paragraph of the “Phase Synchronization Between HPC and EC Activity” subsection of the Results to clarify this procedure.
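The extraction step can be sketched with idealized stand-ins for the reconstructed direction-domain signals. This hypothetical Python example uses pure 6-fold and 3-fold cosines in place of the sinusoidally modulated BOLD estimates; only the Hilbert-transform step itself mirrors the described procedure.

```python
import numpy as np
from scipy.signal import hilbert

# Idealized stand-ins for the reconstructed direction-domain signals
# (the actual inputs are the sinusoidally modulated BOLD estimates)
directions = np.deg2rad(np.arange(0, 360, 10))
ec_signal = np.cos(6 * directions)        # 6-fold EC component
hpc_signal = np.cos(3 * directions)       # 3-fold HPC component

# The analytic signal yields instantaneous amplitude (envelope) and phase
ec_amplitude = np.abs(hilbert(ec_signal))
hpc_phase = np.angle(hilbert(hpc_signal))

# Amplitude-phase coupling then asks whether EC amplitude peaks at
# consistent values of HPC phase across path directions
```

Note that the "time" axis here is path direction, consistent with the spatial-domain framing adopted in the revised Figure 4 description.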
(16) Alignment between EC 6-fold phases and HC 3-fold phases: Why don't you simply test whether the preferred 6-fold orientations in EC are similar to the preferred 3-fold phases in HC? The phase-amplitude coupling analyses seem sophisticated but are complex, so it is somewhat difficult to judge to what extent they are correct.
We thank the reviewer for this thoughtful comment. We employed two complementary analyses to examine the relationship between EC and HPC activity. In the revised Figure 4 (as shown in Figure 4 for Reviewer #1), Figure 4a provides a direct and intuitive measure of the phase relationship between the two regions using circular–circular correlation. Figure 4b–c examines whether the activity peaks of the two regions are aligned across path directions using cross-frequency amplitude–phase coupling, given our hypothesis that the spatial phase of the HPC depends on EC projections. These two analyses are complementary: a phase correlation does not necessarily imply peak-to-peak alignment, and conversely, peak alignment does not always yield a statistically significant phase correlation. We therefore combined multiple analytical approaches as a cross-validation across methods, providing convergent evidence for robust EC–HPC coupling.
(17) Figure 5: Do these results hold when you estimate performance just based on “deviation from the goal to ending locations” (without taking path length into account)?
We thank the reviewer for this thoughtful suggestion. Following the reviewer’s advice, we re-estimated behavioral performance using the deviation between the goal and ending locations (i.e., error size) and path length independently. As shown in the new Figure S9, no significant periodicity was observed in error size (p > 0.05), whereas a robust 3-fold periodicity was found for path length (p < 0.05, corrected for multiple comparisons).
We employed two behavioral metrics, (1) path length and (2) error size, for complementary reasons. In our task, participants navigated using four discrete keys corresponding to the cardinal directions (north, south, east, and west). This design inherently induces a 4-fold bias in path directions, as described in the “Behavioral performance” subsection of the Methods. To minimize this artifact, we computed the objectively optimal path length and used it to calibrate participants’ path lengths. However, error size could not be corrected in the same manner and retained a residual 4-fold tendency (see Figure S9d).
Given that both path length and error size are behaviorally relevant and capture distinct aspects of task performance, we decided to retain both measures when quantifying behavioral periodicity. This clarification has been incorporated into the “Behavioral performance” subsection of the Methods, and the second paragraph of the “3-fold periodicity in human behavior” subsection of the Results.
(18) Phase locking between behavioral performance and hippocampal activity: What is your way of creating surrogates here?
We thank the reviewer for this helpful question. Surrogate datasets were generated by circularly shifting the signal series along the direction axis across all possible offsets (following Canolty et al., 2006). This procedure preserves the internal phase structure within each domain while disrupting consistent phase alignment, thereby removing any systematic coupling between the two signals. Each surrogate dataset underwent identical filtering and coherence computation to generate a null distribution, and the observed coherence strength was compared with this distribution using paired t-tests across participants. The statistical analysis section has been systematically revised to incorporate these methodological details.
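The surrogate construction can be sketched in a few lines. This hypothetical Python example uses invented 3-fold signals and a simple product measure of coupling in place of the filtered coherence metric; only the circular-shift logic mirrors the described procedure.

```python
import numpy as np

def circshift_surrogates(signal):
    # Surrogates formed by circularly shifting a direction-binned series
    # across all nonzero offsets (cf. Canolty et al., 2006): the internal
    # structure of the series is preserved, but its alignment with a
    # second series is destroyed
    n = len(signal)
    return np.stack([np.roll(signal, k) for k in range(1, n)])

# Hypothetical 3-fold signals in behavior and HPC activity, binned by
# path direction; "coupling" here is a simple product measure for
# illustration, not the coherence computation used in the paper
directions = np.deg2rad(np.arange(0, 360, 10))
behavior = np.cos(3 * directions)
hpc = np.cos(3 * directions)

observed = np.mean(behavior * hpc)
null = np.mean(circshift_surrogates(hpc) * behavior, axis=1)
```

Shifts that are not multiples of the signal's period break the alignment and drive the coupling toward zero or below, which is what gives the null distribution its spread.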
(19) I could not follow why the authors equate 3-fold symmetry with vectorial representations. This includes statements such as “these empirical findings provide a potential explanation for the formation of vectorial representation observed in the HPC.” Please clarify.
We thank the reviewer for raising this point. Please refer to our response to R2 for Reviewer #1 and the revised Introduction (paragraphs 2–4), where we explicitly explain why the three primary axes of the hexagonal grid cell code can manifest as vector fields.
(20) It was unclear whether the sentence “The EC provides a foundation for the formation of periodic representations in the HPC” is based on the authors’ observations or on other findings. If based on the authors’ findings, this statement seems too strong, given that no other studies have reported periodic representations in the hippocampus to date (to the best of my knowledge).
We thank the reviewer for this comment. We agree that the original wording lacked sufficient rigor. We have extensively revised the 3rd paragraph of the Discussion section with more cautious language by reducing overinterpretation and emphasizing the consistency of our findings with prior empirical evidence, as follows: “The EC–HPC PhaseSync model demonstrates how a vectorial representation may emerge in the HPC from the projections of populations of periodic grid codes in the EC. The model was motivated by two observations. First, the EC intrinsically serves as the major source of hippocampal input (Witter and Amaral, 1991; van Groen et al., 2003; Garcia and Buffalo, 2020), and grid codes exhibit nearly invariant spatial orientations (Hafting et al., 2005; Gardner et al., 2022). Second, mental planning, characterized by “forward replay” (Dragoi and Tonegawa, 2011; Pfeiffer, 2020), has the capacity to activate populations of grid cells that represent sequential experiences in the absence of actual physical movement (Nyberg et al., 2022). We hypothesize that an integrated path code of sequential experiences may eventually be generated in the HPC, providing a vectorial gradient toward the goal location. The path code exhibits regular, vector-like representations when the path direction aligns with the orientations of grid axes, and becomes irregular when they misalign. This explanation is consistent with the band-like representations observed in the dorsomedial EC (Krupic et al., 2012) and the irregular activity fields of trace cells in the HPC (Poulter et al., 2021). ”
Reviewer #1 (Public review):
Summary
The authors propose a transformer-based model for the prediction of condition- or tissue-specific alternative splicing and demonstrate its utility in the design of RNAs with desired splicing outcomes, which is a novel application. The model is compared to relevant existing approaches (Pangolin and SpliceAI) and the authors clearly demonstrate its advantage. Overall, a compelling method that is well thought out and evaluated.
Strengths:
(1) The model is well thought out: rather than modeling a cassette exon using a single generic deep learning model as has been done e.g. in SpliceAI and related work, the authors propose a modular architecture that focuses on different regions around a potential exon skipping event, which enables the model to learn representations that are specific to those regions. Because each component in the model focuses on a fixed-length short sequence segment, the model can learn position-specific features. Furthermore, the architecture of the model is designed to model alternative splicing events, whereas Pangolin and SpliceAI are focused on modeling individual splice junctions, which is an easier problem.
(2) The model is evaluated in a rigorous way - it is compared to the most relevant state-of-the-art models, uses machine learning best practices, and an ablation study demonstrates the contribution of each component of the architecture.
(3) Experimental work supports the computational predictions: Regulatory elements predicted by the model were experimentally verified; novel tissue-specific cassette exons were verified by LSV-seq.
(4) The authors use their model for sequence design to optimize splicing outcome, which is a novel application.
Weaknesses:
None noted.
Reviewer #2 (Public review):
Summary:
The authors present a transformer-based model, TrASPr, for the task of tissue-specific splicing prediction (with experiments primarily focused on the case of cassette exon inclusion) as well as an optimization framework (BOS) for the task of designing RNA sequences for desired splicing outcomes.
For the first task, the main methodological contribution is to train four transformer-based models on the 400bp regions surrounding each splice site, the rationale being that this is where most splicing regulatory information is. In contrast, previous work trained one model on a long genomic region. This new design should help the model capture more easily interactions between splice sites. It should also help in cases of very long introns, which are relatively common in the human genome.
TrASPr's performance is evaluated in comparison to previous models (SpliceAI, Pangolin, and SpliceTransformer) on numerous tasks including splicing predictions on GTEx tissues, ENCODE cell lines, RBP KD data, and mutagenesis data. The scope of these evaluations is ambitious; however, significant details on most of the analyses are missing, making it difficult to evaluate the strength of evidence.
In the second task, the authors combine Latent Space Bayesian Optimization (LSBO) with a Transformer-based variational autoencoder to optimize RNA sequences for a given splicing-related objective function. This method (BOS) appears to be a novel application of LSBO, with promising results on several computational evaluations and the potential to be impactful on sequence design for both splicing-related objectives and other tasks. However, a comparison of BOS against existing methods for sequence design is lacking.
Strengths:
- A novel machine learning model for an important problem in RNA biology with excellent prediction accuracy.
- Instead of being based on a generic design as in previous work, the proposed model incorporates biological domain knowledge (that regulatory information is concentrated around splice sites). This way of using inductive bias can be important to future work on other sequence-based prediction tasks.
Weaknesses:
- Most of the analyses presented in the manuscript are described in broad strokes and are often confusing. As a result, it is difficult to assess the significance of the contribution.
- As more and more models are being proposed for splicing prediction (SpliceAI, Pangolin, SpliceTransformer, TrASPr), there is a need for establishing standard benchmarks, similar to those in computer vision (ImageNet). Without such benchmarks, it is exceedingly difficult to compare models.<br /> *This point is now addressed in the revision.*<br /> *Moreover, datasets have been made available by the authors on BitBucket.*
- Related to the previous point, as discussed in the manuscript, SpliceAI and Pangolin are not designed to predict PSI of cassette exons. Instead, they assign a "splice site probability" to each nucleotide. Converting this to a PSI prediction is not obvious, and the method chosen by the authors (averaging the two probabilities (?)) is likely not optimal. It would be interesting to see what happens if an MLP is used on top of the four predictions (or the outputs of the top layers) from SpliceAI/Pangolin. This could also indicate where the improvement in TrASPr comes from: is it because TrASPr combines information from all four splice sites? Also consider fine-tuning Pangolin on cassette exons only (as you do for your model).<br /> *This point is still not addressed in the revision.*
- L141, "TrASPr can handle cassette exons spanning a wide range of window sizes from 181 to 329,227 bases - thanks to its multi-transformer architecture." This is reported to be one of the primary advantages compared to existing models. Additional analysis should be included on how TrASPr performs across varying exon and intron sizes, with comparison to SpliceAI, etc.
Added after revision: The authors have added additional analyses of performance based on both the length of the exon under consideration and the total length of the surrounding intronic contexts. The result that TrASPr performs well across various context sizes (i.e., the length of the sequence between the upstream and downstream exons, ranging from <1k to >10k) is highly encouraging and supports the claim that most of the sequence-based splicing logic is located proximal to the splice sites. It is also noteworthy that TrASPr performs well for exons longer than 200, suggesting that most of the "regulatory code" is present at the exon boundaries rather than in its center (which TrASPr is blind to).<br /> Additionally, Pearson correlation is used as the sole performance metric in many analyses (e.g., Fig 2 - Supp 2). The authors should consider alternative accuracy metrics, such as RMSE, which better convey the magnitude of prediction error and are more easily comparable across datasets. Pearson correlation may also be more sensitive to outliers on the smaller samples that arise when binning sequences.
- L171, "training it on cassette exons". This seems like an important point: previous models were trained mostly on constitutive exons, whereas here the model is trained specifically on cassette exons. This should be discussed in more detail.<br /> * Our initial comment was incorrect, as pointed out by the authors. *
- L214, ablations of individual features are missing.<br /> *This was addressed in the revision.*
- L230, "ENCODE cell lines", it is not clear why other tissues from GTEx were not included.<br /> *This was addressed in the revision.*
- L239, it is surprising that SpliceAI performs so badly, and might suggest a mistake in the analysis. Additional analysis and possible explanations should be provided to support these claims. Similarly for the complete failure of SpliceAI and Pangolin shown in Fig 4d.<br /> *The authors should consider adding SpliceAI/Pangolin predictions for the alternative 5' and 3' splice site selection tasks (and code for related analyses) to the BitBucket repository.*
- BOS seems like a separate contribution that belongs in a separate publication. Instead, consider providing more details on TrASPr.
*Minor comment added after revision: regarding the author response that "A completely independent evaluation would have required a high-throughput experimental system to assess designs, which is beyond the scope of the current paper.":<br /> It's not clear why BOS cannot be evaluated as a separate contribution by using different "teacher" models instead of TrASPr. Additionally, BOS lacks evaluation against existing methods for sequence optimization.*
- The authors should consider evaluating BOS using Pangolin or SpliceTransformer as the oracle, in order to measure the contribution to the sequence generation task provided by BOS vs TrASPr.<br /> *See comment above.*
Author response:
The following is the authors’ response to the original reviews
A point-by-point response is included below. Before we turn to that, we want to note one change that we decided to introduce, related to generalization on unseen tissues/cell types (Figure 3a in the original submission and the related question by Reviewer #2 below). This analysis was based on adding a latent “RBP state” representation during learning of condition/tissue-specific splicing. The “RBP state” per condition is captured by a dedicated encoder. Our original plan was to have a paper describing a new RBP-AE model we developed in parallel, which also served as the basis for capturing this “RBP state”. However, we got delayed in finalizing this second paper (it was led by other lab members, some of whom have already left the lab). This delay affected the TrASPr manuscript, as TrASPr’s code should be available and the analysis reproducible upon publication. After much deliberation, in order to comply with reproducibility standards while not self-scooping the RBP-AE paper, we eventually decided to take out the RBP-AE and replace it with a vanilla PCA-based embedding for the “RBP state”. The PCA approach is simpler and reproducible, based on a linear transformation of the RBP expression vector into a lower dimension. The qualitative results included in Figure 3a still hold, and we also produced the new results suggested by Reviewer #2 on other GTEx tissues with this PCA-based embedding (below).
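A PCA-based per-condition embedding of this kind can be sketched roughly as follows (a minimal illustration only, not the actual pipeline: the log transform, component count, toy dimensions, and function name are our assumptions):

```python
import numpy as np

def rbp_state_embedding(expression, n_components=8):
    """Project per-condition RBP expression vectors (conditions x RBPs)
    onto their top principal components via SVD; a simple linear
    replacement for a learned encoder."""
    X = np.log1p(expression)                  # stabilize dynamic range (assumption)
    X = X - X.mean(axis=0, keepdims=True)     # center each RBP feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return X @ Vt[:k].T                       # low-dimensional "RBP state"

rng = np.random.default_rng(1)
expr = rng.gamma(2.0, 50.0, size=(6, 300))    # toy: 6 tissues x 300 RBPs
emb = rbp_state_embedding(expr, n_components=4)
```

Because the projection is a fixed linear map of the expression vector, a state for a new, unseen condition can be computed directly from its RBP expression, which is what makes this variant easy to reproduce.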
We don’t believe the switch to PCA based embedding should have any bearing on the current manuscript evaluation but wanted to take this opportunity to explain the reasoning behind this additional change.
Public Reviews:
Reviewer #1 (Public review):
Summary:
The authors propose a transformer-based model for the prediction of condition- or tissue-specific alternative splicing and demonstrate its utility in the design of RNAs with desired splicing outcomes, which is a novel application. The model is compared to relevant existing approaches (Pangolin and SpliceAI) and the authors clearly demonstrate its advantage. Overall, a compelling method that is well thought out and evaluated.
Strengths:
(1) The model is well thought out: rather than modeling a cassette exon using a single generic deep learning model as has been done e.g. in SpliceAI and related work, the authors propose a modular architecture that focuses on different regions around a potential exon skipping event, which enables the model to learn representations that are specific to those regions. Because each component in the model focuses on a fixed-length short sequence segment, the model can learn position-specific features. Another difference compared to Pangolin and SpliceAI, which are focused on modeling individual splice junctions, is the focus on modeling a complete alternative splicing event.
(2) The model is evaluated in a rigorous way - it is compared to the most relevant state-of-the-art models, uses machine learning best practices, and an ablation study demonstrates the contribution of each component of the architecture.
(3) Experimental work supports the computational predictions.
(4) The authors use their model for sequence design to optimize splicing outcomes, which is a novel application.
We wholeheartedly thank Reviewer #1 for these positive comments regarding the modeling approach we took to this task and the evaluations we performed. We have put a lot of work and thought into this and it is gratifying to see the results of that work acknowledged like this.
Weaknesses:
No weaknesses were identified by this reviewer, but I have the following comments:
(1) I would be curious to see evidence that the model is learning position-specific representations.
This is an excellent suggestion to further assess what the model is learning. To get a better sense of the position-specific representation we performed the following analyses:
(1) Switching the transformers’ relative order: All transformers are pretrained on 3’ and 5’ splice site regions before fine-tuning for the PSI and dPSI prediction task. We hypothesized that if relative position is important, switching the order of the transformers would make a large difference in prediction accuracy. Indeed, if we switch the 3’ and 5’ transformers we see, as expected, a severe drop in performance, with Pearson correlation on test data dropping from 0.82 to 0.11. Next, we switched the two 5’ and the two 3’ transformers, observing drops to 0.65 and 0.78, respectively. When focusing only on changing events, the drop was from 0.66 to 0.54 (for 3’ SS transformers), 0.48 (for 5’ SS transformers), and 0.13 (when the 3’ and 5’ transformers flanking the alternative exon were switched).
(2) Position-specific effect of RBPs: We wanted to test whether the model is able to learn position-specific effects for RBPs. For this we focused on two RBPs, FOX (a family of three highly related RBPs) and QKI, both of which have a relatively well-defined motif and known condition- and position-specific effects identified via RBP KD experiments combined with CLIP experiments (e.g. PMID: 23525800, PMID: 24637117, PMID: 32728246). For each, we randomly selected 40 highly and 40 lowly included cassette exon sequences. We then ran in-silico mutagenesis experiments where we replaced small windows of sequence with the RBP motifs (80 for RBFOX and 80 for QKI), then compared TrASPr’s predictions to the average predictions for 5 random sequences inserted in the same location. The results of this are now shown in Figure 4 Supp 3, where the y-axis represents the dPSI effect per position (x-axis), and the color represents the percentile of observed effects over inserting motifs in that position across all 80 sequences tested. We see that both RBPs have strong positional preferences for exerting a strong effect on the alternative exon. We also see differences between binding upstream and downstream of the alternative exon. These results, learned by the model from natural tissue-specific variations, recapitulate nicely the results derived from high-throughput experimental assays. However, we also note that effects were highly sequence-specific. For example, RBFOX is generally expected to increase inclusion when binding downstream of the alternative exon and decrease inclusion when binding upstream. While we do observe such a trend, we also see cases where the opposite effects are observed. These sequence-specific effects have been reported in the literature but may also represent cases where the model errs in the effect’s direction. We discuss these new results in the revised text.
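An in-silico motif-insertion scan of this kind can be sketched as follows (a toy illustration only: `toy_predict` is a simple stand-in for the trained model, and the motif, sequence length, and position weighting are illustrative assumptions, not the actual experiment):

```python
import numpy as np

rng = np.random.default_rng(2)
BASES = np.array(list("ACGT"))

def insert_motif(seq, motif, pos):
    # Replace a window of len(motif) bases starting at pos with the motif.
    return seq[:pos] + motif + seq[pos + len(motif):]

def positional_motif_effect(seq, motif, predict, n_random=5):
    # For each position, compare predicted inclusion after inserting the
    # motif against the average over random same-length insertions at the
    # same spot, isolating the motif-specific dPSI per position.
    L, m = len(seq), len(motif)
    effects = []
    for pos in range(L - m + 1):
        motif_psi = predict(insert_motif(seq, motif, pos))
        rand_psi = np.mean([
            predict(insert_motif(seq, "".join(rng.choice(BASES, m)), pos))
            for _ in range(n_random)
        ])
        effects.append(motif_psi - rand_psi)
    return np.array(effects)

# Toy predictor: counts an RBFOX-like DNA motif (TGCATG), weighting
# occurrences in the downstream half of the window more strongly.
def toy_predict(seq):
    half = len(seq) // 2
    return 0.1 * seq[:half].count("TGCATG") + 0.4 * seq[half:].count("TGCATG")

seq = "".join(rng.choice(BASES, 120))
effects = positional_motif_effect(seq, "TGCATG", toy_predict)
```

With this toy scorer the per-position effect curve is larger downstream than upstream, the kind of positional asymmetry the actual analysis probes with TrASPr.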
(3) Assessing BOS sequence edits to achieve tissue-specific splicing: Here we decided to test whether BOS edits in intronic regions (at least 8 bases away from the nearest splice site) are important for the tissue-specific effect. The results are now included in Figure 6 Supp 1, clearly demonstrating that most of the neuronal-specific changes achieved by BOS were based on changing the introns, with a strong effect observed for both upstream and downstream intron edits.
(2) The transformer encoders in TrASPr model sequences with a rather limited sequence size of 200 bp; therefore, for long introns, the model will not have good coverage of the intronic sequence. This is not expected to be an issue for exons.
The reviewer is raising a good question here. On one hand, one may hypothesize that, as the reviewer seems to suggest, TrASPr may not do well on long introns as it lacks the full intronic sequence.
Conversely, one may also hypothesize that for long introns, where the flanking exons are outside the window of SpliceAI/Pangolin, TrASPr may have an advantage.
Given this good question and a related one by Reviewer #2, we divided prediction accuracy by intron length and the alternative exon length.
For short exons (<100bp) we find TrASPr and Pangolin perform similarly, but for longer exons, especially those > 200, TrASPr's results are better. When dividing samples by the total length of the upstream and downstream introns, we find TrASPr outperforms all other models for introns of combined length up to 6K, but Pangolin gets better results when the combined intron length is over 10K. This latter result is interesting, as it means that, contrary to the second hypothesis laid out above, Pangolin’s performance did not degrade for events where the flanking exons were outside its field of view. We note that all of the above holds whether we assess all events or just cases of tissue-specific changes. It is interesting to think about the mechanistic causes for this. For example, it is possible that cassette exons involving very long introns evoke a different splicing mechanism where the flanking exons are not as critical and/or there is more signal in the introns which is missed by TrASPr. We include these new results now as Figure 2 - Supp 1,2 and discuss them in the main text.
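A length-stratified evaluation of this kind can be sketched roughly as follows (a toy illustration with synthetic data; the bin edges, minimum bin size, and function name are our assumptions, not the actual analysis code):

```python
import numpy as np

def pearson_by_length_bin(lengths, y_true, y_pred, edges):
    # Pearson correlation between measured and predicted PSI within
    # length bins (e.g., combined flanking-intron length).
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (lengths >= lo) & (lengths < hi)
        if mask.sum() >= 3:  # need a few points for a meaningful correlation
            r = np.corrcoef(y_true[mask], y_pred[mask])[0, 1]
            out[(lo, hi)] = float(r)
    return out

# Synthetic example: 500 events with lengths spanning 200 bases to 20 kb.
rng = np.random.default_rng(3)
n = 500
lengths = rng.integers(200, 20000, n)
psi = rng.uniform(0, 1, n)
pred = psi + 0.1 * rng.standard_normal(n)  # toy model with small error
bins = pearson_by_length_bin(lengths, psi, pred, [0, 1000, 6000, 10000, 10**9])
```

The same stratification applies to exon length; as noted in the response, Pearson per bin should be read with the bin sizes in mind, since small bins make the correlation sensitive to outliers.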
(3) In the context of sequence design, creating a desired tissue- or condition-specific effect would likely require disrupting or creating motifs for splicing regulatory proteins. In your experiments for neuronal-specific Daam1 exon 16, have you seen evidence for that? Most of the edits are close to splice junctions, but a few are further away.
That is another good question. Regarding Daam1 exon 16, in the original paper describing the mutation locations some motif similarities were noted to PTB (CU) and CUG/Mbnl-like elements (Barash et al Nature 2010). In order to explore this question beyond this specific case we assessed the importance of intronic edits by BOS to achieve a tissue specific splicing profile - see above.
(4) For sequence design, of tissue- or condition-specific effect in neuronal-specific Daam1 exon 16 the upstream exonic splice junction had the most sequence edits. Is that a general observation? How about the relative importance of the four transformer regions in TrASPr prediction performance?
This is another excellent question. Please see new experiments described above for RBP positional effect and BOS edits in intronic regions which attempt to give at least partial answers to these questions. We believe a much more systematic analysis can be done to explore these questions but such evaluation is beyond the scope of this work.
(5) The idea of lightweight transformer models is compelling, and is widely applicable. It has been used elsewhere. One paper that came to mind in the protein realm:
Singh, Rohit, et al. "Learning the language of antibody hypervariability." Proceedings of the National Academy of Sciences 122.1 (2025): e2418918121.
We definitely do not claim that this approach of using lighter, dedicated models instead of a large ‘foundation’ model has not been taken before. We believe the Singh et al. paper mentioned above represents a somewhat different approach, where their model (AbMAP) fine-tunes large general protein foundation models (PLMs) for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. We added a description of this modeling approach, citing the above work and another one which specifically handles RNA splicing (intron retention, PMID: 39792954).
Reviewer #2 (Public review):
Summary:
The authors present a transformer-based model, TrASPr, for the task of tissue-specific splicing prediction (with experiments primarily focused on the case of cassette exon inclusion) as well as an optimization framework (BOS) for the task of designing RNA sequences for desired splicing outcomes.
For the first task, the main methodological contribution is to train four transformer-based models on the 400bp regions surrounding each splice site, the rationale being that this is where most splicing regulatory information is. In contrast, previous work trained one model on a long genomic region. This new design should help the model capture more easily interactions between splice sites. It should also help in cases of very long introns, which are relatively common in the human genome.
TrASPr's performance is evaluated in comparison to previous models (SpliceAI, Pangolin, and SpliceTransformer) on numerous tasks including splicing predictions on GTEx tissues, ENCODE cell lines, RBP KD data, and mutagenesis data. The scope of these evaluations is ambitious; however, significant details on most of the analyses are missing, making it difficult to evaluate the strength of the evidence. Additionally, state-of-the-art models (SpliceAI and Pangolin) are reported to perform extremely poorly in some tasks, which is surprising in light of previous reports of their overall good prediction accuracy; the reasoning for this lack of performance compared to TrASPr is not explored.
In the second task, the authors combine Latent Space Bayesian Optimization (LSBO) with a Transformer-based variational autoencoder to optimize RNA sequences for a given splicing-related objective function. This method (BOS) appears to be a novel application of LSBO, with promising results on several computational evaluations and the potential to be impactful on sequence design for both splicing-related objectives and other tasks.
We thank Reviewer #2 for this detailed summary and positive view of our work. It seems the main issue raised in this summary regards the evaluations: the reviewer finds details of the evaluations missing and the fact that SpliceAI and Pangolin perform poorly on some of the tasks to be surprising. We made a concerted effort to include the required details, including code and data tables. In short, some of the concerns were addressed by adding additional evaluations, some by clarifying missing details, and some by better explaining where Pangolin and SpliceAI may excel vs. settings where these may not do as well. More details are given below.
Strengths:
(1) A novel machine learning model for an important problem in RNA biology with excellent prediction accuracy.
(2) Instead of being based on a generic design as in previous work, the proposed model incorporates biological domain knowledge (that regulatory information is concentrated around splice sites). This way of using inductive bias can be important to future work on other sequence-based prediction tasks.
Weaknesses:
(1) Most of the analyses presented in the manuscript are described in broad strokes and are often confusing. As a result, it is difficult to assess the significance of the contribution.
We made an effort to describe the tasks in a specific and detailed manner, including making the code and data for them available. We believe this helped improve clarity in the revised version.
(2) As more and more models are being proposed for splicing prediction (SpliceAI, Pangolin, SpliceTransformer, TrASPr), there is a need for establishing standard benchmarks, similar to those in computer vision (ImageNet). Without such benchmarks, it is exceedingly difficult to compare models. For instance, Pangolin was apparently trained on a different dataset (Cardoso-Moreira et al. 2019), and using a different processing pipeline (based on SpliSER) than the ones used in this submission. As a result, the inferior performance of Pangolin reported here could potentially be due to subtle distribution shifts. The authors should add a discussion of the differences in the training set, and whether they affect your comparisons (e.g., in Figure 2). They should also consider adding a table summarizing the various datasets used in their previous work for training and testing. Publishing their training and testing datasets in an easy-to-use format would be a fantastic contribution to the community, establishing a common benchmark to be used by others.
There are several good points to unpack here. Starting from the last one, we very much agree that a standard benchmark would be useful to include. For tissue-specific splicing quantification we used the GTEx dataset, from which we selected six representative human tissues (heart, cerebellum, lung, liver, spleen, and EBV-transformed lymphocytes). In total, we collected 38,394 cassette exon events quantified across 15 samples (here a ‘sample’ is a cassette exon quantified in two tissues) from the GTEx dataset, with high-confidence quantification of their PSIs based on MAJIQ. A detailed description of how this data was derived is now included in the Methods section, and the data itself is made available via the Bitbucket repository with the code.
Next, regarding the usage of different data and distribution shifts for Pangolin: The reviewer is right to note there are many differences between how Pangolin and TrASPr were trained. This makes it hard to determine whether the improvements we saw are not just a result of different training data/labels. To address this issue, we first tried to fine-tune the pre-trained Pangolin with MAJIQ’s PSI dataset: we used the subset of the GTEx dataset described above, focusing on the three tissues analyzed in Pangolin’s paper (heart, cerebellum, and liver) for a fair comparison. In total, we obtained 17,218 events, and we followed the same training and test split as reported in the Pangolin paper. We obtained a Pearson correlation of 0.78 and a Spearman correlation of 0.68, values similar to what we got without this extra fine-tuning. Next, we retrained Pangolin from scratch with the full set of tissues and the training set used for TrASPr, which was derived from MAJIQ’s quantifications. Since our model trains only on human data, with 6 tissues at the same time, we modified Pangolin from its original 4 splice-site usage outputs to 6 PSI outputs. We tried taking the sequence centered on either the first or the second splice site of the middle exon. This test resulted in low performance (3’ SS: Pearson 0.21; 5’ SS: 0.26).
The above tests are obviously not exhaustive, but their results suggest that the differences we observe are unlikely to be driven by distribution shifts. Notably, the original Pangolin was trained on much more data (four species, four tissues each, and sliding windows across the entire genome). This training seems to be important for performance, while the fact that we switched from Pangolin’s splice-site usage to MAJIQ’s PSI was not a major contributor. Other potential reasons for the improvements we observed include the architecture, target function, and side information (see below), but a complete delineation of those is beyond the scope of this work.
(3) Related to the previous point, as discussed in the manuscript, SpliceAI and Pangolin are not designed to predict PSI of cassette exons. Instead, they assign a "splice site probability" to each nucleotide. Converting this to a PSI prediction is not obvious, and the method chosen by the authors (averaging the two probabilities (?)) is likely not optimal. It would be interesting to see what happens if an MLP is used on top of the four predictions (or the outputs of the top layers) from SpliceAI/Pangolin. This could also indicate where the improvement in TrASPr comes from: is it because TrASPr combines information from all four splice sites? Also, consider fine-tuning Pangolin on cassette exons only (as you do for your model).
Please see the above response. We did not investigate more sophisticated models that adjust Pangolin’s architecture further as such modifications constitute new models which are beyond the scope of this work.
(4) L141, "TrASPr can handle cassette exons spanning a wide range of window sizes from 181 to 329,227 bases - thanks to its multi-transformer architecture." This is reported to be one of the primary advantages compared to existing models. Additional analysis should be included on how TrASPr performs across varying exon and intron sizes, with comparison to SpliceAI, etc.
This was a good suggestion, related to another comment made by Reviewer #1. Please see above our response to them with a breakdown by exon/intron length.
(5) L171, "training it on cassette exons". This seems like an important point: previous models were trained mostly on constitutive exons, whereas here the model is trained specifically on cassette exons. This should be discussed in more detail.
Previous models were not trained exclusively on constitutive exons, and Pangolin specifically was trained with their version of junction usage across tissues. That said, the reviewer’s point is valid (and similar to ones made above) about the need for matched training/testing and potential distribution shifts. Please see the response and evaluations described above.
(6) L214, ablations of individual features are missing.
These have now been added to the table, which we moved to the main text (see also the table below).
(7) L230, "ENCODE cell lines", it is not clear why other tissues from GTEx were not included.
Good question. The task here was to assess predictions in unseen conditions, hence we opted to test on completely different data of human cell lines rather than additional tissue samples. Following the reviewer’s suggestion, we also evaluated predictions on two additional GTEx tissues, Cortex and Adrenal Gland. These new results, as well as the previous ones for ENCODE, were updated to use the PCA-based embedding of the “RBP state” as described above. We also compared the predictions using the PCA-based embedding of the “RBP state” to training directly on data (not the test data, of course) from these tissues. See updated Figure 3a,b and Figure 3 Supp 1,2.
(8) L239, it is surprising that SpliceAI performs so badly, and might suggest a mistake in the analysis. Additional analysis and possible explanations should be provided to support these claims. Similarly, the complete failure of SpliceAI and Pangolin is shown in Figure 4d.
Line 239 refers to predicting relative inclusion levels between competing 3’ and 5’ splice sites. We admit we too expected this to be better for SpliceAI and Pangolin, but we were not able to find bugs in our analysis (which is all made available for readers and reviewers alike). Regarding this expectation, we first note that we are not aware of a similar assessment being done for either of those algorithms (i.e. relative inclusion for 3’ and 5’ alternative splice site events). Instead, our initial expectation, and likely the reviewer’s as well, was based on their detection of splice site strengthening/weakening due to mutations, including cryptic splice site activation. More generally, it is worth noting in this context that, given how SpliceAI, Pangolin, and other algorithms have been presented in papers/media/scientific discussions, we believe there is a potential misperception regarding the tasks at which SpliceAI and Pangolin excel vs. other tasks where they should not necessarily be expected to excel. Both algorithms focus on cryptic splice site creation/disruption. This has been the focus of those papers and subsequent applications. While Pangolin added tissue specificity to SpliceAI training, the authors themselves admit “...predicting differential splicing across tissues from sequence alone is possible but remains a considerable challenge and requires further investigation”. The actual performance on this task is not included in Pangolin’s main text, but we refer Reviewer #2 to supplementary figure S4 in the Pangolin manuscript to get a sense of Pangolin’s reported performance on this task. Similarly, Figure 4d in our manuscript is for predicting ‘tissue-specific’ regulators. We do not think it is surprising that SpliceAI (tissue agnostic) and Pangolin (slight improvement over SpliceAI in tissue-specific predictions) do not perform well on this task. Similarly, we do not find the results in Figure 4c surprising either.
These are for mutations that slightly alter the inclusion level of an exon, not something SpliceAI was trained on - SpliceAI was trained on genomic splice sites with yes/no labels across the genome. As noted elsewhere in our response, re-training Pangolin on this mutagenesis dataset results in performance much closer to that of TrASPr. That is to be expected as well - Pangolin is constructed to capture changes in PSI (or splice site usage as defined by the authors), those changes are not even tissue-specific for the CD19 data, and the model has no problem generalizing from the training set, just as TrASPr does. In fact, if one only uses combinations of known mutations seen during training, a simple regression model gives a correlation of ~92-95% (Cortés-López et al 2022). In summary, we believe that a better understanding of what one can realistically expect from models such as SpliceAI, Pangolin, and TrASPr will go a long way toward having them better understood and used effectively. We have tried to make this clearer in the revision.
(9) BOS seems like a separate contribution that belongs in a separate publication. Instead, consider providing more details on TrASPr.
We thank the reviewer for the suggestion. We agree those are two distinct contributions/algorithms, and we indeed considered publishing them as two separate papers. However, there is strong coupling between the design algorithm (BOS) and the predictor that enables it (TrASPr). This coupling is both conceptual (TrASPr as a “teacher”) and practical in terms of evaluations. While we use experimental data (experiments involving Daam1 exon 16 and CD19 exon 2), we still rely heavily on evaluations by TrASPr itself. A completely independent evaluation would have required a high-throughput experimental system to assess designs, which is beyond the scope of the current paper. For those reasons we eventually decided to make this into what we hope is a more compelling combined story about generative models for prediction and design of RNA splicing.
(10) The authors should consider evaluating BOS using Pangolin or SpliceTransformer as the oracle, in order to measure the contribution to the sequence generation task provided by BOS vs TrASPr.
We can definitely see the logic behind trying BOS with different predictors. That said, as we note above, most of BOS’s evaluations are based on the “teacher”. As such, it is unclear what value replacing the teacher would bring. We also note that, given this limitation, we focus mostly on evaluations in comparison to existing approaches (a genetic algorithm, or random mutations as a strawman).
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
Additional comments:
(1) Is your model picking up transcription factor binding sites in addition to RBPs? TFs have been recently shown to have a role in splicing regulation:
Daoud, Ahmed, and Asa Ben-Hur. "The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models." PLOS Computational Biology 21.1 (2025): e1012755.
We agree this is an interesting point to explore, especially given the series of works from the Ben-Hur group. We note, though, that these works focus on intron retention (IR), which we have not focused on here, and we only cover short intronic regions flanking the exons. We leave this as a future direction, as we believe the scope of this paper is already quite extensive.
(2) SpliceNouveau is a recently published algorithm for the splicing design problem:
Wilkins, Oscar G., et al. "Creation of de novo cryptic splicing for ALS and FTD precision medicine." Science 386.6717 (2024): 61-69.
Thank you for pointing out Wilkins et al recent publication, we now refer to it as well.
(3) Please discuss the relationship between your model and this deep learning model. You will also need to change the following sentence: "Since the splicing sequence design task is novel, there are no prior implementations to reference."
We revised this statement and now refer to several recent publications that propose similar design tasks.
(4) I would suggest adding a histogram of PSI values - they appear to be mostly close to 1 or 0.
PSI values are indeed typically close to either 0 or 1. This is a known phenomenon illustrated in previous studies of splicing (e.g., Shen et al., NAR 2012). We are not sure what the comment to add a histogram refers to, but we made sure to point this out in the main text:
“...Still, those statistics are dominated by extreme values, such that 33.2% are smaller than 0.15 and 56.0% are higher than 0.85. Furthermore, most cassette exons do not change between a given tissue pair (only 14.0% of the samples in the dataset, i.e. a cassette exon measured across two tissues, exhibit |ΔΨ| ≥ 0.15).”
(5) Part of the improvement of TrASPr over Pangolin could be the result of a more extensive dataset.
Please see above responses and new analysis.
(6) In the discussion of the roles of alternative splicing, protein diversity is mentioned, but I suggest you also mention the importance of alternative splicing as a regulatory mechanism:
Lewis, Benjamin P., Richard E. Green, and Steven E. Brenner. "Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans." Proceedings of the National Academy of Sciences 100.1 (2003): 189-192.
Thank you for the suggestion. We added that point and citation.
(7) Line 96: You use dPSI without defining it (although quite clear that it should be Delta PSI).
Fixed.
(8) Pretrained transformers: Have you trained separate transformers on acceptor and donor sites, or a single splice junction transformer?
We used a single splice-junction transformer for pre-training.
(9) "TrASPr measures the probability that the splice site in the center of Se is included in some tissue" - that's not my understanding of what TrASPr is designed to do.
We revised the above sentence to make it more precise: “Given a genomic sequence context $S_e = (s_1, \ldots, s_n)$, made of a cassette exon $e$ and flanking intronic/exonic regions, TrASPr predicts for a tissue pair $(c, c')$ the change in the fraction of transcripts where exon $e$ is included or skipped, $\Delta \Psi_{e,c,c'}$.”
(10) Please include the version of the human genome annotations that you used.
We used the GENCODE v40 annotation of the human genome (hg38); this is now included in the Data section.
(11) I did not see a description of the RBP-AE component in the methods section. A bit more detail on the model would be useful as well.
Please see above details about replacing RBP-AE with a simpler linear PCA “RBP-State” encoding. We added details about how the PCA was performed to the Methods section.
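As a rough illustration of what an SVD-based PCA over condition-level RBP expression looks like (a minimal sketch under assumed matrix dimensions and component counts; not the paper's implementation):

```python
import numpy as np

def rbp_state_embedding(expr, n_components=8):
    """Project a (conditions x RBPs) expression matrix onto its top
    principal components via SVD-based PCA, yielding a low-dimensional
    "RBP-State"-style encoding per condition. All dimensions here are
    assumptions for illustration.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
expr = rng.normal(size=(10, 50))   # toy data: 10 conditions x 50 RBPs
emb = rbp_state_embedding(expr, n_components=4)
```

The appeal of a linear PCA over a learned autoencoder in this setting is that embedding a new, unseen condition requires only a matrix projection, with no retraining.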
(12) Typos, grammar:
- Fix the following sentence: ATP13A2, a lysosomal transmembrane cation transporter, linked to an early-onset form of Parkinson's Disease (PD) when 306 loss-of-function mutations disrupt its function.
The sentence was fixed to now read: “The first example is of a brain cerebellum-specific cassette exon skipping event predicted by TrASPr in the ATP13A2 gene (aka PARK9). ATP13A2 is a lysosomal transmembrane cation transporter for which loss-of-function mutations have been linked to early-onset Parkinson’s Disease (PD)”.
- Line 501: "was set to 4e−4"(the - is a superscript).
Fixed
- A couple of citations are missing in lines 580 and 581.
Thank you for catching this error. Citations in line 580, 581 were fixed.
(13) Paper title: Generative modeling for RNA splicing predictions and design - it would read better as "Generative modeling for RNA splicing prediction and design", as you are solving the problems of splicing prediction and splicing design.
Thank you for the suggestion. We updated the title and removed the plural form.
Reviewer #2 (Recommendations for the authors):
(1) Appendices are not very common in biology journals. It is also not clear what purpose the appendix serves exactly - it seems to repeat some of the things said earlier. Consider merging it into the methods or the main text.
We merged the appendices into the Methods section and removed redundancy.
(2) L112, "For instance, the model could be tasked with designing a new version of the cassette exon, restricted to no more than N edit locations and M total base changes." How are N and M different? Is there a difference between an edit location and a base change?
Yes: $N$ is the number of edit locations (one can think of each as a start position) of various lengths (e.g. a SNP is of length 1), while $M$ is the total number of positions edited. The text now reads: “For instance, the model could be tasked with designing a new version of the cassette exon, restricted to no more than $N$ edit locations (i.e. start positions of one or more consecutive bases) and $M$ total base changes.”
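To make the N vs. M distinction concrete, here is a minimal sketch (a hypothetical helper for illustration only, not from the paper's code) that checks a proposed set of edits against both budgets:

```python
def within_edit_budget(edits, max_locations, max_bases):
    """Check a proposed set of edits against the design constraints.

    `edits` is a list of (start_position, length) tuples; each tuple is
    one edit location covering `length` consecutive bases (a SNP has
    length 1). Hypothetical helper names; for illustration only.
    """
    n_locations = len(edits)                      # N: distinct edit locations
    m_bases = sum(length for _, length in edits)  # M: total bases changed
    return n_locations <= max_locations and m_bases <= max_bases

# Two edit locations, three bases changed in total:
edits = [(120, 1), (245, 2)]
```

Under this reading, a single 3-base substitution counts as one location (N=1) but three base changes (M=3).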
(3) L122: "DEN was developed for a distinct problem". What prevents one from adapting DEN to your sequence design task? The method should be generic. I do not see what "differs substantially" means here. (Finally, wasn't DEN developed for the task you later refer to as "alternative splice site" (as opposed to "splice site selection")? Use consistent terminology. And in L236 you use "splice site variation" - is that also the same?).
Indeed, our original description was not clear/precise enough. DEN was designed and trained for two tasks: APA, and 5’ alternative splice site usage. The terms “selection”, “usage”, and “variation” were indeed used interchangeably in different places, and the reviewer was right to note the lack of precision. We have now revised the text to make sure the term “relative usage” is used consistently.
Nonetheless, we hold that DEN was indeed defined for different tasks; see Figures 2A and 6A of Linder et al. 2020 (the reference was also incorrect, as we cited the preprint and not the final paper):
In both cases DEN is trying to optimize a short region for selecting an alternative PA site (left) or a 5’ splice site (right). This work focused on an MPRA dataset of short synthetic sequences inserted in the designated region for train/test. We hold this is indeed a different type of data and task than the one we focus on here. Yes, one can potentially adapt DEN for our task, but this is beyond the scope of this paper. Finally, we note that a more closely related algorithm recently proposed is Ledidi (Schreiber et al 2025), which was posted as a preprint. Similar to BOS, Ledidi tries to optimize a given sequence and adapt it with a few edits for a given task. Regardless, we updated the main text to make the differences between DEN and the task we defined here for BOS clearer, and we also added a reference to Ledidi and other recent works in the discussion section.
(4) L203, exons with DeltaPSI very close to 0.15 are going to be nearly impossible to classify (or even impossible, considering that the DeltaPSI measurements are not perfect). Consider removing such exons to make the task more feasible.
Yes, this is how it was done. As described in more detail below, we defined changing samples as ones where the change was >= 0.15 and non-changing samples as ones where the change in PSI was < 0.05, to avoid ambiguous cases affecting the classification task.
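The filtering described above can be sketched as follows (an illustrative restatement of the thresholds, not the authors' actual code):

```python
def label_sample(delta_psi, change_thr=0.15, null_thr=0.05):
    """Label one (cassette exon, tissue pair) sample for the
    changing vs. non-changing classification task.

    Samples with |dPSI| >= change_thr are 'changing'; those with
    |dPSI| < null_thr are 'non-changing'; ambiguous samples in the
    gap between the two thresholds are dropped from the task.
    """
    if abs(delta_psi) >= change_thr:
        return "changing"
    if abs(delta_psi) < null_thr:
        return "non-changing"
    return None  # ambiguous; excluded from the classification task
```

The dead zone between 0.05 and 0.15 is what keeps near-threshold exons, which are nearly impossible to classify given measurement noise, out of both classes.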
(5) L230, RBP-AE is not explained in sufficient detail (and does not appear in the methods, apparently). It is not clear how exactly it is trained on each new cellular condition.
Please see the response in the opening of this document and Q11 from Reviewer #1.
(6) L230, "significantly improving": the r value actually got worse; it is therefore not clear you can claim any significant improvement. Please mention that fact in the text.
This is a fair point. We note that we view the “a” statistic as potentially more interesting/relevant here, as the Pearson “r” is dominated by points being generally close to 0/1. Regardless, revisiting this we realized one can also argue that the term “significant” is imprecise/misplaced, since no statistical test was done here (side note: given the number of points, a simple null of same distribution yes/no would pass significance, but we do not think this is an interesting/relevant test here). Also, we note that with the transition to PCA instead of RBP-AE we actually get improvements in both a and r values, both for the ENCODE samples shown in Figure 3a and the two new GTEx tissues we tested (see above). We have changed the text to simply state:
“...As shown in Figure 3a, this latent space representation allows TrASPr to generalize from the six GTEx tissues to unseen conditions, including unseen GTEx tissues (top row) and ENCODE cell lines (bottom row). It improves prediction accuracy compared to TrASPr lacking PCA (e.g. a=88.5% vs a=82.3% for ENCODE cell lines), though naturally training on the additional GTEx and ENCODE conditions can lead to better performance (e.g. a=91.7% for ENCODE, Figure 3a left column).”
(7) L233, "Notably, previous splicing codes focused solely on cassette exons", Rosenberg et al. focused solely on alternative splice site choice.
Right - we removed that sentence.
(8) L236, "trained TrASPr on datasets for 3' and 5' splice site variations". Please provide more details on this task. What is the input to TrASPr and what is the prediction target (splice site usage, PSI of alternative isoforms)? What datasets are used for this task?
The data for this task was the same processed GTEx tissue data, just for alternative 3’ and 5’ splice site events. We revised the description of this task in the main text and added information in the Methods section. The data is also included in the repo.
(9) L243, "directly from genomic sequences", and conservation?
Yes, we changed the sentence to read “...directly from genomic sequences combined with related features”
(10) L262, what is the threshold for significant splicing changes?
The threshold is 0.15. We updated the main text to read as follows:
The total number of mutations hitting each of the 1198 genomic positions across the 6106 sequences is shown in Figure 4b (left), while the distribution of effects ($|\Delta \Psi|$) observed across those 6106 samples is shown in Figure 4b (right). To this data we applied three testing schemes. The first is a standard 5-fold CV where 20% of the combinations of point mutations were hidden in every fold. The second test involved ‘unseen mutations’ (UM), where we hid any sample that includes mutations in specific positions, for a total of 1480 test samples. As illustrated by the CDF in Figure 4b, most samples (each sample may involve multiple mutated positions) do not involve significant splicing changes. Thus, we also performed a third test using only the 883 samples where mutations cause significant changes ($|\Delta \Psi| \geq 0.15$).
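The 'unseen mutation' scheme described above can be sketched as a simple split rule (an illustrative sketch with hypothetical names, not the authors' pipeline): any sample touching a held-out position lands in the test set, so the model never sees those positions mutated during training.

```python
def unseen_mutation_split(samples, held_out_positions):
    """Split mutagenesis samples for an 'unseen mutation' (UM) style test.

    Each sample is a (mutated_positions, delta_psi) pair, where
    mutated_positions is the tuple of genomic positions mutated in that
    sequence (samples may combine several mutations). Any sample that
    touches a held-out position goes to the test set. Illustration only.
    """
    held = set(held_out_positions)
    train, test = [], []
    for positions, dpsi in samples:
        (test if held & set(positions) else train).append((positions, dpsi))
    return train, test

samples = [((10, 55), 0.20), ((99,), 0.00), ((55,), -0.30)]
train, test = unseen_mutation_split(samples, held_out_positions=[55])
```

Unlike a standard CV fold over mutation combinations, this split guarantees positional novelty at test time.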
(11) L266, Pangolin performance is only provided for one of the settings (and it is not clear which). Please provide details of its performance in all settings.
The description was indeed not clear. Pangolin’s performance was similar to SpliceAI as mentioned above but retraining it on the CD19 data yielded much closer performance to TrASPr. We include all the matching tests for Pangolin after retraining in Figure 4 Supp Figure 1.
(12) Please specify "n=" in all relevant plots.
Fixed.
(13) Figure 3a, "The tissues were first represented as tokens, and new cell line results were predicted based on the average over conditions during training." Please explain this procedure in more detail. What are these tokens and how are they provided to the model? Are the cell line predictions the average of the predictions for the training tissues?
Yes, as a baseline to assess improvements we compared against the simple average over the predictions for the training tissues for that specific event (see related work pointing to the need for such baselines in DL for genomics: https://pubmed.ncbi.nlm.nih.gov/33213499/). Regarding the tokens: we encode each tissue type as a possible token value and feed the two tissues as two tokens to the transformer.
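The averaging baseline described above amounts to a one-liner (an illustrative sketch; names and structure are ours, not the paper's):

```python
def average_tissue_baseline(predictions_by_tissue):
    """Baseline prediction for an unseen condition: the mean of the
    model's predictions for the same splicing event across the
    training tissues. `predictions_by_tissue` maps tissue name to
    the predicted PSI for one event. Illustration only.
    """
    values = list(predictions_by_tissue.values())
    return sum(values) / len(values)

# Toy example: predictions for one event in two training tissues.
baseline = average_tissue_baseline({"brain": 0.9, "liver": 0.7})
```

A model embedding unseen conditions is only useful to the extent it beats this kind of condition-agnostic average.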
(14) Figure 4b, the total count in the histogram is much greater than 6106. Please explain the dataset you're using in more detail, and what exactly is shown here.
We updated the text to read:
“...we used 6106 sequence samples where each sample may have multiple positions mutated (\ie mutation combinations) in exon 2 of CD19 and its flanking introns and exons (Cortes et al 2022). The total number of mutations hitting each of the 1198 genomic positions across the 6106 sequences is shown in Figure 4b (left).”
(15) Figure 5a, how are the prediction thresholds (TrASPr passed, TrASPr stringent, and TrASPr very stringent) defined?
Passed: dPSI > 0.1; Stringent: dPSI > 0.15; Very stringent: dPSI > 0.2. This is now included in the main text.
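The three tiers above can be restated as a small classifier (illustrative only; we assume the thresholds apply to the magnitude of the predicted change, which is our reading rather than a statement from the paper):

```python
def design_tier(dpsi):
    """Classify a designed sequence by predicted splicing change,
    using the thresholds stated in the response: >0.1 passed,
    >0.15 stringent, >0.2 very stringent. Applying the thresholds
    to |dPSI| is an assumption made for this illustration.
    """
    magnitude = abs(dpsi)
    if magnitude > 0.2:
        return "very stringent"
    if magnitude > 0.15:
        return "stringent"
    if magnitude > 0.1:
        return "passed"
    return "failed"
```

Note the tiers are nested: every "very stringent" design also passes the two weaker cutoffs.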
(16) L417, please include more detail on the relative size of TrASPr compared to other models (e.g. number of parameters, required compute, etc.).
SpliceAI is a general-purpose splicing predictor with a 32-layer deep residual neural network to capture long-range dependencies in genomic sequences. Pangolin is a deep learning model specifically designed for predicting tissue-specific splicing, with an architecture similar to SpliceAI’s. The implementation of SpliceAI found at https://huggingface.co/multimolecule/spliceai involves an ensemble of 5 such models, for a total of ~3.5M parameters. TrASPr has 4 BERT transformers (each with 6 layers and 12 heads) and an MLP on top of those, for a total of ~189M parameters. Evo 2, a genomic ‘foundation’ model, has 40B parameters; DNABERT has ~86M (a single BERT with 12 layers and 12 heads); and Borzoi has 186M parameters (as stated in https://www.biorxiv.org/content/10.1101/2025.05.26.656171v2). We note that the difference here is not just in model size but also in the amount of data used to train each model. We edited the original L417 to reflect that.
(17) L546, please provide more detail on the VAE. What is the dimension of the latent representation?
We added more details in the Methods section like the missing dimension (256) and definitions for P(Z) and P(S).
(18) Consider citing (and possibly comparing BOS to) Ghari et al., NeurIPS 2024 ("GFlowNet Assisted Biological Sequence Editing").
Added.
(19) Appendix Figure 2, and corresponding main text: it is not clear what is shown here. What is dPSI+ and dPSI-? What pairs of tissues are you comparing? Spearman correlation is reported instead of Pearson, which is the primary metric used throughout the text.
The dPSI+ and dPSI- sets were indeed not well defined in the original submission. Moreover, we found our own code lacked consistency due to different tests executed at different times/by different people. We apologize for this lack of consistency and clarity, which we worked to remedy in the revised version. To answer the reviewer’s question: given two tissues ($c_1, c_2$), dPSI+ and dPSI- are for correctly classifying the exons that are significantly differentially included or excluded. Specifically, differentially included exons are those for which $\Delta \Psi_{e,c1,c2} = \Psi_{e,c1} - \Psi_{e,c2} \geq 0.15$, compared to those that are not ($\Delta \Psi_{e,c1,c2} < 0.05$). Similarly, dPSI- is for correctly classifying the exons that are significantly differentially excluded in the first tissue or included in the second ($\Delta \Psi_{e,c1,c2} \leq -0.15$), compared to those that are not ($\Delta \Psi_{e,c1,c2} > -0.05$). This means dPSI+ and dPSI- depend on the order of $c_1, c_2$. In addition, we also define a direction/order-agnostic test for changing vs. non-changing events, i.e. $|\Delta \Psi_{e,c1,c2}| \geq 0.15$ vs. $|\Delta \Psi_{e,c1,c2}| < 0.05$. These test definitions are consistent with previous publications (e.g. Barash et al Nature 2010, Jha et al 2017) and also answer different biological questions: for example, “exons that go up in brain” and “exons that go up in liver” can reflect distinct mechanisms, while changing exons capture a model’s ability to identify regulated exons even if the predicted direction is wrong. The updated Appendix Figure 2 is now in the main text as Figure 2d and uses Pearson, while AUPRC and AUROC refer to the changing vs. non-changing classification task described above, such that we avoid dPSI+ and dPSI- when summarizing over 3 pairs of tissues in this table.
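The definitions above translate directly into labeling rules, sketched here for clarity (our restatement of the stated thresholds, with dpsi = PSI in tissue c1 minus PSI in tissue c2; not the authors' code):

```python
def dpsi_plus_label(dpsi):
    """dPSI+ task: positives are exons significantly more included
    in c1 than in c2 (dPSI >= 0.15); negatives have dPSI < 0.05;
    samples in between are excluded."""
    if dpsi >= 0.15:
        return 1
    if dpsi < 0.05:
        return 0
    return None  # ambiguous; excluded

def dpsi_minus_label(dpsi):
    """dPSI- task: positives have dPSI <= -0.15; negatives have
    dPSI > -0.05; samples in between are excluded."""
    if dpsi <= -0.15:
        return 1
    if dpsi > -0.05:
        return 0
    return None  # ambiguous; excluded
```

Swapping c1 and c2 flips the sign of dpsi, which is why the two tasks are order-dependent while the |dPSI| changing vs. non-changing task is not.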
Finally, we note that making sure all tests comply with the above definitions also resulted in an update to the Figure 2b/c labels and values, where TrASPr’s improvement over Pangolin reaches up to 1.8-fold in AUPRC compared to 2.4-fold in the earlier version. We again apologize for the lack of clarity and consistent evaluations in the original submission.
(20) Minor typographical comments:
- Some plots could use more polishing (e.g., thicker stroke, bigger font size, consistent style (compare 4a to the other plots)...).
Agreed. While not critical for the science itself we worked to improve figure polishing in the revision to make those more readable and pleasant.
- Consider using 2-dimensional histograms instead of the current kernel density plots, which tend to over-smooth the data and hide potentially important details.
We were not sure what the exact suggestion is here and opted to leave the plots as is.
- L53: dPSI_{e, c, c'} is never formally defined. Is it PSI_{e, c} - PSI_{e, c'} or vice versa?
Definition now included (see above).
- L91: Define/explain "transformer" and provide reference.
We added the explanation and related reference of the transformer in the introduction section and BERT in the method section.
- L94: exons are short. Are you referring here to the flanking introns? Please explain.
We apologize for the lack of clarity. We are referring to a cassette exon alternative splicing event as is commonly defined by the splice junctions involved that is from the 5’ SS of the upstream exon to the 3’ SS of the downstream exon. The text now reads:
“...In contrast, 24% of the cassette exons analyzed in this study span a region between the flanking exons' upstream 3' and downstream 5' splice sites that are larger than 10 kb.”
- L132: It's unclear whether a single, shared transformer or four different transformers (one for each splice site) are being pre-trained. One would at least expect 5' and 3' splice sites to have a different transformer. In Methods, L506, it seems that each transformer is pre-trained separately.
We updated the text to read:
“We then center a dedicated transformer around each of the splice sites of the cassette exon and its upstream and downstream (competing) exons (four separate transformers for four splice sites in total).”
- L471: You explain here that it is unclear what tasks 'foundation' models are good for. Also in L128, you explain that you are not using a 'foundation' model. But then in L492, you describe the BERT model you're using as a foundation model!
Line 492 was a poor choice of wording; “foundation” was meant there simply as the “base component”. We changed it accordingly.
- L169, "pre-training ... BERT", explain what exactly this means. Is it using masking? Is it self-supervised learning? How many splice sites do you provide? Also explain more about the BERT architecture and provide references.
We added more details about the BERT architecture and training in the Methods section.
- L186 and later, the values for a and r provided here and in the below do not correspond to what is shown in Figure 2.
Fixed, thank you for noticing this.
- L187,188: What exactly do you mean by "events" and "samples"? Are they the same thing? If so, are they (exon, tissue) pairs? Please use consistent terminology. Moreover, when you say "changing between two conditions": do you take all six tissues whenever there is a 0.15 spread in PSI among them? Or do you take just the smallest PSI tissue and the largest PSI tissue when there is a 0.15 spread between them? Or something else altogether?
Reviewer #2 is yet again correct that the definitions were not precise. A “sample” involves a specific exon skipping “event” measured in two tissues. The text now reads:
“....most cassette exons do not change between a given tissue pair (only 14.0% of the samples in the dataset, i.e., a cassette exon measured across two tissues, exhibit |∆Ψ| ≥ 0.15). Thus, when we repeat this analysis only for samples involving exons that exhibited a change in inclusion (|∆Ψ| ≥ 0.15) between at least two tissues, performance degrades for all three models, but the differences between them become more striking (Figure 2a, right column).”
- Figure 1a, explain the colors in the figure legend. The 3D effect is not needed and is confusing (ditto in panel C).
Color explanation is now added: “exons and introns are shown as blue rectangles and black lines. The blue dashed line indicates the inclusive pattern and the red junction indicates an alternative splicing pattern.”
These are not 3D effects but stacks to indicate multiple events/cases. We agree these are not needed in Fig1a to illustrate types of AS and removed those. However, in Fig1c and matching caption we use the stacks to indicate HT data captures many such LSVs over which ML algorithms can be trained.
- Figure 1b, this cartoon seems unnecessary and gives the wrong impression that this paper explores mechanistic aspects of splicing. The only relevant fact (RBPs serving as splicing factors) can be explained in the text (and is anyway not really shown in this figure).
We removed Figure 1b cartoon.
- Figure 1c, what is being shown by the exon label "8"?
This was meant to convey exon ID, now removed to simplify the figure.
- Figure 1e, left, write "Intron Len" in one line. What features are included under "..."? Based on the text, I did not expect more features.
Also, the arrows emanating from the features do not make sense. Is "Embedding" a layer? I don't think so. Do not show it as a thin stripe. Finally, what are dPSI'+ and dPSI'-? are those separate outputs? are those logits of a classification task?
We agree this description was not good and have updated it in the revised version.
- Figure 1e, the right-hand side should go to a separate figure much later, when you introduce BOS.
We appreciate the suggestion. However, we feel that Figure 1e serves as a visual representation of the entire framework. Just like we opted to not turn this work into two separate papers (though we fully agree it is a valid option that would also increase our publication count), we also prefer to leave this unified visual representation as is.
- Figure 2, does the n=2456 refer to the number of (exons, tissues) pairs? So each exon contributes potentially six times to this plot? Typo "approximately".
The “n” refers to the number of samples which is a cassette event measured in two tissues. The same cassette event may appear in multiple samples if it was confidently quantified in more than two tissues. We updated the caption to reflect this and corrected the typo.
- Figure 2b, typo "differentially included (dPSI+) or excluded" .
Fixed.
- L221, "the DNABERT" => "DNABERT".
Fixed.
- L232, missing percent sign.
Fixed.
- L246, "see Appendix Section 2 for details" seems to instead refer to the third section of the appendix.
We no longer have this as an Appendix; the reference has been updated.
- Figure 3, bottom panels, PSI should be "splice site usage"?
PSI is correct here - we hope the revised text/definitions make it more clear now.
- Figure 3b: typo: "when applied to alternative alternative 3'".
Fixed.
- p252, "polypyrimidine" (no capitalization).
Fixed.
- Strange capitalization of tissue names (e.g., "Brain-Cerebellum"). The tissue is called "cerebellum" without capitalization.
We used EBV (capital) for the abbreviation and lower case for the rest.
- Figure 4c: "predicted usage" on the left but "predicted PSI" on the right.
Right. We opted to leave it as is since Pangolin and SpliceAI do predict their definition of “usage” and not directly PSI, we just measure correlations to observed PSI as many works have done in the past.
- Figure 4 legend typo: "two three".
Fixed.
- L351, typo: "an (unsupervised)" (and no need to capitalize Transformer).
Fixed.
- L384, "compared to other tissues at least" => "compared to other tissues of at least".
Fixed.
- L549, P(Z) and P(S) are not defined in the text.
Fixed.
- L572, remove "Subsequently". Add missing citations at the end of the paragraph.
Fixed.
- L580-581, citations missing.
Fixed.
- L584-585, typo: "high confidince predictions"
Fixed.
- L659-660, BW-M and B-WM are both used. Typo?
Fixed.
- L895, "calculating the average of these two", not clear; please rewrite.
Fixed.
- L897, "Transformer" and "BERT", do these refer to the same thing? Be consistent.
BOS is a transformer and not a BERT but TrASPr uses the BERT architecture. BERT is a type of transformer as the reviewer is surely well aware so the sentence is correct. Still, to follow the reviewer’s recommendation for consistency/clarity we changed it here to state BERT.
- Appendix Figure 5: The term dPSI appears to be overloaded to also represent the difference between predicted PSI and measured PSI, which is inconsistent with previous definitions.
Indeed! We thank the reviewer again for their sharp eye and attention to details that we missed. We changed Supp Figure 5, now Figure 4 Supplementary Figure 2, to |PSI’-PSI| and defined those as the difference between TrASPr’s predictions (PSI’) and MAJIQ based PSI quantifications.
Joint Public Review:
Summary:
This is an excellent, timely study investigating and characterizing the underlying neural activity that generates the neuroendocrine GnRH and LH surges that are responsible for triggering ovulation. Abundant evidence accumulated over the past 20 years implicated the population of kisspeptin neurons in the hypothalamic RP3V region (also referred to as the POA or AVPV/PeN kisspeptin neurons) as being involved in driving the GnRH surge in response to elevated estradiol (E2), also known as the "estrogen positive feedback". However, while former studies used Cfos coexpression as a marker of RP3V kisspeptin neuron activation at specific times and found this correlates with the timing of the LH surge, detailed examination of the live in vivo activity of these neurons before, during, and after the LH surge remained elusive due to technical challenges.
Here, Zhou and colleagues use fiber photometry to measure the long-term synchronous activity of RP3V kisspeptin neurons across different stages of the mouse estrous cycle, including on proestrus when the LH surge occurs, as well as in a well-established OVX+E2 mouse model of the LH surge.
The authors report that RP3V kisspeptin neuron activity is low on estrus and diestrus, but increases on proestrus several hours before the late afternoon LH surge, mirroring prior reports of rising GnRH neuron activity in proestrus female mice. The measured increase in RP3V kisspeptin activation is long, spanning ~13 hours in proestrus females and extending well beyond the end of LH secretion, and is shown by the authors to be E2 dependent.
For this work, Kiss-Cre female mice received a Cre-dependent AAV injection, containing GCaMP6, to measure the neuronal activation of RP3V Kiss1 cells. Females exhibited periods of increased neuronal activation on the day of proestrus, beginning several hours prior to the LH surge and lasting for about 12 hours. Though oscillations in the pattern of GCaMP fluorescence were occasionally observed throughout the ovarian cycle, the frequency, duration, and amplitude of these oscillations were significantly higher on the day of proestrus. This increase in RP3V Kiss1 neuronal activation that precedes the increase in LH supports the hypothesis that these neurons are critical in regulating the LH surge. The authors compare this data to new data showing a similar increased activation pattern in GnRH neurons just prior to the LH surge, further supporting the hypothesis that RP3V Kiss1 cell activation causes the release of kisspeptin to stimulate GnRH neurons and produce the LH surge.
Strengths:
This study provides compelling data demonstrating that RP3V kisspeptin neuronal activity changes throughout the ovarian cycle, likely in response to changes in estradiol levels, and that neuronal activation increases on the day of the LH surge.
The observed increase in RP3V kisspeptin neuronal activation precedes the LH surge, which lends support to the hypothesis that these neurons play a role in regulating the estradiol-induced LH surge. Continuing to examine the complexities of the LH surge and the neuronal populations involved, as done in this study, is critical for developing therapeutic treatments for women's reproductive disorders.
This innovative study uses a within-subject design to examine neuronal activation in vivo across multiple hormone milieus, providing a thorough examination of the changes in activation of these neurons. The variability in neuronal activity surrounding the LH surge across ovarian cycles in the same animals is interesting and could not be achieved without this within-subjects design. The inclusion and comparison of ovary-intact females and OVX+E2 females is valuable to help test mechanisms under these two valuable LH surge conditions, and allows for further future studies to tease apart minor differences in the LH surge pattern between these 2 conditions.
This study provides an excellent experimental setup able to monitor the daily activity of preoptic kisspeptin neurons in freely moving female mice. It will be a valuable tool to assess the putative role of these kisspeptin neurons in various aspects of altered female fertility (aging, pathologies...). This approach also offers novel and useful insights into the impact of E2 and circadian cues on the electrical activity of RP3V kisspeptin neurons.
An intriguing cyclical oscillation in kisspeptin neural activity every 90 minutes exists, which may offer critical insight into how the RP3V kisspeptin system operates. Interestingly, there was also variability in the onset and duration of RP3V Kisspeptin neuron activity between and within mice in naturally cycling females. Preoptic kisspeptin neurons show an increased activity around the light/dark transition only on the day of proestrus, and this is associated with an increase in LH secretion. An original finding is the observation that the peak of kisspeptin neuron activation continues a few hours past the peak of LH, and the authors hypothesize that this prolonged activity could drive female sexual behaviors, which usually appear after the LH surge.
The authors demonstrated that ovariectomy resulted in very little neuronal activity in RP3V kisspeptin neurons. When these ovariectomized females were treated with estradiol benzoate (EB) and an LH surge was induced, there was an increase in RP3V kisspeptin neuronal activation, as was seen during proestrus. However, the magnitude of the change in activity was greater during proestrus than during the EB-induced LH surge. Interestingly, the authors noted a consistent peak in activity about 90 minutes prior to lights out on each day of the ovarian cycle and during EB treatment, but not in ovariectomized females. The functional purpose of this consistent neuronal activity at this time remains to be determined.
Though not part of this study, the comparison of neuronal activation of GnRH neurons during the LH surge to the current data was convincing, demonstrating a similar pattern of increased activation that precedes the LH surge.
In summary, the study is well-designed, uses proper controls and analyses, has robust data, and the paper is nicely organized and written. The data from these experiments is compelling, and the authors' claims and conclusions are nicely supported and justified by the data. The data support the hypothesis in the field that these RP3V neurons regulate the LH surge. Overall, these findings are important and novel, and lend valuable insight into the underlying neural mechanisms for neuroendocrine control of ovulation.
Weaknesses:
(1) LH levels were not measured in many mice or in robust temporal detail, such as every 30 or 60 min, to allow a more detailed comparison between the fine-scale timing of RP3V neuron activation with onset and timing of LH surge dynamics.
(2) The authors report that the peak LH value occurred 3.5 hours after the first RP3V kisspeptin neuron oscillation. However, it is likely, and indeed evident from the 2 example LH patterns shown in Figures 3A-B, that LH values start to increase several hours before the peak LH. This earlier rise in LH levels ("onset" of the surge) occurs much closer in time to the first RP3V kisspeptin neuron oscillatory activation, and as such, the ensuing LH secretion may not be as delayed as the authors suggest.
(3) The authors nicely show that there is some variation (~2 hours) in the peak of the first oscillation in proestrus females. Was this same variability present in OVX+E2 females, or was the variability smaller or absent in OVX+E2 versus proestrus? It is possible that the variability in proestrus mice is due to variability in the timing and magnitude of rising E2 levels, which would, in theory, be more tightly controlled and similar among mice in the OVX+E2 model. If so, the OVX+E2 mice may have less variability between mice for the onset of RP3V kisspeptin activity.
(4) One concern regarding this study is the lack of data showing the specificity of the AAV and the GCaMP6s signals. There are no data showing that GCaMP6s is limited to the RP3V and is not expressed in other Kiss1 populations in the brain. Given that 2 µl of the AAV was injected, which seems like a lot considering it was close to the ventricle, it is important to show that the signal and measured activity are specific to the RP3V region. Though the authors discuss potential reasons for the low co-expression of GCaMP6 and kisspeptin immunoreactivity, it does raise some concern regarding the interpretation of these results. The low co-expression makes it difficult to confirm the Kiss1 cell-specificity of the Cre-dependent AAV injections. In addition, if GFP (GCaMP6s) and kisspeptin protein co-localization is low, it is possible that the activation of these neurons does not coincide with changes in kisspeptin, or even that these neurons are not expressing Kiss1 or kisspeptin at the time of activation. It is important to remember that the study measures activation of the kisspeptin neuron, and it does not reveal anything specific about the activity of the kisspeptin protein.
(5) One additional minor concern is that LH levels were not measured in the ovariectomized females during the expected time of the LH surge. The authors suggest that the lower magnitude of activation during the LH surge in these females, in comparison to proestrus females, may be the result of lower LH levels. It's hard to interpret the difference in magnitude of neuronal activation between EB-treated and proestrus females without knowing LH levels. In addition, it's possible that an LH surge did not occur in all EB-treated females, and thus, having LH levels would confirm the success of the EB treatment.
(6) This kisspeptin neuron peak activity is abolished in ovariectomized mice, and estradiol replacement restored this activity, but only partially. Circulating levels of estradiol were not measured in these different setups, but the authors hypothesize that the lack of full restoration may be due to the absence of other ovarian signals, possibly progesterone.
(7) Recordings in several mice show inter- and intra-individual variability in the time of peak onset. It is not shown whether this variability is associated with a similar variability in the timing of the LH surge onset in the recorded mice. The authors hypothesized that this variability indicates limited involvement of the circadian input. However, no experiments were done to investigate the role of the (vasopressinergic-driven) circadian input on the kisspeptin neuron activation at the light/dark transition. Thus, we suggest that the authors be more tentative about this hypothesis.
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Summary:
Liu et al. provided evidence of the interaction between endocytosis and VAMP8-mediated endocytic recycling of clathrin-mediated endocytosis (CME) cargo through a knockdown approach combined with total internal reflection fluorescence (TIRF) microscopy, western blotting, and functional assays in a mammalian cell line system. They demonstrated that VAMP8 depletion impairs the initial stages of CME, such as the initiation, stabilization, and invagination of clathrin-coated pits (CCPs). VAMP8 indirectly regulates CME by facilitating endocytic recycling. The depletion of VAMP8 alters endosomal recycling, as shown here by the transferrin receptor, towards lysosomal degradation, thereby inhibiting clathrin-coated vesicle (CCV) formation. Overall, I found this study to be highly engaging because of its elucidation of the unexpected role of an R-SNARE in influencing the levels of cargo proteins within the context of clathrin-mediated endocytosis (CME). This MS will be helpful for researchers in the endocytosis and protein trafficking fields. It appears to me that VAMP8 interacts with multiple targets within the endo-lysosomal pathway, collectively influencing clathrin-mediated endocytosis (CME). Therefore, the contribution of lysosomes in this context should be evaluated. This matter should be addressed experimentally and discussed in the MS before considering publication.
Major comments:
Minor comments:
VAMP8 is an R-SNARE critical for late endosome/lysosome fusion and regulates exocytosis, especially in immune and secretory cells. It pairs with Q-SNAREs to mediate vesicle fusion, and its dysfunction alters immunity, inflammation, and secretory processes. This study revealed that the SNARE protein VAMP8 influences clathrin-mediated endocytosis (CME) by managing the recycling of endocytic cargo rather than being directly recruited to clathrin-coated vesicles. This study advances our understanding of cellular trafficking mechanisms and underscores the essential role of recycling pathways in maintaining membrane dynamics. This is an excellent piece of work, and the experiments were designed meticulously; however, the mechanism is not convincing enough at this point. This MS will surely benefit the general audience, specifically the membrane and protein trafficking and cell biology community.
The authors investigate the role of the SNARE protein VAMP8 in endocytic recycling and clathrin-mediated endocytosis (CME). Using siRNA knockdown, live-cell imaging, and recycling assays, they report that VAMP8 depletion impairs clathrin-coated pit (CCP) initiation, stabilisation, and invagination, thereby inhibiting CME. Furthermore, they suggest that VAMP8 knockdown promotes transferrin receptor (TfR) degradation and slows its recycling. Consistent with previous studies, knockdown of CALM expression inhibits CME, whereas overexpression of wild-type or L219S/M244K mutant CALM rescues CME.
Major concerns:
General assessment: While the study shows that VAMP8 depletion negatively affects CME and TfR trafficking, the manuscript suffers from limited novelty, logical inconsistencies, and experimental shortcomings.
Reviewer #3 (Public review):
Summary:
This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.
Strengths:
A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.
Comments on revisions:
The authors have addressed my concerns in the revised manuscript.
Author response:
The following is the authors’ response to the previous reviews
Reviewer 1
The authors should clarify the statement regarding the expression in horizontal cells (lines 170-172). In line 170, it is stated that GFP was observed in horizontal cells. Since GFP is fused to Cx36, the observation of GFP in horizontal cells would suggest the expression of Cx36-GFP.
There appears to be a misunderstanding. GFP is observed in horizontal cells because the test AAV construct, which consists of the HKamac promoter and a downstream GFP sequence, was used to validate the promoter specificity in wildtype animals. This was just a test to confirm that HKamac is indeed active in AII amacrine cells, as previously described by Khabou et al. 2023. This construct was not used for the large-scale BioID screen. For these experiments, V5-dGBP-Turbo was expressed under the control of the HKamac promoter, as illustrated in Figure 2A.
Fig 7: the legend is missing the descriptions for panels A-C.
We apologize for this mistake. We had missed the label “(A-C)” and have added it to the legend.
Supplemental files are not referenced in the manuscript.
We have added a reference for these files in line 221-226.
Reviewer 2
Supplementary Files 1 and 2 are presented as two replicates of the zebrafish proteomic datasets, but they appear to be identical.
This appears to be a misunderstanding. These two replicates contain slightly different hits, although the most abundant candidates are identical.
Reviewer 3
Thank you for the positive comments
Reviewer #1 (Public review):
Summary:
The authors note that there is a large corpus of research establishing the importance of LC-NE projections to medial prefrontal cortex (mPFC) of rats and mice in attentional set or 'rule' shifting behaviours. However, this is complex behavior and the authors were attempting to gain an understanding of how locus coeruleus modulation of the mPFC contributes to set shifting.
The authors replicated the ED-shift impairment following NE denervation of mPFC by chemogenetic inhibition of the LC. They further showed that LC inhibition changed the way neurons in mPFC responded to the cues, with a greater proportion of individual neurons responsive to 'switching', but the individual neurons also had broader tuning, responding to other aspects of the task (i.e., response choice and response history). The population dynamics was also changed by LC inhibition, with reduced separation of population vectors between early-post-switch trials, when responding was at chance, and later trials when responding was correct. This was what they set out to demonstrate and so one can conclude they achieved their aims.
The authors concluded that LC inhibition disrupted mPFC "encoding capacity for switching" and suggest that this "underlie[s] the behavioral deficits."
Strengths:
The principal strength is combining inactivation of LC with calcium imaging in mPFC. This enabled detailed consideration of the change in behavior (i.e., defining epochs of learning, with an 'early phase' when responding is at chance being compared to a 'later phase' when the behavioral switch has occurred) and how these are reflected in neuronal activity in the mPFC, with and without LC-NE input.
Comments on revised version:
In their response to reviewers, the authors say "We report p values using 2 decimal points and standard language as suggested by this reviewer". However, no changes were made in the manuscript: for example, "P = 4.2e-3" rather than "p = 0.004".
In their response to the reviewers, they wrote: "Upon closer examination of the behavioral data, we exclude several sessions where more trials were taken in IDS than in EDS." If the sessions in which EDS < IDS were excluded, it is unsurprising that the remaining data show EDS > IDS. Most problematic is the fact that the manuscript now reads "Importantly, control mice (pooled from Fig. 1e, 1h, Supp. Fig. 1a, 1b) took more trials to complete EDS than IDS (Trials to criterion: IDS vs. EDS, 10 ± 1 trials vs. 16 ± 1 trials, P < 1e-3, Supp. Fig. 1c), further supporting the validity of attentional switching (as in Fig. 1c)" without mentioning that data have been excluded.
Reviewer #3 (Public review):
Summary:
Nigro et al. examine how the locus coeruleus (LC) influences the medial prefrontal cortex (mPFC) during attentional shifts required for behavioral flexibility. Specifically, they propose that LC-mPFC inputs enable mice to shift attention effectively from texture to odor cues to optimize behavior. The LC and its noradrenergic projections to the mPFC have previously been implicated in this behavior. The authors further establish this by using chemogenetics to inhibit LC terminals in mPFC and show a selective deficit in extradimensional set shifting behavior. But the study's primary innovation is the simultaneous inhibition of LC while recording multineuron patterns of activity in mPFC. Analysis at the single neuron and population levels revealed broadened tuning properties, less distinct population dynamics, and disrupted predictive encoding when LC is inhibited. These findings add to our understanding of how neuromodulatory inputs shape attentional encoding in mPFC and are an important advance. There are some methodological limitations and/or caveats that should be considered when interpreting the findings, and these are described below.
Strengths:
The naturalistic set-shifting task in freely moving animals is a major strength, and the inclusion of localized suppression of LC-mPFC terminals builds confidence in the specificity of their behavioral effect. Combining chemogenetic inhibition of LC while simultaneously recording neural activity in mPFC with miniscopes is state-of-the-art. The authors apply analyses to population dynamics in particular that can advance our understanding of how the LC modifies patterns of mPFC neural activity. The authors show that neural encoding at both the single cell level and the population level are disrupted when LC is inhibited. They also show that activity is less able to predict key aspects of the behavior when the influence of LC is disrupted. This is quite interesting and adds to a growing understanding of how neuromodulatory systems sharpen tuning of mPFC activity.
Weaknesses:
Weaknesses are mostly minor, but there are some caveats that should be considered. First, the authors use a DBH-Cre mouse line and provide histological confirmation of overlap between HM4Di expression and TH immunostaining. While this strongly suggests modulation of noradrenergic circuit activity, the results should be interpreted conservatively as there is no independent confirmation that norepinephrine (NE) release is suppressed and these neurons are known to release other neurotransmitters and signaling peptides. In the absence of additional control experiments, it is important to recognize that effects on mPFC activity may or may not be directly due to LC-mPFC NE.
Another caveat is that the imaging analyses are entirely from the extradimensional shift session. Without analyzing activity data from the intradimensional shift (IDS) session, one cannot be certain that the observed changes are to some feature of activity that is specific to extradimensional shifts. Future experiments should examine animals with LC suppression during the IDS as well, which would show whether the observed effects are specific to an extradimensional shift and might explain behavioral effects.
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
We thank the reviewers and editors for this peer review. Following the editorial assessment and specific review comments, in this revision we have included new analysis to support the validity of the behavioral task (Reviewer #2). We have improved data presentation by including 1) data points from individual animals (Reviewer #1, #3), 2) updated histology showing the expression of hM4Di in LC neurons as well as LC terminals in the mPFC (Reviewer #3), and 3) more detailed descriptions of methodology and data analysis (Reviewer #1, #2, #3).
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
(1) Planned t-tests should be performed in both control and experimental animals to determine if the number of trials needed to reach criterion on the ID is lower than on the ED. Based on the data analyses showing no difference among the control group, the data could be pooled to demonstrate that the task is valid. Reporting all p-values using 2 decimal points and standard language e.g., p < 0.001 would greatly improve the readability of the data.
Thank you for this suggestion. As pointed out by this reviewer, more trials to reach performance criterion in EDS than IDS is indicative of successful acquisition and switching of the attentional sets. Upon closer examination of the behavioral data, we excluded several sessions where more trials were taken in IDS than in EDS, and our conclusions that DREADD inhibition of the LC or LC input to the mPFC impaired rule switching in EDS remain robust (e.g., new Fig. 1e, 1h). We also pooled control and test data (Fig. 1e, 1h, new Supp. Fig. 1a, 1b) to demonstrate the validity of this task (new Supp. Fig. 1c, IDS vs. EDS in the control group, 10 ± 1 trials vs. 16 ± 1 trials, P < 1e-3). The validity of set shifting is also supported by the new Fig. 1c.
We report p values using 2 decimal points and standard language as suggested by this reviewer.
Relevant to the comments from Reviewer #1 in the public review, we now show individual data points on the bar charts (new Fig. 1e, 1h).
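The pooled IDS-vs-EDS comparison described above can be sketched roughly as follows. This is an illustrative sketch only: the trial counts below are hypothetical placeholder values (the real values come from the sessions pooled in Fig. 1e, 1h and Supp. Fig. 1a-b), and a paired t-test is assumed as the planned comparison.

```python
from scipy import stats

# Hypothetical trials-to-criterion per control animal (placeholder values,
# chosen to mirror the reported group means of ~10 for IDS and ~16 for EDS).
ids_trials = [9, 11, 10, 8, 12, 10]
eds_trials = [15, 17, 15, 14, 18, 17]

# More trials to reach criterion in EDS than IDS indicates successful
# acquisition and switching of the attentional sets.
t_stat, p_value = stats.ttest_rel(eds_trials, ids_trials)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

With per-animal values like these, a planned paired t-test directly tests the within-subject IDS-vs-EDS difference that validates the task.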
(2) It may also be helpful to provide the average time between CNO infusion and onset of the ED as well as information about when maximal effects are expected after these treatments.
Systemic CNO injections were administered immediately after IDS, and we waited approximately one hour before proceeding to EDS. Maximal effects of systemic CNO activation were reported to occur after 30 minutes and last for at least 4-6 hours. Both control and test groups received the CNO injections in the same manner. This is now better described in Methods.
Reviewer #3 (Recommendations for the authors):
(1) Add better histology images showing colocalization of TH and HM4Di. Quantification of colocalization would be optimal.
We now include better histology images (new Fig. 1d) and have quantified the colocalization of TH and HM4Di in the main text (line 115-116).
(2) If possible, images showing HM4Di expression in mPFC axon terminals would be useful. If these are colocalized with TH immunostaining, that would increase confidence in their identity. This would be much more useful than the images provided in Figure 1C.
We now include a new image to show hM4Di expression (mCherry) in LC terminals in the mPFC (new Fig. 1f). However, due to technical limitations (species of the primary antibody), we did not co-stain with TH.
(3) Include behavior of mice from the miniscope experiment in Figure 2 to show they are similar to those from Figure 1.
This is now included in Supp. Fig. 1b.
(4) More details about the processing and segmentation of miniscope data would be helpful (e.g., how many neurons were identified from each animal?).
We use standard preprocessing and segmentation pipelines in Inscopix data processing software (version 1.6), which includes modules for motion correction and signal extraction. Briefly, raw imaging videos underwent preprocessing, including a 4× spatial downsampling to reduce file size and processing time. No temporal downsampling was performed. The images were then cropped to eliminate post-registration borders and areas where cells were not visible. Prior to the calculation of the dF/F0 traces, lateral movement was corrected. For ROI identification, we used a constrained non-negative matrix factorization algorithm optimized for endoscopic data (CNMF-E) to extract fluorescence traces from ROIs. We identified 128 ± 31 neurons after manual selection, depending on recording quality and field of view. The number of neurons acquired from each animal is now included in Methods. This is further elaborated in Methods (lines 405-415).
(5) Add more methodological detail for how cell tuning was analyzed, including how z-scoring was performed (across the entire session?), and how neurons in each category were classified.
We have expanded the Methods section to clarify how cell tuning was analyzed (lines 419-430). Calcium traces were z-scored on a per-neuron basis across the entire session. For each neuron, we computed trial-averaged activity aligned to specific task events (e.g., digging in one of the two ramekins available). A neuron was classified as responsive if its activity showed a significant difference (p < 0.05) between two conditions within the defined time window in the ROC analysis.
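The per-neuron z-scoring and responsiveness classification described here can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the window size, event indices, and significance threshold are assumed parameters, and a Mann-Whitney U test (whose statistic is equivalent to the ROC AUC) stands in for the ROC analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic calcium traces: 5 neurons x 3000 frames (illustrative values).
traces = rng.normal(size=(5, 3000))

# Z-score each neuron's trace across the entire session, as in the Methods.
z = (traces - traces.mean(axis=1, keepdims=True)) / traces.std(axis=1, keepdims=True)

def is_responsive(neuron_z, events_a, events_b, window=30, alpha=0.05):
    """Classify a neuron as responsive if its event-aligned activity differs
    between two conditions (e.g., digging in ramekin A vs. B).
    events_a / events_b are frame indices of the aligned events; `window`
    (frames analyzed after each event) is an assumed parameter. The
    Mann-Whitney U test substitutes for the ROC analysis."""
    resp_a = [neuron_z[t:t + window].mean() for t in events_a]
    resp_b = [neuron_z[t:t + window].mean() for t in events_b]
    _, p = stats.mannwhitneyu(resp_a, resp_b)
    return p < alpha

# Hypothetical event frame indices for the two conditions.
events_a = [100, 500, 900, 1300, 1700]
events_b = [300, 700, 1100, 1500, 1900]
flags = [is_responsive(z[i], events_a, events_b) for i in range(z.shape[0])]
```

With random synthetic traces most neurons are (correctly) not flagged; on real data the same per-neuron test would separate condition-tuned cells from the rest.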
(6) For data from Figure 2F it would be very useful to plot data from individual mice in addition to this aggregated representation.
We now include data from individual mice in Supp. Table 1.
(7) I think it would be helpful to move some parts of Figure S1 to the main Figure 1, in particular the table from S1A.
Fig. S1 is now part of the new Fig. 1.
(8) Clarify whether Figure S2 is an independent replication, as implied, or whether the same test data is shown twice in two separate figures (In Figure 1b and Supplementary Figure 2).
The test group in Fig. S2 (new Fig. S1) is the same as the test group in Fig. 1b (new Fig. 1e), but the control group is a separate cohort. This is now clarified in the figure legends.
(9) The authors should add a limitations section to the discussion where they specifically discuss the caveats involved in relating their results specifically to NE. This should include the possible involvement of co-transmitters and off-target expression of Cre in other populations.
Thank you for this comment. Previous pharmacology and lesion studies showed that LC input or NE content in the mPFC was specifically required for EDS-type switching processes (Lapiz, M.D. et al., 2006; Tait, D.S. et al. 2007; McGaughy, J. et al. 2008), in light of which we interpret our mPFC neurophysiological effects with LC inhibition as at least partially mediated by the direct LC-NE input. When discussing the limitations of our study, we now explicitly acknowledge the potential involvement of co-transmitters released by LC neurons (line 253-256).
(10) The authors should provide details about the TH antibody used for IHC.
We now include more details in immunohistochemistry (line 384-388).
(11) Throughout, it would be helpful to include datapoints from individual animals - these are included in some supplementary figures, but are missing in a number of the main plots.
Reviewer #1 made a similar comment, and we now include individual data points in the figures (e.g., Fig. 1e, 1h).
Reviewer #3 (Public review):
Summary:
The authors propose a method for estimating the spatial power spectrum of cortical activity from irregularly sampled data and apply it to iEEG data from human patients during a delayed free recall task. The main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).
Strengths:
A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.
Weaknesses:
Although the proposed method is evaluated in several indirect ways, a direct evaluation is lacking. This would entail simulating cortical current source density (CSD) with known spatial spectrum and using a realistic iEEG volume-conductor model to generate iEEG signals.
Comments on revised version:
In my original review, I raised the following issue:
"The proposed method of estimating wavelength from irregularly sampled three-dimensional iEEG data involves several steps (phase-extraction, singular value-decomposition, triangle definition, dimension reduction, etc.) and it is not at all clear that the concatenation of all these steps actually yields accurate estimates. Did the authors use more realistic simulations of cortical activity (i.e. on the convoluted cortical sheet) to verify that the method indeed yields accurate estimates of phase spectra?"
And the authors' response was:
"We now included detailed surrogate testing, in which varying combinations of sEEG phase data and veridical surrogate wavelengths are added together. See our reply from the public reviewer comments. We assess that real neurophysiological data (here, sEEG plus surrogate and MEG manipulated in various ways) is a more accurate way to address these issues. In our experience, large scale TWs appear spontaneously in realistic cortical simulations, and we now cite the relevant papers in the manuscript (line 53)."
The point that I wanted to make is not that traveling waves appear in computational models of cortical activity, as the authors seem to think. My point was that the only direct way to evaluate the proposed method for estimating spatial spectra is to use simulated cortical activity with known spatial spectrum. In particular, with "realistic simulations" I refer to the iEEG volume-conductor model that describes the mapping from cortical current source density (CSD) to iEEG signals, and that incorporates the reference electrodes and the particular montage used.
Although in the revised manuscript the authors have provided indirect evidence for the soundness of the proposed estimation method, the lack of a direct evaluation using realistic simulations with ground truth as described above makes me remain sceptical about the soundness of the method.
Author response:
The following is the authors’ response to the original reviews.
eLife Assessment
This study introduces a novel method for estimating spatial spectra from irregularly sampled intracranial EEG data, revealing cortical activity across all spatial frequencies, which supports the global and integrated nature of cortical dynamics. The study showcases important technical innovations and rigorous analyses, including tests to rule out potential confounds; however, the lack of comprehensive theoretical justification and assumptions about phase consistency across time points renders the strength of evidence incomplete. The dominance of low spatial frequencies in cortical phase dynamics continues to be of importance, and further elaboration on the interpretation and justification of the results would strengthen the link between evidence and conclusions.
Public Reviews:
Reviewer #1 (Public review):
Summary:
The paper uses rigorous methods to determine phase dynamics from human cortical stereotactic EEGs. It finds that the power of the phase is higher at the lowest spatial frequencies.
Strengths:
Rigorous and advanced analysis methods.
Weaknesses:
The novelty and significance of the results are difficult to appreciate from the current version of the paper.
(1) It is very difficult to understand which experiments were analysed, and from where they were taken, reading the abstract. This is a problem both for clarity with regard to the reader and for attribution of merit to the people who collected the data.
We now explicitly state the experiments that were used, lines 715-716.
(2) The finding that the power is higher at the lowest spatial frequencies seems in tune with a lot of previous studies. The novelty here is unclear and it should be elaborated better.
It is not generally accepted in neuroscience that power is higher at the lowest spatial frequencies, and recent research concludes that traveling waves at this scale may be the result of artefactual measurement (Orczyk et al., 2022; Hindriks et al., 2014; Zhigalov & Jensen, 2023). The question we answer is therefore timely and a source of controversy for researchers analysing TWs in cortex. While, in our view, the previous literature points in the direction of our conclusions (notably the work of Freeman et al., 2003; 2000; Barrie et al., 1996), it is not conclusive at the scale we are interested in, specifically >8 cm, and certainly not convincing to the proponents of 'artefactual measurement'.
We have added a sentence to make this explicit in the abstract, lines 20-22. Please also note previous text at the end of the introduction, lines 140-148, and in the first paragraph of the discussion, lines 563-569.
Reading the paper, I could not understand the advantage I would have if I used such a technique on my data. I think that this should be clear to every reader.
We have made the core part of the code available on GitHub (line 1154), which should simplify adoption of the technique. We have argued in the Discussion (lines 653-663) that habitual measurement of SF spectra is desirable, since the same task measured with EEG, sEEG or ECoG does not encompass the same spatial scales, and researchers may be comparing signals with different functional properties. Until reliable methods for estimating SF are available that do not depend on the layout of the recording array, data cannot be analysed to resolve this question. Publication of our results and methods will help this process along.
(3) It seems problematic to trust in a strong conclusion that they show low spatial frequency dynamics of up to 15-20 cm given the sparsity of the arrays. The authors seem to agree with this concern in the last paragraph of page 12.
The new surrogate testing supports our conclusions. The sEEG arrays would not normally be a first choice to estimate SF spectra, for reasons of their sparsity, which may be why such estimates have not been done before. Yet, this is the research challenge that we sought to solve, and a problem for which there was no ready method to hand. Nevertheless, it is a problem that urgently needed to be solved given the current debate on the origin of large-scale TWs. We have now included detailed surrogate testing of real data plus varying strength model waves (Figure 6A and Supplementary Figure 4). We believe this should convince the reader that we are measuring the spatial frequency spectrum with sufficient accuracy to answer the central research question.
They also say that it would be informative to repeat the analyses presented here after the selection of more participants from all available datasets. It begs the question of why this was not done. It should be done if possible.
We have now doubled the number of participants in the main analyses. Since each participant comprises a test of the central hypothesis, the hypothesis test now has 23 replications (Supplementary Figures 2 and 3). There were four failures to reach significance due to under-powered tests, i.e., not enough contacts. This is a sufficient test of the hypothesis and, in our opinion, not the primary obstacle to scientific acceptance of our results. The main obstacle is providing convincing tests that the method is accurate, and this is what we have focussed on. Publication of the Python code and the detailed methods described here enables any interested researcher to extend our method to other datasets.
(4) Some of the analyses seem not to exploit in full the power of the dataset. Usually, a figure starts with an example participant but then the analysis of the entire dataset is not as exhaustive. For example, in Figure 6 we have a first row with the single participants and then an average over participants. One would expect quantifications of results from each participant (i.e., from the top rows of Figure 6), extracting some relevant features of results from each participant and then showing the distribution of these features across participants. This would complement the subject-average analysis.
The results are now clearly split into sections, where we first deal with all the single participant analyses, then the surrogate testing to confirm the basic results, then the participant aggregate results (Figure 7 and Supplementary Figure 7). The participant aggregate results reiterate the basic findings for the single participants. The key finding is straightforward (SF power decreases with SF) and required only one statistical analysis per subject.
(5) The function of brain phase dynamics at different frequencies and scales has been examined in previous papers at frequencies and scales relevant to what the authors treat. The authors may want to be more extensive with citing relevant studies and elaborating on the implications for them. Some examples below:
Womelsdorf T, et al., Science, 2007
Besserve M, et al., PLoS Biology, 2015
Nauhaus I, et al., Nature Neuroscience, 2009
We have added two paragraphs to the discussion, in response to the reviewer suggestion (lines 606-623). These paragraphs place our high TF findings in the context of previous research.
Reviewer #2 (Public review):
Summary:
In this paper, the authors analyze the organization of phases across different spatial scales. The authors analyze intracranial, stereo-electroencephalogram (sEEG) recordings from human clinical patients. The authors estimate the phase at each sEEG electrode at discrete temporal frequencies. They then use higher-order SVD (HOSVD) to estimate the spatial frequency spectrum of the organization of phase in a data-driven manner. Based on this analysis, the authors conclude that most of the variance explained is due to spatially extended organizations of phase, suggesting that the best description of brain activity in space and time is in fact a globally organized process. The authors' analysis is also able to rule out several important potential confounds for the analysis of spatiotemporal dynamics in EEG.
Strengths:
There are many strengths in the manuscript, including the authors' use of SVD to address the limitation of irregular sampling and their analyses ruling out potential confounds for these signals in the EEG.
Weaknesses:
Some important weaknesses are not properly acknowledged, and some conclusions are overinterpreted given the evidence presented.
The central weakness is that the analyses estimate phase from all signal time points using wavelets with a narrow frequency band (see Methods - "Numerical methods"). This step makes the assumption that phase at a particular frequency band is meaningful at all times; however, this is not necessarily the case. Take, for example, the analysis in Figure 3, which focuses on a temporal frequency of 9.2 Hz. If we compare the corresponding wavelet to the raw sEEG signal across multiple points in time, this will look like an amplitude-modulated 9.2 Hz sinusoid to which the raw sEEG signal will not correspond at all. While the authors may argue that analyzing the spatial organization of phase across many temporal frequencies will provide insight into the system, there is no guarantee that the spatial organization of phase at many individual temporal frequencies converges to the correct description of the full sEEG signal. This is a critical point for the analysis because while this analysis of the spatial organization of phase could provide some interesting results, this analysis also requires a very strong assumption about oscillations, specifically that the phase at a particular frequency (e.g. 9.2 Hz in Figure 3, or 8.0 Hz in Figure 5) is meaningful at all points in time. If this is not true, then the foundation of the analysis may not be precisely clear. This has an impact on the results presented here, specifically where the authors assert that "phase measured at a single contact in the grey matter is more strongly a function of global phase organization than local". Finally, the phase examples given in Supplementary Figure 5 are not strongly convincing to support this point.
“using wavelets with a narrow frequency band … this analysis also requires a very strong assumption about oscillations, specifically that the phase at a particular frequency (e.g. 9.2 Hz in Figure 3, or 8.0 Hz in Figure 5) is meaningful at all points in time”
Our method uses very short time-window Morlet wavelets precisely to avoid assuming oscillations, i.e., long-lasting sinusoids or limit cycles extending in time. Cortical TWs may last only one or two cycles (Alexander et al., 2006), requiring methods that are compact in the time domain to avoid underreporting the desired phenomena. Additionally, the short time-window Morlet wavelets have low frequency resolution, so they are robust with respect to shifts in frequency between sites. We now discuss this issue explicitly in the Methods (lines 658-674). This means the phase-estimation methods used in the manuscript precisely do not assume narrow-band oscillations in the signal. The methods are also robust to the exact shape of the waveforms; the signal need only be approximately sinusoidal, i.e., rise and fall. This means the Fourier variant we use does not introduce the ringing artefacts that can be introduced by longer time-series methods, such as the FFT.
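The short time-window phase estimation described above can be illustrated with a minimal two-cycle complex Morlet wavelet. This is our own Python sketch, not the authors' published code; the function and parameter names are illustrative.

```python
import numpy as np

def morlet_phase(signal, fs, freq, n_cycles=2.0):
    """Estimate instantaneous phase at `freq` by convolving the raw signal
    with a short (two-cycle) complex Morlet wavelet.

    Short wavelets trade frequency resolution for time resolution, so brief
    (one- or two-cycle) events are not washed out, and the broad frequency
    response tolerates small frequency shifts between recording sites.
    """
    sigma_t = n_cycles / (2.0 * np.pi * freq)            # Gaussian envelope width (s)
    t = np.arange(-3 * sigma_t, 3 * sigma_t, 1.0 / fs)   # support out to +/- 3 sigma
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))     # unit energy
    analytic = np.convolve(signal, wavelet, mode="same") # complex analytic-like signal
    return np.angle(analytic)                            # phase in radians

# Example: the phase of a pure 9.2 Hz sinusoid is recovered, up to edge effects.
fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
x = np.cos(2 * np.pi * 9.2 * t)
phi = morlet_phase(x, fs, 9.2)
```

A two-cycle envelope at 9.2 Hz spans roughly 200 ms, so phase is estimated from a very local stretch of signal rather than from an assumed stationary oscillation.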
“This step makes the assumption that phase at a particular frequency band is meaningful at all times”
This important consideration is built into our choice of methods. By way of explanatory background, we point out that this step is not the final step. Aggregation methods can be used to distinguish between signal and noise. In the simple case, event-locked time-series of phase can be averaged. This allows consistent (non-noise) phase relations to be preserved, while inconsistent (including noise) phase relations are washed out. This is part of the logic behind all such aggregation procedures, e.g., phase-locking, coherence. SVD has the advantage of capturing consistent relations in this sense, but without the loss of information that occurs in averaging (up to the choice of number of singular vectors in the final model). Specifically, maps of the spatial covariances in phase are captured in order of the variance explained. Noise (in the sense conveyed by the reviewer) in the phase measurements will not contribute to the highest-rank singular vectors. SVD is commonly used to remove noise, and that is one of its purposes here. This point can be seen by considering the very smooth singular vectors derived from MEG (Figure 3F) in this new version of the manuscript. These maps of phase gradients pull out only the non-noisy relations, even as their weighted sums reproduce any individual sample to any desired accuracy.
To summarize, the next step (of incorporating the phase measure into the SVD) neatly bypasses the issue of non-meaningful phase quantification. This is one of the reasons why we do not undertake the spatial frequency estimates on the raw matrices of estimated phase.
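The denoising logic sketched above, where consistent phase relations concentrate in the leading singular vectors while inconsistent phase is pushed into the low-variance tail, can be demonstrated on a toy complex phase matrix. The construction below is entirely ours and purely illustrative of the principle, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 40 "contacts" x 2000 time samples. One consistent spatial phase
# gradient (a travelling-wave-like pattern) rides under heavy phase noise.
n_ch, n_t = 40, 2000
spatial = np.exp(1j * np.linspace(0, 2 * np.pi, n_ch))       # consistent spatial pattern
temporal = np.exp(2j * np.pi * rng.random(n_t))              # arbitrary global phase per sample
noise = np.exp(2j * np.pi * rng.random((n_ch, n_t)))         # inconsistent (noise) phase
Z = spatial[:, None] * temporal[None, :] + 0.8 * noise
X = Z / np.abs(Z)                                            # keep phase only (unit magnitude)

# SVD orders spatial covariance patterns by variance explained; the noise
# does not accumulate in the leading singular vector.
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Leading left singular vector recovers the planted gradient
# (up to an arbitrary global phase rotation).
alignment = np.abs(np.vdot(U[:, 0], spatial / np.linalg.norm(spatial)))
```

Averaging the phases directly would also suppress the noise, but the SVD retains the full set of covariance maps, so any individual sample can still be reconstructed from the weighted singular vectors.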
We now include a new sub-paragraph on this topic in the methods, lines 831-838.
In addition, we have reworded the first description of the methods with a new paragraph at the end of the introduction, which better balances the description of the steps involved. The two sentences (lines 162-166) highlight the issue of concern to the reviewer.
“there is no guarantee that the spatial organization of phase at many individual temporal frequencies converges to the correct description of the full sEEG signal.”
The correct description of the full sEEG signal is beyond the scope of the present research. Our main goal, as stated, is to show that the hypothesis that 'extra-cranial measurements of TWs are the result of projection from localized activity' is not supported by the evidence of spatial patterns of activity in the cortex. Since this activity can be accessed as a single frequency band (especially if localized sources create the large-scale patterns), analysis of SF on a TF-by-TF basis is sufficient.
“This has an impact on the results presented here, specifically where the authors assert that "phase measured at a single contact in the grey matter is more strongly a function of global phase organization than local".
We agree with the reviewer, even though we expect that the strongest influences on local phase are due to other cortical signals in the same band. The implicit assumption of the focus on bands of the same temporal frequency is now made explicit in the abstract (lines 31-34).
A sentence addressing this issue has been added to the first paragraph of the discussion (lines 579-582).
Inclusion of cross-frequency interactions would likely require a highly regular measurement array over the scales of interest here, i.e., the noise levels inherent in the spatial organization of sEEG contacts would not support such analyses.
“Finally, the phase examples given in Supplementary Figure 5 are not strongly convincing to support this point.”
We have removed the phase examples that were previously in Supplementary Figure 5 (and Figure 5 in the previous version of the main text), since further surrogate testing and modelling (Supplementary Figure 11) shows the LSVs from irregular arrays will inevitably capture mixtures of low and high SF signals. The final section of the Methods explains this effect in some detail. Instead, the new version of the manuscript relies on new surrogate testing to validate our methods.
Another weakness is in the discussion on spatial scale. In the analyses, the authors separate contributions at (approximately) > 15 cm as macroscopic and < 15 cm as mesoscopic. The problem with the "macroscopic" here is that 15 cm is essentially on the scale of the whole brain, without accounting for the fact that organization in sub-systems may occur. For example, if a specific set of cortical regions, spanning over a 10 cm range, were to exhibit a consistent organization of phase at a particular temporal frequency (required by the analysis technique, as noted above), it is not clear why that would not be considered a "macroscopic" organization of phase, since it comprises multiple areas of the brain acting in coordination. Further, while this point could be considered as mostly semantic in nature, there is also an important technical consideration here: would spatial phase organizations occurring in varying subsets of electrodes and with somewhat variable temporal frequency reliably be detected? If this is not the case, then could it be possible that the lowest spatial frequencies are detected more often simply because it would be difficult to detect variable organizations in subsets of electrodes?
The motivation for our study was to show that large-scale TWs measured outside the cortex cannot be the result of more localized activity being ‘projected up’. In this case, the temporal frequency of the artefactual waves would be the same as the localized sources, so the criticism does not apply.
“while this point could be considered as mostly semantic in nature”
We have changed the terminology in the paper to better coincide with standard usage. Macroscopic now refers to >1cm, while we refer to >8cm as large-scale.
“15 cm is essentially on the scale of the whole brain, without accounting for the fact that organization in sub-systems may occur.”
We can assume that subtle frequency variation (e.g., within an alpha phase binding) is greatest at the largest scales of cortex, or at least not less varying than measurements within regions. This means that not considering frequency-drift effects will not inflate low spatial frequency power over high spatial frequency power. Even so, the power spectrum we estimated is approximately 1/SF, so that unmeasured cross-frequency effects in binding (causal influences on local phase) would have to overcome the strength of this relation for this criticism to apply, which seems unlikely.
“would spatial phase organizations occurring in varying subsets of electrodes and with somewhat variable temporal frequency reliably be detected?”
See our previous comments about the low temporal frequency resolution of two cycle Morlet wavelets. The answer is yes, up to the range approximated by half-power bandwidth, which is large in the case of this method (see lines 760-764).
Another weakness is disregarding the potential spike waveform artifact in the sEEG signal in the context of these analyses. Specifically, Zanos et al. (J Neurophysiol, 2011) showed that spike waveform artifacts can contaminate electrode recordings down to approximately 60 Hz. This point is important to consider in the context of the manuscript's results on spatial organization at temporal frequencies up to 100 Hz. Because the spike waveform artifact might affect signal phase at frequencies above 60 Hz, caution may be important in interpreting this point as evidence that there is significant phase organization across the cortex at these temporal frequencies.
We have now added a sentence on this issue to the discussion (lines 600-602).
However, our reading of the Zanos et al. paper is that the low temporal frequency (60-100Hz) contribution of spikes and spike patterns is negligible compared to genuine post-synaptic membrane fluctuations (see their Figure 3). These considerations come more strongly into play when correlations between LFP and spikes are calculated or spike triggered averaging is undertaken, since then a signal is being partly correlated with itself, or, partly averaged over the supposedly distinct signal with which it was detected.
A last point is that, even though the present results provide some insight into the organization of phase across the human brain, the analyses do not directly link this to spiking activity. The predictive power that these spatial organizations of phase could provide for spiking activity - even if the analyses were not affected by the distortion due to the narrow-frequency assumption - remains unknown. This is important because relating back to spiking activity is the key factor in assessing whether these specific analyses of phase can provide insight into neural circuit dynamics. This type of analysis may be possible to do with the sEEG recordings, as well, by analyzing high-gamma power (Ray and Maunsell, PLoS Biology, 2011), which can provide an index of multi-unit spiking activity around the electrodes.
“even if the analyses were not affected by the distortion due to the narrow-frequency assumption”
See our earlier comment about narrow TFs; this is not the case in the present work.
The spiking activity analysis would be an interesting avenue for future research. It appears the 1000 Hz sampling frequency in the present data is not sufficient for the method described in Ray & Maunsell (2011). On a related topic, we have shown that large-scale traveling waves in the MEG and 8 cm waves in ECoG can both be used to predict future localized phase at a single sensor/contact, two cycles into the future (Alexander et al., 2019). This approach could be used to predict spiking activity, by combining it with the reviewer's suggestion. However, the current manuscript is motivated by the argument that measured large-scale extra-cranial TWs are merely projections of localized cortical activity. Since spikes do not arise in this argument, we feel it is outside the scope of the present research. We have added this suggestion to the discussion as a potential line of future research (lines 686-688).
Reviewer #3 (Public review):
Summary:
The authors propose a method for estimation of the spatial spectra of cortical activity from irregularly sampled data and apply it to publicly available intracranial EEG data from human patients during a delayed free recall task. The authors' main findings are that the spatial spectra of cortical activity peak at low spatial frequencies and decrease with increasing spatial frequency. This is observed over a broad range of temporal frequencies (2-100 Hz).
Strengths:
A strength of the study is the type of data that is used. As pointed out by the authors, spatial spectra of cortical activity are difficult to estimate from non-invasive measurements (EEG and MEG) due to signal mixing and from commonly used intracranial measurements (i.e. electrocorticography or Utah arrays) due to their limited spatial extent. In contrast, iEEG measurements are easier to interpret than EEG/MEG measurements and typically have larger spatial coverage than Utah arrays. However, iEEG is irregularly sampled within the three-dimensional brain volume and this poses a methodological problem that the proposed method aims to address.
Weaknesses:
The used method for estimating spatial spectra from irregularly sampled data is weak in several respects.
First, the proposed method is ad hoc, whereas there exist well-developed (Fourier-based) methods for this. The authors don't clarify why no standard methods are used, nor do they carry out a comparative evaluation.
We disagree that the method is ad hoc, though the specific combination of SVD and multiscale differencing is novel in its application to sEEG. The SVD method has been used to isolate both ~30 cm TWs in MEG and EEG (Alexander et al., 2013; 2016) and 8 cm waves in ECoG (Alexander et al., 2013; 2019). Our opening examples in the results now reiterate these previous related findings, by way of an example analysis of MEG data (Figure 3). This will better inform the reader of the extent of the method's continuity with previous research.
Standard FFT has been used after interpolating between EEG electrodes to produce a uniform array (Alamia et al., 2023). There exist well-developed Fourier methods for non-uniform grids, such as simple interpolation, the butterfly algorithm, wavefield extrapolation and multi-scale vector field techniques. However, the problems for which these methods are designed require non-sparse sampling or less irregular arrays. The sEEG contacts (reduced in number to grey-matter contacts) are well outside the spatial-irregularity range of any Fourier-related methods that we are aware of, particularly at the broad range of spatial scales of interest here (2 cm up to 24 cm). This would make direct comparison of these specialized Fourier methods to our novel methods, in the sEEG, something of a straw-man comparison.
We now include a summary paragraph in the introduction, which is a brief review of Fourier methods designed to deal with non-uniform sampling (lines 159-162).
Second, the proposed method lacks a theoretical foundation and hinges on a qualitative resemblance between Fourier analysis and singular value decomposition.
We have improved our description of the theoretical relation between Fourier analysis and SVD (additional material at lines 839-861 and 910-922). In fact, there are very strong links between the two methods, and now it should be clearer that our method does not rely on a mere qualitative resemblance.
Third, the proposed method is not thoroughly tested using simulated data. Hence it remains unclear how accurate the estimated power spectra actually are.
We now include a new surrogate testing procedure, which takes as inputs the empirical data and a model signal (of known spatial frequency) in various proportions. Thus, we test both the impact of a small amount of surrogate signal on the empirical signal, and the impact of 'noise' (in the form of a small amount of empirical signal) added to the well-defined surrogate signal.
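The mixing step of such a surrogate procedure, combining empirical phase data with a planted wave of known spatial frequency in varying proportions, might be sketched as follows. This is our own illustrative reconstruction under simplifying assumptions (a single-axis plane wave, and all names invented), not the authors' exact construction.

```python
import numpy as np

def mix_surrogate(empirical_phase, coords_cm, wavelength_cm, alpha):
    """Mix empirical phase data with a planted plane wave of known wavelength.

    empirical_phase : (n_contacts, n_samples) array of phases in radians.
    coords_cm       : (n_contacts,) contact positions along one axis, in cm.
    alpha           : mixing proportion; 0 = pure data, 1 = pure model wave.

    Returns the phases of the mixture, which would then be fed through the
    same spatial-frequency estimation pipeline as the raw data, to check
    that the planted wavelength is recovered.
    """
    model = np.exp(2j * np.pi * coords_cm[:, None] / wavelength_cm)  # known SF
    data = np.exp(1j * empirical_phase)                              # unit phasors
    mixed = (1 - alpha) * data + alpha * model
    return np.angle(mixed)

# Example: at alpha = 1 the output is exactly the model wave's phase profile.
rng = np.random.default_rng(2)
emp = rng.uniform(-np.pi, np.pi, (5, 7))
coords = rng.uniform(0.0, 20.0, 5)
pure_model = mix_surrogate(emp, coords, 10.0, 1.0)
pure_data = mix_surrogate(emp, coords, 10.0, 0.0)
```

Sweeping `alpha` between 0 and 1 then quantifies how much planted signal is needed before the known wavelength dominates the estimated spectrum, and conversely how robust the planted wave's estimate is to empirical "noise".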
In addition, there are a number of technical issues and limitations that need to be addressed or clarified (see recommendations to the authors).
My assessment is that the conclusions are not completely supported by the analyses. What would convince me, is if the method is tested on simulated cortical activity in a more realistic set-up. I do believe, however, that if the authors can convincingly show that the estimated spatial spectra are accurate, the study will have an impact on the field. Regarding the methodology, I don't think that it will become a standard method in the field due to its ad hoc nature and well-developed alternatives.
Simulations of cortical activity do not seem the most direct way to achieve this goal. The first author has published in this area (Liley et al., 1999; Wright et al., 2001), and such simulations, for both bulk and neuronally based models, readily display traveling wave activity at low spatial frequencies (indeed, this was the origin of the present scientific journey). The manuscript outlines these results in the introduction, as well as theoretical treatments proposing the same. Several other recent studies have highlighted the appearance of large-scale travelling waves using connectome-based models (https://www.biorxiv.org/content/10.1101/2025.07.05.663278v1; https://www.nature.com/articles/s41467-024-47860-x), which we do not include in the manuscript for reasons of brevity. In short, the emergence of TW phenomena in models is partly a function of the assumptions put into them (i.e., spatial damping, boundary conditions, parameterization of connection fields) and would therefore be inconclusive in our view.
Instead, we rely on the advantages provided by the way our central research question has been posed: that the spatial frequency distribution of grey matter signal can determine whether extra-cranial TWs are artefactual. The newly introduced surrogate methods reflect this advantage by directly adding ground truth spatial frequency components to individual sample measurements. This is a less expensive option than making cortical simulations to achieve the same goal.
For the same reasons, we include testing of the methods using real cortical signals with MEG arrays (for which we could test the effects of increasing sparseness of contacts, test the effects of average referencing, and also construct surrogate time-series with alternative spectra).
Recommendations for the authors:
Reviewer #2 (Recommendations for the authors):
Major points
Methods, Page 18: "... using notch filters to remove the 50Hz line signal and its harmonics ...": The sEEG data appear to have been recorded in North America, where the line frequency is 60 Hz. Is this perhaps a typo, or was a 50 Hz notch filter in fact applied here (which would be a mistake)?
This has now been fixed in the text to read 60Hz. This is the notch filter that was applied.
Minor points
(1) While the authors do state that they are analyzing the "spatial frequency spectrum of phase dynamics" in the abstract, this could be more clearly emphasized. Specifically, the difference between signal power at different spatial frequencies (as analyzed by a standard Fourier analysis) and the organization of phase in space (as done here) could be more clearly distinguished.
We now address this point explicitly on lines 167-172. We now include at the end of the results additional analyses where the TF power is included. This means that the effects of including signal power at different temporal frequencies can be directly compared to our main analysis of the SF spectrum of the phase dynamics.
(2) Figure 1A-C: It was not immediately clear what the lengths provided in these panels (e.g."> 40 cm cortex", "< 10 cm", "< 30 cm") were meant to indicate. This could be made clearer.
Now fixed in the caption.
(3) Figure 2A: If this is surrogate data to explain the analysis technique, it would be helpful to note explicitly at this point.
This Figure has been completely reworked, and now the status of the examples (from illustrative toy models to actual MEG data) should be clearer.
(4) Figure 4A: Why change from "% explained variance" for the example data in Figure 2C to arbitrary units at this point?
This has now been explicitly stated in the methods (lines 1033-1036).
(5) Page 15: "This means either the results were biased by a low pass filter, or had a maximum measurable...": If the authors mean that the low-pass filter is due to spatial blurring of neural activity in the EEG signal, it would be helpful to state that more directly at this point.
Now stated directly, lines 567-568.
(6) Page 23: "...where |X| is the complex magnitude of X...": The modulus operation is defined on a complex number, yet here is applied to a vector of complex numbers. If the operation is elementwise, it should be defined explicitly.
‘Elementwise’ is now stated explicitly (line 1020).
Reviewer #3 (Recommendations for the authors):
In the submitted manuscript, the authors propose a method to estimate spatial (phase) spectra from irregularly sampled oscillatory cortical activity. They apply the method to intracranial (iEEG) data and argue that cortical activity is organized into global waves up to the size of the entire cortex. If true, this finding is certainly of interest, and I can imagine that it has profound implications for how we think about the functional organization of cortical activity.
We have added a section to the discussion outlining the most radical of these implications: what does it mean to do source localization when non-local signals dominate? Lines 670-681.
The manuscript is well-written, with comprehensive introduction and discussion sections, detailed descriptions of the results, and clear figures. However, the proposed method comprised several ad hoc elements and is not well-founded mathematically, its performance is not adequately assessed, and its limitations are not sufficiently discussed. As such, the study failed to convince (me) of the correctness of the main conclusions.
We now have direct surrogate testing of the method. We have also improved the mathematical explanation to show that the link between Fourier analysis and SVD is not ad hoc, but well understood in both literatures. We have addressed explicitly in the text all of the limitations raised by the reviewers.
Major comments
(1) The main methodological contribution of the study is summarized in the introduction section:
"The irregular sampling of cortical spatial coordinates via stereotactic EEG was partly overcome by the resampling of the phase data into triplets corresponding to the vertices of approximately equilateral triangles within the cortical sheet."
There exist well-established Fourier methods for handling irregularly sampled data, so it is unclear why the authors did not resort to these and instead proposed a rather ad hoc method without theoretical justification (see next comment).
We have re-reviewed the literature on non-uniform Fourier analysis. We now briefly review the Fourier methods for handling irregularly sampled data (lines 155-162) and conclude that none of the existing methods can deal with the degree of irregularity, and especially sparsity, found for the grey-matter sEEG contacts.
(2) In the Appendix, the authors write:
"For appropriate signals, i.e., those with power that decreases monotonically with frequency, each of the first few singular vectors, v_k, is an approximate complex sinusoid with wavenumber equal to k."
I don't think this is true in general and if it is, there must be a formal argument that proves it. Furthermore, is it also true for irregularly sampled data? And in more than one spatial dimension? Moreover, it is also unclear exactly how the spatial Fourier spectrum is estimated from the SVD.
In response to these reviewer queries, we now spend considerably more time on the conceptual set-up of the manuscript, giving examples of where SVD can be used to estimate the Fourier spectrum. We have now unpacked the word ‘appropriate’ and we are now more exact in our phrasing. This is laid out in lines 843-850 of the manuscript. In addition, the methods now describe the mathematical links between Fourier analysis and SVD (lines 851-861 and 910-922).
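The claimed link can be illustrated with a small numerical sketch (our construction, not the manuscript's code): for an ensemble of spatial signals whose power falls monotonically with wavenumber, the leading right singular vectors emerge as approximate sinusoids ordered by wavenumber.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 64, 500
x = np.arange(n)
ks = np.arange(1, 9)

# Ensemble of 1-D spatial signals whose power decreases monotonically
# with wavenumber k (amplitude 1/k), with random phases per trial.
X = np.array([
    sum(np.cos(2 * np.pi * k * x / n + rng.uniform(0, 2 * np.pi)) / k
        for k in ks)
    for _ in range(trials)
])

# SVD: each wavenumber contributes a near-degenerate pair of singular
# vectors (sine/cosine), ordered by decreasing power, i.e. by
# increasing wavenumber.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
dominant = [int(np.abs(np.fft.rfft(Vt[j])).argmax()) for j in range(4)]
# v_0 and v_1 carry wavenumber 1; v_2 and v_3 carry wavenumber 2.
```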
The authors write:
"The spatial frequency spectrum can therefore be estimated using SVD by summing over the singular values assigned to each set of singular vectors with unique (or by binning over a limited range of) spatial frequencies. This procedure is illustrated in Figure 1A-C."
First, the singular vectors are ordered to decreasing values of the corresponding singular values. Hence, if the singular values are used to estimate spectral power, the estimated spectrum will necessarily decrease with increasing spatial frequency (as can be seen in Figure 2C). Then how can traveling waves be detected by looking for local maxima of the estimated power spectra?
Travelling waves (TWs) are not detected by looking for local maxima in the spectra. Our work has focussed on the global wave maps derived from the SVD of phase (i.e., k=1-3), which also explain most of the variance in phase. This is now mentioned in the caption to Figure 3 (lines 291-294).
Second, how are spatial frequencies assigned to the different singular vectors? The proposed method for estimating spatial power spectra from irregularly sampled data seems rather ad hoc and it is not at all clear if, and under what conditions, it works and how accurate it is.
The new version of the manuscript uses a combination of the method previously presented (the multi-scale differencing) and the method previously outlined in the supplementary materials (complex-valued SVD of the spatial vectors of phase). We hope that, along with the additional expository material in the Methods, the new version is clearer and seems less ad hoc to the reviewer. Certainly, there are deep and well-understood links between Fourier analysis and SVD, and we hope we have now brought these into focus.
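As a toy illustration of the second ingredient (a sketch of our own, not the authors' pipeline): for a travelling wave, complex-valued SVD of the phase data recovers the spatial mode, whose Fourier transform then gives the wavenumber.

```python
import numpy as np

n, T = 64, 200
x = np.arange(n)
t = np.arange(T)[:, None]

# Travelling wave: phase advances linearly in space (wavenumber 3)
# and drifts in time.
theta = 2 * np.pi * (3 * x[None, :] / n - 0.05 * t)
Z = np.exp(1j * theta)          # complex-valued phase, time x space

# Complex SVD: the leading right singular vector is (up to a unit
# scalar) the spatial mode exp(i 2 pi k x / n).
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
k_est = int(np.abs(np.fft.fft(Vt[0])).argmax())   # recovered wavenumber
wavelength = n / k_est                            # in sample units
```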
(3) The authors define spatial power spectra in three-dimensional Euclidean space, whereas the actual cortical activity occurs on a two-dimensional sheet (the union of two topological 2-spheres). As such, it is not at all clear how the estimated wavelengths in three-dimensional space relate to the actual wavelengths of the cortical activity.
We define spatial power spectra on the folded cortical sheet, rather than in Cartesian coordinates. We use geodesic distances in all cases where a distance measurement is required. We have included two new figures (Figure 5 and Supplementary Figure 1) showing the mapping of the triangles onto the cortical sheet, which should bring this point home.
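The distinction matters because straight-line distance underestimates distance along a folded sheet. A minimal sketch (ours, using a semicircular "fold" and Dijkstra over the mesh graph):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Points along a folded 1-D "sheet" (a semicircle of radius 1).
m = 50
ang = np.linspace(0, np.pi, m)
pts = np.c_[np.cos(ang), np.sin(ang)]

# Mesh graph: edges between neighbouring points, weighted by edge
# length; Dijkstra over this graph yields geodesic distances.
i = np.arange(m - 1)
w = np.linalg.norm(pts[i + 1] - pts[i], axis=1)
G = csr_matrix((w, (i, i + 1)), shape=(m, m))
geo = shortest_path(G, directed=False)

chord = np.linalg.norm(pts[0] - pts[-1])  # Euclidean distance: 2
# Endpoint-to-endpoint geodesic distance approaches pi, not 2.
```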
(4) The authors' analysis of the iEEG data is subject to a caveat that is not mentioned in the manuscript: As a reference for the local field potentials, the average white-matter signal was used and this can lead to artifactual power at low spatial frequencies. This is because fluctuations in the reference signal are visible as standing waves in the recording array. This might also explain the observation that
"A surprising finding was that the shape of the spatial frequency spectrum did not vary much with temporal frequency."
because fluctuations in the reference signal are expected to have power at all temporal frequencies (1/f spectrum). When superposed with local activity at the recording electrodes, this leads to spurious power at low spatial frequencies. Can the authors exclude this interpretation of the results?
The new version of the manuscript deals explicitly with this potential confound (lines 454-467). First, the artefactual global synchrony due to the reference signal (the DC component in our spatial frequency spectra of phase) is at a distinct frequency from the lowest SF of interest here. The lowest spatial frequency is a function of the maximum spatial range of the recording array and, in our method, does not overlap with the DC component, despite the loss of SF resolution due to the noise introduced by the spatial irregularity of the recording array. This can be seen from consideration of the SF tuning (Figure 4) for the MEG wave maps shown in Figure 3, and the spectra generated for sparse MEG arrays in Supplementary Figure 5. Additionally, this question led us to a series of surrogate tests which are now included in the manuscript. We used MEG to test for the effects of the average reference, since in this modality the reference-free case is available. The results show that even after imposing a strong and artefactual global synchrony, the method is highly robust to inflation of the DC component, which in any case does not strongly influence the SF estimates in the range of interest (4 c/m to 12 c/m for the case of MEG).
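The logic of this argument can be sketched in a toy simulation (ours, not the manuscript's code): adding a common reference signal to every channel inflates only the spatially constant (DC) component, leaving power at the wave's spatial frequency untouched.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 64, 300
x = np.arange(n)

# A spatial wave at wavenumber 4, seen on all contacts, with a
# random spatial phase per time sample.
wave = np.cos(2 * np.pi * 4 * x[None, :] / n
              + rng.uniform(0, 2 * np.pi, (T, 1)))
ref = rng.standard_normal((T, 1))   # common 'average reference' term

def spatial_power(X):
    return (np.abs(np.fft.rfft(X, axis=1)) ** 2).mean(axis=0)

P_clean = spatial_power(wave)
P_ref = spatial_power(wave + ref)   # reference added to every channel
# P_ref is inflated only at k = 0 (the DC component); power at the
# wave's wavenumber (k = 4) is unchanged.
```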
(5) Related to the previous comment: Contrary to the authors' claims, local field potentials are susceptible to volume conduction, particularly when average references are used (see e.g. https://www.cell.com/neuron/fulltext/S0896-6273(11)00883-X).
Methods exist to mitigate these effects (e.g. taking first- or second-order spatial differences of the signals). I think this issue deserves to be discussed.
We have reviewed this research and do not find it to be a problem. The authors cited by the reviewer were concerned with unacknowledged volume conduction of up to 1 cm for LFP. The maximum spatial frequency we report here is 50 c/m, equivalent to a wavelength of 2 cm. While the inter-contact distance on the sEEG electrodes was 0.5 cm, in practice the smallest equilateral triangles to be found in the grey matter (i.e., spanning two electrodes) were around 2 cm in linear size. We make no statements about SF in the 1 cm range. We do now cite this paper and mention this short-range volume conduction (lines 602-605). The method of taking derivatives has the same problems as source localization methods: it removes both artefactual correlations (volume conduction) and real correlations (the low-SF interactions of interest here). We mention this now at lines 667-669. In addition, our method to remove negative SF components from the LSVs ameliorates the effects of average referencing. There are now more details in the Methods about this step (lines 924-947), as well as a new supplementary figure illustrating its effects on a signal with a known SF spectrum (MEG, Supplementary Figure 6).
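The point about spatial derivatives acting as a high-pass filter can be seen in a short sketch (our own): a first circular difference attenuates a wavenumber-k component by a gain of |2 sin(pi k / n)|, so long-wavelength (low-SF) signal is removed along with any low-SF artefact.

```python
import numpy as np

n = 256
x = np.arange(n)

# Equal-amplitude long-wavelength (k = 2) and short-wavelength
# (k = 40) components.
sig = np.cos(2 * np.pi * 2 * x / n) + np.cos(2 * np.pi * 40 * x / n)
P_orig = np.abs(np.fft.rfft(sig)) ** 2

# First spatial difference (circular, to keep the FFT bins exact):
# gain |2 sin(pi k / n)| at wavenumber k, i.e. a high-pass filter.
diff = np.roll(sig, -1) - sig
P_diff = np.abs(np.fft.rfft(diff)) ** 2

ratio_before = P_orig[2] / P_orig[40]   # 1: equal power
ratio_after = P_diff[2] / P_diff[40]    # << 1: low SF suppressed
```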
(6) Could the authors add an analysis that excludes the possibility that the observed local maxima in the spectra are a necessary consequence of the analysis method, rather than reflecting true maxima in the spectra? A (possibly) similar effect can be observed in ordinary Fourier spectra that are estimated from zero-mean signals: Because the signals have zero mean, the power spectrum at frequency zero is close to zero and this leads to an artificial local maximum at low frequencies.
We acknowledge the reviewer’s mathematical point. We do not believe it is an issue here, though it is important to rule it out definitively. First, removing the DC component will only produce an artefactual low-SF peak if the power at low SF is high. This may occur in the reviewer’s example only because temporal frequency has a ~1/f spectrum. If the true spectrum is flat, or increasing in power with f, no such artificial low-SF peak will be produced (see Supplementary Figure 5G). Additionally:
(1) The DC component is well separated from the low SF components in our method;
(2) We now include several surrogate methods which show that our method finds the correct spectral distribution and is not simply finding a maximum at low SFs due to the suggested effect (subtraction of the DC component). Analysis of separated wave maps in MEG (Figures 3 & 4) shows the expected peaks in SF, increasing in peak SF for each family of maps as wavenumber increases (roughly three k=1 maps, three k=2 maps, etc.). A specific surrogate test for this query was also undertaken by creating a reversed SF spectrum in MEG phase data, in which power increases linearly with SF over the range of interest, rather than following the usual 1/f fall-off. Our method correctly recovers this reversed spectrum (Supplementary Figure 5). Additionally, we tested for the effects of introducing the average reference and the effects of our method to remove the DC component of the phase SF spectrum (Supplementary Figure 6). We can definitively rule out the reviewer’s concern.
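The core of this rebuttal can be demonstrated in a toy sketch (ours): subtracting the mean zeroes the f = 0 bin, but it creates a low-frequency "peak" only when the underlying spectrum already rises toward low frequencies.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 256, 400

def mean_removed_spectrum(X):
    X = X - X.mean(axis=1, keepdims=True)   # subtract the DC component
    return (np.abs(np.fft.rfft(X, axis=1)) ** 2).mean(axis=0)

# Flat (white) spectrum: removing the mean zeroes the f = 0 bin but
# creates no artificial local maximum at low frequencies.
P_white = mean_removed_spectrum(rng.standard_normal((trials, n)))

# 1/f spectrum: high power just above f = 0 remains after mean
# removal, but it is inherited from the true spectrum, not created
# by the subtraction.
freqs = np.fft.rfftfreq(n)
amp = np.where(freqs > 0, 1.0 / np.maximum(freqs, freqs[1]), 0.0)
phases = np.exp(2j * np.pi * rng.uniform(size=(trials, freqs.size)))
pink = np.fft.irfft(amp * phases, n=n)
P_pink = mean_removed_spectrum(pink)
```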
A related issue (perhaps) is the observation that the location of the maximum (i.e. the peak spatial frequency of cortical activity) depends on array size: If cortical activity indeed has a characteristic wavelength (in the sense of its spectrum having a local maximum), would one not expect it to be independent of array size?
This is only true when making estimates from relatively clean sinusoidal signals, and not from broad-band signals. Fourier analysis and our related SVD methods are very much dependent on the maximum array size used to measure cortical signals. This is why the first frequency band (after the DC component) in Fourier analysis is always at a frequency equivalent to 1/array_size, even if the signal is known to contain lower-frequency components. We now include a further illustration of this in Figure 3, a more detailed exposition of this point in the methods, and in Supplementary Figure 11 we provide a more detailed example of the relation between Fourier analysis and SVD when grids with two distinct scales are used.
In short, it is not possible, mathematically, to measure wavelengths greater than the array size in broad-band data. This is now stated explicitly in the manuscript (lines 143-144). A common approach in Neuroscience research is to first do narrowband filtering, then use a method that can accurately estimate ‘instantaneous’ phase change, such as the Hilbert transform. This is not possible for highly irregular sEEG arrays.
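This limit falls straight out of the discrete Fourier grid; a short sketch (the 24 cm array length is our illustrative choice):

```python
import numpy as np

# For an array of extent L, the first Fourier frequency above DC is
# 1/L: no bin exists for wavelengths longer than the array.
L, n = 0.24, 64                       # e.g. a 24 cm array, 64 samples
d = L / n
freqs = np.fft.rfftfreq(n, d=d)       # cycles per metre
# freqs[0] = 0 (DC), freqs[1] = 1/L ~ 4.17 c/m

# A wave with wavelength 2L (longer than the array) cannot appear at
# its true frequency 1/(2L); its energy leaks into the lowest bins.
x = np.arange(n) * d
half_wave = np.cos(np.pi * x / L)
P = np.abs(np.fft.rfft(half_wave)) ** 2
leak_bin = int(P.argmax())            # lands at bin 0 or 1/L, not 1/(2L)
```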
(7) The proposed method of estimating wavelength from irregularly sampled three-dimensional iEEG data involves several steps (phase-extraction, singular value decomposition, triangle definition, dimension reduction, etc.) and it is not at all clear that the concatenation of all these steps actually yields accurate estimates.
Did the authors use more realistic simulations of cortical activity (i.e. on the convoluted cortical sheet) to verify that the method indeed yields accurate estimates of phase spectra?
We have now included detailed surrogate testing, in which varying combinations of sEEG phase data and surrogate signals with known (veridical) wavelengths are added together.
See our reply to the public reviewer comments. We consider that real neurophysiological data (here, sEEG plus surrogates, and MEG manipulated in various ways) provide a more accurate way to address these issues. In our experience, large-scale TWs appear spontaneously in realistic cortical simulations, and we now cite the relevant papers in the manuscript (line 53).
Minor comments
(1) Perhaps move the first paragraph of the results section to the Introduction (it does not describe any results).
So moved.
(2) The authors write:
"The stereotactic EEG contacts in the grey matter were re-referenced using the average of low-amplitude white matter contacts"
Does this mean that the average is taken over a subset of white-matter contacts (namely those with low amplitude)? Or do the authors refer to all white-matter contacts as "low-amplitude"? And did contacts on different needles have different references? Or were the contacts from all needles pooled?
A subset of white-matter contacts was used for re-referencing, namely the 50% with the lowest-amplitude signals. This subset was used to construct a single, pooled average reference. We have rephrased the sentences referring to this procedure to improve clarity (lines 202 and 743-745).
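The re-referencing step as described can be sketched as follows (variable names and data are illustrative, not the actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
grey = rng.standard_normal((12, T))   # grey-matter contact signals
white = rng.standard_normal((8, T)) * rng.uniform(0.5, 2.0, (8, 1))

# Keep the 50% of white-matter contacts with the lowest signal
# amplitude, then pool them into a single average reference.
amp = white.std(axis=1)
quiet = white[np.argsort(amp)[: white.shape[0] // 2]]
reference = quiet.mean(axis=0)

# Re-reference every grey-matter contact against this pooled average.
grey_reref = grey - reference
```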
Reviewer #3 (Public review):
Summary:
Guy et al. explored the variation in the pathogenicity of carboxy-terminal frameshift deletions in the X-linked MECP2 gene. Loss-of-function variants in MECP2 are associated with Rett syndrome, a severe neurodevelopmental disorder. Although hundreds of pathogenic MECP2 variants have been found in people with Rett syndrome, 8 recurrent point mutations are found in ~65% of disease cases, and frameshift insertion/deletion (indel) variants resulting in production of carboxy-terminal truncated (CTT) MeCP2 protein account for ~10% of cases. Many of these occur in a "deletion prone region" (DPR) between c.1110-1210, with common recurrent deletions c.1157_1197del (CTD1) and c.1164_1207del (CTD2). While two major protein functional domains have been defined in MeCP2, the methyl-binding domain (MBD) and the NCoR interacting domain (NID), the functional role of the carboxy-terminal domain (CTD, beyond the NID, predicted to have a disordered protein structure) has not been identified, and previous work by this group and others demonstrated that a Mecp2 "minigene" lacking the CTD retains MeCP2 function, suggesting that the CTD is dispensable. This raises an important question: If the CTD is dispensable, what is the pathogenic basis of the various CTT frameshift variants? Prior work from this group demonstrated that genetically engineered mice expressing the CTD1 variant had decreased expression of Mecp2 RNA and MeCP2 protein and decreased survival, but those expressing the CTD2 variant had normal Mecp2 RNA and protein and survival. However, they noted that differences between the mouse and human coding sequences resulted in different terminal sequences for the two common CTDs, with CTD1 ending in -PPX in both mouse and human, but CTD2 ending in -PPC in human but -SPX in mouse, and in the previous paper they demonstrated that humanized mouse ES cells (edited to have the same -PPX termination) containing the CTD2 deletion showed decreased Mecp2 RNA and protein levels.
This previous work provides the underlying hypotheses that they sought to explore, which is that the pathological basis of disease causing CTD relates to the formation of truncated proteins that end with a specific amino acid sequence (-PPX), which leads to decreased mRNA and protein levels, whereas tolerated, non-pathogenic CTD do not lead to production of truncated proteins ending in this sequence and retain normal mRNA/protein expression.
In this manuscript, they evaluate missense variants, in-frame deletions, and frameshift deletions within the DPR from the Genome Aggregation Database (gnomAD) and find that the "apparently" normal individuals within gnomAD had numerous tolerated missense variants and in-frame deletions within this region, as well as frameshift deletions (in hemizygous males) in the defined region. All of the gnomAD deletions within this region resulted in terminal amino acid sequences -SPRTX (due to a +1 frameshift), whereas nearly all deletion variants in this region from people with Rett syndrome (from the Clinvar copy of the former RettBase database) had a terminal -PPX sequence, due to a +2 frameshift. They hypothesized that terminal proline codons, causing ribosomal stalling and "nonsense-mediated-decay-like" degradation of mRNA (with subsequent decreased protein expression), were the basis of the specific pathogenicity of the +2 frameshift variants, and that utilizing adenine base editors (ABE) to convert the termination codon to a tryptophan codon could correct this issue. They demonstrate this by engineering the change into mouse embryonic stem cell lines and mouse lines containing the CTD1 deletion and show that this change normalized Mecp2 mRNA and protein levels and mouse phenotypes. Finally, they performed an initial proof-of-concept in an inducible HEK cell line and showed the ability of targeted ABE to edit the correct adenine and cause production of the expected larger truncated Mecp2 protein from CTD1 constructs.
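The +1/+2 frameshift distinction described above reduces to simple modular arithmetic. A hedged sketch (our reading of the nomenclature, using the deletion coordinates quoted in the review, and assuming for illustration a TGA stop codon):

```python
# A deletion of n nucleotides shifts the downstream reading frame by
# n mod 3 (our reading of the +1/+2 nomenclature used in the review).
def frameshift(start, end):
    n = end - start + 1          # HGVS c.start_end deletion length
    return n % 3                 # 1 -> '+1' (-SPRTX), 2 -> '+2' (-PPX)

# CTD1 (c.1157_1197del) and CTD2 (c.1164_1207del): both +2.
ctd1 = frameshift(1157, 1197)    # 41 nt deleted
ctd2 = frameshift(1164, 1207)    # 44 nt deleted

# An adenine base editor converts A-T to G-C; applied to a TGA stop
# codon (an assumption here, for illustration) this yields TGG,
# which encodes tryptophan, allowing read-through of the premature
# termination site.
edited = "TGA".replace("A", "G")  # -> "TGG"
```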
The findings of this manuscript provide a level of support for their hypothesis about the pathogenicity versus non-pathogenicity of some MECP2 CTT intragenic deletions and provide preliminary evidence for a novel therapeutic approach for Rett syndrome; however, limitations in their analysis do not fully support the broader conclusions presented.
Strengths:
(1) Utilization of publicly available databases containing aggregated genetic sequencing data from adult cohorts (gnomAD) and people with Rett syndrome (Clinvar copy of RettBase) to compare differences in the composition of the resulting terminal amino acid sequences resulting from deletions presumed to be pathogenic (n+2) versus presumed to be tolerated (n+1).
(2) Evaluation of a unique human pedigree containing an n+1 deletion in this region that was reported as pathogenic, with demonstration of inheritance of this from the unaffected father and presence within other unaffected family members.
(3) Development of a novel engineered mouse model of a previously assumed n+1 pathogenic variant to demonstrate lack of detrimental effect, supporting that this is likely a benign variant and not causative of Rett syndrome.
(4) Creation and evaluation of novel cell lines and mouse models to test the hypothesis that the pathogenicity of the n+2 deletion variants could be altered by a single base change in the frameshifted stop codon.
(5) Initial proof-of-concept experiments demonstrating the potential of ABE to correct the pathogenicity of these n+2 deletion variants.
Weaknesses:
(1) While the use of the large aggregated gnomAD genetic data benefits from the overall size of the data, the presence of genetic variants within this collection does not inherently mean that they are "neutral" or benign. While gnomAD does not include children, it does include aggregated data from a variety of projects targeting neuropsychiatric (and other) conditions, so there is information in gnomAD from people with various medical/neuropsychiatric conditions. The authors do make some acknowledgement of this and argue that the presence of intragenic deletion variants in their region of interest in hemizygous males indicates that it is highly likely that these are tolerated, non-pathogenic variants. Broadly, it is likely true that gnomAD MECP2 variants found in hemizygous males are unlikely to cause Rett syndrome in heterozygous females, but it does not necessarily mean that these variants have no potential to cause other, milder, neuropsychiatric disorders. As a clear example, within gnomAD, there is a hemizygous male with the rs28934908 C>T variant that results in p.A140V (p.A152V in e1 transcript numbering convention). This pathogenic variant has been found in a number of pedigrees with an X-linked intellectual disability pattern, in which males have a clear neurodevelopmental disorder and heterozygous females have mild intellectual disability (see PMIDs 12325019, 24328834 as representative examples of a large number of publications describing this). Thus, while the claim that hemizygous deletion variants in gnomAD are unlikely to cause Rett syndrome is reasonable, the authors cannot make the definitive statement that these variants are not pathogenic and completely benign, especially when they are found in only a very small number of individuals in gnomAD.
(2) The authors focus exclusively on deletions within the "DPR", which they define as between c.1110-1210, and say that these deletions account for 10% of Rett syndrome cases. However, the published studies that are the basis for this 10% estimate include all genetic variants (frameshift deletions, insertions, complex insertions/deletions, nonsense variants) resulting in truncations beyond the NID. For example, Bebbington 2010 (PMID: 19914908), which includes frameshift indels as early as c.905 and beyond c.1210. Further specific examples from RettBase are described below, but the important point is that their evaluation of only frameshift variants within c.1110-1210 is not truly representative of the totality of genetic variants that collectively are considered CTT and account for 10% of Rett cases.
(3) The authors say that they evaluated the putative pathogenic variants contained within RettBase (which is no longer available, but the data were transferred to Clinvar) for all cases with Classic Rett syndrome and de novo deletion variants within their defined DPR domain. Looking at the data from the Clinvar copy of RettBase, there are a number (n=143) of C-terminal truncating variants (either frameshift or nonsense) present beyond the NID, but the authors only discuss 14 frameshift deletion variants in this manuscript. A number of these variants have molecular features that do not fall into the pathogenic classification proposed by the authors; these are not addressed in the manuscript and do not support the generalization of the conclusions presented, especially the conclusion that the pathogenicity of all C-terminal truncating variants can be determined according to the proposed n+2 rule, or that all of the 10% of people with Rett syndrome and C-terminal truncating variants could be treated by using a base editor to correct the -PPX termination codon.
(4) The HEK-based system utilized is convenient for doing the initial experiments testing ABE; however, it represents an artificial system expressing cDNA without splicing. Canonical NMD is dependent on splicing, and while non-canonical "NMD-like" processes are less well understood, a concern is whether the artificial system used can adequately predict efficacy in a native setting that includes introns and splicing.
Author response:
Public Reviews:
Reviewer #1 (Public review):
Summary:
The authors scrutinized differences in C-terminal region variant profiles between Rett syndrome patients and healthy individuals and pinpointed how subtle genetic alterations can produce benign or pathogenic outcomes, which has important implications for Rett syndrome diagnosis and suggests a therapeutic strategy. This work will be beneficial to clinicians and basic scientists who work on Rett syndrome, and carries the potential to be applied to other Mendelian rare diseases.
Strengths:
Well-designed genetic and molecular experiments translate genetic differences into functional and clinical changes. This is a unique study resolving subtle changes in sequences that give rise to dramatic phenotypic consequences.
Weaknesses:
There are many base-editing and protein-expression changes throughout the manuscript, and they cause confusion. It would be helpful to readers if authors could provide a simple summary diagram at the end of the paper.
We thank Reviewer #1 for their encouraging comments. As suggested, we will include a summary figure of the genetic changes we have made, and the resulting expression and phenotypic consequences.
Reviewer #2 (Public review):
Summary:
This study by Guy and Bird and colleagues is a natural follow-up to their 2018 Human Molecular Genetics paper, further clarifying the molecular basis of C-terminal deletions (CTDs) in MECP2 and how they contribute to Rett syndrome. The authors combine human genetic data with well-designed experiments in embryonic stem cells, differentiated neurons, and knock-in mice to explain why some CTD mutations are disease-causing while others are harmless. They show that pathogenic mutations create a specific amino acid motif at the C-terminus, where +2 frameshifts produce a PPX ending that greatly reduces MeCP2 protein levels (likely due to translational stalling) whereas +1 frameshifts generating SPRTX endings are well tolerated.
Strengths:
This is a comprehensive and rigorous study that convincingly pinpoints the molecular mechanism behind CTD pathogenicity, with strong agreement between the cell-based and animal data. The authors also provide a proof of principle that modifying the PPX termination codon can restore MeCP2-CTD protein levels and rescue symptoms in mice. In addition, they demonstrate that adenine base editing can correct this defect in cultured cells and increase MeCP2-CTD protein levels. Overall, this is a well-executed study that provides important mechanistic and translational insight into a clinically important class of MECP2 mutations.
Weaknesses:
The adenine base editing to change the termination codon is shown to be feasible in generated cell lines, but has yet to be shown in vivo in animal models.
We thank Reviewer #2 for their positive comments. We agree that an in vivo study demonstrating effective DNA base editing in our CTD-1 mouse model is the obvious next step, and this work is in progress. However, given the ever-increasing use of pre- and neonatal screening for genetic diseases, we felt it important to disseminate our findings as soon as possible. The family pedigree in Figure 3C is a clear demonstration of this need.
Reviewer #3 (Public review):
Summary:
Guy et al. explored the variation in the pathogenicity of carboxy-terminal frameshift deletions in the X-linked MECP2 gene. Loss-of-function variants in MECP2 are associated with Rett syndrome, a severe neurodevelopmental disorder. Although 100's of pathogenic MECP2 variants have been found in people with Rett syndrome, 8 recurrent point mutations are found in ~65% of disease cases, and frameshift insertions/deletions (indels) variants resulting in production of carboxy-terminal truncated (CTT) MeCP2 protein account for ~10% of cases. Many of these occur in a "deletion prone region" (DPR) between c.1110-1210, with common recurrent deletions c.1157-1197del (CTD1) and c.1164_1207del (CTD2). While two major protein functional domains have been defined in MeCP2, the methyl-binding domain (MBD) and the NCoR interacting domain (NID), the functional role of the carboxy-terminal domain (CTD, beyond the NID, predicted to have a disordered protein structure) has not been identified, and previous work by this group and others demonstrated that a Mecp2 "minigene" lacking the CTD retains MeCP2 function suggesting that the CTD is dispensable. This raises an important question: If the CTD is dispensable, what is the pathogenic basis of the various CTT frameshift variants? Prior work from this group demonstrated that genetically engineered mice expressing the CTD1 variant had decreased expression of Mecp2 RNA and MeCP2 protein and decreased survival, but those expressing the CTD2 variant had normal Mecp2 RNA and protein and survival. However, they noted that differences between the mouse and human coding sequences resulted in different terminal sequences between the two common CTD, with CTD1 ending in -PPX in both mouse and human, but CTD2 ending in -PPC in human but -SPX in mouse, and in the previous paper they demonstrated in humanized mouse ES cells (edited to have the same -PPX termination) containing the CTD2 deletion resulted in decreased Mecp2 RNA and protein levels. 
This previous work provides the underlying hypotheses that they sought to explore, which is that the pathological basis of disease causing CTD relates to the formation of truncated proteins that end with a specific amino acid sequence (-PPX), which leads to decreased mRNA and protein levels, whereas tolerated, non-pathogenic CTD do not lead to production of truncated proteins ending in this sequence and retain normal mRNA/protein expression.
In this manuscript, they evaluate missense variants, in-frame deletions, and frame shift deletions within the DPR from the aggregated Genome Aggregated Database (gnomAD) and find that the "apparently" normal individuals within gnomAD had numerous tolerated missense variants and in-frame deletions within this region, as well as frameshift deletions (in hemizygous males) in the defined region. All of the gnomAD deletions within this region resulted in terminal amino acid sequences -SPRTX (due to +1 frameshift), whereas nearly all deletion variants in this region from people with Rett syndrome (from the Clinvar copy of the former RettBase database) had a terminal -PPX sequence, due to a +2 frameshift. They hypothesized that terminal proline codons causing ribosomal stalling and "nonsense mediated decay like" degradation of mRNA (with subsequent decreased protein expression) was the basis of the specific pathogenicity of the +2 frameshift variants, and that utilizing adenine base editors (ABE) to convert the termination codon to a tryptophan could correct this issue. They demonstrate this by engineering the change into mouse embryonic stem cell lines and mouse lines containing the CTD1 deletion and show that this change normalized Mecp2 mRNA and protein levels and mouse phenotypes. Finally, they performed an initial proof-of-concept in an inducible HEK cell line and showed the ability of targeted ABE to edit the correct adenine and cause production of the expected larger truncated Mecp2 protein from CTD1 constructs.
The findings of this manuscript provide a level of support for their hypothesis about the pathogenicity versus non-pathogenicity of some MECP2 CTT intragenic deletions and provide preliminary evidence for a novel therapeutic approach for Rett syndrome; however, limitations in their analysis do not fully support the broader conclusions presented.
Strengths:
(1) Utilization of publicly available databases containing aggregated genetic sequencing data from adult cohorts (gnomAD) and people with Rett syndrome (Clinvar copy of RettBase) to compare differences in the composition of the resulting terminal amino acid sequences resulting from deletions presumed to be pathogenic (n+2) versus presumed to be tolerated (n+1).
(2) Evaluation of a unique human pedigree containing an n+1 deletion in this region that was reported as pathogenic, with demonstration of inheritance of this from the unaffected father and presence within other unaffected family members.
(3) Development of a novel engineered mouse model of a previously assumed n+1 pathogenic variant to demonstrate lack of detrimental effect, supporting that this is likely a benign variant and not causative of Rett syndrome.
(4) Creation and evaluation of novel cell lines and mouse models to test the hypothesis that the pathogenicity of the n+2 deletion variants could be altered by a single base change in the frameshifted stop codon.
(5) Initial proof-of-concept experiments demonstrating the potential of ABE to correct the pathogenicity of these n+2 deletion variants.
Weaknesses:
(1) While the use of the large aggregated gnomAD genetic data benefits from the overall size of the data, the presence of genetic variants within this collection does not inherently mean that they are "neutral" or benign. While gnomAD does not include children, it does include aggregated data from a variety of projects targeting neuropsychiatric (and other) conditions, so there is information in gnomAD from people with various medical/neuropsychiatric conditions. The authors do make some acknowledgement of this and argue that the presence of intragenic deletion variants in their region of interest in hemizygous males indicates that it is highly likely that these are tolerated, non-pathogenic variants. Broadly, while it is likely true that gnomAD MECP2 variants found in hemizygous males are unlikely to cause Rett syndrome in heterozygous females, this does not necessarily mean that these variants have no potential to cause other, milder neuropsychiatric disorders. As a clear example, within gnomAD, there is a hemizygous male with the rs28934908 C>T variant that results in p.A140V (p.A152V in e1 transcript numbering convention). This pathogenic variant has been found in a number of pedigrees with an X-linked intellectual disability pattern, in which males have a clear neurodevelopmental disorder and heterozygous females have mild intellectual disability (see PMIDs 12325019 and 24328834 as representative examples of a large number of publications describing this). Thus, while the authors can reasonably claim that hemizygous deletion variants in gnomAD are unlikely to cause Rett syndrome, they cannot make the definitive statement that these variants are non-pathogenic and completely benign, especially when they are found in only a very small number of individuals in gnomAD.
(2) The authors focus exclusively on deletions within the "DPR", which they define as c.1110-1210, and say that these deletions account for 10% of Rett syndrome cases. However, the published studies that are the basis for this 10% estimate include all genetic variants (frameshift deletions, insertions, complex insertion/deletions, nonsense variants) resulting in truncations beyond the NID. For example, Bebbington 2010 (PMID: 19914908) includes frameshift indels as early as c.905 and beyond c.1210. Further specific examples from RettBase are described below, but the important point is that their evaluation of only frameshift variants within c.1110-1210 is not truly representative of the totality of genetic variants that are collectively considered CTT and account for 10% of Rett cases.
(3) The authors say that they evaluated the putative pathogenic variants contained within RettBase (which is no longer available, but the data were transferred to ClinVar) for all cases with Classic Rett syndrome and de novo deletion variants within their defined DPR domain. Looking at the data from the ClinVar copy of RettBase, there are a number (n=143) of C-terminal truncating variants (either frameshift or nonsense) beyond the NID, but the authors discuss only 14 frameshift deletion variants in this manuscript. A number of these variants have molecular features that do not fall into the pathogenic classification proposed by the authors and are not addressed in the manuscript. They therefore do not support the generalization of the conclusions presented here, especially the conclusion that the pathogenicity of all C-terminal truncating variants can be determined according to the proposed n+2 rule, or that all of the 10% of people with Rett syndrome and C-terminal truncating variants could be treated by using a base editor to correct the -PPX termination codon.
(4) The HEK-based system utilized is convenient for doing the initial experiments testing ABE; however, it represents an artificial system expressing cDNA without splicing. Canonical NMD is dependent on splicing, and while non-canonical "NMD-like" processes are less well understood, a concern is whether the artificial system used can adequately predict efficacy in a native setting that includes introns and splicing.
We thank reviewer #3 for their constructive comments. A number of these relate to our analysis of databases of pathogenic (RettBASE) and non-pathogenic (gnomAD) variants. We disagree with their assertion that we are looking at only a small subset of RTT CTD mutations. We detail 14 different RTT CTDs in the manuscript, but these include the 3 most frequently occurring, which alone account for 121 RTT cases in RettBASE.
We used the original RettBASE database for our analysis, which contained significantly more information than was transferred to Clinvar. We may not have made this sufficiently clear and will remedy this during revision of the manuscript.
We stress that RettBASE contained many non-RTT causing mutations. For this reason, we employed stringent selection criteria to define a high-confidence set of RTT CTD alleles. Importantly, this set was chosen before any investigation of reading frame or C-terminal amino acid sequence. Our stringent set was selected based on three criteria: location within the C-terminal deletion prone region (CT-DPR), a diagnosis of Classical RTT and at least one case where that mutation had been shown to be absent from both parents (i.e. that it was a de novo mutation). This excluded a large number of CTD alleles which fitted the +2 frameshift/PPX ending category as well as some in other categories. There are good reasons to believe that the vast majority of genuinely pathogenic RTT CTD mutations do fall into this class.
Concerning gnomAD CTDs, we chose to restrict our detailed analysis to those which are present in the hemizygous state, to exclude individuals in which skewed X-inactivation may mask a pathogenic mutation. Data from all zygosities are shown in Fig. 3, SF1.
We will revise the manuscript to include tables of all extracted data relevant to this region, from both gnomAD and RettBASE, along with analysis of a less stringent set of RettBASE CTDs for reading frame and C-terminal amino acid sequence. We hope this will make clear the strength of the evidence for our conclusions.
We agree with Reviewer #3 that inclusion of variants in gnomAD does not exclude the possibility that they may cause medical/psychiatric conditions other than RTT. This point is already mentioned in the Discussion, but we plan to emphasise it further. The pedigree included in the paper, as well as others that we have learned of, argues that loss of the C-terminus of MeCP2 has few if any phenotypic consequences, but more cases are needed to robustly assess this conclusion.
We disagree that our HEK cell-based system is not suitable for testing efficacy of reagents for use on endogenous alleles in vivo. The editing process is not dependent on splicing, and we have shown in this manuscript that making this single base change has the same effect on an endogenous knock-in allele (CTD1 X>W) or a cDNA-based transgene (Flp-In T-REx CTD1 + base editing), namely, to increase the amount of truncated MeCP2 produced.
Reviewer #2 (Public review):
I have completed a thorough review of this paper, which seeks to use the large datasets of species occurrences available through GBIF to estimate variation in how large numbers of plant and animal species are associated with urbanization throughout the world, describing what they call the "species urbanness distribution" or SUD. They explore how these SUDs differ between regions and different taxonomic levels. They then calculate a measure of urban tolerance and seek to explore whether organism size predicts variation in tolerance among species and across regions.
The study is impressive in many respects. Over the course of several papers, Callaghan and coauthors have been leaders in using "big [biodiversity] data" to create metrics of how species' occurrence data are associated with urban environments, and in describing variation in urban tolerance among taxa and regions. This work has been creative and novel, and it has pushed the boundaries of understanding how urbanization affects a wide diversity of taxa. The current paper takes this to a new level by performing analyses on over 94,000 observations from >30,000 species of plants and animals, across more than 370 plant and animal taxonomic families. All of these analyses were focused on answering two main questions:
(1) What is the shape of species' urban tolerance distributions within regional communities?
(2) Does body size consistently correlate with species' urban tolerance across taxonomic groups and biogeographic contexts?
Overall, I think the questions are interesting and important, the size and scope of the data and analyses are impressive, and this paper has a potentially large contribution to make in pushing forward urban macroecology specifically and urban ecology and evolution more generally.
Despite my enthusiasm for this paper and its potential impact, there are aspects that could be improved, and I believe the paper requires major revision.
Some of these revisions ideally involve being clearer about the methodology or arguments being made. In other cases, I think their metrics of urban tolerance are flawed and need to be rethought and recalculated, and some of the conclusions are inaccurate. I hope the authors will address these comments carefully and thoroughly. I recognize that there is no obligation for authors to make revisions. However, revising the paper along the lines of the comments made below would increase the impact of the paper and its clarity to a broad readership.
Major Comments:
(1) Subrealms
Where does the concept of "subrealms" come from? No citation is given, and it could be said that this sounds like an idea straight out of Middle Earth. How do subrealms relate to known bioclimatic designations like Köppen climate classifications, which would arguably be more appropriate? Or are subrealms more socio-ecologically oriented? From what I can tell, each subrealm lumps together climatically diverse areas. It might be better and more tractable to break things down by continent, as the rationale for subrealms is unclear, and it makes the analyses and results more confusing. The authors rationalized the use of subrealms to account for potential intraspecific differences in species' response to urbanization, but that is never a core part of the questions or interpretation in the paper, and averaging across subrealms also accounts for intraspecific variation. Another issue with using the subrealm approach is that the authors only included a species if it had 100 observations in a given subrealm, leading to a focus on only the most common species, which may be biased in their SUD distribution. How many more species would be included if they did their analysis at the continental or global scale, and would this change the shape of SUDs?
(2) Methods - urban score
The authors describe their "urban score" as being calculated as "the mean of the distribution of VIIRS values as a relative species specific measure of a response to urban land cover."
I don't understand how this is a "relative species-specific measure". What is it relative to? Figures S4 and S5 show the mean distribution of VIIRS for various taxa, and this mean looks to be an absolute measure. Mean VIIRS for a given species would be fine and appropriate as an "urban score", but the authors then state in the next sentence: "this urban score represents the relative ranking of that species to other species in response to urban land cover".
That doesn't follow from the description of how this is calculated. Something is missing here. Please clarify and add an explicit equation for how the urban score is calculated because the text is unclear and confusing.
(3) Methods - urban tolerance
How the authors are defining and calculating tolerance is unclear, confusing, and flawed in my opinion.
Tolerance is a common concept in ecology, evolution, and physiology, typically defined as the ability for an organism to maintain some measure of performance (e.g., fitness, growth, physiological homeostasis) in the presence versus absence of some stressor. As one example, in the herbivory literature, tolerance is often measured as the absolute or relative difference in fitness of plants that are damaged versus undamaged (e.g., https://academic.oup.com/evolut/article/62/9/2429/6853425?login=true).
On line 309, after describing the calculation of urban scores across subrealms, they write: "Therefore, a species could be represented across multiple subrealms with differing measures of urban tolerance (Fig. S4). Importantly, this continuous metric of urban tolerance is a relative measure of a species' preference, or affinity, to urban areas: it should be interpreted only within each subrealm".
This is problematic on several fronts. First, the authors never define what they mean by the term "tolerance". Second, they refer to urban tolerance throughout the paper, but don't describe the calculation until lines 315-319, where they write (text in [ ] is from the reviewer):
"Within each subrealm, we further accounted for the potential of different levels of urbanization by scaling each species' urban score by subtracting the mean VIIRS of all observations in the subrealm (this value is hereafter referred to as urban tolerance). This 'urban tolerance' (Fig. S5) value can be negative - when species under-occupy urban areas [relative to the average across all species] suggesting they actively avoid them-or positive-when species over-occupy urban areas [relative to the average across all species] suggesting they prefer them (i.e., ranging from urban avoiders to urban exploiters, respectively).<br /> They are taking a relativized urban score and then subtracting the mean VIIRS of all observations across species in a subrealm. How exactly one interprets the magnitude isn't clear and they admit this metric is "not interpretative across subrealms".
This is not a true measure of tolerance, at least not in the conventional sense of how tolerance is typically defined. The problem is that a species distribution isn't being compared to some metric of urbanness, but instead it is relative to other species' urban scores, where species may, on average, be highly urban or highly nonurban in their distribution, and this may vary from subrealm to subrealm. A measure of urban tolerance should be independent of how other species are responding, and should be interpretable across subrealms, continents, and the globe.
I propose the authors use one of two metrics of urban tolerance:
(i) Absolute Urban Tolerance = Mean VIIRS of species_i - Mean VIIRS of city centers

Here, the mean VIIRS of city centers could be taken from the center of multiple cities throughout a subrealm, across a continent, or across the world. Here, the units are in the original VIIRS units, where 0 would correspond to species being centered on the most extreme urban habitats, and the most extreme negative values would correspond to species that occupy the most non-urban habitats (i.e., no artificial light at night). In essence, this measure of tolerance would quantify how far a species' distribution is shifted relative to the most highly urbanized habitat available.
(ii) % Urban Tolerance = (Mean VIIRS of species_i - Mean VIIRS of city centers) / Mean VIIRS of city centers * 100%

This metric provides a % change in species' mean VIIRS distribution relative to the most urban habitats. This value could theoretically be negative or positive, but will typically be negative, with -100% being completely non-urban, and 0% being completely urban tolerant.
Both of these metrics can be compared across the world, as it would provide either absolute (equation 1) or relative (equation 2) metrics of urban tolerance that are comparable and easily interpretable in any region.
In summary, the definition of tolerance should be clear, the metric should be a true measure of tolerance that is comparable across regions, and an equation should be given.
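For concreteness, the two proposed metrics could be computed as follows (a sketch only; the function names and all radiance values are illustrative assumptions, not GBIF/VIIRS data or the authors' pipeline):

```python
import numpy as np

def urban_score(species_viirs):
    """Urban score for one species: mean VIIRS radiance across
    its occurrence records (illustrative definition)."""
    return float(np.mean(species_viirs))

def absolute_urban_tolerance(species_viirs, city_center_viirs):
    """Proposed metric (i): species mean VIIRS minus mean VIIRS of
    city centers. 0 = centered on the most urban habitat; large
    negative values = non-urban distribution."""
    return urban_score(species_viirs) - float(np.mean(city_center_viirs))

def percent_urban_tolerance(species_viirs, city_center_viirs):
    """Proposed metric (ii): % shift relative to the most urban
    habitat. -100% = fully non-urban, 0% = fully urban tolerant."""
    c = float(np.mean(city_center_viirs))
    return (urban_score(species_viirs) - c) / c * 100.0

# Hypothetical radiance values (nW/cm^2/sr), not real data.
city_centers = [60.0, 55.0, 65.0]     # mean = 60
urban_exploiter = [50.0, 70.0, 60.0]  # mean = 60 -> tolerance 0
urban_avoider = [0.0, 3.0, 3.0]       # mean = 2  -> strongly negative

print(absolute_urban_tolerance(urban_exploiter, city_centers))  # 0.0
print(percent_urban_tolerance(urban_avoider, city_centers))     # ~ -96.7
```

Because both metrics are anchored to city-center radiance rather than to the other species in a subrealm, the same value means the same thing in any region.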
(4) Figure 1: The figure does not stand alone. For example, what is the hypothesis for thermophily or the temperature-size rule? The authors should expand the legend slightly to make the hypotheses being illustrated clearer.
(5) SUDs: I don't agree with the conclusion given on line 83 ("pattern was consistent across subrealms and several taxonomic levels") or in the legend of Figure 2 ("there were consistent patterns for kingdoms, classes, and orders, as shown by generally similar density histograms shapes for each of these").
The shapes of the curves are quite different, especially for the two Kingdoms and the different classes. I agree they are relatively consistent for the different taxonomic Orders of insects.
Reviewer #3 (Public review):
Summary:
This paper reports on an association between body size and the occurrence of species in cities, which is quantified using an 'urban score' that can be visualized as a 'Species Urbanness Distribution' for particular taxa. The authors use species records from the Global Biodiversity Information Facility (GBIF) and link the occurrence data to nighttime lighting quantified using satellite data (Visible Infrared Imaging Radiometer Suite-VIIRS). They link the urban score to body size data to find 'heterogeneous relationship between body size and urban tolerance across the tree'. The results are then discussed with reference to potential mechanisms that could possibly produce the observed effects (cf. Figure 1).
Strengths:
The novelty of this study lies in the huge number of species analyzed and the comparison of results among animal taxa, rather than in a thorough analysis of what traits allow species to persist under urban conditions. Such analyses have been done using a much more thorough approach that employs presence-absence data as well as a suite of traits in other studies, for example, in (Hahs et al. 2023, Neate-Clegg et al. 2023). The dataset that the authors produced would also be very valuable if the raw data were published, both the cleaned species records and the body sizes.
The paper could strongly add to our understanding of what species occur in cities when the open questions are addressed.
Weaknesses:
I value the approach of the authors, but I think the paper needs to be revised.
In my view, the authors could more carefully validate their approach. Currently, any weakness or biases in the approach are quickly explained away rather than carefully explored. This concerns particularly the use of presence-only data, but also the calculation of the urban score.
The vast majority of data in GBIF is presence-only data. This produces a strong bias in the analysis presented in the paper. For some taxa, it is likely that occurrences within the city are overrepresented, and for other taxa, the opposite is true (cf. Sweet et al. 2022). I think the authors should try to address this problem.
The authors should compare their results to studies focusing on particular taxa where extensive trait-based analyses have already been performed, i.e., plants and birds. In fact, I strongly suggest that the authors should compare their results to previous studies on the relationship between traits, including body size and occurrences along a gradient of urbanisation, to draw conclusions about the validity of the approach used in the current study, which has a number of weaknesses.
They should be more careful in coming up with post-hoc explanations of why the pattern found in this study makes sense or suggests a particular mechanism. This reviewer considers that there is no way in which the current study can disentangle the different possible mechanisms without further analyses and data, so I would suggest pointing out carefully how the mechanisms could be studied.
More details should be given about the methodology. The readers should be able to understand the methods without having to read a number of other papers.
References:
Hahs, A. K., B. Fournier, M. F. Aronson, C. H. Nilon, A. Herrera-Montes, A. B. Salisbury, C. G. Threlfall, C. C. Rega-Brodsky, C. A. Lepczyk, and F. A. La Sorte. 2023. Urbanisation generates multiple trait syndromes for terrestrial animal taxa worldwide. Nature Communications 14:4751.
Neate-Clegg, M. H. C., B. A. Tonelli, C. Youngflesh, J. X. Wu, G. A. Montgomery, Ç. H. Şekercioğlu, and M. W. Tingley. 2023. Traits shaping urban tolerance in birds differ around the world. Current Biology 33:1677-1688.
Sweet, F. S. T., B. Apfelbeck, M. Hanusch, C. Garland Monteagudo, and W. W. Weisser. 2022. Data from public and governmental databases show that a large proportion of the regional animal species pool occur in cities in Germany. Journal of Urban Ecology 8:juac002.
Reviewer #1 (Public review):
Summary:
In this paper, Qiu et al. developed a novel spatial navigation task to investigate the formation of multi-scale representations in the human brain. Over multiple sessions and diverse tasks, participants learned the location of 32 objects distributed across 4 different rooms. The key task was a "judgement of relative direction" task delivered in the scanner, which was designed to assess whether object representations reflect local (within-room) or global (across-room) similarity structures. In between the two scanning sessions, participants received extensive further training. The goal of this manipulation was to test how spatial representations change with learning.
Strengths:
The authors designed a very comprehensive set of tasks in virtual reality to teach participants a novel spatial map. The spatial layout is well-designed to address the question of interest in principle. Participants were trained in a multi-day procedure, and representations were assessed twice, allowing the authors to investigate changes in the representation over multiple days.
Weaknesses:
Unfortunately, I see multiple problems with the experimental design that make it difficult to draw conclusions from the results.
(1) In the JRD task (the key task in this paper), participants were instructed to imagine standing in front of the reference object and judge whether the second object was to their left or right. The authors assume that participants solve this task by retrieving the corresponding object locations from memory, rotating their imagined viewpoint and computing the target object's relative orientation. This is a challenging task, so it is not surprising that participants do not perform particularly well after the initial training (performance between 60-70% accuracy). Notably, the authors report that after extensive training, they reached more than 90% accuracy.
However, I wonder whether participants indeed perform the task as intended by the authors, especially after the second training session. A much simpler behavioural strategy is memorising the mapping between a reference object and an associated button press, irrespective of the specific target object. This basic strategy should lead to quite high success rates, since the same direction is always correct for four of the eight objects (the two objects located at the door and the two opposite the door). For the four remaining objects, the correct button press is still the same for four of the six target objects that are not located opposite to the reference object. Simply memorising the button press associated with each reference object should therefore lead to a high overall task accuracy without the necessity to mentally simulate the spatial geometry of the object relations at all.
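The counts stated above can be turned into a back-of-the-envelope expected accuracy for this naive strategy. This is a sketch using only the review's own counts as assumptions (8 reference objects per room, a fixed always-correct button for 4 of them, and a 4-of-6 hit rate for the remaining 4), not the authors' actual task statistics:

```python
# Expected accuracy of the "one memorised button per reference object"
# strategy, per the counts in the review (assumptions, not measured data):
# - for 4 of the 8 reference objects the same button is always correct;
# - for the other 4, the memorised button is correct for 4 of 6 targets.
p_always = 4 / 8   # fraction of references with an always-correct button
p_mostly = 4 / 8   # fraction where the button works for 4/6 targets
expected_accuracy = p_always * 1.0 + p_mostly * (4 / 6)
print(f"expected accuracy: {expected_accuracy:.1%}")  # 83.3%
```

Even this crude estimate approaches the post-training accuracy the authors report, which is the concern: high accuracy alone cannot distinguish genuine mental simulation of the spatial geometry from simple response memorisation.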
I also wonder whether the random effect coefficients might reflect interindividual differences in such a strategy shift - someone who learnt this relationship between objects and buttons might show larger increases in RTs compared to someone who did not.
(2) On a related note, the neural effect that appears to reflect the emergence of a global representation might be more parsimoniously explained by the formation of pairwise associations between reference and target objects. Since both objects always came from the same room, an RDM reflecting how many times an object pair acted as a reference-target pair will correlate with the categorical RDM reflecting the rooms corresponding to each object. Since the categorical RDM is highly correlated with the global RDM, this means that what the authors measure here might not reflect the formation of a global spatial map, but simply the formation of pairwise associations between objects presented jointly.
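The confound described here can be demonstrated numerically. Below is a minimal toy sketch (the object counts, pairing frequencies, and count-to-dissimilarity transform are all illustrative assumptions, not the paper's design): because reference-target pairs are only ever drawn within a room, a pairing-frequency RDM is necessarily correlated with the categorical room RDM.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_rooms, per_room = 4, 8
n_obj = n_rooms * per_room
room = np.repeat(np.arange(n_rooms), per_room)  # room label per object

# Pairing-frequency matrix: reference-target pairs occur only within
# a room, with arbitrary (here random) frequencies; 0 = never paired.
pair_count = np.zeros((n_obj, n_obj))
for r_id in range(n_rooms):
    members = np.where(room == r_id)[0]
    for i, j in combinations(members, 2):
        pair_count[i, j] = pair_count[j, i] = rng.integers(0, 5)

# Illustrative dissimilarity: more frequent pairing -> more similar.
pair_rdm = 1.0 - pair_count / pair_count.max()

# Categorical room RDM: 0 = same room, 1 = different room.
room_rdm = (room[:, None] != room[None, :]).astype(float)

# Correlate upper triangles, as in a standard RSA model comparison.
iu = np.triu_indices(n_obj, k=1)
r = np.corrcoef(pair_rdm[iu], room_rdm[iu])[0, 1]
print(f"pairing RDM vs room RDM correlation: {r:.2f}")
```

All between-room cells of the pairing RDM equal the maximum dissimilarity, so the correlation with the room RDM is strongly positive by construction, illustrating why a room/global model fit could be mimicked by pairwise associations alone.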
(3) In general, the authors attribute changes in neural effects to new learning. But of course, many things can change between sessions (expectancy, fatigue, change in strategy, but also physiological factors...). Baseline physiological effects are less likely to influence patterns of activity, so the RSA analyses should be less sensitive to this problem, but especially the basic differences in activation for the contrast of post-learning > pre-learning stages in the judgment of relative direction (JRD) task could in theory just reflect baseline differences in blood oxygenation, possibly due to differences in time of day, caffeine intake, sleep, etc. To really infer that any change in activity or representation is due to learning, an active control would have been great.
(4) RSA typically compares voxel patterns associated with specific stimuli. However, the authors always presented two objects on the screen simultaneously. From what I understand, this is not considered in the analysis ("The β-maps for each reference object were averaged across trials to create an overall β-map for that object."). Furthermore, participants were asked to perform a complex mental operation on each trial ("imagine standing at A, looking at B, then perform the corresponding motor response"). Assuming that participants did this (although see points 1 and 2 above), this means that the resulting neural representation likely reflects a mixture of the two object representations, the mental transformation and the corresponding motor command, and possibly additionally the semantic and perceptual similarity between the two presented words. This means that the βs taken to reflect the reference object representation must be very noisy.
This problem is aggravated by two additional points. Firstly, not all object pairs occurred equally often, because only a fraction of all potential pairs were sampled. If the selection of the object pairs is not carefully balanced, this could easily lead to sampling biases, which RSA is highly sensitive to.
Secondly, the events in the scanner are not jittered. Instead, they are phase-locked to the TR (1.2 sec TR, 1.2 sec fixation, 4.8 sec stimulus presentation). This means that every object onset starts at the same phase of the image acquisition, making HRF sampling inefficient and hurting trial-wise estimation of betas used for the RSA. This likely significantly weakens the strength of the neural inferences that are possible using this dataset.
(5) It is not clear why the authors focus their report of the results in the main manuscript on the preselected ROIs instead of showing whole-brain results. This can be misleading, as it provides the false impression that the neural effects are highly specific to those regions.
(6) I am missing behavioural support for the authors' claims.
Overall, I am not convinced that the main conclusion that global spatial representations emerge during learning is supported by the data. Unfortunately, I think there are some fundamental problems in the experimental design that might make it difficult to address the concerns.
However, if the authors can provide convincing evidence for their claims, I think the paper will have an impact on the field. The question of how multi-scale representations are represented in the human brain is a timely and important one.
Reviewer #2 (Public review):
Summary:
Qiu and colleagues studied human participants who learned about the locations of 32 different objects located across 4 different rooms in a common spatial environment. Participants were extensively trained on the object locations, and fMRI scans were done during a relative direction judgement task in a pre- and post-session. Using RSA analysis, the authors report that the hippocampus increased global relative to local representations with learning; the RSC showed a similar pattern, but also increased effects of both global and local information with time.
Strengths:
(1) The manuscript asks a generally interesting question concerning the learning of global versus local spatial information.
(2) The virtual environment task provides a rich and naturalistic spatial setting for participants, and the setup with 32 objects across 4 rooms is interesting.
(3) The within-subject design and use of verbal cues for spatial retrieval is elegant.
Weaknesses:
(1) My main concern is that the global Euclidean distances and room identity are confounded. I fear this means that all neural effects in the RSA could be alternatively explained by associations to the visual features of the rooms that build up over time.
(2) The direction judgement task is not very informative about cognitive changes, as only objects in a room are compared. The setup also discourages global learning, and leaves unclear whether participants focussed on learning the left/right relationships required by the task.
(3) With N = 23, the power is low, and the effects are weak.
(4) It appears no real multiple comparisons correction is done for the ROI based approach, and significance across ROIs is not tested directly.
Reviewer #3 (Public review):
Summary:
The manuscript by Qiu et al. explores the issue of spatial learning in both local (rooms) and global (connected rooms) environments. The authors perform a pointing task, which involves either pressing the right or left button in the scanner to indicate where an object is located relative to another object. Participants are repeatedly exposed to rooms over sessions of learning, with one "pre" and one "post" learning session. The authors report that the hippocampus shifted from lower to higher RSA for the global but not the local environment after learning. RSC and OFC showed higher RSA for global object pointing. Other brain regions also showed effects, including ACC, which seemed to show a similar pattern as the hippocampus, as well as other regions shown in Figure S5. The authors attempt to tie their results in with local vs. global spatial representations.
Strengths:
Extensive testing of subjects before and after learning a spatial environment, with data suggesting that there may be fMRI representations sensitive to both global and local codes. Behavioral data suggest that subjects are performing well at the task and learning both global and local object locations, although see further comments.
Weaknesses:
(1) The authors frame the entire introduction around confirming the presence of the cognitive map either locally or globally. There are some significant issues with this framing. For one, the introduction appears to be confirmatory and not testing specific hypotheses that can be falsified. What exactly are the hypotheses being tested? I believe that this relates to the testing whether neural representations are global and/or local. However, this is not clear. Given that a previous paper (Marchette et al. 2014 Nature Neuro, which bears many similarities) showed only local coding in RSC, this paper needs to be discussed in far more depth in terms of its similarities and differences. This paper looked at both position and direction, while the current paper looks at direction. Even here, direction in the current study is somewhat impoverished: it involves either pointing right or left to an object, and much of this could be categorical or even lucky guesses. From what I could tell, all behavioral inferences are based on reaction time and not accuracy, and therefore, it is difficult to determine if the subject's behavior actually reflects knowledge gained or simply faster reaction time, either due to motor learning or a speed-accuracy trade-off. The pointing task is largely egocentric: it can be solved by remembering a facing direction and an object relative to that. It is not the JRD task as has been used in other studies (e.g., Huffman et al. 2019 Neuron), which is continuous and has an allocentric component. This "version" of the task would be largely egocentric. In this way, the pointing task used does not test the core tenets of the cognitive map during navigation, which is defined as allocentric and Euclidean (please see O'Keefe and Nadel 1978, The Hippocampus as a Cognitive Map). Since neither of these assumptions appears valid, the paper should be reframed to reflect spatial representations more broadly or even egocentric spatial representations.
(2) The fMRI data workup is insufficient. What do the authors mean by "deactivations" in Figure 3b? Does this mean the object task showed more activation than the spatial task in HC? If so, this would seem to suggest that the hippocampus is more involved in object than spatial processing, although it is difficult to tell from how things are written. The RSA is more helpful, but a concern here is that the analysis focuses on small clusters selected on the basis of earlier analyses; this appears to be the case for the correlations shown in Figure 3e as well. The issues are several-fold. First, previous work has shown that basing secondary analyses on related first-level analyses can inflate the risk of false positives (e.g., Kriegeskorte et al. 2009 Nature Neuro). The authors should perform secondary analyses in ways that are unbiased by the first analyses, preferably selecting cluster centers (if they choose that route) from previous papers rather than from their own analyses. Another option would be to perform analyses at the level of the entire ROI, so the results would generalize more readily. The authors should also perform permutation tests to ensure that the RSA results are reliable, as these analyses can run the risk of false positives (e.g., Nolan et al. 2018 eNeuro). If the results hold, the authors should perform corrected post-hoc t-tests for global vs. local before and after learning, rather than relying solely on the interaction effect: the figures were difficult to follow in this regard, and a significant interaction does not guarantee that the critical difference (global higher than local after learning) is itself significant. The final part of the Results was hard to follow. If ACC showed a similar effect to HC and RSC, why is it not considered? Many other areas that appeared to show local vs. global effects were dismissed; these should instead be discussed in terms of whether they are consistent or inconsistent with the hypotheses.
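To make the permutation recommendation concrete, a label-shuffling test for an RDM-model correlation could be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function name, the two-RDM setup, and the permutation count are all assumptions for the example.

```python
import numpy as np

def rsa_permutation_test(neural_rdm, model_rdm, n_perm=10000, seed=0):
    """Permutation test for the correlation between a neural RDM and a
    model RDM. Condition labels are shuffled (rows and columns permuted
    together) to build a null distribution of correlations."""
    rng = np.random.default_rng(seed)
    n = neural_rdm.shape[0]
    iu = np.triu_indices(n, k=1)  # upper triangle, excluding diagonal

    def corr(a, b):
        # Pearson correlation of the RDMs' upper-triangle entries
        return np.corrcoef(a[iu], b[iu])[0, 1]

    observed = corr(neural_rdm, model_rdm)
    null = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(n)  # shuffle condition labels
        null[i] = corr(neural_rdm[np.ix_(p, p)], model_rdm)
    # one-tailed p-value, with the observed statistic counted in the null
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

A within-subject variant (permuting labels per subject before group statistics) would follow the same logic; the key point is that the null distribution is derived from the data rather than from parametric assumptions.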
(3) Concerns about the discussion: several passages rely on reverse inference about brain areas rather than connecting the findings to the hypotheses (see Poldrack 2006 Trends in Cognitive Sciences). The authors also argue for "transfer" of information (for example, from ACC to OFC) but performed no connectivity analyses, so these conclusions are not based on any results. Instead, the authors should carefully compare what can be concluded from the reaction time findings and from the fMRI data: what is consistent vs. inconsistent with the hypotheses? The authors should also provide a much more detailed comparison with past work. The Marchette et al. paper comes to different conclusions regarding RSC and involves more detailed analyses than those done here, including position; what is different in the current paper that might explain the divergent results? Another previous paper came to a different conclusion (hippocampus local, retrosplenial global) and should be carefully considered and compared, as it also involved learning of environments and comparisons at different phases (Wolbers & Buchel 2005 J Neurosci). Other papers using the JRD task have demonstrated similar, although not identical, networks (e.g., Huffman et al. 2019 Neuron), and the results here should be compared more carefully, as the current task is largely egocentric while the Huffman et al. paper involved a continuous, allocentric version of the JRD task.
(4) The authors cite rodent papers involving single-neuron recordings. These are quite different experiments, however: they involve freely moving rodents and direct recordings from single neurons, whereas the current study involves supine humans and an indirect vascular measure of neural activity. Citations should primarily be to studies of human spatial memory and navigation using fMRI; over-reliance on rodent studies should be avoided for the reasons mentioned above.