Claude Mythos is suspected to be a Recurrent-Depth Transformer (RDT) — also called a Looped Transformer (LT). Rather than stacking hundreds of unique layers, a subset of layers is recycled and run through multiple times per forward pass. Same weights. More loops. Deeper thinking.
这一观点挑战了传统大模型架构的常识,认为Claude Mythos的核心创新不在于增加参数量,而在于通过循环使用相同权重来实现更深层次的推理。这种架构设计反直觉地表明,模型的'深度'可以通过循环迭代而非堆叠层来实现,这可能解释了Mythos在复杂推理任务上的优异表现。