Reconstructing raw inputs forces a model to capture irrelevant low-level detail. Predicting in a learned embedding space lets the model focus on semantically meaningful, causally relevant features.
Most people assume an AI model must reconstruct its full input in order to understand the world, but the author argues that this approach forces the model to attend to irrelevant low-level detail. Instead, predicting in an embedding space lets the model concentrate on semantically meaningful, causally relevant features, which is a counterintuitive insight.
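The contrast can be sketched with a toy example. This is a minimal illustration, not the author's implementation: the linear `encoder`, the noise model, and the identity `predictor` are all assumptions made for clarity. The point is that a pixel-level reconstruction loss is charged for every raw dimension of the target, noise included, while an embedding-space loss only scores what the encoder keeps.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # Toy encoder: a linear projection into a low-dimensional embedding.
    # A real system would use a learned deep network here.
    return x @ W

# Hypothetical data: x is a context view, y a target view of the same scene,
# differing only by nuisance noise the model should not have to predict.
x = rng.normal(size=(8, 64))              # batch of 8 inputs, 64 raw dims
y = x + 0.1 * rng.normal(size=x.shape)    # target = input + low-level noise

W = rng.normal(size=(64, 16)) / 8.0       # shared encoder weights (64 -> 16)
P = np.eye(16)                            # predictor in embedding space
                                          # (identity, as a placeholder)

# Reconstruction objective: the model is penalized for every raw
# dimension of y, including the unpredictable noise.
recon_loss = np.mean((x - y) ** 2)

# Embedding objective: predict only the encoded target, so detail the
# encoder discards never enters the loss.
z_x, z_y = encoder(x, W), encoder(y, W)
embed_loss = np.mean((z_x @ P - z_y) ** 2)
```

In practice the encoder and predictor are trained jointly (with extra machinery to prevent the trivial collapse where the encoder maps everything to a constant), but the structural difference between the two objectives is the one shown here.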