4 Matching Annotations
  1. Last 7 days
    1. We find that optimization becomes more reliable when a small intermediate-state regularizer is added on top of token-level distillation.

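      This finding offers a useful insight: adding an intermediate-state regularization term during distillation can make optimization more reliable. It suggests that, beyond matching output distributions, preserving the geometric consistency of the internal representation trajectory also matters for model conversion. The same idea may carry over to other model conversion and distillation methods.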

  2. Aug 2023
    1. Title: Delays, Detours, and Forks in the Road: Latent State Models of Training Dynamics

      Authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho

      Note: This paper seems cool. It uses older, interpretable machine learning models (graphical models) to understand what is going on inside a deep neural network.

      Link: https://arxiv.org/pdf/2308.09543.pdf

  3. Feb 2019
  4. Jun 2015
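
To make the first annotation concrete, here is a minimal sketch of a token-level distillation loss with a small intermediate-state regularizer added on top. The annotated passage does not say what form the regularizer takes, so the mean squared error between hidden states, the `reg_weight` value of 0.1, and all function names below are assumptions chosen for illustration, not the paper's method.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) for a single token position."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def hidden_mse(teacher_hidden, student_hidden):
    """Mean squared error between two hidden-state vectors (assumed regularizer)."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(diffs) / len(diffs)


def distillation_loss(teacher_logits, student_logits,
                      teacher_hidden, student_hidden, reg_weight=0.1):
    """Token-level KL plus a small intermediate-state penalty.

    Each argument is a list with one vector per token position.
    Returns (total, kl, reg) so the two terms can be inspected separately.
    """
    n = len(teacher_logits)
    kl = sum(token_kl(t, s) for t, s in zip(teacher_logits, student_logits)) / n
    reg = sum(hidden_mse(t, s) for t, s in zip(teacher_hidden, student_hidden)) / n
    return kl + reg_weight * reg, kl, reg


# Hypothetical toy inputs: 2 token positions, vocabulary of 3, hidden size 2.
teacher_logits = [[1.0, 2.0, 0.5], [0.0, 0.0, 3.0]]
student_logits = [[0.5, 1.0, 1.5], [1.0, 0.0, 2.0]]
teacher_hidden = [[0.1, 0.2], [0.3, 0.4]]
student_hidden = [[0.0, 0.2], [0.3, 0.8]]

total, kl, reg = distillation_loss(
    teacher_logits, student_logits, teacher_hidden, student_hidden
)
print(f"kl={kl:.4f} reg={reg:.4f} total={total:.4f}")
```

Keeping `reg_weight` small mirrors the "small regularizer" wording in the quoted passage: the output-matching term stays dominant while the hidden-state term nudges the student's internal trajectory toward the teacher's.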