Hypothesis

2 Matching Annotations

May 2026
huggingface.co

https://huggingface.co/papers/2605.13301

1
1. fxp007 19 May 2026
  
  in Public
  
  The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline
  
  This passage details the methodological pipeline, emphasizing the transition from supervised fine-tuning (SFT) to reinforcement learning (RL) and the specific techniques used (reverse-perplexity curriculum, two-stage RL).
  
  methodology SFT RL
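
A minimal sketch of how a perplexity-based SFT curriculum could be ordered. The quoted passage does not define "reverse-perplexity", so this assumes it means presenting the highest-perplexity examples first. The direction is exposed as a flag in case the paper intends the opposite. The function names, the record layout, and the per-token log-probabilities are all hypothetical and are not taken from the paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of one sequence: exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def curriculum_order(examples, hardest_first=True):
    """Sort SFT examples by perplexity under some reference model.

    ASSUMPTION: "reverse-perplexity" is read here as highest perplexity first.
    Set hardest_first=False for the opposite ordering.
    """
    return sorted(
        examples,
        key=lambda ex: perplexity(ex["token_logprobs"]),
        reverse=hardest_first,
    )

# Hypothetical toy records; the log-probabilities are made up for illustration.
examples = [
    {"id": "a", "token_logprobs": [-0.1, -0.2]},
    {"id": "b", "token_logprobs": [-2.0, -1.0]},
    {"id": "c", "token_logprobs": [-0.5, -0.5]},
]

ordered_ids = [ex["id"] for ex in curriculum_order(examples)]
print(ordered_ids)  # ['b', 'c', 'a']
```

In practice the log-probabilities would come from scoring each example with a reference model before training. That step is omitted here.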

openai.com

https://openai.com/index/where-the-goblins-came-from/

1
1. fxp007 01 May 2026
  
  in Public
  
  A search through GPT‑5.5’s SFT data found many datapoints containing “goblin” and “gremlin.”
  
  A notable code example: anomalous datapoints in the SFT (supervised fine-tuning) data may reveal problems in the model's behavior.
  
  notable-code sft-data
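
The kind of search described in the highlight can be sketched as a keyword scan over text records. This is only an illustration, not the method the article's authors used. The function name, the matching rule (case-insensitive, whole word, optional plural "s"), and the sample records are all assumptions.

```python
import re
from collections import Counter

def count_term_hits(datapoints, terms):
    """Count how many datapoints mention each term at least once."""
    # Whole-word, case-insensitive match that also accepts a trailing plural "s".
    patterns = {
        term: re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
        for term in terms
    }
    hits = Counter()
    for text in datapoints:
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[term] += 1
    return hits

# Hypothetical toy records, not real SFT data.
sample = [
    "The goblin in the parser ate my semicolon.",
    "Gremlins caused the flaky test.",
    "Here is a plain answer about sorting.",
]

hits = count_term_hits(sample, ["goblin", "gremlin"])
print(dict(hits))  # {'goblin': 1, 'gremlin': 1}
```

Counting datapoints rather than total occurrences matches the highlight's wording ("many datapoints containing"). A per-occurrence count would use `len(pattern.findall(text))` instead.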
