We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
We use the same model and architecture as GPT-2
What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.