We use a decoder-only Transformer architecture [Vaswani et al., 2017] from the GPT-2 family
a random function f
a random function not many or several