Experiments across six game environments show that COSPLAY, with an 8B base model, achieves an average reward improvement of over 25.1% against four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.