5 Matching Annotations
  1. Nov 2020
    1. we designed a reward function that is based on a game-balancing constant and introduce it into the Proximal-Policy-Optimization (PPO) (Schulman et al., 2017) algorithm, a reinforcement learning method that directly optimizes the policy using gradient-based learning.

      *Key point: reward function + PPO

    2. still because the player may have a wrong vision of its own abilities (Missura and Gärtner, 2009)

      Need to figure out what this means ??

    3. remains inside a range around this constant during the training

      Figure out what this sentence means

    4. a reward function based on a balancing constant

      Look into the details of the reward function

    5. how to act while still maintaining the balancing

      What "while still maintaining the balancing" means..
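
Study sketch for the notes above: one possible reading of "a reward function based on a balancing constant" that "remains inside a range around this constant". This is not the paper's actual formula. The function name `balancing_reward`, the `performance_gap` measure, and the `tolerance` width are all hypothetical, chosen only to illustrate the idea that the agent is rewarded for keeping some measure of the game close to a target constant rather than for winning outright.

```python
def balancing_reward(performance_gap: float,
                     balancing_constant: float,
                     tolerance: float = 1.0) -> float:
    """Hypothetical reward that peaks when the gap equals the constant.

    performance_gap:    assumed measure of how far ahead the agent is
                        (for example, a score or health difference).
    balancing_constant: the target value the gap should stay near.
    tolerance:          assumed half-width of the acceptable range.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    # Distance between the current gap and the target constant.
    distance = abs(performance_gap - balancing_constant)
    # Reward is 1.0 at the target, 0.0 at the edge of the range,
    # and is clipped at -1.0 far outside the range.
    return max(-1.0, 1.0 - distance / tolerance)


if __name__ == "__main__":
    # Gap exactly on target, halfway to the edge, and far outside.
    for gap in (0.0, 0.5, 5.0):
        print(gap, balancing_reward(gap, balancing_constant=0.0))
```

Under this reading, a PPO agent trained on such a reward would learn "how to act while still maintaining the balancing", because actions that push the gap away from the constant in either direction lower the reward.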