I am working with RL algorithms such as DQN and Actor-Critic, and I'm wondering whether there is a correct way to scale the features that represent a state (or a state-action pair) while learning the parameters of the value-function approximator and the policy approximator.
In supervised learning we generally scale features over the whole training set, which makes the objective better conditioned and shortens training. We store the mean and variance (i.e. z-score normalization) and apply them to the test/CV set during validation.
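For reference, the supervised-learning procedure described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names `fit_zscore` and `apply_zscore`, the toy arrays, and the small `eps` guard against constant features are hypothetical choices for illustration, not part of any particular library.

```python
import numpy as np

def fit_zscore(X_train, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    return mean, std + eps  # eps avoids division by zero for constant features

def apply_zscore(X, mean, std):
    """Scale any split (train, cv, test) with the statistics stored from training."""
    return (X - mean) / std

# Hypothetical toy data: two features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 250.0]])

mean, std = fit_zscore(X_train)          # statistics come from training data only
Z_train = apply_zscore(X_train, mean, std)
Z_test = apply_zscore(X_test, mean, std)  # test data reuses the stored statistics
```

The key point is that the statistics are computed once, on a fixed dataset, and then frozen.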
In RL, however, data is obtained dynamically through agent-environment interaction, so DQN's replay memory (buffer) is updated at every timestep.
Yet, just as in supervised learning, features in RL still need to be normalized when they are on different scales.
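To illustrate why the dynamic setting is awkward: statistics fitted once go stale as new states keep arriving. One way to keep a per-feature mean and variance current is to update them incrementally with Welford's algorithm. This is a minimal sketch, not a claim that it is the standard procedure for DQN or Actor-Critic; the class name `RunningMeanStd` is hypothetical, and the random Gaussian states are an assumed stand-in for states returned by a real environment.

```python
import numpy as np

class RunningMeanStd:
    """Per-feature mean/variance updated one observation at a time (Welford's algorithm).

    Hypothetical helper for illustration only.
    """

    def __init__(self, n_features):
        self.count = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)  # running sum of squared deviations from the mean

    def update(self, x):
        x = np.asarray(x, dtype=np.float64)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def var(self):
        # Population variance (ddof=0); all zeros before any sample is seen.
        return self.m2 / max(self.count, 1)

    def normalize(self, x, eps=1e-8):
        # z-score with the statistics accumulated so far
        return (np.asarray(x) - self.mean) / np.sqrt(self.var + eps)

rng = np.random.default_rng(0)
stats = RunningMeanStd(n_features=3)
seen = []
for t in range(1000):
    # Assumed stand-in for a state from env.step(...): features on very different scales.
    state = rng.normal(loc=[0.0, 5.0, -50.0], scale=[1.0, 10.0, 100.0])
    stats.update(state)
    seen.append(state)

scaled = stats.normalize(seen[-1])  # normalize the newest state with current statistics
```

After every step the incremental statistics equal a batch computation over all states seen so far. Because they keep moving, though, normalizing at sampling time means the same stored transition can map to a different network input later in training.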
Is there a standard procedure for scaling features correctly in DQN and Actor-Critic methods specifically, given the dynamic nature of RL?