Feature Scaling in Reinforcement Learning

I am working with RL algorithms such as DQN and Actor-Critic, and I'm curious whether there is a way to correctly scale the features that represent a state or state-action pair while learning the parameters of the value-function and policy approximators.

In supervised learning we generally scale features over the whole training set to improve the conditioning of the objective function and reduce training time, storing the mean and variance (i.e. z-score normalization) and applying them to the test/CV set during validation.
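For reference, this is the supervised-learning workflow I mean; a minimal sketch where `X_train` and `X_test` are placeholder feature matrices:

```python
import numpy as np

# Placeholder data standing in for any feature matrix.
X_train = np.random.randn(1000, 4) * 10 + 5
X_test = np.random.randn(200, 4) * 10 + 5

# Fit the statistics on the training set only...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-8  # small epsilon avoids division by zero

# ...then reuse the stored statistics on the test/CV set.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```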

In RL, however, data is obtained dynamically through agent-environment interaction, so DQN's replay buffer is updated at every timestep and there is no fixed training set on which to fit the statistics up front.
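One workaround I have considered (I don't know whether it is the "correct" or standard one, which is exactly my question) is to keep running statistics that are updated online as observations arrive, e.g. with Welford's algorithm, and normalize each state before it is fed to the network:

```python
import numpy as np

class RunningNormalizer:
    """Tracks a running mean/variance of observed states (Welford's algorithm)."""

    def __init__(self, obs_dim, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(obs_dim)
        self.m2 = np.zeros(obs_dim)  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        # Called once per timestep as each new observation arrives.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        var = self.m2 / max(self.count - 1, 1)
        return (obs - self.mean) / np.sqrt(var + self.eps)

# Hypothetical usage inside the interaction loop:
# normalizer.update(state)
# q_values = q_network(normalizer.normalize(state))
```

What I'm unsure about is whether the shifting statistics invalidate the transitions already stored in the replay buffer, and whether the same scheme applies to the critic and the policy in Actor-Critic.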

Normalizing features that have different scales is just as necessary in RL as it is in supervised learning.

Is there a standard procedure for scaling features correctly in DQN and Actor-Critic methods specifically, given the dynamic nature of RL?
