How to reward the simulation in MCTS search

I have a question about the simulation step of MCTS: how should a node be rewarded when the simulation ends in a win, a loss, or a draw? According to a blog post about AlphaZero, the selection score is:

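QU = Wi/Ni + C * Pi * sqrt(N) / (1 + Ni)

Here Wi is the total reward of child i, Ni is its visit count, Pi is its prior probability, N is the visit count of the parent, and C is the exploration constant.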

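Suppose the reward is Wi + 1 for a win, Wi - 1 for a loss, and Wi + 0 for a draw.

Then when Wi becomes negative, e.g. -4, actions with Pi == 0 also get searched, because their score of 0 is higher than a negative Wi/Ni. That does not seem right: Pi == 0 means the action should have essentially no probability of being selected.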

Suppose instead the reward is Wi + 1 for a win, Wi + 0.5 for a draw, and Wi + 0 for a loss. Then Wi/Ni is about 0.5 for a node that easily reaches a draw, which is much larger than the value of a node that has not been searched yet (where Wi == 0).
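To make the two cases concrete, here is a minimal sketch of the score calculation. It assumes the exploration term uses the square root of the parent visit count, as in the AlphaZero paper, and that Wi/Ni is taken as 0 for an unvisited node. The function name `puct_score`, the value C = 1.0, and all visit counts and priors are made-up numbers for illustration only.

```python
import math

def puct_score(w, n, p, parent_n, c=1.0):
    """Wi/Ni + C * Pi * sqrt(N) / (1 + Ni), with Wi/Ni treated as 0 when Ni == 0 (an assumption)."""
    q = w / n if n > 0 else 0.0
    u = c * p * math.sqrt(parent_n) / (1 + n)
    return q + u

# Scheme 1: win +1, loss -1, draw 0.
# A child visited 8 times with total reward -4 and prior 0.1 (hypothetical numbers).
visited_losing = puct_score(w=-4, n=8, p=0.1, parent_n=8)
# An unvisited child whose prior is exactly 0.
unvisited_zero_prior = puct_score(w=0, n=0, p=0.0, parent_n=8)
print("scheme 1:", visited_losing, "vs", unvisited_zero_prior)

# Scheme 2: win +1, draw +0.5, loss 0.
# A child visited 8 times, every simulation a draw, so Wi = 4.
visited_drawn = puct_score(w=4, n=8, p=0.1, parent_n=8)
# An unvisited child with the same prior 0.1.
unvisited = puct_score(w=0, n=0, p=0.1, parent_n=8)
print("scheme 2:", visited_drawn, "vs", unvisited)
```

With these numbers, scheme 1 ranks the zero-prior action above the visited losing one, and scheme 2 ranks the drawn node above the unvisited one, which are the two behaviours I am asking about.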
