Q-learning computation: the states are unknown
I am confused about how to implement a simple Q-learning algorithm. I am referring to this nice documentation: http://artint.info/html/artint_265.html.
The given formula is:
Q[s,a] ← Q[s,a] + α(r + γ max_a' Q[s',a'] - Q[s,a])
The problem is that the states are unknown, because I am trying to learn Flappy Bird's successful moves. To update Q[s,a],
I need to know the value of Q[s',a'].
If I don't know the next state, how do I compute the Q function? Assuming the state is described by the distance between the bird and the nearest pipe, how do I compute the current Q function?
Thanks for the help!
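s' is the current state.
s is the previous state.
max_a' Q[s', a'] is the value of the best action in the current state.
In other words, the update is applied one step late: after taking action a in state s and observing the reward r and the resulting state s', you update the entry for the previous pair, Q[s,a]. You never need to know a state before you have actually reached it.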
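
To make this concrete, here is a minimal sketch of a tabular update for this kind of setup. It is only an illustration under assumptions that are not in the question: the state is taken to be the horizontal and vertical pixel distance to the nearest pipe, bucketed into a coarse grid, and the names `discretize`, `choose_action`, `update`, the action labels, and the values of `ALPHA` and `GAMMA` are all hypothetical.

```python
import random
from collections import defaultdict

ALPHA = 0.1                  # learning rate (assumed value)
GAMMA = 0.9                  # discount factor (assumed value)
ACTIONS = ("flap", "wait")   # hypothetical action labels

# Q-table keyed by (state, action); unseen pairs default to 0.0
Q = defaultdict(float)


def discretize(dx, dy, bin_size=10):
    """Bucket raw pixel distances to the nearest pipe into a grid cell.

    This state encoding is an assumption made for the sketch.
    """
    return (int(dx // bin_size), int(dy // bin_size))


def choose_action(state, epsilon=0.1):
    """Epsilon-greedy choice over the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def update(prev_state, action, reward, state):
    """Update the entry for the PREVIOUS state once the current one is observed."""
    # max_a' Q[s', a']: value of the best action in the current state
    best_next = max(Q[(state, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - Q[(prev_state, action)]
    Q[(prev_state, action)] += ALPHA * td_error


# One step of experience: act in s, observe r and s', then update Q[s, a]
s = discretize(120, -35)
a = "flap"
s_next = discretize(112, -20)
update(s, a, 1.0, s_next)
```

With these assumed constants and an empty table, that single update sets Q[s, "flap"] to 0.1. In a real game loop you would call `choose_action` each frame, keep the previous state and action around, and call `update` as soon as the next frame's state and reward are available.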