Q-learning computation when the next state is unknown
I'm confused about how to implement a simple Q-learning algorithm. I'm referring to this nice documentation: http://artint.info/html/artint_265.html.
The given formula is:

$$Q[s,a] \leftarrow Q[s,a] + \alpha\left(r + \gamma \max_{a'} Q[s',a'] - Q[s,a]\right)$$

The problem is that the states are unknown, because I'm trying to learn Flappy Bird's successful moves. To update Q[s,a] I need to know the value of Q[s',a'], but if I don't know the next state, how can I compute the Q function? Assuming the state is described by the distance between the bird and the nearest pipe, how do I compute the current Q function?
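A minimal Python sketch of the update above for a tabular Q function. It assumes states and actions are hashable keys (for example a discretized bird-to-pipe distance), stores Q in a dict that defaults unseen pairs to 0.0, and uses α = 0.1 and γ = 0.9 as illustrative values; the action names `"flap"` and `"noop"` are hypothetical.

```python
from collections import defaultdict

# Hypothetical action set for illustration only.
ACTIONS = ("flap", "noop")


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to Q[(s, a)] in place.

    Q maps (state, action) -> value; a defaultdict(float) makes unseen pairs 0.0.
    Returns the updated value of Q[(s, a)].
    """
    # max_a' Q[s', a']: value of the best action in the state that followed.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Temporal-difference target r + gamma * max_a' Q[s', a'].
    td_target = r + gamma * best_next
    # Move Q[s, a] a fraction alpha of the way toward the target.
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)
```

For example, if `Q[("s1", "flap")]` is 1.0 and everything else is 0, calling `q_update(Q, "s0", "noop", 0.0, "s1", ACTIONS)` moves `Q[("s0", "noop")]` from 0 to roughly 0.09 (0.1 × 0.9 × 1.0).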
Thanks for your help!
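In the update, s' is the current state and s is the previous state, and max_a' Q[s', a'] is the value of the best action in the current state. In other words, you don't need to know the next state in advance: you apply the update for the previous state s (and the action a you took there) once you have received the reward r and observed the state s' that followed.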
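Because s is the previous state and s' the one just observed, the update for a state is applied one step late, after the game has moved on. A minimal Python sketch of that loop follows. `observe` and `act` are hypothetical hooks into the game (not a real Flappy Bird API); the ε-greedy action choice and the rule that a terminal state contributes no future value are standard Q-learning additions not stated in the original.

```python
import random
from collections import defaultdict

# Hypothetical action set for illustration only.
ACTIONS = ("flap", "noop")


def choose_action(Q, s, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def run_episode(Q, observe, act, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=1000):
    """Run one episode of online Q-learning.

    observe() -> current (discretized) state; act(a) -> (reward, done).
    Both are hypothetical hooks supplied by the caller.
    """
    s = observe()
    a = choose_action(Q, s, epsilon)
    for _ in range(max_steps):
        r, done = act(a)      # take action a in state s
        s_next = observe()    # only now is s' known
        if done:
            target = r        # assumed: no future value after a terminal state
        else:
            target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        # Update the PREVIOUS state-action pair (s, a) using the new state s'.
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if done:
            break
        s, a = s_next, choose_action(Q, s_next, epsilon)
    return Q
```

Keeping `s` and `a` in variables across iterations is the whole trick: the "unknown next state" is simply whatever `observe()` returns after the action has been taken.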