Q-learning computation: states unknown


I am confused about how to implement a simple Q-learning algorithm. I am referring to this nice documentation: http://artint.info/html/artint_265.html.

The given formula is:

Q[s,a] ← Q[s,a] + α (r + γ max_a' Q[s',a'] - Q[s,a])

The problem is that the states are unknown, because I am trying to learn Flappy Bird's successful moves. Q[s,a] needs the value of Q[s',a'], but if I don't know the next state, how do I compute the Q function? Assuming the state is described by the distance between the bird and the nearest pipe, how do I compute the current Q function?

Thanks for your help!

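s' is the current state and s is the previous state. max_a' Q[s', a'] is the value of the best action in the current state. In other words, you update Q[s,a] only after you have taken action a in state s and observed the reward r and the resulting state s', so you never need to know the next state in advance.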
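
To make that concrete, here is a minimal sketch of the update in Python. It is not taken from the linked page: the action names, the values of alpha and gamma, the bucket size, and the helpers `new_q_table`, `discretize` and `update` are all assumptions made up for illustration. The state is taken to be the bird-to-pipe distance rounded into coarse buckets, and states that have not been seen yet start at 0.

```python
from collections import defaultdict

ALPHA = 0.5                  # learning rate (assumed value)
GAMMA = 0.9                  # discount factor (assumed value)
ACTIONS = ("flap", "wait")   # hypothetical action names


def new_q_table():
    """Q table where every unseen state starts with 0.0 for each action."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def discretize(distance, bucket=10):
    """Map a continuous distance to a coarse integer state (assumed scheme)."""
    return int(distance // bucket)


def update(q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Apply Q[s,a] <- Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]).

    s      -- previous state (where action a was taken)
    s_next -- state observed after the action, i.e. the current state s'
    """
    best_next = max(q[s_next].values())  # value of the best action in s'
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q[s][a]
```

In a game loop you would remember the previous state and action, let the game advance one frame, read the reward and the new distance, and only then call `update(q, prev_state, prev_action, reward, discretize(new_distance))`. The next state is observed, not predicted.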

