generalized returns

generalized returns

Estimating Fractional Returns in Markov Decision Processes

April 28, 2017

reinforcement learning, generalized returns, temporal difference learning

(Just some quick thoughts before I return to typesetting my thesis… I will return to this to flesh things out once I’ve gotten a bit more sleep and organized my thoughts) I am writing my thesis on estimating the variance of the return in Markov Decision Processes using online incremental algorithms, which turns out to a surprisingly complex problem. Having an estimate of the variance is generally agreed to be a good thing, according to a random sampling (N=2) of statisticians I interviewed when writing this blog post. ...

Limiting Entropy as a Return

March 23, 2017

generalized returns, reinforcement learning, temporal difference learning, math

\def\Pr#1{\mathbb{P} \left( #1 \right)} (In progress, just some quick notes until I’m done my thesis) Expressing Entropy as a Return # It’s a fun exercise Learning the limiting entropy of a state online may be useful There is probably some interesting analysis that could be done So, let’s look at discrete Markov Processes and show that there’s a recursive Bellman-like equation for entropy. Note that this is different from the entropy rate Quick rundown of relevant MDP basics # For a Markov chain, we have that the probability of seeing a particular sequence of states, say(s_0, s_1, s_2, \ldots, s_n), is: ...