## Hugo Customization

##### April 21, 2021

Some notes from the process of customizing my Hugo site.


...A quick guide to setting up Docker to leverage NVIDIA CUDA GPUs on Ubuntu Linux. (Tested with Ubuntu 20.04 and CUDA 11.2).

...Our paper on “Chutes and Ladders” appears at the “prestigious” SIGBOVIK “conference”.

...Metaclasses are terrific, in the sense that they’re a powerful tool for programming, but also in that they should inspire a bit of terror. In this post, I talk about an example from my own work that fits both criteria.

...This post sat in my “drafts” folder for many moons, mostly done, so I finished it and then threw it online. There are probably lacunae where I meant to explain something, didn’t leave a note in the draft, and then failed to notice. However, you (the handful of readers, dozens of webcrawlers, and hundreds of automated vulnerability scanners) can now email me to tell me about it (never mind, I can’t access that account anymore). ...

The paper on max entropy RL (Reinforcement Learning with Deep Energy-Based Policies) is pretty cool, but some of the proofs are lacking explanation. This is papered over by the standard “it is easy to see…” evasions that mathematicians take such delight in. Usually, space restrictions are the culprit, but I wish authors would either provide more detail or at least a specific reference (including the page number!) for where I can learn more. ...

(Just some quick thoughts before I return to typesetting my thesis. I will come back to flesh things out once I’ve gotten a bit more sleep and organized my thoughts.) I am writing my thesis on estimating the variance of the return in Markov Decision Processes using online incremental algorithms, which turns out to be a surprisingly complex problem. Having an estimate of the variance is generally agreed to be a good thing, according to a random sampling (N=2) of statisticians I interviewed when writing this blog post. ...
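One well-known way to estimate this incrementally (not necessarily the approach taken in the thesis) uses the fact that, when $v$ is the true value function, the variance of the return satisfies its own Bellman-like equation, $\mathrm{Var}(s) = \mathbb{E}[\delta_t^2 + \gamma^2 \mathrm{Var}(S_{t+1}) \mid S_t = s]$. It can therefore be learned with TD, using $\delta_t^2$ as the "reward" and $\gamma^2$ as the discount. The sketch below is a tabular toy under assumed settings: a hypothetical two-state chain, a constant $\gamma$, and $1/n$ step sizes. The function name is made up for illustration.

```python
import random


def estimate_value_and_variance(num_episodes=20000, gamma=0.5, seed=0):
    """Tabular sketch: learn v(s) with TD(0), and Var(s) with a second TD
    estimator whose 'reward' is delta**2 and whose discount is gamma**2.

    Hypothetical toy chain: s0 -> s1 -> terminal. Leaving s0 gives reward 0;
    leaving s1 gives reward 0 or 2 with equal probability.
    """
    rng = random.Random(seed)
    v = [0.0, 0.0]      # value estimates for s0, s1
    var = [0.0, 0.0]    # variance-of-return estimates for s0, s1
    visits = [0, 0]     # visit counts, used for 1/n step sizes
    for _ in range(num_episodes):
        for s in (0, 1):
            reward = 0.0 if s == 0 else rng.choice((0.0, 2.0))
            terminal = s == 1
            next_v = 0.0 if terminal else v[s + 1]      # terminal value is 0
            next_var = 0.0 if terminal else var[s + 1]  # terminal variance is 0
            delta = reward + gamma * next_v - v[s]
            visits[s] += 1
            alpha = 1.0 / visits[s]
            # Variance target: squared TD error plus gamma^2-discounted next variance.
            var[s] += alpha * (delta ** 2 + gamma ** 2 * next_var - var[s])
            v[s] += alpha * delta
    return v, var


v, var = estimate_value_and_variance()
```

For this chain the exact answers are $v = (0.5, 1.0)$ and $\mathrm{Var} = (0.25, 1.0)$, which the estimates approach. Because $\delta$ is computed from the *estimated* value, the variance estimate is only as good as $v$; that coupling is one source of the complexity.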

(In progress, just some quick notes until I’m done my thesis.) Expressing Entropy as a Return: it’s a fun exercise, learning the limiting entropy of a state online may be useful, and there is probably some interesting analysis that could be done. So, let’s look at discrete Markov processes and show that there’s a recursive Bellman-like equation for entropy. Note that this is different from the entropy rate. Quick rundown of relevant MDP basics: for a Markov chain, the probability of seeing a particular sequence of states, say $(s_0, s_1, s_2, \ldots, s_n)$, is: ...
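A quick numerical sanity check of such a recursion, assuming the per-state "reward" is the entropy of that state's next-state distribution, $h(s) = -\sum_{s'} P(s' \mid s)\log P(s' \mid s)$ (one natural setup; the post's own formulation may differ). By the chain rule for entropy and the Markov property, the entropy of the next $n$ states given $S_0 = s$ satisfies $H_n(s) = h(s) + \sum_{s'} P(s' \mid s) H_{n-1}(s')$, and a discount $\gamma < 1$ keeps the infinite-horizon version finite. The helper names and example chains are hypothetical.

```python
import math
from itertools import product


def sequence_probability(p0, P, states):
    """Pr(s_0, ..., s_n) = p0[s_0] * prod_k P[s_k][s_{k+1}] for a Markov chain."""
    prob = p0[states[0]]
    for s, s_next in zip(states, states[1:]):
        prob *= P[s][s_next]
    return prob


def step_entropy(P):
    """h(s) = -sum_{s'} P(s'|s) log P(s'|s), in nats: the per-step 'reward'."""
    return [-sum(p * math.log(p) for p in row if p > 0) for row in P]


def entropy_values(P, n, gamma=1.0):
    """Apply H(s) <- h(s) + gamma * sum_{s'} P(s'|s) H(s') n times from H = 0.

    With gamma = 1 this is the entropy of (S_1, ..., S_n) given S_0 = s."""
    h = step_entropy(P)
    H = [0.0] * len(P)
    for _ in range(n):
        H = [h[s] + gamma * sum(P[s][t] * H[t] for t in range(len(P)))
             for s in range(len(P))]
    return H


def trajectory_entropy(P, s0, n):
    """Brute-force check: entropy of (S_1, ..., S_n) given S_0 = s0, by enumeration."""
    start = [1.0 if s == s0 else 0.0 for s in range(len(P))]
    total = 0.0
    for path in product(range(len(P)), repeat=n):
        p = sequence_probability(start, P, (s0,) + path)
        if p > 0:
            total -= p * math.log(p)
    return total


P = [[0.9, 0.1], [0.3, 0.7]]
# The recursion and the brute-force enumeration agree up to floating-point error.
print(entropy_values(P, 4)[0], trajectory_entropy(P, 0, 4))
```

The enumeration grows as $|S|^n$, so it is only a check; the recursion is what makes an online, TD-style estimate of the (discounted) entropy plausible.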

This bit of light entertainment emerged out of a paper I worked on estimating variance via the TD-error. We have previously looked at how the discounted sum of TD-errors is just another way of expressing the bias of our value function. We’ll quickly recapitulate that proof, but extend it for the λ-return and general value functions because I’m a sadist.¹ The δ-return: we first note the definitions of the TD-error (for time-step $t$, we denote it as $\delta_{t}$) and the λ-return, $G_{t}^{\lambda}$. ...
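The identity can be checked numerically. This sketch assumes constant $\gamma$ and $\lambda$ (so it covers the λ-return but not the state-dependent discounts of general value functions), a fixed value estimate $v$, and the recursive form $G_t^{\lambda} = r_{t+1} + \gamma[(1-\lambda)v(s_{t+1}) + \lambda G_{t+1}^{\lambda}]$. Under those assumptions the λ-return error telescopes into $G_t^{\lambda} - v(s_t) = \sum_{k \ge 0} (\gamma\lambda)^k \delta_{t+k}$ over the rest of the episode. The function names and episode data are made up for illustration.

```python
def td_errors(rewards, values, gamma):
    """delta_t = r_{t+1} + gamma * v(s_{t+1}) - v(s_t).

    values[t] is v(s_t); values has one more entry than rewards, and the last
    entry is the value of the final state (0 for a terminal state)."""
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def lambda_returns(rewards, values, gamma, lam):
    """G_t^lambda = r_{t+1} + gamma * ((1 - lam) * v(s_{t+1}) + lam * G_{t+1}^lambda),
    computed backwards, bootstrapping from the final state's value."""
    returns = [0.0] * len(rewards)
    G = values[-1]
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * G)
        returns[t] = G
    return returns


def discounted_td_sums(deltas, gamma, lam):
    """sum_k (gamma * lam)^k * delta_{t+k} for every t, computed backwards."""
    sums = [0.0] * len(deltas)
    acc = 0.0
    for t in reversed(range(len(deltas))):
        acc = deltas[t] + gamma * lam * acc
        sums[t] = acc
    return sums


rewards = [1.0, 0.0, 2.0, -1.0]
values = [0.5, 1.2, -0.3, 0.8, 0.0]  # arbitrary (not converged) estimates; last is terminal
gamma, lam = 0.9, 0.7
G = lambda_returns(rewards, values, gamma, lam)
S = discounted_td_sums(td_errors(rewards, values, gamma), gamma, lam)
# G[t] - values[t] matches S[t] up to floating-point error, for every t.
```

Note that the identity holds for *any* fixed $v$, not just the true value function; setting $\lambda = 1$ recovers the Monte Carlo case.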

One of the fundamental ideas in reinforcement learning is the temporal difference (TD) error, which is the difference between the reward received plus the discounted value of the next state, and the value of the current state. That may sound abstract, but effectively it’s what you actually got plus what you’re now expecting next, minus what you expected. Okay, that still sounds incomprehensible to anyone who’s not already familiar with RL, so instead here’s an equation. ...
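Using the standard convention $\delta_t = r_{t+1} + \gamma v(s_{t+1}) - v(s_t)$, here is a minimal sketch of computing the TD error and using it in a tabular TD(0) update, which nudges $v(s_t)$ by a step size times $\delta_t$. The chain, rewards, step size, and function names are hypothetical, chosen so the true values are easy to check by hand: $v(s_2) = 1$, $v(s_1) = 1 + 0.5 \cdot 1 = 1.5$, and $v(s_0) = 1 + 0.5 \cdot 1.5 = 1.75$.

```python
def td_error(reward, v_s, v_next, gamma):
    """delta = (what you got + discounted value of what's next) - what you expected."""
    return reward + gamma * v_next - v_s


def td0_chain(num_episodes=200, gamma=0.5, alpha=0.5):
    """Tabular TD(0) on a hypothetical deterministic chain
    s0 -> s1 -> s2 -> terminal, with reward 1 on every transition."""
    v = [0.0, 0.0, 0.0]
    for _ in range(num_episodes):
        for s in range(3):
            v_next = v[s + 1] if s + 1 < 3 else 0.0  # terminal state has value 0
            v[s] += alpha * td_error(1.0, v[s], v_next, gamma)
    return v


print(td_error(1.0, 2.0, 3.0, 0.9))  # about 1.7: got 1, expect 2.7 more, expected only 2
print(td0_chain())                   # approaches [1.75, 1.5, 1.0]
```

A positive $\delta$ means things went better than expected, so the update raises $v(s_t)$; a negative one lowers it.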