Can a reinforcement learning (RL) agent remember and learn from the past, just like a human? Short answer: yes, but only after selecting relevant and valuable experiences from its memory. I am glad to announce my new paper and its open-source project, both built on my research on variance reduction based experience replay (VRER). Long story short, VRER is a generic experience replay method with provable sample efficiency. VRER makes the RL agent remember by selectively replaying past experiences; this selective mechanism adaptively filters out samples that are outdated, irrelevant, or unstable. Our empirical study shows that VRER substantially improves state-of-the-art policy optimization algorithms, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO), in both convergence speed and robustness.
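To make the idea of "selectively replaying past experiences" concrete, here is a minimal sketch of a replay buffer that filters stale samples. It uses the importance-sampling likelihood ratio between the current policy and the behavior policy that generated each sample as a simple proxy for the selection rule; the class name, the ratio threshold, and the overall structure are illustrative assumptions, not the actual VRER criterion from the paper.

```python
class SelectiveReplayBuffer:
    """Toy replay buffer that keeps only experiences whose likelihood
    ratio under the current policy stays within a bound -- a simple
    stand-in for a variance-reduction-style selection rule."""

    def __init__(self, max_ratio=2.0, capacity=1000):
        # max_ratio is a hypothetical threshold: samples whose ratio
        # drifts outside [1/max_ratio, max_ratio] are treated as outdated.
        self.max_ratio = max_ratio
        self.capacity = capacity
        self.buffer = []  # entries: (state, action, reward, behavior_prob)

    def add(self, state, action, reward, behavior_prob):
        # Drop the oldest sample once capacity is reached (FIFO).
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append((state, action, reward, behavior_prob))

    def select(self, current_prob_fn):
        """Return experiences whose importance ratio
        pi_current(a|s) / pi_behavior(a|s) lies within the bound."""
        kept = []
        for state, action, reward, behavior_prob in self.buffer:
            ratio = current_prob_fn(state, action) / behavior_prob
            if 1.0 / self.max_ratio <= ratio <= self.max_ratio:
                kept.append((state, action, reward, ratio))
        return kept


# Usage: the second sample's ratio (0.8 / 0.1 = 8.0) exceeds the
# bound, so it is filtered out as outdated.
buf = SelectiveReplayBuffer(max_ratio=2.0)
buf.add("s0", "a0", 1.0, 0.5)  # ratio 0.8 / 0.5 = 1.6 -> kept
buf.add("s1", "a1", 0.0, 0.1)  # ratio 0.8 / 0.1 = 8.0 -> filtered
kept = buf.select(lambda s, a: 0.8)
```

In an actual RL training loop, `current_prob_fn` would query the current policy network, and the selection rule would follow the paper's variance-based criterion rather than a fixed ratio bound.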