Recent advances in Reinforcement Learning (RL) have produced agents that can learn to solve complex tasks, such as mastering the game of Go, in relatively few trial-and-error attempts. The number of attempts is commonly measured against the dimensionality of the space of states the agents can be in, or of the space of actions they can perform in those states. The ability to efficiently solve tasks whose state and action spaces grow exponentially with the number of parameters describing them is referred to in the literature as “tackling the curse of dimensionality” or “generalization”. Projective Simulation (PS) (Briegel and De las Cuevas, 2012), a physics-inspired framework for the design of intelligent agents, provides mechanisms that enable generalization through its Episodic Compositional Memory (ECM) (Melnikov et al., 2017).
On the other hand, Reflecting PS (RPS), an extension of the basic PS model, achieves a quadratic quantum speed-up of the agents' underlying deliberation process, but only for a restricted, non-generalizing form of ECM (Paparo et al., 2014). No applicable method is currently known that achieves a similar quantum speed-up for PS agents with generalization. In this work, we draw on a solution to the curse of dimensionality first introduced by Sallans and Hinton (2004), which relies on Boltzmann machines, a type of energy-based recurrent neural network, and on Markov chain Monte Carlo (MCMC) methods. We first establish a direct connection between RPS and MCMC methods for Reinforcement Learning. We then explore the design of quantum algorithms that yield a quantum speed-up of the deliberation process of our newly defined RPS agents.

Briegel, Hans J., and Gemma De las Cuevas. “Projective simulation for artificial intelligence.” Scientific reports 2 (2012): 400.
Melnikov, Alexey A., et al. “Projective simulation with generalization.” Scientific reports 7.1 (2017): 14430.
Paparo, Giuseppe D., et al. “Quantum speedup for active learning agents.” Physical Review X 4.3 (2014): 031002.
Sallans, Brian, and Geoffrey E. Hinton. “Reinforcement learning with factored states and actions.” Journal of Machine Learning Research 5.Aug (2004): 1063-1088.