I find it inspiring how mathematicians distill aspects of real life into an applicable and usually generalizable understanding. A problem that has always fascinated me is learning, the acquisition of new knowledge by some means, and whether it is possible to represent that process mathematically. As it turns out, for certain kinds of problems it is possible, provided the interactions are framed in the context of an environment that gives feedback on performance. The field of reinforcement learning seeks solution methods for such problems. It provides a mathematical framework for understanding the probabilistic interplay between a decision-making agent and a reward-giving environment, and is heavily influenced by biology and cognitive science. In this report, the simple problem of an agent learning to navigate a maze is explored in the context of reinforcement learning. In particular, I wanted to explore how predicting its own future performance can help an agent learn faster, and which timescales of prediction give the best results.
Nexting is a psychological concept that describes the phenomenon of people making large numbers of small-scale predictions about their immediate future. For example, when walking, a person is unconsciously aware of where their foot is going to land, and as a result keeps walking. In a sense, Nexting constitutes a basic and intrinsic understanding of an environment, something an intelligent agent should emulate. This project attempts to take Nexting to the extreme by predicting massive amounts of sensory input over multiple timescales, which in effect asks the question: if I make this good move now, will my future available moves and my overall performance also be good? To test this idea in practice, a simulation of fish learning to navigate a maze was created in Unity, which, whilst a challenging environment to work in, allows simulations to be easily communicated and understood.
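To make "predicting sensory input over multiple timescales" concrete, the following is a minimal sketch, not the implementation used in this report. It assumes a hypothetical toy setting: a ring of states visited in order, one-hot state features (so linear temporal-difference learning reduces to a table), and a scalar sensor reading (the "cumulant") observed on entering each state. One TD(0) prediction is learned per discount rate `gamma`. A discount of `gamma` corresponds to a timescale of roughly `1 / (1 - gamma)` steps. The function name `learn_nexting_predictions` and all parameters are illustrative.

```python
import numpy as np


def learn_nexting_predictions(cumulants, gammas, n_steps=20000, alpha=0.1):
    """Learn one TD(0) prediction per discount rate (hypothetical toy setup).

    States 0..n-1 are visited in order and wrap around. Moving from state s
    to s_next yields the sensor reading cumulants[s_next]. With one-hot
    features, each prediction's weight vector is simply a table over states.
    """
    gammas = np.asarray(gammas, dtype=float)
    n = len(cumulants)
    # One row of weights per timescale; all rows learn from the same transitions.
    weights = np.zeros((len(gammas), n))
    s = 0
    for _ in range(n_steps):
        s_next = (s + 1) % n
        c = cumulants[s_next]
        # TD error for every timescale at once: c + gamma * v(s') - v(s).
        delta = c + gammas * weights[:, s_next] - weights[:, s]
        weights[:, s] += alpha * delta
        s = s_next
    return weights


if __name__ == "__main__":
    # The sensor fires only on entering state 4 of a 5-state ring.
    preds = learn_nexting_predictions([0, 0, 0, 0, 1], gammas=[0.0, 0.5, 0.9])
    for g, row in zip([0.0, 0.5, 0.9], preds):
        print(f"gamma={g}: {np.round(row, 3)}")
```

With `gamma = 0` the prediction in each state is just the next sensor reading, while larger values of `gamma` anticipate the reading further ahead and discount it by distance. Because every timescale is updated from the same transition, adding more predictions costs little, which is what makes predicting many signals at many timescales feasible.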
While the results tended to be positive, there is still much work to be done in improving the overall feature representation and, potentially, in moving to a real-world simulation.
University of Western Australia