
The world is full of infinite
reasons that were never in experience.
In broad terms, this dissertation is about decision making under uncertainty. At each stage, a decision-making agent operating in an uncertain world takes an action that elicits a reinforcement signal and causes the state of the world (or agent) to change. The agent's goal is to maximize the total reward it derives over its entire duration of operation. Achieving this goal may require the agent to strike a delicate balance between two sometimes conflicting impulses: (1) greedy exploitation of its current world model, and (2) exploration of its world to gain information that can refine the world model and improve the agent's policy.
Over the years, a number of researchers have formulated this problem mathematically. ``Adaptive control processes,'' ``dual control,'' ``value of information,'' and ``optimal learning'' all address essentially the same issue and share a basic Bayesian framework that is well suited for modeling the role of information and for defining what a solution is. However, classical procedures for computing policies that optimally balance exploitation with exploration are intractable; they have been able to address only problems with a very small number of physical states and short planning horizons.
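To make the Bayesian formulation concrete, here is a minimal sketch built on a toy problem chosen purely for illustration (it is not taken from this abstract): a two-armed Bernoulli bandit with independent uniform Beta(1, 1) priors. The information state is the vector of success and failure counts `(s0, f0, s1, f1)`, and the Bayes-optimal policy is computed by exact dynamic programming over those states. The names `value`, `count_information_states`, and `_increment` are hypothetical helpers introduced here. Because the number of reachable information states grows combinatorially with the planning horizon (as C(H + 4, 4) for this two-arm case), the sketch shows in miniature why classical exact procedures scale poorly.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical toy problem (an assumption, not from the source): two-armed
# Bernoulli bandit with independent Beta(1, 1) priors. Information state =
# (s0, f0, s1, f1), the success and failure counts observed on each arm.
State = Tuple[int, int, int, int]


def _increment(state: State, index: int) -> State:
    """Return a copy of `state` with one more count at position `index`."""
    updated = list(state)
    updated[index] += 1
    return tuple(updated)


@lru_cache(maxsize=None)
def value(state: State, steps_left: int) -> float:
    """Bayes-optimal expected future reward from `state` with `steps_left` pulls."""
    if steps_left == 0:
        return 0.0
    best = float("-inf")
    for arm in range(2):
        s, f = state[2 * arm], state[2 * arm + 1]
        p = (s + 1) / (s + f + 2)  # posterior mean of this arm's success probability
        win = _increment(state, 2 * arm)       # posterior after observing reward 1
        lose = _increment(state, 2 * arm + 1)  # posterior after observing reward 0
        q = p * (1.0 + value(win, steps_left - 1)) + (1.0 - p) * value(lose, steps_left - 1)
        best = max(best, q)
    return best


def count_information_states(horizon: int) -> int:
    """Number of distinct information states reachable within `horizon` pulls."""
    frontier = {(0, 0, 0, 0)}
    seen = set(frontier)
    for _ in range(horizon):
        frontier = {_increment(st, i) for st in frontier for i in range(4)}
        seen |= frontier
    return len(seen)


if __name__ == "__main__":
    for horizon in (1, 2, 5, 10):
        print(horizon, value((0, 0, 0, 0), horizon), count_information_states(horizon))
```

With a horizon of 10, this tiny problem already has 1001 reachable information states. Exact recursion is still cheap here, but an analogous information state for a Markov decision process would track counts for every state-action-successor triple, so the same exhaustive approach becomes impractical very quickly.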
This dissertation proposes computational procedures that retain the Bayesian formulation, but sidestep intractability by employing Monte-Carlo simulation, function approximation, and diffusion modeling of information-state dynamics.
duff@gatsby.ucl.ac.uk