Hidden Mode MDPs (HM-MDPs) are a model of nonstationary environments where the dynamics switch over time among a fixed set of MDPs, with the active mode evolving as a Markov chain

Special case of POMDP

Although general POMDP algorithms can be used, algorithms that exploit the special structure of HM-MDPs can be more effective in these domains

The proposed algorithm decomposes Q() into a number of components, which simplifies the computation and allows a special form of value iteration (VI)

In HM-MDPs, environmental distributions are restricted to a fixed number of unobservable modes; each mode specifies an MDP.

States, however, are fully observable

Proposes keeping track of a belief state over the modes
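
A minimal sketch of what tracking a belief over modes could look like, since states are observed but modes are not. Everything here is an assumption for illustration, not the paper's notation: `mode_trans[m][n]` is a hypothetical mode-switching matrix, `state_trans[m][s][a][s2]` is the state-transition probability under mode `m`, and the observed transition is assumed to be generated by the current mode before the mode switches.

```python
def update_mode_belief(belief, mode_trans, state_trans, s, a, s_next):
    """Bayes update of a belief over hidden modes after observing (s, a, s_next).

    Assumed timing: the current mode generates the state transition,
    then the mode itself moves according to mode_trans.
    """
    num_modes = len(belief)
    # Weight each mode by how well it explains the observed state transition.
    weighted = [belief[m] * state_trans[m][s][a][s_next] for m in range(num_modes)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("observed transition has zero probability under every mode")
    posterior = [w / total for w in weighted]
    # Push the posterior through the mode Markov chain to get the next belief.
    return [
        sum(posterior[m] * mode_trans[m][n] for m in range(num_modes))
        for n in range(num_modes)
    ]


# Toy example (made up): two modes, two states, one action.
# Mode 0 tends to keep the state; mode 1 tends to flip it.
mode_trans = [[0.9, 0.1], [0.1, 0.9]]
state_trans = [
    [[[0.9, 0.1]], [[0.1, 0.9]]],  # mode 0: indexed [s][a][s_next]
    [[[0.1, 0.9]], [[0.9, 0.1]]],  # mode 1
]
new_belief = update_mode_belief([0.5, 0.5], mode_trans, state_trans, s=0, a=0, s_next=0)
print(new_belief)  # roughly [0.82, 0.18]: seeing the state stay put favors mode 0
```

Because the state is observed directly, the only thing the belief has to summarize is the mode, so the belief vector has one entry per mode rather than one per (mode, state) pair.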

As in most of the POMDP literature, this algorithm seems to assume complete knowledge of the environmental distributions

Says VI can’t be computed explicitly because it would need to cover all infinitely many belief states. Offhand, I can’t figure out why you can’t just do VI for each mode’s MDP and then blend those values according to the current belief over modes?
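
To make the question concrete, here is a rough sketch of the "VI per mode, then blend" idea. This is the shortcut asked about in these notes, not the paper's algorithm. It is similar in spirit to QMDP-style approximations, and as written it treats each mode as if it lasted forever, so it ignores mode switching and any value of learning which mode is active. All names and the toy numbers are made up, including the assumption that rewards depend on the mode.

```python
def value_iteration(trans, reward, gamma=0.9, tol=1e-8):
    """Q-values for one fully observable MDP.

    trans[s][a][s2] is a transition probability, reward[s][a] an immediate reward.
    """
    n_states = len(trans)
    n_actions = len(trans[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    while True:
        v = [max(row) for row in q]
        new_q = [
            [
                reward[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(trans[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_q[s][a] - q[s][a]) for s in range(n_states) for a in range(n_actions)
        )
        q = new_q
        if delta < tol:
            return q


def blended_q(belief, per_mode_q, s, a):
    """Belief-weighted average of the per-mode Q-values (heuristic only)."""
    return sum(b * q[s][a] for b, q in zip(belief, per_mode_q))


# Toy example (made up): one state, two actions, self-loop dynamics.
trans = [[[1.0], [1.0]]]
rewards_by_mode = [
    [[1.0, 0.0]],  # mode 0 rewards action 0
    [[0.0, 1.0]],  # mode 1 rewards action 1
]
per_mode_q = [value_iteration(trans, r, gamma=0.5) for r in rewards_by_mode]
belief = [0.75, 0.25]
print([blended_q(belief, per_mode_q, 0, a) for a in range(2)])  # about [1.75, 1.25]
```

The blend is cheap because it only needs one VI run per mode, but each per-mode value assumes the mode is known and fixed from the next step on, which is presumably where it departs from exact VI over belief states.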

Their VI algorithm looks like it enumerates all belief states that are encountered during the course of computing VI? That set would be finite, but also very large.

Guess they can do this because they know the initial distribution over modes
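
A sketch of why a known initial distribution makes the enumeration finite: starting from one belief, only finitely many beliefs are reachable within a fixed number of steps. This follows my reading of the algorithm in these notes, not the paper's actual procedure. The function name, the rounding used to merge near-identical beliefs, and the timing of the mode switch (after the observed transition) are all assumptions.

```python
from collections import deque


def reachable_beliefs(initial, mode_trans, state_trans, horizon, digits=6):
    """Breadth-first enumeration of mode beliefs reachable within `horizon` steps.

    mode_trans[m][n] and state_trans[m][s][a][s2] are hypothetical model tables.
    Beliefs are rounded to `digits` decimals so duplicates collapse.
    """
    n_modes = len(initial)
    n_states = len(state_trans[0])
    n_actions = len(state_trans[0][0])

    def step(belief, s, a, s2):
        # Bayes update on the observed transition, then apply the mode chain.
        weighted = [belief[m] * state_trans[m][s][a][s2] for m in range(n_modes)]
        total = sum(weighted)
        if total == 0.0:
            return None  # this transition cannot occur under this belief
        post = [w / total for w in weighted]
        nxt = [
            sum(post[m] * mode_trans[m][n] for m in range(n_modes))
            for n in range(n_modes)
        ]
        return tuple(round(p, digits) for p in nxt)

    start = tuple(round(p, digits) for p in initial)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        belief, depth = frontier.popleft()
        if depth == horizon:
            continue
        for s in range(n_states):
            for a in range(n_actions):
                for s2 in range(n_states):
                    nxt = step(belief, s, a, s2)
                    if nxt is not None and nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, depth + 1))
    return seen


# Toy model (made up): two modes, two states, one action.
toy_mode_trans = [[0.9, 0.1], [0.1, 0.9]]
toy_state_trans = [
    [[[0.9, 0.1]], [[0.1, 0.9]]],  # mode 0 tends to keep the state
    [[[0.1, 0.9]], [[0.9, 0.1]]],  # mode 1 tends to flip it
]
for h in range(4):
    beliefs = reachable_beliefs([0.5, 0.5], toy_mode_trans, toy_state_trans, h)
    print(h, len(beliefs))
```

Even in this tiny model the set grows with the horizon, which matches the worry that the enumeration is finite but can get very large.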

They then give a couple of methods of doing these computations more cheaply

Compares their performance against a standard POMDP planner given a fixed amount of computation time