Solving Hidden-Mode MDPs. Choi, Zhang, Yeung

  • Hidden-Mode MDPs (HM-MDPs) model nonstationary environments whose dynamics change over time according to a Markov process
    • Special case of POMDP
  • Although general POMDP algorithms can be used, algorithms that exploit the special structure of HM-MDPs can be more effective in these domains
  • Proposed algorithm works by decomposing Q() into a number of components, which simplifies the backup operation and allows for a special form of value iteration (VI)
  • In HM-MDPs, environmental distributions are restricted to a fixed number of unobservable modes; each mode specifies an MDP.
    • States, however, are fully observable
  • Proposes keeping track of a belief state over the mode
  • As in most of the POMDP literature, this algorithm seems to assume complete knowledge of the environmental distributions
  • Says VI can’t be computed explicitly because it would need to cover all infinitely many belief states.  Offhand, I can’t figure out why you can’t just do VI for each mode’s MDP and then blend those values when you are in a belief state over modes?
  • Their VI algorithm looks like it enumerates all belief states that are encountered during the course of computing VI?  This would be finite, but also very large.
    • Guess they could do this because they know the initial distribution over modes
  • They then give a couple of methods of doing these computations more cheaply
  • Compares their performance against a standard POMDP planner under a fixed amount of computation time
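
To make the mode-belief idea from the notes above concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a particular timing convention (the current mode governs the state transition, and the mode then switches according to a mode-transition matrix), and the names `T`, `X`, `update_mode_belief`, and `blended_q` are hypothetical. The second function is the "blend per-mode values" heuristic the notes wonder about; it is probably only an approximation, since it treats the values as linear in the belief.

```python
def update_mode_belief(belief, s, a, s_next, T, X):
    """Bayes update of the belief over hidden modes after observing (s, a, s_next).

    Assumed layout (hypothetical, not from the paper):
      T[m][s][a][s2] = P(s2 | s, a, mode m)
      X[m][n]        = P(next mode n | current mode m)
    """
    n_modes = len(belief)
    # Weight each mode by how well it explains the observed state transition.
    weighted = [belief[m] * T[m][s][a][s_next] for m in range(n_modes)]
    # Push the weighted belief through the mode-transition matrix.
    unnorm = [
        sum(weighted[m] * X[m][n] for m in range(n_modes))
        for n in range(n_modes)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observed transition has zero probability under this belief")
    return [p / total for p in unnorm]


def blended_q(belief, Q, s):
    """Belief-weighted average of per-mode Q-values, one entry per action.

    Q[m][s][a] is assumed to come from solving each mode's MDP separately.
    This is the blending heuristic from the notes, not the paper's method.
    """
    n_actions = len(Q[0][s])
    return [
        sum(belief[m] * Q[m][s][a] for m in range(len(belief)))
        for a in range(n_actions)
    ]


# Toy problem: 2 modes, 2 states, 1 action. Mode 0 tends to move to state 0,
# mode 1 tends to move to state 1, and modes are "sticky".
T = [
    [[[0.9, 0.1]], [[0.9, 0.1]]],  # mode 0
    [[[0.1, 0.9]], [[0.1, 0.9]]],  # mode 1
]
X = [[0.9, 0.1], [0.1, 0.9]]

b = update_mode_belief([0.5, 0.5], s=0, a=0, s_next=1, T=T, X=X)

# Toy per-mode Q-values for 1 state and 2 actions, purely illustrative.
Q_toy = [[[1.0, 0.0]], [[0.0, 2.0]]]
q = blended_q(b, Q_toy, s=0)
```

Seeing a move into state 1 shifts the belief toward mode 1, and the blended Q-values then favor the action that mode 1 prefers. Because states are fully observable, the only uncertainty being tracked is over the mode, which is what makes the belief vector so small compared with a general POMDP.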
