- Uses eligibility traces to learn memoryless policies in POMDPs directly from observations; previous methods mainly used memory to estimate the hidden state, which gets computationally expensive (see the sketch after these notes)
- Results are relevant to: “POMDPs that have good memoryless policies, i.e., on problems in which there may well be very poor observability but there also exists a mapping from the agent’s immediate observations to actions that yield near-optimal return.”
- Paper is almost entirely empirical results
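
To make the idea concrete, here is a minimal sketch of eligibility-trace learning applied directly to observations, i.e., a Sarsa(λ)-style update with replacing traces that yields a memoryless observation-to-action policy. The corridor environment, the aliasing scheme, and all hyperparameters below are my own illustration, not taken from the paper:

```python
import numpy as np

# Toy aliased corridor (illustrative, not from the paper): states 0..4,
# goal at 4; states 1 and 3 emit the same observation, so the state is
# only partially observable, yet the memoryless policy "always go right"
# is near-optimal.
N_STATES, GOAL = 5, 4
OBS = [0, 1, 2, 1, 3]            # states 1 and 3 are aliased
N_OBS, N_ACTIONS = 4, 2          # actions: 0 = left, 1 = right

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == GOAL
    return s2, (1.0 if done else -0.01), done

def eps_greedy(Q, o, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    q = Q[o]
    return int(rng.choice(np.flatnonzero(q == q.max())))  # random tie-break

def sarsa_lambda(episodes=500, alpha=0.1, gamma=0.99, lam=0.9, eps=0.1, seed=0):
    """Learn Q over (observation, action) pairs with replacing traces,
    yielding a memoryless (observation -> action) policy."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_OBS, N_ACTIONS))
    for _ in range(episodes):
        e = np.zeros_like(Q)              # eligibility traces
        s, done = 0, False
        o = OBS[s]
        a = eps_greedy(Q, o, eps, rng)
        for _ in range(200):              # cap episode length
            s2, r, done = step(s, a)
            o2 = OBS[s2]
            a2 = eps_greedy(Q, o2, eps, rng)
            delta = r + (0.0 if done else gamma * Q[o2, a2]) - Q[o, a]
            e[o, a] = 1.0                 # replacing trace
            Q += alpha * delta * e        # credit recently visited obs-action pairs
            e *= gamma * lam
            s, o, a = s2, o2, a2
            if done:
                break
    return Q

if __name__ == "__main__":
    Q = sarsa_lambda()
    print("greedy action per observation (1 = right):", Q.argmax(axis=1))
```

The traces let reward at the goal propagate back through the aliased observations in a single episode, which is exactly what makes the memoryless policy learnable here without any state estimation.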