- As the title suggests, the paper discusses setting up a neural network that learns a representation optimized for predictability and slowness
- In addition to the usual criteria for a learned representation (it should allow the value function to be represented, allow the original observation to be reconstructed, and satisfy the Markov property so the next state is predictable), they add a couple of other criteria:
- Slowness
- Allows for transfer
- They consider only fully observable tasks
- They also require that the representation is diverse (an analogous constraint exists in Slow Feature Analysis (SFA) to prevent the trivial constant-output solution)
- Formally, the cost function is the sum of three terms (see the sketch after this list):
- Slowness (representation changes only a small amount between two subsequent steps)
- Diversity (states farther in time should be mapped farther apart)
- Transition function (must allow for accurate prediction of next state, given state, action)
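A minimal sketch of what such a three-term loss could look like in PyTorch. The function and argument names are illustrative (not taken from the paper), the hinge form of the diversity term and the unit weights are assumptions, and the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def representation_loss(phi_t, phi_t1, phi_far, pred_next,
                        w_slow=1.0, w_div=1.0, w_pred=1.0):
    """Sum of slowness, diversity, and prediction terms (illustrative sketch).

    phi_t, phi_t1: representations of two consecutive observations
    phi_far:       representation of a temporally distant observation
    pred_next:     transition model's prediction of phi_t1 from (phi_t, action)
    """
    # Slowness: consecutive representations should change only a little
    slowness = F.mse_loss(phi_t1, phi_t)
    # Diversity: temporally distant states should be mapped farther apart
    # (hinge with margin 1.0 is one common choice, assumed here)
    diversity = F.relu(1.0 - (phi_far - phi_t).pow(2).sum(dim=-1)).mean()
    # Prediction: the transition model must predict the next representation
    prediction = F.mse_loss(pred_next, phi_t1)
    return w_slow * slowness + w_div * diversity + w_pred * prediction
```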
- The representation learned according to these criteria is then used to learn the Q function
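The paper does not spell out the Q-learner in these notes, so here is a hedged sketch of one standard option: treat the (frozen) learned representation phi(s) as the state and run a simple TD/fitted-Q update. All names (`q_net`, `q_update`) are hypothetical:

```python
import torch
import torch.nn.functional as F

def q_update(q_net, optimizer, phi_t, action, reward, phi_t1, gamma=0.99):
    """One TD(0) step on top of a fixed learned representation.

    phi_t, phi_t1: learned-representation tensors for s_t and s_{t+1}
    action:        long tensor of action indices taken at s_t
    """
    # Bootstrapped target: r + gamma * max_a' Q(phi(s'), a')
    with torch.no_grad():
        target = reward + gamma * q_net(phi_t1).max(dim=-1).values
    # Q-value of the action actually taken
    q_sa = q_net(phi_t).gather(-1, action.unsqueeze(-1)).squeeze(-1)
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```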
- The domain this is tested in is a racetrack domain - the input is a 10×10 greyscale overhead image <doesn't seem so hard, but this is just a workshop paper>