RealKrimp — Finding Hyperintervals that Compress with MDL for Real-Valued Data. Witteveen, Duivesteijn, Knobbe, Grunwald. Advances in Intelligent Data Analysis 2014

Ah, it's from a symposium

An implementation exists here: https://github.com/joukewitteveen/hint

  1. The idea behind the minimum description length (MDL) principle is that it is possible to do induction by compression.
  2. Here they take a popular MDL algorithm, KRIMP, and extend it to real-valued data
  3. “Krimp seeks frequent itemsets: attributes that co-occur unusually often in the dataset. Krimp employs a mining scheme to heuristically find itemsets that compress the data well, gauged by a decoding function based on the Minimum Description Length Principle.”
  4. RealKRIMP “…finds interesting hyperintervals in real-valued datasets.”
  5. “The Minimum Description Length (MDL) principle [2,3] can be seen as the more practical cousin of Kolmogorov complexity [4]. The main insight is that patterns in a dataset can be used to compress that dataset, and that this idea can be used to infer which patterns are particularly relevant in a dataset by gauging how well they compress: the authors of [1] summarize it by the slogan Induction by Compression. Many data mining problems can be practically solved by compression.”
  6. “An important piece of mathematical background for the application of MDL in data mining, which is relevant for both Krimp and RealKrimp, is the Kraft Inequality, relating code lengths and probabilities.” They extend the Kraft Inequality to continuous spaces (a quick refresher sketch follows this list).
  7. <Ok skipping most – interesting but tight on time.>
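A quick refresher on the Kraft connection (my own sketch, not from the paper): code lengths satisfying the Kraft inequality can be read as negative log-probabilities, which is what lets MDL treat “compresses well” and “is probable under the model” interchangeably. The paper’s extension to continuous spaces (densities) is not attempted here.

    import math

    def kraft_sum(code_lengths_bits):
        # Kraft inequality: a prefix code with these lengths exists iff this sum is <= 1
        return sum(2.0 ** -l for l in code_lengths_bits)

    def ideal_code_length_bits(p):
        # Shannon/MDL correspondence: an event with probability p costs -log2(p) bits
        return -math.log2(p)

    # Toy example: a distribution over four symbols and its ideal code lengths
    probs = [0.5, 0.25, 0.125, 0.125]
    lengths = [ideal_code_length_bits(p) for p in probs]  # [1, 2, 3, 3] bits
    assert abs(kraft_sum(lengths) - 1.0) < 1e-9           # Kraft holds with equality here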

Predictive Projections. Sprague. IJCAI 09.

  1. Algorithm for learning linear “projections in which accurate predictions of future states can be made using simple nearest neighbor style learning.”
  2. Experiments on simulated pendulum and real roomba robot
  3. Considers video data
  4. Based on earlier work (by Tesauro among others) in “which nearest neighbor learning is recast in a probabilistic framework that allows for gradient based optimization of the distance metric.  These existing algorithms discover projections of the training data under which nearby points are likely to have the same class label or similar regression targets.”
    1. Neighborhood Components Analysis (NCA) – for nearest neighbor classification
  5. “It would be difficult to directly find an A that minimizes the k-nearest-neighbor classification error rate, because the number of errors will be a highly discontinuous function of A; very small changes in A may change the set of nearest neighbors for some points. The innovation behind the NCA algorithm is to recast nearest neighbor learning in a probabilistic framework. In this framework, expected error is a continuous, differentiable function of A and thus may be minimized using gradient based techniques.” (A toy sketch of this soft-neighbor objective follows this list.)
  6. NCA was originally designed for classification but was later extended to regression
  7. A minor tweak is needed to get NCA for regression to work well on predicting vectors when the problem is stochastic
  8. Here they use conjugate gradient to do the optimization
  9. The approach can scale well; a paper from ’07 was able to run on 60k training points
  10. “Following this gradient acts to reduce the objective in two different ways. It adjusts A to be more predictive by increasing the probability that a neighbor will be chosen if it successfully predicts the next state. It also adjusts A to be more predictable, by moving target states together whenever there is a high probability they will be chosen to predict each other.”
  11. Important to whiten data first
  12. To deal with actions, partition the data by action but constrain all partitions to use the same projection <how is this different from just throwing out the action data?>
  13. Only parameters to the algorithm are size of projection matrix A and its initial values
  14. “The predictive projections algorithm as described above may not perform well in cases where the effects of different actions are restricted to specific state dimensions. Since there is no explicit penalty for failing to predict some dimensions, the algorithm may minimize the objective function by finding an A which is not full rank, thus accurately predicting some dimensions while discarding others.”
    1. Although there are potential fixes for this
  15. For the applications this method is coupled with LSPI (along with RBFs)
  16. Algorithm deals well with Lagoudakis and Parr’s LSPI
  17. For the roomba experiment they put a camera on top – all the robot needs to do is to continue to move forwards without hitting the wall
    1. <Seems like not a terribly hard task; just turn when the wall takes up most of the view?>
  18. Image is 20×20: 1200-d once color is factored in
    1. Then this is reduced via PCA to 25d
    2. Then their approach takes that to 2d (A is initialized to the first 2 principal components)
  19. Their approach basically recovers translation and rotation, while PCA recovers amount of wall present and lighting (their approach can use this to make a good policy, but that can’t be done linearly from what PCA produces)
  20. “To our knowledge, the proto-value function framework has not been applied to the type of noisy, high dimensional control problems addressed in this paper. It seems likely that the neighborhood calculations required for constructing the diffusion model could be dominated by noise dimensions, particularly in very noisy tasks such as the modified pendulum domain described above. In that case, the PVF approach and predictive projections would be complementary: The PP algorithm could find a low dimensional state projection that contains relevant state information, and the PVF algorithm could then be used to discover a set of appropriate basis functions in that space”
  21. “Another closely related project is the basis iteration algorithm described in [Sprague, 2007]. This algorithm also uses gradient based metric learning to discover an appropriate projection, but it focuses directly on finding a metric that allows for accurate estimation of the optimal value function. It accomplishes this by iterating value function estimation with updates to the projection matrix. This algorithm has the advantage of incorporating reward information, but it depends on starting with an initial projection that enables a reasonable estimate of the optimal value function.”
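A toy sketch of the soft-neighbor objective that NCA-style methods (and hence Predictive Projections) optimize. This is my own simplified reconstruction: fixed regression targets rather than the projected next states, no per-action partitioning, and off-the-shelf L-BFGS instead of the conjugate gradient used in the paper; function names are mine.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import softmax

    def soft_neighbor_objective(A_flat, X, Y, d, k):
        # X: (n, d) whitened observations, Y: (n, m) prediction targets (e.g. next states)
        # A is a (d, k) projection; neighbor j is picked for point i with probability
        # proportional to exp(-||A x_i - A x_j||^2), as in NCA
        A = A_flat.reshape(d, k)
        Z = X @ A
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(sq, np.inf)                 # a point never picks itself
        P = softmax(-sq, axis=1)                     # soft neighbor probabilities
        err = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return (P * err).sum()                       # smooth in A, unlike the 0/1 kNN error

    # Toy usage: whiten X first (as the paper stresses); targets depend on 2 of 5 input dims
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    Y = X[:, :2] + 0.1 * rng.normal(size=(100, 2))
    d, k = 5, 2
    A0 = rng.normal(size=(d, k)).ravel()
    result = minimize(soft_neighbor_objective, A0, args=(X, Y, d, k), method="L-BFGS-B")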

Event perception. Radvansky, Zacks. WIREs 2010.

  1. “…we believe that events are fundamental to human experience. They seem to be the elements that constitute the stream of experience, the things that are remembered or forgotten in autobiographical memory, and the components of our plans for future action.”
  2. “… the world can be organized in a variety of ways, but only some of these are recognized by people. As such, the important elements and structure of an event are often, in some way, imposed by a person. There is presumably a reasonably high degree of uniformity across people in the way that they conceive of events. Thus, the components and structure of an event are not deterministically derived from the world itself.”
  3. The model they discuss (introduced by someone else) consists of binary relations, properties of individuals, as well as situational states that describe something about the environment in which the event is taking place. Spatiotemporal locations are also important: different types of events are either constrained to a spatiotemporal location or span several.
    1. Distinguish between real and abstract events
  4. Mental models for systems (general – how things work – physical or abstract) and events (specific – what happened, broken into situation and experience models)
  5. <Spring cleaning – didn’t finish>

State Representation Learning in Robotics: Using Prior Knowledge about Physical Interaction. Jonschkowski, Brock. RSS 2014

related to https://aresearch.wordpress.com/2015/04/18/learning-task-specific-state-representations-by-maximizing-slowness-and-predictability-jonschkowski-brock-international-workshop-on-evolutionary-and-reinforcement-learning-for-autonomous-robot-s/

  1. Uses the fact that robots interact with the physical world to set constraints on how state representations are learnt
  2. Test on simulated slot car and simulated navigation task, with distractors
  3. How to extract a low dimensional representation relevant to the task being undertaken from high dimensional sensor data?
  4. The visual input in the experiments is 300-D
  5. From the perspective of RL
  6. “According to Bengio et al. [1], the key to successful representation learning is the incorporation of “many general priors about the world around us.” They proposed a list of generic priors for artificial intelligence and argue that refining this list and incorporating it into a method for representation learning will bring us closer to artificial intelligence.”
  7. “State representation learning is an instance of representation learning for interactive problems with the goal to find a mapping from observations to states that allows choosing the right actions. Note that this problem is more difficult than the standard dimensionality reduction problem, addressed by multi-dimensional scaling [14] and other methods [23, 29, 6] because they require knowledge of distances or neighborhood relationships between data samples in state space. The robot, on the other hand, does not know about semantic similarity of sensory input beforehand. In order to know which observations correspond to similar situations with respect to the task, it has to solve the reinforcement learning problem (see Section III), which it cannot solve without a suitable state representation.”
  8. State representation learning can be done by:
    1. Deep autoencoders
    2. SFA (and its similarity to proto-value functions)
    3. Predictability / Predictive actions.  Points to a Michael Bowling paper <I haven’t read – will check out>
  9. Bunch of nice references
  10. The Robotic priors they care about (these are all defined mathematically later):
    1. Simplicity: For any task, only a small number of properties matter
    2. Temporal coherence: important properties change gradually over time
    3. Proportionality: Amount of change in important properties is proportional to action magnitude
    4. Causality: important properties and actions determine the reward
    5. Repeatability: Same as causality but in terms of transition not reward
  11. These properties hold for robotics and physical systems but aren’t necessarily appropriate to all domains
    1. Even in robotics these sometimes don’t hold (for example, a robot running into a wall will have an abrupt change in velocity; once pushed against the wall, proportionality doesn’t hold because no amount of pushing will allow the robot to move)
  12. They set learning up as an optimization problem that includes loss terms for each of the priors above (just a linear combination; a rough sketch of such loss terms follows this list)
  13. <these priors aren’t directly applicable to the mocap work I am doing now because they involve actions, which we don’t have access to>
  14. Formally they would need to compare all pairs of samples, leading to an O(n^2) loss, but they restrict this to a window
  15. Linear mapping from observations to states
  16. Epsilon greedy exploration, although there is a bias to repeat the previous action
  17. Have distractors in their simulations
  18. In the navigation task they demonstrate an invariance to perspective by using either an overhead or first-person view from the robot
    1. Representations learned are highly similar
  19. Learned in the course of 5,000 observations
  20. The observations in the experiments are 300- or 100-dimensional (i.e., 300 pixels, not 300×300)
  21. For the simulated slot car task, the state sample matrix has rank 4.  Two large eigenvalues correspond to the position of the controlled car, and two smaller eigenvalues correspond to the position of the distractor car
    1. Ideally the distractor shouldn’t show up at all, but because of things like stochasticity and limited samples, weight can be placed on it to explain events that it is not related to
  22. They then do an RL comparison based on a number of different methods for learning the representation (5 features extracted by):
    1. Their approach
    2. SFA
    3. PCA
    4. Raw 300D representation
    5. (They also compare to ground truth representation)
  23. Use neural fitted Q
  24. <screenshot in the original post>
  25. SFA features are really terrible; their approach performs pretty similarly to operating from the ground truth
  26. Investigating the same results further, they demonstrate that their method has very good generalization properties
  27. Conjecture that the primary difference between their approach and the other dimension reduction methods is that the others didn’t learn to disregard the distractors
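My rough reconstruction of what the prior-based loss terms could look like (simplified: all same-action pairs are compared instead of the paper’s windowed comparison, all weights are 1, the simplicity prior is just the choice of a low-dimensional state, and the exact formulations in the paper may differ):

    import numpy as np

    def robotic_prior_losses(S, A, R):
        # S: (T, k) learned states along a trajectory, A: (T,) discrete action ids, R: (T,) rewards
        dS = S[1:] - S[:-1]                         # state changes
        temporal_coherence = (dS ** 2).sum(axis=1).mean()

        prop, caus, rep = [], [], []
        for i in range(len(dS)):
            for j in range(i + 1, len(dS)):
                if A[i] != A[j]:
                    continue                        # pairs are only compared under the same action
                prop.append((np.linalg.norm(dS[i]) - np.linalg.norm(dS[j])) ** 2)
                sim = np.exp(-np.sum((S[i] - S[j]) ** 2))
                if R[i] != R[j]:
                    caus.append(sim)                # similar states + same action should not give different rewards
                rep.append(sim * np.sum((dS[i] - dS[j]) ** 2))

        proportionality = np.mean(prop) if prop else 0.0
        causality = np.mean(caus) if caus else 0.0
        repeatability = np.mean(rep) if rep else 0.0
        return temporal_coherence + proportionality + causality + repeatability

    # Toy usage with a random trajectory
    rng = np.random.default_rng(0)
    S, A, R = rng.normal(size=(50, 2)), rng.integers(0, 3, size=50), rng.integers(0, 2, size=50)
    loss = robotic_prior_losses(S, A, R)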

Learning Task-Specific State Representations by Maximizing Slowness and Predictability. Jonschkowski, Brock. International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems (ERLARS) 2013

Referenced from here: https://aresearch.wordpress.com/2015/04/17/autonomous-learning-of-state-representations-for-control-an-emerging-field-aims-to-autonomously-learn-state-representations-for-reinforcement-learning-agents-from-their-real-world-sensor-observations/

  1. As the title suggests, discusses setting up an NN that optimizes for predictability and slowness
  2. In addition to normal criteria for learned representation (allows rep of value function, allows original representation to be reproduced, allows for Markov property/prediction) they add a couple of other criteria:
    1.  Slowness
    2. Allows for transfer
  3. Consider fully observable tasks
  4. Also requires that representation is diverse (this exists in SFA to prevent trivial constant output)
  5. Formally, the cost function has 3 terms added together (a rough sketch follows this list):
    1. Slowness (representation changes only a small amount between two subsequent steps)
    2. Diversity (states farther in time should be mapped farther apart)
    3. Transition function (must allow for accurate prediction of next state, given state, action)
  6. The representation learned according to this criteria is then used to learn the Q function
  7. The domain this is tested in is a racetrack domain – the representation is a 10×10 overhead greyscale image <doesn’t seem so hard, but this is just a workshop paper>
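A rough sketch of the three cost terms (my simplified reading; the diversity term here penalizes any pair of states collapsing together, not specifically temporally distant ones, and the weights are placeholders):

    import numpy as np

    def representation_cost(S, S_pred, w_slow=1.0, w_div=1.0, w_pred=1.0):
        # S: (T, k) learned states in time order
        # S_pred: (T-1, k) predictions of s_{t+1} made by the learned transition function from (s_t, a_t)
        slowness = ((S[1:] - S[:-1]) ** 2).sum(axis=1).mean()        # consecutive states stay close
        D = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
        diversity = np.exp(-D[np.triu_indices(len(S), k=1)]).mean()  # distinct times should not collapse
        prediction = ((S[1:] - S_pred) ** 2).sum(axis=1).mean()      # transition model accuracy
        return w_slow * slowness + w_div * diversity + w_pred * prediction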

Autonomous Learning of State Representations for Control: An Emerging Field Aims to Autonomously Learn State Representations for Reinforcement Learning Agents from Their Real-World Sensor Observations. Böhmer, Springenberg, Boedecker, Riedmiller, Obermayer. Künstliche Intelligenz 2015.

  1. Trying to do robotics control directly from high-dimensional sensor readings. The trick is that with robots experience is costly, so you need to use data very efficiently <the DeepMind guys were able to run each game for the equivalent of over a month because it’s just in simulation>
  2. Review 2 methods for extracting features: deep auto-encoders and slow feature analysis (a minimal linear SFA sketch follows this list)
  3. Mention deep learning’s deep Q network
  4. Until DQN, using function approximators (FAs) had been largely unsuccessful and had generally worked only on toy domains <although Tesauro’s TD-Gammon is a significant exception>
  5. “In the last years several researchers have also considered end-to-end learning of behavioral policies π, represented by general function approximators. First attempts towards this include an actor-critic formulation termed NFQ-CA [14], in which both the policy π and the Q-function are represented using neural networks and learning proceeds by back-propagating the error from the Q-network directly into the policy network—which was, however, only applied to low dimensional control problems.”
  6. Also cites deterministic policy gradient https://aresearch.wordpress.com/2014/06/09/deterministic-policy-gradient-algorithms-silver-lever-heess-degris-wierstra-riedmiller-icml-2014/, which worked on a 30-DOF arm
  7. Cites deep fitted Q (same lab, papers from a couple of years prior) that was also used on slot cars and an inverted pendulum from raw camera images
  8. The first part of deep fitted Q is unsupervised, setup as an autoencoder
  9. In order for the RL to work, it is important that the problem has low “intrinsic dimensionality” (i.e., the dimension of the manifold the data lies on)
    1. For example, the image a camera gets from a robotic inverted pendulum can be arbitrarily large, but the underlying dimension of the problem is just 2
  10. They do not assume a POMDP; they assume that the high-dimensional input completely describes the problem <although this is almost certainly not actually the case with at least the slot car, but that is not a ding on their work, it just makes it more impressive>
  11. The representation chosen for the state depends on an optimization problem
  12. Train an autoencoder to minimize MSE
  13. DFQ is a clustering-based approach <?>
    1. There are some problems with DFQ that they will discuss later, but:
    2. The representation learned by DFQ isn’t reward-based, so the representation doesn’t take reward into account
    3. “learning auto-encoders for inputs with high variability (i.e. many objects of relevance) can be hard”, although they say regularization can help this, and cite a number of other papers that deal with it
  14. “Despite these drawbacks DFQ also comes with advantages over end-to-end learning: since the auto-encoder merely learns to reconstruct sampled observations and it can be fed with samples generated by any sampling policy for any task, is thus less susceptible to non-stationarity of the training data.”
  15. Mentions things like proto-value functions, Laplacian eigenmaps, etc. for discrete RL, and the relationship between SFA and LEM and PVF
  16. “From a theoretical point of view, SFA and PVF both approximate subspace-invariant features. This classification has its origin in the analysis of approximation errors in linear RL [39]. Here subspace-invariant features induce no errors when the future reward is propagated back in time. It can be shown that under these conditions the least-squares temporal difference algorithm (LSTD, [9]) is equivalent to supervised least-squares regression of the true value function [5]. However, this is only possible for the class of RL-tasks with a self-adjoint transition model. As this class is very rare, both SFA and PVF substitute a self-adjoint approximation of the transition model to compute almost subspace-invariant representations. An analysis of the optimal solution shows that SFA approximates eigenfunctions of the symmetrized transition operator [50]. Moreover, with a Gaussian prior for the reward, one can show that SFA representations minimize a bound on the expected LSTD error of all tasks in the same environment [5]. However, as the solution depends on the sampling distribution, straight forward application for transfer learning is less obvious than in the case of PVF. Future works may rectify this with some sensible importance sampling, though.”
  17. When the data completely covers the space, PVF and SFA are pretty equivalent, but when data is generated from a random walk, PVF fails and SFA still manages to do almost as well as when there is perfect coverage
  18. In less artificial settings/domains there is less of a performance gap between PVF and SFA, but it still exists; SFA seems to do a better job generating features for LSTD and LSPI
  19. In these problems, the representation learned must:
    1. Maintain Markov property (no partial observability)
    2. Be able to represent the value function well enough to bootstrap from it and improve the policy
    3. Generalize well (especially to cases that may not be well covered by the training distribution)
    4. Low dimensional
  20. “In stochastic environments one can only compare probability distributions over future states based on a random policy, called diffusion distances. It can be shown that SFA approximates eigenfunctions of the symmetrized transition operator, which encode diffusion distances [5]. SFA features are therefore a good representation for nonlinear RL algorithms as well.”
    1. <Depends how you define “stochastic environments”; I assume they mean environments where there is no choice in action selection at all, because otherwise what is written is incorrect>
  21. “In summary, SFA representations X seem in principle the better choice for both linear and non-linear RL: nonlinear SFA extracts eigenfunctions of the transition model P^π, which are the same for every isomorphic observation space Z, encode a diffusion metric that generalizes to states with similar futures and approximates a Fourier basis of the (unknown) underlying state space.”
  22. Empirical work shows SFA works best when exposed to data that gets uniform coverage
  23. “a Fourier basis as approximated by SFA grows exponentially in the underlying state dimensionality. Linear algorithms, which depend on this basis to approximate the value function, are therefore restricted to low dimensional problems with few or no variables unrelated to the task. Non-linear RL algorithms, on the other hand, could work in principle well with only the first few SFA features of each state-dimension/variable. The order in which these variables are encoded as SFA features, however, depends on the slowness of that variable. This can in practice lead to absurd effects. Take our example of a wheeled robot, living in a naturally lit room. The underlying state space that the robot can control is three-dimensional, but the image will also depend on illumination, that is, the position of the sun.”
    1. Describes this as being vulnerable to slow distractors
  24. “Also, auto-encoders minimize the squared error over all input dimensions of Z equally. This can produce incomplete representations if a robot, for example, combines observations from a camera with measurements from multiple joints. Due to the large number of pixels, small improvements in the reconstruction of the image can outweigh large improvements in the reconstruction of the joint positions.”
  25. Compares results from a deep autoencoder and from an NN that has objectives of slowness and predictability. The latter produces a much better output <but this is really a garbage comparison because the former is based on actual physical experiments and the latter on an extremely oversimplified clean simulation.>
  26. “Take the example of uncontrollable distractors like blinking lights or activity outside a window. Each distractor is an independent variable of the isomorphic state S, and to learn an isomorphic representation X requires thus samples from all possible combinations of these variables.  The required training samples grow exponentially in the number of distractors.”
    1. <The XSENS suit has tons of blinking lights on it.  I was thinking about this as a potential distractor for the methods we are working on…>
  27. Moves on to a discussion of “factored representations and symbolic RL” at the end of the paper
  28. Basically discuss object oriented RL (discuss it in terms of relational RL)
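Since SFA keeps coming up in these comparisons, a minimal linear SFA sketch for reference (the standard construction, not something specific to this survey): whiten the signal, then keep the directions whose temporal differences have the smallest variance.

    import numpy as np

    def linear_sfa(X, n_features=2):
        # X: (T, d) time series of observations; returns a (d, n_features) projection of the
        # centered data whose outputs have unit variance and change as slowly as possible
        Xc = X - X.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        evals, evecs = np.linalg.eigh(cov)
        W = evecs / np.sqrt(evals + 1e-8)          # whitening matrix (epsilon guards tiny eigenvalues)
        Z = Xc @ W
        dZ = Z[1:] - Z[:-1]                        # temporal differences of the whitened signal
        devals, devecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
        return W @ devecs[:, :n_features]          # eigh sorts ascending, so these are the slowest directions

    # Usage: slow_features = (X - X.mean(axis=0)) @ linear_sfa(X, n_features=5)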

A Survey on Vision-based Human Action Recognition. Poppe. Image and Vision Computing 2010.

  1. Consider task of annotating action in video
  2. Task is a combination of feature extraction <motion trackers at least solve this>, and then classification
  3. One taxonomy for the task breaks actions down between:
    1. Action primitives: low level (such as limb) movements ex/move left leg
    2. Actions: built of primitives, may be composed of contemporaneous primitives (such as arm and leg movement), may be cyclic. ex/run
    3. Activities: a task that can be given a label. ex/jumping hurdles
    4. <These examples are from the text, and perhaps not the best examples, because certainly sometimes running is the entire activity?>
  4. Here do not consider environmental context, multi-person interaction, or objects
    1. Consider only full body movement and not gesture recognition
  5. In the field of gait recognition, the goal is to differentiate people based on the way they walk; here the goal is the opposite – to look at multiple people doing various tasks and figure out in which cases they are performing the same task
    1. Although there is more recent work that tries to do both – identify the activity and the individuals doing it
  6. Pose reconstruction is akin to a regression problem, whereas activity recognition is a classification problem
  7. <Didn’t get to finish-posting for spring cleaning>

Behavioral and Neurophysiological Correlates of Regret in Rat Decision-Making on a Neuroeconomic Task. Steiner, Redish. Nature Neuroscience 2014

  1. Deals with regret as described in the RL literature as rats make decisions (they make a choice they can’t go back on, but may have gotten something better with a different choice)
  2. “In humans, the orbitofrontal cortex is active during expressions of regret, and humans with damage to the orbitofrontal cortex do not express regret.  In rats and nonhuman primates, both the orbitofrontal cortex and the ventral striatum have been implicated in reward computations.”
  3. In situations of high regret <presumably, the reward of the other option is presented, but not accessible> “… rats looked backwards toward the lost option, cells within orbitofrontal cortex and ventral striatum represented the missed action, rats were more likely to wait for the long delay, and rats rushed through eating the food after that delay.”
  4. Disappointment, by contrast, is the realization that a realized outcome is worse than expected, rather than being attributable to one’s own mistaken action (which is what distinguishes it from regret)
  5. <Didn’t get to finish – posting for spring cleaning>

Rollout Algorithms for Constrained Dynamic Programming. Bertsekas. LIDS 2005

<This is a tech report?>

  1. Based on the abstract, seems similar to http://www.mit.edu/~jnt/Papers/J066-97-rollout.pdf, and his other work that deals with heuristics and rollouts
  2. “Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm also produces a feasible solution, whose cost is no worse than the cost corresponding to the base heuristic.”
  3. <Important!> “If the base heuristic is a DP policy, i.e., it generates state/control sequences via feedback functions μ_k, k = 0, 1, …, N-1, so that at any state x_k, it applies the control μ_k(x_k), then the algorithm can be viewed as a policy improvement step of the policy iteration method. Based on the generic results for policy iteration, it then follows that the rollout algorithm’s cost <doing cost minimization in this setting> is no worse than the one of the base heuristic. This cost improvement property has been extended in [BTW97 <Bertsekas, Tsitsiklis, Wu. Rollout algorithms for combinatorial optimization>] to base heuristics that are not DP policies, but rather satisfy an assumption called sequential improvement, which will be discussed shortly within the constrained context of this paper.” (A generic rollout sketch follows this list.)
  4. <Spring cleaning – Didn’t get to finish>
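For reference, a generic unconstrained rollout step with a base heuristic (my own illustration of the cost-improvement idea, not the constrained algorithm of this report; all names are placeholders):

    def rollout_control(state, controls, step, cost, base_heuristic, horizon):
        # controls(state) -> iterable of admissible controls; step(state, u) -> next state
        # cost(state, u) -> stage cost; base_heuristic(state) -> control suggested by the heuristic
        def complete_with_heuristic(s, steps_left):
            total = 0.0
            for _ in range(steps_left):
                u = base_heuristic(s)
                total += cost(s, u)
                s = step(s, u)
            return total

        best_u, best_val = None, float("inf")
        for u in controls(state):
            # one step with u, then let the base heuristic finish the trajectory
            val = cost(state, u) + complete_with_heuristic(step(state, u), horizon - 1)
            if val < best_val:
                best_u, best_val = u, val
        return best_u

    # Applying rollout_control at every stage gives a policy whose total cost is no worse
    # than the base heuristic's (this is the cost improvement property discussed above).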

Angelic Hierarchical Planning: Optimal and Online Algorithms (Revised). Marthi, Russell, Wolfe. Tech Report 2009 (revision of earlier ICAPS paper)

<Not reading super carefully because of limited time>

  1. Deals with “angelic” high level actions that can be used to develop a plan and figure out if it will succeed without transforming into low-level actions
  2. Here the concept is expanded to be able to figure out if such a plan is optimal or not
    1. Algorithm based on A* <at least that’s what it seems like based on the name>
  3. “Humans somehow manage to choose quite intelligently the twenty trillion primitive motor commands that constitute a life, despite the large state space. It has long been thought that hierarchical structure in behavior is essential in managing this complexity. Structure exists at many levels, ranging from small (hundred-step?) motor programs for typing characters and saying phonemes up to large (billion-step?) actions such as writing an ICAPS paper, getting a good faculty position, and so on. The key to reducing complexity is that one can choose (correctly) to write an ICAPS paper without first considering all the character sequences one might type.”
  4. “…downward refinement property: every plan that claims to achieve some condition does in fact have a primitive refinement that achieves it. This property would enable the derivation of provably correct abstract plans without refining all the way to primitive actions, providing potentially exponential speedups. This requires, however, that HLAs have clear precondition–effect semantics, … we defined an “angelic semantics” for HLAs, specifying for each HLA the set of states reachable by some refinement into a primitive action sequence. The angelic approach captures the fact that the agent will choose a refinement and can thereby choose which element of an HLA’s reachable set is actually reached. This semantics guarantees the downward refinement property and yields a sound and complete hierarchical planning algorithm that derives significant speedups from its ability to generate and commit to provably correct abstract plans.”
  5. Their previous paper was concerned only with correctness, not efficiency, so here the ability to use heuristics is added, and they want optimality as well: “the exact cost of executing a high-level action to get from state s to state s′ is the least cost among all primitive refinements that reach s′”
  6. Getting an exact cost estimate is difficult though so they start with trying to get accurate upper and lower bounds
  7. “we derive the first algorithm capable of generating provably optimal abstract plans… We also provide a satisficing algorithm that sacrifices optimality for computational efficiency and may be more useful in practice. Preliminary experimental results show that these algorithms outperform both “flat” and our previous hierarchical approaches”
  8. Also considers online planning, where planning is done by repeated limited lookahead
    1. Do a version of LRTA* for online hierarchical angelic planning
  9. Consider finite deterministic domains, with at least one finite-cost solution (cost minimization)
  10. A refinement of a plan is basically a way of making a plan less abstract and more concrete, although it doesn’t necessarily ground the plan out in primitives; it may simply decompose it into more high-level actions that are executed in certain conditions
  11. High-level actions are basically defined in terms of what states they are allowed to start in, and what states they may finish in (although again remember the domain is deterministic)
  12. But these high level actions also have costs associated with them, and if we are doing optimal planning these costs must also be optimal
  13. Can describe the formalism with NCSTRIPS with costs added (Nondeterministic Conditional STRIPS) <wait I thought we are in deterministic land, maybe the high level actions can have stochasticity?>
  14. Offline and online methods of planning
  15. Offline search done by branch-and-bound search (a rough sketch of the idea follows this list)
  16. <Ok skipping the rest – think I get the idea>
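A very rough sketch of branch-and-bound over abstract plans with optimistic/pessimistic cost bounds (my illustration of the general idea only, with placeholder function names; not the authors’ algorithm):

    import heapq

    def plan_branch_and_bound(initial_plan, lower_bound, upper_bound, refine, is_primitive):
        # lower_bound/upper_bound(plan): optimistic/pessimistic cost of the best primitive
        # refinement of the plan; refine(plan) -> more concrete successor plans;
        # is_primitive(plan): True when the plan is fully concrete, so its bounds are exact
        best_cost = upper_bound(initial_plan)        # pessimistic bound gives a pruning threshold
        best_plan = None
        queue = [(lower_bound(initial_plan), 0, initial_plan)]
        tie = 1
        while queue:
            lb, _, plan = heapq.heappop(queue)
            if lb > best_cost:
                continue                             # cannot beat the incumbent; prune
            if is_primitive(plan):
                best_cost, best_plan = lb, plan      # exact cost (lb <= best_cost held above)
                continue
            for child in refine(plan):
                child_lb = lower_bound(child)
                if child_lb <= best_cost:
                    best_cost = min(best_cost, upper_bound(child))   # tighten the threshold
                    heapq.heappush(queue, (child_lb, tie, child))
                    tie += 1
        return best_plan, best_cost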