Past-future Information Bottleneck in Dynamical Systems. Creutzig, Globerson, Tishby. Physical Review E 2009.

“Biological sensory systems need to encode the most relevant incoming information and transmit this information successfully under noisy conditions in real time. Extracting the most predictive information from incoming temporal signals is crucial for two reasons. First, predictive adaptive coding is a natural implementation of redundancy reduction of data, thus making efficient use of scarce resources and therefore called efficient coding. Second, it is only information about the future that can be behaviorally relevant.”

This is examined from an information-theoretic perspective

Compressing a signal is called source coding. Getting it transmitted over a noisy channel and decoded is channel coding. Finally, doing both is called source-channel coding.

“We are here interested in joint lossy source-channel coding in time.”

From the learning-theory perspective, the same problem can be viewed as building a learning system of a specified complexity

The goal is to isolate components of the data stream that are most predictive of future data

The IB results used here rely on the case where the variables are jointly Gaussian

“Our results show that as the trade-off parameter increases, the compressed state space goes through a series of structural phase transitions, gradually increasing its dimension. Thus, for example, to obtain little information about the future, it turns out one can use a one-dimensional scalar state space. As more information is required about the future, the dimension of the required state space increases up to its maximum. The structure and location of the phase transitions are related to the eigenvalues of the so-called Hankel matrix.”
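To make the phase-transition picture concrete, here is a minimal sketch using the known Gaussian IB result (Chechik et al. 2005), not the paper's Hankel-based formulation: with λ_i the eigenvalues of Σ_{x|y} Σ_x^{-1}, a new dimension switches on each time the trade-off parameter (called beta below) crosses 1 / (1 - λ_i). The helper names and the toy covariances are hypothetical, chosen only for illustration.

```python
import numpy as np

def gib_critical_betas(cov_x, cov_y, cov_xy):
    """Critical trade-off values for Gaussian IB (illustrative helper)."""
    # Conditional covariance of X given Y.
    cov_x_given_y = cov_x - cov_xy @ np.linalg.solve(cov_y, cov_xy.T)
    # Eigenvalues of Sigma_{x|y} Sigma_x^{-1}; these lie in [0, 1].
    lam = np.sort(np.linalg.eigvals(cov_x_given_y @ np.linalg.inv(cov_x)).real)
    # Directions with eigenvalue 1 carry no information about Y and never switch on.
    lam = lam[lam < 1 - 1e-12]
    return 1.0 / (1.0 - lam)

def state_dimension(beta, critical_betas):
    """Number of dimensions that are active at a given beta."""
    return int(np.sum(beta > critical_betas))

# Toy example (assumed values): X stands in for the past, Y for the future.
cov_x = np.eye(2)
cov_y = np.eye(2)
cov_xy = np.diag([0.9, 0.5])

betas = gib_critical_betas(cov_x, cov_y, cov_xy)
dims = [state_dimension(b, betas) for b in (1.0, 2.0, 5.0)]
print(betas, dims)
```

In this toy case the compressed state is empty for small beta, becomes one-dimensional once beta passes the first critical value, and reaches its maximum dimension of two after the second.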

<There is a lot of math in this paper, and the paper is short. Not reading the math carefully so this whole summary probably won’t be long.>

There is a β in the Lagrangian in IB that controls accuracy vs compression. In the setting here, the resulting curve as β changes is called the information curve.
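For reference, the standard IB Lagrangian written for the past-future setting looks roughly like the following. The symbols (\hat{x} for the compressed state, "past" and "future" for the two signals) are my own notation and may not match the paper's.

```latex
\min_{p(\hat{x} \mid \text{past})} \; \mathcal{L}
  = I(\text{past}; \hat{x}) \;-\; \beta \, I(\hat{x}; \text{future})
```

The first term penalizes how much of the past is kept (compression) and the second rewards information about the future (accuracy), so small β favors compression and large β favors prediction.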

Assume dynamics are linear

They apply their approach to a spring-mass system as an example
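A rough sketch of what a Hankel matrix looks like for such a system, assuming a damped spring-mass with made-up parameters, a forward-Euler discretization, and the textbook system-theory definition H[i, j] = C A^(i+j) B. The paper's construction may use a different normalization, so treat this only as an illustration of why the state dimension is capped.

```python
import numpy as np

# Hypothetical parameters for a damped spring-mass system (not from the paper).
m, k, c, dt = 1.0, 1.0, 0.5, 0.1

# State is (position, velocity); forward-Euler discretization of
#   x'' = -(k/m) x - (c/m) x' + u
A = np.array([[1.0, dt],
              [-dt * k / m, 1.0 - dt * c / m]])
B = np.array([[0.0], [dt]])   # the input drives the velocity
C = np.array([[1.0, 0.0]])    # only the position is observed

def hankel_matrix(A, B, C, n):
    """n-by-n Hankel matrix with entries H[i, j] = C A^(i+j) B."""
    markov = []
    Ak = np.eye(A.shape[0])
    for _ in range(2 * n - 1):
        markov.append((C @ Ak @ B).item())
        Ak = A @ Ak
    markov = np.array(markov)
    idx = np.arange(n)
    return markov[idx[:, None] + idx[None, :]]

H = hankel_matrix(A, B, C, 40)
sv = np.linalg.svd(H, compute_uv=False)
print(sv[:4])
```

Only two singular values are non-negligible, matching the two-dimensional (position, velocity) state, which is the maximum dimension the compressed state could reach for this system.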

<Seems like in order to be able to use the approach it has to be fully linear, and all coefficients must be known. Nice math, but doesn’t seem very generally applicable.>