- “A frequent assumption in behavioural science is that most of an animal’s activities can be described in terms of a small set of stereotyped motifs.”
- They create a system to analyze behavior and find that, about half the time, behavior falls into one of roughly 100 stereotyped behavioral states
- Stereotypy – “that an organism’s behaviours can be decomposed into discrete, reproducible elements”
- Although animals could in principle move through an enormous space of possible movements, they are thought to confine their motion to a small subset of it, made up of stereotyped actions (which may be specific to a time range, to an individual throughout its life, or to a species)
- “A discrete behavioural repertoire can potentially arise via a number of mechanisms, including mechanical limits of gait control, habit formation, and selective pressure to generate robust or optimal actions.”
- Stereotypy has largely not been studied experimentally, mainly because of a “lack of a comprehensive and compelling mathematical framework for behavioural analysis”
- They introduce a system for doing so

- Most previous methods for characterizing behavior fall into one of two categories:
- Very coarse metrics (like mean velocity, or the number of times a barrier is crossed). This makes analysis and data collection easy, but captures only a tiny amount of the relevant information
- The other approach is to log behavior in terms of a number of predefined categories. This can be done by hand or by machine. The problem is that it introduces bias: you find only the classes you defined in the first place, and it assumes a small number of discrete high-level behaviors

- A better system would be one that works from the bottom up, starting directly with the data as opposed to operator-defined classes
- “The basis of our approach is to view behaviour as a trajectory through a high-dimensional space of postural dynamics. In this space, discrete behaviours correspond to epochs in which the trajectory exhibits pauses, corresponding to a temporally-extended bout of a particular set of motions. Epochs that pause near particular, repeatable positions represent stereotyped behaviours. Moreover, moments in time in which the trajectory is not stationary, but instead moves rapidly, correspond to non-stereotyped actions.”
- Specifically, they apply this to data recorded from fruit flies
- “These stereotyped behaviours manifest themselves as distinguishable peaks in the behavioural space and correspond to recognizably distinct behaviours such as walking, running, head grooming, wing grooming, etc.”

- The fly is isolated into a 200×200 window in each frame (40k pixels per image), and then rotated/translated/resized to get a standard representation
- Nearly all of the variance in the images (93%) is captured by the first 50 PCA components
- Use a spectrogram representation of postural dynamics based on Morlet continuous wavelet transform <?>
- “Although similar to a Fourier spectrogram, wavelets possess a multi-resolution time-frequency trade-off, allowing for a more complete description of postural dynamics occurring at several time scales”
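The PCA and wavelet steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the function names `pca_project` and `morlet_amplitude`, the toy data, the sampling rate, the frequency list, and the value `omega0 = 5` are all assumptions made for the example.

```python
import numpy as np

def pca_project(images, n_modes):
    """Project flattened images onto their top principal components.

    images : array of shape (n_frames, n_pixels)
    Returns (projections, fraction_of_variance_explained).
    """
    centred = images - images.mean(axis=0)
    # SVD of the centred data; rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    projections = centred @ vt[:n_modes].T
    explained = (s[:n_modes] ** 2).sum() / (s ** 2).sum()
    return projections, explained

def morlet_amplitude(x, freqs, fs, omega0=5.0):
    """Amplitude of a Morlet continuous wavelet transform of a 1-D signal.

    x      : time series, e.g. one postural mode over time
    freqs  : frequencies (Hz) at which to evaluate the transform
    fs     : sampling rate (Hz)
    omega0 : dimensionless Morlet parameter (assumed value)
    Returns an array of shape (len(freqs), len(x)).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        # Scale, in samples, whose wavelet is centred approximately on f.
        s = omega0 * fs / (2.0 * np.pi * f)
        half = int(np.ceil(4 * s))  # +/- 4 scales covers the Gaussian envelope
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * omega0 * t - 0.5 * t ** 2)
        full = np.convolve(x, wavelet, mode="full") / np.sqrt(s)
        out[i] = np.abs(full[half:half + n])  # keep the part aligned with x
    return out

rng = np.random.default_rng(0)

# Toy "images": 300 frames of 400 pixels generated from 5 hidden factors.
images = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 400))
images += 0.01 * rng.normal(size=images.shape)
proj, explained = pca_project(images, n_modes=5)

# Toy "postural mode": a 10 Hz oscillation sampled at 100 Hz.
fs = 100.0
time = np.arange(200) / fs
mode = np.sin(2 * np.pi * 10.0 * time)
freqs = np.array([2.0, 5.0, 10.0, 20.0])
amp = morlet_amplitude(mode, freqs, fs)
```

In this toy case the amplitude in the middle of the recording is largest in the 10 Hz channel, which is the sense in which the spectrogram describes how each postural mode is oscillating at a given moment.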

- The feature vector is made up of 25 frequency channels for each of the 50 eigenmodes, so each point in time is represented by a 1,250D vector (25 × 50)
- Because behavior is highly correlated, they speculate a much lower dimensional manifold lies inside this space that describes behavior
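As a small sketch of how the per-frame feature vector is assembled, the snippet below stacks placeholder wavelet amplitudes into one row per frame. The counts 50 and 25 come from the notes above; the number of frames and the random values are arbitrary stand-ins for real amplitudes.

```python
import numpy as np

n_modes, n_freqs = 50, 25   # from the notes above
n_frames = 1000             # arbitrary, for illustration only

rng = np.random.default_rng(0)
# Placeholder wavelet amplitudes with shape (n_modes, n_freqs, n_frames).
amplitudes = rng.random((n_modes, n_freqs, n_frames))

# One feature vector per frame: all frequency channels of all modes, concatenated.
features = amplitudes.reshape(n_modes * n_freqs, n_frames).T
print(features.shape)  # (1000, 1250)
```

Column `m * n_freqs + f` of `features` holds frequency channel `f` of mode `m`.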

- The goal is to map the 1250D data to something much smaller where trajectories in that space pause when stereotyped behavior occurs. “This means that our embedding should minimise any local distortions.”<I don’t know why>
- So the approach they chose “reduces dimensionality by altering the distances between more distant points on the manifold.”
- PCA, MDS, and Isomap do exactly the opposite of this, prioritizing large-scale accuracy over local accuracy

- “t-Distributed Stochastic Neighbor Embedding (t-SNE)” however, does satisfy their requirement
- “For t-SNE, the conserved invariants are related to the Markov transition probabilities if a random walk is performed on the data set.” Transition probabilities between two time points are based on a Gaussian kernel over distance
- Minimizes “local distortions” <?>
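A minimal sketch of the Gaussian-kernel transition probabilities described above. The function name `transition_probabilities` and the single shared kernel width `sigma` are assumptions for illustration; t-SNE implementations normally tune one width per point to reach a target perplexity.

```python
import numpy as np

def transition_probabilities(distances, sigma):
    """Row-stochastic matrix with p[i, j] = probability of stepping from i to j.

    distances : (n, n) matrix of pairwise distances
    sigma     : Gaussian kernel width, shared by all points (an assumption)
    """
    d = np.asarray(distances, dtype=float)
    weights = np.exp(-0.5 * (d / sigma) ** 2)
    np.fill_diagonal(weights, 0.0)  # a random walker never stays in place
    return weights / weights.sum(axis=1, keepdims=True)

# Three points on a line: the first two are close, the third is far away.
points = np.array([0.0, 1.0, 5.0])
distances = np.abs(points[:, None] - points[None, :])
P = transition_probabilities(distances, sigma=1.0)
```

Because the kernel decays quickly with distance, each row of `P` is dominated by a point's nearest neighbours, which is why matching these probabilities preserves local structure.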

- With t-SNE, the transition probabilities in the low-dimensional space are chosen to match those in the original space, but are proportional to a Cauchy/Student-t kernel of the points’ Euclidean distances
- Problem with t-SNE is quadratic memory use – they use importance sampling to subsample to 35k data points and then run on that
- A distance function between the high-dimensional feature vectors is still needed. They use KL-divergence
- They are able to embed data nicely in 2D – going to 3D leads to a 2% reduction of embedding cost function
- Get a probability density over behavior by convolving each point in the embedded map with a Gaussian
- Space has peaks, and trajectories pause at peak locations when conducting stereotyped behavior
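Two pieces of this step are easy to sketch: a KL-divergence distance between feature vectors, and the Gaussian-smoothed density over the embedded points. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the `eps` guard, the choice to normalise each vector to sum to 1, the kernel width, and the toy two-cluster "embedding" are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two non-negative vectors.

    Both vectors are first normalised to sum to 1 (an assumption of this sketch);
    eps guards against taking the log of zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def gaussian_density(points, grid_x, grid_y, sigma):
    """Average of isotropic 2-D Gaussians centred on each point, on a grid.

    points : (n_points, 2) embedded coordinates
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    gx, gy = np.meshgrid(grid_x, grid_y)       # each (ny, nx)
    dx = gx[..., None] - points[:, 0]          # (ny, nx, n_points)
    dy = gy[..., None] - points[:, 1]
    dens = np.exp(-0.5 * (dx ** 2 + dy ** 2) / sigma ** 2).sum(axis=-1)
    # Normalise so the density integrates to roughly 1 over the plane.
    return dens / (2 * np.pi * sigma ** 2 * len(points))

a = np.array([0.7, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.7])
d_ab = kl_divergence(a, b)

# Toy embedding: two clusters standing in for two stereotyped behaviours.
rng = np.random.default_rng(0)
cluster_1 = rng.normal(scale=0.3, size=(200, 2)) + np.array([-2.0, 0.0])
cluster_2 = rng.normal(scale=0.3, size=(200, 2)) + np.array([2.0, 0.0])
embedded = np.vstack([cluster_1, cluster_2])

grid = np.linspace(-4.0, 4.0, 81)              # spacing of 0.1
density = gaussian_density(embedded, grid, grid, sigma=0.3)
```

The resulting `density` has one bump per cluster and is close to zero in between, which is the kind of peaked landscape the notes describe.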

- This 2D space is then decomposed by a “watershed transform”, where points are grouped together if hill-climbing from them leads to the same (local) maximum
- Peaks correspond to commonly defined behaviors, but here, everything is bottom-up from the data. Also, nearby regions encode similar but distinct behavior
- As expected, many behaviors (like running) are confined to an orbit in the low-dimensional space
- They were able to pull apart distinguishing characteristics between male and female behavior
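The hill-climbing description of the watershed transform can be sketched directly on a gridded density. This is a simple stand-in written for illustration (the function name `watershed_by_ascent` and the synthetic two-bump density are assumptions); image-processing libraries provide more careful watershed implementations, and this version treats every cell of a flat plateau as its own peak.

```python
import numpy as np

def watershed_by_ascent(density):
    """Label each grid cell by the local maximum reached via steepest ascent.

    density : 2-D array
    Returns (labels, peaks): labels has the same shape as density, and peaks
    is a list of (row, col) positions, indexed by label.
    """
    n_rows, n_cols = density.shape
    labels = -np.ones(density.shape, dtype=int)
    peaks = []

    def uphill_neighbour(r, c):
        # Highest of the 8 neighbours, or the cell itself if none is strictly higher.
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < n_rows and 0 <= cc < n_cols
                        and density[rr, cc] > density[best]):
                    best = (rr, cc)
        return best

    for r in range(n_rows):
        for c in range(n_cols):
            path = []
            cur = (r, c)
            # Climb until we hit a peak or a cell that is already labelled.
            while labels[cur] < 0:
                path.append(cur)
                nxt = uphill_neighbour(*cur)
                if nxt == cur:  # local maximum: start a new region
                    peaks.append(cur)
                    labels[cur] = len(peaks) - 1
                    break
                cur = nxt
            for cell in path:
                labels[cell] = labels[cur]
    return labels, peaks

# Synthetic density with two bumps, centred at x = -2 and x = +2 on the row y = 0.
x = np.linspace(-4.0, 4.0, 41)
gx, gy = np.meshgrid(x, x)
toy_density = np.exp(-((gx + 2) ** 2 + gy ** 2)) + 0.8 * np.exp(-((gx - 2) ** 2 + gy ** 2))
labels, peaks = watershed_by_ascent(toy_density)
```

Each region in `labels` plays the role of one candidate behaviour: every cell in it climbs to the same peak.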