- Discusses deterministic as well as probabilistic SFA <I don’t know anything about the latter>
- Derive a version of deterministic SFA “… that is able to identify linear projections that extract the common slowest varying features of two or more sequences.”
- They also derive an EM method for inference in probabilistic SFA and extend it to handle multiple data sequences
- This EM-SFA algorithm “… can be combined with dynamic time warping techniques for robust sequence time-alignment.”
- Used on face videos
- <A number of good references in here – a few I still haven’t read>
- Seems like the probabilistic version of SFA is based on similarity to an ML estimate of a linear generative model with “… a Gaussian linear dynamical system with diagonal covariances over the latent space.” <Need to look into that more>
- Their version of probabilistic SFA is novel because it is based on EM, as opposed to other ML approaches. This allows for modeling latent distributions that aren’t restricted by diagonal covariance matrices.
- <There is a hack in the math in the standard ML probabilistic SFA that maps variance to 0, which “… essentially reduces the model to a deterministic one…”>
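
For reference, the deterministic side of the notes above can be sketched in a few lines. This is a minimal sketch of standard linear SFA (whiten the data, then take the directions with the smallest variance of the time differences), not the paper's algorithm. The function name `linear_sfa`, the first-order difference as a stand-in for the time derivative, and the toy sine signals are all my own assumptions.

```python
import numpy as np

def linear_sfa(X, n_features=1):
    """Plain linear SFA on one sequence. X has shape (T, D), rows are time steps.

    Returns (W, Y): the projection matrix and the projected (slow) features.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # zero-mean inputs

    # Whiten so every projection has unit variance and is decorrelated.
    cov = Xc.T @ Xc / (len(Xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()           # drop degenerate directions
    S = evecs[:, keep] / np.sqrt(evals[keep])    # whitening matrix
    Z = Xc @ S

    # Approximate the time derivative with first-order differences
    # (an assumption), then find the directions where it varies least.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / (len(dZ) - 1)
    _, dvecs = np.linalg.eigh(dcov)              # eigenvalues in ascending order

    W = S @ dvecs[:, :n_features]
    return W, Xc @ W

# Toy data: a slow and a fast sine, linearly mixed into two channels.
t = np.linspace(0, 2 * np.pi, 500)
slow, fast = np.sin(t), np.sin(25 * t)
X = np.column_stack([slow + 0.5 * fast, fast - 0.3 * slow])
W, Y = linear_sfa(X, n_features=1)
```

On this toy input the single extracted feature should track the slow sine up to sign, since the sign of an eigenvector is arbitrary.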

- They call facial expressions action units (AUs). This is the first unsupervised method that does so based on temporal data; other unsupervised approaches are based on “…global structures…” <I assume that means global properties of still frames>
- <Not taking notes on particulars of math for probabilistic SFA, both ML and EM versions>
- When looking at more than one data sequence, SFA for this setting is designed to find the common slowly varying features
- <Naturally> the assumption in multiple-sequence SFA is that the sequences share a common latent space
- Math for both deterministic and probabilistic multi-sequence SFA
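
To make the common-latent-space idea concrete, here is the simplest reading I can think of: one shared projection, fit by pooling the statistics of all sequences. This is a hedged sketch, not the paper's derivation. It assumes every sequence has the same feature dimension, and the name `shared_sfa` and the toy signals are made up for illustration.

```python
import numpy as np

def shared_sfa(sequences, n_features=1):
    """Fit ONE projection that is slow across several sequences.

    sequences: list of arrays with shape (T_i, D); lengths T_i may differ.
    Simplifying assumption: a single shared projection matrix for all of them.
    """
    # Center each sequence on its own mean.
    centered = [np.asarray(s, dtype=float) for s in sequences]
    centered = [s - s.mean(axis=0) for s in centered]

    # Whiten with the pooled covariance of all sequences.
    pooled = np.vstack(centered)
    cov = pooled.T @ pooled / (len(pooled) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()
    S = evecs[:, keep] / np.sqrt(evals[keep])

    # Take differences inside each sequence only, so the jump between the
    # end of one sequence and the start of the next is never counted.
    diffs = np.vstack([np.diff(c @ S, axis=0) for c in centered])
    dcov = diffs.T @ diffs / (len(diffs) - 1)
    _, dvecs = np.linalg.eigh(dcov)              # ascending: slowest first

    W = S @ dvecs[:, :n_features]
    return W, [c @ W for c in centered]

# Two toy sequences of different length, mixed by the same matrix A.
A = np.array([[1.0, 0.5], [-0.3, 1.0]])
t1 = np.linspace(0, 2 * np.pi, 400)
t2 = np.linspace(0, 2 * np.pi, 300)
seq1 = np.column_stack([np.sin(t1), np.sin(30 * t1)]) @ A.T
seq2 = np.column_stack([np.cos(t2), np.sin(40 * t2)]) @ A.T
W, features = shared_sfa([seq1, seq2])
```

The one design choice worth noting is that differences are computed per sequence before pooling, so the result does not depend on the order in which the sequences are stacked.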

- <The time warping part / aligning sequences of different length is probably not relevant>
- The video data they work from is a public database of 400 annotated videos, each with a constrained structure of neutral, onset, apex, and offset.
- Not so clearly described, but seems like they don’t operate off raw video data; they work off of 68 facial points tracked in 2D
- Compare performance of EM-SFA, SFA, and traditional linear dynamical systems (LDS).
- Seems like each is run on a subset of facial data, either mouth, eyes, or brows

- Use slowest feature of SFA as a classifier for when the expression is being performed (when its value goes from negative to positive, and back to negative)
- EM-SFA outperforms SFA and LDS
- <Did a quick read, but I don’t see where they have experimental results of multi-sequence. If it is indeed missing that is strange because they do have the math and the data already to test it.>
- The temporal alignment algorithm is used to match up videos from the same category so that the neutral, apex, etc… frames are synchronized
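
The sign-change rule for the slowest feature noted above is easy to sketch. This is my own minimal reading of it, not code from the paper: a frame counts as "expression active" while the feature is positive. The helper name `active_segments` is hypothetical. Also, the sign of an SFA feature is arbitrary, so in practice it may need flipping so that neutral frames come out negative.

```python
def active_segments(feature):
    """Return (start, end) index pairs where the feature is positive.

    `end` is exclusive. A segment opens on a negative-to-positive crossing
    and closes on the next positive-to-negative crossing.
    """
    segments = []
    start = None
    for i, value in enumerate(feature):
        if value > 0 and start is None:
            start = i                          # crossed from negative to positive
        elif value <= 0 and start is not None:
            segments.append((start, i))        # crossed back to negative
            start = None
    if start is not None:                      # still positive at the last frame
        segments.append((start, len(feature)))
    return segments

# Toy slowest feature: neutral, then the expression, then neutral again.
print(active_segments([-1.0, -0.5, 0.2, 0.8, 0.3, -0.1, -0.4]))  # [(2, 5)]
```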
