Sparse modeling for high-dimensional data. Bin Yu. ITA 2011 Tutorial Video

https://www.youtube.com/watch?v=EflnBGwkvM4

Structure:
1. V1 through fMRI
2. Occam’s Razor and Lasso
3. Unified theory: M-estimation with decomposable regularization
4. Learning about V1 through sparse models
5. World-imaging through sparse modeling and human experiment
fMRI – can we decode what a person was looking at by examining fMRI signals?
1. This was done previously as a classification task: there was a set of images <100?> and the goal was to figure out which one was being looked at
She used a database of 10k images
From the fMRI data, get a feature vector that is 10921-dimensional
Goal: to help the understanding of V1, need to develop a sparse model that performs accurate prediction
1. Minimizing L2 loss leads to both an ill-posed computational problem and poor prediction
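A minimal sketch of why plain L2 minimization is ill-posed when there are more features than observations. The tiny design matrix and the three coefficient vectors are made up for illustration (they are not from the talk); all three fit the data exactly, so the squared loss alone cannot choose between them.

```python
# Toy design with more features (p = 3) than observations (n = 2).
# All numbers here are hypothetical, chosen so the arithmetic is easy to check.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

def predict(X, beta):
    """Linear predictions X @ beta, written out with plain lists."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def squared_loss(X, y, beta):
    """Sum of squared residuals (the L2 loss)."""
    return sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, predict(X, beta)))

# Three different coefficient vectors. They differ by multiples of
# (-1, -1, 1), a direction that X maps to zero.
beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]
beta_c = [-1.0, 0.0, 2.0]

for beta in (beta_a, beta_b, beta_c):
    print(beta, squared_loss(X, y, beta))
```

Every vector on the line through beta_a in the direction (-1, -1, 1) has zero loss, which is the non-uniqueness that a penalty such as L1 is meant to resolve.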
Worked with a lab that in 2006 tried neural nets, SVMs, and then settled on Lasso
1. Had consistency problems with NNs; Lasso was more stable
Fisher in the 1920s promoted maximum likelihood (ML) methods (turns a Bayes posterior with a uniform prior into something likelihood based)
1. BUT maximum likelihood with least squares leads to the largest model: least squares is a projection onto a subspace, and a bigger subspace always gives a smaller (training) mean squared error. This leads to the problem of poor prediction power
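The projection argument can be checked numerically. The sketch below is my own illustration with synthetic data (sizes, seed, and coefficients are assumptions): it fits least squares on the first k features for k = 0..p and records the training MSE, using Gram-Schmidt so that only the standard library is needed.

```python
import random

random.seed(0)
n, p = 30, 10  # hypothetical sample size and number of features

# Synthetic data: only the first two features carry signal.
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 1) for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nested_training_mse(X, y):
    """Training MSE of least squares on the first k columns, k = 0..p.

    Each new column is orthogonalized against the previous ones
    (Gram-Schmidt); projecting onto the larger subspace then removes
    one more orthogonal component from the residual.
    """
    n, p = len(X), len(X[0])
    residual = list(y)
    basis = []
    mse = [dot(residual, residual) / n]
    for j in range(p):
        q = [row[j] for row in X]
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = dot(q, q) ** 0.5
        if norm > 1e-12:  # skip columns that add nothing new
            q = [qi / norm for qi in q]
            basis.append(q)
            c = dot(residual, q)
            residual = [ri - c * qi for ri, qi in zip(residual, q)]
        mse.append(dot(residual, residual) / n)
    return mse

training_mse = nested_training_mse(X, y)
for k, m in enumerate(training_mse):
    print(k, round(m, 4))
```

The training MSE never goes up as k grows, even though features 3 through 10 are pure noise here, so training error alone always points to the largest model.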
<Under assumptions?> there are 2^D possible models <size of hypothesis space>. Finding the best is intractable, but on the other hand, due to noise and the tiny size of the data with respect to 2^D, it doesn’t make sense to do exhaustive search anyway (as it will not give you the right answer). Therefore, we have a good reason to use tractable, but suboptimal methods
Akaike information criterion (AIC) is an L0 penalty/regularization (from the 70s)
1. Schwarz came up with the Bayesian Information Criterion (BIC), which is also L0
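To make the 2^D model count and the L0-style criteria concrete, here is a small sketch of exhaustive subset search scored by AIC and BIC. It assumes a Gaussian linear model and the common forms n*log(RSS/n) + 2k and n*log(RSS/n) + k*log(n), where k is the number of selected features; the data, sizes, and seed are synthetic and hypothetical. It is only feasible because D is tiny.

```python
import math
import random
from itertools import combinations

random.seed(1)
n, D = 50, 6  # hypothetical sizes; 2**6 = 64 candidate models

# Synthetic data: features 0 and 3 carry signal, the rest are noise.
X = [[random.gauss(0, 1) for _ in range(D)] for _ in range(n)]
y = [1.5 * row[0] - 2.0 * row[3] + random.gauss(0, 1) for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rss(X, y, subset):
    """Residual sum of squares of least squares on the given columns."""
    residual = list(y)
    basis = []
    for j in subset:
        q = [row[j] for row in X]
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = dot(q, q) ** 0.5
        if norm > 1e-12:
            q = [qi / norm for qi in q]
            basis.append(q)
            c = dot(residual, q)
            residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return dot(residual, residual)

def criterion(rss_value, k, n, penalty_per_feature):
    """Fit term plus an L0 penalty: a fixed price per selected feature."""
    return n * math.log(rss_value / n) + penalty_per_feature * k

# Enumerate every subset of the D features (the 2^D "models").
models = [s for k in range(D + 1) for s in combinations(range(D), k)]
rss_by_model = {s: rss(X, y, s) for s in models}

best_aic = min(models, key=lambda s: criterion(rss_by_model[s], len(s), n, 2.0))
best_bic = min(models, key=lambda s: criterion(rss_by_model[s], len(s), n, math.log(n)))

print(len(models), "models searched")
print("AIC picks", best_aic)
print("BIC picks", best_bic)
```

Since log(n) > 2 once n is at least 8, BIC charges more per feature than AIC and never selects a larger model. Adding one feature doubles the number of models, which is why this search does not scale to D in the thousands.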
L1 penalization was introduced in the 90s: Lasso by Tibshirani, and basis pursuit by Chen and Donoho.
Properties of Lasso:
1. Sparsity and regularization
2. Convex relaxation of L0 penalty
Lasso happens to use L2 loss, but in general other loss functions can be used. Likewise, Lasso happens to use L1 regularization, but other penalties can be used as well (an L2 penalty gives ridge).
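A minimal sketch of Lasso as "L2 loss plus L1 penalty", solved by cyclic coordinate descent with soft-thresholding. This is a standard textbook solver, not necessarily what was used in the talk; the objective scaling (1/(2n))*||y - X*beta||^2 + lam*||beta||_1, the function names, and the synthetic data are my assumptions.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_objective(X, y, beta, lam):
    n = len(y)
    residual = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    return sum(r * r for r in residual) / (2 * n) + lam * sum(abs(b) for b in beta)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize the Lasso objective one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(X, residual)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            if new_b != beta[j]:
                delta = new_b - beta[j]
                residual = [r - row[j] * delta for row, r in zip(X, residual)]
                beta[j] = new_b
    return beta

random.seed(2)
n, p = 40, 8  # hypothetical sizes
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[5] + random.gauss(0, 1) for row in X]

# Smallest penalty at which every coefficient is zero.
lam_max = max(abs(sum(row[j] * yi for row, yi in zip(X, y))) / n for j in range(p))

beta_zero = lasso_coordinate_descent(X, y, lam_max)
beta_sparse = lasso_coordinate_descent(X, y, 0.1 * lam_max)
print("nonzero at lam_max:", sum(b != 0.0 for b in beta_zero))
print("nonzero at 0.1 * lam_max:", sum(b != 0.0 for b in beta_sparse))
```

Swapping the absolute-value penalty for a squared one would replace the soft-threshold step with a plain rescaling, which gives ridge: it shrinks coefficients but does not set them exactly to zero.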
They propose a nonlinear system that works better for the fMRI problem than the linear one (although they are in the same ballpark)
<Ok, bailing>

Ari Weinstein's Research
