## Sliced Inverse Regression

In a way, the Pitfall work can be viewed as a dimension-reduction task.  We are trying to find a model of which objects influence other objects, to construct something that looks like a Bayesian network.

Given that we can try to predict the features of an object based on other objects, one question that comes up is how to find which other objects will aid in the prediction.  I’ve been reading up on dimension reduction, and one important algorithm that seems to be cited frequently is Sliced Inverse Regression (SIR), introduced by Li (1991).

I found a copy of Applied Multivariate Statistical Analysis by Härdle and Simar, which has a section on SIR and SIR II (an extension of SIR).

The basic idea behind SIR is that, given a feature vector X and a response Y, the algorithm works by estimating E[X|Y=y] rather than E[Y|X=x]; that is where the “inverse regression” comes from.  The “slices” are segments of the range of Y, each containing some predefined constant number of samples, and the sample means computed within each slice serve as a crude estimate of the inverse regression curve.

I won’t write out the algorithm here, but it is pretty short and looks like it could be written in just a few lines of Matlab.  Both outlines of SIR point out that it is very similar to PCA.  From Härdle and Simar:
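For concreteness, here is a rough sketch of the procedure in Python rather than Matlab.  The function name, slice count, and other defaults are my own choices for illustration, not anything from the paper:

```python
import numpy as np

def sir(X, y, n_slices=10, n_components=2):
    """Sketch of Sliced Inverse Regression. X is (n, p), y is (n,)."""
    n, p = X.shape

    # 1. Standardize X: Z = Cov(X)^(-1/2) (X - E[X]),
    #    with the inverse square root taken via eigendecomposition.
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # 2. Slice on y: sort the samples by y and split them into
    #    slices of roughly equal size.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # 3. The sample mean of Z within each slice is a crude estimate
    #    of the inverse regression curve E[Z | Y = y].
    means = np.array([Z[idx].mean(axis=0) for idx in slices])
    weights = np.array([len(idx) for idx in slices]) / n

    # 4. Form the weighted covariance of the slice means; its leading
    #    eigenvectors span the estimated dimension-reduction space.
    M = (means * weights[:, None]).T @ means
    _, vecs = np.linalg.eigh(M)
    directions = vecs[:, ::-1][:, :n_components]  # largest eigenvalues first

    # Map the directions back to the original X scale.
    return inv_sqrt @ directions
```

As the quote below suggests, steps 1 and 4 are essentially PCA machinery; the slicing in steps 2–3 is the only place the response Y enters.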

> SIR is strongly related to PCA. If all of the data falls into a single interval, which means that Cov{m_1(y)} is equal to Cov(Z), SIR coincides with PCA.

where Z = Cov(X)^(-1/2){X − E[X]} is the standardized feature vector and m_1(y) = E[Z|Y=y].

Some other interesting properties of SIR are that it is claimed not to suffer from the curse of dimensionality in X, and that no actual regression of the form Y = f(X) is performed.

Still, I’m not sure this is a winner for this problem; if it is so similar to PCA, I wonder if it also suffers from the same shortcomings.  Specifically, all PCA cares about is variance, but large variance isn’t necessarily what is important; what matters is what that variance means, since small changes in one part of the feature vector may have a great deal of predictive power, or vice versa.  Additionally, PCA is not scale-invariant, which is a pretty large drawback.
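The scale complaint is easy to demonstrate numerically.  In this small made-up experiment, rescaling a single column (say, converting one feature from metres to millimetres) re-aims the leading principal component almost entirely at that column:

```python
import numpy as np

def first_pc(X):
    """Leading principal component of X (eigenvector of largest eigenvalue)."""
    Xc = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return vecs[:, -1]  # eigh returns eigenvalues in ascending order

rng = np.random.default_rng(1)
# Two correlated features on similar scales: the first PC splits
# its weight between them.
X = rng.normal(size=(1000, 2)) @ np.array([[1.0, 0.5], [0.5, 1.0]])
pc = first_pc(X)

# Inflate the first column's units by 1000x: the first PC now
# points almost entirely along that column.
X_scaled = X.copy()
X_scaled[:, 0] *= 1000.0
pc_scaled = first_pc(X_scaled)
```

Nothing about the underlying relationship changed; only the units did, yet the "important direction" PCA reports is completely different.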
