## Estimating the order with other regression algorithms

When you asked whether the property that showed up in the last experiment was particular to GPs, I said no: I figured that other regression techniques would perform similarly.

Well, thanks to Weka, by just changing one line of code I was able to perform that experiment with a number of different regression algorithms, and the results are mixed.
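The sweep itself is just a loop over candidate orders, fitting a regressor on lagged values each time and recording the error. Here is a minimal pure-Python sketch of the idea; everything in it (the toy AR(2) series, `make_lagged`, the hand-rolled least-squares fit standing in for Weka's learners) is illustrative, not the actual experiment code:

```python
import random

def make_lagged(series, order):
    # Each row of X holds the previous `order` values; y is the next value.
    X = [series[i - order:i] for i in range(order, len(series))]
    y = series[order:]
    return X, y

def solve(A, b):
    # Gaussian elimination with partial pivoting (solves A x = b).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][-1] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_mae(series, order):
    # Fit ordinary least squares on the lagged features via the normal
    # equations, then return the in-sample mean absolute error.
    X, y = make_lagged(series, order)
    X = [[1.0] + row for row in X]          # intercept column
    n = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    w = solve(XtX, Xty)
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    return sum(abs(p - t) for p, t in zip(preds, y)) / len(y)

# Toy AR(2) series, so the "right" order to recover is 2.
random.seed(0)
series = [0.0, 0.0]
for _ in range(500):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + random.gauss(0, 1))

for order in range(1, 7):
    print(order, round(ols_mae(series, order), 4))
```

In Weka, `ols_mae`'s inner fit is the one line you swap out, e.g. `GaussianProcesses` for `LinearRegression`.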

Just for reference, this was the result with GPs (all plots are mean absolute error):

*[Figure: GP error vs. order]*

It looks very clean, almost parabolic.  Now here are the corresponding plots for the other regression algorithms:

#### Linear Regression:

*[Figure: Linear regression error, orders 1 to 20]*

The order/error graph for LinReg is pretty well behaved, but oddly the error spikes at 2.  There are some flat regions followed by rapid changes over short intervals, with some noise.  The minimum error is at order 9.

#### Perceptron:

*[Figure: Perceptron error vs. order]*

This is the noisiest graph of any of the regression algorithms I tried.  Even so, there is a noticeable increase in error as the order grows.  Like GPs, perceptrons had the smallest error at order 2.

#### SVM Regression:

*[Figure: SVM regression error, orders 1 to 6]*

This one is interesting: the error is very low and flat for all orders other than 1.  Even so, the error at order 2 is smaller than at all the other values I tried, by a small amount.  I don't know much about SVM regression, though, so it's tough for me to interpret this at all.

I also tried Isotonic Regression (which I don't know much about either), and the results were almost perfectly flat and consistent, with error minimized at order 1.  The errors for orders 1 through 14 were all between 13.7 and 13.9, so Isotonic Regression seems to give you nothing for this approach.
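For reference, isotonic regression just fits the best least-squares *monotone non-decreasing* step function to the targets.  A minimal sketch of the standard pool-adjacent-violators algorithm (my own illustrative implementation, not Weka's):

```python
def isotonic_fit(y):
    # Pool-Adjacent-Violators: least-squares monotone non-decreasing fit.
    # Maintain blocks of [mean value, count]; merge backwards whenever a
    # new point violates monotonicity against the previous block.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for v, w in blocks:
        fit.extend([v] * w)
    return fit

print(isotonic_fit([1, 3, 2, 4]))   # the 3 and 2 get pooled to 2.5
```

Since the fit collapses the data into flat blocks, its predictions change very little as you add more lag features, which may be why the error curve was so flat here.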

#### Thoughts:

Like GPs, SVM Regression and perceptrons had error minimized at order 2.  Isotonic had its minimum at order 1, and LinReg was quite far off at 9.  In terms of the single lowest error observed across all algorithms, perceptrons actually performed best.

SVM and Isotonic Regression had error that grew very slowly as a function of order, so the fact that they perform well everywhere actually makes them poorly suited to this approach: a flat error curve gives almost no signal about which order is the right one.

The general shape of the graphs for LinReg and perceptrons was similar to the one produced by GPs, but noisier, which I suppose also makes them less well suited to the task.

Based *only* on this experiment, GPs seem best for this approach: their graph is the best behaved, and they found the "right" value.  This is a highly qualified statement, since the results here seem weak.  If this experiment could be reproduced with similar results in another domain, I would be more confident in that claim.