Reliable Effective Terascale Linear Learning. John Langford. Talk

  1. His method (Vowpal Wabbit) takes a couple of seconds to train on a ~500 MB compressed file and gives the best result on the dataset (very large, extremely sparse)
  2. The method is online
  3. This talk is bad.  
  4. He says his stuff is fast but doesn’t compare it to anything (for example, C4.5 decision trees). His definition of what counts as a feature in the database is also unusual and is causing problems. He also uses unusual terminology for things like weights
  5. I think a takeaway is that you can’t show up to a talk and be like “my stuff is the most awesome stuff.  period.” because it will just make people skeptical.  You also can’t say “we beat the future” because that claim doesn’t make any sense
    1. Another way to say this is don’t love your method too much
  6. He throws out acronyms like MPI and RCV1 without defining them, so nobody knows what they mean (MPI is the Message Passing Interface; RCV1 is Reuters Corpus Volume I, a text-classification dataset)
  7. What are nodes? (I think it means the machines the work is parallelized across, but I don’t think he said so)
  8. Says Hadoop is bad because it takes about a minute to start up a task; instead, this uses AllReduce (a standard MPI operation)
    1. The basic idea: each node starts with a number, all the numbers are summed, and the sum is propagated back to every node
    2. Gave some other reasons why Hadoop is no good
  9. Hadoop is nice because it moves the program to the data (which is good when the data is huge)
  10. Says all the algorithms that run at scale do gradient descent (I don’t know if I totally agree with that, but OK)
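  11. Says that if the features aren’t all on the same scale (e.g. measured in different units), it messes up the gradient.
    1. This is commonly fixed with a Newton update, but that requires the Hessian, which has d² entries for d features and so can’t be computed on large data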
  12. The method is a hybrid of adaptive (online) and batch learning
  13. A separate learning rate is kept for each dimension, computed from the previous gradients in that dimension
  14. Normalizes the data
  15. Also supports importance weights (like the ones used in boosting) and has a very effective way of handling them
    1. Says this also helps when using an aggressive learning rate
  16. Says L-BFGS (limited-memory BFGS, a quasi-Newton method) is very important to know
  17. There are a large number of pieces to this learning system, and it’s really tough to keep track of what’s going on. It would be better to explain some parts more thoroughly and leave others out entirely
  18. Says VW led to a more effective spam filter as well
    1. Says spam is adversarial in that it changes over time and in response to filters
  19. The problem being solved is a batch (offline) learning problem
  20. The method is robust and gives good results on most data sets
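
The allreduce idea in item 8 can be sketched in a few lines of Python. This is a minimal in-process simulation under my own assumptions, not VW’s code: the nodes are arranged as a binary tree (node i’s parent is node (i - 1) // 2), each node holds a vector such as a local gradient rather than a single number, and `allreduce_sum` is a hypothetical name. On a real cluster each parent/child step would be a network message.

```python
from typing import List


def allreduce_sum(vectors: List[List[float]]) -> List[List[float]]:
    """Simulate a tree-based AllReduce: every node ends up with the elementwise sum.

    Hypothetical helper. Node i's parent is (i - 1) // 2, so node 0 is the root.
    """
    n = len(vectors)
    if n == 0:
        return []
    partial = [list(v) for v in vectors]

    # Reduce phase: visit nodes from the highest index down, so each node's
    # subtree total is complete before it is added into its parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[i])]

    # Broadcast phase: the root's total is copied down the tree to every node.
    result: List[List[float]] = [[] for _ in range(n)]
    result[0] = partial[0]
    for i in range(1, n):
        result[i] = list(result[(i - 1) // 2])
    return result


local = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
summed = allreduce_sum(local)
print(summed[0])                              # [25.0, 30.0], the same on every node
print([x / len(local) for x in summed[3]])    # average: [5.0, 6.0]
```

The tree layout means each node only talks to its parent and children, so the number of communication rounds grows with the depth of the tree (roughly log2 of the node count) rather than with the number of nodes.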
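
Item 13 (a learning rate per dimension, driven by that dimension’s past gradients) can be illustrated with the AdaGrad rule: each feature’s step is η divided by the square root of the sum of its squared past gradients. Using AdaGrad as the stand-in is my assumption, not something stated in the talk. Squared loss, sparse examples stored as dicts, and the names `adagrad_step` and `eta` are hypothetical choices for illustration.

```python
import math
from typing import Dict

Sparse = Dict[str, float]


def adagrad_step(w: Sparse, G: Sparse, x: Sparse, y: float,
                 eta: float = 0.5, eps: float = 1e-8) -> float:
    """One online update for squared loss 0.5 * (w.x - y)**2 on a sparse example.

    w holds the weights and G the running sum of squared gradients, per feature.
    Returns the prediction made before the update.
    """
    p = sum(w.get(f, 0.0) * v for f, v in x.items())
    err = p - y  # derivative of the loss with respect to the prediction
    for f, v in x.items():
        g = err * v                      # gradient for this coordinate
        G[f] = G.get(f, 0.0) + g * g     # accumulate its squared magnitude
        # Per-feature step: features with large past gradients take smaller steps.
        w[f] = w.get(f, 0.0) - eta * g / (math.sqrt(G[f]) + eps)
    return p


w, G = {}, {}
adagrad_step(w, G, {"small": 2.0}, 1.0)
adagrad_step(w, G, {"large": 200.0}, 1.0)
print(round(w["small"], 6), round(w["large"], 6))  # 0.5 0.5
```

Because the first step on each feature has size η no matter how large that feature’s values are, a feature measured in large units is less likely to dominate the update, which is one way to address the scale problem in item 11.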
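
Item 15 and its sub-point link importance weights to aggressive learning rates. One way to see the connection, as a sketch assuming squared loss 0.5 * (w·x - y)² and a plain linear model: the naive approach multiplies the gradient step by the importance weight h, which overshoots the label once η·h·‖x‖² > 1. An alternative treats a weight of h as many tiny updates and integrates them in closed form, so the prediction approaches the label but never crosses it. Whether VW uses exactly this formula is an assumption here, and the function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def naive_update(w: Vec, x: Vec, y: float, h: float, eta: float) -> Vec:
    """Importance weight h simply scales an ordinary gradient step."""
    err = dot(w, x) - y
    return [wi - eta * h * err * xi for wi, xi in zip(w, x)]


def invariant_update(w: Vec, x: Vec, y: float, h: float, eta: float) -> Vec:
    """Closed-form limit of splitting weight h into many tiny gradient steps.

    Along direction x the prediction gap decays as (p - y) * exp(-eta * h * ||x||^2),
    so the prediction moves toward y but never past it.
    """
    p = dot(w, x)
    xx = dot(x, x)
    if xx == 0.0:
        return list(w)
    scale = (y - p) * (1.0 - math.exp(-eta * h * xx)) / xx
    return [wi + scale * xi for wi, xi in zip(w, x)]


w0, x, y = [0.0, 0.0], [1.0, 1.0], 1.0
print(dot(naive_update(w0, x, y, h=10.0, eta=0.5), x))      # 10.0, far past the label 1.0
print(dot(invariant_update(w0, x, y, h=10.0, eta=0.5), x))  # just under 1.0 (about 0.99995)
```

The invariant version also has a handy property: one update with weight 2 lands in the same place as two updates with weight 1, which the naive version generally does not.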
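
Items 11 and 16 connect: a Newton step needs the d × d Hessian, which is out of reach with millions of features, while L-BFGS approximates the effect of the inverse Hessian from only the last m pairs of weight and gradient differences, so it needs O(m·d) memory. Below is a textbook two-loop recursion with a simple backtracking line search, run on a deliberately badly scaled quadratic. It is not VW’s implementation; the function names, the memory size `m`, and the test function are my own illustrative choices.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def two_loop(g: Vec, s_hist: List[Vec], y_hist: List[Vec]) -> Vec:
    """Approximate -H^{-1} g from stored (s, y) pairs, kept oldest first."""
    q = list(g)
    coeffs = []
    for s, y in reversed(list(zip(s_hist, y_hist))):  # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        coeffs.append((rho, a))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    # Initial inverse-Hessian guess: identity scaled by the newest pair's curvature.
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (rho, a) in zip(zip(s_hist, y_hist), reversed(coeffs)):  # oldest first
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs(f: Callable[[Vec], float], grad: Callable[[Vec], Vec], x0: Vec,
          m: int = 5, iters: int = 200, tol: float = 1e-8) -> Vec:
    x, g = list(x0), grad(x0)
    s_hist: List[Vec] = []
    y_hist: List[Vec] = []
    for _ in range(iters):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = two_loop(g, s_hist, y_hist)
        # Backtracking line search on the Armijo (sufficient decrease) condition.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-12 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:        # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:      # limited memory: drop the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


# Badly scaled quadratic: curvature 1 along v[0], 100 along v[1]; minimum at (1, -2).
def f(v: Vec) -> float:
    return 0.5 * ((v[0] - 1.0) ** 2 + 100.0 * (v[1] + 2.0) ** 2)


def grad_f(v: Vec) -> Vec:
    return [v[0] - 1.0, 100.0 * (v[1] + 2.0)]


print(lbfgs(f, grad_f, [0.0, 0.0]))  # close to [1.0, -2.0]
```

Plain gradient descent on this function must use a step small enough for the steep v[1] direction, which makes progress along v[0] slow; the stored (s, y) pairs let L-BFGS rescale each direction, which is the same scaling problem item 11 describes.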
