Instructional Videos for Unsupervised Harvesting and Learning of Action Examples. Yu, Jiang, Hauptmann. ACM Multimedia 2014

  1. Harvests examples of actions from instructional videos by examining their narration
  2. The method is unsupervised
  3. “…examples harvested are of reasonably good quality”
  4. Detectors trained on the harvested examples perform on par with supervised methods
  5. They look for phrases such as “going to” and “let us”, which signal that an action is about to happen.
    1. From there they find the direct object (the noun that receives the action of the verb)
    2. They use the Stanford parser for this step
  6. They compare their results with those obtained from manually annotated data: “… the detectors trained on labels collected in an unsupervised fashion performs as well as or even better than the detectors trained on manually labeled data.”
  7. They build action detectors, but it's not clear exactly what data was used for training
  8. They only used videos that come with closed captions
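A minimal sketch of the trigger-phrase plus direct-object step, to make the idea concrete. It is not the authors' implementation. Assumptions: each caption sentence has already been dependency-parsed (the notes say the Stanford parser was used; here the parses are hand-written stand-ins), the action verb immediately follows the trigger phrase, and the `Token` class, `trigger_ends`, `harvest_actions` and the example sentences are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Trigger phrases named in the notes; a real system would likely use more.
TRIGGERS = {("going", "to"), ("let", "us")}
# "dobj" is the Stanford Dependencies label; Universal Dependencies v2 uses "obj".
DOBJ_LABELS = {"dobj", "obj"}


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    word: str
    head: int    # idx of the governing token (0 = root)
    deprel: str  # dependency label


def trigger_ends(tokens):
    """Indices of the last token of every trigger phrase in the sentence."""
    words = [t.word.lower() for t in tokens]
    return {tokens[i + 1].idx
            for i in range(len(words) - 1)
            if (words[i], words[i + 1]) in TRIGGERS}


def harvest_actions(tokens):
    """Return (verb, direct object) pairs for verbs right after a trigger phrase."""
    by_idx = {t.idx: t for t in tokens}
    ends = trigger_ends(tokens)
    pairs = []
    for t in tokens:
        if t.deprel not in DOBJ_LABELS:
            continue
        verb = by_idx.get(t.head)
        # Assumption: the action verb immediately follows the trigger phrase.
        if verb is not None and verb.idx - 1 in ends:
            pairs.append((verb.word.lower(), t.word.lower()))
    return pairs


# Hand-written parses standing in for real parser output (hypothetical examples).
EGG = [Token(1, "Now", 4, "advmod"), Token(2, "I", 4, "nsubj"),
       Token(3, "am", 4, "aux"), Token(4, "going", 0, "root"),
       Token(5, "to", 6, "aux"), Token(6, "crack", 4, "xcomp"),
       Token(7, "the", 8, "det"), Token(8, "egg", 6, "dobj")]
FLOUR = [Token(1, "Let", 0, "root"), Token(2, "us", 3, "nsubj"),
         Token(3, "add", 1, "ccomp"), Token(4, "the", 5, "det"),
         Token(5, "flour", 3, "dobj")]
PAST = [Token(1, "I", 2, "nsubj"), Token(2, "cracked", 0, "root"),
        Token(3, "the", 4, "det"), Token(4, "egg", 2, "dobj")]

if __name__ == "__main__":
    for sent in (EGG, FLOUR, PAST):
        print(harvest_actions(sent))
```

Run as a script, this prints `[('crack', 'egg')]`, `[('add', 'flour')]` and `[]`: the past-tense sentence has no trigger phrase, so nothing is harvested from it. In practice the token lists would come from a parser run over each closed-caption sentence, and the resulting pairs would act as weak labels for the corresponding video segment.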

