RealKrimp — Finding Hyperintervals that Compress with MDL for Real-Valued Data. Witteveen, Duivesteijn, Knobbe, Grunwald. Advances in Intelligent Data Analysis 2014

Ah, it's from a symposium.

An implementation exists here

  1. The idea behind the minimum description length (MDL) principle is that induction can be done by compression.
  2. Here they take a popular MDL algorithm, Krimp, and extend it to real-valued data.
  3. “Krimp seeks frequent itemsets: attributes that co-occur unusually often in the dataset. Krimp employs a mining scheme to heuristically find itemsets that compress the data well, gauged by a decoding function based on the Minimum Description Length Principle.”
  4. RealKRIMP “…finds interesting hyperintervals in real-valued datasets.”
  5. “The Minimum Description Length (MDL) principle [2,3] can be seen as the more practical cousin of Kolmogorov complexity [4]. The main insight is that patterns in a dataset can be used to compress that dataset, and that this idea can be used to infer which patterns are particularly relevant in a dataset by gauging how well they compress: the authors of [1] summarize it by the slogan Induction by Compression. Many data mining problems can be practically solved by compression.”
  6. “An important piece of mathematical background for the application of MDL in data mining, which is relevant for both Krimp and RealKrimp, is the Kraft Inequality, relating code lengths and probabilities” They extend the Kraft Inequality to continuous spaces.
  7. <Ok skipping most – interesting but tight on time.>
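
To make the Kraft Inequality point concrete for myself: for a prefix code over a discrete set, the code lengths L(x) must satisfy sum over x of 2^(-L(x)) <= 1, and a distribution P gives ideal lengths L(x) = -log2 P(x). Below is a minimal sketch of only that discrete case, not the paper's continuous extension and not anything from the RealKrimp implementation. The function names and the toy distributions are my own assumptions for illustration.

```python
import math


def ideal_code_lengths(probs):
    """Ideal (possibly fractional) code length in bits: L(x) = -log2 P(x)."""
    return {x: -math.log2(p) for x, p in probs.items() if p > 0}


def integer_code_lengths(probs):
    """Round ideal lengths up to whole bits (Shannon code lengths)."""
    return {x: math.ceil(l) for x, l in ideal_code_lengths(probs).items()}


def kraft_sum(lengths):
    """Sum of 2^(-L(x)); a prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths.values())


# Hypothetical toy distribution with dyadic probabilities: the sum is exactly 1.
dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
dyadic_lengths = ideal_code_lengths(dyadic)

# Hypothetical non-dyadic distribution: rounding up wastes some code space,
# so the Kraft sum drops below 1 but the inequality still holds.
skewed = {"x": 0.7, "y": 0.2, "z": 0.1}
skewed_lengths = integer_code_lengths(skewed)

if __name__ == "__main__":
    print(dyadic_lengths, kraft_sum(dyadic_lengths))
    print(skewed_lengths, kraft_sum(skewed_lengths))
```

The takeaway that matters for MDL: frequent things get short codes, rare things get long ones, so "compresses well" and "assigns high probability" are two ways of saying the same thing.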
