
MetaFac: Community Discovery via Relational Hypergraph Factorization. Lin, Sun, Castro, Sundaram, Kelliher. KDD 2009.

  1. Looks to find community structure in social networks
  2. “Community structure represents the latent social context of user actions.”
  3. The algorithm proposed, MetaFac (MetaGraph Factorization), has 3 components:
    1. Metagraph, a relational hypergraph representation for modeling multirelational and multi-dimensional social data
    2. an efficient factorization method for community extraction on a given metagraph
    3. an on-line method to handle time-varying relations through incremental metagraph factorization.
  4. Tested on large datasets from Digg
  5. Social network data is complex as it is high dimensional and time-varying
  6. “All these [previous] works restrict themselves to pairwise relations between entities (e.g. user-user or user-paper). In rich online social media, networked data consists of multiple coevolving dimensions, e.g. users, tags, feeds, comments, etc. Collapsing such multi-way networks into pairwise networks results in the loss of valuable information, and the analysis of temporal correlation among multi-dimensions is difficult.”
  7. For multi-dimensional network analysis, can use either:
    1. Tensor-based analysis
    2. Multi-graph mining (joint factorization over two or more matrices).  Its application is domain-specific
  8. The order of a tensor is the number of modes or ways it has.
    1. This is different from the dimension
    2. A mode has a dimension, which is the number of elements in that mode
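
A quick numpy illustration of the order/dimension distinction (numpy calls the order `ndim` and the per-mode dimensions `shape`):

```python
import numpy as np

# A 3rd-order tensor: it has 3 modes (ways).
X = np.zeros((4, 2, 5))

print(X.ndim)   # 3 -> the order: the number of modes
print(X.shape)  # (4, 2, 5) -> the dimension of each mode
```
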
  9. Use chi (or bold italic X) to denote a tensor
  10. mode-d matricization or unfolding is the process of reordering an M-way array into a (2D) matrix: the mode-d fibers of the tensor become the columns of the matrix (sketch below, together with vectorization)
    1. The inverse operation is folding
  11. vectorization linearizes the tensor into a vector
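
A minimal numpy sketch of unfolding, folding, and vectorization. This uses the common convention where the mode-d fibers become the columns of the unfolded matrix; the exact column ordering varies between papers:

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization: bring mode d to the front, flatten the rest."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def fold(M, d, shape):
    """Inverse of unfold: rebuild a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != d]
    return np.moveaxis(M.reshape([shape[d]] + rest), 0, d)

X = np.arange(24).reshape(2, 3, 4)      # a 3-way array
M = unfold(X, 1)                        # a 3 x 8 matrix
assert np.array_equal(fold(M, 1, X.shape), X)

v = X.reshape(-1)                       # vectorization: tensor -> length-24 vector
```
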
  12. The mode-d product is the product of a tensor with a (2D) matrix along mode d: each mode-d fiber of the tensor is multiplied by the matrix (sketch below)
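
A sketch of the mode-d product in numpy, assuming the usual definition:

```python
import numpy as np

def mode_product(X, U, d):
    """Mode-d product X x_d U: contract U's columns against mode d of X."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, d)), 0, d)

X = np.random.rand(2, 3, 4)
U = np.random.rand(5, 3)             # columns must match the dimension of mode 1
print(mode_product(X, U, 1).shape)   # (2, 5, 4): mode 1 now has dimension 5
```
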
  13. mode-d accumulation sums entries across all modes except mode d, producing a vector of length I_d
    1. Like unfolding, this can also be done across multiple modes, e.g. a mode-(c,d) accumulation (sketch below)
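
In numpy terms, accumulation is just a sum over the complementary modes:

```python
import numpy as np

X = np.random.rand(2, 3, 4)

a = X.sum(axis=(0, 2))   # mode-1 accumulation: a vector of length I_1 = 3
A = X.sum(axis=2)        # mode-(0,1) accumulation: a 2 x 3 array
```
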
  14. Tensor decomposition or factorization is “a form of higher-order principal component analysis. It decomposes a tensor into a core tensor multiplied by a matrix along each mode.” (sketch below)
    1. “A special case of tensor decomposition is referred to as CP or PARAFAC decomposition”
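
A sketch of both forms in numpy with random rank-2 factors, just to show the structure (Tucker keeps a full core tensor; CP/PARAFAC restricts it to a superdiagonal, i.e. one weight per component):

```python
import numpy as np

U1 = np.random.rand(4, 2)
U2 = np.random.rand(5, 2)
U3 = np.random.rand(6, 2)

# Tucker: a 2x2x2 core multiplied by a factor matrix along each mode.
G = np.random.rand(2, 2, 2)
X_tucker = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)   # shape (4, 5, 6)

# CP/PARAFAC: superdiagonal core -> one weight z_r per component,
# X = sum_r z_r * u1_r (outer) u2_r (outer) u3_r
z = np.random.rand(2)
X_cp = np.einsum('r,ir,jr,kr->ijk', z, U1, U2, U3)         # shape (4, 5, 6)
```
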
  15. The problem of finding latent community structure in social networks is a 3-part problem:
    1. “how to represent multi-relational social data”
    2. “how to reveal the latent communities consistently across multiple relations”
    3. “how to track communities over time”
  16. “We introduce metagraph, a relational hypergraph for representing multi-relational and multi-dimensional social data. We use a metagraph to configure the relational context specific to the system features – this is the key to making our community analysis adaptable…”
  17. There are 3 important concepts:
    1. facet – a set of objects or entities of the same type: “a user facet is a set of users, a story facet is a set of stories, etc”
    2. relation – interactions among facets: “digging” (liking) a story involves two facets (binary – user, story), making a comment involves 3 facets (user, story, comment).  A facet may be implicit
    3. multi-relational hypergraph (metagraph) describes “the combination of relations and facets in a social media context. A hypergraph is a graph where edges, called hyperedges, connect to any number of vertices. The idea is to use an M-way hyperedge to represent the interactions of M facets: each facet as a vertex and each relation as a hyperedge on a hypergraph. A metagraph defines a particular structure of interactions among facets, not among facet elements.”
  18. In their metagraph/hypergraph, the set of nodes is the set of all facets, and the edges are the set of all relations.
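
A minimal sketch of that structure in Python (the facet and relation names here are hypothetical, loosely following the Digg example):

```python
# Facets: sets of objects of the same type (vertices of the metagraph).
facets = {
    "user":    ["u1", "u2", "u3"],
    "story":   ["s1", "s2"],
    "comment": ["c1"],
}

# Relations: hyperedges over facets (not over individual facet elements).
hyperedges = {
    "digg":       ("user", "story"),             # 2-way relation
    "commenting": ("user", "story", "comment"),  # 3-way relation
}

# The observed data for an M-way hyperedge is an M-way tensor indexed by
# the incident facets, e.g. digg counts form a |user| x |story| matrix.
```
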
  19. “We formalize the community discovery problem as latent space extraction from multi-relational social data represented by a metagraph.”  The goal is to find clusters of users that interact (stochastically) in communities in some way. 
  20. Considering two objects i and j, relationships through a community k are described in terms of the probability of k interacting with i (p_{ki}) and of k interacting with j (p_{kj}), along with the probability p_k of community k itself.  Given this, the interaction between i and j is
    1. x_{ij} = p_{ki} p_{kj} p_k (see the sketch below)
    2. Because we are working with hypergraphs, an arbitrary number of objects can be added <although I think the analysis is always for one community?>
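
A small numpy sketch of this generative view (random probabilities; it shows both the single-community term from the note and the sum over all communities):

```python
import numpy as np

K, I, J = 2, 4, 5
p_k  = np.random.dirichlet(np.ones(K))          # probability of each community
p_ki = np.random.dirichlet(np.ones(I), size=K)  # p_ki[k, i]: community k picks object i
p_kj = np.random.dirichlet(np.ones(J), size=K)  # p_kj[k, j]: community k picks object j

# Contribution of a single community k, as in the note:
k = 0
X_k = p_k[k] * np.outer(p_ki[k], p_kj[k])

# Summing over all communities yields the full interaction matrix:
X = np.einsum('k,ki,kj->ij', p_k, p_ki, p_kj)
assert np.isclose(X.sum(), 1.0)   # a proper joint distribution over (i, j)
```
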
  21. Two formal problems are posed
  22. Metagraph Factorization (MF)
    1. Given a metagraph (hypergraph), and a set of observed data tensors defined on that graph
    2. “…find a nonnegative core tensor [z] and factors {U^(q)}_{q \in V} for corresponding facets V={v^{(q)}}” <I don’t know what this means>
  23. The second problem is the same metagraph factorization problem, but for time-evolving / nonstationary data.  This basically adds a time index to the above problem (now looking for a core tensor [z_t] at each time t)
  24. They approach the problem from the perspective of optimization
  25. Start with a simple example of a 2-way hypergraph (just a normal graph) with 3 vertices (facets).
  26. The data set “… corresponding to the hyperedges are two second-order data tensors (i.e. matrices) {Χ^(a), Χ^(b)}”  They are made of facets <1,2> and <2,3>, with facet 2 being shared by both tensors.
  27. The goal is to extract community structure by “… finding a nonnegative core tensor [z] and factors {U^(1), U^(2), U^(3)} corresponding to the three facets.”  That is, the 2 matrices that make up the corpus should be describable as follows:
    1. Χ^(a) [made of facets <1,2>] = [z] ×_1 U^(1) ×_2 U^(2)
    2. Χ^(b) [made of facets <2,3>] = [z] ×_1 U^(2) ×_2 U^(3)
  28. [z] and U^(2) are shared by both.  The “… length of [z] is determined by the number of latent spaces (communities) to be extracted.”
  29. Because we are working in terms of distributions, the KL-divergence (which they denote as D(•||•)) is a reasonable measure of cost.  In this example, the resulting cost is:
    1. D(Χ^(a) || [z] ×_1 U^(1) ×_2 U^(2)) + D(Χ^(b) || [z] ×_1 U^(2) ×_2 U^(3))
    2. Each D in this equation corresponds to one hyperedge: “… each tensor product corresponds to how all the facets are incident to a hyperedge and the summation corresponds to all hyperedges on the graph.”
    3. There is one additional constraint in the cost function: each column of the U matrices must sum to 1.  This is based on the assumption that the probability of a relation occurring in one community is independent of the other entities in that community (see the sketch after this list)
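
A hedged sketch of this cost in numpy, treating [z] as a length-K vector (its length is the number of communities, as noted above) with column-stochastic factors. I assume the generalized KL divergence, the standard form for nonnegative factorization; the data here is random, just for illustration:

```python
import numpy as np

def gkl(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y) for nonnegative arrays."""
    return np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y)

K = 3
z  = np.random.dirichlet(np.ones(K))             # core: one weight per community
U1 = np.random.dirichlet(np.ones(4), size=K).T   # 4 x K, each column sums to 1
U2 = np.random.dirichlet(np.ones(5), size=K).T   # 5 x K, the shared facet
U3 = np.random.dirichlet(np.ones(6), size=K).T   # 6 x K

Xa = np.random.rand(4, 5)    # observed data on hyperedge <1,2>
Xb = np.random.rand(5, 6)    # observed data on hyperedge <2,3>

# [z] x_1 U^(1) x_2 U^(2) with a vector core is sum_k z_k * u1_k (outer) u2_k:
Ra = np.einsum('k,ik,jk->ij', z, U1, U2)
Rb = np.einsum('k,ik,jk->ij', z, U2, U3)

cost = gkl(Xa, Ra) + gkl(Xb, Rb)    # one D(.||.) term per hyperedge
```
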
  30. Not going over the algorithm exactly, but the optimization is nonconvex, so the solution found is local
  31. The algorithm is iterative and EM-based (as it's a local method and the problem is nonconvex).  It's a generalization of an algorithm “… for solving the single nonnegative matrix factorization problem.” (a sketch of that single-matrix case is below)
  32. The algorithm is computationally expensive, so they propose a modification that makes it cheaper
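
For reference, a sketch of the single-matrix building block MetaFac generalizes: the classic Lee–Seung multiplicative updates for NMF under KL divergence. This is not the paper's metagraph algorithm, just the starting point it extends:

```python
import numpy as np

def nmf_kl(X, K, iters=200, eps=1e-12):
    """Lee-Seung multiplicative updates minimizing D(X || W @ H), W, H >= 0."""
    n, m = X.shape
    W = np.random.rand(n, K)
    H = np.random.rand(K, m)
    for _ in range(iters):
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H

W, H = nmf_kl(np.random.rand(20, 30), K=4)
```
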
  33. There is also the version that's based on nonstationarity <not paying much attention because not relevant to the setting I consider>.
  34. <Ok on to the experiments>
  35. Data set consists of different types of relations (contact, reply, comment, etc), but by far the largest relation type is the “digg” which has 1.15 million tuples
  36. A limitation of the approach is that the number of communities must be specified by the user up front (like K-means)
  37. <The community separation makes a lot of sense for K=2 communities, but with K=4 there are already some spurious results, for example one cluster is clearly politically oriented, but another also has politics and tech mixed together, and then there is yet another cluster that is also tech.  For K=12 the clusters don’t seem to make much sense>
  38. They then use the method to make predictions (such as who will digg what)
  39. The MFT (nonstationary model) slightly outperforms MF (stationary model).  Both seem to outperform other methods by a pretty large margin (2 to 3 times as good, if I understand the metric)
    1. The tests they do aren't suited to a number of other approaches that only consider pairwise interactions, so those methods have to be tweaked (checking for many pairwise interactions).
  40. They run scalability evaluations on synthetic data – the method scales linearly to very large datasets