[Paper reading notes] Fast Network Embedding Enhancement via High Order Proximity Approximation
Qinze 2021-06-04 10:18:43

The structure of these notes:

1. Problem addressed
2. Main contributions
3. Main content
4. References

(1) Problem addressed

Most previous work either ignores the higher-order proximities of the network (spectral clustering, DeepWalk, LINE, node2vec) or takes them into account but with very low efficiency, so it cannot scale to large networks (e.g., GraRep).

(2) Main contributions

Contribution 1. The architectures of many existing NRL algorithms are summarized into a unified framework (similarity matrix construction followed by dimensionality reduction), leading to the conclusion that if higher-order proximity information is incorporated into the similarity matrix, the representation quality of the NRL algorithm improves.

Contribution 2. The NEU enhancement strategy is proposed to improve the representations learned by existing NRL algorithms: the representation matrix R post-processed by NEU provably incorporates (an approximation of) the higher-order proximities between nodes. Finally, experiments on multi-label classification and link prediction show that the algorithm is not only time-efficient but also brings a large gain in accuracy.

(3) Main content

1. Preliminaries

• K-order proximity: first-order proximity can be expressed as the edge weight between two nodes, and second-order proximity as the number of common neighbors of two nodes. How can this be extended to higher orders? First consider another interpretation of second-order proximity: the probability that node v_i reaches node v_j in two steps. This extends naturally to k-order proximity: the probability that v_i reaches v_j in k steps. Let A be the normalized adjacency matrix (the one-step transition probability matrix of first-order proximity); then the k-order transition probability matrix is A^k, whose entry (A^k)_ij is the probability that v_i reaches v_j in k steps.

(Personal understanding: why do higher-order proximities help? Real networks are often sparse, which means the number of edges is often of the same order as the number of nodes. The first-order proximity matrix of a real network is therefore usually very sparse, and first-order proximity alone is not enough to reflect the relationships between nodes, so higher-order proximities have to be combined.)
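A minimal numpy sketch of this definition (the toy graph and variable names are mine, not from the paper):

```python
import numpy as np

# Toy undirected graph: a 4-node path 0-1-2-3
adj = np.array([[0., 1., 0., 0.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [0., 0., 1., 0.]])

# Row-normalize: A[i, j] = probability of stepping from v_i to v_j
A = adj / adj.sum(axis=1, keepdims=True)

# k-order proximity matrix A^k: entry (i, j) is the probability
# that a k-step walk starting at v_i ends at v_j
k = 2
Ak = np.linalg.matrix_power(A, k)
print(Ak[0, 2])  # P(v_0 -> v_2 in 2 steps) = 0.5
```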

2. Unified framework

The paper proposes a unified framework of similarity-matrix-based dimensionality reduction (matrix factorization) and shows that existing algorithms fit within it.

This framework consists of two steps:

• Step 1: construct the similarity matrix M (e.g., the adjacency matrix, the Laplacian matrix, a k-order proximity matrix).
• Step 2: reduce the dimensionality of the similarity matrix, i.e., factorize it, e.g., by eigendecomposition or SVD.

The goal is to factorize M ≈ R C^T, i.e., to find matrices R and C whose product R C^T approximates M; the deviation between M and R C^T can be measured by a matrix norm of their difference. Here R is the matrix of node (center) vectors and C is the matrix of context vectors.
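A minimal sketch of Step 2 via truncated SVD (splitting the singular values evenly between R and C is a common convention, not something the paper prescribes):

```python
import numpy as np

def factorize(M, dim):
    """Step 2: factorize a similarity matrix M into R @ C.T via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    R = U[:, :dim] * np.sqrt(s[:dim])   # node (center) vectors
    C = Vt[:dim].T * np.sqrt(s[:dim])   # context vectors
    return R, C

rng = np.random.default_rng(0)
M = rng.random((6, 6))                  # toy similarity matrix (Step 1)
R, C = factorize(M, dim=3)
print(np.linalg.norm(M - R @ C.T))      # deviation ||M - R C^T||
```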

Examples showing how existing algorithms conform to this unified framework:

Example 1: Spectral Clustering (SC)

Similarity matrix M: the normalized Laplacian matrix (first-order proximity).
Dimensionality reduction: eigendecomposition.

Example 2: Graph Factorization (GF)

Similarity matrix M: the normalized adjacency matrix (first-order proximity).
Dimensionality reduction: SVD.

Example 3: DeepWalk

Similarity matrix M: DeepWalk approximates higher-order proximities by sampling random walks, and in practice never computes the k-order proximity matrix exactly.

Dimensionality reduction: objective-function optimization, i.e., optimizing the SkipGram objective with SGD to find matrices R and C such that R C^T approximates M.

Example 4: GraRep

Algorithm principle: GraRep exactly computes the k proximity matrices of orders 1, ..., k, learns a separate representation from each of them (via SVD), and finally concatenates the k representations.

In essence this is also similarity-matrix factorization and belongs to the proposed unified framework. However, GraRep does not work for large-scale networks: its computational cost is too high.
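A sketch of this procedure (it omits GraRep's log-transform and shift details, but makes the cost visible: one dense matrix power and one full SVD per order):

```python
import numpy as np

def grarep_sketch(A, K, dim):
    """GraRep-style embedding: factorize each of A^1 ... A^K separately,
    then concatenate the K representations (log/shift steps omitted)."""
    reps = []
    Ak = np.eye(A.shape[0])
    for _ in range(K):
        Ak = Ak @ A                      # exact k-order proximity matrix
        U, s, _ = np.linalg.svd(Ak)      # full SVD per order: the bottleneck
        reps.append(U[:, :dim] * np.sqrt(s[:dim]))
    return np.hstack(reps)               # n x (K * dim) representation
```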

3. Algorithm principle

Given the problems of the algorithms above, this paper studies how to learn network representations effectively from an approximate higher-order proximity matrix (making the algorithm both efficient and effective).

Suppose we have already used one of the NRL algorithms in the framework above to learn R C^T as an approximation of a relatively low-order similarity matrix f(A). On this basis, the goal is to learn better matrices R' and C' whose product R' C'^T approximates a higher-order matrix g(A), whose degree is higher than that of f(A).

Definition of f(A) (the similarity matrix): a polynomial in the powers A^1, ..., A^k of A. The degree k of f(A) is the highest order of proximity the polynomial considers, i.e., the largest power of A; for example, the similarity matrix M that DeepWalk implicitly factorizes is such a polynomial, f(A) = M.
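In symbols (the generic coefficients c_i are my notation; the paper only requires f to be a polynomial in A):

```latex
f(A) = c_1 A + c_2 A^2 + \dots + c_k A^k, \qquad \deg f(A) = k
% e.g. the DeepWalk matrix sketched above: f(A) = \tfrac{1}{k}\sum_{i=1}^{k} A^i
```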

Note that the main purpose of the NEU algorithm is to enhance the embeddings produced by other representation learning models: starting from embedding vectors that already contain low-order information, it fuses in higher-order information to produce better embeddings. The principle is very simple: apply a post-processing step to the embedding matrices obtained by another algorithm. The iterative update formula (eq. (3) in the paper) is

R' = R + λ · A · R,  C' = C + λ · A^T · C.

A question: how does this iterative update of R and C take higher-order proximity into account?

Theorem
Given the node representation matrix R and the context representation matrix C (which may be learned by any other representation algorithm), suppose R C^T approximates the similarity matrix M = f(A) with approximation error bounded by r, and f(A) has degree K. Then the product R' C'^T of the matrices R' and C' updated by formula (3) approximates a matrix g(A) of degree K + 2, with approximation error bounded by (1 + λ)^2 · r. The conclusion from this theorem: every update raises the degree of the factorized proximity matrix by 2, but the corresponding error bound grows by a factor of (1 + λ)^2, up to 2.25 times for λ ≤ 0.5, so one must balance higher-order proximity information against the accumulated error.
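A quick check of where the "+2" comes from (my sketch of the proof idea; A commutes with f(A) because f is a polynomial in A):

```latex
R'C'^{\top} = (I + \lambda A)\, R C^{\top}\, (I + \lambda A)
            \approx (I + \lambda A)\, f(A)\, (I + \lambda A)
            = f(A) + 2\lambda A f(A) + \lambda^{2} A^{2} f(A) =: g(A)
% hence \deg g(A) = K + 2, while the error is amplified by at most
% \|I + \lambda A\|^{2} \le (1 + \lambda)^{2}
```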

A variant of the iterative formula: updating with R' = R + λ₁ · A · R + λ₂ · A · (A · R) (and analogously for C) obtains higher-order proximity in a single iteration; one iteration of this variant gains 2 more orders than one iteration of the basic formula (degree K + 4 instead of K + 2). Still more complex variants that gain even higher orders per iteration can be generalized in the same way.
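A minimal numpy sketch of this variant as post-processing (the λ₁, λ₂ defaults below are illustrative choices, and updating only R suffices when downstream tasks use node vectors):

```python
import numpy as np

def neu_update(R, A, lam1=0.5, lam2=0.25, iters=1):
    """NEU-style enhancement sketch: R is an n x d embedding from any base
    NRL method, A the normalized adjacency matrix. Each pass approximately
    raises the degree of the proximity matrix that R C^T factorizes."""
    for _ in range(iters):
        AR = A @ R                               # O(#edges * d) with sparse A
        R = R + lam1 * AR + lam2 * (A @ AR)      # R' = R + l1*A R + l2*A(A R)
    return R

# Usage: R_enhanced = neu_update(R_deepwalk, A, iters=1)
```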

Summary: iteratively updating the embedding matrix obtained by another representation learning algorithm in this way fuses higher-order information into the embedding vectors.

(4) References

Yang C., Sun M., Liu Z., et al. Fast Network Embedding Enhancement via High Order Proximity Approximation. In: International Joint Conference on Artificial Intelligence (IJCAI). AAAI Press, 2017.