Graph Convolutional Network (GCN) - Background
Background Knowledge
Why do we need GCN?
CNNs cannot deal with non-Euclidean structured data. The reason is that the number of neighborhood nodes varies from vertex to vertex, so a convolutional kernel of a single fixed size cannot be applied.
We need a method to effectively extract information from graph structure for machine learning training.
Many data structures are non-Euclidean, like social networks, computer networks, IoT, DNA information, etc. A graph can represent such data and preserve more information than forcing it into a Euclidean form.
Essentially, the difficulty in processing graph data is that the number of neighboring vertices of a vertex is not constant. --> A convolutional kernel of fixed size cannot be used for information extraction.
==> There are two main directions to process the graph data:
Transforming non-Euclidean data to Euclidean space
Find a convolutional kernel that can process neighborhoods of varying size. --> GCN
Spatial Domain and Spectral Domain
Spatial Domain. Extract the graph features by finding the neighbors of each vertex:
There are two problems when extracting features in the spatial domain:
How to find the neighbors of a vertex, i.e. how to determine the receptive field?
Given the receptive field, how to deal with the features of varying numbers of neighbors?
The main drawbacks are:
The neighbors of each vertex are different -> the computation needs to handle every vertex separately
The extracted features lose too much information

Spectral Domain. The idea is to compute the convolution between a kernel and the graph data in the spectral domain. Intuitively, we want to use the eigenvalues and eigenvectors of the Laplacian matrix of a graph to extract graph feature information.
What does the GCN really do?
The core operation of a graph neural network is a convolution over the node embeddings in the spatial domain (i.e. aggregating the embedding information of neighbors). However, graph data differ from image data in that the number and order of a node's neighbors are not fixed, so the convolution operator of a traditional CNN designed for images cannot be used directly on a graph. We therefore need to redefine the convolution operation in the spectral domain and then transform it back to the spatial domain via the convolution theorem.
To move between the spectral domain and the spatial domain, we rely on the Fourier formalism and define a graph Fourier transform (from the spatial domain to the spectral domain) and an inverse graph Fourier transform (from the spectral domain back to the spatial domain). Concretely, we transform the node embeddings from the spatial domain to the spectral domain via the forward Fourier transform, convolve with the kernel h in the spectral domain, and then transform the resulting node embeddings back to the spatial domain via the inverse Fourier transform for downstream tasks such as classification. [2]
Laplacian Matrix
For an undirected graph $G = (V, E)$, its Laplacian matrix is $L = D - A$, where $D$ is the degree matrix and $A$ is the adjacency matrix.

There are also two other common definitions of the Laplacian matrix:
Symmetric normalized Laplacian: $L^{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$. --> Most papers use this form.
Random walk normalized Laplacian: $L^{rw} = D^{-1} L = I - D^{-1} A$.
For $L = D - A$: since (1) $L = D - A$ and (2) both $D$ (diagonal) and $A$ (undirected graph) are symmetric, $L$ is a real symmetric matrix ==> $L$ can be eigendecomposed.
-> This is a very important property, because the graph Fourier transform uses the eigenvalues and eigenvectors of the Laplacian matrix => to transform graph data from the spatial domain to the spectral domain in order to compute the convolution.
Laplacian matrix decomposition
The Laplacian matrix is positive semi-definite. It has three properties:
A symmetric matrix has $n$ linearly independent eigenvectors
==> $L = U \Lambda U^{-1}$, where $U = (u_1, u_2, \dots, u_n)$ is the matrix whose columns are the eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$
The eigenvalues of a positive semi-definite matrix are non-negative
Since $U$ is orthogonal, $U U^T = I$; and since $U U^{-1} = I$, we get $U^{-1} = U^T$
==> $L = U \Lambda U^T$ (see the NumPy sketch after the notes below)
The eigenvectors of a symmetric matrix are mutually orthogonal (so $U$ is an orthogonal matrix)
NOTES: Definitions of positive definite and positive semi-definite matrices:
Positive definite matrix: $A$ is an $n \times n$ square matrix; if for every non-zero vector $x$ we have $x^T A x > 0$, where $x^T$ denotes the transpose of $x$, then $A$ is called positive definite.
<==> all eigenvalues of $A$ are $> 0$
Positive semi-definite matrix: $A$ is an $n \times n$ real symmetric matrix; if for every real non-zero column vector $x$ we have $x^T A x \ge 0$, then $A$ is called positive semi-definite.
<==> all eigenvalues of $A$ are $\ge 0$
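The following is a minimal NumPy sketch (not from the original text) that builds the Laplacian of a small toy graph, its symmetric normalized version, and numerically checks the properties above; the adjacency matrix is an illustrative example.

```python
import numpy as np

# Adjacency matrix of a toy 4-vertex undirected graph (illustrative example)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))                  # degree matrix
L = D - A                                   # combinatorial Laplacian L = D - A
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt         # symmetric normalized Laplacian

# Spectral decomposition: L = U diag(lam) U^T, with lam >= 0 and U orthogonal
lam, U = np.linalg.eigh(L)
assert np.all(lam >= -1e-9)                      # positive semi-definite
assert np.allclose(U @ np.diag(lam) @ U.T, L)    # L = U Lambda U^T
assert np.allclose(U @ U.T, np.eye(4))           # eigenvectors are orthonormal
```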
Fourier Transform on Graph
Fourier Transform: Spatial Domain -> Spectral Domain
Continuous Fourier transform: $F(\omega) = \int f(t)\, e^{-i\omega t}\, dt$
The Fourier transform of $f(t)$ is the integral of the product of $f(t)$ and the basis function $e^{-i\omega t}$.
The basis function needs to satisfy the following criterion: it must be an eigenfunction of the Laplacian operator, i.e. $\Delta e^{-i\omega t} = -\omega^2\, e^{-i\omega t}$
$\Delta$ is the Laplacian operator: $\Delta f = \sum_i \frac{\partial^2 f}{\partial x_i^2}$
With the definition of eigenvalues and eigenvectors, $A V = \lambda V$, where $A$ is a matrix, $V$ is an eigenvector, and $\lambda$ is the corresponding eigenvalue:
=> $e^{-i\omega t}$ is an eigenfunction of $\Delta$, and $\omega$ is closely related to the corresponding eigenvalue ($-\omega^2$)
The Laplacian matrix is the discrete Laplacian operator; we derived the Laplacian matrix of a graph above.
=> To do a Fourier transform on a graph, we need to find the basis functions. Intuitively, we have the graph Laplacian $L$ in place of $\Delta$, and we need to find the eigenvectors of $L$ that play the role of $e^{-i\omega t}$.
Short Summary:
The core of migrating the Fourier transform onto a graph is to map the eigenfunctions $e^{-i\omega t}$ of the Laplacian operator to the eigenvectors of the graph Laplacian matrix. Solving for the eigenvectors of the graph Laplacian is therefore essential. As introduced above, the graph Laplacian is positive semi-definite, so its eigenvectors can be obtained through eigendecomposition (spectral decomposition), i.e. $L = U \Lambda U^T$.
With the eigenvectors of the graph Laplacian, $U = (u_1, u_2, \dots, u_n)$, we can define the Fourier transform on a graph analogously to the discrete Fourier transform: $\hat{f}(\lambda_l) = \sum_{i=1}^{n} f(i)\, u_l(i)$
Here $f(i)$ is the signal on graph vertex $i$ (e.g. the vertex embedding), corresponding one-to-one to the graph vertices, and $u_l(i)$ is the $i$-th component of the $l$-th eigenvector.
The Fourier transform on the graph of the embedding $f$ at eigenvalue $\lambda_l$ is $\hat{f}(\lambda_l)$
<=> the inner product between $f$ and the eigenvector $u_l$
Therefore, the Fourier transform on the graph over all eigenvalues (i.e. all graph vertices) is
=> $\hat{f} = U^T f$
Left-multiplying the input vertex embedding $f$ by $U^T$ gives the Fourier transform on the graph.
Inverse Fourier Transform: Spectral Domain -> Spatial Domain
Inverse Fourier transform over the frequency $\omega$: $f(t) = \frac{1}{2\pi} \int F(\omega)\, e^{i\omega t}\, d\omega$
Similarly, the inverse Fourier transform on the graph at eigenvalue $\lambda_l$: $f(i) = \sum_{l=1}^{n} \hat{f}(\lambda_l)\, u_l(i)$
The inverse Fourier transform on the graph over all vertices is:
=> $f = U \hat{f}$
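As a sanity check, here is a minimal sketch (illustrative, not from the source) of the graph Fourier transform $\hat{f} = U^T f$ and its inverse $f = U \hat{f}$ on the same toy graph; the signal values are arbitrary.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)          # U: eigenvectors = graph Fourier basis

f = np.array([1.0, 2.0, 3.0, 4.0])  # a signal/embedding on the 4 vertices

f_hat = U.T @ f                     # graph Fourier transform (spatial -> spectral)
f_back = U @ f_hat                  # inverse graph Fourier transform (spectral -> spatial)
assert np.allclose(f_back, f)       # U is orthogonal, so the round trip is exact
```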
Convolution on Graph
Convolution theorem: under suitable conditions, the Fourier transform of a convolution of two signals is the pointwise product of their Fourier transforms: $f * h = \mathcal{F}^{-1}\big(\mathcal{F}(f) \cdot \mathcal{F}(h)\big)$
The steps for doing convolution on a graph are:
The Fourier transform of the embedding $f$: $\hat{f} = U^T f$
The Fourier transform of the convolutional kernel $h$: $\hat{h} = U^T h$, which can also be written as the diagonal matrix $\mathrm{diag}\big(\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)\big)$.
The element-wise product of the two transforms above: $\mathrm{diag}\big(\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)\big)\, U^T f$
Lastly, left-multiply by $U$ to apply the inverse Fourier transform: $(f * h)_G = U\, \mathrm{diag}\big(\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)\big)\, U^T f$
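Below is a minimal sketch (illustrative, not from the source) of this spectral convolution $U\, \mathrm{diag}(\hat{h})\, U^T f$; the spectral kernel `h_hat` is an assumed low-pass example, not a learned filter.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

f = np.array([1.0, 2.0, 3.0, 4.0])      # vertex signal
h_hat = np.exp(-lam)                    # example spectral kernel (damps high graph frequencies)

f_conv = U @ np.diag(h_hat) @ U.T @ f   # convolution on the graph via the convolution theorem
```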
Evolution of GCN
Gen 1: Naive method
From the graph convolution function $(f * h)_G = U\, \mathrm{diag}\big(\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)\big)\, U^T f$, we see that the graph convolution of the vertex embedding $f$ is just a matrix operation. The structural information of the graph is stored in $U$, so we need to find an appropriate convolutional kernel $\hat{h}$ that reduces the loss of the model.
==> The goal of machine learning on the graph is to find $\hat{h}$ so as to reduce the loss of the model. In other words, we need to find the parameters $\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)$.
==> Regarding $\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)$ as the model parameters, we can use gradient descent to find proper values for them. Therefore, the naive method treats each $\hat{h}(\lambda_l)$ as a free parameter $\theta_l$: $y = \sigma\big(U\, \mathrm{diag}(\theta_1, \dots, \theta_n)\, U^T x\big)$
Here $\theta = (\theta_1, \dots, \theta_n)$ plays the role of the weights in a neural network, and $x$ is the input vertex embedding (feature vector). A small sketch of this layer follows the list of drawbacks below.
Drawbacks:
$U$ is the eigenvector matrix of the Laplacian matrix $L$, and the time complexity of computing the eigendecomposition is $O(n^3)$
Each forward propagation needs to compute the product of $U$, $\mathrm{diag}(\theta_1, \dots, \theta_n)$, and $U^T$ with $x$, which is expensive
The convolutional kernel has $n$ parameters, where $n$ is the number of vertices in the graph. For a large graph, this is too many parameters for a model
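Here is a minimal sketch of the Gen-1 naive spectral layer $y = \sigma(U\, \mathrm{diag}(\theta)\, U^T x)$, with $\theta$ treated as the $n$ learnable parameters; the function name, random weights, and ReLU choice for $\sigma$ are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def naive_spectral_layer(U, theta, x):
    """One naive spectral convolution layer; theta has one parameter per vertex."""
    return np.maximum(0.0, U @ np.diag(theta) @ U.T @ x)   # ReLU as sigma

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)              # O(n^3) eigendecomposition (drawback 1)

theta = np.random.randn(4)              # n parameters, one per vertex (drawback 3)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = naive_spectral_layer(U, theta, x)
```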
Gen 2: Reduce the number of parameters
Since there are $n$ parameters in the naive model, which is too many to learn, a new approximation method was proposed: $\hat{h}(\lambda_l) = \sum_{j=0}^{K} \alpha_j \lambda_l^j$
Then, $\mathrm{diag}\big(\hat{h}(\lambda_1), \dots, \hat{h}(\lambda_n)\big) = \sum_{j=0}^{K} \alpha_j \Lambda^j$
Further, $y = \sigma\Big(U \sum_{j=0}^{K} \alpha_j \Lambda^j\, U^T x\Big)$
Since $U \Lambda^j U^T = (U \Lambda U^T)^j = L^j$ ==>
Finally, $y = \sigma\Big(\sum_{j=0}^{K} \alpha_j L^j\, x\Big)$
Where $\alpha_0, \alpha_1, \dots, \alpha_K$ are the model parameters (a small sketch of this filter appears at the end of this subsection).
Improvements:
Compared with the $n$ parameters of the naive model, there are only $K + 1$ parameters in this model.
Since $U \Lambda^j U^T = L^j$, the eigendecomposition of $L$ is no longer needed, which saves the $O(n^3)$ computation.
With this approximation of the convolutional kernel, spatial localization is introduced. $K$ is the receptive field, which means the update of a vertex's embedding is only related to its $K$-hop neighborhood; all nodes at hop $j$ share the same aggregation coefficient $\alpha_j$.
However, this method still needs to compute the matrix powers $L^j$, which requires $O(n^3)$ time complexity.
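The following is a minimal sketch of the Gen-2 polynomial filter $y = \sigma\big(\sum_j \alpha_j L^j x\big)$; the coefficient values and ReLU choice for $\sigma$ are illustrative assumptions, and the powers are applied to the vector rather than materialized as matrices.

```python
import numpy as np

def poly_filter_layer(L, alpha, x):
    """alpha has K+1 entries; aggregates information from up to K-hop neighbors."""
    out = np.zeros_like(x)
    L_power_x = x                       # L^0 x
    for a in alpha:
        out += a * L_power_x
        L_power_x = L @ L_power_x       # next power applied to x (matrix-vector product)
    return np.maximum(0.0, out)         # ReLU as sigma

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
alpha = np.array([0.5, 0.3, 0.1])       # K = 2 -> only 3 parameters
x = np.array([1.0, 2.0, 3.0, 4.0])
y = poly_filter_layer(L, alpha, x)
```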
Gen 3: Reduce time complexity
Since the model in Gen 2 does not reduce the time complexity of the convolutional computation, the next work proposed an approximation method based on Chebyshev polynomials to compute the convolutional kernel: $\hat{h}(\lambda_l) \approx \sum_{k=0}^{K} \beta_k\, T_k(\tilde{\lambda}_l)$
Where $\beta_k$ are the coefficients of the Chebyshev polynomials (the model parameters).
In it, $\tilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I_n$, i.e. the eigenvalues are rescaled to $\tilde{\lambda}_l = \frac{2 \lambda_l}{\lambda_{max}} - 1$.
The reason is that Chebyshev polynomials require their input to lie in the range $[-1, 1]$.
Chebyshev polynomials are defined by the recursion: $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x)$
Since the calculation of $T_k(\tilde{L})\, x$ does not need explicit matrix powers, but only the recursion above, the computational complexity of each $T_k(\tilde{L})\, x$ becomes $O(n^2)$. ==> the time complexity of the whole convolution is $O(K n^2)$
==> $y = \sigma\Big(\sum_{k=0}^{K} \beta_k\, T_k(\tilde{L})\, x\Big)$, with $\tilde{L} = \frac{2}{\lambda_{max}} L - I_n$
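Below is a minimal sketch of the Gen-3 Chebyshev filter, computing $\sum_k \beta_k T_k(\tilde{L}) x$ through the recursion so that only matrix-vector products are needed; the coefficients, the dense `eigvalsh` call for $\lambda_{max}$, and ReLU as $\sigma$ are illustrative assumptions.

```python
import numpy as np

def cheb_filter_layer(L, beta, x, lam_max):
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])   # rescale spectrum into [-1, 1]
    Tx_prev, Tx_curr = x, L_tilde @ x                    # T_0(L~) x and T_1(L~) x
    out = beta[0] * Tx_prev + beta[1] * Tx_curr
    for k in range(2, len(beta)):
        Tx_next = 2.0 * (L_tilde @ Tx_curr) - Tx_prev    # Chebyshev recursion
        out += beta[k] * Tx_next
        Tx_prev, Tx_curr = Tx_curr, Tx_next
    return np.maximum(0.0, out)                          # ReLU as sigma

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam_max = np.linalg.eigvalsh(L).max()
beta = np.array([0.4, 0.3, 0.2])                         # K = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = cheb_filter_layer(L, beta, x, lam_max)
```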

Gen 4: GCN
In GCN, there are two further assumptions: (1) $K = 1$; (2) $\lambda_{max} \approx 2$. Therefore, $y \approx \sigma\big(\beta_0 x + \beta_1 (L - I_N)\, x\big) = \sigma\big(\beta_0 x - \beta_1 D^{-1/2} A D^{-1/2} x\big)$, using the symmetric normalized Laplacian $L = I_N - D^{-1/2} A D^{-1/2}$.
There are two parameters, $\beta_0$ and $\beta_1$, which are shared by all vertices in the graph.
Since we can further restrict the number of parameters to avoid over-fitting, we set $\beta = \beta_0 = -\beta_1$: $y \approx \sigma\big(\beta\, (I_N + D^{-1/2} A D^{-1/2})\, x\big)$
Since the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ lie in the range $[0, 2]$, and the output is fed to the next layer as input, stacking this operator can lead to exploding or vanishing gradients.
We need a renormalization: $I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$
Where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
Finally, this gives the GCN update function: $y = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\, x\, \beta\big)$
The input matrix is $X \in \mathbb{R}^{N \times C}$, where every input sample (graph node) has $C$ channels. The convolution has $F$ filters or feature maps: $Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X\, \Theta$
Where $\Theta \in \mathbb{R}^{C \times F}$ is the filter parameter matrix and $Z \in \mathbb{R}^{N \times F}$ is the output of the convolutional operation.
Furthermore, since $X$ is only the input of the first layer and a neural network has multiple layers, we denote the activation of layer $l$ by $H^{(l)}$ (with $H^{(0)} = X$) and the layer parameters by $W^{(l)}$. So the forward propagation function of the graph neural network becomes: $H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big)$
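Here is a minimal sketch of one GCN layer $H^{(l+1)} = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)})$ with the renormalization trick; the function name, random weights, and ReLU as $\sigma$ are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    A_tilde = A + np.eye(A.shape[0])                    # add self-loops: A~ = A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt           # renormalized adjacency D~^{-1/2} A~ D~^{-1/2}
    return np.maximum(0.0, A_hat @ H @ W)               # ReLU as sigma

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)          # N = 4 nodes, C = 3 input channels
W0 = np.random.randn(3, 5)         # C x F filter matrix (F = 5 feature maps)
H1 = gcn_layer(A, X, W0)           # output has shape (N, F) = (4, 5)
```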
Explanation of GCN
Intuitive understanding
The feed-forward propagation function of GCN is $H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\big)$
Intuitively, this function means that every vertex of the graph obtains information from its neighbor nodes and aggregates it into its own embedding vector.
$\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ --> the normalized adjacency matrix (with self-loops)
$H^{(l)} W^{(l)}$ --> a linear transform applied to every vertex embedding in the layer
Left-multiplying by $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ --> transforms the feature of a vertex into the (normalized) sum of its neighbors' features
Example
Assume we have a 2-layer GCN for classification. Let $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$; based on the feed-forward propagation, the output of the GCN is $Z = \mathrm{softmax}\big(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)})\, W^{(1)}\big)$
Where,
$W^{(0)}$ is the input-to-hidden weight matrix, used to transform the input embedding $X$.
$W^{(1)}$ is the hidden-to-output weight matrix, used to transform the hidden-layer embedding $H$.
For semi-supervised learning, the loss function is the cross-entropy over the labeled samples: $\mathcal{L} = -\sum_{i \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{if} \ln Z_{if}$
Where $\mathcal{Y}_L$ is the set of labeled samples.
Lastly, we can use gradient descent to learn $W^{(0)}$ and $W^{(1)}$.
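The sketch below assembles the 2-layer example: $Z = \mathrm{softmax}\big(\hat{A}\, \mathrm{ReLU}(\hat{A} X W^{(0)}) W^{(1)}\big)$ plus the cross-entropy loss over the labeled nodes only. It shows the forward pass and the loss under assumed toy data, labels, and random weights; the gradient-descent update of $W^{(0)}$ and $W^{(1)}$ is omitted.

```python
import numpy as np

def normalize_adj(A):
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)                  # N = 4 nodes, C = 3 features
W0 = np.random.randn(3, 8)                 # input -> hidden
W1 = np.random.randn(8, 2)                 # hidden -> 2 classes
A_hat = normalize_adj(A)

# Two-layer GCN forward pass: N x 2 class probabilities
Z = softmax(A_hat @ np.maximum(0.0, A_hat @ X @ W0) @ W1)

labeled = [0, 2]                                            # indices of labeled nodes (Y_L)
Y = np.array([[1, 0], [0, 1]], dtype=float)                 # one-hot labels for the labeled nodes
loss = -np.sum(Y * np.log(Z[labeled] + 1e-12))              # cross-entropy over labeled nodes only
```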
Drawback
Since GCN takes the whole adjacency matrix $A$ and the whole input feature matrix $X$ as input, it needs a lot of memory. The authors used sparse matrices to mitigate the memory requirement and GPUs to accelerate the computation. However, it is still not suitable for very large graphs.
GraphSAGE introduces another way to overcome this issue.
Local Connectivity and Parameter Sharing
Reference