Geometric Deep Learning with Graph Neural Networks

Salman Faroz
Nov 11, 2020

First of all,

What are Euclidean Space and Non-Euclidean Space?

Euclidean geometry was developed by Euclid.

Euclidean Space

This encompasses 1-dimensional, 2-dimensional, and, more generally, N-dimensional spaces.

Every time a mathematician develops a formula in geometry, he or she must demonstrate that the formula holds true across all dimensions.

Now enter non-Euclidean space.

Here the assumption of flatness no longer holds; instead there are hyperbolic and sphere-like (spherical) spaces. Euclidean geometry, simply put, only deals with surfaces that are flat and non-curved.
For many kinds of data, this non-Euclidean view is considerably more useful than Euclidean space.

Neural networks perform well on Euclidean data, for example text, audio, images, etc.

But what about non-Euclidean data? For example graphs/networks, manifolds, and other similarly complex structures.

To know more about Non-Euclidean Space

This is where Geometric Deep Learning comes in. Paper

Bronstein et al. first introduced the term Geometric Deep Learning (GDL) in their 2017 article “Geometric deep learning: going beyond Euclidean data”.

They experiment with graphs, with applying CNNs to 3D models, and so on.

Now we are going to look into one of those subdomains, graphs. Let's jump into it.

First, we will briefly brush up on the basics.

Graphs

A graph is simply G = (V, E)

where V = vertices/nodes and E = edges.

Important types of Graph

  • Undirected Graph — has edges that do not have a direction.
  • Directed Graph — has edges with a direction.
  • Weighted Graph — each edge has a numerical value called a weight.

The weights/labels can be numeric or strings.

Nodes/vertices can also have features; for example, node A can have features describing its properties. (Edge weights and node features are not the same thing.)
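
As a quick illustration (a minimal sketch using the networkx library, with made-up nodes, weights, and features; this code is not from the article), a small weighted graph with node features could be built like this:

import networkx as nx

# hypothetical toy graph: three nodes, each with a simple feature vector
G = nx.Graph()
G.add_node("A", features=[1.0, 0.0])
G.add_node("B", features=[0.0, 1.0])
G.add_node("C", features=[1.0, 1.0])

# weighted edges (the weight is an edge attribute, separate from the node features)
G.add_edge("A", "B", weight=2.5)
G.add_edge("B", "C", weight=1.0)

print(G.nodes(data=True))
print(G.edges(data=True))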

The computer doesn't understand a graph directly, so what do we do? We can use a matrix.

We know that any graph can be represented as a matrix built from its nodes and edges, where

N — number of nodes
M — number of edges of the graph

Adjacency matrix

Properties

  • It is a square matrix (N × N).
  • Values can be either 0 or 1.
  • Without self-loops, the diagonal entries will be 0.
  • It is all about whether each node is adjacent to another node or not: 1 if an edge exists between them, 0 otherwise.
  • An adjacency matrix can also be built for a weighted graph; there we fill in the edge weights instead of 0s and 1s.

Simply put, it collects the neighborhood of each node.
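
As a minimal sketch (a hypothetical 4-node undirected graph, not taken from the article), the adjacency matrix can be built directly with NumPy:

import numpy as np

# edges of a hypothetical undirected graph with nodes A, B, C, D (indices 0..3)
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]

N = 4
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: the matrix is symmetric

print(A)
# [[0 1 1 1]
#  [1 0 1 0]
#  [1 1 0 0]
#  [1 0 0 0]]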

Degree Matrix

  • It is a diagonal matrix: each diagonal entry is simply a count of how many edges are connected to that node (a self-loop is counted as 2).

Laplacian matrix

              Laplacian = D - A
where,
D - Degree matrix
A - Adjacency matrix
  • It tells how smooth the graph is, i.e. it can be used to measure how smoothly a signal varies across the vertices.
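
Continuing with the same hypothetical 4-node adjacency matrix (again a sketch, not code from the article), the degree matrix and the Laplacian follow directly:

import numpy as np

# adjacency matrix of the hypothetical 4-node graph from above
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])

# degree matrix: the diagonal holds the row sums of A (edge counts per node)
D = np.diag(A.sum(axis=1))

# graph Laplacian
L = D - A
print(D)
print(L)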

The Graph Neural Network

We know that neural networks and CNNs expect fixed-size inputs, but graphs are not fixed in size or structure, so we need new approaches.

Getting into CNNs: a CNN works in 3 steps.

Filter / Kernel
  • Locality — the kernel/filter moves over a fixed grid of the image, from the beginning to the end; it is a fixed grid.
  • Aggregation — multiply that particular grid by the mask (the kernel weights) and sum them up.
  • Composition — function of functions (hidden-layer computation, f(f(x))…)

A similar process can be applied to GNNs. Locality is taken over the neighbourhood (how a node is connected to other nodes localises the computation around each node). Aggregation refers to how the neighbours contribute to their associated node through weights. Composition (layer stacking) passes the result on to further layers.
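
As a minimal sketch of those three steps on a graph (hypothetical code, not the author's), the neighbour lookup is the locality step, the sum is the aggregation step, and calling the layer twice is the composition step:

import numpy as np

A = np.array([[0, 1, 1, 1],     # adjacency matrix of the hypothetical 4-node graph
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
X = np.array([[1.0, 0.0],       # one 2-dimensional feature vector per node
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])

def gnn_layer(A, H):
    out = np.zeros_like(H)
    for i in range(A.shape[0]):
        neighbours = np.nonzero(A[i])[0]     # locality: which nodes are connected to node i
        out[i] = H[neighbours].sum(axis=0)   # aggregation: sum the neighbours' features
    return out

H1 = gnn_layer(A, X)    # first layer
H2 = gnn_layer(A, H1)   # composition: stack a second layer on top of the first
print(H2)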

Node Embedding

Aim: similarity(u, v) ≈ (z_u)^T (z_v), where z denotes the embedding space.

When a node is embedded, it becomes a d-dimensional vector, where d is the dimension of the embedding space, not the dimension of the original network space.
Without altering the similarity (distance) between the nodes u and v, the encoding function ENC() transforms the nodes into d-dimensional vectors.
Take into account that u has the feature vector x and v has y.

So, (z_u)^T (z_v) ≈ (x)^T (y).
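
For instance (a hypothetical sketch with made-up embedding vectors), the dot-product similarity of two embedded nodes is simply:

import numpy as np

# hypothetical d-dimensional embeddings (d = 3) of nodes u and v
z_u = np.array([0.2, 0.9, 0.1])
z_v = np.array([0.3, 0.8, 0.0])

similarity = np.dot(z_u, z_v)   # (z_u)^T (z_v)
print(similarity)               # 0.78 -> u and v are embedded close together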

Locality (neighbors)

We have to create a computational graph for the target node. A's neighbours are B, C, and D, so first B, C, and D are connected towards A; then we connect the neighbours of those neighbours (B, for instance, is connected with A and C), and likewise it goes on.

Here this works as an encoder function: mapping node A to Z^x is actually an encoding. (Z^x is the feature vector of A.)

Also, all of the nodes have their own feature vectors.
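
As a rough sketch (a hypothetical helper function, not from the article), the 1-hop and 2-hop neighbourhoods that form the computational graph of a target node can be gathered like this:

# adjacency list of a hypothetical graph
graph = {"A": ["B", "C", "D"],
         "B": ["A", "C"],
         "C": ["A", "B"],
         "D": ["A"]}

def computation_graph(graph, target):
    hop1 = graph[target]                    # direct neighbours of the target node
    hop2 = {n: graph[n] for n in hop1}      # neighbours of those neighbours
    return hop1, hop2

hop1, hop2 = computation_graph(graph, "A")
print(hop1)   # ['B', 'C', 'D']
print(hop2)   # {'B': ['A', 'C'], 'C': ['A', 'B'], 'D': ['A']}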

Aggregation (neighbors)

A and C will be added up to get B's representation, and B, C, and D will be added up to get A's representation. Permutation invariance means that the order does not matter: (A + B) and (B + A) give the same result.
In the example above we chose A as the target node, but moving forward we must choose every node as a target: B will be a target node and will give us its own separate computational graph, just like the other nodes.
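
To see the permutation invariance concretely (a hypothetical sketch with made-up feature vectors), summing neighbour features gives the same result regardless of the order in which the neighbours are visited:

import numpy as np

# hypothetical feature vectors of B's neighbours, A and C
x_A = np.array([1.0, 2.0])
x_C = np.array([0.5, 0.5])

agg_1 = x_A + x_C   # aggregate in the order (A, C)
agg_2 = x_C + x_A   # aggregate in the order (C, A)

print(np.allclose(agg_1, agg_2))   # True: sum aggregation is permutation invariant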

Forward propagation

We know how Forward propagation works.

Forward propagation in Neural Network

How do Graph Convolutional Networks operate, though? This is where the spectral GCN appears. Spectral GCNs achieve this information-passing strategy by using the eigendecomposition of the graph Laplacian matrix. GCN Paper

A is the adjacency matrix in this scenario. For the self-loops, we add an identity matrix to A (A_hat = A + I), and A* represents the normalized version of this matrix.
By incorporating the local node information with it, spectral graph convolution functions as a message-passing network.

For training a GCN we need 3 elements:

  • Adjacency matrix — to learn feature representations based on node connectivity.
  • Node attributes — the input features.
  • Edge attributes — edge-connectivity data.

Consider X as the input feature matrix, A as the adjacency matrix, and D as the degree matrix.

The dot product of the adjacency matrix and the node feature matrix gives, for each node, the sum of its neighbouring nodes' features:

AX = np.dot(A, X)  # row i is the sum of the features of node i's neighbours
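
As a concrete toy example (hypothetical numbers, not from the article), for the 4-node graph used earlier:

import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])

AX = np.dot(A, X)
print(AX[0])   # [3. 2.] = X[1] + X[2] + X[3], the summed features of node 0's neighbours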

Normalizing A can be done by taking the dot product of the inverse of the degree matrix with AX (i.e. D^-1 A X), but in their paper Kipf and Welling suggest doing a symmetric normalization instead: D^-1/2 A_hat D^-1/2 X.

# Symmetric normalization: D^-1/2 . A_hat . D^-1/2 . X
from scipy.linalg import fractional_matrix_power
A_hat = A + np.eye(A.shape[0])              # add self-loops
D_hat = np.diag(A_hat.sum(axis=1))          # degree matrix of A_hat
D_half_norm = fractional_matrix_power(D_hat, -0.5)
DADX = D_half_norm.dot(A_hat).dot(D_half_norm).dot(X)

If we only want to illustrate the message passing, backpropagation is not strictly necessary. The forward-propagation formula changes slightly because each node is expanded into its computational graph, and we also drop the bias b from the calculation for simplicity's sake. The function simply takes the adjacency matrix and the input features, and only the forward propagation happens. The real difficulty with graph neural networks, however, lies in preparing the data: we need to gather the edge-connectivity information, the input attributes, and the adjacency matrix before feeding the network.
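
As a rough sketch of that preparation step (hypothetical raw data, not the article's code), an edge list and per-node attributes can be turned into the adjacency matrix and the feature matrix like this:

import numpy as np

# hypothetical raw data: an edge list and per-node input attributes
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]
node_attributes = {0: [1.0, 0.0],
                   1: [0.0, 1.0],
                   2: [1.0, 1.0],
                   3: [0.5, 0.5]}

N = len(node_attributes)

# adjacency matrix from the edge connectivity
A = np.zeros((N, N))
for i, j in edge_list:
    A[i, j] = A[j, i] = 1

# input feature matrix, one row per node
X = np.array([node_attributes[i] for i in range(N)])

print(A.shape, X.shape)   # (4, 4) (4, 2)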

For example, let us take the Cora dataset:

The Cora dataset consists of 2708 scientific publications classified into one of seven classes. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.

  • Nodes = Publications (Papers, Books …)
  • Edges = Citations
  • Node Features = word vectors
  • 7 Labels = Publication type e.g. Neural_Networks, Rule_Learning, Reinforcement_Learning, Probabilistic_Methods…
Number of graphs: 1 
Number of features: 1433
Number of classes: 7
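
A minimal sketch of loading Cora and defining a two-layer GCN (assuming the PyTorch Geometric library; this illustration is not the author's linked code) could look like this:

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# load the Cora citation graph (1 graph, 1433 features, 7 classes)
dataset = Planetoid(root="/tmp/Cora", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))   # first graph convolution + non-linearity
        x = self.conv2(x, edge_index)           # second graph convolution
        return F.log_softmax(x, dim=1)

model = GCN()
out = model(data.x, data.edge_index)   # forward pass over the whole graph
print(out.shape)                       # torch.Size([2708, 7])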

Code for building this model.
