Introduction to ML on graphs

By: Lamija Hibovic

This blog is an introductory one to our series of blogs where we will talk about machine learning on graphs. In this particular blog, we will focus on graphs in general, and their application in machine learning.

If you search a word graph on Google, the most common word appearing next to it is an adjective ubiquitous. The reason behind graphs popularity is their ability of describing complex systems and focusing on relationships between points in the system. By definition, a graph is a structure of objects (nodes) and their interactions (edges). For example, to describe Facebook network, or any social network as a graph, nodes correspond to profiles and edges correspond to formed friendships.

Graphs do not only visualize real-world networks but also provide mathematical foundation that is used for analyzing and learning from it. As we all know, in the last few decades there has been an explosion of data especially in the social networks. As a result of data explosion and complexity of relations, machine learning is one of the obvious choices to model, analyze and learn from graph data.

One of the biggest challenges that machine learning deals with is actually its application on irregular structures such as graphs. Therefore, the reason why machine learning on graphs is a more difficult problem than computer vision or natural language processing is the structure of the graph. Natural language processing uses a simple data type called sequence that usually represents a text or a speech. Computer vision uses images as input data, however, images can be resized and represented as fixed size grids or highly regular graphs. There are very successful tools for analyzing this type of data structures; sequences and images. On the other hand, graphs and networks are more complicated to process since they are of arbitrary size and they have no spatial locality as in grids or in text. In addition, there is no fixed ordering between nodes, meaning that we have no clue about the first origin node. Fixed node ordering is also called reference point, and that is something that would allow us to do deep learning. You can see the difference between data structures in Figure 1.1.

Figure 1.1, Comparison of graphs, images and text

Machine learning on graphs is actually a generalization of the modern machine learning toolbox where the input is not required to be a highly regular structure as an image or a sequence.

Nowadays, as a solution, techniques based on dimensionality reduction and deep learning are used. However, there is still the question of how to take advantage of relational structure for better prediction. One of the most successful ways is by explicitly modeling relationships such as taking into consideration similarities between nodes, number of neighbors, number of common neighbors and so on. We will talk about this in the following blogs. Explicit relationships are especially important in deep learning since its toolbox works with simple data types such as sequences and grids.

Researchers are still trying to improve some aspects of machine learning on graphs, however, applications of machine learning on graphs are everywhere around us. Its variety ranges from drug-disease treatments, through marketing campaigns to recommendation systems. As said before, success of machine learning on graphs lies on its ability to focus on relationships between objects. In addition, the graphs are capable of connecting isolated points that other data structures lack. Another advantage is that graphs store relation information that are used to distinguish between subgraphs based on some feature or metric.

In the following blogs, we will talk about traditional methods, such as graphlets and graph kernels that encompass node, link and graph-level features. After traditional methods, the focus will be on methods for node embeddings and graph neural networks.

Entropy387

Social media