Graph Theory and Network Science for Natural Language Processing – Part 1

From keyword extraction to knowledge graphs, graph and network science offer a good framework to deal with natural language. We love using graph-based methods in our work, like generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language. This series gives you tips on how to get started with graph and network theory, which Python tools to use, where to look for graph databases and how to visualize networks, finally we offer a few resources on Graph Neural Networks.

Graph Theory and Network Science

First of all, one might ask what’s the difference between Graph Theory and Network Science. We argue that there is no sharp boundary between the two fields. It seems that NLP practitioners tend to prefer graphs to networks, while cognitive scientists and AI researchers tend to have reversed preferences. We’ll be sloppy and use the two terms interchangeably here. But for the sake of those who stick to the separation of the two field, let’s see how Wikipedia defines them

“Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes (or vertices) and the connections between the elements or actors as links (or edges). “

Wikipedia: Network Science

“In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. “

Wikipedia: Graph theory

Graphs and networks are versatile fields on their own. Here we focus on the very basics of the theory behind them. For the practical parts, we only deal with resources available to Pythonistas.

Theoretical Background

Richard J. Trudeau’s Introduction to Graph Theory is a short, cheap, and accessible introduction into the field. It is a math classic from Dover, containing just enough material to get started with graphs.

Barabási, Network Science

Network Science by Albert-László Barabási is a comprehensive, freely available textbook. It can be used as a reference work to look up the gritty nitty details of network theory from time to time. Don’t be scared by the long chapters of the book. To understand graph-based NLP, you don’t need the second half of it (from chapter 6).

Graph-Based Natural Language Processing and Information Retrieval by Mihalcea and Radev is a short (less than 190 pages) yet comprehensive book. The authors are top-tier researchers of their field, their TextRank algorithm is one of the best unsupervised keyword extraction and extractive summarizer algorithms. The book gives you a comprehensive overview of graph-based methods in NLP. You can use it as a texbook as well as a reference work. Its first and second chapters (which are devoted to Graph Theory and Graph Based Algorithms respectively) are not suitable for complete beginners. Instead, we recommend Trudeau’s and Barabási’s books to learn the basics of graph theory and network science. If you want to learn more about graph algorithms, read our post on resources to learn the basics of algorithms.

The Python way

Although there is a plethora of network packages, NetworkX stands out as one of the most comprehensive Python package, one with an active group of maintainers. It is awesome for small and medium sized networks up to about 100.000 nodes. Check out this post on benchmarking all major graph libraries to select the one that best suits to your needs. Unless you have a very specific problem, we strongly recommend using NetworkX. If your network is too large, you should use a graph processing framework to analyze it. You’ll also need a graph database to store it and run analytic queries on it. We’ll cover these topics in the second part of these series. Now let’s turn back to Python tools.

Social Network Analysis for Startups by Tsvetovat and Kouznetsov is a fantastic book despite its misleading title. This book is a practical introduction into graph theory/network science and social network analysis using Python. The chapters follow each other in a logical manner, the examples are really good, and the explanations are superb. The only problem with this book is its age. Having published in 2011, this book shows its age and you have to adapt the example code to the present day versions of Python, matplotlib, NetworkX and other tools.

Complex Network Analysis in Python by Zinoviev is a more recent title. It uses NetworkX to teach network science in a pragmatic manner. The first part deals with the basics, the second is devoted to classic explicit networks. The third and fourth parts are rare gems. They are dealing with creating networks, based on co-occurrence and similarity,. These are topics which are hardly found in other sources! The last part is devoted to directed networks, which sadly contains only one chapter.

If you are interested in graph-based methods in machine learning in general, Graph-Powered Machine Learning by Alessandro Negro is the best resource to use. It is freely available here. By the way, Alessandro will speak at our meetup soon. Register here to attend the online event, or you can watch the recorded talk later on our YouTube channel.

What’s coming up next?

We hope you enjoyed our journey in the world of graphs and networks. In the next part, we will collect the best resources on graph analytics frameworks and graph databases. The third part will be devoted to visualization. Stay tuned!

Subscribe to our newsletter

Get highlights on NLP, AI, and applied cognitive science straight into your inbox.

powered by TinyLetter