If you believe that graph and network visualization is a kind of art, this post was written for you. If you believe that it isn't, you should keep reading anyway. Since we love using graph-based methods in our work, such as generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language, we started a series on graph theory and network science. The first part was devoted to the theoretical background of graphs and how to work with them in Python, while the second part covered graph databases and analytics engines. Now we turn to graph and network visualization.

(more…)

From keyword extraction to knowledge graphs, graph and network science offer a useful framework for dealing with natural language. We love using graph-based methods in our work, such as generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language, so much that we decided to start a series on the topic. The first part explored the theoretical background of network science and worked with graphs in Python. This part focuses on graph processing frameworks and graph databases.

(more…)

From keyword extraction to knowledge graphs, graph and network science offer a useful framework for dealing with natural language. We love using graph-based methods in our work, such as generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language. This series gives you tips on how to get started with graph and network theory, which Python tools to use, where to look for graph databases and how to visualize networks; finally, we offer a few resources on Graph Neural Networks.

(more…)

Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. This talk introduces the Spark NLP library, the most widely used NLP library in the enterprise. It owes this adoption to production-grade, trainable, and scalable implementations of state-of-the-art deep learning and transfer learning NLP research, released as a permissively licensed open-source library backed by a highly active community and team.

(more…)

More than 90% of machine learning applications improve with human feedback. For example, a model that classifies news articles into pre-defined topics has been trained on thousands of examples where humans have manually annotated the topics. However, if there are tens of millions of news articles, it might not be feasible to manually annotate even 1% of them. If we sample randomly, we will mostly get popular topics like “politics” that the machine learning model can already identify accurately. So, we need to be smarter about how we sample. This talk is about “Active Learning”, the process of deciding which raw data is most valuable for human review, covering: Uncertainty Sampling; Diversity Sampling; and some advanced methods like Active Transfer Learning.
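To make the core idea concrete, here is a minimal, hypothetical sketch of least-confidence uncertainty sampling in Python. It is not taken from the talk: the function names, the NumPy-only setup and the normalized least-confidence score are our own illustrative choices. The sketch ranks unlabeled items by how unsure the model is about its top prediction and sends the most uncertain ones to human annotators.

```python
import numpy as np

def least_confidence_scores(probs: np.ndarray) -> np.ndarray:
    """Uncertainty score per item from predicted class probabilities.

    probs has shape (n_items, n_classes) and each row sums to 1.
    Returns values in [0, 1]: 0 = fully confident, 1 = a uniform guess.
    """
    top = probs.max(axis=1)                      # confidence in the top label
    n_classes = probs.shape[1]
    return (1.0 - top) * n_classes / (n_classes - 1)

def select_for_annotation(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` most uncertain items, most uncertain first."""
    scores = least_confidence_scores(probs)
    return np.argsort(-scores)[:budget]

# Toy example: four articles, three topics, budget for two annotations.
probs = np.array([
    [0.95, 0.03, 0.02],   # confidently "politics" -> low priority
    [0.40, 0.35, 0.25],   # unsure -> good candidate for review
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # nearly uniform -> highest priority
])
print(select_for_annotation(probs, budget=2))    # -> [3 1]
```

Diversity sampling then typically reorders or filters such a shortlist so that the selected items also cover different regions of the data, rather than clustering around a single confusing case.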

(more…)

Corpus Linguistics is a neglected field of linguistics. Linguists tend to think that it cannot offer much beyond some methodological tools to support their ideas, yet they are quick to blame it when it contradicts their results. Corpus Linguistics was often considered the historic predecessor of Natural Language Processing in the pre-Big Data era. In this post, we claim that Corpus Linguistics offers a unique perspective on language and provides experts with a theoretical and practical framework for analyzing linguistic data. For the best Corpus Linguistics resources, keep reading!

(more…)