Today, with ever longer documents and ever more multimedia data, finding the right information is more important and more challenging than ever. The rise of deep learning has ushered in a new era of “neural search”. However, building a neural search system is non-trivial work for many engineers. The main challenges are: (1) a long development cycle due to the complex tech stack; (2) poor scalability due to the glued-together architecture; and (3) the strong domain knowledge required to fine-tune the results. With Jina (https://github.com/jina-ai/jina), engineers can quickly build a search engine powered by state-of-the-art AI in just minutes. In this talk, I will introduce the design philosophy and key features of Jina, and showcase how Jina bootstraps a QA semantic search system and a short-video search system in just a few lines of code.


From keyword extraction to knowledge graphs, graph and network science offer a good framework to deal with natural language. We love using graph-based methods in our work, for tasks like generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language, so much that we decided to start a series on the topic. The first part explored the theoretical background of network science and dealt with graphs using Python. This part focuses on graph processing frameworks and graph databases.


From keyword extraction to knowledge graphs, graph and network science offer a good framework to deal with natural language. We love using graph-based methods in our work, for tasks like generating more labeled data, visualizing language acquisition and shedding light on hidden biases in language. This series gives you tips on how to get started with graph and network theory, which Python tools to use, where to look for graph databases, and how to visualize networks. Finally, we offer a few resources on Graph Neural Networks.


Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. This talk introduces the Spark NLP library – the most widely used NLP library in the enterprise, thanks to implementing production-grade, trainable, and scalable versions of state-of-the-art deep learning & transfer learning NLP research, as a permissive open-source library backed by a highly active community and team.


City names in Hungary are like LEGO bricks. You can put together two or more words almost freely and get an existing city name. However, one can easily discover patterns in the names: a considerable proportion of them end with the same word. Our project aims to map the most frequent endings of the municipality names of Hungary.
