Who has never wondered about the interplay between art and artificial intelligence? Here we share our thoughts on how art can help us shed light on various aspects of scientific knowledge, how cognitive science serves as a bridge between art and science, and what the limitations of visualization techniques are.
A transit map of associations
You can see a transit map below. If you live in Budapest, you will recognize that the map depicts the tram and underground lines on the Buda side of the city. On closer inspection, you will notice that the station names have been altered. But why? Because each line represents a chain of word associations, such as “growth” -> “recover” -> “crisis” -> “capitalism” -> “Marx” (the line at the lower left corner) or “beer” -> “wine” -> “Lake Balaton” -> “hotel” -> “beach” -> “coast” -> “Libya” -> “migrant”.
AI and data visualization – from word embeddings to a semantic similarity network
The chains of word associations are paths in a semantic similarity network, which was generated from a language embedding model. Put simply, the words are the nodes, and there is an edge between two words if each word is among the other’s five nearest neighbors. The language embedding was trained on a corpus of articles published by the main Hungarian online media sites over the last 15 years.
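The edge rule described above (two words are linked only if each is among the other's nearest neighbors) can be sketched in a few lines. This is a minimal illustration, not the code from our repository: the toy two-dimensional vectors, the function names, and the use of cosine similarity as the nearness measure are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, k):
    """Map every word to the set of its k most similar other words."""
    neighbors = {}
    for word, vec in vectors.items():
        ranked = sorted(
            (other for other in vectors if other != word),
            key=lambda other: cosine(vec, vectors[other]),
            reverse=True,
        )
        neighbors[word] = set(ranked[:k])
    return neighbors


def mutual_knn_edges(vectors, k=5):
    """Keep an edge only when both words list each other as neighbors."""
    neighbors = nearest_neighbors(vectors, k)
    return {
        frozenset((a, b))
        for a in neighbors
        for b in neighbors[a]
        if a in neighbors[b]
    }


# Hypothetical toy embedding; real vectors have hundreds of dimensions.
toy_vectors = {
    "beer": [1.0, 0.1],
    "wine": [0.9, 0.2],
    "hotel": [0.1, 1.0],
    "beach": [0.2, 0.9],
}

# k=1 keeps the toy example readable; the article uses k=5.
edges = mutual_knn_edges(toy_vectors, k=1)
```

Requiring the neighbor relation to hold in both directions keeps very frequent "hub" words from attaching themselves to everything, which is one reason the resulting paths read like associations.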
Originally we used our language model to investigate hidden biases in the corpus (as in, e.g., Bolukbasi et al.). So first we built the semantic similarity network, which is shown below, to be able to explore the paths starting from various words. However, we realized that it is very hard to communicate our findings to a general audience, and almost impossible to explain to them why such research is important.
Some cognitive science helps us to understand the data
Below is an example of two paths. These chains resemble word associations, and this resemblance is no mere coincidence. Studies (e.g. this one) suggest that semantic similarity networks predict human associations to a given word fairly well. These word associations come to mind unconsciously, as they have been engraved in our minds since early childhood.
The limits of (scientific) visualization
Our goal was to make these associations explicit, because sometimes their unconscious nature can hurt us. For example, using a corpus that is full of human biases to train a language model for classifying the CVs of job applicants is a very bad idea. Amazon can tell you that for sure. It is not only machine learning algorithms that can be misled by hidden biases; our own thinking is also based on prototypes and stereotypes.
What if we show only the most significant part of the semantic similarity network? To do so, we can filter out every node that falls below a threshold of a centrality measure (e.g. PageRank), or every node whose frequency in the corpus falls below a certain value. One result of this approach can be found below.
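The centrality-based filtering mentioned above can be illustrated as follows. This is a hedged sketch under stated assumptions: the toy graph, the function names, the damping factor of 0.85, and the choice of the mean score as the threshold are ours for the example, and the PageRank here is a plain power iteration rather than whatever implementation a graph library provides.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph.

    `adjacency` maps each node to the set of its neighbors.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if neighbors:
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # An isolated node spreads its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


def filter_by_score(adjacency, scores, threshold):
    """Drop nodes scoring below `threshold`, along with their edges."""
    keep = {node for node in adjacency if scores[node] >= threshold}
    return {node: adjacency[node] & keep for node in keep}


# Hypothetical toy network built from words that appear in the article.
toy_graph = {
    "crisis": {"growth", "recover", "capitalism"},
    "growth": {"crisis"},
    "recover": {"crisis"},
    "capitalism": {"crisis", "Marx"},
    "Marx": {"capitalism"},
}

scores = pagerank(toy_graph)
# Assumed threshold: the mean score, i.e. 1 / number of nodes.
filtered = filter_by_score(toy_graph, scores, threshold=1.0 / len(toy_graph))
```

Filtering by corpus frequency works the same way: pass a dictionary of word counts as `scores` and a minimum count as `threshold`.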
Cognition and aesthetic experience
The visualization above is a nice one, at least to our taste. But it is not an informative one, as it makes it hard to find any pattern in the data or to make sense of it. More interestingly, most of the words and the chains originating from them are not biased at all. By studying the visualization above, one might conclude that the Hungarian language has no biases.
We have to admit that we are neither impartial observers nor objective ones. The whole project was started to detect biases and to make them visible to everyone. What we are doing is a kind of art in the sense defined by Alva Noë in Strange Tools (p. 101).
“Works of art put our making practices and our tendency to rely on what we make, and so also our practices of thinking and talking and making pictures, on display. Art puts us on display. Art unveils us to ourselves.”
According to this view, diverting an everyday object into a piece of art is the best example of how aesthetic experience occurs. Changing the context of an everyday object questions everything we associate with it. Hence it really unveils us to ourselves.
These ideas motivated us to divert the transit map of Buda. We deliberately selected biased words as the starting points of paths in the semantic similarity graph. Then we chose the most interesting paths from among them.
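Listing candidate paths from a chosen starting word, before picking the interesting ones by hand, can be sketched like this. It is only an illustration: the toy graph reuses the first chain quoted earlier, and the function name and the length limit are assumptions for the example, not the procedure from our repository.

```python
def simple_paths(adjacency, start, max_len):
    """Yield every path from `start` that repeats no word and has
    between 2 and `max_len` nodes, using an explicit depth-first stack."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) < max_len:
            # Sorting makes the order of the results deterministic.
            for neighbor in sorted(adjacency[path[-1]], reverse=True):
                if neighbor not in path:
                    stack.append(path + [neighbor])


# Hypothetical toy graph holding one of the chains from the transit map.
toy_graph = {
    "growth": {"recover"},
    "recover": {"growth", "crisis"},
    "crisis": {"recover", "capitalism"},
    "capitalism": {"crisis", "Marx"},
    "Marx": {"capitalism"},
}

paths = list(simple_paths(toy_graph, "growth", max_len=5))
for path in paths:
    print(" -> ".join(path))
```

On a real network the number of paths grows quickly with `max_len`, so the final selection of "tram lines" remains an editorial choice rather than something the code decides.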
An artifact is born
The concept introduced above will soon take physical form. If you want to know where our artwork will be displayed, follow our blog and social media accounts.
Data and code
You can find the code used to generate the networks in this repository. A link to our data and language model is provided in the repo’s readme.
On the language model
If you speak Hungarian, you are encouraged to read more about our language model and the corpus on which it was trained here.
Click the links below to open the embedded visualizations in separate tabs.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.