Open Source Packages

We build and maintain a suite of open source Python libraries and tools for computational text analysis. The packages are designed as a coherent ecosystem in which each layer builds on the one below.

chronowords

Temporal word embedding analysis. Detect how word meanings shift across time in large text corpora using memory-efficient PPMI-based embeddings, NMF topic modeling, and Procrustes alignment.
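To make the PPMI step concrete, here is a minimal sketch in plain Python. It is not the chronowords API: the function names `cooccurrence_counts` and `ppmi`, the window size, and the toy sentences are all assumptions made for illustration. It counts co-occurrences in a sliding window and keeps only positive pointwise mutual information scores.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word co-occurrences within a sliding window.

    `sentences` is a list of token lists. Hypothetical helper, not part of chronowords.
    """
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """Convert co-occurrence counts into positive PMI scores.

    PMI(w, c) = log2(P(w, c) / (P(w) * P(c))); pairs with PMI <= 0 are dropped,
    which keeps the resulting matrix sparse.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        if pmi > 0:
            scores[(word, context)] = pmi
    return scores


# Toy corpus, invented for this example only.
toy_sentences = [["a", "b"], ["a", "b"], ["c", "d"]]
toy_scores = ppmi(cooccurrence_counts(toy_sentences, window=1))
```

Running the same computation separately on the texts of each time period gives one sparse PPMI matrix per period. Those matrices are what an alignment step such as Procrustes would then bring into a shared space so that a word's vectors can be compared across periods.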

Originally developed to study gender bias in Hungarian online media: the first version powered our 2019 analysis of how 102,240 news articles represent women, men, and minorities in semantic space (read the analysis).

pip install chronowords


kenon

Semantic network construction from text corpora. Build and analyse word association graphs, find paths between concepts, and compare text-derived networks to human association norms (Nelson norms, Small World of Words).
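As a rough illustration of the idea, the sketch below builds a word association graph from sentence-level co-occurrence and finds a shortest path between two concepts with breadth-first search. It uses only the standard library and does not reflect the kenon API: `build_graph`, `shortest_path`, the `min_count` threshold, and the toy sentences are assumptions made for this example.

```python
from collections import defaultdict, deque


def build_graph(sentences, min_count=1):
    """Link two words if they share a sentence at least `min_count` times.

    Hypothetical helper, not part of kenon. Returns an adjacency mapping.
    """
    counts = defaultdict(int)
    for tokens in sentences:
        unique = sorted(set(tokens))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                counts[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return the words on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Toy sentences, invented for this example only.
toy_graph = build_graph([["dog", "bone"], ["bone", "soup"], ["soup", "spoon"]])
toy_path = shortest_path(toy_graph, "dog", "spoon")
```

Here the only route from "dog" to "spoon" runs through "bone" and "soup". A graph built this way from a corpus could then be compared edge by edge with a graph built from human association norms.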

Named after the Greek kenon (κενόν) — the void that enables connection.

pip install kenon


corvus

A cookiecutter template for data science and text analysis projects. It provides a pre-configured scaffold with uv, ruff, DVC, MLflow, Sphinx docs, and structured directories for raw and processed data, models, notebooks, and a Python package, so you can skip manual setup and start analysing.

Originally developed as our internal project template for computational linguistics and NLP research, now publicly available.

uvx cookiecutter https://github.com/crow-intelligence/corvus.git


lexograph (coming soon)

Computational text art and visualisation. Turtle graphics sentence walks, punctuation spirals, rhythm punch cards, and concordance plots. Depends on both chronowords and kenon.

Named after the Greek graphein (γράφειν) — to write, to draw.


All packages are MIT licensed and open source.