ML Project 2: Disambiguating Voynich Manuscript transliterations with word embeddings

Team members

Jirka Lhotka
Francesco Salvi
Liudvikas Lazauskas

Repo structure

The repository contains 3 main notebooks as well as 4 modules:

embeddings_italian.ipynb: Trains and evaluates embeddings on Italian text (Dante's Inferno).
embeddings_latin.ipynb: Trains and evaluates embeddings on Latin text (Albert of Aix).
embeddings_voynich.ipynb: Trains embeddings on the Voynich Manuscript.
corruptions.py: Provides methods to compute ambiguity distributions and to artificially corrupt the texts.
uncertainties.py: Provides a class that represents an ambiguity together with its context, and methods to build a list of ambiguities from a corrupted text.
baseline.py: Provides methods to generate baseline predictions by computing letter frequencies in the text.
validation.py: Provides methods to generate predictions and to evaluate the models by computing their accuracy.
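
To illustrate the idea behind a letter-frequency baseline, here is a minimal sketch. It is not the code in baseline.py: the function names letter_frequencies and baseline_predict are hypothetical, and it assumes an ambiguity is simply a list of candidate letters for one unreadable character. The baseline picks whichever candidate occurs most often in the text.

```python
from collections import Counter


def letter_frequencies(text):
    """Count alphabetic characters in the text, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def baseline_predict(candidates, frequencies):
    """Return the candidate letter that occurs most often in the text.

    Hypothetical helper: assumes `candidates` lists the possible readings
    of a single ambiguous character. Ties go to the first candidate.
    """
    return max(candidates, key=lambda ch: frequencies[ch])


# Opening line of Dante's Inferno, used here only as toy input.
freqs = letter_frequencies("nel mezzo del cammin di nostra vita")
prediction = baseline_predict(["a", "o"], freqs)
print(prediction)
```

A baseline of this kind ignores context entirely, which is what makes it a useful lower bound when comparing against embedding-based predictions.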

Data

The texts used in this project are mostly found in the texts/ folder. The folder contains historical texts such as Dante's Inferno and Albert of Aix, and Voynich transliterations available here. The transliterations are further processed with IVTT, and the processed texts are found in the data/ folder.

Resources

Benchmarks: The benchmark used for the Latin synonym selection task can be found in the benchmarks/ folder.
Software: The software used for filtering and processing the transliterations can be found in the software/ folder, taken from here.
Documentation: Documentation for the usage of IVTT and the IVTFF format can be found in the documentation/ folder.

Predictions

The resulting predictions of the model trained on Voynich can be found in the predictions/ folder.

Requirements

Gensim Models
- version: 4.1.2
- package name: gensim
NumPy
- version: 1.19.5
- package name: numpy
SciPy
- version: 1.7.3
- package name: scipy
Natural Language Toolkit
- version: 3.6.5
- package name: nltk
Smart Open
- version: 5.2.1
- package name: smart-open
The Classical Language Toolkit
- version: 1.0.21
- package name: cltk
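
One way to check a local environment against the versions listed above is sketched below. The helper name find_mismatches is hypothetical and not part of this repository; the sketch uses only importlib.metadata from the standard library (Python 3.8+) and reports packages that are missing or installed at a different version.

```python
from importlib import metadata

# Versions copied from the Requirements list above.
PINNED = {
    "gensim": "4.1.2",
    "numpy": "1.19.5",
    "scipy": "1.7.3",
    "nltk": "3.6.5",
    "smart-open": "5.2.1",
    "cltk": "1.0.21",
}


def find_mismatches(pinned):
    """Map each package that deviates from its pin to the installed version.

    A value of None means the package is not installed at all.
    Hypothetical helper, shown for illustration only.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


report = find_mismatches(PINNED)
for name, installed in report.items():
    print(f"{name}: expected {PINNED[name]}, found {installed}")
```

An empty report means every listed package is installed at the listed version.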
