
Implementing three part-of-speech tagging algorithms—Eager, Viterbi, and Individually Most Probable Tags—and comparing their accuracy across English, Korean, and Swedish.


emma-horton/PartsOfSpeech



Part-of-Speech Tagging with Dynamic Algorithms


Overview

This project implements and compares three part-of-speech (POS) tagging algorithms (Eager, Viterbi, and Individually Most Probable Tags) on English, Swedish, and Korean. Running the same algorithms on languages with different morphology and syntax shows how linguistic structure affects tagging accuracy.

Project Goals

  1. Implement three distinct POS tagging algorithms of varying complexity: Eager, Viterbi, and Individually Most Probable Tags.
  2. Train and evaluate these algorithms using multilingual corpora from the Universal Dependencies Treebank.
  3. Uncover linguistic insights by analyzing algorithm performance across English, Swedish, and Korean.
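
The three algorithms in goal 1 differ mainly in how much context they use when choosing a tag. The sketch below illustrates that difference on a toy hidden Markov model. It is not the code in `pos_tagging.py`: the tag set, the probability tables, and the function names are all hypothetical, the model has no end-of-sentence transition, and plain probabilities are used for readability where a real tagger would more likely work in log space.

```python
START = "<s>"
TAGS = ["DET", "NOUN", "VERB"]

# Hypothetical toy parameters, chosen by hand for illustration only.
trans = {
    "<s>":  {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}
emit = {
    "DET":  {"the": 0.90, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.60, "barks": 0.35},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.80},
}


def eager(words):
    """Greedy left-to-right: commit to the locally best tag at each word."""
    tags, prev = [], START
    for w in words:
        prev = max(TAGS, key=lambda t: trans[prev][t] * emit[t][w])
        tags.append(prev)
    return tags


def viterbi(words):
    """Return the single most probable tag sequence for the whole sentence."""
    if not words:
        return []
    # best[i][t]: probability of the best path for words[:i+1] that ends in t
    best = [{t: trans[START][t] * emit[t][words[0]] for t in TAGS}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in TAGS:
            p = max(TAGS, key=lambda q: best[-1][q] * trans[q][t])
            pointers[t] = p
            scores[t] = best[-1][p] * trans[p][t] * emit[t][w]
        best.append(scores)
        back.append(pointers)
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):  # follow back-pointers to the start
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def most_probable_tags(words):
    """Pick, per position, the tag with the highest posterior (forward-backward)."""
    if not words:
        return []
    alpha = [{t: trans[START][t] * emit[t][words[0]] for t in TAGS}]
    for w in words[1:]:
        alpha.append({t: emit[t][w] * sum(alpha[-1][q] * trans[q][t] for q in TAGS)
                      for t in TAGS})
    beta = [{t: 1.0 for t in TAGS}]
    for w in reversed(words[1:]):
        beta.insert(0, {q: sum(trans[q][t] * emit[t][w] * beta[0][t] for t in TAGS)
                        for q in TAGS})
    return [max(TAGS, key=lambda t: alpha[i][t] * beta[i][t])
            for i in range(len(words))]


sentence = ["the", "dog", "barks"]
for tagger in (eager, viterbi, most_probable_tags):
    print(tagger.__name__, tagger(sentence))
```

On an unambiguous sentence like this one, all three functions agree. They can diverge when a locally attractive tag leads to a poor continuation, which is the case the Eager approach cannot recover from.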

Key Findings

Algorithm Performance at a Glance

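| Language | Eager Accuracy (%) | Viterbi Accuracy (%) | Individually Most Probable Tags Accuracy (%) |
|----------|--------------------|----------------------|----------------------------------------------|
| English  | 88.6               | 91.3                 | 88.6                                         |
| Swedish  | 85.7               | 90.2                 | 85.7                                         |
| Korean   | 80.8               | 79.2                 | 80.8                                         |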
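
The README does not state how these percentages are computed. A common choice, assumed here, is token-level accuracy: the share of tokens whose predicted tag matches the gold tag, pooled over all test sentences. The function name `tagging_accuracy` and the example data are hypothetical.

```python
def tagging_accuracy(gold, predicted):
    """Token-level accuracy in percent over parallel lists of tag sequences."""
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    # Avoid dividing by zero when there are no tokens at all.
    return 100.0 * correct / total if total else 0.0


gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]
print(tagging_accuracy(gold, pred))  # 4 of 5 tokens match
```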

How to Use This Project

1. Install Dependencies

pip install conllu
pip install nltk

2. Run the Script

python3 pos_tagging.py

Technologies Used

  • Python: Primary programming language for implementation.
  • CoNLL-U: For parsing and preparing corpora.
  • NLTK: To calculate emission and transition probabilities.
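
The project uses NLTK to estimate the emission and transition probabilities. To show what those two tables contain without depending on any library, the sketch below counts tag-to-tag and tag-to-word events directly and applies add-one smoothing. The smoothing method, the `<s>` start symbol, and the function name `estimate` are assumptions made for illustration, and may differ from what the project does with NLTK.

```python
from collections import Counter, defaultdict


def estimate(tagged_sentences, start="<s>"):
    """Estimate transition and emission tables from (word, tag) sentences."""
    trans_counts = defaultdict(Counter)  # previous tag -> next tag counts
    emit_counts = defaultdict(Counter)   # tag -> word counts
    for sentence in tagged_sentences:
        prev = start
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    tags = sorted(emit_counts)
    vocab = sorted({w for counts in emit_counts.values() for w in counts})

    def smooth(counts, outcomes):
        # Add-one smoothing: every outcome gets one extra pseudo-count.
        total = sum(counts.values()) + len(outcomes)
        return {o: (counts[o] + 1) / total for o in outcomes}

    trans = {p: smooth(trans_counts[p], tags) for p in [start] + tags}
    emit = {t: smooth(emit_counts[t], vocab) for t in tags}
    return trans, emit


corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
trans, emit = estimate(corpus)
print(trans["<s>"])
print(emit["DET"])
```

With real data, the `(word, tag)` pairs would come from the word form and universal POS tag of each token in a CoNLL-U file. This sketch only covers words seen in training, so a full tagger would also need a strategy for unknown words.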

Acknowledgements

Thanks to the Universal Dependencies Treebank for providing the high-quality multilingual data that made this comparison of POS tagging algorithms possible.

Want to find out more?

For more insights, read the associated blog post:
