We're in the process of releasing BERT models as well. Get the first one here: https://github.com/mollerhoj/danish_bert
Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.
This repository contains the weights for the embedding layer of a ULMFiT language model, which can be used as the first step in fine-tuning for any natural language processing task.
The weights were trained on 90% of all text in the corresponding language's Wikipedia as of 3 July 2018. The remaining 10% was used for validation.
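As a rough sketch (not part of this repository) of how the weights might be used for fine-tuning, the downloaded files could be plugged into fastai v1's `language_model_learner`. The corpus path, CSV layout, `drop_mult` value, and the assumption that enc.pth and itos.pkl (described below) match the layout fastai's `load_pretrained` expects are all illustrative.

```python
# A minimal sketch, assuming fastai v1 and that the downloaded enc.pth / itos.pkl
# follow the layout fastai expects. Paths and hyperparameters are placeholders.
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Build a language-model DataBunch from your own target-domain texts
# ('data/texts.csv' with a 'text' column is an illustrative assumption).
data_lm = TextLMDataBunch.from_csv('data', 'texts.csv', text_cols='text')

# fastai resolves pretrained_fnames as '<name>.pth' / '<name>.pkl' under
# data_lm.path/'models', so copy the downloaded files there first.
# pretrained=False avoids downloading fastai's default English weights.
learn = language_model_learner(data_lm, AWD_LSTM, pretrained=False,
                               pretrained_fnames=('enc', 'itos'),
                               drop_mult=0.3)

learn.fit_one_cycle(1, 1e-2)          # continue training the LM on target-domain text
learn.save_encoder('fine_tuned_enc')  # reuse the encoder for a downstream task
```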
- Danish
Trained on 78,373,122 tokens and validated on 7,837,310 tokens. We achieve a perplexity of 30.9. Download files: Link
- Norwegian
Trained on 80,284,231 tokens and validated on 8,920,387 tokens. We achieve a perplexity of 26.31. Download files: Link
- Finnish
Trained on 68,775,370 tokens and validated on 7,641,571 tokens. We achieve a perplexity of 27.66. Download files: Link
Training even higher-performance models is possible, but requires more (costly) training time. If you need a model with higher performance, feel free to contact us.
Our servers crashed while training the Swedish model, but if you need it, contact us and we can train it for you.
See *Universal Language Model Fine-tuning for Text Classification*, Jeremy Howard and Sebastian Ruder, https://arxiv.org/abs/1801.06146
- enc.h5 contains the weights in HDF5 ('Hierarchical Data Format')
- enc.pth contains the weights in PyTorch model format
- itos.pkl (integers to strings) contains the vocabulary mapping from ids (0 - 30000) to strings
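The sketch below (not part of this repository) shows one way to inspect the downloaded files with plain Python and PyTorch; the file paths, and the assumption that itos.pkl is a pickled list indexed by token id, are illustrative.

```python
# A small inspection sketch. Adjust the paths to where you unpacked the download.
import pickle
import torch

# itos.pkl: assumed to be a pickled list where position i holds the token string for id i.
with open('itos.pkl', 'rb') as f:
    itos = pickle.load(f)
print(len(itos), itos[:10])                # vocabulary size and the first few tokens
stoi = {s: i for i, s in enumerate(itos)}  # reverse mapping: string -> id

# enc.pth: assumed to be a PyTorch state dict (parameter name -> tensor).
weights = torch.load('enc.pth', map_location='cpu')
for name, tensor in weights.items():
    print(name, tuple(tensor.shape))       # list the stored tensors and their shapes
```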
This work was sponsored by the Danish chatbot company BotXO: http://www.botxo.co/
Thanks to Tobias Lindberg from Damvad Analytics for converting the weights to .pth format.