unsupervised-learning-case-study

Getting started

Install all requirements from requirements.txt first, and make sure the listed versions are compatible with your Python version:

pip install -r requirements.txt

Many of the libraries used are tied directly to CUDA and are therefore hard to run efficiently in a virtual environment (venv), so I did not set this up as a Poetry project.
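To check which listed packages are still missing after installation, a helper like the one below can be run from the repository root. It is a minimal sketch and not part of the repository: it only checks whether each listed distribution is installed, not whether its version suits your Python or CUDA setup. The function names are hypothetical.

```python
"""Report which packages listed in requirements.txt are not installed.

Minimal sketch, not part of the repository. It checks presence only,
not version compatibility with your Python or CUDA setup.
"""
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(req_file: Path) -> list[str]:
    """Extract distribution names from a requirements file."""
    names = []
    for raw in req_file.read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        # The name ends at the first version specifier, extra, marker, space or URL.
        names.append(re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0])
    return names


def missing_requirements(req_file: Path) -> list[str]:
    """Return the listed names that have no installed distribution."""
    missing = []
    for name in requirement_names(req_file):
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path(sys.argv[1] if len(sys.argv) > 1 else "requirements.txt")
    print(f"Python {sys.version.split()[0]}")
    if req.is_file():
        for name in missing_requirements(req):
            print(f"missing: {name}")
    else:
        print(f"{req} not found; run this from the repository root")
```

An empty list of "missing" lines means every listed distribution is present; it does not guarantee that the CUDA-dependent packages will actually find a GPU.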

Data

Download the dataset of published arXiv papers and place it, as a JSON file, in the unsupervised_learning_case_study/data folder.
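To sanity-check the downloaded file before running anything else, the sketch below lists the JSON files in that folder and streams their records. It assumes the file is in JSON Lines format (one JSON object per line), as the arXiv metadata snapshot commonly distributed on Kaggle is; if your file is a single JSON array, this reader will not work. The helper names are hypothetical and not part of the repository.

```python
"""List and stream the arXiv JSON data from the data folder.

Minimal sketch, assuming JSON Lines format (one JSON object per line).
The folder path comes from the README; the helper names are hypothetical.
"""
import json
from pathlib import Path
from typing import Iterator

DATA_DIR = Path("unsupervised_learning_case_study") / "data"


def json_files(data_dir: Path = DATA_DIR) -> list[Path]:
    """List the .json files in the data folder, sorted for stable output."""
    return sorted(data_dir.glob("*.json"))


def iter_papers(path: Path) -> Iterator[dict]:
    """Yield one record per non-empty line, streaming instead of loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Streaming line by line matters here because arXiv metadata dumps can be several gigabytes, which is too large to parse comfortably with a single `json.load`.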

Pre-filtering the data

Run the exploration notebook once to produce the pre-filtered dataset used by the following steps. The notebook also generates some general plots of the data distribution.
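For readers who want to see what a pre-filtering step over arXiv records can look like, here is a hypothetical illustration. It is NOT the filter implemented in the exploration notebook. It assumes each record has a space-separated "categories" string and an "abstract" field, as in the arXiv metadata snapshot; the function name, the category set and the word threshold are all illustrative choices.

```python
"""Illustrative pre-filter over arXiv metadata records.

Hypothetical: this is not the notebook's filter. It assumes a
space-separated "categories" string and an "abstract" field per record.
"""
from typing import Iterable, Iterator


def prefilter(
    records: Iterable[dict],
    wanted_categories: set[str],
    min_abstract_words: int = 20,
) -> Iterator[dict]:
    """Keep records tagged with a wanted category and a non-trivial abstract."""
    for rec in records:
        categories = set(rec.get("categories", "").split())
        abstract = rec.get("abstract") or ""
        # Both conditions must hold: category overlap and enough abstract text.
        if categories & wanted_categories and len(abstract.split()) >= min_abstract_words:
            yield rec
```

Writing the filter as a generator keeps memory use flat, so it can be chained directly onto a line-by-line reader of a large JSON Lines file.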

Training the model

First, train the model by running the module unsupervised_learning_case_study.pipeline.train_model:

python -m unsupervised_learning_case_study.pipeline.train_model

Categorizing the papers

Next, categorize the papers by running the module unsupervised_learning_case_study.pipeline.categorize_paper:

python -m unsupervised_learning_case_study.pipeline.categorize_paper

Create visualizations

Finally, create the visualizations with the module unsupervised_learning_case_study.pipeline.viz:

python -m unsupervised_learning_case_study.pipeline.viz
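The three module commands above can also be chained from Python. The sketch below is not part of the repository: it only shells out to the documented `python -m` commands, in order, using the current interpreter, and stops at the first step that fails. The `skip_training` flag and the function names are hypothetical conveniences for reusing a previously trained model.

```python
"""Run the README's pipeline steps in order.

Minimal sketch: it shells out to the documented module commands and
stops at the first failure. skip_training is a hypothetical convenience.
"""
import subprocess
import sys

PACKAGE = "unsupervised_learning_case_study.pipeline"
STEPS = ["train_model", "categorize_paper", "viz"]


def pipeline_commands(skip_training: bool = False) -> list[list[str]]:
    """Build one `python -m <module>` command per step, in execution order."""
    steps = STEPS[1:] if skip_training else STEPS
    return [[sys.executable, "-m", f"{PACKAGE}.{step}"] for step in steps]


def run_pipeline(skip_training: bool = False) -> None:
    """Run each step; check=True raises CalledProcessError if a step fails."""
    for cmd in pipeline_commands(skip_training):
        print("running:", " ".join(cmd[1:]))
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from the repository root for a full run, or `run_pipeline(skip_training=True)` to rerun only categorization and visualization. Using `sys.executable` ensures the steps run under the same Python environment in which the requirements were installed.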

Notes on locally stored data

Each pipeline step stores its outputs locally (corpus, embeddings, model, visualizations, ...). This lets you, for example, skip retraining when you want to classify a new or updated set of papers: run only the categorization and visualization steps, and you avoid waiting for the lengthy training process.
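The general idea behind reusing locally stored outputs is a load-or-compute cache: if an artifact already exists on disk it is loaded, otherwise the expensive step runs and its result is saved. The sketch below illustrates that pattern only. The pickle format, the file path and the `load_or_compute` name are assumptions, not the formats or paths the pipeline actually writes.

```python
"""Reuse a locally stored artifact instead of recomputing it.

Generic sketch of a load-or-compute cache. The pickle format and paths
are assumptions, not what the pipeline actually writes. Only unpickle
files you created yourself: pickle can execute code on load.
"""
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_compute(path: Path, compute: Callable[[], T]) -> T:
    """Load the pickled result at path if present; otherwise compute and store it."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)  # create the cache folder on first use
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

The trade-off is staleness: a cached model is reused even after the training data changes, so delete the stored file whenever you do want to retrain.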

Repository: steffdlh/unsupervised-learning-case-study