dbs

to_csv.py
This file compiles the text documents into a single CSV file with the columns (File Name, Contents) for easier manipulation.
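A minimal sketch of the step described above, using only the standard library. It assumes the documents are `.txt` files in a single directory; the function name `compile_to_csv` and that file layout are hypothetical, not taken from the repository.

```python
import csv
from pathlib import Path


def compile_to_csv(input_dir, output_path):
    """Write one row per .txt file in input_dir as (File Name, Contents); return the row count."""
    rows = []
    # Sorted so the row order is deterministic
    for path in sorted(Path(input_dir).glob("*.txt")):
        rows.append((path.name, path.read_text(encoding="utf-8")))
    # newline="" lets the csv module handle line endings and quoting itself
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["File Name", "Contents"])
        writer.writerows(rows)
    return len(rows)
```

For example, `compile_to_csv("docs", "documents.csv")` (hypothetical paths) writes a header row followed by one row per document. The csv module quotes any contents that contain commas or newlines, so multi-line documents survive a round trip through `csv.reader`.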

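prior_to_model.py
Preprocesses the documents before modelling with K-Nearest Neighbours (2). The preprocessing steps are:
- punctuation removal
- lowercasing
- stopword removal
- common-word removal
- lemmatization
- bag-of-words (BOW) representation

K-Nearest Neighbours was chosen as the classifier. It is a supervised learning technique: each document is assigned the majority label (employment or amendment) of its nearest training documents.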
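The preprocessing steps and the classifier can be sketched in plain Python as follows. This is not the repository's code and rests on several assumptions: "commonwords" is read as the n most frequent words in the training corpus, a crude suffix-stripping function stands in for a real lemmatizer (such as NLTK's WordNetLemmatizer), bag-of-words vectors are compared with cosine similarity, and "(2)" is read as k = 2. The stopword list and all function names are hypothetical.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately small stopword list; a real pipeline would use a full list
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "by", "this", "that", "it", "be"}

# Maps every punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Remove punctuation, lowercase, split on whitespace, and drop stopwords."""
    words = text.translate(PUNCT_TO_SPACE).lower().split()
    return [w for w in words if w not in STOPWORDS]


def find_common_words(token_lists, n):
    """Return the n most frequent words across the training corpus (assumed meaning of "commonwords")."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return {w for w, _ in counts.most_common(n)}


def lemmatize(token):
    """Crude suffix-stripping stand-in for a real lemmatizer (assumption)."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def bag_of_words(tokens, common):
    """Drop corpus-common words, lemmatize the rest, and count them."""
    return Counter(lemmatize(t) for t in tokens if t not in common)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train_bows, train_labels, query_bow, k=2):
    """Label the query by majority vote of its k most similar training documents."""
    ranked = sorted(zip(train_bows, train_labels),
                    key=lambda pair: cosine(query_bow, pair[0]), reverse=True)
    neighbours = ranked[:k]
    votes = Counter(label for _, label in neighbours)
    # Vote ties are broken by the summed similarity of each label's neighbours
    scores = Counter()
    for bow, label in neighbours:
        scores[label] += cosine(query_bow, bow)
    return max(votes, key=lambda label: (votes[label], scores[label]))
```

Because the vote depends on the training labels, any new document must go through the same preprocessing, including the common-word set learned from the training corpus, before it is compared with the training documents.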

extract.py
TF-IDF measures the relative importance of each word in each document, so it is used here to extract the informative parts of the documents. Extraction runs after the same preprocessing steps as the classifier.
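One way to turn TF-IDF weights into extracted text is sketched below, assuming "informative parts" means the sentences whose words carry the highest mean TF-IDF weight. It uses tf(t, d) = (count of t in d) / (length of d) and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t; the repository may use another variant (for example scikit-learn's smoothed `TfidfVectorizer`). The stopword list, the naive sentence splitter, and all function names are hypothetical.

```python
import math
import re
import string
from collections import Counter

# Hypothetical small stopword list, mirroring the classifier-side preprocessing
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "by", "this", "that", "it", "be"}
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Remove punctuation, lowercase, and drop stopwords."""
    return [w for w in text.translate(PUNCT_TO_SPACE).lower().split() if w not in STOPWORDS]


def tfidf(docs_tokens):
    """Return one {term: weight} dict per document, weight = (count / doc length) * log(N / df)."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def sentence_score(sentence, weights):
    """Mean TF-IDF weight of the sentence's tokens (0.0 if none survive preprocessing)."""
    tokens = tokenize(sentence)
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens) if tokens else 0.0


def top_sentences(docs, n=1):
    """For each document, return its n highest-scoring sentences."""
    all_weights = tfidf([tokenize(d) for d in docs])
    result = []
    for doc, weights in zip(docs, all_weights):
        # Naive sentence split on ., ! or ? followed by whitespace
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
        ranked = sorted(sentences, key=lambda s: sentence_score(s, weights), reverse=True)
        result.append(ranked[:n])
    return result
```

A word that appears in every document gets idf = log(1) = 0, so a sentence made only of such shared words scores 0 and is ranked last, while sentences built from document-specific words rise to the top.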

P.S: I haven't completed the assignment to the full extent
