dbs

to_csv.py This file compiles the text documents into a single CSV file of format (File Name, Contents) for easier manipulation.
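
A minimal sketch of the compilation step described above, not the actual contents of to_csv.py. The function name `compile_to_csv`, the `*.txt` pattern, and the UTF-8 encoding are assumptions made for illustration.

```python
import csv
from pathlib import Path


def compile_to_csv(doc_dir, out_path, pattern="*.txt"):
    """Write one (File Name, Contents) row per text document; return the row count.

    The function name, glob pattern, and encoding are illustrative assumptions.
    """
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["File Name", "Contents"])
        # Sort the paths so the row order is stable across runs.
        for path in sorted(Path(doc_dir).glob(pattern)):
            contents = path.read_text(encoding="utf-8", errors="replace")
            writer.writerow([path.name, contents])
            rows += 1
    return rows
```

The `csv` module quotes commas and line breaks inside a document, so each file stays in a single row.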

prior_to_model.py Preprocessing steps applied before modelling: punctuation removal, lowercasing, stop-word removal, common-word removal, lemmatization, and bag of words (BOW). Modelling uses K-Nearest Neighbour (2). K-Nearest Neighbour is a supervised learning technique, with the output defined by the labels employment and amendment.
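
The pipeline above can be sketched with the standard library only. This is an illustration under stated assumptions, not the code in prior_to_model.py: the stop-word list is a tiny placeholder, lemmatization and common-word removal are left out because they need an external resource, cosine similarity is an assumed distance measure, and all function names are hypothetical.

```python
import math
import string
from collections import Counter

# Tiny placeholder stop-word list (assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "this", "for"}


def preprocess(text):
    """Strip punctuation, lowercase, and drop stop words (no lemmatization here)."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def bag_of_words(tokens):
    """Map each token to its count in the document."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=2):
    """Majority vote among the k labelled documents most similar to `text`.

    `train` is a list of (text, label) pairs, e.g. labels "employment"
    and "amendment".
    """
    query = bag_of_words(preprocess(text))
    scored = [(cosine(query, bag_of_words(preprocess(doc))), label)
              for doc, label in train]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # With an even k a tie is possible; it goes to the label seen first
    # among the nearest neighbours, i.e. the most similar document.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the vote needs labelled training documents, the sketch makes the supervised nature of the method explicit. An odd `k` avoids ties between two classes.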

extract.py
TF-IDF scores the relative importance of each word in each document, so it is used to extract the informative parts of the documents. Extraction runs after the same preprocessing steps as the classifier.
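
A minimal sketch of TF-IDF scoring, assuming the plain textbook definition (term frequency times the log of inverse document frequency). It ranks individual words, which may differ from how extract.py selects informative parts. Libraries such as scikit-learn use smoothed variants of this formula, and the function names here are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: score} dict per document.

    `docs` is a list of token lists (already preprocessed).
    score = (count / document length) * log(N / document frequency)
    """
    n_docs = len(docs)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_terms(doc_scores, n=3):
    """Return the n highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]
```

A word that appears in every document gets a score of zero under this definition, which is why shared vocabulary drops out of the extracted terms.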

P.S: I haven't completed the assignment to the full extent