to_csv.py
Compiles the text documents into a single CSV file with the format (File Name, Contents) for easier manipulation.
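A minimal sketch of what this step could look like, using only the standard library. The function name compile_to_csv, the .txt extension, and the UTF-8 encoding are assumptions for illustration; the actual script may differ.

```python
import csv
from pathlib import Path


def compile_to_csv(doc_dir, out_path):
    """Write one (File Name, Contents) row per .txt file found in doc_dir.

    Hypothetical helper: assumes the documents are UTF-8 .txt files.
    Returns the number of documents written.
    """
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["File Name", "Contents"])
        # Sort so the row order is the same on every run.
        for path in sorted(Path(doc_dir).glob("*.txt")):
            contents = path.read_text(encoding="utf-8", errors="replace")
            writer.writerow([path.name, contents])
            rows += 1
    return rows
```

The csv module quotes commas and line breaks inside the contents, so each document stays in a single CSV record.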
prior_to_model.py
Preprocessing steps applied before modelling: punctuation removal, lowercasing, stopword removal, common-word removal, lemmatization, and bag-of-words (BOW) encoding.
Modelling uses K-Nearest Neighbour (2). K-Nearest Neighbour is a supervised technique: it assigns a label based on the closest labelled examples. Here the output is defined by the employment and amendment classes.
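The pipeline above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the script itself: the stopword list is a tiny made-up one, lemmatization and common-word removal are left out because they need an external library or corpus statistics, and the names preprocess, cosine, and knn_predict, the cosine similarity measure, and the default k are all hypothetical choices.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "this", "for"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords, return a bag of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in cleaned.split() if t not in STOPWORDS]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Label text by majority vote among the k most similar training docs.

    train is a list of (text, label) pairs, i.e. labelled examples.
    """
    query = preprocess(text)
    ranked = sorted(
        train, key=lambda pair: cosine(preprocess(pair[0]), query), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Note that knn_predict needs labelled training documents (for example, some marked employment and some marked amendment), which is what makes the approach supervised.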
extract.py
TF-IDF measures the relative importance of each word in each document, so it is used to extract the informative parts of the documents. It is applied after the same preprocessing steps used for the classifier.
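As a rough sketch of the idea, the snippet below scores terms with one common TF-IDF variant (term frequency times log(N / document frequency), without smoothing) and picks the highest-scoring terms per document. The function names tfidf and top_terms and the exact weighting formula are assumptions; the script may use a library implementation with different smoothing.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of already tokenized documents.

    docs is a list of token lists. Returns one {term: weight} dict per
    document, using tf = count / doc length and idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores


def top_terms(weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A term that appears in every document gets a weight of zero under this variant, which is why the highest-weighted terms tend to be the ones that set a document apart from the rest.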
P.S. I haven't completed the assignment in full.