Malware_detection

In the experiments, I used static analysis to extract features from PE files. For model selection, I chose several commonly used models: GaussianNB, MLP, Linear Regression, Decision Tree, and Gradient Boosting. I designed three data preprocessing methods for training the models. The first retains the extracted features as they are. The second applies feature selection to the data set and filters out some features before training. The third uses an AutoEncoder to encode the original features into their latent representations. Finally, I drew conclusions by comparing the performance of the different models on the differently preprocessed data. Decision Tree achieved the highest accuracy, at 99.79%.
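
The README does not list which features were extracted, so the following is only a minimal sketch of what static feature extraction from a PE file can look like. It reads a few COFF header fields and computes byte entropy from the raw bytes, without executing the file. The function names (`byte_entropy`, `extract_pe_features`, `build_minimal_pe_header`) and the choice of fields are assumptions made for illustration, not the feature set used in the experiments.

```python
import math
import struct
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_pe_features(data: bytes) -> dict:
    """Read a few COFF header fields from raw PE bytes (hypothetical feature set)."""
    # A PE file starts with a DOS header whose first two bytes are 'MZ'.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header ('MZ' magic not found)")

    # The 4-byte field at offset 0x3C (e_lfanew) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    if len(data) < pe_offset + 24:
        raise ValueError("truncated COFF header")

    # The 20-byte COFF file header follows the signature directly.
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )

    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "is_dll": bool(characteristics & 0x2000),  # IMAGE_FILE_DLL flag
        "entropy": byte_entropy(data),
    }


def build_minimal_pe_header() -> bytes:
    """Synthetic header bytes for demonstration only, not a loadable executable."""
    dos_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
    coff_header = struct.pack("<HHIIIHH", 0x8664, 3, 0, 0, 0, 0xF0, 0x0022)
    return dos_header + b"PE\x00\x00" + coff_header


if __name__ == "__main__":
    features = extract_pe_features(build_minimal_pe_header())
    for name, value in features.items():
        print(f"{name}: {value}")
```

Each file yields one dictionary of numeric values, which can then be stacked into a feature matrix for any of the three preprocessing methods described above. In practice a dedicated parser would likely be more robust against malformed headers than hand-written offsets.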