Article Classification
The Article Classification method is split into two steps:
- Classifying articles as relevant or not
- Classifying relevant articles into Conflict & Violence or Disaster
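The two steps form a cascade: only articles judged relevant reach the second classifier. A minimal sketch, assuming both classifiers are available as plain callables. `classify_article`, `toy_relevance` and `toy_category` are hypothetical names used for illustration, not functions from this repository.

```python
from typing import Callable, Optional


def classify_article(
    text: str,
    is_relevant: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Two-step classification (hypothetical helper).

    Step 1 decides whether the article is relevant at all.
    Step 2 runs only for relevant articles and assigns a category
    such as "Conflict & Violence" or "Disaster".
    Returns None for articles judged not relevant.
    """
    if not is_relevant(text):
        return None
    return categorize(text)


# Toy stand-in classifiers, for illustration only.
def toy_relevance(text: str) -> bool:
    return "displaced" in text.lower()


def toy_category(text: str) -> str:
    return "Disaster" if "flood" in text.lower() else "Conflict & Violence"


print(classify_article("Floods displaced 500 people.", toy_relevance, toy_category))
# prints: Disaster
```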
Article relevance classification combines the results of manually crafted keyword rules with those of a machine learning classifier.
The keyword approach was built by reading through the texts and identifying tokens that are likely to identify displacement events uniquely. Each text is first tokenized and stemmed, and each token is then compared against the list of candidate keywords.
The keyword approach gives the following results:
- Precision: 0.83
- Recall: 0.81
- F1 Score: 0.81
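The keyword matching step described above can be sketched as follows. The keyword list is hypothetical (the project's actual keywords are not given on this page), and `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as Porter's; the key point is that keywords and text tokens are stemmed with the same function before comparison.

```python
import re

# Hypothetical keyword list for illustration; not the project's real list.
KEYWORDS = ["displaced", "evacuated", "fled", "homeless", "refugees"]

# Crude suffix stripper standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ment", "ing", "ion", "ed", "es", "s", "e")


def crude_stem(token: str) -> str:
    """Repeatedly strip known suffixes, keeping a stem of at least 3 letters."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                changed = True
                break
    return token


# Stem the keywords once, with the same function applied to the texts.
KEYWORD_STEMS = {crude_stem(k) for k in KEYWORDS}


def matched_keywords(text: str) -> set:
    """Tokenize and stem the text, then return the keyword stems it contains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens} & KEYWORD_STEMS


def keyword_relevant(text: str) -> bool:
    """An article is flagged relevant if any keyword stem appears in it."""
    return bool(matched_keywords(text))


print(keyword_relevant("Thousands of families were displaced after the floods."))
# prints: True
```

Stemming lets a single keyword cover inflected forms: "displacement" and "displaced" reduce to the same stem here.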
The general machine learning approach is to convert documents to a TF-IDF representation and then to model topics (or simply reduce dimensionality) with Latent Semantic Indexing (LSI). The resulting vectors are used as features to train a Random Forest classifier with 1,000 estimators.
On its own, this classifier gives the following results:
- Precision: 0.79
- Recall: 0.79
- F1 Score: 0.79
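The TF-IDF, LSI and Random Forest pipeline can be sketched with scikit-learn, assuming that library (the page does not say which one the project used). Here `TruncatedSVD` applied to a TF-IDF matrix plays the role of LSI. Only the 1,000 estimators come from the text; the number of LSI topics, the stop-word setting and the toy training data are illustrative assumptions, and `build_relevance_model` is a hypothetical name.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def build_relevance_model(n_topics: int = 100, seed: int = 0) -> Pipeline:
    """TF-IDF -> LSI (truncated SVD) -> Random Forest with 1,000 trees.

    n_topics is an assumed value; the page does not state it.
    """
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        # Truncated SVD on a TF-IDF matrix is latent semantic indexing/analysis.
        ("lsi", TruncatedSVD(n_components=n_topics, random_state=seed)),
        ("forest", RandomForestClassifier(n_estimators=1000, random_state=seed)),
    ])


# Toy data (1 = relevant, 0 = not relevant), for illustration only.
texts = [
    "Floods displaced thousands of residents from their homes",
    "Fighting forced families to flee the village",
    "Thousands evacuated as the cyclone approached",
    "The central bank raised interest rates",
    "The team won the championship final",
    "New smartphone model released this week",
]
labels = [1, 1, 1, 0, 0, 0]

model = build_relevance_model(n_topics=2)  # tiny corpus, so very few topics
model.fit(texts, labels)
preds = model.predict(["Storm displaced residents", "Bank raised rates"])
```

With a real corpus, `n_topics` would be tuned on held-out data; it must not exceed the vocabulary size.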
The keyword and machine learning results were combined using the following rule:
Where the two approaches disagree (i.e. one says not relevant and the other says relevant), choose relevant.
The combined approach gives the following results:
- Precision: 0.84
- Recall: 0.80
- F1 Score: 0.80
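The combination rule amounts to a logical OR of the two binary predictions: an article is relevant if either approach says so. A minimal sketch; `combine_relevance` is a hypothetical name, and the inputs are assumed to be aligned per-article booleans from the keyword and machine learning classifiers.

```python
from typing import List, Sequence


def combine_relevance(keyword_says: Sequence[bool], model_says: Sequence[bool]) -> List[bool]:
    """Mark an article relevant if either approach marks it relevant.

    When both approaches agree, the shared answer is kept; when they
    disagree, the article is labelled relevant (a logical OR).
    """
    if len(keyword_says) != len(model_says):
        raise ValueError("both approaches must score the same articles")
    return [kw or ml for kw, ml in zip(keyword_says, model_says)]


print(combine_relevance([True, True, False, False], [True, False, True, False]))
# prints: [True, True, True, False]
```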
The method used for article categorization (among relevant articles) is a pure machine learning approach, as no additional improvement was obtained by incorporating keyword analysis.
This approach uses the same pipeline as the relevance classifier: documents are converted to a TF-IDF representation, reduced with LSI, and classified by a Random Forest (1,000 estimators).
The categorization model gives the following results:
- Precision: 0.98
- Recall: 0.98
- F1 Score: 0.98
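The categorization model differs from the relevance model mainly in its training data: it sees only relevant articles, and its targets are category names rather than a relevant/not-relevant flag. A sketch under the same assumptions as a scikit-learn version of the relevance pipeline (truncated SVD standing in for LSI, an illustrative topic count, toy data); none of the names below come from the project.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy labelled data: (text, is_relevant, category); category is None if not relevant.
articles = [
    ("Floods displaced thousands of residents", True, "Disaster"),
    ("Earthquake left families homeless", True, "Disaster"),
    ("Cyclone forced mass evacuations on the coast", True, "Disaster"),
    ("Armed clashes forced villagers to flee", True, "Conflict & Violence"),
    ("Militia attacks displaced hundreds of families", True, "Conflict & Violence"),
    ("Fighting drove residents from the town", True, "Conflict & Violence"),
    ("The central bank raised interest rates", False, None),
]

# The second step is trained only on relevant articles.
relevant = [(text, cat) for text, rel, cat in articles if rel]
texts = [text for text, _ in relevant]
categories = [cat for _, cat in relevant]

categorizer = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),  # LSI step; topic count is illustrative
    RandomForestClassifier(n_estimators=1000, random_state=0),
)
categorizer.fit(texts, categories)
predicted = categorizer.predict(["Landslide buried homes after heavy rain"])
```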