Returns a quality indicator (in %) for any text, using dictionaries.
TQI is a Node.js module that takes any text and returns a number reflecting its spelling quality.
TQI compares your text against word lists drawn from large affix dictionaries in several languages.
TQI supports every language available in the list of Hunspell dictionaries.
You can use any language found in node_modules/dictionnaries, or a personal dictionary.
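To make the idea of "comparing a text to a list of words" concrete, here is a deliberately simplified sketch. It uses a plain Set in place of a Hunspell dictionary, so it ignores affix rules (plurals, conjugations and so on). The names classifyWords and tinyDictionary are hypothetical and are not part of TQI.

```javascript
// Simplified illustration only: a plain Set stands in for a Hunspell dictionary.
// `classifyWords` and `tinyDictionary` are hypothetical names, not part of TQI.
function classifyWords(text, dictionary) {
  // Collect runs of letters (Unicode-aware) and apostrophes as words.
  const words = text.toLowerCase().match(/[\p{L}']+/gu) || [];
  const result = { correct: [], misspelled: [] };
  for (const word of words) {
    // A word counts as correct only if it appears verbatim in the set.
    (dictionary.has(word) ? result.correct : result.misspelled).push(word);
  }
  return result;
}

const tinyDictionary = new Set(['some', 'english', 'words']);
console.log(classifyWords('Some English wrods', tinyDictionary));
```

A real affix dictionary would also accept inflected forms that are not listed verbatim, which is why TQI delegates the lookup to Hunspell.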
Hunspell (version >= 1.3)
sudo apt-get install hunspell
npm install --save text-quality-indicator
// Load NPM Module
const Tqi = require('text-quality-indicator'),
tqi = new Tqi();
// Lists of correct/misspelled words are disabled by default; set wordsResult to true to enable them.
// You can also set a custom timeout for Hunspell calls (defaults to 5 seconds).
const options = { wordsResult: true, timeout: 5 };
// Analyze a file
tqi.analyze('file.txt', options).then((result) => {
  console.log("result : ", result);
});
// The promise resolves with:
{ correct: 3,
misspelled: 0,
rate: 100,
words: { correct: [ 'somme', 'english', 'words' ], mispelled: [] }
}
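The rate field appears to be the share of correct words among all checked words, expressed as a percentage. That reading is an assumption inferred from the sample outputs in this document (3 correct and 0 misspelled gives 100; 20 correct and 1 misspelled gives roughly 95.24), not something taken from TQI's source. The helper computeRate below is hypothetical.

```javascript
// Assumption: rate = correct / (correct + misspelled) * 100.
// `computeRate` is a hypothetical helper, not a function exported by TQI.
function computeRate(correct, misspelled) {
  const total = correct + misspelled;
  // Avoid dividing by zero for an empty text; returning 0 here is an arbitrary choice.
  if (total === 0) return 0;
  return (correct / total) * 100;
}

console.log(computeRate(3, 0));              // 100
console.log(computeRate(20, 1).toFixed(2));  // "95.24"
```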
When you initialise TQI, you can pass a single language code, an array of language codes, a path to a personal dictionary, or a mix of these:
const Tqi = require('text-quality-indicator'),
tqi = new Tqi("en"),
tqiEnFr = new Tqi(["en", "fr"]),
tqiEnFrAndMyDictionnary = new Tqi(["en", "fr", "/path/to/my/dictionnary"]);
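Because language codes and dictionary paths can share one array, something has to tell them apart. The sketch below shows one plausible way to do that; it is not TQI's actual logic. The helper splitDictionaries and the rule "an entry containing a path separator is a file path" are both assumptions made for illustration.

```javascript
const path = require('path');

// Hypothetical helper: not TQI's actual logic, just one plausible way to
// tell language codes apart from dictionary paths.
function splitDictionaries(input = 'en') {
  // Accept either a single string or an array of strings.
  const entries = Array.isArray(input) ? input : [input];
  const languages = [];
  const customPaths = [];
  for (const entry of entries) {
    // Assumption: anything containing a path separator is a file path.
    if (entry.includes('/') || entry.includes(path.sep)) {
      customPaths.push(entry);
    } else {
      languages.push(entry);
    }
  }
  return { languages, customPaths };
}

console.log(splitDictionaries(['en', 'fr', '/path/to/my/dictionnary']));
```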
npm install -g text-quality-indicator
tqi --help
-
On a sample French text file containing one misspelled word:
cat ./test/data/fr-sample.txt -> En se réveillant un matin après des rêves agités, Gregor Samsa se retrouva, dans son lit, métamorphosé en un monstrueux insecte.
Launch TQI with the fr language option:
tqi -d fr ./test/data/fr-sample.txt
This returns:
fr-sample.txt => { correct: 20, mispelled: 1, rate: 95.23809523809523 }
-
On a folder of English text files:
tqi /path/to/folder
English is the default language.
You can ask the CLI to also return the lists of correct/misspelled words:
./bin/cli.js -w ./pathToTxt.txt