Skip to content

Commit

Permalink
Fix flaky test that calculated wrongly the ratio between keywords len…
Browse files Browse the repository at this point in the history
…ght. (#72)

This was caused when calculating keywords using stopwords. As there are
more words that are filtering, the list of concepts gets smaller and the error
gets bigger.

Because of this, one of every five runs failed because the ratio between runs
was slightly different than two.

This was fixed expecting that the ratio of the runs is double with 5% tolerance
instead of up to one decimal place.
  • Loading branch information
fbarrios authored Jun 22, 2019
1 parent be9a101 commit d9252a2
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion test/test_keywords.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

from summa.keywords import keywords
from summa.preprocessing.textcleaner import deaccent
from numpy import isclose
from .utils import get_text_from_test_data


Expand Down Expand Up @@ -94,7 +95,10 @@ def test_keywords_ratio_wstopwords(self):
selected_docs_20 = keywords(text, ratio=0.2, split=True, additional_stopwords=additional_stoplist)
selected_docs_40 = keywords(text, ratio=0.4, split=True, additional_stopwords=additional_stoplist)

self.assertAlmostEqual(float(len(selected_docs_40)) / len(selected_docs_20), 0.4 / 0.2, places=1)
actual_ratio = float(len(selected_docs_40)) / len(selected_docs_20)
expected_ratio = 0.4 / 0.2
# Expect the same ratio with a relative tolerance of 5%.
self.assertTrue(isclose(actual_ratio, expected_ratio, rtol=0.5), "Ratio between number of keywords should be 2.")

def test_keywords_consecutive_keywords(self):
text = "Rabbit populations known to be plentiful, large, and diverse \
Expand Down

0 comments on commit d9252a2

Please sign in to comment.