Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Test holidays for multiple countries is failing for Pytest 3.12 #1599

Open
MaiBe-ctrl opened this issue Jun 25, 2024 · 2 comments
Open

Comments

@MaiBe-ctrl
Copy link
Collaborator

MaiBe-ctrl commented Jun 25, 2024

Prerequisites
Python 3.12

Describe the bug
We are getting the following error in our pipeline test: FAILED tests/test_event_utils.py::test_multiple_countries - AssertionError: assert 'Erster Weihnachtstag' not in {'Christi Himmelfahrt', 'Christi Himmelfahrt; Erster Mai', 'Christmas Day', 'Columbus Day', 'Erster Mai', 'Erster Weihnachtstag', ...}
This is mainly due to the fact that we are getting the holidays names for the US in English and the ones for Germany in German. As a result, we would have duplicated holidays having names in different languages.

Expected behavior
We would expect at the end to get a unique list of holidays(no matter what the language is). The assertion shouldn't fail.

Additional context

This only happens for the specific case of python 3.12. We tried solving this issue by creating a holiday name as a combination of both its occurrences and make sure to have at the end the holiday appearing only one time in the list.

We should find a cleaner way to do it without adding more complexity to the code.

Add any other context about the problem here.

Pytest to re-add to tests/test_event_utils.py

def test_multiple_countries():
    # test if multiple countries are added
    df = pd.read_csv(PEYTON_FILE, nrows=NROWS)
    m = NeuralProphet(
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        learning_rate=LR,
    )
    m.add_country_holidays(country_name=["US", "Germany"])
    m.fit(df, freq="D")
    m.predict(df)
    # get the name of holidays and compare that no holiday is repeated
    holiday_names = m.model.config_holidays.holiday_names
    assert "Independence Day" in holiday_names
    assert "Christmas Day" in holiday_names
    assert "Erster Weihnachtstag" not in holiday_names
    assert "Neujahr" not in holiday_names
@lookslikeitsnot
Copy link

Assuming the expected result from identical holiday dates across multiple countries is to only store the first occurrence (as the test case seems to imply) and not some combination of both
e.g.:

us_holidays = {
    datetime.date(2008, 1, 1): "New Year's Day",
    datetime.date(2008, 7, 4): "Independence Day",
    datetime.date(2008, 12, 25): "Christmas Day",
}
de_holidays = {
    datetime.date(2008, 1, 1): "Neujahr",
    datetime.date(2008, 3, 21): "Karfreitag",
    datetime.date(2008, 12, 25): "Erster Weihnachtstag",
}
# combining into
all_holidays = {
    datetime.date(2008, 1, 1): "New Year's Day",
    datetime.date(2008, 3, 21): "Karfreitag",
    datetime.date(2008, 7, 4): "Independence Day",
    datetime.date(2008, 12, 25): "Christmas Day",
}
# resulting in holiday_names in test_multiple_countries
holiday_names = {'New Year's Day', 'Karfreitag', 'Independence Day', 'Christmas Day'}

tweaking current implementation of event_utils.get_all_holidays like such

def get_all_holidays(years, country):
    """
    Make dataframe of country specific holidays for given years and countries
    Parameters
    ----------
        year_list : list
            List of years
        country : str, list, dict
            List of country names and optional subdivisions
    Returns
    -------
        pd.DataFrame
            Containing country specific holidays df with columns 'ds' and 'holiday'
    """
    # convert to list if not already
    if isinstance(country, str):
        country = {country: None}
    elif isinstance(country, list):
        country = dict(zip(country, [None] * len(country)))

    all_countries_specific_holidays = dict()
    # iterate over countries and combine holidays by date
    # since HolidayBase.update replaces existing holidays,
    # iterate in reverse order to keep original countries holiday order
    for single_country, subdivision in reversed(country.items()):
        # For compatibility with Turkey as "TU" cases.
        single_country = "TUR" if single_country == "TU" else single_country
        # get dict of dates and their holiday name
        single_country_specific_holidays = country_holidays(
            country=single_country, subdiv=subdivision, years=years, expand=True, observed=False, language="en"
        )
        all_countries_specific_holidays.update(single_country_specific_holidays)

    all_holidays = defaultdict(list)
    for date, name in all_countries_specific_holidays.items():
        all_holidays[name].append(pd.to_datetime(date))

    return all_holidays

would allow for the test to pass.

@lookslikeitsnot
Copy link

another option would be to use language 'en_US' instead of 'en' which isn't supported by holidays replacing

single_country_specific_holidays = country_holidays(
            country=single_country, subdiv=subdivision, years=years, expand=True, observed=False, language="en"
)

with

single_country_specific_holidays = country_holidays(
            country=single_country, subdiv=subdivision, years=years, expand=True, observed=False, language="en_US"
)

in event_utils.get_all_holidays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants