Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HistoricalSignal: Series names need to be equal #234

Closed
Impelon opened this issue Aug 23, 2024 · 0 comments · Fixed by #235
Closed

HistoricalSignal: Series names need to be equal #234

Impelon opened this issue Aug 23, 2024 · 0 comments · Fixed by #235

Comments

@Impelon
Copy link
Contributor

Impelon commented Aug 23, 2024

I have found another minor error message that is counter-intuitive.
This one happens if one creates a HistoricalSignal from two pd.Series with different names. _resample_to_frequency will then fail for anything except bfill.

Tested with vessim 0.8.0:

>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> start = datetime.now()
>>> actual = pd.Series(range(101), pd.date_range(start, start + timedelta(minutes=100), freq="1min"), name="A")
>>> forecast = pd.Series(range(100, 201), pd.date_range(start, start + timedelta(minutes=100), freq="1min"), name="F")
>>> signal = HistoricalSignal(actual, forecast)
>>> signal.forecast(start + timedelta(minutes=5), start + timedelta(minutes=15), resample_method="linear")  # works
{...}
>>> signal.forecast(start + timedelta(minutes=5), start + timedelta(minutes=15), frequency="5s", resample_method="linear")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 312, in forecast
    return self._resample_to_frequency(
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 349, in _resample_to_frequency
    data = np.insert(data, 0, self.now(start_time, column))
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 192, in now
    times, values = self._actual[_get_column_name(self._actual, column)]
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 380, in _get_column_name
    raise ValueError(f"Cannot retrieve data for column '{column}'.")
ValueError: Cannot retrieve data for column 'F'.

As far as I can tell from looking at the source, this would happen even after applying the fixes in #232.

The code tries to look up data in actual with the series name from forecast.
If one looks at the source code for __init__, this behaviour makes sense, as the name of the series is used instead of a column name, and when dealing with DataFrames with multiple columns it is clear that both DataFrames should contain the same column name.

When using Series, this seems a bit counter intuitive though, at least to me, because there is only one column, and little care is often given to the name of the Series object. (It is true that Series are not mentioned in the init docstring at all, but it is quite evident how to use HistoricalSignal with them.)

Even worse, this also happens if the series name is left blank (and the other is not), as it was the case when I first came across the behaviour. The associated column name will be the string "None", resulting in a ValueError: Cannot retrieve data for column 'None'. Which is extra confusing, seeing as _get_column_name specifically has a check for a column of None.

I'd suggest either:

  1. Mention in the doc for HistoricalSignal that series names are used as column names (and thus need to be the same).
  2. Not converting a series name of None to "None" (though this only solves the case in which one of the series has an unspecified name).
  3. Not using the series name as column name in _resample_to_frequency and instead always use a column name of None when dealing with one-column data in this line:
    data = np.insert(data, 0, self.now(start_time, column))

    i.e. self.now(start_time, None if len(data) == 1 else column)
@marvin-steinke marvin-steinke mentioned this issue Sep 3, 2024
marvin-steinke added a commit that referenced this issue Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant