HistoricalSignal: Series names need to be equal #234

Impelon · 2024-08-23T14:32:04Z

I have found another minor error message that is counter-intuitive.
This one occurs when a HistoricalSignal is created from two pd.Series with different names: _resample_to_frequency then fails for any resample_method except bfill.

Tested with vessim 0.8.0:

>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> from vessim.signal import HistoricalSignal
>>> start = datetime.now()
>>> actual = pd.Series(range(101), pd.date_range(start, start + timedelta(minutes=100), freq="1min"), name="A")
>>> forecast = pd.Series(range(100, 201), pd.date_range(start, start + timedelta(minutes=100), freq="1min"), name="F")
>>> signal = HistoricalSignal(actual, forecast)
>>> signal.forecast(start + timedelta(minutes=5), start + timedelta(minutes=15), resample_method="linear")  # works
{...}
>>> signal.forecast(start + timedelta(minutes=5), start + timedelta(minutes=15), frequency="5s", resample_method="linear")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 312, in forecast
    return self._resample_to_frequency(
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 349, in _resample_to_frequency
    data = np.insert(data, 0, self.now(start_time, column))
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 192, in now
    times, values = self._actual[_get_column_name(self._actual, column)]
  File "/home/impelon/venv/lib/python3.8/site-packages/vessim/signal.py", line 380, in _get_column_name
    raise ValueError(f"Cannot retrieve data for column '{column}'.")
ValueError: Cannot retrieve data for column 'F'.

As far as I can tell from looking at the source, this would happen even after applying the fixes in #232.

The code tries to look up data in actual using the series name from forecast.
Looking at the source code for __init__, this behaviour makes sense: the name of each series is used as its column name, and when dealing with DataFrames that have multiple columns, it is clear that both DataFrames should contain the same column names.
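To make the failure mode concrete, here is a simplified model that uses plain dicts in place of vessim's internal data. It is NOT vessim's implementation: to_columns and lookup are hypothetical helpers that only mimic the behaviour described above, i.e. each one-column input is stored under its series name, and the forecast's name is then used to query the actual data.

```python
# Hypothetical, simplified model of the behaviour described above;
# this is NOT vessim's implementation.

def to_columns(name, values):
    """Store one-column data keyed by its (stringified) series name."""
    return {str(name): list(values)}


def lookup(columns, column):
    """Strict column lookup that fails for unknown column names."""
    if column in columns:
        return columns[column]
    raise ValueError(f"Cannot retrieve data for column '{column}'.")


actual = to_columns("A", range(3))
forecast = to_columns("F", range(100, 103))

# The forecast's column name is reused to query the actual data.
forecast_column = next(iter(forecast))
try:
    lookup(actual, forecast_column)
except ValueError as err:
    print(err)  # Cannot retrieve data for column 'F'.
```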

When using Series, however, this seems a bit counter-intuitive, at least to me: there is only one column, and little attention is usually paid to the name of a Series object. (Series are admittedly not mentioned in the __init__ docstring at all, but it is quite evident how to use HistoricalSignal with them.)

Even worse, this also happens if one series name is left unset (and the other is not), as was the case when I first came across the behaviour. The associated column name then becomes the string "None", resulting in ValueError: Cannot retrieve data for column 'None'. This is extra confusing, seeing as _get_column_name specifically has a check for a column of None.
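The following sketch shows why stringifying the name defeats a None check. It is a hypothetical model, not vessim's source: get_column_name here only assumes that a column of None means "use the only column of one-column data", which may differ from what vessim's _get_column_name actually does.

```python
# Hypothetical model, NOT vessim's source. Assumption: a column of None
# means "use the only column of one-column data".

def get_column_name(columns, column):
    """Resolve which key to read from a dict of column name -> values."""
    if column is None:
        if len(columns) == 1:
            return next(iter(columns))  # single column: take it
        raise ValueError("Column must be specified for multi-column data.")
    if column in columns:
        return column
    raise ValueError(f"Cannot retrieve data for column '{column}'.")


actual = {"A": [0, 1, 2]}  # a named series
unnamed_key = str(None)    # an unnamed series ends up keyed as "None"

print(get_column_name(actual, None))  # A
try:
    get_column_name(actual, unnamed_key)  # "None" is not None
except ValueError as err:
    print(err)  # Cannot retrieve data for column 'None'.
```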

I'd suggest one of the following:

- Mention in the docs for HistoricalSignal that series names are used as column names (and thus need to be the same).
- Do not convert a series name of None to "None" (though this only solves the case in which one of the series has an unspecified name).
- Do not use the series name as the column name in _resample_to_frequency; instead, always use a column name of None when dealing with one-column data in this line (vessim/vessim/signal.py, line 349 in 648d3b3):

    data = np.insert(data, 0, self.now(start_time, column))

  i.e. self.now(start_time, None if len(data) == 1 else column)
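The third suggestion can be sketched with plain dicts standing in for vessim's internal data. get_column_name and column_for_now are hypothetical names, not vessim functions, and the sketch decides based on the number of columns in the actual data, which is an assumption about what the single-column check is meant to express.

```python
# Hypothetical sketch, NOT vessim's code: drop the forecast's column name
# when the actual data has a single column, so series names need not match.

def get_column_name(columns, column):
    """Simplified resolver; assumption: None selects the only column."""
    if column is None:
        if len(columns) == 1:
            return next(iter(columns))
        raise ValueError("Column must be specified for multi-column data.")
    if column in columns:
        return column
    raise ValueError(f"Cannot retrieve data for column '{column}'.")


def column_for_now(actual_columns, column):
    """Hypothetical helper: ignore the requested name for one-column data."""
    return None if len(actual_columns) == 1 else column


actual = {"A": [0, 1, 2]}
print(get_column_name(actual, column_for_now(actual, "F")))     # A
print(get_column_name(actual, column_for_now(actual, "None")))  # A

multi = {"A": [0, 1], "B": [2, 3]}
print(get_column_name(multi, column_for_now(multi, "B")))       # B
```

With this fallback, multi-column data still requires matching names, so the DataFrame behaviour stays unchanged.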


marvin-steinke mentioned this issue Sep 3, 2024 in "Fix #234" #235 (Merged)

marvin-steinke added a commit that referenced this issue Sep 3, 2024: "Fix #234" (b2cb601)

marvin-steinke closed this as completed in #235 Sep 9, 2024

marvin-steinke closed this as completed in 79d2c0c Sep 9, 2024

marvin-steinke added a commit that referenced this issue Sep 9, 2024: "Merge pull request #235 from dos-group/fix_#234", "Fix #234" (5bdcb81)
