Handling of stepwise information collected by StepLogCallback #25

danielwolff1 · 2025-02-05T14:09:02Z

Problem

The CSV files currently generated by the StepLogCallback class are irregular and difficult to parse, because pandas automatically converts some entries of the observation column to strings during CSV export. @clemens-fricke came up with a way of parsing the CSV file back into a pandas data frame. It was tested successfully on artificially generated CSV files, but it did not work on a recent CSV file that I produced during an actual experiment run (see below).

📎 step_log.csv

Solution suggested by @clemens-fricke

⚠ Does not work on the provided step_log.csv!

def f_array(x):
    return np.array(ast.literal_eval(re.sub(r'\s+', ',', x)), dtype=float)

a = pd.read_csv("step_log.csv", converters={"actions": f_array, "observations": f_array, "rewards": f_array})
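One way this converter can break, which may or may not be what happens with the attached file: the regular expression replaces every run of whitespace with a comma, so a string with whitespace directly after the opening bracket turns into a list literal that starts with a comma, and ast.literal_eval rejects it. The sample strings below are made up for illustration and are not taken from step_log.csv.

```python
import ast
import re

import numpy as np


def f_array(x):
    # Same converter as above: whitespace -> commas, then evaluate as a literal
    return np.array(ast.literal_eval(re.sub(r'\s+', ',', x)), dtype=float)


# Hypothetical sample without padding: becomes "[0.5,1.,2.]" and parses fine
ok = f_array("[0.5 1. 2.]")

# Hypothetical sample with a space after "[": becomes "[,1.,-2.]"
failed = False
try:
    f_array("[ 1. -2.]")
except (SyntaxError, ValueError):
    failed = True

print(ok, failed)
```

Whether padding like this is the actual cause for the attached file would have to be checked against its contents.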

Alternative solution for parsing

import numpy as np
import pandas as pd

# Path to the exported log (assumed here to be the attached file)
step_log_path = "step_log.csv"

# Function to parse lists (for actions, observations, and rewards)
def parse_lists(value, make2d=False):
    # Non-string entries (e.g. NaN for empty cells) are passed through unchanged
    if isinstance(value, str):
        try:
            # first remove the surrounding brackets
            value = value.strip("[").strip("]")
            # remove leading and trailing whitespace
            value = value.strip()
            # the remaining string should contain the values, separated by whitespace
            entries = value.split()
            # convert entries to float
            parsed_values = np.array([float(number) for number in entries], dtype=float)
            # reshape to a single row to keep the 2D structure if necessary
            return parsed_values.reshape(1, -1) if make2d else parsed_values
        except ValueError:
            print(f"ERROR: Cannot convert value {value} to list of floats.")
            return np.nan
    return value

# Load CSV file
df = pd.read_csv(step_log_path, dtype=str, index_col="timesteps")  # Read everything as strings initially

# Convert numeric columns
df["episodes"] = df["episodes"].astype(int)

# Apply parsing functions
df["observations"] = df["observations"].apply(lambda x: parse_lists(x, True))
df["actions"] = df["actions"].apply(parse_lists)
df["rewards"] = df["rewards"].apply(parse_lists)

This parses the provided step_log.csv file correctly, but it is not straightforward.
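To make the behaviour of this parsing approach concrete, here is a condensed sketch of the same idea run on a small in-memory CSV. The two rows are invented sample data (not taken from step_log.csv), and the condensed parse_lists is only meant to mirror the function above, without the error message.

```python
import io

import numpy as np
import pandas as pd


def parse_lists(value, make2d=False):
    """Condensed variant: strip brackets, split on whitespace, convert to floats."""
    if not isinstance(value, str):
        return value
    try:
        entries = value.strip().strip("[]").split()
        parsed = np.array([float(entry) for entry in entries], dtype=float)
    except ValueError:
        return np.nan
    return parsed.reshape(1, -1) if make2d else parsed


# Hypothetical log content, written the way space-separated arrays end up as text
csv_text = (
    "timesteps,episodes,actions,observations,rewards\n"
    "1,0,[ 0.5 -1. ],[[ 0.1  0.2 -0.3]],[1.5]\n"
    "2,0,[-0.5  1. ],[[ 0.   0.4  0.9]],[-2.]\n"
)

df = pd.read_csv(io.StringIO(csv_text), dtype=str, index_col="timesteps")
df["episodes"] = df["episodes"].astype(int)
df["observations"] = df["observations"].apply(lambda x: parse_lists(x, True))
df["actions"] = df["actions"].apply(parse_lists)
df["rewards"] = df["rewards"].apply(parse_lists)

print(df["observations"].iloc[0].shape, df["actions"].iloc[0])
```

Note that this only covers observations that fit on a single row; a cell containing several bracketed rows would need the inner brackets handled separately.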

Alternative solution

The problem we're currently facing is purely related to the CSV export. Instead of providing a utility function based on regular expressions, which we might have to update whenever we encounter edge cases that have not shown up in our current experiments, we could modify the export itself and use the line-based JSON format (JSON Lines) instead. Line-based, because that allows appending to an existing file:

df.to_json("data.jsonl", mode="a", orient="records", lines=True)

Changing to this type of export would have the benefit of retaining the original data structure, and the file could be read back with a single line, without any awkward conversions during parsing:

df = pd.read_json("data.jsonl", orient="records", lines=True)
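A minimal round-trip sketch of this idea, assuming a pandas version in which to_json accepts mode="a" (as far as I know this requires pandas 2.1 or newer, together with orient="records" and lines=True). The helper make_chunk and all values are hypothetical stand-ins for whatever StepLogCallback actually collects.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_chunk(start):
    """Build a small, made-up chunk of step data (hypothetical values)."""
    return pd.DataFrame({
        "timesteps": [start, start + 1],
        "episodes": [0, 0],
        "actions": [[0.5, -1.0], [-0.5, 1.0]],
        "observations": [[[0.5, 0.25, -0.75]], [[0.0, 1.5, 2.0]]],
        "rewards": [[1.5], [-2.0]],
    })


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.jsonl")
    # Append two chunks to the same file, one JSON record per line
    for start in (1, 3):
        make_chunk(start).to_json(path, mode="a", orient="records", lines=True)
    # Read everything back in one call
    restored = pd.read_json(path, orient="records", lines=True)

# Nested lists survive the round trip and convert directly to arrays
first_obs = np.array(restored["observations"].iloc[0])
print(len(restored), first_obs.shape)
```

With orient="records" the index is not written, so timesteps has to be kept as a regular column (as in the sketch) and can be restored with set_index("timesteps") after loading.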

Conclusion

I think I have a clear favourite solution here :D But I wanted to raise this issue first. I'm open to discussion and different opinions :)

danielwolff1 · 2025-02-06T09:20:07Z

UPDATE: I slightly modified (and simplified) the solution for parsing the generated csv above.

Turns out that sometimes using your brain is actually a good alternative to continuously (and desperately) prompting ChatGPT when the latter is not able to find a solution for all the edge cases. And turns out that the solution is often much simpler than you think 🙃
