Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor preprocessor architecture to enhance extensibility #693

Closed
wants to merge 5 commits into from

Conversation

Forthoney
Copy link
Contributor

@Forthoney Forthoney commented Oct 26, 2023

Addresses parts of #672
Preparing to add a airflow as a preprocessor target, I noticed that the current architecture is very rigid and difficult to expand upon. This refactoring/reorganization hopefully addresses some of these. Notable changes (and absence of changes) are

No AST traversal or transformation logic has been rewritten

Files outside of shell_ast have not been changed*

preprocess/preprocess.py has been changed, but simply to import new files (some new modules were made, as I discuss below).

Differing behavior depending on target (vanilla, spec, airflow) is no longer handled by an if statement in ast_to_ast.

Previously, replace_df_regions was a function that ran a if statement on a property of TransformationState to check which specific TransformationState implementation we were dealing with (this would tell us the output target). Now, replace_df_regions is an instance method of AbstractTransformationState. This means that any variation in AST transformation between different output targets can be addressed in a single class that implements of AbstractTransformationState.

The location of the AST Transformation logic has been moved from replace_df_regions() to a method on TransformationState.

For example, this pull request created three implementations of AbstractTransformationState: TransformationState for the vanilla pash behavior, SpeculativeTransformationState for pash spec, and AirflowTransformationState for airflow (this one is empty). The ast_to_ast package contains no checks in its logic for checking how to handle AST transformations - they only need to call replace_df_regions of whatever trans_options argument they have at hand.

The TransformationState code has been moved to a separate file transformation_options.py. I do suggest renaming "State" with some more appropriate name, but this is a minor nitpick

preprocess_node_<node_type> functions have been

  • moved to its own module
  • no longer reliant on a large dictionary to dispatch the appropriate function
  • type-annotated.

@github-actions
Copy link

OS:ubuntu-20.04
Thu Oct 26 20:29:25 UTC 2023
intro: 2/2 tests passed.
interface: 41/41 tests passed.
compiler: 54/54 tests passed.

@Forthoney Forthoney closed this Oct 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant