Refactor preprocessor architecture to enhance extensibility #693
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Addresses parts of #672
Preparing to add a airflow as a preprocessor target, I noticed that the current architecture is very rigid and difficult to expand upon. This refactoring/reorganization hopefully addresses some of these. Notable changes (and absence of changes) are
No AST traversal or transformation logic has been rewritten
Files outside of
shell_ast
have not been changed*preprocess/preprocess.py
has been changed, but simply to import new files (some new modules were made, as I discuss below).Differing behavior depending on target (vanilla, spec, airflow) is no longer handled by an if statement in
ast_to_ast
.Previously,
replace_df_regions
was a function that ran a if statement on a property ofTransformationState
to check which specificTransformationState
implementation we were dealing with (this would tell us the output target). Now,replace_df_regions
is an instance method ofAbstractTransformationState
. This means that any variation in AST transformation between different output targets can be addressed in a single class that implements ofAbstractTransformationState
.The location of the AST Transformation logic has been moved from
replace_df_regions()
to a method onTransformationState
.For example, this pull request created three implementations of
AbstractTransformationState
:TransformationState
for the vanilla pash behavior,SpeculativeTransformationState
for pash spec, andAirflowTransformationState
for airflow (this one is empty). The ast_to_ast package contains no checks in its logic for checking how to handle AST transformations - they only need to callreplace_df_regions
of whatever trans_options argument they have at hand.The TransformationState code has been moved to a separate file
transformation_options.py
. I do suggest renaming "State" with some more appropriate name, but this is a minor nitpickpreprocess_node_<node_type> functions have been