Best Practices for Serializing and Reconstructing sqlglot Expression Trees #2593
-
Hello! I'm using SQLGlot to manual construct expression trees. I've always struggled with this a little in terms of getting the syntax correct (especially knowing exactly what the syntax should be) To illustrate, let's consider a basic example: tree = sqlglot.parse_one("l.first_name") This expression, when printed, results in: (COLUMN this:
(IDENTIFIER this: first_name, quoted: False), table:
(IDENTIFIER this: l, quoted: False)) This can be created using from sqlglot import expressions as exp
exp.Column(this='first_name', table='l', quoted=False) In my use case, I often want to use this as a template, but make small chanegs to the arguments (quoted, table, this). This is straightforward in the above example, but in more complex examples, I find it difficult to know exactly what this syntax should be (and I don't think there's an automatic way of going from the tree to the equivalent code to create it). Consider for example:
This can be created like this:
But this sort of syntax is hard to write and a bit tricky to get right. You can't (AFAIK) get Sqlglot to output it. I've implemented a method to achieve this round-trip serialising and reconstruction, by serialising to a dict: tree to dict and backimport sqlglot
from sqlglot import expressions as exp
def expand_tree_args(node):
if not isinstance(node, sqlglot.Expression):
return node
node_info = {"key": node.__class__.__name__}
expanded_args = {}
for key, value in node.args.items():
if isinstance(value, list):
expanded_args[key] = [expand_tree_args(item) for item in value]
else:
expanded_args[key] = expand_tree_args(value)
node_info["args"] = expanded_args
return node_info
def rebuild_tree_from_args(expanded_args):
if not isinstance(expanded_args, dict):
return expanded_args
if "key" not in expanded_args:
return expanded_args
node_type = expanded_args["key"]
args = expanded_args["args"]
rebuilt_args = {}
for key, value in args.items():
if isinstance(value, list):
rebuilt_args[key] = [rebuild_tree_from_args(item) for item in value]
else:
rebuilt_args[key] = rebuild_tree_from_args(value)
expression_class = getattr(exp, node_type)
return expression_class(**rebuilt_args) For example, this will return Example usage:
where the expanded tree is: {
"key": "Column",
"args": {
"this": {
"key": "Identifier",
"args": {
"this": "first_name",
"quoted": false
}
},
"table": {
"key": "Identifier",
"args": {
"this": "l",
"quoted": false
}
},
"db": null,
"catalog": null
}
}
This gives me a quick way of obtaining generic code to construct similar statements. I can just look at expanded_tree_args (a My question is: Is there an existing feature in sqlglot that facilitates this kind of serialization and reconstruction of expression trees? If not, are there any best practices or recommendations for achieving this in a more standardized or efficient manner? Is the way I'm doing it at all sensible? |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 1 reply
-
Hey 👋
Yes, there is:
|
Beta Was this translation helpful? Give feedback.
Hey 👋
Yes, there is:
Expression.load
andExpression.dump
already implement serde for SQLGlot expressions. Observe: