-
Hi, I am analyzing a large group of super long sql strings. My task is to check if a few expressions exist in a specific order, then return the relevant positions in the sql string, which will be helpful to locate the problems in the queries. For example, in this example query, I would like to check whether some_func is in the subquery. The expected output is the positions of some_func and the relevant FROM.
I am able to get some_func and FROM in the AST by calling Regex is not useful in my scenario, because the sementic of the query cannot be captured. In the previous example, there could be multiple FROM before some_func. But they do not necessarily indicate the target some_func subquery. Any help is greatly appreciated. |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 4 replies
-
One hacky idea: After finding some_func and FROM, wrap them in a from sqlglot import exp, parse_one
q = """
WITH T1 as (SELECT * FROM T2)
SELECT *
FROM (
SELECT a, some_func(b)
FROM T3) as T4, T1
"""
ast = parse_one(q)
# I'm not sure any of this makes sense for your use case... but you get the point
some_funcs = [
func
for func in ast.find_all(exp.Func)
if func.name == "some_func"
]
for func in some_funcs:
from_ = func.find_ancestor(exp.From)
wrapped = exp.Tag(
this=from_.copy(),
prefix=f"<start>",
postfix=f"<stop>"
)
from_.replace(wrapped)
print(ast.sql())
# WITH T1 AS (SELECT * FROM T2) SELECT *<start> FROM (SELECT a, SOME_FUNC(b) FROM T3) AS T4, T1<stop> Now you can use regexes: import re
regex = re.compile(r"<start>(?P<expression>.*?)<stop>")
match = regex.search(ast.sql()) It'll be a little more involved, of course, as you'll need to replace the match with the "expression" group and keep track of changes to "match.start()" and "match.end()"... but I think you get the point. Again, just an idea, albeit a bit clunky. |
Beta Was this translation helpful? Give feedback.
-
if you're inside the scope of the parser, you can do self._find_sql(start_token, end_token) to get the actual sql. you need to be inside of the parser so that you have the actual token which contains the line and column number |
Beta Was this translation helpful? Give feedback.
One hacky idea:
After finding some_func and FROM, wrap them in a
Tag
expression: