-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JDBC logging and safety improvements #1117
base: master
Are you sure you want to change the base?
Conversation
Introduces a new `strictValidation` parameter to SQL reading methods, defaulting to `true`, to ensure table names and queries follow a valid format. If disabled, a warning is logged, granting users flexibility while maintaining safety by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces JDBC logging and safety improvements by adding strict validation for SQL queries and table names while also updating test cases to prevent SQL injection. Key changes include:
- Adding tests to verify that invalid table names and malicious SQL queries are rejected.
- Refactoring query execution to use prepared statements and introducing a strictValidation flag.
- Enhancing validation methods with detailed logging for both SQL queries and table names.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/h2/h2Test.kt | Added tests for SQL injection prevention and strictValidation behavior |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt | Refactored query execution to use prepared statements; added strictValidation parameter and enhanced SQL validation |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/util.kt | Updated database utility to use prepared statements for improved safety |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Show resolved
Hide resolved
* Checks if a given string contains forbidden patterns or keywords. | ||
* Logs a clear and friendly message if any forbidden pattern is found. | ||
*/ | ||
private fun containsForbiddenPatterns(input: String): Boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether we, as DataFrame, are responsible for catching these types of attacks. Allow me to elaborate:
This PR makes it so that we can catch 5 types of attacks, which is good! But I'm pretty sure there are more, and we'll miss all of those. Couldn't we better ask the JDBC implementation or a proven validation library whether the statement is safe instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I investigated a few libraries with statement validation: some of them supports a limited number of databases or have complex API and seems like overengineering or for example licensing is SOMETHING SPECIAL
Pros of Catching Attacks in DataFrame.readSql*:
- Defense in depth: Even if JDBC is secure, a second layer helps prevent human errors in API usage.
- Prevents misusing readSqlQuery for multiple queries or data modifications.
- Improves developer ergonomics: Clear logs/warnings/errors when something sketchy is passed.
- Customizability: The user can turn strictValidation off in edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also mostly protected on JDBC level with PreparedStatement and executeQuery, but...different drivers give different possiblities to push some DROP Table/Database stuff through executeQuery and execute it BEFORE exception
executeQuery("SELECT * FROM users; DROP TABLE X")
could lead to SQL injection, especially if multiple statements are allowed by the DB/driver (in URL for example)
My idea was a 3-tier protection:
✅ Prepared statements
✅ Manual validation (for SELECT-only queries, no ;, no suspicious tokens)
✅ strictValidation as opt-in/out
Together, that's a pretty solid defense-in-depth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like a good approach indeed :)
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Show resolved
Hide resolved
} | ||
|
||
// Validate the table name structure: letters, numbers, underscores, and dots are allowed | ||
val tableNameRegex = Regex("^[A-Z0-9_]+(\\.[A-Z0-9_]+)*$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regex can be a const somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only uppercase letters are allowed? Oh I see, normalizedTableName is made uppercase, got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, do we specifically only allow A-Z? So, not Æ, Ø, Å, etc? Otherwise, it might make more sense to replace [A-Z0-9_]
by unicode category with [\p{L}\p{N}_]
(https://www.regular-expressions.info/unicode.html). If SQL has a specific ISO standard for these names, let's stick to that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree about possibility to use other name, it seems inconsistent with other DataFrame parts, from other hand, I thought that this behavior could be covered with disabling strictValidation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah probably, but it seems unnecessarily strict to me. If you're from a country that uses a different alphabet you will always have to turn off strictValidation
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said I agree, we're supporting different alphabets and symbols in other stuff, better to be aligned with that, I consider this and add test cases (honestly never named in Cyrillic any table in my life)
|
||
// Check if there are balanced quotes (single and double) | ||
val singleQuotes = sqlQuery.count { it == '\'' } | ||
val doubleQuotes = sqlQuery.count { it == '"' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about backticks? are they fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backticks aren't standard across all databases:
- PostgreSQL uses double quotes (") for quoting identifiers.
- SQL Server uses square brackets ([table]).
Or you need to provide different implementation for different databases (and for custom database) or handle it in one manner and for specific cases you could disable strictValidation
The idea of strictValidation was mostly not to PROTECT API or something like that, but avoid some typical use-cases, when somebody copying invalid code, for example from stackoverflow or ChatGPT not to ruin their databases
This pull request includes several changes aimed at enhancing the validation and execution of SQL queries and table names, as well as improving the database connection handling in the
dataframe-jdbc
module.The most important changes include the introduction of strict validation for SQL queries and table names, the replacement of
createStatement
withprepareStatement
for better security, and the addition of new validation methods.Enhancements to SQL Validation:
strictValidation
parameter to methodsreadSqlTable
,readSqlQuery
, andreadDataFrame
to enforce or bypass strict validation of SQL queries and table names. Default istrue
.isValidSqlQuery
andisValidTableName
methods to validate SQL queries and table names, ensuring they do not contain forbidden patterns and have balanced quotes.Security Improvements:
createStatement
withprepareStatement
in various methods to prevent SQL injection and improve security. (dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/util.ktL28-R29)Code Cleanup:
MULTIPLE_SQL_QUERY_SEPARATOR
fromreadJdbc.kt
.Documentation Updates:
strictValidation
parameter.These changes collectively improve the robustness, security, and clarity of the codebase, ensuring safer and more reliable database interactions.