Add DDL statements to drop branches and tags #23614
Codenotify: Notifying subscribers in CODENOTIFY files for diff c00f8af...6a679d8.
LGTM! (docs)
Pull branch, local docs build, looks good. Thanks!
3fe4d56 to ee54cd3
Some high-level feedback. Most of the code looks good. I will take a closer look on a second pass.
Also, I wanted to bring up a thought I had about parser extensions. Since many connectors will not support branching and tagging, I was thinking that maybe we ought to consider designing a SQL syntax plugin extension interface. Spark allows custom syntax extensions through implementing a set of interfaces or bringing your own parser. The upstream Iceberg project now maintains its own Spark SQL syntax extensions.
I'm not proposing we need that for this PR, but maybe it's something we should start thinking about if connectors start adding more radically different features that would be best left to syntax extensions or optional plugins, especially for things outside the SQL specification.
}

@Override
public void checkCanDropTag(ConnectorTransactionHandle transaction, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName)
I wonder about the granularity of these methods. In other implementations (e.g. Spark?), at what granularity do they enforce the ability to do CRUD operations on tags and branches?
I'm thinking about a few cases:
- A group or user(s) can only access a certain set of branches or tags
- A group or user(s) can only create branches starting from a specific branch
- A group or user(s) can create tags
I know we're only implementing DROP, but I want to understand the whole story for access control around branches and tags. Would we ever need to pass the branch/tag to these methods?
I think the granularity of access control in Spark for CRUD operations on Iceberg tags and branches is limited by Spark's integration with external systems (such as file systems, catalogs, and security frameworks like Apache Ranger).
For example, Ranger policies can define access controls at the table level, which could be extended to manage branch- or tag-specific access.
Similarly, for cloud-based catalogs like AWS Glue, you can control access to Iceberg metadata (branches and tags) via IAM policies that grant or restrict specific operations.
Thanks for the info. Would the parameters passed here as context have enough information for us to act at a similar granularity? I don't see anything in the method parameters that contains the branch name which I assume we would need to perform access control at a similar level.
I would like to discuss this. Systems like Ranger can define access controls at the table level and column level, so in this case I think access for dropping branches and tags could be table-based. As far as I can tell, if we introduce the branch name / tag name here, branch- and tag-level policies would then have to be maintained on the engine side?
@tdcmeehan What do you think about access control for branches and tags? Should it be based on the parent table itself or on the tags/branches?
My thinking was that since security frameworks do not enforce policies based on branches/tags, this should honor the same access policies as the table. Or do we even need access control exposed for dropTag and dropBranch?
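To make the granularity question in this thread concrete, here is a minimal sketch of the two options being discussed: a table-level decision when no ref name is available, and a finer decision when the branch or tag name is passed through. Everything here (`RefAccessSketch`, `canDropRef`, the policy fields) is hypothetical and is not part of the Presto SPI or of this PR.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical policy model for illustration only; not the Presto access control SPI.
final class RefAccessSketch
{
    // Tables on which the caller holds a table-level "drop ref" grant (assumed concept).
    private final Set<String> tablesWithDropGrant;
    // Per-table set of ref names that must never be dropped (assumed concept).
    private final Map<String, Set<String>> protectedRefsByTable;

    RefAccessSketch(Set<String> tablesWithDropGrant, Map<String, Set<String>> protectedRefsByTable)
    {
        this.tablesWithDropGrant = tablesWithDropGrant;
        this.protectedRefsByTable = protectedRefsByTable;
    }

    // With an empty refName only the table-level grant can be consulted, which mirrors
    // the method signature quoted above. With a refName, a per-ref rule can also apply.
    boolean canDropRef(String table, Optional<String> refName)
    {
        if (!tablesWithDropGrant.contains(table)) {
            return false;
        }
        Set<String> protectedRefs = protectedRefsByTable.getOrDefault(table, Collections.<String>emptySet());
        return refName.map(ref -> !protectedRefs.contains(ref)).orElse(true);
    }
}
```

The point of the sketch is only that a per-ref rule cannot be evaluated unless the ref name reaches the check; whether such rules belong in the engine or in an external framework is the open question above.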
8f4d3fa to 1e47c05
@ZacBlanco Can you please take another pass?
The change overall looks good to me. Some small nits, and one point to discuss: the behavior of if exists on branch and tag.
    return tableExists;
}

public boolean isbranchExists()
- public boolean isbranchExists()
+ public boolean isBranchExists()
@Override
public int hashCode()
{
    return Objects.hash(tableName, tableExists, branchName);
Did you miss branchExists?
return Objects.equals(tableName, that.tableName) &&
        Objects.equals(branchName, that.branchName) &&
        Objects.equals(tableExists, that.tableExists) &&
        Objects.equals(branchExists, branchExists);
- Objects.equals(branchExists, branchExists);
+ Objects.equals(branchExists, that.branchExists);
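For reference, a minimal sketch of an equals/hashCode pair in which all four fields take part in both methods. `DropBranchSketch` is a hypothetical stand-in for the AST node in this PR; only the field names are taken from the quoted snippets, and the constructor shape is assumed.

```java
import java.util.Objects;

// Hypothetical stand-in for the DROP BRANCH node; not the class from the PR.
final class DropBranchSketch
{
    private final String tableName;
    private final boolean tableExists;   // IF EXISTS flag for the table, as the quoted code suggests
    private final String branchName;
    private final boolean branchExists;  // IF EXISTS flag for the branch

    DropBranchSketch(String tableName, boolean tableExists, String branchName, boolean branchExists)
    {
        this.tableName = Objects.requireNonNull(tableName, "tableName is null");
        this.tableExists = tableExists;
        this.branchName = Objects.requireNonNull(branchName, "branchName is null");
        this.branchExists = branchExists;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        DropBranchSketch that = (DropBranchSketch) obj;
        // Compare every field against the other instance, never against itself.
        return tableExists == that.tableExists &&
                branchExists == that.branchExists &&
                Objects.equals(tableName, that.tableName) &&
                Objects.equals(branchName, that.branchName);
    }

    @Override
    public int hashCode()
    {
        // Hash the same set of fields that equals compares.
        return Objects.hash(tableName, tableExists, branchName, branchExists);
    }
}
```

With the self-comparison in place, two nodes that differ only in the branch-level flag would compare equal, which is the case the suggestion above guards against.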
return Objects.equals(tableName, that.tableName) &&
        Objects.equals(tagName, that.tagName) &&
        Objects.equals(tableExists, that.tableExists) &&
        Objects.equals(tagExists, tagExists);
- Objects.equals(tagExists, tagExists);
+ Objects.equals(tagExists, that.tagExists);
@Override
public int hashCode()
{
    return Objects.hash(tableName, tableExists, tagName);
Missed tagExists?
if (!tableHandleOptional.isPresent()) {
    if (!statement.isTableExists()) {
        throw new SemanticException(MISSING_TABLE, statement, "Table '%s' does not exist", tableName);
    }
    return immediateFuture(null);
}

Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName);
if (optionalMaterializedView.isPresent()) {
    if (!statement.isTableExists()) {
        throw new SemanticException(NOT_SUPPORTED, statement, "'%s' is a materialized view, and drop tag is not supported", tableName);
    }
    return immediateFuture(null);
}

ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName())
        .orElseThrow(() -> new PrestoException(NOT_FOUND, "Catalog does not exist: " + tableName.getCatalogName()));
accessControl.checkCanDropTag(session.getRequiredTransactionId(), session.getIdentity(), session.getAccessControlContext(), tableName);

metadata.dropTag(session, tableHandleOptional.get(), Optional.of(statement.getTagName().toString()));
Same as above: this does not consider the if exists flag for tag.
else {
    throw new PrestoException(NOT_FOUND, format("Branch %s doesn't exist in table %s", branchName.get(), icebergTableHandle.getSchemaTableName().getTableName()));
}
Should we take the value of if exists into consideration here?
else {
    throw new PrestoException(NOT_FOUND, format("Tag %s doesn't exist in table %s", tagName.get(), icebergTableHandle.getSchemaTableName().getTableName()));
}
Same as above: should we take the value of if exists into consideration here?
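To illustrate the behavior these comments ask about, here is a minimal sketch in which a missing ref is an error only when the if exists flag is not set. `RefStoreSketch` and `dropRef` are hypothetical names over an in-memory map; this is not the Iceberg or Presto API, just the control flow under discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a table's named refs (branch or tag name -> snapshot id).
final class RefStoreSketch
{
    private final Map<String, Long> refs = new HashMap<>();

    void addRef(String name, long snapshotId)
    {
        refs.put(name, snapshotId);
    }

    int size()
    {
        return refs.size();
    }

    // Returns true if the ref was removed, false if it was absent and ifExists was set.
    // Throws only when the ref is absent and ifExists was not set.
    boolean dropRef(String name, boolean ifExists)
    {
        if (refs.remove(name) != null) {
            return true;
        }
        if (ifExists) {
            return false; // silently succeed, as IF EXISTS conventionally implies
        }
        throw new IllegalStateException(String.format("Ref %s doesn't exist", name));
    }
}
```

Under this reading, dropping a missing branch or tag with if exists set is a no-op rather than a NOT_FOUND error.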
assertEquals(icebergTable.refs().size(), 2);
assertQueryFails("ALTER TABLE test_table_branch DROP BRANCH 'testBranchNotExist'", "Branch testBranchNotExist doesn't exist in table test_table_branch");
assertQuerySucceeds("ALTER TABLE test_table_branch DROP BRANCH IF EXISTS 'testBranch2'");
assertQueryFails("ALTER TABLE test_table_branch DROP BRANCH IF EXISTS 'testBranchNotExist'", "Branch testBranchNotExist doesn't exist in table test_table_branch");
Is this the correct behavior? As I understand it, this statement should not fail.
assertEquals(icebergTable.refs().size(), 2);
assertQueryFails("ALTER TABLE test_table_tag DROP TAG 'testTagNotExist'", "Tag testTagNotExist doesn't exist in table test_table_tag");
assertQuerySucceeds("ALTER TABLE test_table_tag DROP TAG IF EXISTS 'testTag2'");
assertQueryFails("ALTER TABLE test_table_tag DROP TAG IF EXISTS 'testTagNotExist'", "Tag testTagNotExist doesn't exist in table test_table_tag");
The same as above, should this statement fail?
1e47c05 to 6a679d8
Description
Add DDL statements to drop branches and tags
Motivation and Context
Resolves #22028
Impact
Resolves #22028
SQL support for dropping a branch from a table:
SQL support for dropping a tag from a table:
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.