Add DDL statements to drop branches and tags #23614

Open
wants to merge 4 commits into base: master

Member

@agrawalreetika agrawalreetika commented Sep 10, 2024

Description

Add DDL statements to drop branches and tags

Motivation and Context

Resolves #22028

Impact

Resolves #22028

SQL support for dropping a branch from a table:

ALTER TABLE users DROP BRANCH 'branch1';

SQL support for dropping a tag from a table:

ALTER TABLE users DROP TAG 'tag1';

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* DDL support for dropping a branch from a table. :pr:`23614`
* DDL support for dropping a tag from a table. :pr:`23614`

Iceberg Connector Changes
* Support for dropping a branch from an Iceberg table. :pr:`23614`
* Support for dropping a tag from an Iceberg table. :pr:`23614`


github-actions bot commented Sep 10, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff c00f8af...6a679d8.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

steveburnett previously approved these changes Sep 10, 2024
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pulled the branch, ran a local docs build, looks good. Thanks!

Contributor

@ZacBlanco ZacBlanco left a comment

Some high-level feedback. Most of the code looks good. I will take a closer look on a second pass.

Also, I wanted to bring up a thought I had about parser extensions. Since many connectors will not support branching and tagging, I was thinking that maybe we ought to consider designing a SQL syntax plugin extension interface. Spark allows custom syntax extensions through implementing a set of interfaces or bringing your own parser. The upstream Iceberg project now maintains its own Spark SQL syntax extensions.

I'm not proposing we need that for this PR, but maybe it's something we should start thinking about if connectors start adding more radically different features that would be best left to syntax extensions or optional plugins, especially for things outside the SQL specification.

}

@Override
public void checkCanDropTag(ConnectorTransactionHandle transaction, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName)
Contributor

@ZacBlanco ZacBlanco Sep 11, 2024

I wonder about the granularity of these methods. In other implementations (e.g. Spark?), at what granularity do they enforce the ability to do CRUD operations on tags and branches?

I'm thinking about a few cases:

  1. A group or user(s) can only access a certain set of branches or tags
  2. A group or user(s) can only create branches starting from a specific branch
  3. A group or user(s) can create tags

I know we're only implementing DROP but I want to understand the whole story for access control around branches and tags. Would we ever need to pass the branch/tag to these methods?

Member Author

I think the granularity of access control in Spark for CRUD operations on Iceberg tags and branches is limited by Spark's integration with external systems (such as file systems, catalogs, and security frameworks like Apache Ranger).
For example, Ranger policies can define access controls at the table level, which could be extended to manage branch- or tag-specific access.
Similarly, for cloud-based catalogs such as AWS Glue, you can control access to Iceberg metadata (branches and tags) via IAM policies that grant or restrict specific operations.

Contributor

@ZacBlanco ZacBlanco Sep 11, 2024

Thanks for the info. Would the parameters passed here as context have enough information for us to act at a similar granularity? I don't see anything in the method parameters that contains the branch name which I assume we would need to perform access control at a similar level.
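
To make the granularity question concrete, here is a minimal sketch of the difference between a table-level check and a branch-level check. Everything in it is hypothetical: `RefAccessPolicySketch`, its methods, and the plain-string identifiers are illustrative only and are not part of the Presto SPI. It is only meant to show that the branch-level decision cannot be made unless the branch name is passed to the check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these names exist in the Presto SPI.
final class RefAccessPolicySketch
{
    // Grants keyed by (user, table): the granularity available when only the table name is passed.
    private final Set<List<String>> tableGrants = new HashSet<>();
    // Grants keyed by (user, table, branch): only enforceable if the branch name reaches the check.
    private final Set<List<String>> branchGrants = new HashSet<>();

    void grantTable(String user, String table)
    {
        tableGrants.add(Arrays.asList(user, table));
    }

    void grantBranch(String user, String table, String branch)
    {
        branchGrants.add(Arrays.asList(user, table, branch));
    }

    // Table-level decision: the user may drop any branch of the table.
    boolean canDropAnyBranch(String user, String table)
    {
        return tableGrants.contains(Arrays.asList(user, table));
    }

    // Branch-level decision: a table-level grant or a grant on this specific branch.
    boolean canDropBranch(String user, String table, String branch)
    {
        return canDropAnyBranch(user, table)
                || branchGrants.contains(Arrays.asList(user, table, branch));
    }

    // Throws if the drop is not permitted, in the style of a checkCanXxx method.
    void checkCanDropBranch(String user, String table, String branch)
    {
        if (!canDropBranch(user, table, branch)) {
            throw new SecurityException(
                    String.format("User %s cannot drop branch %s of table %s", user, branch, table));
        }
    }
}
```

With a grant such as `grantBranch("bob", "users", "dev")`, the sketch lets bob drop `dev` but rejects any other branch of `users`, a distinction that a check receiving only the table name has no way to express.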

Member Author

@agrawalreetika agrawalreetika Sep 11, 2024

I would like to discuss this. Systems like Ranger can define access controls at the table level and column level, so in this case I think access to drop branches and tags could be table-based. As far as I can tell, if we introduce the branch name or tag name here, branch- and tag-level policies would then have to be maintained on the engine side?

Member Author

@agrawalreetika agrawalreetika Sep 13, 2024

@tdcmeehan What do you think about access control for branches and tags? Should it be based on the parent table itself, or on the individual tags/branches?
My thinking was that, since security frameworks do not enforce policies based on branches or tags, should this honor the same access policies as the table? Or do we not need access control exposed for dropTag and dropBranch at all?

@agrawalreetika agrawalreetika force-pushed the iceberg-tag-branch-drop branch 2 times, most recently from 8f4d3fa to 1e47c05 Compare September 11, 2024 21:18
@agrawalreetika
Member Author

@ZacBlanco Can you please take another pass?

Member

@hantangwangd hantangwangd left a comment

Overall the change looks good to me. A few small nits, and one issue to discuss: the behavior of IF EXISTS on branches and tags.

return tableExists;
}

public boolean isbranchExists()
Member

Suggested change
- public boolean isbranchExists()
+ public boolean isBranchExists()

@Override
public int hashCode()
{
    return Objects.hash(tableName, tableExists, branchName);
Member

Did you miss branchExists?

return Objects.equals(tableName, that.tableName) &&
        Objects.equals(branchName, that.branchName) &&
        Objects.equals(tableExists, that.tableExists) &&
        Objects.equals(branchExists, branchExists);
Member

Suggested change
- Objects.equals(branchExists, branchExists);
+ Objects.equals(branchExists, that.branchExists);
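
Both review points on this class (the field missing from hashCode and the self-comparison in equals) come down to keeping the two methods over the same set of fields. A minimal standalone sketch follows, assuming a simplified node with the four fields seen in the snippets above. The class name `DropBranchSketch` and the `String` field types are illustrative and are not the PR's actual AST class.

```java
import java.util.Objects;

// Illustrative stand-in for the statement node; not the PR's real class.
final class DropBranchSketch
{
    private final String tableName;
    private final String branchName;
    private final boolean tableExists;
    private final boolean branchExists;

    DropBranchSketch(String tableName, String branchName, boolean tableExists, boolean branchExists)
    {
        this.tableName = tableName;
        this.branchName = branchName;
        this.tableExists = tableExists;
        this.branchExists = branchExists;
    }

    boolean isTableExists()
    {
        return tableExists;
    }

    boolean isBranchExists()
    {
        return branchExists;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        DropBranchSketch that = (DropBranchSketch) obj;
        // Every field is compared against the other instance, including branchExists.
        return Objects.equals(tableName, that.tableName) &&
                Objects.equals(branchName, that.branchName) &&
                tableExists == that.tableExists &&
                branchExists == that.branchExists;
    }

    @Override
    public int hashCode()
    {
        // Same four fields as equals, so equal objects always hash alike.
        return Objects.hash(tableName, branchName, tableExists, branchExists);
    }
}
```

With the self-comparison bug, two statements that differ only in the IF EXISTS flag on the branch would compare equal. In this sketch they do not.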

return Objects.equals(tableName, that.tableName) &&
        Objects.equals(tagName, that.tagName) &&
        Objects.equals(tableExists, that.tableExists) &&
        Objects.equals(tagExists, tagExists);
Member

Suggested change
- Objects.equals(tagExists, tagExists);
+ Objects.equals(tagExists, that.tagExists);

@Override
public int hashCode()
{
    return Objects.hash(tableName, tableExists, tagName);
Member

Missed tagExists?

Comment on lines 55 to 74
if (!tableHandleOptional.isPresent()) {
    if (!statement.isTableExists()) {
        throw new SemanticException(MISSING_TABLE, statement, "Table '%s' does not exist", tableName);
    }
    return immediateFuture(null);
}

Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMetadataResolver(session).getMaterializedView(tableName);
if (optionalMaterializedView.isPresent()) {
    if (!statement.isTableExists()) {
        throw new SemanticException(NOT_SUPPORTED, statement, "'%s' is a materialized view, and drop tag is not supported", tableName);
    }
    return immediateFuture(null);
}

ConnectorId connectorId = metadata.getCatalogHandle(session, tableName.getCatalogName())
        .orElseThrow(() -> new PrestoException(NOT_FOUND, "Catalog does not exist: " + tableName.getCatalogName()));
accessControl.checkCanDropTag(session.getRequiredTransactionId(), session.getIdentity(), session.getAccessControlContext(), tableName);

metadata.dropTag(session, tableHandleOptional.get(), Optional.of(statement.getTagName().toString()));
Member

Same as above: this does not consider the IF EXISTS flag for the tag.

Comment on lines 645 to 649
else {
    throw new PrestoException(NOT_FOUND, format("Branch %s doesn't exist in table %s", branchName.get(), icebergTableHandle.getSchemaTableName().getTableName()));
}
Member

Should we take the value of IF EXISTS into consideration here?

Comment on lines 659 to 665
else {
    throw new PrestoException(NOT_FOUND, format("Tag %s doesn't exist in table %s", tagName.get(), icebergTableHandle.getSchemaTableName().getTableName()));
}
Member

Same as above: should we take the value of IF EXISTS into consideration here?
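
One way to read the IF EXISTS questions above is that a missing branch or tag should raise an error only when the statement did not say IF EXISTS. The sketch below shows that rule with an in-memory set standing in for the Iceberg table's refs. `BranchStoreSketch`, `dropBranch`, and the `ifExists` parameter are hypothetical names, `IllegalArgumentException` is only a stand-in for the `PrestoException(NOT_FOUND, ...)` in the PR, and how the flag would actually reach the connector is not shown in these excerpts.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: an in-memory stand-in for a table's branches.
final class BranchStoreSketch
{
    private final String tableName;
    private final Set<String> branches = new HashSet<>();

    BranchStoreSketch(String tableName)
    {
        this.tableName = tableName;
    }

    void addBranch(String branchName)
    {
        branches.add(branchName);
    }

    boolean hasBranch(String branchName)
    {
        return branches.contains(branchName);
    }

    // Drops the branch and returns true if it existed.
    // A missing branch is a silent no-op when ifExists is set, and an error otherwise.
    boolean dropBranch(String branchName, boolean ifExists)
    {
        if (branches.remove(branchName)) {
            return true;
        }
        if (ifExists) {
            return false;
        }
        // Stand-in for PrestoException(NOT_FOUND, ...) in the real connector.
        throw new IllegalArgumentException(
                String.format("Branch %s doesn't exist in table %s", branchName, tableName));
    }
}
```

Under this rule, dropping a missing branch with IF EXISTS would succeed as a no-op, and the same statement without IF EXISTS would still fail with the existing error message.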

assertEquals(icebergTable.refs().size(), 2);
assertQueryFails("ALTER TABLE test_table_branch DROP BRANCH 'testBranchNotExist'", "Branch testBranchNotExist doesn't exist in table test_table_branch");
assertQuerySucceeds("ALTER TABLE test_table_branch DROP BRANCH IF EXISTS 'testBranch2'");
assertQueryFails("ALTER TABLE test_table_branch DROP BRANCH IF EXISTS 'testBranchNotExist'", "Branch testBranchNotExist doesn't exist in table test_table_branch");
Member

Is this the correct behavior? As I understand it, this statement should not fail.

assertEquals(icebergTable.refs().size(), 2);
assertQueryFails("ALTER TABLE test_table_tag DROP TAG 'testTagNotExist'", "Tag testTagNotExist doesn't exist in table test_table_tag");
assertQuerySucceeds("ALTER TABLE test_table_tag DROP TAG IF EXISTS 'testTag2'");
assertQueryFails("ALTER TABLE test_table_tag DROP TAG IF EXISTS 'testTagNotExist'", "Tag testTagNotExist doesn't exist in table test_table_tag");
Member

The same as above, should this statement fail?

Development

Successfully merging this pull request may close these issues.

Add DDL statements to drop branches and tags
5 participants