You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:11:14 UTC
[GitHub] [iceberg] rdblue commented on a change in pull request #2560: Spark: add spark extension for identifier fields

rdblue commented on a change in pull request #2560:
URL: https://github.com/apache/iceberg/pull/2560#discussion_r653856249



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
     | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH transform (AS name=identifier)? #replacePartitionField
     | ALTER TABLE multipartIdentifier WRITE writeSpec                                       #setWriteDistributionAndOrdering
+    | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '(' fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')' #setIdentifierFields

Review comment:
       My comment was based on similar reasoning for other DDL. For example, it is better to support both `ADD COLUMNS` and `ADD COLUMN` interchangeably so that users don't get needless syntax errors when they use a plural or singular term with the "wrong" number of columns.
   
   I think that the problem extending that to this is that it isn't clear that `SET IDENTIFIER FIELD x` actually replaces all existing identifier fields, which I think is why you're mentioning not wanting to use the `ADD IDENTIFIER FIELD` syntax. I think that if there is confusion now, then there would be more confusion for users, so maybe it is better to only support `SET IDENTIFIER FIELDS (...)`. Though I would make the parens optional.
   
   > . . . given that all Hive DDLs are in this batch format
   
   I don't think that we want to take inspiration from Hive very much. Better to use Postgres or Trino as reference points.
   
   > If we want to have a way to only add new fields without the need to know current fields, I can also add a ADD IDENTIFIER FIELDS.
   
   This wasn't my intent and I would not want to have confusion, so let's not allow using `FIELD` interchangeably.

##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
     | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH transform (AS name=identifier)? #replacePartitionField
     | ALTER TABLE multipartIdentifier WRITE writeSpec                                       #setWriteDistributionAndOrdering
+    | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '(' fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')' #setIdentifierFields

Review comment:
       Yeah, I would make it possible to use that syntax. No need for users to remember that they need parens when we don't actually need them to parse correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org