Posted to issues@iceberg.apache.org by "jackye1995 (via GitHub)" <gi...@apache.org> on 2023/01/21 20:11:37 UTC

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6637: Spark: Spark SQL Extensions for create tag

jackye1995 commented on code in PR #6637:
URL: https://github.com/apache/iceberg/pull/6637#discussion_r1083330622


##########
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##########
@@ -168,34 +169,61 @@ fieldList
     ;
 
 nonReserved
-    : ADD | ALTER | AS | ASC | BY | CALL | DESC | DROP | FIELD | FIRST | LAST | NULLS | ORDERED | PARTITION | TABLE | WRITE
-    | DISTRIBUTED | LOCALLY | UNORDERED | REPLACE | WITH | IDENTIFIER_KW | FIELDS | SET
-    | TRUE | FALSE
+    : ADD | ALTER | AS | ASC | BY | CALL | CREATE | DAYS | DESC | DROP | FIELD | FIRST | HOURS | LAST | NULLS | OF | ORDERED | PARTITION | TABLE | WRITE
+    | DISTRIBUTED | LOCALLY | MINUTES | UNORDERED | REPLACE | VERSION | WITH | IDENTIFIER_KW | FIELDS | SET
+    | TAG | TRUE | FALSE
     | MAP
     ;
 
+snapshotId
+    : number
+    ;
+
+snapshotRefRetain

Review Comment:
   I overlooked this in the last PR: do we really need this extra definition?
   
   To me, it feels more intuitive to just say `RETAIN number timeUnit` instead of `RETAIN snapshotRefRetain snapshotRefRetainTimeUnit`
   
   Another thing I overlooked: it looks like the ANTLR convention is to use all capital letters for these definitions, like `TIME_UNIT` instead of `timeUnit`. I don't know whether the two forms imply different functionality, though.
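   
   For illustration, the simpler `RETAIN number timeUnit` form might look roughly like the fragment below. This is only a sketch, not the grammar in the PR: `timeUnit` is an assumed rule name, and it assumes the allowed units are the `DAYS`, `HOURS`, and `MINUTES` tokens that this diff adds to `nonReserved`.
   
   ```antlr
   // Fragment, not a complete grammar: the first line is one alternative of `statement`.
   // `timeUnit` is a hypothetical parser rule; `number` is the existing rule used by snapshotId.
       | ALTER TABLE multipartIdentifier CREATE TAG identifier (AS OF VERSION snapshotId)? (RETAIN number timeUnit)? #createTag
   
   // Assumed set of units, taken from the tokens added to nonReserved in this diff.
   timeUnit
       : DAYS | HOURS | MINUTES
       ;
   ```
   
   On the naming question: in ANTLR 4, a name that starts with an uppercase letter is a lexer rule (a token) and a name that starts with a lowercase letter is a parser rule, so a lowercase `timeUnit` would be a parser rule over the existing tokens, while `TIME_UNIT` would define a single token.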



##########
spark/v3.3/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4:
##########
@@ -73,6 +73,7 @@ statement
     | ALTER TABLE multipartIdentifier WRITE writeSpec                                       #setWriteDistributionAndOrdering
     | ALTER TABLE multipartIdentifier SET IDENTIFIER_KW FIELDS fieldList                    #setIdentifierFields
     | ALTER TABLE multipartIdentifier DROP IDENTIFIER_KW FIELDS fieldList                   #dropIdentifierFields
+    | ALTER TABLE multipartIdentifier CREATE TAG identifier (AS OF VERSION snapshotId)?  (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? #createTag

Review Comment:
   I am debating with myself whether we should merge this case with the CREATE BRANCH case.
   
   I guess this question will also come up for REPLACE BRANCH.
   
   I am currently leaning towards keeping it as is so that it is clearer: each extension does exactly one thing, similar to how we have 2 different actions for SET/DROP identifier fields and 3 for ADD/DROP/REPLACE partition field.
   
   Any thoughts? @amogh-jahagirdar @hililiwei 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

