Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/25 20:29:31 UTC

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601817722



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       I think there are two use cases with conflicting expected behaviors:
   1. `ADD PARTITION FIELD bucket(id, 16) AS shard`, then `REPLACE PARTITION FIELD shard WITH bucket(id, 32)`
   2. `ADD PARTITION FIELD days(ts) AS days_col`, then `REPLACE PARTITION FIELD days_col WITH hours(ts)`
   
   For case 1, we do want `bucket(id, 32)` to keep the name `shard`, but for case 2 we don't really want the `hours(ts)` partition to be called `days_col`.
   
   So here are a couple of observations for `REPLACE transformFrom WITH transformTo`:
   1. If `transformFrom` is an expression, the default partition field name has a very specific meaning, such as `ts_days` or `id_bucket_16`, and the replacement partition field `transformTo` should not inherit that name.
   2. If the `transformFrom` partition field has a custom name, the right behavior depends on the case. The two examples above show these conflicting expectations.
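
To illustrate observation 1, here is a minimal sketch of why a default name should not be inherited. The helper `default_field_name` is hypothetical and only mirrors the naming pattern quoted above (`ts_days`, `id_bucket_16`); it is not Iceberg's actual name generator.

```python
from typing import Optional


def default_field_name(transform: str, column: str, param: Optional[int] = None) -> str:
    """Toy model (assumption): build a default partition field name
    following the pattern quoted in this comment, e.g. ts_days or id_bucket_16."""
    suffix = transform if param is None else f"{transform}_{param}"
    return f"{column}_{suffix}"


# The default name encodes the transform, so it stops being accurate
# as soon as the transform changes.
old_name = default_field_name("bucket", "id", 16)  # name of the field being replaced
new_name = default_field_name("bucket", "id", 32)  # name the replacement should get
```

Because the default name is derived from the transform itself, keeping `old_name` for the new transform would describe a bucket count that no longer applies.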
   
   So I think the safest approach is not to infer any behavior for a custom partition name. If the caller wants to keep the same name, they can specify it again with the `AS` clause, such as `REPLACE PARTITION FIELD shard WITH bucket(id, 32) AS shard`.
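
A rough sketch of the rule proposed here, modelling a partition spec as a plain dict from field name to transform text. The function `replace_partition_field` and its parameters are hypothetical and exist only to show the naming behavior; this is not the PR's implementation.

```python
from typing import Dict, Optional


def replace_partition_field(
    spec: Dict[str, str],
    old_name: str,
    new_transform: str,
    new_default_name: str,
    as_name: Optional[str] = None,
) -> Dict[str, str]:
    """Toy model (assumption): the new field is named by the explicit AS name
    if one is given, otherwise by the new transform's default name.
    The old field's name is never inherited implicitly."""
    if old_name not in spec:
        raise KeyError(f"no partition field named {old_name!r}")
    new_name = as_name if as_name is not None else new_default_name
    # Drop the old field, then add the replacement under its resolved name.
    updated = {name: t for name, t in spec.items() if name != old_name}
    if new_name in updated:
        raise ValueError(f"partition field {new_name!r} already exists")
    updated[new_name] = new_transform
    return updated


spec = {"shard": "bucket(id, 16)", "days_col": "days(ts)"}

# Case 1: the caller repeats the name with AS, so `shard` is kept.
case1 = replace_partition_field(spec, "shard", "bucket(id, 32)", "id_bucket_32", as_name="shard")

# Case 2: no AS clause, so the new field gets the default name, not `days_col`.
case2 = replace_partition_field(spec, "days_col", "hours(ts)", "ts_hours")
```

With this rule both use cases get the result they expect, and the caller decides explicitly whether a custom name carries over.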
   
   What do you think?



