You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/24 03:43:54 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #2365: Spark: SQL extention to update partition field atomically

jackye1995 opened a new pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365

I received some feedback from users about the current Spark SQL extension not able to directly update partition field. Currently it has to first drop and then add the new field, which (1) is not straight-forward for the common use case that updates the granularity of timestamp or bucket transform, (2) creates a time period between 2 commits that is not locked and might cause writer to write data with a wrong partition spec.

This PR introduces the syntax of `ALTER TABLE table CHANGE PARTITION FIELD transform TO transform` that drops the old transform and adds the new transform in a single commit to solve the issue above.

There is no similar syntax as reference in other systems, Delta lake took the route of directly adding or dropping the entire partition spec so I could not use that as a basis. I chose the current syntax based on the following reasons:
1. keyword `CHANGE` is chosen based on the Hive syntax of `CHANGE COLUMN col ...`, I think we might be able to reuse this keyword in the future for column DDL extensions.
2. keyword `TO` is chosen to be consistent with a similar syntax for `RENAME col TO col`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601845799



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.expressions.FieldReference
+import org.apache.spark.sql.connector.expressions.IdentityTransform
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionFieldExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends V2CommandExec {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    catalog.loadTable(ident) match {
+      case iceberg: SparkTable =>
+        val schema = iceberg.table.schema
+        transformFrom match {
+          case IdentityTransform(FieldReference(parts)) if parts.size == 1 && schema.findField(parts.head) == null =>
+            // the name is not present in the Iceberg schema, so it must be a partition field name, not a column name
+            iceberg.table.updateSpec()
+              .addField(name.orNull, Spark3Util.toIcebergTerm(transformTo))
+              .removeField(parts.head)
+              .commit()
+
+          case _ =>
+            iceberg.table.updateSpec()
+              .addField(name.orNull, Spark3Util.toIcebergTerm(transformTo))
+              .removeField(Spark3Util.toIcebergTerm(transformFrom))

Review comment:
       Yes you are correct, I added a test for it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601740984



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       `REPLACE ... TO` doesn't make sense. What about `REPLACE ... WITH` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r602639165



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       I think you make a good argument that names should not be reused automatically. If the caller wants to use the same name, then it isn't hard to put the name twice: `REPLACE shard WITH bucket(32, id) AS shard`. Then all we need to support is drop and add with the same name. I think that's simpler to implement and works well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#issuecomment-809705825


   Merged. Thanks @jackye1995!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601817722



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       I think there are 2 use cases that have contradicting behaviors:
   1. `ADD PARTITION FIELD bucket(id, 16) AS shard`, then `REPLACE PARTITION FIELD shard WITH bucket(id, 32)`
   2. `ADD PARTITION FIELD days(ts) AS days_col`, then `REPLACE PARTITION FIELD days_col WITH hours(ts)`
   
   For case 1, we do want the `bucket(id, 32)` to also be called `shard`, but we don't really want to call the `hours(ts)` partition as `days_col`. 
   
   So here are a couple of observations for `REPLACE transformFrom WITH transformTo`:
   1. if `transformFrom` is an expression, the default partition field has very specific meanings such as `ts_days`, `id_bucket_16`, and the replaced partition field `transformTo` should not inherit that name
   2. if there is a custom name for the `transformFrom` partition field, the behavior really depends. The 2 examples above shows this contradicting expectations.
   
   So I think the safest approach is to not infer the behavior for the custom partition name. If the caller wants to use the same name, just use the `AS` clause to specify it again, such as `REPLACE PARTITION FIELD shard WITH bucket(id, 32) AS shard`.
   
   What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r602639518



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ReplacePartitionField.scala
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionField(
+    table: Seq[String],
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends Command {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override def simpleString(maxFields: Int): String = {
+    s"ReplacePartitionField ${table.quoted} ${transformFrom.describe} " +
+        s"to ${name.map(n => s"$n=").getOrElse("")}${transformTo.describe}"

Review comment:
       Should this also be "with" instead of "to"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601745971



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       Looks like the implementation supports looking up the field by name, and `AS` can support rename, like `REPLACE ts_day WITH hour(ts) AS ts_hour`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#issuecomment-808603097


   This looks ready to go to me. I think there's a minor typo in the `simpleString` method to fix, but otherwise this is good to go. Thanks @jackye1995!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601746113



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.expressions.FieldReference
+import org.apache.spark.sql.connector.expressions.IdentityTransform
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionFieldExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends V2CommandExec {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    catalog.loadTable(ident) match {
+      case iceberg: SparkTable =>
+        val schema = iceberg.table.schema
+        transformFrom match {
+          case IdentityTransform(FieldReference(parts)) if parts.size == 1 && schema.findField(parts.head) == null =>
+            // the name is not present in the Iceberg schema, so it must be a partition field name, not a column name
+            iceberg.table.updateSpec()
+              .addField(name.orNull, Spark3Util.toIcebergTerm(transformTo))
+              .removeField(parts.head)
+              .commit()
+
+          case _ =>
+            iceberg.table.updateSpec()
+              .addField(name.orNull, Spark3Util.toIcebergTerm(transformTo))
+              .removeField(Spark3Util.toIcebergTerm(transformFrom))

Review comment:
       Wouldn't adding and then removing cause a duplicate in the intermediate state? The result of multiple API calls should be the same as the result of multiple commits with a single call. So I think it is safer to remove the term and then add it after.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#issuecomment-809556897


   @rdblue thanks for the review, I updated the simple string.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#issuecomment-806195038


   @rdblue yes I agree CHANGE might cause some wrong interpretation, but I don't think it is a big issue for end users, especially people who are only interacting through SQL. If they see the `alwaysNull` partition field I think people would understand it means that partition field was dropped in favor of the current one.
   
   I am good with REPLACE, it is also a part of HiveQL, and Athena also supports it for column updates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r602638628



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.expressions.FieldReference
+import org.apache.spark.sql.connector.expressions.IdentityTransform
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionFieldExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends V2CommandExec {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    catalog.loadTable(ident) match {
+      case iceberg: SparkTable =>
+        val schema = iceberg.table.schema
+        transformFrom match {
+          case IdentityTransform(FieldReference(parts)) if parts.size == 1 && schema.findField(parts.head) == null =>

Review comment:
       I just thought it might be shorter. Not a problem to have it this way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601746834



##########
File path: spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
##########
@@ -272,4 +272,76 @@ public void testDropPartitionByName() {
 
     Assert.assertEquals("Should have new spec field", expected, table.spec());
   }
+
+  @Test
+  public void testReplacePartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplacePartitionAndRename() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts) AS hour_col", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts", "hour_col")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplaceNamedPartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts) AS day_col", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts", "day_col")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD day_col TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "day_col")
+        .hour("ts")

Review comment:
       I think this should retain the original name. But then the problem is that `day_col` is reused. We'll have to rename the old field in that case, if that doesn't already happen in the spec evolution code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601741604



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ReplacePartitionField.scala
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionField(
+  table: Seq[String],

Review comment:
       Don't we normally use 2 indents for arguments in Scala?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601867050



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       Currently we strictly disallow rename -> delete renamed field -> add field with the same name in a single commit for use case 1, but I think that sounds like something we can support in `BaseUpdatePartitionSpec`, I am thinking about any consequence of it, will update later.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601744420



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.expressions.FieldReference
+import org.apache.spark.sql.connector.expressions.IdentityTransform
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionFieldExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends V2CommandExec {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    catalog.loadTable(ident) match {
+      case iceberg: SparkTable =>
+        val schema = iceberg.table.schema
+        transformFrom match {
+          case IdentityTransform(FieldReference(parts)) if parts.size == 1 && schema.findField(parts.head) == null =>

Review comment:
       Does it work to match on `FieldReference(Seq(name))` instead of checking `parts`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601818022



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ReplacePartitionField.scala
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionField(
+  table: Seq[String],

Review comment:
       Yeah sorry my IDE was configured wrongly for indentation, let me fix it. We should probably add checkstyle rules for this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601869456



##########
File path: spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
##########
@@ -272,4 +272,76 @@ public void testDropPartitionByName() {
 
     Assert.assertEquals("Should have new spec field", expected, table.spec());
   }
+
+  @Test
+  public void testReplacePartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplacePartitionAndRename() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts) AS hour_col", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts", "hour_col")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplaceNamedPartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts) AS day_col", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts", "day_col")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD day_col TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "day_col")
+        .hour("ts")

Review comment:
       Yeah l also mentioned that in an earlier comment. Let me try to add that logic then.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue merged pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue merged pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

jackye1995 commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601868739



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ReplacePartitionFieldExec.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.iceberg.spark.Spark3Util
+import org.apache.iceberg.spark.source.SparkTable
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.Identifier
+import org.apache.spark.sql.connector.catalog.TableCatalog
+import org.apache.spark.sql.connector.expressions.FieldReference
+import org.apache.spark.sql.connector.expressions.IdentityTransform
+import org.apache.spark.sql.connector.expressions.Transform
+
+case class ReplacePartitionFieldExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    transformFrom: Transform,
+    transformTo: Transform,
+    name: Option[String]) extends V2CommandExec {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override lazy val output: Seq[Attribute] = Nil
+
+  override protected def run(): Seq[InternalRow] = {
+    catalog.loadTable(ident) match {
+      case iceberg: SparkTable =>
+        val schema = iceberg.table.schema
+        transformFrom match {
+          case IdentityTransform(FieldReference(parts)) if parts.size == 1 && schema.findField(parts.head) == null =>

Review comment:
       The current approach is used to be consistent with https://github.com/apache/iceberg/blob/master/spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropPartitionFieldExec.scala#L45, is there any consideration to deviate from that logic?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r602639165



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       I think you make a good argument that names should not be reused. If the caller wants to use the same name, then it isn't hard to put the name twice: `REPLACE shard WITH bucket(32, id) AS shard`. Then all we need to support is drop and add with the same name. I think that's simpler to implement and works well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601746834



##########
File path: spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
##########
@@ -272,4 +272,76 @@ public void testDropPartitionByName() {
 
     Assert.assertEquals("Should have new spec field", expected, table.spec());
   }
+
+  @Test
+  public void testReplacePartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplacePartitionAndRename() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts) AS hour_col", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts", "hour_col")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplaceNamedPartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts) AS day_col", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts", "day_col")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD day_col TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "day_col")
+        .hour("ts")

Review comment:
       I think this should retain the original name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601746464



##########
File path: spark3-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java
##########
@@ -272,4 +272,76 @@ public void testDropPartitionByName() {
 
     Assert.assertEquals("Should have new spec field", expected, table.spec());
   }
+
+  @Test
+  public void testReplacePartition() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts)", tableName);
+    table.refresh();
+    expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(2)
+        .alwaysNull("ts", "ts_day")
+        .hour("ts")
+        .build();
+    Assert.assertEquals("Should changed from daily to hourly partitioned field", expected, table.spec());
+  }
+
+  @Test
+  public void testReplacePartitionAndRename() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, category string, ts timestamp, data string) USING iceberg", tableName);
+    Table table = validationCatalog.loadTable(tableIdent);
+    Assert.assertTrue("Table should start unpartitioned", table.spec().isUnpartitioned());
+
+    sql("ALTER TABLE %s ADD PARTITION FIELD days(ts)", tableName);
+    table.refresh();
+    PartitionSpec expected = PartitionSpec.builderFor(table.schema())
+        .withSpecId(1)
+        .day("ts")
+        .build();
+    Assert.assertEquals("Should have new spec field", expected, table.spec());
+
+    sql("ALTER TABLE %s REPLACE PARTITION FIELD days(ts) TO hours(ts) AS hour_col", tableName);

Review comment:
       Good to see a test for the `AS` use case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#discussion_r601743256



##########
File path: spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -69,6 +69,7 @@ statement
     : CALL multipartIdentifier '(' (callArgument (',' callArgument)*)? ')'                  #call
     | ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)?   #addPartitionField
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform                        #dropPartitionField
+    | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform TO transform (AS name=identifier)? #replacePartitionField

Review comment:
       Also, why support `AS`? Shouldn't the new partition field use the same name as the old partition field?
   
   I would also expect using a partition field name to work. So if I used `bucket(16, id) AS shard` to create the partition, then I should also be able to use `REPLACE shard WITH bucket(32, id)`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #2365: Spark: SQL extention to update partition field atomically

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365#issuecomment-806103048


   I considered adding similar syntax when I added the ADD and DROP DDL. I think the problem with this is that it implies that the original field is changed, rather than dropped and added. In format v2, we can sort of do that by replacing the field and using the previous field's name, but even then the values from the old field will no longer appear. In v1, a drop actually replaces the field with a `void` or `alwaysNull` transform and adds a new field to the end of the spec. It seems to me like that would be confusing behavior for `CHANGE`.
   
   One option to fix this is to use `REPLACE` instead, which I think doesn't imply that the field itself is still there, but updated or changed. But then the syntax is further away from column DDL so I'm not sure if that's a good idea. What do you think, @jackye1995, @aokolnychyi, @yyanyy?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org