You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/15 07:40:50 UTC

[GitHub] [incubator-iceberg] jun-he opened a new pull request #922: [WIP] Add partition spec evolution

jun-he opened a new pull request #922: [WIP] Add partition spec evolution
URL: https://github.com/apache/incubator-iceberg/pull/922
 
 
   For issue #281 and #836.
   
   This PR requires the change of https://github.com/apache/incubator-iceberg/pull/845.
   
   TODO
    [ ] add additional unit tests
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449922306



##########
File path: core/src/main/java/org/apache/iceberg/TableMetadata.java
##########
@@ -429,11 +429,16 @@ public TableMetadata updatePartitionSpec(PartitionSpec newPartitionSpec) {
       }
     }
 
-    Preconditions.checkArgument(defaultSpecId != newDefaultSpecId,
-        "Cannot set default partition spec to the current default");
+    // Always setting default partition spec to the new partition spec
+    ImmutableList.Builder<PartitionSpec> builder = ImmutableList.builder();
+    for (PartitionSpec spec : specs) {
+      if (spec.specId() == newDefaultSpecId) {
+        builder.add(freshSpec(newDefaultSpecId, schema, newPartitionSpec));
+      } else {
+        builder.add(spec);
+      }
+    }

Review comment:
       I'm not sure that this is the right change. I think it's a good idea to allow calling this method with the same spec, but that should be detected and should return the exact same TableMetadata. That way, the commit will appear to succeed, but won't actually change the table (no-op commits are used elsewhere, too).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441203899



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecParser.java
##########
@@ -40,26 +55,24 @@ public void testToJsonForV1Table() {
         "}";
     Assert.assertEquals(expected, PartitionSpecParser.toJson(table.spec(), true));
 
-    PartitionSpec spec = PartitionSpec.builderFor(table.schema())
-        .bucket("id", 8)
-        .bucket("data", 16)
-        .build();
-
-    table.ops().commit(table.ops().current(), table.ops().current().updatePartitionSpec(spec));
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();

Review comment:
       Why does this test use a table? It would make more sense to me if it created a custom partition spec (using add to set field IDs directly) and checked the resulting JSON string, without using update or a table at all.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435444523



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {

Review comment:
       I'm not sure it makes much sense to include this. When would it be reasonable to use it?
   
   The most common replacement case I can think of is to change the granularity of a transform. For example, `days` to `hours` or `truncate[10]` to `truncate[5]`. In the first case, it wouldn't make sense to use this because the name is something like `ts_day`, which would be misleading if used for hours. For the second case, although the partition values are comparable, it would still be confusing to mix together bins with different widths.
   
   I can't think of a case where we would want to use the same partition field ID with a different transform, other than deleting a transform in v1 by replacing it with `alwaysNull`. So I don't think we need the `replaceField` method, at least not right now. Let's get this in without the feature and we can add it later if there is a use case for it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-643828752


   @rdblue Thanks for the comments and I updated the PR accordingly.
   
   In the PR, I still keep the lazily reassign the partition field ids if needed during commit instead of tracking field Id during the add fields. The main reason is to reduce the duplicate code between `PartitionSpec` and `PartitionSpecUpdate`. 
   
   If completely maintaining a List of PartitionField here, we then have to keep the duplicate logic in `PartitionSpec.Builder`, e.g. from `sourceName` to `sourceId`, from method name to get `Transform`, from source name to get the default partition name.
   
   Instead, I maintain a list of `PartitionField` for the fields from the current spec and then maintain a List of `Consumer<PartitionSpec.Builder>` for the newly added fields.
   
   Please take a look and let me know your comments. Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #922:
URL: https://github.com/apache/iceberg/pull/922


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435435718



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());

Review comment:
       I don't think this should use the builder to keep track of state. That makes the builder more complicated and leaks concerns from this class to it. This class should be responsible for applying changes and should use the builder to create a new `PartitionSpec` instance without keeping the state from this class in the builder.
   
   I think it makes the most sense to keep track of a list of fields here. The `PartitionField` constructor is accessible, so you can maintain `List<PartitionField>` and make changes to it as update methods are called.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r443993736



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();

Review comment:
       Thanks for the comment. This is true for V1 schema, where we still keep the removed field in the spec. For V2, it won't cause duplication as the pending new field will be exactly the same as the pending removed field (e.g. same field ID) and then equivalent to keep the original untouched.
   
   One case is that if there are a lot of updates for the spec, users might choose to clear it and then add new partition fields. But this seems to bring more troubles than the convenience it offers. So I am thinking we should just remove `clear()` method. If users really want to clear all, they can remove all fields one by one. Any comments on it?
   
   I will add a validation check for the pending changes if users remove a field and then add it back, or add a field and then remove it.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r443987622



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));

Review comment:
       Thanks for the comment. Yep, it handles that situation as the computation loops over all the partition specs in the history. I will add a test for this case.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441190101



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());

Review comment:
       If I'm reading this correctly, then either `id_bucket` or `data_bucket` is assigned ID 1000, depending on the table's format version. But 1000 was the ID assigned to the original data_bucket field and it should not be reused. That would break the `files` table by mixing `id_bucket` and the original `data_bucket` values together.
   
   New fields should always result in new IDs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441199740



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")
+        .addBucketField("id", 8)
+        .addBucketField("data", 6, "data_field")
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "  1001: data_partition: bucket[8](2)\n" +
+        "  1002: id_bucket: bucket[8](1)\n" +
+        "  1003: data_field: bucket[6](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1003, table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testRenameField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    table.updateSpec()
+        .renameField("data_bucket", "data_partition")
+        .addBucketField("id", 8)
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_partition: bucket[16](2)\n" +
+        "  1001: id_bucket: bucket[8](1)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1001, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(1, table.spec().specId());
+  }
+
+  @Test
+  public void testRenameFieldExceptions() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if only renaming a partition field",
+        IllegalArgumentException.class,
+        "Cannot set default partition spec to the current default",
+        () -> table.updateSpec()
+            .renameField("data_bucket", "data_partition")

Review comment:
       I think this should be allowed. We should just rebuild the spec with a different name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r443993923



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },

Review comment:
       👍  will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-665424558


   @rdblue can you let me know your comments? Thanks.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441197362



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")
+        .addBucketField("id", 8)
+        .addBucketField("data", 6, "data_field")
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "  1001: data_partition: bucket[8](2)\n" +
+        "  1002: id_bucket: bucket[8](1)\n" +
+        "  1003: data_field: bucket[6](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1003, table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testRenameField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    table.updateSpec()
+        .renameField("data_bucket", "data_partition")

Review comment:
       I would expect rename to change the field's name, without reordering fields in the partition spec. Can you update this test to cover that case?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #922: [WIP] Add partition spec evolution
URL: https://github.com/apache/incubator-iceberg/pull/922#issuecomment-614324838
 
 
   Thanks for working on this, @jun-he!
   
   Besides my comment about replacing the entire spec, I have two other main concerns. First, I think we might want to consider a change-based API, which is related to my comment about not replacing the whole spec. For updating schemas, the API expresses things like addColumn, renameColumn, etc. that correspond to SQL DDL statements. I think it would make sense to have a similar API here:
   * `addField(String source, ...)`
   * `renameField(String name, String newName)`
   * `replaceField(String name, ...)`
   * `removeField(String name)`
   
   I'm not sure how we should pass the transform to `addField` and `replaceField` just yet. We have expressions for the various transforms, so we could use an expression like `addField("ts_hour", hour("ts"))`. We could also add a lot more variations of the method names, like `addHourField`, `addDayField`, etc.
   
   Second, I think we are going to need a v1 version and a v2 version. In v1, we have to maintain the order and number of fields, so we are more limited for partition spec evolution: we can rename partition fields, replace existing partition fields with the `void` transform to "remove" them, and add new partition fields at the end of the spec. But in v2, we have reliable IDs so we can actually remove fields, replace a field with a different transform (drop days and add hours in the same position), also rename and add partition fields. I think the easiest thing is to build separate implementations that validate and apply those changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449921239



##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -241,11 +263,45 @@ public void testAddPartitionFieldsWithAndWithoutFieldIds() {
         .add(1, "id_partition2", "bucket[5]")
         .add(1, 1005, "id_partition1", "bucket[4]")
         .truncate("s", 1, "custom_truncate")
+        .add(1, 1002, "id_partition3", "bucket[3]")
         .build();
 
     Assert.assertEquals(1000, spec.fields().get(0).fieldId());
     Assert.assertEquals(1005, spec.fields().get(1).fieldId());
     Assert.assertEquals(1006, spec.fields().get(2).fieldId());
+    Assert.assertEquals(1002, spec.fields().get(3).fieldId());
     Assert.assertEquals(1006, spec.lastAssignedFieldId());
   }
+
+  @Test
+  public void testAddPartitionFieldsWithInvalidFieldId() {
+    AssertHelpers.assertThrows("Should detect invalid duplicate field id",
+        IllegalArgumentException.class,
+        "Cannot add a partition that duplicates another within",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .add(1, "id_partition2", "bucket[5]")
+            .add(1, 1005, "id_partition1", "bucket[4]")
+            .add(1, 1005, "id_partition3", "bucket[3]")
+            .build());
+  }
+
+  @Test
+  public void testMultipleBucketPartitions() {
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .bucket("id", 8, "bucket")
+            .bucket("s", 16, "bucket").build());
+
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot add redundant partition",
+        () -> PartitionSpec.builderFor(SCHEMA).bucket("id", 8).bucket("id", 16).build());
+
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot add redundant partition",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .bucket("id", 8, "id_bucket1")
+            .bucket("id", 16, "id_bucket2").build());
+

Review comment:
       Nit: unnecessary empty line.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441205985



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = new ArrayList<>();
+  private final Map<String, PartitionField> curSpecFields;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.curSpecFields = base.spec().fields().stream().collect(
+        Collectors.toMap(
+            PartitionField::name,
+            Function.identity(),
+            (n1, n2) -> {
+              throw new IllegalStateException(String.format("Duplicate partition field found: %s", n1));
+            },
+            LinkedHashMap::new

Review comment:
       Using a `LinkedHashMap` doesn't quite preserve ordering because it uses insertion order and `put` adds the new entry to the end, right? I think you may instead want to use an array or `ArrayList` and a name-to-index map.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r443993736



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();

Review comment:
       Thanks for the comment. This is true for V1 schema, where we still keep the removed field in the spec. For V2, it won't cause duplication as the pending new field will be exactly the same as the pending removed field (e.g. same field ID) and then equivalent to keep the original untouched.
   
   One case is that if there are a lot of updates for the spec, users might choose to clear it and then add new partition fields. But this seems to bring more troubles than the convenience it offers. So I am thinking we should just remove `clear()` method. If users really want to clear all, they can remove all fields one by one. Any comments on it?
   
   I will add a validation check for the pending changes if users remove a field and then add it back.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r443999375



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")

Review comment:
       @rdblue Thanks for the comments. One concern is that this change may break the existing table's partition spec if users already have created a table with a partition spec using multiple bucket functions for the same table column. 
   
   We may just ignore existing violations and just enforce it for the new or updated specs by adding the check inside `bucket` method in `PartitionSpec.Builder`.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449922830



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -346,11 +346,34 @@ private void checkAndAddPartitionName(String name, Integer identitySourceColumnI
       partitionNames.add(name);
     }
 
+    private String getDedupKey(PartitionField field) {
+      String transform = field.transform().toString();
+      String type;
+      if ("hour".equalsIgnoreCase(transform) || "day".equalsIgnoreCase(transform) ||
+          "month".equalsIgnoreCase(transform) || "year".equalsIgnoreCase(transform)) {
+        type = "time";
+      } else if (transform.startsWith("bucket[")) {
+        type = "bucket";
+      } else {
+        return null;

Review comment:
       Shouldn't this use the transform instead? Otherwise, `truncate` would conflict with `null`, because the key is the same, right?
   
   Also, should we similarly add a case for `truncate` with different lengths?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441206849



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));

Review comment:
       I think you need to add a test case where the current spec's highest ID is lower than another spec's highest ID. We can't reuse IDs if the user removes a field, commits, and starts a new change to add a new field.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449922937



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -346,11 +346,34 @@ private void checkAndAddPartitionName(String name, Integer identitySourceColumnI
       partitionNames.add(name);
     }
 
+    private String getDedupKey(PartitionField field) {
+      String transform = field.transform().toString();
+      String type;
+      if ("hour".equalsIgnoreCase(transform) || "day".equalsIgnoreCase(transform) ||
+          "month".equalsIgnoreCase(transform) || "year".equalsIgnoreCase(transform)) {
+        type = "time";
+      } else if (transform.startsWith("bucket[")) {
+        type = "bucket";
+      } else {
+        return null;

Review comment:
       What about using the transform name in this situation? Then this would always catch duplicate transforms.
   
   Also, should we add a case for `truncate` with different lengths?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r434955978



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder = null;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec newSpec() {

Review comment:
       👌 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441147078



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());

Review comment:
       `PartitionSpec` implements `equals` so you should be able to use it in assertions. I updated this test and this works:
   
   ```java
       PartitionSpec initialSpec = PartitionSpec.builderFor(table.schema()).bucket("data", 16).build();
       Assert.assertEquals("Should use the expected initial spec", initialSpec, table.spec());
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441186392



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },

Review comment:
       I would prefer not to put expected IDs in parameters. This makes it difficult to read and understand the test cases. I find myself going back and forth and counting indexes to validate that the cases look correct.
   
   Instead, when cases differ I'd like to use `V1Assert.assertEquals` and `V2Assert.assertEquals`, which are only enforced if the format version is v1 or v2.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435440600



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);

Review comment:
       `PartitionSpecBuilder.add` is not public. It is used to reconstruct specs when deserializing, but should not be exposed to users. Similarly, users should not interact with IDs and transform strings through this API.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441199360



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)

Review comment:
       The duplicate partition detection should take precedence over duplicate name detection. In other words, I would expect this case to fail because there is already a bucket field for data, not because of the name duplication.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922: [WIP] Add partition spec evolution
URL: https://github.com/apache/incubator-iceberg/pull/922#discussion_r409170299
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
 ##########
 @@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /** Update the current partition spec to a new partition spec.
 
 Review comment:
   Nit: Most javadoc starts on the next line.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435431921



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata

Review comment:
       What does this line mean? Sounds like an internal detail to me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441204336



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecInfo.java
##########
@@ -91,25 +94,27 @@ public void testSpecInfoPartitionedTable() {
   }
 
   @Test
-  public void testSpecInfoPartitionSpecEvolutionForV1Table() {
+  public void testSpecInfoPartitionSpecEvolution() {
     PartitionSpec spec = PartitionSpec.builderFor(schema)
         .bucket("data", 4)
         .build();
     TestTables.TestTable table = TestTables.create(tableDir, "test", schema, spec, formatVersion);
 
     Assert.assertEquals(spec, table.spec());
 
-    TableMetadata base = TestTables.readMetadata("test");
-    PartitionSpec newSpec = PartitionSpec.builderFor(table.schema())
-        .bucket("data", 10)
+    PartitionSpec expectedNewSpec = PartitionSpec.builderFor(table.schema())
+        .add(2, newFieldId, "data_bucket", "bucket[10]")
         .withSpecId(1)
         .build();
-    table.ops().commit(base, base.updatePartitionSpec(newSpec));
 
-    Assert.assertEquals(newSpec, table.spec());
-    Assert.assertEquals(newSpec, table.specs().get(newSpec.specId()));
+    table.updateSpec().clear()

Review comment:
       Using `clear` should be rare. Most cases should remove a single field and add another field.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he edited a comment on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he edited a comment on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-650829640


   @rdblue  thanks for the comment. I address the comments and update the PR accordingly.
   In this change, `UpdatePartitionSpec` now ignores all soft-deleted fields in `V1` and then applies all changes to create a `V2` partition spec. After that, it then fill each of partition filed id gaps with a `VoidTransform` field.
   
   In this way, we can also support partition spec evolution, e.g.
   - reuse a field id previously removed in V1
   - keep spec concise without useless trailing void fields if needed
   
   Additionally, it does not need to consider `V1` case before committing and make the code cleaner and easier to read. 
   
   Can you please take another look and let me know your comments? Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he edited a comment on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he edited a comment on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-650829640


   @rdblue  thanks for the comment. I address the comments and update the PR accordingly.
   In this change, `UpdatePartitionSpec` now ignores all soft-deleted fields in `V1` and then applies all changes to create a `V2` partition spec. After that, it then fill each of partition filed id gaps with a `VoidTransform` field.
   
   In this way, we can also support partition spec evolution and reuse field Id in V1, including
   - reuse a field id previously removed in V1
   - keep spec concise without useless trailing void fields if needed
   
   Additionally, it does not need to consider `V1` case before committing and make the code cleaner and easier to read. 
   
   Can you please take another look and let me know your comments? Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441202921



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {

Review comment:
       This would also remove the need to make assertions about the table's current spec in every test case. Instead, you'd just pass in the spec you want to make changes to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r450017686



##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -38,6 +38,9 @@
   public void testMultipleTimestampPartitions() {
     AssertHelpers.assertThrows("Should not allow year(ts) and year(ts)",
         IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA).year("ts", "year").year("another_ts", "year").build());

Review comment:
       I think this change is the side effect if the duplicate partition detection takes precedence over duplicate name detection as discussed in https://github.com/apache/iceberg/pull/922#discussion_r441199360.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-650829640


   @rdblue  thanks for the comment. I address the comments and update the PR accordingly.
   In this change, `UpdatePartitionSpec` now ignores all soft-deleted fields in `V1` and then applies all changes to create a `V2` partition spec. After that, it then fill each of partition filed id gaps with a `VoidTransform` field.
   In this way, it does not need to consider `V1` case before committing and make the code cleaner and easier to read.
   
   Can you please take another look and let me know your comments? Thanks!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435445742



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));
+    ops.commit(base, update);
+  }
+
+  private PartitionSpec freshSpecFieldIds(PartitionSpec partitionSpec) {

Review comment:
       This won't be necessary if this class keeps track of its own state and assigns IDs internally.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-638997943


   Looks like the test failure is due to importing Guava classes instead of the relocated versions. We now rely on a bundled and relocated version of Guava, introduced in #1068.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435431601



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.

Review comment:
       This won't create a new spec because the field IDs are still limited by the existing spec. I'd prefer to just say that this will remove all existing fields.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449921994



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private static final String SOFT_DELETE_POSTFIX = "__[removed]";
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<PartitionSpec> specs;
+  private final Schema schema;
+  private final Map<String, PartitionField> curSpecFields;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = Lists.newArrayList();
+  private final Map<String, PartitionField> newRemovedFields = Maps.newHashMap();

Review comment:
       I think the way that this operation keeps track of state makes it very confusing to read and understand. I can see the appeal of trying to just delegating to the spec builder, but the current approach requires complicated checks after the builder is configured (like `checkIfRemoved` on line 97) and methods that need to rewrite the entire spec anyway (like `freshSpecFieldIds` and `fillGapsByNonNullFields`).
   
   I think that this would be easier to understand if it did validation and assignment incrementally. Instead of adding a method to rewrite the spec with different field IDs, I think this should check for deleted fields when the configuration method is called and use the correct ID from the start. This should help avoid the need for work-arounds like `getLastPartitionField` in the spec builder.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435432470



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata
+   *
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec clear();
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.

Review comment:
       I think it's okay to leave this out. I think it is clear enough that an unspecified name is given a default value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441193557



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")

Review comment:
       I think we should have a validation that rejects multiple bucket functions applied to the same table column. It doesn't make sense to do this.
   
   I should note that there are some cases where you'd want to. Moving from `bucket[16]` to `bucket[32]`, for example, makes sense because the bucket function ensures that values are split between 2 buckets when moving from 16 to 32. So you could keep both in metadata without harm. But that isn't the case generally: if you had `bucket[7]` along with `bucket[11]`, you'd end up with 77 different combinations. If that's what the user wants, then they should use `bucket[77]`, not two separate bucket functions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446415256



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")
+        .addBucketField("id", 8)
+        .addBucketField("data", 6, "data_field")
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "  1001: data_partition: bucket[8](2)\n" +
+        "  1002: id_bucket: bucket[8](1)\n" +
+        "  1003: data_field: bucket[6](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1003, table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testRenameField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    table.updateSpec()
+        .renameField("data_bucket", "data_partition")

Review comment:
       👌  switch the order of `renameField` and `addBucketField` lines will cover this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446490897



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {

Review comment:
       I moved some tests to a new test `TestTablePartitionSpecUpdate`, where it needs the table for tests related to table metadata, e.g. version, spec history, specId, etc.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449922630



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -405,11 +428,11 @@ public Builder month(String sourceName) {
     }
 
     public Builder day(String sourceName, String targetName) {
-      checkAndAddPartitionName(targetName);
       Types.NestedField sourceColumn = findSourceColumn(sourceName);
       PartitionField field = new PartitionField(
           sourceColumn.fieldId(), nextFieldId(), targetName, Transforms.day(sourceColumn.type()));
       checkForRedundantPartitions(field);
+      checkAndAddPartitionName(targetName);

Review comment:
       I think this was needed because of my comment that the duplicate field should throw an exception before the duplicate name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441190101



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());

Review comment:
       If I'm reading this correctly, then either `id_bucket` or `data_bucket` is assigned ID 1000, depending on the table's format version. But 1000 was the ID assigned to the original data_bucket field and it should not be reused. That would break the `files` table by mixing `id_bucket` and the original `data_bucket` values together.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435429629



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata
+   *
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec clear();
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName);
+
+  /**
+   * Add a new partition field with year transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addYearField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with year transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.
+   * The partition field id is automatically assigned and will be updated during the commit.

Review comment:
       Callers don't need to worry about field IDs, so we can remove these comments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435445435



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));

Review comment:
       Should we also track `lastAssignedPartitionId` in metadata?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449921205



##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -38,6 +38,9 @@
   public void testMultipleTimestampPartitions() {
     AssertHelpers.assertThrows("Should not allow year(ts) and year(ts)",
         IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA).year("ts", "year").year("another_ts", "year").build());

Review comment:
       The context for these checks is incorrect. The problem is not a duplicate transform for a field, it is that the name is reused.
   
   Also, these should not be in this test case for multiple timestamp partitions. These should be in a test case for duplicate names.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441207704



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = new ArrayList<>();
+  private final Map<String, PartitionField> curSpecFields;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.curSpecFields = base.spec().fields().stream().collect(
+        Collectors.toMap(
+            PartitionField::name,
+            Function.identity(),
+            (n1, n2) -> {
+              throw new IllegalStateException(String.format("Duplicate partition field found: %s", n1));
+            },
+            LinkedHashMap::new
+        )
+    );
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    PartitionSpec.Builder specBuilder = PartitionSpec.builderFor(base.schema());
+    curSpecFields.values().forEach(field ->
+        specBuilder.add(
+            field.sourceId(),
+            field.fieldId(),
+            field.name(),
+            field.transform().toString())
+    );
+    newSpecFields.forEach(c -> c.accept(specBuilder));
+    return specBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    newSpecFields.clear();
+    curSpecFields.clear();
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecFields.add(builder -> builder.identity(sourceName, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecFields.add(builder -> builder.identity(sourceName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecFields.add(builder -> builder.year(sourceName, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecFields.add(builder -> builder.year(sourceName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecFields.add(builder -> builder.month(sourceName, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecFields.add(builder -> builder.month(sourceName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecFields.add(builder -> builder.day(sourceName, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecFields.add(builder -> builder.day(sourceName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecFields.add(builder -> builder.hour(sourceName, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecFields.add(builder -> builder.hour(sourceName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecFields.add(builder -> builder.bucket(sourceName, numBuckets, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecFields.add(builder -> builder.bucket(sourceName, numBuckets));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecFields.add(builder -> builder.truncate(sourceName, width, targetName));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecFields.add(builder -> builder.truncate(sourceName, width));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    Preconditions.checkArgument(curSpecFields.containsKey(name),
+        "Cannot find an existing partition field with the name: %s", name);
+    Preconditions.checkArgument(newName != null && !newName.isEmpty(),
+        "Cannot use empty or null partition name: %s", newName);
+    Preconditions.checkArgument(!curSpecFields.containsKey(newName),
+        "Cannot use partition name more than once: %s", newName);
+
+    PartitionField field = curSpecFields.remove(name);
+    curSpecFields.put(newName,
+        new PartitionField(field.sourceId(), field.fieldId(), newName, field.transform()));
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    Preconditions.checkArgument(curSpecFields.containsKey(name),
+        "Cannot find an existing partition field with the name: %s", name);
+    if (base.formatVersion() == 1) {
+      PartitionField field = curSpecFields.remove(name);
+      String newName = field.name() + "_removed"; // rename it for soft delete
+      curSpecFields.put(newName,
+          new PartitionField(field.sourceId(), field.fieldId(), newName, Transforms.alwaysNull()));
+    } else {
+      curSpecFields.remove(name);
+    }
+    return this;
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));
+    ops.commit(base, update);
+  }
+
+  private PartitionSpec freshSpecFieldIds(PartitionSpec partitionSpec) {
+    if (base.formatVersion() == 1) {
+      return partitionSpec;
+    }
+
+    int lastAssignedFieldId = 0;
+    Map<String, Integer> partitionFieldIdByName = new HashMap<>();
+    for (PartitionSpec spec : base.specs()) {
+      for (PartitionField field : spec.fields()) {
+        partitionFieldIdByName.put(getKey(field), field.fieldId());
+      }
+      lastAssignedFieldId = Math.max(lastAssignedFieldId, spec.lastAssignedFieldId());
+    }
+
+    PartitionSpec.Builder specBuilder = PartitionSpec.builderFor(base.schema())
+        .withSpecId(partitionSpec.specId());
+
+    for (PartitionField field : partitionSpec.fields()) {
+      int assignedFieldId = partitionFieldIdByName.containsKey(getKey(field)) ?
+          partitionFieldIdByName.get(getKey(field)) : ++lastAssignedFieldId;

Review comment:
       We don't use the return value of a `++` expression to avoid confusion. Incrementing should happen on a separate line, either before or after the value is used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441198522



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {

Review comment:
       I don't think that this test should extend `TableTestBase`. Instead, I think it should be like `TestSchemaUpdate`, which instantiates `SchemaUpdate` directly by passing in the schema to update and the highest assigned ID. We don't need to test the commit mechanism, just how updates are applied to the base `PartitionSpec`. That would also make it easier to handle v1 and v2 tests, instead of using `V1Assert` and `V2Assert`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922: [WIP] Add partition spec evolution
URL: https://github.com/apache/incubator-iceberg/pull/922#discussion_r409172592
 
 

 ##########
 File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
 ##########
 @@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /** Update the current partition spec to a new partition spec.
+   * <p>
+   * Partition field IDs of the new partitionSpec may be updated during the commit.
+   *
+   * @param partitionSpec new partition spec to update
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec update(PartitionSpec partitionSpec);
 
 Review comment:
   Exposing a method like this requires more validation work because we have to compare the two partition specs by detecting the changes to each field (by ID) and ensuring those changes are allowed. Ensuring only allowed changes are accepted is easier if the user tells Iceberg their intent using methods like `addIdentity(String name)`.
   
   This method would also be difficult for callers to use correctly. The public `PartitionSpec` builder API doesn't allow setting custom field IDs. We could expose custom field IDs, but that's more complication for the user to worry about. We could also reassign the IDs if the fields look equivalent, but we don't really know if that's correct.
   
   Overall, I think it will be easier for maintainers and users to remove this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441201705



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());

Review comment:
       I think the problem is that `clear` is not replacing the existing fields with `alwaysNull` as a placeholder for the existing fields.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441206416



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = new ArrayList<>();
+  private final Map<String, PartitionField> curSpecFields;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.curSpecFields = base.spec().fields().stream().collect(
+        Collectors.toMap(
+            PartitionField::name,
+            Function.identity(),
+            (n1, n2) -> {
+              throw new IllegalStateException(String.format("Duplicate partition field found: %s", n1));
+            },
+            LinkedHashMap::new
+        )
+    );
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    PartitionSpec.Builder specBuilder = PartitionSpec.builderFor(base.schema());
+    curSpecFields.values().forEach(field ->
+        specBuilder.add(
+            field.sourceId(),
+            field.fieldId(),
+            field.name(),
+            field.transform().toString())
+    );
+    newSpecFields.forEach(c -> c.accept(specBuilder));
+    return specBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    newSpecFields.clear();
+    curSpecFields.clear();

Review comment:
       This should call `removeField` for every current field so that v1 placeholders are inserted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441153291



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();

Review comment:
       I'm not sure this change makes sense. Why would someone drop the `bucket_16(data)` field only to add it back? I think this should be detected as an error because the field is compatible with a previous field.
   
   If we don't detect this case and throw an exception, then it would be easy for a user to create two fields with the exact same configuration, leading to needless and confusing duplication in the metadata tables.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441199740



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")
+        .addBucketField("id", 8)
+        .addBucketField("data", 6, "data_field")
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "  1001: data_partition: bucket[8](2)\n" +
+        "  1002: id_bucket: bucket[8](1)\n" +
+        "  1003: data_field: bucket[6](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1003, table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testRenameField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    table.updateSpec()
+        .renameField("data_bucket", "data_partition")
+        .addBucketField("id", 8)
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_partition: bucket[16](2)\n" +
+        "  1001: id_bucket: bucket[8](1)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1001, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(1, table.spec().specId());
+  }
+
+  @Test
+  public void testRenameFieldExceptions() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if only renaming a partition field",
+        IllegalArgumentException.class,
+        "Cannot set default partition spec to the current default",
+        () -> table.updateSpec()
+            .renameField("data_bucket", "data_partition")

Review comment:
       I think this should be allowed. We should just rebuild the spec with a different name. We could also use the same spec ID, since it is compatible with the old spec.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r453247380



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecParser.java
##########
@@ -19,16 +19,20 @@
 
 package org.apache.iceberg;
 
+import org.apache.iceberg.types.Types;
 import org.junit.Assert;
 import org.junit.Test;
 
-public class TestPartitionSpecParser extends TableTestBase {
-  public TestPartitionSpecParser() {
-    super(1);
-  }
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class TestPartitionSpecParser {
+  private static final Schema SCHEMA = new Schema(
+      required(1, "id", Types.IntegerType.get()),
+      required(2, "data", Types.StringType.get())
+  );
 
   @Test
-  public void testToJsonForV1Table() {
+  public void testToJson() {

Review comment:
       Thanks for the comment. This test was updated because `TableTestBase` was removed. There is no new test cases added into it..
   I updated the test name because, with the current implementation, this test is not specific for `V1 Table` and there is no special `toJson` handling needed for V2 table.

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private static final String SOFT_DELETE_POSTFIX = "__[removed]";
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<PartitionSpec> specs;
+  private final Schema schema;
+  private final Map<String, PartitionField> curSpecFields;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = Lists.newArrayList();
+  private final Map<String, PartitionField> newRemovedFields = Maps.newHashMap();

Review comment:
       @rdblue Thanks for the comments. I think actually we can remove `newRemovedFields `, `checkIfRemoved `, and `getLastPartitionField ` as there is no need to check if a newly added field is just removed.
   As we discussed in https://github.com/apache/iceberg/pull/922#discussion_r441153291, the main concern for removing a field and then adding it back is to pollute the metadata.
   In the current implementation, it won't add duplications and then equivalent to do nothing or rename the field because the partition field ID reuse. 
   I added a few unit tests for those cases.

##########
File path: core/src/main/java/org/apache/iceberg/TableMetadata.java
##########
@@ -429,11 +429,16 @@ public TableMetadata updatePartitionSpec(PartitionSpec newPartitionSpec) {
       }
     }
 
-    Preconditions.checkArgument(defaultSpecId != newDefaultSpecId,
-        "Cannot set default partition spec to the current default");
+    // Always setting default partition spec to the new partition spec
+    ImmutableList.Builder<PartitionSpec> builder = ImmutableList.builder();
+    for (PartitionSpec spec : specs) {
+      if (spec.specId() == newDefaultSpecId) {
+        builder.add(freshSpec(newDefaultSpecId, schema, newPartitionSpec));
+      } else {
+        builder.add(spec);
+      }
+    }

Review comment:
       @rdblue Thanks for the comment. It is a good idea to support noop commit.
   The main reason to remove the precondition check here is to support rename a partition field. In renaming case, the partition spec is compatible but it is different and need to be committed.
   Updated the code to support no-op commit and added unit tests.

##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -241,11 +263,45 @@ public void testAddPartitionFieldsWithAndWithoutFieldIds() {
         .add(1, "id_partition2", "bucket[5]")
         .add(1, 1005, "id_partition1", "bucket[4]")
         .truncate("s", 1, "custom_truncate")
+        .add(1, 1002, "id_partition3", "bucket[3]")
         .build();
 
     Assert.assertEquals(1000, spec.fields().get(0).fieldId());
     Assert.assertEquals(1005, spec.fields().get(1).fieldId());
     Assert.assertEquals(1006, spec.fields().get(2).fieldId());
+    Assert.assertEquals(1002, spec.fields().get(3).fieldId());
     Assert.assertEquals(1006, spec.lastAssignedFieldId());
   }
+
+  @Test
+  public void testAddPartitionFieldsWithInvalidFieldId() {
+    AssertHelpers.assertThrows("Should detect invalid duplicate field id",
+        IllegalArgumentException.class,
+        "Cannot add a partition that duplicates another within",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .add(1, "id_partition2", "bucket[5]")
+            .add(1, 1005, "id_partition1", "bucket[4]")
+            .add(1, 1005, "id_partition3", "bucket[3]")
+            .build());
+  }
+
+  @Test
+  public void testMultipleBucketPartitions() {
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .bucket("id", 8, "bucket")
+            .bucket("s", 16, "bucket").build());
+
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot add redundant partition",
+        () -> PartitionSpec.builderFor(SCHEMA).bucket("id", 8).bucket("id", 16).build());
+
+    AssertHelpers.assertThrows("Should not allow bucket[8](id) and bucket[16](id)",
+        IllegalArgumentException.class, "Cannot add redundant partition",
+        () -> PartitionSpec.builderFor(SCHEMA)
+            .bucket("id", 8, "id_bucket1")
+            .bucket("id", 16, "id_bucket2").build());
+

Review comment:
       Done

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,284 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private static final String SOFT_DELETE_POSTFIX = "__[removed]";
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<PartitionSpec> specs;
+  private final Schema schema;
+  private final Map<String, PartitionField> curSpecFields;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = Lists.newArrayList();
+  private final Map<String, PartitionField> newRemovedFields = Maps.newHashMap();

Review comment:
       I also updated the code to use the correct ID from the start to avoid `freshSpecFieldIds` at the end.

##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -38,6 +38,9 @@
   public void testMultipleTimestampPartitions() {
     AssertHelpers.assertThrows("Should not allow year(ts) and year(ts)",
         IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA).year("ts", "year").year("another_ts", "year").build());

Review comment:
       Thanks for the explanation! Updated accordingly.

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -346,11 +346,34 @@ private void checkAndAddPartitionName(String name, Integer identitySourceColumnI
       partitionNames.add(name);
     }
 
+    private String getDedupKey(PartitionField field) {
+      String transform = field.transform().toString();
+      String type;
+      if ("hour".equalsIgnoreCase(transform) || "day".equalsIgnoreCase(transform) ||
+          "month".equalsIgnoreCase(transform) || "year".equalsIgnoreCase(transform)) {
+        type = "time";
+      } else if (transform.startsWith("bucket[")) {
+        type = "bucket";
+      } else {
+        return null;

Review comment:
       Yep, that is better. I add a `getName()` method to Transform interface.
   
   Similar to `bucket`, we may only allow one `truncate` transform for a given field. 
   Do you know if there is a valid use case with multiple `truncate` for a field? 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-662814584


   @rdblue can you take another look? Thanks.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446700841



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",
+          () -> table.updateSpec()
+              .addBucketField("data", 16, "data_partition")
+              .commit());
+    }
+  }
+
+  @Test
+  public void testAddField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec()
+        .addBucketField("data", 8, "data_partition")
+        .addBucketField("id", 8)
+        .addBucketField("data", 6, "data_field")
+        .commit();
+
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "  1001: data_partition: bucket[8](2)\n" +
+        "  1002: id_bucket: bucket[8](1)\n" +
+        "  1003: data_field: bucket[6](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1003, table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testRenameField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+    Assert.assertEquals(0, table.spec().specId());
+
+    table.updateSpec()
+        .renameField("data_bucket", "data_partition")

Review comment:
       Yep.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r449922403



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecParser.java
##########
@@ -19,16 +19,20 @@
 
 package org.apache.iceberg;
 
+import org.apache.iceberg.types.Types;
 import org.junit.Assert;
 import org.junit.Test;
 
-public class TestPartitionSpecParser extends TableTestBase {
-  public TestPartitionSpecParser() {
-    super(1);
-  }
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class TestPartitionSpecParser {
+  private static final Schema SCHEMA = new Schema(
+      required(1, "id", Types.IntegerType.get()),
+      required(2, "data", Types.StringType.get())
+  );
 
   @Test
-  public void testToJsonForV1Table() {
+  public void testToJson() {

Review comment:
       Instead of changing existing test cases, let's add new test cases for the case you want to exercise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r435431020



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -476,11 +482,52 @@ Builder add(int sourceId, int fieldId, String name, String transform) {
       Types.NestedField column = schema.findField(sourceId);
       checkAndAddPartitionName(name, column.fieldId());
       Preconditions.checkNotNull(column, "Cannot find source column: %s", sourceId);
+      checkDuplicateFieldId(fieldId);
       fields.add(new PartitionField(sourceId, fieldId, name, Transforms.fromString(column.type(), transform)));
       lastAssignedFieldId.getAndAccumulate(fieldId, Math::max);
       return this;
     }
 
+    Builder rename(String name, String newName) {
+      Preconditions.checkArgument(partitionNames.contains(name),
+          "Cannot find an existing partition field with the name: %s", name);
+      Preconditions.checkArgument(newName != null && !newName.isEmpty(),
+          "Cannot use empty or null partition name: %s", newName);
+      Preconditions.checkArgument(!partitionNames.contains(newName),
+          "Cannot use partition name more than once: %s", newName);
+
+      for (int i = 0; i < fields.size(); ++i) {
+        PartitionField field = fields.get(i);
+        if (field.name().equals(name)) {
+          fields.set(i, new PartitionField(field.sourceId(), field.fieldId(), newName, field.transform()));
+          partitionNames.remove(name);
+          partitionNames.add(newName);
+          break;
+        }
+      }
+      return this;
+    }
+
+    int remove(String name, boolean softDelete) {

Review comment:
       I don't think that we need to change this API. This is for creating new partition specs, so it doesn't make much sense to create a field and then remove or rename it.
   
   Changing a partition spec should be handled by the update operation and it is cleaner to keep that logic separate.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r450352736



##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -38,6 +38,9 @@
   public void testMultipleTimestampPartitions() {
     AssertHelpers.assertThrows("Should not allow year(ts) and year(ts)",
         IllegalArgumentException.class, "Cannot use partition name more than once",
+        () -> PartitionSpec.builderFor(SCHEMA).year("ts", "year").year("another_ts", "year").build());

Review comment:
       The context, `Should not allow year(ts) and year(ts)` isn't correct because the call uses two different source columns, `ts` and `another_ts`. It should be `Should not allow partition fields with the same name`.
   
   And since this is passing in the partition name, you can change it to avoid hitting the wrong error case. You could move this to a new test for name collisions, and update this to avoid the name collision by removing the explicit partition field name.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446700894



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)

Review comment:
       👌  updated accordingly.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446700514



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());

Review comment:
       👌  Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #922: [WIP] Add partition spec evolution
URL: https://github.com/apache/incubator-iceberg/pull/922#issuecomment-614335167
 
 
   #924 just made it in, so this can implement `dropField` for v1 by replacing the field with a `void` transform. That will avoid ID problems.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r437173927



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata
+   *
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec clear();
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName);
+
+  /**
+   * Add a new partition field with year transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addYearField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with year transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.
+   * The partition field id is automatically assigned and will be updated during the commit.

Review comment:
       👌 

##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.

Review comment:
       👌 

##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata

Review comment:
       👌 

##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,268 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /**
+   * Clear all partition fields in the current partition spec.
+   * <p>
+   * This will create a new partition spec without any partition field.
+   * Partition field IDs is automatically assigned and will be updated during the commit.
+   * Table schema should be obtained from the current table metadata
+   *
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec clear();
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field id is automatically assigned and will be updated during the commit.
+   *
+   * @param sourceName the field name of the source field in the {@link PartitionSpec spec's} table schema
+   * @param targetName the name of this partition field
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec addIdentityField(String sourceName, String targetName);
+
+  /**
+   * Add a new partition field with identity transform to the partition spec.
+   * <p>
+   * The partition field name is automatically assigned set.

Review comment:
       👌 

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {

Review comment:
       Thanks for the comments. Will do.
   

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));
+    ops.commit(base, update);
+  }
+
+  private PartitionSpec freshSpecFieldIds(PartitionSpec partitionSpec) {

Review comment:
       @rdblue Yes, that will also work. Current implementation lazily reassign the partition field ids if needed during commit. 
   
   Both ways have the similar complexity. I feel the lazy way is slightly simpler as it does not track the states.  For example, if users add and remove the same field before commit,
   ```
   table.updateSpec()
           .addBucketField("data", 8)
           .removeField("data_bucket")
           .commit();
   ```
   then, there is no need to assign IDs.
   

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));

Review comment:
       `lastAssignedPartitionId` is kept inside each partition spec. metadata's `lastAssignedPartitionId ` is only used here before committing. So we can lazily compute it while looping over the partition spec history. So it is unnecessary to add an extra field into Table metadata.
   

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());

Review comment:
       @rdblue thanks for the comments. Let me check if this will introduce some duplicate code (same logic in partition spec builder and here).
   

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);

Review comment:
       👌 

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+import org.apache.iceberg.transforms.Transform;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    for (PartitionField field : base.spec().fields()) {
+      this.newSpecBuilder.add(field.sourceId(), field.fieldId(), field.name(), field.transform().toString());
+    }
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec clear() {
+    this.newSpecBuilder = PartitionSpec.builderFor(base.schema());
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName, String targetName) {
+    newSpecBuilder.identity(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addIdentityField(String sourceName) {
+    newSpecBuilder.identity(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName, String targetName) {
+    newSpecBuilder.year(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addYearField(String sourceName) {
+    newSpecBuilder.year(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName, String targetName) {
+    newSpecBuilder.month(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addMonthField(String sourceName) {
+    newSpecBuilder.month(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName, String targetName) {
+    newSpecBuilder.day(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addDayField(String sourceName) {
+    newSpecBuilder.day(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName, String targetName) {
+    newSpecBuilder.hour(sourceName, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addHourField(String sourceName) {
+    newSpecBuilder.hour(sourceName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets, String targetName) {
+    newSpecBuilder.bucket(sourceName, numBuckets, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addBucketField(String sourceName, int numBuckets) {
+    newSpecBuilder.bucket(sourceName, numBuckets);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width, String targetName) {
+    newSpecBuilder.truncate(sourceName, width, targetName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addTruncateField(String sourceName, int width) {
+    newSpecBuilder.truncate(sourceName, width);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, String transform) {
+    newSpecBuilder.add(sourceId, name, transform);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec addField(int sourceId, String name, Transform<?, ?> transform) {
+    return addField(sourceId, name, transform.toString());
+  }
+
+  @Override
+  public UpdatePartitionSpec renameField(String name, String newName) {
+    newSpecBuilder.rename(name, newName);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec removeField(String name) {
+    newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return this;
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, String transform) {
+    int sourceId = newSpecBuilder.remove(name, base.formatVersion() == 1);
+    return addField(sourceId, name, transform);
+  }
+
+  @Override
+  public UpdatePartitionSpec replaceField(String name, Transform<?, ?> transform) {
+    return replaceField(name, transform.toString());
+  }
+
+  @Override
+  public void commit() {
+    TableMetadata update = base.updatePartitionSpec(freshSpecFieldIds(apply()));
+    ops.commit(base, update);
+  }
+
+  private PartitionSpec freshSpecFieldIds(PartitionSpec partitionSpec) {

Review comment:
       @rdblue Yes, that will also work. Current implementation lazily reassign the partition field ids if needed during commit. 
   
   Both ways have the similar complexity. I feel the lazy way is slightly simpler as it does not track the states.  I will check if this is still true if we keep track a list of PartitionField instead of using partition spec builder.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-638562992


   @rdblue  I updated the PR with all change based APIs. Can you please take a look? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441204896



##########
File path: core/src/test/java/org/apache/iceberg/TestManifestReader.java
##########
@@ -103,4 +107,28 @@ public void testManifestReaderWithUpdatedPartitionMetadataForV1Table() throws IO
     }
   }
 
+  @Test
+  public void testManifestReaderWithPartitionMetadataEvolution() throws IOException {
+    table.updateSpec().clear()

Review comment:
       I don't think these test cases should use `clear`. Most evolution should happen by adding and removing fields, so our tests should do the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r446700991



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = new ArrayList<>();
+  private final Map<String, PartitionField> curSpecFields;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+    this.curSpecFields = base.spec().fields().stream().collect(
+        Collectors.toMap(
+            PartitionField::name,
+            Function.identity(),
+            (n1, n2) -> {
+              throw new IllegalStateException(String.format("Duplicate partition field found: %s", n1));
+            },
+            LinkedHashMap::new

Review comment:
       Updated the code to remove the ordering issue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441191283



##########
File path: core/src/test/java/org/apache/iceberg/TestPartitionSpecUpdate.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestPartitionSpecUpdate extends TableTestBase {
+
+  private int[] expectedFieldIds;
+
+  private String[] expectedSpecs;
+
+  @Parameterized.Parameters
+  public static Object[][] parameters() {
+    return new Object[][] {
+        new Object[] { 1, new int[]{ 1000, 1001, 1001, 1000, 1000 },
+            new String[]{"  1000: data_bucket_removed: void(2)\n", "  1002: id_bucket_removed: void(1)\n"} },
+        new Object[] { 2, new int[]{ 1001, 1000, 1001, 1002, 1002 }, new String[]{"", ""} },
+    };
+  }
+
+  public TestPartitionSpecUpdate(int formatVersion, int[] expectedFieldIds, String[] expectedSpecs) {
+    super(formatVersion);
+    this.expectedFieldIds = expectedFieldIds;
+    this.expectedSpecs = expectedSpecs;
+  }
+
+  @Test
+  public void testCommitUpdatedSpec() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(1000, table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addBucketField("id", 8)
+        .addBucketField("data", 16)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[0] + ": id_bucket: bucket[8](1)\n  " +
+        expectedFieldIds[1] + ": data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[2], table.spec().lastAssignedFieldId());
+
+    table.updateSpec().clear()
+        .addTruncateField("data", 8)
+        .commit();
+
+    Assert.assertEquals("[\n  " +
+        expectedFieldIds[3] + ": data_trunc: truncate[8](2)\n" +
+        "]", table.spec().toString());
+    Assert.assertEquals(expectedFieldIds[4], table.spec().lastAssignedFieldId());
+  }
+
+  @Test
+  public void testUpdateException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if there is an invalid partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: id_bucket",
+        () -> table.updateSpec().clear()
+            .addBucketField("id", 8)
+            .addBucketField("id", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddDuplicateFieldException() {
+    AssertHelpers.assertThrows(
+        "Should throw IllegalArgumentException if adding a duplicate partition field",
+        IllegalArgumentException.class, "Cannot use partition name more than once: data_bucket",
+        () -> table.updateSpec()
+            .addBucketField("data", 16)
+            .commit());
+  }
+
+  @Test
+  public void testAddSamePartitionField() {
+    Assert.assertEquals("[\n" +
+        "  1000: data_bucket: bucket[16](2)\n" +
+        "]", table.spec().toString());
+
+    if (formatVersion == 1) {
+      table.updateSpec()
+          .addBucketField("data", 16, "data_partition")
+          .commit();
+      Assert.assertEquals("[\n" +
+          "  1000: data_bucket: bucket[16](2)\n" +
+          "  1001: data_partition: bucket[16](2)\n" +
+          "]", table.spec().toString());
+    } else {
+      AssertHelpers.assertThrows(
+          "Should throw IllegalArgumentException if adding a duplicate partition field",
+          IllegalArgumentException.class,
+          "Field Id 1000 has already been used in the existing partition fields",

Review comment:
       I don't see why this would fail with a message about the ID. The field IDs are internal and users should not be exposed to them. The exception in this case should be that the user is trying to add a partition that duplicates another.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/iceberg/pull/922#discussion_r441206150



##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.transforms.Transforms;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private final List<Consumer<PartitionSpec.Builder>> newSpecFields = new ArrayList<>();

Review comment:
       We prefer to use factory methods in `Lists` and `Maps` to instantiate collections, rather than relying on specific classes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #922:
URL: https://github.com/apache/iceberg/pull/922#issuecomment-747148191


   Most of the changes here are included in #1942 so I'll close this. It would be nice to get some of the test updates in another PR, though. Thanks @jun-he!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] jun-he commented on pull request #922: Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/incubator-iceberg/pull/922#issuecomment-633411638


   @rdblue Wondering if it is better to have another separate PR to implement change-based partition spec evolution as this PR has already addressed #836 and added the basic evolution features to Table API. Thanks.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #922: [Part 1] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #922:
URL: https://github.com/apache/incubator-iceberg/pull/922#discussion_r430019241



##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -241,11 +241,25 @@ public void testAddPartitionFieldsWithAndWithoutFieldIds() {
         .add(1, "id_partition2", "bucket[5]")
         .add(1, 1005, "id_partition1", "bucket[4]")
         .truncate("s", 1, "custom_truncate")
+        .add(1, 1002, "id_partition3", "bucket[3]")
         .build();
 
     Assert.assertEquals(1000, spec.fields().get(0).fieldId());
     Assert.assertEquals(1005, spec.fields().get(1).fieldId());
     Assert.assertEquals(1006, spec.fields().get(2).fieldId());
+    Assert.assertEquals(1002, spec.fields().get(3).fieldId());
     Assert.assertEquals(1006, spec.lastAssignedFieldId());
   }
+
+  @Test
+  public void testAddPartitionFieldsWithInvalidFieldId() {

Review comment:
       Good call adding this test and the fix.

##########
File path: core/src/main/java/org/apache/iceberg/PartitionSpecUpdate.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import java.util.Map;
+
+/**
+ * PartitionSpec evolution API implementation.
+ */
+class PartitionSpecUpdate implements UpdatePartitionSpec {
+
+  private final TableMetadata base;
+  private final TableOperations ops;
+  private PartitionSpec.Builder newSpecBuilder = null;
+
+  PartitionSpecUpdate(TableOperations ops) {
+    this.ops = ops;
+    this.base = ops.current();
+  }
+
+  @Override
+  public PartitionSpec apply() {
+    Preconditions.checkNotNull(newSpecBuilder, "new partition spec is not set");
+    return newSpecBuilder.build();
+  }
+
+  @Override
+  public UpdatePartitionSpec newSpec() {

Review comment:
       I don't think it is a good idea to have a new spec "mode" in this API. By adding `newSpec`, we are splitting the API into two: one that is change-based and one that is a complete replacement. The methods added for the replacement mode aren't used for the change-based mode, which we can see from the `Preconditions` here. I think this is going to cause unnecessary confusion for callers.
   
   If we want to support the use case of completely replacing the current spec, then all we need is a `clear` method and then we can use the change-based methods:
   
   ```java
   table.updateSpec()
       .clear()
       .addField(date("ts"))
       .addField(bucket("id", 24))
       .commit();
   ```
   
   I'm not sure we want to completely replace a spec, but I don't think we want to have two APIs in this update operation either way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] jun-he commented on pull request #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on pull request #922:
URL: https://github.com/apache/incubator-iceberg/pull/922#issuecomment-619136251


   @rdblue  Thanks for the comments.
   I am thinking we will always create a new spec with a new spec id. So the change-based API will first clone an existing spec (either the latest one or another one specified by its spec id). From there, we mutate the cloned one. This will maintain the history of the spec evolution and keep the existing spec immutable. Then, the change based API will be similar to how to handle new spec.
   But this means the existing data will still use the original spec because we never modify that. 
   
   If we keep the existing spec immutable, 
    - `addField`, `renameField`, and `removeField` will be straightforward for V1 and V2.
    - `replaceField` will be equivalent to `removeField` and then `addField`. One issue is to re-use the same field ID. This should be OK for V1 as V1 always reset the field Ids for a new spec. But for V2, we cannot reuse the same field ID as it has already been used by by a partition field in one or multiple old partition specs. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on pull request #922: [Part 1] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #922:
URL: https://github.com/apache/incubator-iceberg/pull/922#issuecomment-633655191


   @jun-he, thanks for working on this.
   
   I don't think that the API here is the right one. I think that the change-based API is the only one we need, so it doesn't make sense to have a replacement API in addition. My comment above has more context.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #922: [WIP] Add partition spec evolution

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #922:
URL: https://github.com/apache/incubator-iceberg/pull/922#discussion_r414713153



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+
+  /** Update the current partition spec to a new partition spec.
+   * <p>
+   * Partition field IDs of the new partitionSpec may be updated during the commit.
+   *
+   * @param partitionSpec new partition spec to update
+   * @return this for method chaining
+   */
+  UpdatePartitionSpec update(PartitionSpec partitionSpec);

Review comment:
       It makes sense. Removed it.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org