You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/12 01:57:44 UTC

[GitHub] [iceberg] rdblue opened a new pull request #1919: Add UpdatePartitionSpec implementation

rdblue opened a new pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919


   This adds a new API for partition spec evolution and an implementation to produce updated specs. This doesn't include adding the new API to table and transaction, adding a last assigned partition ID to metadata, or documentation for the new API. This is primarily the implementation to be followed with more integration and docs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r542636561



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       The expected result of any combination should be the same as if the two were done as separate changes, except for cases that signal some problem. For example, renaming a field and deleting the original field shows that there are two conflicting changes, even though we could apply one and then fail the second.
   
   I agree that this should have a test for that case, as well as possibly some additional checks for consistency with rename.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r543512901



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       I added quite a few more test cases for situations like this and improved error messages. Now, if you attempt to rename a field that was added you get an error that clearly says you can't rename an added field, instead of an error that says the field is unknown. Same thing for combinations of adds and deletes.
   
   I also added a check to not allow adding duplicate time columns, like days("ts") and hours("ts"). That's allowed if the partition already existed, but not when adding partition fields.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] johnclara commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
johnclara commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r542717533



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       Cool, so would order matter?
   is `.removeField("myField").addField("myField")` different from `.addField("myField").removeField("myField")`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#issuecomment-745710832


   > The only open question I have is what about type promotion of partition fields.
   
   Partitions fields store the type output by the partition transform. In a lot of cases, that won't change. For example, bucket always produces ints no matter what the input type. Some transforms, like identity, will produce values of the input type. The only way to change the output type for those transforms is to change the input type through column promotion. When an identity partition column is promoted from int to long or float to double, the partition column is promoted automatically as well and the existing values are promoted on read.
   
   In case you're wondering, we were also careful to make sure type promotion doesn't change the result of any partition functions. If the partition transform doesn't produce the same value when it is promoted after transformation, then promotion must be done before the transformation: `bucket(int_col, width)` is actually `bucket((long) int_col, width)`. (This is also one reason why promotion to String is dangerous and not supported.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541474451



##########
File path: api/src/main/java/org/apache/iceberg/UpdatePartitionSpec.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.exceptions.CommitFailedException;
+import org.apache.iceberg.expressions.Term;
+
+/**
+ * API for partition spec evolution.
+ * <p>
+ * When committing, these changes will be applied to the current table metadata. Commit conflicts
+ * will not be resolved and will result in a {@link CommitFailedException}.
+ */
+public interface UpdatePartitionSpec extends PendingUpdate<PartitionSpec> {
+  UpdatePartitionSpec addField(String sourceName);

Review comment:
       Will add Javadoc for these methods after there is agreement on this API.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541474399



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -463,8 +464,8 @@ public Builder truncate(String sourceName, int width) {
     }
 
     public Builder alwaysNull(String sourceName, String targetName) {
-      checkAndAddPartitionName(targetName);
       Types.NestedField sourceColumn = findSourceColumn(sourceName);
+      checkAndAddPartitionName(targetName, sourceColumn.fieldId()); // can duplicate a source column name

Review comment:
       Needed to allow identity partitions that have been deleted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r542844504



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       Yes. Imagine I have partition `shard=bucket(id, 16)` and ran `.removeField("shard").addField("shard", bucket(id, 32))`. That replaces a 16-bin bucket scheme with a 32-bin bucket scheme. The opposite would add a bucket partition and then remove a bucket partition, which looks like a mistake. So the second one would be rejected, the first would be allowed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] johnclara commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
johnclara commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541802223



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       Maybe add a test for `.removeField("myField").addField("myField")`? What's the expected outcome for that?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] johnclara commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
johnclara commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r542717533



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       Cool, so would order mater?
   is `.removeField("myField").addField("myField")` different from `.addField("myField").removeField("myField")`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541474352



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -480,9 +481,13 @@ Builder add(int sourceId, String name, String transform) {
 
     Builder add(int sourceId, int fieldId, String name, String transform) {
       Types.NestedField column = schema.findField(sourceId);
-      checkAndAddPartitionName(name, column.fieldId());
       Preconditions.checkNotNull(column, "Cannot find source column: %s", sourceId);
-      fields.add(new PartitionField(sourceId, fieldId, name, Transforms.fromString(column.type(), transform)));
+      return add(sourceId, fieldId, name, Transforms.fromString(column.type(), transform));
+    }
+
+    Builder add(int sourceId, int fieldId, String name, Transform<?, ?> transform) {

Review comment:
       Needed to add fields using `Transform`, not a string.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#issuecomment-743641137


   @jun-he, could you look at this? It implements what we discussed for partition spec evolution.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] johnclara commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
johnclara commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541800623



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -0,0 +1,307 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.expressions.BoundReference;
+import org.apache.iceberg.expressions.BoundTerm;
+import org.apache.iceberg.expressions.BoundTransform;
+import org.apache.iceberg.expressions.Expressions;
+import org.apache.iceberg.expressions.Term;
+import org.apache.iceberg.expressions.UnboundTerm;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.relocated.com.google.common.collect.Sets;
+import org.apache.iceberg.transforms.PartitionSpecVisitor;
+import org.apache.iceberg.transforms.Transform;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.transforms.UnknownTransform;
+import org.apache.iceberg.util.Pair;
+
+class BaseUpdatePartitionSpec implements UpdatePartitionSpec {
+  private final TableOperations ops;
+  private final boolean caseSensitive;
+  private final TableMetadata base;
+  private final int formatVersion;
+  private final PartitionSpec spec;
+  private final Schema schema;
+  private final Map<String, PartitionField> nameToField;
+  private final Map<Pair<Integer, String>, PartitionField> transformToField;
+
+  private final List<PartitionField> adds = Lists.newArrayList();
+  private final Map<Pair<Integer, String>, PartitionField> transformToAddedField = Maps.newHashMap();
+  private final Set<Object> deletes = Sets.newHashSet();
+  private final Map<String, String> renames = Maps.newHashMap();
+
+  private int lastAssignedPartitionId;
+
+  BaseUpdatePartitionSpec(TableOperations ops, boolean caseSensitive) {
+    this.ops = ops;
+    this.caseSensitive = caseSensitive;
+    this.base = ops.current();
+    this.formatVersion = base.formatVersion();
+    this.spec = base.spec();
+    this.schema = spec.schema();
+    this.nameToField = indexSpecByName(spec);
+    this.transformToField = indexSpecByTransform(spec);
+    this.lastAssignedPartitionId = spec.fields().stream().mapToInt(PartitionField::fieldId).max().orElse(999);
+
+    spec.fields().stream()
+        .filter(field -> field.transform() instanceof UnknownTransform)
+        .findAny()
+        .ifPresent(field -> {
+          throw new IllegalArgumentException("Cannot update partition spec with unknown transform: " + field);
+        });
+  }
+
+  /**
+   * For testing only.
+   */
+  @VisibleForTesting
+  BaseUpdatePartitionSpec(int formatVersion, PartitionSpec spec) {
+    this(formatVersion, spec, spec.fields().stream().mapToInt(PartitionField::fieldId).max().orElse(999));
+  }
+
+  /**
+   * For testing only.
+   */
+  @VisibleForTesting
+  BaseUpdatePartitionSpec(int formatVersion, PartitionSpec spec, int lastAssignedPartitionId) {
+    this.ops = null;
+    this.base = null;
+    this.formatVersion = formatVersion;
+    this.caseSensitive = true;
+    this.spec = spec;
+    this.schema = spec.schema();
+    this.nameToField = indexSpecByName(spec);
+    this.transformToField = indexSpecByTransform(spec);
+    this.lastAssignedPartitionId = lastAssignedPartitionId;
+  }
+
+  private int assignFieldId() {
+    this.lastAssignedPartitionId += 1;
+    return lastAssignedPartitionId;
+  }
+
+  @Override
+  public BaseUpdatePartitionSpec addField(String sourceName) {
+    return addField(Expressions.ref(sourceName));
+  }
+
+  @Override
+  public BaseUpdatePartitionSpec addField(Term term) {
+    return addField(null, term);
+  }
+
+  @Override
+  public BaseUpdatePartitionSpec addField(String name, Term term) {

Review comment:
       When javadocs are added later, maybe add a warning about not being threadsafe?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#issuecomment-745711250


   Thanks for reviewing, @danielcweeks and @johnclara! I'll merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#issuecomment-745710832


   > The only open question I have is what about type promotion of partition fields.
   
   Partitions fields store the type output by the partition transform. In a lot of cases, that won't change. For example, bucket always produces ints no matter what the input type. Some transforms, like identity, will produce values of the input type. The only way to change the output type for those transforms is to change the input type through column promotion. When an identity partition column is promoted from int to long or float to double, the partition column is promoted automatically as well and the existing values are promoted on read.
   
   In case you're wondering, we were also careful to make sure type promotion doesn't change the result of any partition functions. If the partition transform doesn't produce the same value when it is promoted after transformation, then promotion must be done before the transformation: `bucket(int_col, width)` is actually `bucket((long) int_col, width)`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r541474598



##########
File path: api/src/main/java/org/apache/iceberg/transforms/PartitionSpecVisitor.java
##########
@@ -113,37 +113,43 @@ default T unknown(int fieldId, String sourceName, int sourceId, String transform
    * @deprecated this will be removed in 0.11.0; use {@link #visit(PartitionSpec, PartitionSpecVisitor)} instead.
    */
   @Deprecated
-  @SuppressWarnings("checkstyle:CyclomaticComplexity")
   static <R> List<R> visit(Schema schema, PartitionSpec spec, PartitionSpecVisitor<R> visitor) {
     List<R> results = Lists.newArrayListWithExpectedSize(spec.fields().size());
 
     for (PartitionField field : spec.fields()) {
-      String sourceName = schema.findColumnName(field.sourceId());
-      Transform<?, ?> transform = field.transform();
-
-      if (transform instanceof Identity) {
-        results.add(visitor.identity(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform instanceof Bucket) {
-        int numBuckets = ((Bucket<?>) transform).numBuckets();
-        results.add(visitor.bucket(field.fieldId(), sourceName, field.sourceId(), numBuckets));
-      } else if (transform instanceof Truncate) {
-        int width = ((Truncate<?>) transform).width();
-        results.add(visitor.truncate(field.fieldId(), sourceName, field.sourceId(), width));
-      } else if (transform == Dates.YEAR || transform == Timestamps.YEAR) {
-        results.add(visitor.year(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform == Dates.MONTH || transform == Timestamps.MONTH) {
-        results.add(visitor.month(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform == Dates.DAY || transform == Timestamps.DAY) {
-        results.add(visitor.day(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform == Timestamps.HOUR) {
-        results.add(visitor.hour(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform instanceof VoidTransform) {
-        results.add(visitor.alwaysNull(field.fieldId(), sourceName, field.sourceId()));
-      } else if (transform instanceof UnknownTransform) {
-        results.add(visitor.unknown(field.fieldId(), sourceName, field.sourceId(), transform.toString()));
-      }
+      results.add(visit(schema, field, visitor));
     }
 
     return results;
   }
+
+  static <R> R visit(Schema schema, PartitionField field, PartitionSpecVisitor<R> visitor) {

Review comment:
       Refactored this out of the method above to visit individual fields, for setting default partition names based on the transform.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1919: Add UpdatePartitionSpec implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1919:
URL: https://github.com/apache/iceberg/pull/1919#discussion_r542844504



##########
File path: core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpecV1.java
##########
@@ -0,0 +1,444 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.transforms.Transforms;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.expressions.Expressions.bucket;
+import static org.apache.iceberg.expressions.Expressions.day;
+import static org.apache.iceberg.expressions.Expressions.hour;
+import static org.apache.iceberg.expressions.Expressions.month;
+import static org.apache.iceberg.expressions.Expressions.ref;
+import static org.apache.iceberg.expressions.Expressions.truncate;
+import static org.apache.iceberg.expressions.Expressions.year;
+
+public class TestUpdatePartitionSpecV1 {

Review comment:
       Yes. Imagine I have partition `shard=bucket(id, 16)` and ran `.removeField("shard").addField("shard", bucket(id, 32))`. That replaces a 16-bin bucket scheme with a 32-bin bucket scheme. The opposite would add a bucket partition and then remove it, which looks like a mistake. So the second one would be rejected, the first would be allowed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org