You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/10 05:20:07 UTC

[GitHub] [iceberg] vinson0526 opened a new pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

vinson0526 opened a new pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691


   When use v1 metadata, we cannot add  a new partition that previously existed.
   Further more, exception will thrown when we update partition spec after dropped partition transform more than once on same column.
   
   I use iceberg master branch and spark 3.0.2.
   we can reproduce follow these step.
   ```
   create table test(create_time timestamp) using iceberg;
   alter table test add partition field years(create_time);
   alter table test drop partition field years(create_time);
   alter table test add partition field months(create_time);
   alter table test drop partition field months(create_time);
   alter table test add partition field days(create_time);
   ```
   and exception thrown
   ```
   Multiple entries with same key: (1, void)=1001: create_time_month: void(1) and (1, void)=1000: create_time_year: void(1)
   ```
   
   I try to fix it by this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#issuecomment-864157268


   Thanks for the update, @vinson0526! I kicked off the tests and will merge once they pass. The changes look great!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#issuecomment-858457683


   Started the tests. Let's see if we break something, or not 😄 
   
   Left one question in review, but generally looks good to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r652501182



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       Use Maps.newHashMap instead of ImmutableMap.Builder in newest commit. Please take a look again.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r653929734



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -149,7 +151,14 @@ public BaseUpdatePartitionSpec addField(String name, Term term) {
       nameToAddedField.put(name, newField);
     }
 
-    adds.add(newField);
+    String partitionName;
+    if (newField.name() != null) {
+      partitionName = newField.name();
+    } else {
+      partitionName = PartitionSpecVisitor.visit(schema, newField, PartitionNameGenerator.INSTANCE);
+    }
+
+    adds.add(Pair.of(newField, partitionName));

Review comment:
       The `PartitionField` is created in this method, as is `partitionName`. I think that by reordering a couple of statements here, you can use the correct name on the field from the start and avoid more changes.

##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -192,6 +211,12 @@ public BaseUpdatePartitionSpec removeField(Term term) {
 
   @Override
   public BaseUpdatePartitionSpec renameField(String name, String newName) {
+    PartitionField existingField = nameToField.get(newName);
+    if (existingField != null && isVoidTransform(existingField)) {
+      // rename the old deleted field that is being replaced by the new field
+      renameField(existingField.name(), existingField.name() + "_" + UUID.randomUUID());

Review comment:
       Why use `UUID.randomUUID()` instead of the partition field ID? I think it makes more sense to use `existingField.fieldId()`.

##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -145,11 +146,29 @@ public BaseUpdatePartitionSpec addField(String name, Term term) {
     checkForRedundantAddedPartitions(newField);
 
     transformToAddedField.put(validationKey, newField);
-    if (name != null) {
-      nameToAddedField.put(name, newField);
+
+    String partitionName;
+    if (newField.name() != null) {
+      partitionName = newField.name();
+    } else {
+      partitionName = PartitionSpecVisitor.visit(schema, newField, PartitionNameGenerator.INSTANCE);
     }
 
-    adds.add(newField);
+    PartitionField existingField = nameToField.get(partitionName);
+    if (existingField != null) {
+      if (isVoidTransform(existingField)) {
+        // rename the old deleted field that is being replaced by the new field
+        renameField(existingField.name(), existingField.name() + "_" + UUID.randomUUID());

Review comment:
       Here as well, I'd prefer not to use a UUID. This should be able to use the existing field's ID instead.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r654136628



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -149,7 +151,14 @@ public BaseUpdatePartitionSpec addField(String name, Term term) {
       nameToAddedField.put(name, newField);
     }
 
-    adds.add(newField);
+    String partitionName;
+    if (newField.name() != null) {
+      partitionName = newField.name();
+    } else {
+      partitionName = PartitionSpecVisitor.visit(schema, newField, PartitionNameGenerator.INSTANCE);
+    }
+
+    adds.add(Pair.of(newField, partitionName));

Review comment:
       Regenerate `PartitionField` with `partitionName ` if `name` is null.

##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -192,6 +211,12 @@ public BaseUpdatePartitionSpec removeField(Term term) {
 
   @Override
   public BaseUpdatePartitionSpec renameField(String name, String newName) {
+    PartitionField existingField = nameToField.get(newName);
+    if (existingField != null && isVoidTransform(existingField)) {
+      // rename the old deleted field that is being replaced by the new field
+      renameField(existingField.name(), existingField.name() + "_" + UUID.randomUUID());

Review comment:
       change to `existingField.fieldId()`

##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -145,11 +146,29 @@ public BaseUpdatePartitionSpec addField(String name, Term term) {
     checkForRedundantAddedPartitions(newField);
 
     transformToAddedField.put(validationKey, newField);
-    if (name != null) {
-      nameToAddedField.put(name, newField);
+
+    String partitionName;
+    if (newField.name() != null) {
+      partitionName = newField.name();
+    } else {
+      partitionName = PartitionSpecVisitor.visit(schema, newField, PartitionNameGenerator.INSTANCE);
     }
 
-    adds.add(newField);
+    PartitionField existingField = nameToField.get(partitionName);
+    if (existingField != null) {
+      if (isVoidTransform(existingField)) {
+        // rename the old deleted field that is being replaced by the new field
+        renameField(existingField.name(), existingField.name() + "_" + UUID.randomUUID());

Review comment:
       use the existing field's ID instead.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary merged pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r652039253



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       I don't think we want to modify that check method. Instead, this update class should be modified to rename void fields if necessary. That isn't very difficult because we can track whether there are existing fields with the same name and rename any field that uses a void transform.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r650968638



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       ImmutableMap.Builder disallow multiple entries with same key, and multiple void transforms with the same source id generate same key here. So this builder will throw an exception with msg such as "Multiple entries with same key:  (1, void)=1001: create_time_month: void(1) and (1, void)=1000: create_time_year: void(1)".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#issuecomment-864959110


   Since every test passed and got +1 from @rdblue, merged the change.
   
   Thanks for the fix @vinson0526 and @rdblue, @marton-bod for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649096633



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       I know wikipedia is not every time the source of truth, but [here](https://en.wikipedia.org/wiki/Universally_unique_identifier) it states:
   
   > A universally unique identifier (UUID) is a 128-bit label used for information in computer systems.
   > [..]
   > When generated according to the standard methods, UUIDs are, for practical purposes, unique."
   
   But reading again the comment in the code, if the fieldId is always autogenerated and never `null` then we are fine with your solution as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
marton-bod commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649845894



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       Just a thought: would it make sense to just do `!(field.transform() instanceof VoidTransform)` here? Then you wouldn't need the extra interface method




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649005476



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       I am not an expert with this part of the code, but the comment states: `field IDs were not required for v1`
   Could we end up with the same `field.name() + "_" + null` if the above happens?
   
   Maybe it would be better to add a random generated UUID instead to avoid any chance of conflicts.
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649871147



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       yes, in this case, they are same. i add the extra interface method because void transform is a particular transform, this method maybe used in other place later.  if we don't want to add extra interface method, i can change it to `!(field.transform() instanceof VoidTransform)`.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r652036978



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       In that case, I think we should use `Maps.newHashMap` instead of `ImmutableMap.Builder`. That way we don't need to change the `Transform` API to add `isVoid`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#issuecomment-864959110


   Since every test passed and got +1 from @rdblue, merged the change.
   
   Thanks for the fix @vinson0526 and @rdblue, @marton-bod for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649209033



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       thank you for provide useful information about UUID.
   and sorry about ut failed again. I think it will be work with the newest commit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r652501957



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       The newest commit modified update class to rename void fields if necessary. Please take a look again.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r651004064



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       We have two choice when a a conflict is detected:
   1. to rename the existing void transform, that will significant add complexity to [the check method](https://github.com/apache/iceberg/blob/35691728df3b6930045271b1e5663c8516b52301/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L339), just to find which void transform conflict with the current added transform.
   2. to rename the new transform, that will either broke default partition name consistency, or change the name assigned by user




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#issuecomment-859362489


   @marton-bod, @lcspinter: Could you please take a look as well?
   
   Thanks,
   Peter 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary merged pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] vinson0526 commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
vinson0526 commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r649020719



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       we cannot use constant name for removed field, because partition spec field name must be unique.
   Since removed field is not really used, we need to change it name to some that not conflict with new field.
   UUID cannot ensure generate unique.
   field.fieldId() is unique, because we generate it from 1000 and auto-increment for new field.
   Maybe field.name() + field.fieldId() + UUID is better?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2691: fix: add and remove partition transform on same column failed when use v1 metadata

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2691:
URL: https://github.com/apache/iceberg/pull/2691#discussion_r650442088



##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -223,7 +223,9 @@ public PartitionSpec apply() {
         // field IDs were not required for v1 and were assigned sequentially in each partition spec starting at 1,000.
         // to maintain consistent field ids across partition specs in v1 tables, any partition field that is removed
         // must be replaced with a null transform. null values are always allowed in partition data.
-        builder.add(field.sourceId(), field.fieldId(), field.name(), Transforms.alwaysNull());
+        // To avoid name conflict when add and remove same partition transform multiple times, field name will be
+        // replaced by field name append with field id.
+        builder.add(field.sourceId(), field.fieldId(), field.name() + "_" + field.fieldId(), Transforms.alwaysNull());

Review comment:
       Why do this instead of just renaming the field when a conflict is detected later? That seems like a better way to fix it to me.

##########
File path: core/src/main/java/org/apache/iceberg/BaseUpdatePartitionSpec.java
##########
@@ -290,7 +292,9 @@ private void checkForRedundantAddedPartitions(PartitionField field) {
     ImmutableMap.Builder<Pair<Integer, String>, PartitionField> builder = ImmutableMap.builder();
     List<PartitionField> fields = spec.fields();
     for (PartitionField field : fields) {
-      builder.put(Pair.of(field.sourceId(), field.transform().toString()), field);
+      if (!field.transform().isVoid()) {

Review comment:
       Is this needed?
   
   The map created by this method is only used in `addField` and `removeField`. My guess is that your intent is to avoid blocking multiple void transforms with the same source id because of multiple calls to remove. But the `apply` method works with the builder directly and so this map doesn't need to change if I'm reading the existing code correctly. What is the case where this causes a failure?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org