You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/04 19:52:53 UTC

[GitHub] [iceberg] huaxingao opened a new pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

huaxingao opened a new pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845


   Please see here https://github.com/apache/iceberg/issues/3709 for a complete description of the issue.
   
   Basically, if a partitioned table has a column named "partition", any operation on the metadata table will fail with the following error:
   ```
   spark.table("my.test.table.files").show()
   2021-12-10 15:18:22,405 [main] INFO org.apache.iceberg.BaseTableScan - Scanning table my.test.table snapshot 1482056096771695160 created at 2021-12-10 15:18:15.323 with filter true
   java.lang.IllegalArgumentException: Cannot create identity partition sourced from different field in schema: partition
     at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
     at org.apache.iceberg.PartitionSpec$Builder.checkAndAddPartitionName(PartitionSpec.java:344)
     at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:380)
     at org.apache.iceberg.BaseMetadataTable.lambda$transformSpec$0(BaseMetadataTable.java:66)
     at org.apache.iceberg.relocated.com.google.common.collect.ImmutableList.forEach(ImmutableList.java:405)
     at org.apache.iceberg.BaseMetadataTable.transformSpec(BaseMetadataTable.java:66)
     at org.apache.iceberg.DataFilesTable$FilesTableScan.planFiles(DataFilesTable.java:118)
     at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:208)
     at org.apache.iceberg.DataFilesTable$FilesTableScan.planFiles(DataFilesTable.java:68)
     at org.apache.iceberg.BaseTableScan.planTasks(BaseTableScan.java:241)
   ```
   This PR fixes the bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780411253



##########
File path: core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
##########
@@ -63,7 +63,8 @@ protected BaseMetadataTable(TableOperations ops, Table table, String name) {
    */
   static PartitionSpec transformSpec(Schema metadataTableSchema, PartitionSpec spec, String partitionPrefix) {
     PartitionSpec.Builder identitySpecBuilder = PartitionSpec.builderFor(metadataTableSchema);

Review comment:
       Fixed. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780399458



##########
File path: core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
##########
@@ -63,7 +63,8 @@ protected BaseMetadataTable(TableOperations ops, Table table, String name) {
    */
   static PartitionSpec transformSpec(Schema metadataTableSchema, PartitionSpec spec, String partitionPrefix) {
     PartitionSpec.Builder identitySpecBuilder = PartitionSpec.builderFor(metadataTableSchema);

Review comment:
       call checkConflict here instead of in the loop, it will also make it all fit on one line below :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779842498



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I added `checkConflict` in Builder. It does look cleaner this way :)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#issuecomment-1007024823


   > > Is it correct to say that we can ignore checkConflict when we know there is no conflict? Like in the transformSpec, we can set checkConflict to false because we're already adding partitionPrefix?
   > > My understanding is that we can ignore checkConflict when we know there is no conflict.
   
   That's what I think is a little broken in the current api, we have a "buildUnchecked" method which I would think should ignore all conflicts and such but since we check for conflicts during the "identity" method we can't get to that point


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778375629



##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestPartitionedTable.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark.sql;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.List;
+import java.util.Map;
+
+public class TestPartitionedTable extends SparkCatalogTestBase {

Review comment:
       I think it is good to have this test, I would probably move it to one of the existing suites though, either TestMetadataTablesWithEvolution or TestSelect or just something that already exists since I don't think this needs it's own class.
   
   In addition I think we really should have a test in "core" since that's where we actually hit this, Spark isn't really required for this particular failure so maybe add a test to TestMetadataTableScans? Or a test on PartitionSpecBuilder itself if you change the builder?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778366763



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I wonder if `checkConflict` should just be a part of the builder? It seems a little odd that just the identity method has this flag. Like maybe you have something like
   
   partitionSpec.builder().checkConflicts(false) ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780420585



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       Done. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#issuecomment-1007022493


   > Is it correct to say that we can ignore checkConflict when we know there is no conflict? Like in the transformSpec, we can set checkConflict to false because we're already adding partitionPrefix?
   My understanding is that we can ignore checkConflict when we know there is no conflict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780420911



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       Ok let's just wait for tests now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778607288



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);

Review comment:
       Added. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#issuecomment-1007798382


   Thank you very much everyone! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#issuecomment-1006765201


   @RussellSpitzer @kbendick 
   Could you please check one more time when you have a chance? Thank you very much!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780399458



##########
File path: core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
##########
@@ -63,7 +63,8 @@ protected BaseMetadataTable(TableOperations ops, Table table, String name) {
    */
   static PartitionSpec transformSpec(Schema metadataTableSchema, PartitionSpec spec, String partitionPrefix) {
     PartitionSpec.Builder identitySpecBuilder = PartitionSpec.builderFor(metadataTableSchema);

Review comment:
       call checkConflict here instead of in the loop, it will also make it all fit on one line below :)

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "column x = 5" into where "partition.x = 5"

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "x = 5" into where "partition.x = 5"

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "x = 5" into where "partition.x = 5", normally we wouldn't allow you to make a partition column "partition" on the element of a field named "partition" because it would be ambiguous but here it isn't since we know we must be referring to the nested "partition" and not the parent one.

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       one more tiny nit, but I think this should be checkConflicts. After that I think we are good to go

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       one more tiny nit, but I think this should be checkConflicts. After that I think we are good to go, also good call on keeping this protected.

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       Ok let's just wait for tests now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#issuecomment-1007798382


   Thank you very much everyone! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780412042



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "column x = 5" into where "partition.x = 5"

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "x = 5" into where "partition.x = 5"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780412042



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780413727



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       one more tiny nit, but I think this should be checkConflicts. After that I think we are good to go, also good call on keeping this protected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778373438



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);

Review comment:
       Nit: If we leave it as a boolean argument, please add an in-line comment indicating what it is, like `true /* checkConflict */)`.
   
   That can be done at the end though 😄 

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {
+        if (sourceColumnId != null) {
+          // for identity transform case we allow  conflicts between partition and schema field name as
+          //   long as they are sourced from the same schema field
+          Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
+                  "Cannot create identity partition sourced from different field in schema: %s", name);
+        } else {
+          // for all other transforms we don't allow conflicts between partition name and schema field name
+          Preconditions.checkArgument(schemaField == null,
+                  "Cannot create partition from name that exists in schema: %s", name);

Review comment:
       Nit: Over-indented (can be fixed before merging).

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I do agree that it might make sense to add to the builder. However, since it's a pretty specific occurrence (and really only applies to `identity`) would that be possibly confusing to users about when it should be called and what it would be applied to?
   
   It seems that `BaseMetadataTable#transformSpec` is the only place that uses `identity` with this boolean argument as false. Is it possible to get rid of that and then `checkAndAddPartitionName` could just have some sort of check for whether or not this is `identity` transform?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779782694



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I'm not sure the comment makes it more clear for me. I think it may be better to just remove the comment. I think I'd still rather it was a builder wide flag rather than an internal method with a flag but If you think it is cleaner this way that is ok.
   
   I have a few other possible suggestions
       1. Move the Check for identity added fields to "checkCompatibility" that way you could just do a buildUnchecked  in TransformSpec to ignore the invalid identity
       2. Change transformSpec to just construct the Partition Spec without the builder
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778606923



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I agree it's a little odd that just the identity method has the `checkConflict` flag. But the only place that we don't need to check conflict and allow conflict between partition name and schema field name is `BaseMetadataTable#transformSpec`. All other transforms don't allow conflicts between partition name and schema field so the `checkConflict` flag defaults to true. Since it only can be false in `BaseMetadataTable#transformSpec`, I guess it might be ok to do it in the identity method than add to the builder?
   
   I tried to have a different function but couldn't come with a neat one. There are quite bit some code duplication. 
   
    I added some comment to explain a the flag. Hope it looks a little better now. 

##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/sql/TestPartitionedTable.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.spark.sql;
+
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.spark.SparkCatalogTestBase;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.List;
+import java.util.Map;
+
+public class TestPartitionedTable extends SparkCatalogTestBase {

Review comment:
       I moved the test to `TestMetadataTablesWithEvolution` and also add a test in `TestMetadataTableScans`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780413727



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       one more tiny nit, but I think this should be checkConflicts. After that I think we are good to go




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780557828



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflicts(boolean check) {
+      checkConflicts = check;
+      return this;
+    }
+
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflicts) {

Review comment:
       Looks like there's no other way around this (even using `add` directly) so I think this is the cleanest way to fix it. Thanks, @huaxingao!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780411253



##########
File path: core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
##########
@@ -63,7 +63,8 @@ protected BaseMetadataTable(TableOperations ops, Table table, String name) {
    */
   static PartitionSpec transformSpec(Schema metadataTableSchema, PartitionSpec spec, String partitionPrefix) {
     PartitionSpec.Builder identitySpecBuilder = PartitionSpec.builderFor(metadataTableSchema);

Review comment:
       Fixed. Thanks!

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflict(boolean check) {

Review comment:
       Done. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778385659



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       I do agree that it might make sense to add to the builder. However, since it's a pretty specific occurrence (and really only applies to `identity`) would that be possibly confusing to users about when it should be called and what it would be applied to?
   
   It seems that `BaseMetadataTable#transformSpec` is the only place that uses `identity` with this boolean argument as false. Would it be better to have a different function for metadata tables to use to bypass this issue?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779941363



##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
 
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
+        required(2, "partition", Types.IntegerType.get())
+    );
+    this.metadataDir = new File(tableDir, "metadata");
+    PartitionSpec spec = PartitionSpec.builderFor(schema)
+        .identity("partition")
+        .build();
+
+    DataFile par0 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(0))
+        .withRecordCount(1)
+        .build();
+    DataFile par1 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(1))
+        .withRecordCount(1)
+        .build();
+    DataFile par2 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(2))
+        .withRecordCount(1)
+        .build();
+
+    this.table = create(schema, spec);
+    table.newFastAppend()
+        .appendFile(par0)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par1)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par2)
+        .commit();
+
+    Table partitionsTable = new PartitionsTable(table.ops(), table);
+
+    Expression andEquals = Expressions.and(
+        Expressions.equal("partition.partition", 0),
+        Expressions.greaterThan("record_count", 0));
+    TableScan scanAndEq = partitionsTable.newScan().filter(andEquals);
+    CloseableIterable<FileScanTask> tasksAndEq = PartitionsTable.planFiles((StaticTableScan) scanAndEq);
+    Assert.assertEquals(1, Iterators.size(tasksAndEq.iterator()));
+    validateIncludesPartitionScan(tasksAndEq, 0);
+    TestTables.clearTables();
+  }

Review comment:
       Yes. 
   This is actually not needed. I only need to manually set up table because my `SCHEMA` and `SPEC` are different. No need to manually call `clearTables`. I will remove line 519.

##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
 
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709

Review comment:
       I remove the comments. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780412042



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       In this case it's because we arent "really" making a partitioning spec, we are just using it to get the transform so that pushdown will work correctly. The Metadata Table needs a way to convert where "x = 5" into where "partition.x = 5", normally we wouldn't allow you to make a partition column "partition" on the element of a field named "partition" because it would be ambiguous but here it isn't since we know we must be referring to the nested "partition" and not the parent one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r779880513



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {

Review comment:
       Agreed it does look cleaner in the builder!
   
   Question for my own full understanding: Is it correct to say that we can ignore `checkConflict` when we know there is no conflict? Like in the transformSpec, we can set `checkConflict` to `false` because we're already adding `partitionPrefix`?

##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
 
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709
+        required(2, "partition", Types.IntegerType.get())
+    );
+    this.metadataDir = new File(tableDir, "metadata");
+    PartitionSpec spec = PartitionSpec.builderFor(schema)
+        .identity("partition")
+        .build();
+
+    DataFile par0 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(0))
+        .withRecordCount(1)
+        .build();
+    DataFile par1 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(1))
+        .withRecordCount(1)
+        .build();
+    DataFile par2 = DataFiles.builder(spec)
+        .withPath("/path/to/data-0.parquet")
+        .withFileSizeInBytes(10)
+        .withPartition(TestHelpers.Row.of(2))
+        .withRecordCount(1)
+        .build();
+
+    this.table = create(schema, spec);
+    table.newFastAppend()
+        .appendFile(par0)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par1)
+        .commit();
+    table.newFastAppend()
+        .appendFile(par2)
+        .commit();
+
+    Table partitionsTable = new PartitionsTable(table.ops(), table);
+
+    Expression andEquals = Expressions.and(
+        Expressions.equal("partition.partition", 0),
+        Expressions.greaterThan("record_count", 0));
+    TableScan scanAndEq = partitionsTable.newScan().filter(andEquals);
+    CloseableIterable<FileScanTask> tasksAndEq = PartitionsTable.planFiles((StaticTableScan) scanAndEq);
+    Assert.assertEquals(1, Iterators.size(tasksAndEq.iterator()));
+    validateIncludesPartitionScan(tasksAndEq, 0);
+    TestTables.clearTables();
+  }

Review comment:
       Nit: Does this remove the `metadataDir` on exit / completion?

##########
File path: core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java
##########
@@ -459,6 +460,65 @@ public void testDataFilesTableSelection() throws IOException {
     Assert.assertEquals(expected, scan.schema().asStruct());
   }
 
+  @Test
+  public void testPartitionColumnNamedPartition() throws Exception {
+    TestTables.clearTables();
+    this.tableDir = temp.newFolder();
+    tableDir.delete();
+
+    Schema schema = new Schema(
+        required(1, "id", Types.IntegerType.get()),
+        // create a column named "partition", to recreate the problem in
+        // https://github.com/apache/iceberg/issues/3709

Review comment:
       Nit: This comment feels a little redundant given the test name. Unlike the Spark repo, we don't usually reference the issue when adding a test (though I think that's a good practice in general).
   
   Maybe just `// See https://github.com/apache/iceberg/issues/3709` towards the top of the test if you feel it's necessary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partitition'

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r778607367



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -337,16 +337,22 @@ private void checkAndAddPartitionName(String name) {
     }
 
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
+      checkAndAddPartitionName(name, sourceColumnId, true);
+    }
+
+    private void checkAndAddPartitionName(String name, Integer sourceColumnId, boolean checkConflict) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflict) {
+        if (sourceColumnId != null) {
+          // for identity transform case we allow  conflicts between partition and schema field name as
+          //   long as they are sourced from the same schema field
+          Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
+                  "Cannot create identity partition sourced from different field in schema: %s", name);
+        } else {
+          // for all other transforms we don't allow conflicts between partition name and schema field name
+          Preconditions.checkArgument(schemaField == null,
+                  "Cannot create partition from name that exists in schema: %s", name);

Review comment:
       Fixed. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3845: Partition Metadata table breaks with a partition column named 'partition'

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3845:
URL: https://github.com/apache/iceberg/pull/3845#discussion_r780557828



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -336,17 +338,24 @@ private void checkAndAddPartitionName(String name) {
       checkAndAddPartitionName(name, null);
     }
 
+    Builder checkConflicts(boolean check) {
+      checkConflicts = check;
+      return this;
+    }
+
     private void checkAndAddPartitionName(String name, Integer sourceColumnId) {
       Types.NestedField schemaField = schema.findField(name);
-      if (sourceColumnId != null) {
-        // for identity transform case we allow  conflicts between partition and schema field name as
-        //   long as they are sourced from the same schema field
-        Preconditions.checkArgument(schemaField == null || schemaField.fieldId() == sourceColumnId,
-            "Cannot create identity partition sourced from different field in schema: %s", name);
-      } else {
-        // for all other transforms we don't allow conflicts between partition name and schema field name
-        Preconditions.checkArgument(schemaField == null,
-            "Cannot create partition from name that exists in schema: %s", name);
+      if (checkConflicts) {

Review comment:
       Looks like there's no other way around this (even using `add` directly) so I think this is the cleanest way to fix it. Thanks, @huaxingao!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org