You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "ajantha-bhat (via GitHub)" <gi...@apache.org> on 2023/05/08 05:39:41 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #7551: Core: Check for all specs in partitionsTable

ajantha-bhat opened a new pull request, #7551:
URL: https://github.com/apache/iceberg/pull/7551

   Fixes #7533 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1190464022


##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -36,6 +36,8 @@ public class PartitionsTable extends BaseMetadataTable {
 
   private final Schema schema;
 
+  private final boolean isUnpartitionedTable;

Review Comment:
   Nit: can we remove is?  Maybe just 'unpartitioned'?   From: https://iceberg.apache.org/contribute/#java-style-guidelines



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1191837548


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,47 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {

Review Comment:
   ok. Added it back.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #7551: Core: Check for all specs in partitionsTable

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#issuecomment-1542181597

   @nastra: I have addressed the comments. Thanks for the review. @szehon-ho: Would you like to take a look at this?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1190463368


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,47 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {

Review Comment:
   Actually we have a test specifically for this: TestMetadataTableScansWithPartitionEvolution, for organization purpose?  Could we move it there? 
   
   Also another question, does this test the case?  I thought it was only failing on V2, because I had seen on V1 the partition transform remains as void on the latest spec, when removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho merged pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #7551:
URL: https://github.com/apache/iceberg/pull/7551


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1190463368


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,47 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {

Review Comment:
   Actually we have a test specifically for this: TestMetadataTableScansWithPartitionEvolution, for organization purpose?  Could we move it there? 
   
   Also another question, does this test the case?  I thought it was only failing on V2, because in V1 the partition transform remains as void on the latest spec, when removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1188643794


##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -71,6 +73,7 @@ public class PartitionsTable extends BaseMetadataTable {
                 "equality_delete_file_count",
                 Types.IntegerType.get(),
                 "Count of equality delete files"));
+    this.isUnpartitionedTable = Partitioning.partitionType(table).fields().size() < 1;

Review Comment:
   ```suggestion
       this.isUnpartitionedTable = Partitioning.partitionType(table).fields().isEmpty();
   ```



##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,50 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {
+    preparePartitionedTable();
+
+    // Remove partition field
+    table.updateSpec().removeField(Expressions.bucket("data", 16)).commit();
+
+    DataFile data10 =
+        DataFiles.builder(table.spec())
+            .withPath("/path/to/data-10.parquet")
+            .withRecordCount(10)
+            .withFileSizeInBytes(10)
+            .build();
+    table.newFastAppend().appendFile(data10).commit();
+
+    PartitionsTable partitionsTable = new PartitionsTable(table);
+    // must contain the partition column even the current spec is non-partitioned.

Review Comment:
   ```suggestion
       // must contain the partition column even when the current spec is non-partitioned.
   ```



##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,50 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {
+    preparePartitionedTable();
+
+    // Remove partition field
+    table.updateSpec().removeField(Expressions.bucket("data", 16)).commit();
+
+    DataFile data10 =
+        DataFiles.builder(table.spec())
+            .withPath("/path/to/data-10.parquet")
+            .withRecordCount(10)
+            .withFileSizeInBytes(10)
+            .build();
+    table.newFastAppend().appendFile(data10).commit();
+
+    PartitionsTable partitionsTable = new PartitionsTable(table);
+    // must contain the partition column even the current spec is non-partitioned.
+    Assertions.assertThat(partitionsTable.schema().findField("partition")).isNotNull();
+
+    TableScan scanNoFilter = partitionsTable.newScan().select("partition");
+    try (CloseableIterable<ContentFile<?>> files =
+        PartitionsTable.planFiles((StaticTableScan) scanNoFilter)) {
+      if (formatVersion == 2) {
+        // four partitioned data files, four partitioned delete files and one non-partitioned data
+        // file.
+        Assert.assertEquals(9, Iterators.size(files.iterator()));
+      } else {
+        // four partitioned data files and one non-partitioned data file.
+        Assert.assertEquals(5, Iterators.size(files.iterator()));

Review Comment:
   ```suggestion
           Assertions.assertThat(files).hasSize(5);
   ```



##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,50 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {
+    preparePartitionedTable();
+
+    // Remove partition field
+    table.updateSpec().removeField(Expressions.bucket("data", 16)).commit();
+
+    DataFile data10 =
+        DataFiles.builder(table.spec())
+            .withPath("/path/to/data-10.parquet")
+            .withRecordCount(10)
+            .withFileSizeInBytes(10)
+            .build();
+    table.newFastAppend().appendFile(data10).commit();
+
+    PartitionsTable partitionsTable = new PartitionsTable(table);
+    // must contain the partition column even the current spec is non-partitioned.
+    Assertions.assertThat(partitionsTable.schema().findField("partition")).isNotNull();
+
+    TableScan scanNoFilter = partitionsTable.newScan().select("partition");
+    try (CloseableIterable<ContentFile<?>> files =
+        PartitionsTable.planFiles((StaticTableScan) scanNoFilter)) {
+      if (formatVersion == 2) {
+        // four partitioned data files, four partitioned delete files and one non-partitioned data
+        // file.
+        Assert.assertEquals(9, Iterators.size(files.iterator()));

Review Comment:
   ```suggestion
           Assertions.assertThat(files).hasSize(9);
   ```



##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,50 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {
+    preparePartitionedTable();
+
+    // Remove partition field
+    table.updateSpec().removeField(Expressions.bucket("data", 16)).commit();
+
+    DataFile data10 =
+        DataFiles.builder(table.spec())
+            .withPath("/path/to/data-10.parquet")
+            .withRecordCount(10)
+            .withFileSizeInBytes(10)
+            .build();
+    table.newFastAppend().appendFile(data10).commit();
+
+    PartitionsTable partitionsTable = new PartitionsTable(table);
+    // must contain the partition column even the current spec is non-partitioned.
+    Assertions.assertThat(partitionsTable.schema().findField("partition")).isNotNull();
+
+    TableScan scanNoFilter = partitionsTable.newScan().select("partition");
+    try (CloseableIterable<ContentFile<?>> files =
+        PartitionsTable.planFiles((StaticTableScan) scanNoFilter)) {
+      if (formatVersion == 2) {
+        // four partitioned data files, four partitioned delete files and one non-partitioned data
+        // file.
+        Assert.assertEquals(9, Iterators.size(files.iterator()));
+      } else {
+        // four partitioned data files and one non-partitioned data file.
+        Assert.assertEquals(5, Iterators.size(files.iterator()));
+      }
+
+      // check for null partition value.
+      Assert.assertTrue(

Review Comment:
   ```suggestion
         Assertions.assertThat(StreamSupport.stream(files.spliterator(), false))
             .anyMatch(
                 file -> {
                   StructLike partition = file.partition();
   
                   return Objects.equals(null, partition.get(0, Object.class));
                 });
   ```



##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,50 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {
+    preparePartitionedTable();
+
+    // Remove partition field
+    table.updateSpec().removeField(Expressions.bucket("data", 16)).commit();
+
+    DataFile data10 =

Review Comment:
   nit: maybe just `dataFile`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #7551: Core: Check for all specs in partitionsTable

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#issuecomment-1537782242

   cc: @szehon-ho 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1191661451


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java:
##########
@@ -222,6 +222,14 @@ public void testPositionDeletesPartitionSpecRemoval() {
         constantsMap(posDeleteTask, partitionType).get(MetadataColumns.FILE_PATH.fieldId()));
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() {

Review Comment:
   feels like test name is not super clear, maybe `testPreviouslyPartitionedSpecStillShowPartitionColumnWhenEvolveToUnpartitioned` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1190739364


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,47 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {

Review Comment:
   I moved it now. 
   
   > Also another question, does this test the case? I thought it was only failing on V2, because I had seen on V1 the partition transform remains as void on the latest spec, when removed.
   
   Yes, the testcase fails only for v2 without the PR changes as V1 will have void transform to satisfy the check. But I thought it can run for both.
   
   BTW, now I removed the scan code in the testcase because my intention was to just scan spec_id column and see it is filled instead of null. But I don't know how to achieve it (In sql or spark code I know. But in this table scan I am not sure as instead of rows it returns content files) 



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -36,6 +36,8 @@ public class PartitionsTable extends BaseMetadataTable {
 
   private final Schema schema;
 
+  private final boolean isUnpartitionedTable;

Review Comment:
   updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#issuecomment-1546010021

   Merged, thanks @ajantha-bhat and @nastra for additional review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1191689218


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java:
##########
@@ -222,6 +222,14 @@ public void testPositionDeletesPartitionSpecRemoval() {
         constantsMap(posDeleteTask, partitionType).get(MetadataColumns.FILE_PATH.fieldId()));
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() {

Review Comment:
   Hm. it looks a bit long :)  In that case, Id vote to maybe bring back the original asserts then the first commit so its a more end-to-end test , https://github.com/apache/iceberg/pull/7551/commits/54b95ea8658d7db03eea717e68b09350fff13a9a  as I dont think its covered elsewhere (though realize its not related to this pr )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1191483939


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java:
##########
@@ -805,6 +807,47 @@ public void testPartitionSpecEvolutionRemoval() {
     }
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() throws IOException {

Review Comment:
   Thanks.  
   
   I think the original asserts you had before moving the test , worked pretty well?  I would think it's nice to have those.  But up to you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7551: Core: Check for all specs in partitionsTable

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7551:
URL: https://github.com/apache/iceberg/pull/7551#discussion_r1191689218


##########
core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java:
##########
@@ -222,6 +222,14 @@ public void testPositionDeletesPartitionSpecRemoval() {
         constantsMap(posDeleteTask, partitionType).get(MetadataColumns.FILE_PATH.fieldId()));
   }
 
+  @Test
+  public void testPartitionSpecEvolutionToUnpartitioned() {

Review Comment:
   Hm. it looks a bit long :)  In that case, Id vote to maybe bring back the original asserts then the first commit so its a more end-to-end test and we dont have to change the name too much , https://github.com/apache/iceberg/pull/7551/commits/54b95ea8658d7db03eea717e68b09350fff13a9a  as I dont think its covered elsewhere (though realize its not related to this pr )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org