You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/12 12:04:23 UTC

[GitHub] [iceberg] nastra opened a new pull request, #6411: Core: Don't produce partition summaries on unpartitioned table

nastra opened a new pull request, #6411:
URL: https://github.com/apache/iceberg/pull/6411

   Previously, having an unpartitioned table would produce a `"partitions."` entry in the snapshot summary when the partition summary limit was configured


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051927236


##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -898,7 +898,7 @@ public void testSnapshotsTable() {
                     ImmutableMap.of(
                         "added-records", "1",
                         "added-data-files", "1",
-                        "changed-partition-count", "1",
+                        "changed-partition-count", "0",

Review Comment:
   you're right. I've revisited the taken approach and updated it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051683043


##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -898,7 +898,7 @@ public void testSnapshotsTable() {
                     ImmutableMap.of(
                         "added-records", "1",
                         "added-data-files", "1",
-                        "changed-partition-count", "1",
+                        "changed-partition-count", "0",

Review Comment:
   I think it is reasonable to consider the root a changed partition. If any change is made, I'd include at least 1 affected partition. I think of an "unpartitioned" table as a single partition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1048106490


##########
core/src/main/java/org/apache/iceberg/SnapshotSummary.java:
##########
@@ -148,7 +148,7 @@ public void set(String property, String value) {
     }
 
     private void updatePartitions(PartitionSpec spec, ContentFile<?> file, boolean isAddition) {
-      if (trustPartitionMetrics) {
+      if (spec.isPartitioned() && trustPartitionMetrics) {

Review Comment:
   I don't think we need to do anything else. This PR is is only so that the snapshot summary doesn't produce a hashmap entry where the key is `partitions.`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045769408


##########
core/src/test/java/org/apache/iceberg/TestRowDelta.java:
##########
@@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() {
     Map<String, String> summary = snapshot.summary();
 
     Assert.assertEquals(
-        "Should change 4 partitions", "4", summary.get(CHANGED_PARTITION_COUNT_PROP));
+        "Should change 3 partitions", "3", summary.get(CHANGED_PARTITION_COUNT_PROP));

Review Comment:
   the test above makes the table temporarily unpartitioned and this was wrongly counted here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #6411:
URL: https://github.com/apache/iceberg/pull/6411


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#issuecomment-1358011325

   Merged! Thanks, @nastra.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1046073562


##########
core/src/main/java/org/apache/iceberg/SnapshotSummary.java:
##########
@@ -148,7 +148,7 @@ public void set(String property, String value) {
     }
 
     private void updatePartitions(PartitionSpec spec, ContentFile<?> file, boolean isAddition) {
-      if (trustPartitionMetrics) {
+      if (spec.isPartitioned() && trustPartitionMetrics) {

Review Comment:
   How does this handle void partitions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045770751


##########
core/src/test/java/org/apache/iceberg/TestRowDelta.java:
##########
@@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() {
     Map<String, String> summary = snapshot.summary();
 
     Assert.assertEquals(
-        "Should change 4 partitions", "4", summary.get(CHANGED_PARTITION_COUNT_PROP));
+        "Should change 3 partitions", "3", summary.get(CHANGED_PARTITION_COUNT_PROP));
     Assert.assertEquals("Should add 1 data file", "1", summary.get(ADDED_FILES_PROP));
     Assert.assertEquals("Should have 4 data files", "4", summary.get(TOTAL_DATA_FILES_PROP));
     Assert.assertEquals("Should add 3 delete files", "3", summary.get(ADDED_DELETE_FILES_PROP));
     Assert.assertEquals("Should have 3 delete files", "3", summary.get(TOTAL_DELETE_FILES_PROP));
     Assert.assertEquals("Should add 3 position deletes", "3", summary.get(ADDED_POS_DELETES_PROP));
     Assert.assertEquals("Should have 3 position deletes", "3", summary.get(TOTAL_POS_DELETES_PROP));
 
-    Assert.assertTrue(
-        "Partition metrics must be correct",
-        summary.get(CHANGED_PARTITION_PREFIX).contains(ADDED_DELETE_FILES_PROP + "=1"));

Review Comment:
   this was in fact checking the entry, which we don't produce anymore:
   ```
    "partitions." -> "added-position-delete-files=1,added-delete-files=1,added-files-size=10,added-position-deletes=1"
   ```
   The summaries map also had the following entries for the 3 partitions:
   ```
    "partitions.data_bucket=0" -> "added-position-delete-files=1,added-delete-files=1,added-files-size=10,added-position-deletes=1"
    "partitions.data=abc" -> "added-position-delete-files=1,added-delete-files=1,added-files-size=10,added-position-deletes=1"
    "partitions.data=xyz" -> "added-data-files=1,added-records=1,added-files-size=10"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045769408


##########
core/src/test/java/org/apache/iceberg/TestRowDelta.java:
##########
@@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() {
     Map<String, String> summary = snapshot.summary();
 
     Assert.assertEquals(
-        "Should change 4 partitions", "4", summary.get(CHANGED_PARTITION_COUNT_PROP));
+        "Should change 3 partitions", "3", summary.get(CHANGED_PARTITION_COUNT_PROP));

Review Comment:
   the test does makes the table temporarily unpartitioned and this was wrongly counted here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1046751136


##########
core/src/main/java/org/apache/iceberg/SnapshotSummary.java:
##########
@@ -148,7 +148,7 @@ public void set(String property, String value) {
     }
 
     private void updatePartitions(PartitionSpec spec, ContentFile<?> file, boolean isAddition) {
-      if (trustPartitionMetrics) {
+      if (spec.isPartitioned() && trustPartitionMetrics) {

Review Comment:
   In v1 once a partition field is dropped, the table is considered unpartitioned. The test [here](https://github.com/apache/iceberg/blob/d350c9b8c995a2953aa8b80a0a1fc7cadc4dd16a/core/src/test/java/org/apache/iceberg/TestPartitioning.java#L66-L86) shows this behavior. The `isPartitioned()` handles this case and was also updated by the same PR: https://github.com/apache/iceberg/pull/3059.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045978833


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java:
##########
@@ -953,9 +953,9 @@ public void testDeleteWithMultipleSpecs() {
       // copy-on-write is tested against v1 and such tables have different partition evolution
       // behavior
       // that's why the number of changed partitions is 4 for copy-on-write
-      validateCopyOnWrite(currentSnapshot, "4", "4", "1");
+      validateCopyOnWrite(currentSnapshot, "3", "4", "1");
     } else {
-      validateMergeOnRead(currentSnapshot, "3", "3", null);
+      validateMergeOnRead(currentSnapshot, "2", "3", null);

Review Comment:
   the table is initially unpartitioned, hence it's 1 less in both cases here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051683205


##########
core/src/test/java/org/apache/iceberg/TestRowDelta.java:
##########
@@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() {
     Map<String, String> summary = snapshot.summary();
 
     Assert.assertEquals(
-        "Should change 4 partitions", "4", summary.get(CHANGED_PARTITION_COUNT_PROP));
+        "Should change 3 partitions", "3", summary.get(CHANGED_PARTITION_COUNT_PROP));

Review Comment:
   Why was it counted?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1051683043


##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -898,7 +898,7 @@ public void testSnapshotsTable() {
                     ImmutableMap.of(
                         "added-records", "1",
                         "added-data-files", "1",
-                        "changed-partition-count", "1",
+                        "changed-partition-count", "0",

Review Comment:
   I think it is reasonable to consider the root a changed partition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045878419


##########
spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -898,7 +898,7 @@ public void testSnapshotsTable() {
                     ImmutableMap.of(
                         "added-records", "1",
                         "added-data-files", "1",
-                        "changed-partition-count", "1",
+                        "changed-partition-count", "0",

Review Comment:
   the table is actually unpartitioned, so this should be 0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6411:
URL: https://github.com/apache/iceberg/pull/6411#discussion_r1047963446


##########
core/src/main/java/org/apache/iceberg/SnapshotSummary.java:
##########
@@ -148,7 +148,7 @@ public void set(String property, String value) {
     }
 
     private void updatePartitions(PartitionSpec spec, ContentFile<?> file, boolean isAddition) {
-      if (trustPartitionMetrics) {
+      if (spec.isPartitioned() && trustPartitionMetrics) {

Review Comment:
   IIRC, `.partitions` metadata table also shows unpartitioned data as null key. Do we need to handle there aswell to make it unified? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org