Posted to issues@iceberg.apache.org by "gaborkaszab (via GitHub)" <gi...@apache.org> on 2023/02/08 12:18:00 UTC

[GitHub] [iceberg] gaborkaszab opened a new pull request, #6771: Docs: Document that partitions metadata table might show 'old' partitions

gaborkaszab opened a new pull request, #6771:
URL: https://github.com/apache/iceberg/pull/6771

   https://github.com/apache/iceberg/issues/6257


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102539567


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   Thanks for the analysis @szehon-ho! I'm fine adding the proposed comment if there is consensus.





[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1106877480


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:

Review Comment:
   Sure, done.





[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100067028


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   I think it is better to avoid personal pronouns like `you` and `we`. Hence the above suggestion, along with some minor typo fixes and nits.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100488895


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   I actually left a comment on the issue. Now I am wondering why we don't use a metadata-only delete in this case, to go down a codepath where we delete the manifest entries rather than writing delete files. I wonder if it's because of doing an update.
   
   That being said, there may be use-cases where we delete enough of the partition using delete files in separate commit, that results in this, but I am not sure if this specific use case should result in this by design and we should put it specifically.





[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1104399202


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   I changed the comment. Thanks for the suggestion, @szehon-ho!





[GitHub] [iceberg] szehon-ho commented on pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#issuecomment-1434002524

   Merged, thanks @gaborkaszab, and @ajantha-bhat for the additional review.




[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102121949


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   Actually, I checked with @aokolnychyi, and it seems to be the expected behavior.  "Delete From" queries may choose a metadata-only delete, but "update" does not, as it is kind of an edge case.
   
   
   What do you guys think about something simpler?  Rather than mentioning specific use-case:
   
   The partitions metadata table shows partitions with data files or delete files in the current snapshot.  However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.
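
To make the proposed wording concrete, here is a minimal sketch of the effect it describes. It is a deliberately simplified model, not Iceberg's implementation: `PartitionsSketch`, `DataFile`, `DeleteFile`, `partitionsTable`, and `liveRows` are hypothetical names, and the model only sums record counts from data files. It assumes a merge-on-read UPDATE that moves three rows from partition `a` to partition `b`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionsSketch {
    // Hypothetical stand-in for a data file: its partition value and row count.
    static final class DataFile {
        final String partition;
        final long recordCount;

        DataFile(String partition, long recordCount) {
            this.partition = partition;
            this.recordCount = recordCount;
        }
    }

    // Hypothetical stand-in for a delete file: rows it marks deleted in one partition.
    static final class DeleteFile {
        final String partition;
        final long deletedRows;

        DeleteFile(String partition, long deletedRows) {
            this.partition = partition;
            this.deletedRows = deletedRows;
        }
    }

    // Model of the partitions metadata table: sums record counts per partition
    // from data files only; delete files are never consulted.
    static Map<String, Long> partitionsTable(List<DataFile> dataFiles) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (DataFile f : dataFiles) {
            counts.merge(f.partition, f.recordCount, Long::sum);
        }
        return counts;
    }

    // Model of a regular read: subtracts deleted rows and drops empty partitions.
    static Map<String, Long> liveRows(List<DataFile> dataFiles, List<DeleteFile> deleteFiles) {
        Map<String, Long> counts = partitionsTable(dataFiles);
        for (DeleteFile d : deleteFiles) {
            counts.merge(d.partition, -d.deletedRows, Long::sum);
        }
        counts.values().removeIf(v -> v <= 0);
        return counts;
    }

    public static void main(String[] args) {
        // The old data file in "a" stays in place, a delete file hides its rows,
        // and a new data file is written to "b".
        List<DataFile> data = List.of(new DataFile("a", 3), new DataFile("b", 3));
        List<DeleteFile> deletes = List.of(new DeleteFile("a", 3));
        System.out.println("partitions table: " + partitionsTable(data));
        System.out.println("live rows:        " + liveRows(data, deletes));
    }
}
```

In this model the partitions view prints `{a=3, b=3}` while the live view prints `{b=3}`, which is the "partition shown even though all its rows are marked deleted" case from the proposed note.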
   





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1107525871


##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -170,6 +170,8 @@ static CloseableIterable<FileScanTask> planFiles(StaticTableScan scan) {
                       scan.filter(), transformedSpec, caseSensitive);
                 });
 
+    // Note, the delete files aren't applied to the data files so if there are partition values that
+    // have been updated then both the 'old' and the 'new' values are present in the output.

Review Comment:
   Yea, I was trying to say, maybe on the class itself?  (Initially I was thinking of the method, but it's an overridden one.)  Anyway, the change looks good to me as is; up to you if you want to put it on there.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100488895


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   I actually left a comment on the issue. Now I am wondering why we don't use a metadata-only delete in this case, to go down a codepath where we delete the manifest entries rather than writing delete files. I wonder if it's because of doing an update.
   
   That being said, there may be use-cases where we delete enough of the partition using delete files in separate commit, that results in this, but I am not sure if this specific use case should result in this by design and maybe we should not mention the use-case specifically, but rather the general scenario.





[GitHub] [iceberg] gaborkaszab commented on pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#issuecomment-1422517138

   cc @szehon-ho 




[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102121949


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   Actually, I checked with @aokolnychyi, and it seems to be the expected behavior.  Delete From may choose a metadata-only delete, but update does not, as it is kind of an edge case.
   
   
   What do you guys think about something simpler?  Rather than mentioning specific use-case.
   ```
   The partitions metadata table shows partitions with data files or delete files in the current snapshot.  However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.
   ```





[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102570224


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   @szehon-ho: 
   
   SGTM 👍🏻





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100488895


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   I actually left a comment on the issue. Now I am wondering why we don't use a metadata-only delete in this case, to go down a codepath where we delete the manifest entries rather than writing delete files. I wonder if it's because of doing an update.
   
   That being said, there may be use-cases where we delete enough of the partition using delete files in separate commit, that results in this, but I am not sure if this specific use case should result in this by design and maybe we should not put it specifically.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1106265362


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:

Review Comment:
   Can you make the two notes a bullet list?  See other notes like Manifests / All Manifests.



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -170,6 +170,8 @@ static CloseableIterable<FileScanTask> planFiles(StaticTableScan scan) {
                       scan.filter(), transformedSpec, caseSensitive);
                 });
 
+    // Note, the delete files aren't applied to the data files so if there are partition values that
+    // have been updated then both the 'old' and the 'new' values are present in the output.

Review Comment:
   I'm actually not sure it should be an inline comment; to me those seem to be for implementation details of specific lines, and may not be read if you are just skimming.  If we need to, how about the javadoc of the class or method, which is more about the overall behavior, as is being commented here?
   
   And also if we do keep it, we should change to document the general case, as mentioned in the comment on the docs page, like:
   
   `Note, the delete files aren't applied to the data files, so partitions with 0 rows may still be present in the output.`
   
   
   





[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1106881948


##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -170,6 +170,8 @@ static CloseableIterable<FileScanTask> planFiles(StaticTableScan scan) {
                       scan.filter(), transformedSpec, caseSensitive);
                 });
 
+    // Note, the delete files aren't applied to the data files so if there are partition values that
+    // have been updated then both the 'old' and the 'new' values are present in the output.

Review Comment:
   Understood. I removed the comments from this file for now. However, I feel that we should have some way to document the behaviour of the metadata table APIs that is not Spark-specific. I'm open to suggestions, though, as I don't want to duplicate the remarks on the [Spark/Queries](https://iceberg.apache.org/docs/latest/spark-queries/#partitions) page either, but in fact they might be useful in the PartitionsTable.java file as well.





[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1102121949


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   Actually, I checked with @aokolnychyi, and it seems to be the expected behavior.  "Delete From" queries may choose a metadata-only delete, but "update" does not, as it is kind of an edge case.
   
   
   What do you guys think about something simpler?  Rather than mentioning specific use-case.
   ```
   The partitions metadata table shows partitions with data files or delete files in the current snapshot.  However, delete files are not applied, and so in some cases partitions may be shown even though all their data rows are marked deleted by delete files.
   ```





[GitHub] [iceberg] szehon-ho merged pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho merged PR #6771:
URL: https://github.com/apache/iceberg/pull/6771




[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6771:
URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100063008


##########
docs/spark-queries.md:
##########
@@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions;
 Note:
 For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
 
+Note2:
+The output of the above query might differ between having copy-on-write or merge-on-read strategies. E.g. delete files with MOR strategy aren't applyied when producing the content of the partitions metadata table. As a result if you have renamed a partition (by updating the value of a partition column) then you would see both the 'old' and the 'new' one until you do a rewrite of delete/data files.

Review Comment:
   ```suggestion
   The output of the above query might differ between having copy-on-write or merge-on-read strategies. For example, delete files with merge-on-read strategy aren't applied when producing the content of the partitions metadata table. As a result, if the table has renamed the partitions (by updating the value of a partition column), then the query output would contain both the 'old' and the 'new' partition values until rewrite data files (with delete files) is executed. 
   ```


