Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/08 01:35:30 UTC

[GitHub] [iceberg] vincentpoon opened a new issue, #5936: Manifest partition stats include partition values from deleted files

vincentpoon opened a new issue, #5936:
URL: https://github.com/apache/iceberg/issues/5936

   ### Apache Iceberg version
   
   0.14.0
   
   ### Query engine
   
   Trino
   
   ### Please describe the bug 🐞
   
   With a simple table partitioned by one identity column, e.g. `partCol int`:
   Insert 2 rows, one with partCol=1 and one with partCol=2.
   Now DELETE where partCol=2.
   
   The manifest list will show that the new manifest's partition stats for `partCol` are lower=1, upper=2.
   That is, they incorporate the stats for all files in the manifest, including those with `status: 2` (i.e. deleted files).
   
   This has query performance implications: a query `SELECT * FROM table WHERE partCol=2` must still read the manifest, even though no live file in it has partCol=2.
   
   It is only after the manifest is further rewritten, when the `status: 2` file is evicted, that the column stats are accurately reflected.
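The sketch below is a toy Python model of how a manifest-level partition summary (lower/upper bound) for one identity partition column could be computed. It is not Iceberg's manifest-writer code. The `ManifestEntry` class, the `partition_bounds` function and the file paths are hypothetical. Only the status codes (0 = EXISTING, 1 = ADDED, 2 = DELETED) follow the Iceberg spec. It compares two cases: the behavior reported above, where a DELETED entry still contributes to the bounds, and bounds computed over live entries only. The entry statuses after the DELETE are assumed for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Manifest entry status codes as defined in the Iceberg table spec.
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified manifest entry: one data file in one partition."""
    file_path: str
    status: int
    part_col: int  # value of the identity partition column `partCol`


def partition_bounds(
    entries: Iterable[ManifestEntry], include_deleted: bool
) -> Optional[Tuple[int, int]]:
    """Return (lower, upper) of partCol over the counted entries, or None if none count."""
    values = [
        e.part_col
        for e in entries
        if include_deleted or e.status != DELETED
    ]
    if not values:
        return None
    return min(values), max(values)


# Scenario from the issue (statuses assumed): after DELETE WHERE partCol=2, the
# new manifest keeps the partCol=1 file as EXISTING and the partCol=2 file as DELETED.
manifest = [
    ManifestEntry("data/partCol=1/a.parquet", EXISTING, 1),
    ManifestEntry("data/partCol=2/b.parquet", DELETED, 2),
]

print(partition_bounds(manifest, include_deleted=True))   # (1, 2): behavior reported in the issue
print(partition_bounds(manifest, include_deleted=False))  # (1, 1): bounds over live files only
```

With the DELETED entry counted, the summary still claims partCol=2 may be present in the manifest. That claim is what keeps the manifest from being pruned.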


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] commented on issue #5936: Manifest partition stats include partition values from deleted files

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5936:
URL: https://github.com/apache/iceberg/issues/5936#issuecomment-1504306594

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


[GitHub] [iceberg] rdblue commented on issue #5936: Manifest partition stats include partition values from deleted files

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #5936:
URL: https://github.com/apache/iceberg/issues/5936#issuecomment-1275020789

   I don't think this is a bug. The behavior should be correct, right? Changing this behavior would be a performance improvement?
   
   I'm not sure that we need to do this. I'd rather include a mode where we simply drop files that were deleted rather than keeping references to them. That would achieve the same goal, but would use less space for metadata.
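rdblue's alternative is a mode that drops deleted files instead of keeping references to them. It can be sketched with the same kind of toy model. This is a hypothetical illustration, not an existing Iceberg option. `drop_deleted_entries` and `partition_bounds` are made-up names, and the sketch assumes the summary is computed over whatever entries the manifest writer emits. Because the DELETED entry is never written, the bounds shrink and the manifest holds fewer entries. That smaller metadata is the space saving mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Manifest entry status codes as defined in the Iceberg table spec.
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified manifest entry for one data file."""
    file_path: str
    status: int
    part_col: int  # value of the identity partition column `partCol`


def drop_deleted_entries(entries: List[ManifestEntry]) -> List[ManifestEntry]:
    """Hypothetical 'drop deleted files' mode: keep only live entries."""
    return [e for e in entries if e.status != DELETED]


def partition_bounds(entries: List[ManifestEntry]) -> Optional[Tuple[int, int]]:
    """(lower, upper) of partCol over every entry written to the manifest."""
    values = [e.part_col for e in entries]
    return (min(values), max(values)) if values else None


before = [
    ManifestEntry("data/partCol=1/a.parquet", EXISTING, 1),
    ManifestEntry("data/partCol=2/b.parquet", DELETED, 2),
]
after = drop_deleted_entries(before)

print(len(before), partition_bounds(before))  # 2 (1, 2)
print(len(after), partition_bounds(after))    # 1 (1, 1)
```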


[GitHub] [iceberg] vincentpoon commented on issue #5936: Manifest partition stats include partition values from deleted files

Posted by GitBox <gi...@apache.org>.
vincentpoon commented on issue #5936:
URL: https://github.com/apache/iceberg/issues/5936#issuecomment-1278209330

   @rdblue Hmm, I guess it depends on what "correct" behavior means here, but if the partition stats reflect values that can never be returned in a query (because the files containing those values have been deleted), then that seems incorrect to me.
   
   And changing the behavior would be a performance improvement, particularly when the manifests are quite large, as they are in our use case. Filtering using the partition stats at the manifest list level means certain manifests don't have to be read. With incorrect partition stats, those manifests are read even when they don't contain any files that can answer the query.
   
   Agreed that a mode to simply drop deleted files rather than keep references to them would solve the problem. But then I would ask: what is the purpose of keeping deleted files in the manifests with "Status: 2" (deleted)?
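The manifest-list filtering described in this comment can be illustrated with a minimal Python sketch for an equality predicate such as `partCol=2`. The `ManifestSummary` type, the `manifests_to_read` function and the manifest file name are hypothetical. This is not Iceberg's scan-planning code, and it ignores null handling and other predicate types. It only shows the effect vincentpoon describes. Bounds that still cover the deleted partition cause the manifest to be selected and read. Bounds over live files only let it be skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestSummary:
    """Hypothetical manifest-list entry: a manifest plus its partCol bounds."""
    manifest_path: str
    lower: int
    upper: int


def manifests_to_read(summaries: List[ManifestSummary], value: int) -> List[str]:
    """Return the manifests that might contain files with partCol == value.

    A manifest is skipped only when value falls outside [lower, upper].
    """
    return [s.manifest_path for s in summaries if s.lower <= value <= s.upper]


# Bounds that still include the deleted partCol=2 file vs. bounds over live files.
stale = [ManifestSummary("manifest-1.avro", lower=1, upper=2)]
accurate = [ManifestSummary("manifest-1.avro", lower=1, upper=1)]

print(manifests_to_read(stale, 2))     # ['manifest-1.avro']: must be opened
print(manifests_to_read(accurate, 2))  # []: skipped without reading it
```

With large manifests, each manifest skipped this way is one less file to fetch and decode during planning.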


[GitHub] [iceberg] github-actions[bot] closed issue #5936: Manifest partition stats include partition values from deleted files

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #5936: Manifest partition stats include partition values from deleted files
URL: https://github.com/apache/iceberg/issues/5936


[GitHub] [iceberg] github-actions[bot] commented on issue #5936: Manifest partition stats include partition values from deleted files

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5936:
URL: https://github.com/apache/iceberg/issues/5936#issuecomment-1524257835

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

