Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/02 17:14:34 UTC

[GitHub] [iceberg] RussellSpitzer commented on pull request #4588: Spark: Add custom metric for number of deletes applied by a SparkScan

RussellSpitzer commented on PR #4588:
URL: https://github.com/apache/iceberg/pull/4588#issuecomment-1145107100

   > > I think we would also really benefit from @flyrain doing a full review on this, especially now that we have the "markDelete" pathway as well. I assume for that pathway we probably will just skip counting deletes since we don't really care.
   > 
   > @RussellSpitzer I saw that `Deletes` and `DeleteFilter` have changed in master since I updated this PR (so this PR no longer merges cleanly), but I haven't had the time to understand the changes or how to reconcile this PR with them. I don't understand your statement that "we probably will just skip counting deletes since we don't really care." Are you saying that this PR is no longer worthwhile?
   
   No! I think this PR is very important. There is now simply a second read path, using the `_is_deleted` metadata column, which actually returns deleted rows. When we follow that path, I'm not sure it's important for us to count the deleted rows, since we'll be returning them anyway, but I could go either way. For the path where we don't return deleted rows (the normal read path), we definitely want this metric!
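   A minimal Java sketch of the distinction between the two read paths, assuming a toy model in which rows are plain ids and a predicate says which ids are deleted. Every name here (`DeleteCountingSketch`, `Row`, `DeleteCounter`, `filterAndCount`, `markDeleted`) is hypothetical and illustrative only; it is not Iceberg's `Deletes`/`DeleteFilter` code or Spark's custom metric API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/**
 * Hypothetical sketch of the two read paths; names are illustrative and are
 * NOT Iceberg's DeleteFilter or Spark's metric classes.
 */
public class DeleteCountingSketch {

    /** Toy row: an id plus the value an _is_deleted-style flag would carry. */
    static final class Row {
        final long id;
        final boolean isDeleted;

        Row(long id, boolean isDeleted) {
            this.id = id;
            this.isDeleted = isDeleted;
        }
    }

    /** Hypothetical per-task counter standing in for a "deletes applied" metric. */
    static final class DeleteCounter {
        private long count;

        void increment() {
            count++;
        }

        long count() {
            return count;
        }
    }

    /** Normal read path: rows matched by a delete are dropped, and each drop is counted. */
    static List<Row> filterAndCount(List<Long> ids, LongPredicate deleted, DeleteCounter counter) {
        List<Row> out = new ArrayList<>();
        for (long id : ids) {
            if (deleted.test(id)) {
                counter.increment(); // delete applied: row skipped, counter bumped
            } else {
                out.add(new Row(id, false));
            }
        }
        return out;
    }

    /** _is_deleted read path: every row is returned and deleted rows are only flagged, so nothing is counted. */
    static List<Row> markDeleted(List<Long> ids, LongPredicate deleted) {
        List<Row> out = new ArrayList<>();
        for (long id : ids) {
            out.add(new Row(id, deleted.test(id)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L, 4L, 5L);
        LongPredicate deleted = id -> id % 2 == 0; // pretend ids 2 and 4 are deleted

        DeleteCounter counter = new DeleteCounter();
        List<Row> kept = filterAndCount(ids, deleted, counter);
        // prints "normal path: kept=3, deletes applied=2"
        System.out.println("normal path: kept=" + kept.size() + ", deletes applied=" + counter.count());

        List<Row> marked = markDeleted(ids, deleted);
        long flagged = marked.stream().filter(r -> r.isDeleted).count();
        // prints "_is_deleted path: returned=5, flagged=2"
        System.out.println("_is_deleted path: returned=" + marked.size() + ", flagged=" + flagged);
    }
}
```

   In this toy model the count only carries information on the normal path: on the `_is_deleted` path the deleted rows come back to the caller with their flag set, so they can be counted downstream if anyone needs that number.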


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org