
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/30 01:56:40 UTC

[GitHub] [iceberg] chenjunjiedada commented on pull request #2372: Spark: add position delete row reader

chenjunjiedada commented on pull request #2372:
URL: https://github.com/apache/iceberg/pull/2372#issuecomment-809846246

Thanks for the review and comments!

My original thought was to handle equality deletes and position deletes separately, which I think of as different levels of minor compaction. Separate compactions give users finer-grained control over the file scan, which helps mitigate overhead on the name node. For example, users could monitor the number of equality deletes and position deletes from the snapshot summary and run a Spark or Flink action to do the specific compaction.
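
As a rough illustration of that monitoring idea (a sketch only, not code from this PR): the snippet below takes a snapshot summary as a plain map and picks which minor compaction to run. The class name, the thresholds, and the tie-breaking order are hypothetical. The summary keys `total-equality-deletes` and `total-position-deletes` are assumed to be the ones Iceberg writes, so check them against your version before relying on them.

```java
import java.util.Map;

// Hypothetical helper: decides which minor compaction to trigger
// based on delete counts found in a snapshot summary map.
public class DeleteCompactionPlanner {

    public enum Kind { NONE, EQUALITY_DELETES, POSITION_DELETES }

    // Assumed summary keys; verify against the Iceberg version in use.
    static final String EQ_KEY = "total-equality-deletes";
    static final String POS_KEY = "total-position-deletes";

    // Reads a numeric summary property, treating a missing key as zero.
    static long count(Map<String, String> summary, String key) {
        String value = summary.get(key);
        return value == null ? 0L : Long.parseLong(value);
    }

    // Equality deletes are checked first here; that ordering is an
    // arbitrary choice for the sketch, not something the PR prescribes.
    public static Kind choose(Map<String, String> summary,
                              long eqThreshold, long posThreshold) {
        if (count(summary, EQ_KEY) >= eqThreshold) {
            return Kind.EQUALITY_DELETES;
        }
        if (count(summary, POS_KEY) >= posThreshold) {
            return Kind.POSITION_DELETES;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        Map<String, String> summary = Map.of(
            EQ_KEY, "1200",
            POS_KEY, "40");
        System.out.println(choose(summary, 1000, 1000));
    }
}
```

In a real job the map would presumably come from `table.currentSnapshot().summary()`, and the returned kind would select the corresponding Spark or Flink action.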

I didn't consider reading all deleted rows because I thought of that as major compaction, and it may be similar to the action that removes all deletes. If we want to support one more level of compaction that reads all deletes and rewrites them as position deletes, I think your suggestion definitely works.

So I think it would be better to remove the logic for reading all deleted rows from this PR, implement it the suggested way, and add an action for it. I'd still like to keep the current separate compaction actions for fine-grained usage. Does that make sense to you?
