Posted to issues@iceberg.apache.org by "chenjunjiedada (via GitHub)" <gi...@apache.org> on 2023/04/29 00:32:41 UTC

[GitHub] [iceberg] chenjunjiedada commented on pull request #5760: Core: Add minimum data sequence number to ManifestEntry

chenjunjiedada commented on PR #5760:
URL: https://github.com/apache/iceberg/pull/5760#issuecomment-1528310117

   > Did I get it correctly? It only applies to position deletes?
   
   Correct.
   
   > My primary worry is that this would require a spec change and quite a bit of code to populate the new value. For instance, we currently only track file names when writing position deletes. After this, we would have to project and keep track of the sequence number for each referenced data file. Even after all of that, we can still get false positives.
   
   Yes, it does need a new field, as added in this PR. In Spark MoR mode, it may need to track the sequence numbers of the referenced data files, though we could populate a null value at first and fill in the correct value in a later rewrite action. Either way, as you mentioned, it can still produce false positives. In the Flink upsert case, by contrast, the correct value is always available, so there is no false-positive problem.
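
A minimal sketch of how a minimum data sequence number on a position delete file could narrow scan planning, and why false positives remain. The `DataFile` and `PositionDeleteFile` records and the `min_data_seq` field are hypothetical simplifications, not Iceberg's actual classes. Only the sequence-number rule is modeled (a position delete applies to data files whose data sequence number is less than or equal to its own); partition and `file_path` bound checks are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DataFile:
    path: str
    data_seq: int  # data sequence number of the commit that added the file


@dataclass(frozen=True)
class PositionDeleteFile:
    data_seq: int                     # data sequence number of the delete file itself
    referenced_paths: FrozenSet[str]  # files it holds positions for (unknown to the planner)
    min_data_seq: Optional[int]       # hypothetical new field; None if not populated yet


def min_referenced_seq(referenced: List[DataFile]) -> int:
    """Writer side: the extra bookkeeping needed to populate min_data_seq."""
    return min(f.data_seq for f in referenced)


def may_apply(delete: PositionDeleteFile, data: DataFile) -> bool:
    """Planning-time check that uses only metadata, without opening the delete file."""
    # Existing rule: only data files with data_seq <= the delete's data_seq can match.
    if data.data_seq > delete.data_seq:
        return False
    # Proposed rule: skip data files older than every file the delete references.
    # A null value (e.g. not yet filled in by a rewrite) disables this check.
    if delete.min_data_seq is not None and data.data_seq < delete.min_data_seq:
        return False
    return True


def actually_applies(delete: PositionDeleteFile, data: DataFile) -> bool:
    """Ground truth, which needs the delete file's contents."""
    return data.path in delete.referenced_paths and data.data_seq <= delete.data_seq


a, b, c = DataFile("a.parquet", 3), DataFile("b.parquet", 5), DataFile("c.parquet", 7)
old = DataFile("old.parquet", 1)
delete = PositionDeleteFile(8, frozenset({a.path, c.path}), min_referenced_seq([a, c]))

print(may_apply(delete, old))  # False: pruned because 1 < min_data_seq 3
print(may_apply(delete, b), actually_applies(delete, b))  # True False: a false positive
```

The range `[min_data_seq, data_seq]` only bounds the referenced files, so any unreferenced data file whose sequence number falls between them (`b.parquet` here) still has to be checked against the delete file.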
   
   >I am currently working on an alternative planning for position deletes in Spark, where I want to open files in a distributed manner and squash them into a bitmap per data file. This would give us a reliable way to check if delete files apply and would also avoid the need to open the same delete file multiple times for different data files.
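
A minimal single-process sketch of the bitmap idea, under stated assumptions: delete files are modeled as hypothetical lists of `(file_path, pos)` rows, `map` and `reduce` stand in for distributed tasks, and a Python `int` stands in for a compressed bitmap (a real engine would more likely use something like a Roaring bitmap). The spec's sequence-number and partition checks are left out. Each delete file is read exactly once, and the squashed bitmaps answer "does any delete apply to this data file?" exactly.

```python
from functools import reduce
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape of a position delete file: (file_path, pos).
DeleteRow = Tuple[str, int]
# One bitmap per data file path; bit i set means position i is deleted.
Index = Dict[str, int]


def index_one(rows: Iterable[DeleteRow]) -> Index:
    """Read one delete file once, turning its positions into per-data-file bitmaps."""
    bitmaps: Index = {}
    for path, pos in rows:
        bitmaps[path] = bitmaps.get(path, 0) | (1 << pos)
    return bitmaps


def merge(left: Index, right: Index) -> Index:
    """Squash two partial indexes by OR-ing the bitmaps of the same data file."""
    merged = dict(left)
    for path, bits in right.items():
        merged[path] = merged.get(path, 0) | bits
    return merged


def build_index(delete_files: Iterable[Iterable[DeleteRow]]) -> Index:
    # map() stands in for tasks that each open one delete file in parallel;
    # reduce() stands in for combining their partial results.
    return reduce(merge, map(index_one, delete_files), {})


def live_rows(path: str, rows: List[str], index: Index) -> Iterator[str]:
    """Yield the rows of a data file whose positions are not marked as deleted."""
    bitmap = index.get(path, 0)
    for pos, row in enumerate(rows):
        if not (bitmap >> pos) & 1:
            yield row


d1 = [("a.parquet", 0), ("a.parquet", 2)]
d2 = [("a.parquet", 2), ("b.parquet", 1)]
index = build_index([d1, d2])
print(list(live_rows("a.parquet", ["r0", "r1", "r2", "r3"], index)))  # ['r1', 'r3']
print("c.parquet" in index)  # False: no delete file references it
```

Because the merged bitmap is exact, a data file that no delete file references never matches, in contrast with the false positives of a minimum-sequence-number filter.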
   
   Sounds cool and promising; I look forward to it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org