Posted to issues@iceberg.apache.org by "kingwind94 (via GitHub)" <gi...@apache.org> on 2023/01/30 08:15:19 UTC

[GitHub] [iceberg] kingwind94 commented on issue #6694: position delete manifest lower_bounds/upper_bounds not correct

kingwind94 commented on issue #6694:
URL: https://github.com/apache/iceberg/issues/6694#issuecomment-1408162400

   > This is because these metrics were truncated; Iceberg's default metrics mode for column metrics is `truncate(16)`. This should be fixed by #6313. I think it doesn't cause correctness problems, but it does cause more pos delete files to be scanned because the filtering is less effective.
   
   Thanks, you are right! I applied #6613 to the Flink 1.12 Iceberg connector and it works there too. The position delete lower_bounds/upper_bounds now keep the full, correct path.
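To see why truncated bounds stop being useful for position deletes, here is a minimal Python sketch, assuming `truncate(16)` keeps the first 16 characters for the lower bound and, for the upper bound, increments the last character of that prefix so it still sorts at or above the original value. The function names and the `s3://bucket/warehouse/...` paths are hypothetical and only for illustration; this is a simplified model, not Iceberg's actual implementation.

```python
from typing import Optional

# Hypothetical model of truncate(16) string metrics; not Iceberg's real code.
WIDTH = 16


def truncate_lower(value: str, width: int = WIDTH) -> str:
    # A prefix always sorts <= the full value, so it is a valid lower bound.
    return value[:width]


def truncate_upper(value: str, width: int = WIDTH) -> Optional[str]:
    # Values that fit within the width are kept as-is.
    if len(value) <= width:
        return value
    prefix = value[:width]
    # Increment the last incrementable character so the shortened prefix
    # sorts above the original value (surrogate handling omitted).
    for i in range(len(prefix) - 1, -1, -1):
        cp = ord(prefix[i])
        if cp < 0x10FFFF:
            return prefix[:i] + chr(cp + 1)
    # No character could be incremented: no finite upper bound exists.
    return None


# file_path values referenced by one hypothetical position delete file
paths = [
    "s3://bucket/warehouse/db/tbl/data/00001-a.parquet",
    "s3://bucket/warehouse/db/tbl/data/00007-b.parquet",
]
print(truncate_lower(min(paths)))  # s3://bucket/ware
print(truncate_upper(max(paths)))  # s3://bucket/warf
```

In this model both bounds collapse to a prefix of the table location, so they no longer say which data files the delete file actually references.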
   
   Moreover, although this problem doesn't cause correctness issues, it does make the rewrite commit validation fail. I use Flink to write data to Iceberg and Spark to rewrite small data files and delete files. The problem is that every time Flink commits a snapshot, the position deletes it newly added block the concurrent rewrite operation. But Flink's newly added position deletes should only apply to the newly added data files, not to the historical data files being rewritten, so they should not block the rewrite operation.
   The reason is that DeleteFileIndex.canContainPosDeletesForFile() compares dataFile.path() with the position delete file's file_path lower_bounds and upper_bounds, which were previously truncated. As a result, this method always returns true for any new position delete file, which then blocks the rewrite operation.
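The filtering problem can be modeled with a simplified Python sketch of a bounds check in the spirit of `DeleteFileIndex.canContainPosDeletesForFile()`: a position delete file can only apply to a data file whose path falls between the delete file's `file_path` lower and upper bounds. The function name and paths are hypothetical, and plain Python string ordering is assumed to match Iceberg's comparison (a reasonable assumption for ASCII paths); the real method is Java code in Iceberg core.

```python
from typing import Optional


def can_contain_pos_deletes_for_file(data_file_path: str,
                                     lower: Optional[str],
                                     upper: Optional[str]) -> bool:
    """Hypothetical model: False only if the path is provably outside [lower, upper]."""
    # Missing bounds give no information, so the delete file must be assumed to apply.
    if lower is not None and data_file_path < lower:
        return False
    if upper is not None and data_file_path > upper:
        return False
    return True


old_file = "s3://bucket/warehouse/db/tbl/data/00001-old.parquet"  # being rewritten by Spark
new_file = "s3://bucket/warehouse/db/tbl/data/00099-new.parquet"  # just committed by Flink

# Truncated bounds for a delete file that only references new_file:
# every path under the table location falls inside them.
print(can_contain_pos_deletes_for_file(old_file, "s3://bucket/ware", "s3://bucket/warf"))  # True

# Full-path bounds for the same delete file exclude the file being rewritten.
print(can_contain_pos_deletes_for_file(old_file, new_file, new_file))  # False
```

In this model, truncated bounds make the check report that the new delete file may apply to the file under rewrite, which is what trips the rewrite's conflict validation; with full-path bounds the check rules it out.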


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

