Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/20 00:33:23 UTC

[GitHub] [iceberg] rdblue commented on pull request #2303: Core: Remove all delete files in RewriteFiles action.

rdblue commented on pull request #2303:
URL: https://github.com/apache/iceberg/pull/2303#issuecomment-844589947


   Sorry for the delay. I'm back from parental leave now.
   
   I agree with @RussellSpitzer's comments on this. I don't think that we can remove delete files just because data files were rewritten. We need to ensure that there are no data files that are still referenced by the delete files. This is probably going to require some work and may require reading the delete file. We can use the `_file` stats in some cases but we need to be careful.
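
A minimal sketch of the kind of stats-based check described above, assuming each delete file exposes lower and upper bounds for the data file paths it references. The names `DeleteFileStats`, `lower_file_path`, `upper_file_path`, and `may_reference_live_data` are hypothetical and are not Iceberg APIs. The check is deliberately conservative: it can only prove that a delete file is unreferenced, and anything it cannot prove would still require reading the delete file.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class DeleteFileStats:
    """Hypothetical summary of a delete file's column stats."""
    path: str
    lower_file_path: Optional[str]  # smallest referenced data file path, if known
    upper_file_path: Optional[str]  # largest referenced data file path, if known


def may_reference_live_data(delete: DeleteFileStats, live_paths: Set[str]) -> bool:
    """Return False only when the bounds prove no live data file is referenced."""
    lo, hi = delete.lower_file_path, delete.upper_file_path
    if lo is None or hi is None:
        # No stats: nothing can be proven, so keep the delete file.
        return True
    # Any live path inside [lo, hi] might be referenced; the bounds alone
    # cannot rule it out, so the delete file would have to be read.
    return any(lo <= p <= hi for p in live_paths)


# One delete file whose bounds pin it to a single data file.
d = DeleteFileStats("delete-1", "data/a.parquet", "data/a.parquet")
print(may_reference_live_data(d, {"data/b.parquet"}))  # a.parquet was rewritten away
print(may_reference_live_data(d, {"data/a.parquet"}))  # a.parquet is still live
```

A range check like this stays safe if the stored bounds are wider than the true ones, since wider bounds can only cause more delete files to be kept.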
   
   For now, I'd recommend letting the sequence numbers handle this. Because rewriting files will move them to a newer sequence number, the delete file won't be added to the new file's scan task when reading. The delete file would still be considered during job planning, but I think that is okay and we don't need to aggressively drop them.
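
The sequence-number behavior can be illustrated with a small sketch. It assumes the rule for position deletes is that a delete file applies to a data file only when the data file's sequence number is less than or equal to the delete file's; equality deletes use a stricter comparison and are not modeled here. `FileEntry`, `position_delete_applies`, and `deletes_for_scan_task` are hypothetical names, not Iceberg classes or methods.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a data or delete file entry in a manifest."""
    path: str
    sequence_number: int


def position_delete_applies(data_file: FileEntry, delete_file: FileEntry) -> bool:
    # Assumed rule: a position delete only affects data files written at
    # or before the delete file's sequence number.
    return data_file.sequence_number <= delete_file.sequence_number


def deletes_for_scan_task(data_file: FileEntry, delete_files: List[FileEntry]) -> List[FileEntry]:
    """Planning still looks at every delete file, but attaches only the applicable ones."""
    return [d for d in delete_files if position_delete_applies(data_file, d)]


original = FileEntry("data/a.parquet", sequence_number=1)
delete = FileEntry("deletes/a-pos.parquet", sequence_number=2)
rewritten = FileEntry("data/a-rewritten.parquet", sequence_number=3)

print(deletes_for_scan_task(original, [delete]))   # the delete file is attached
print(deletes_for_scan_task(rewritten, [delete]))  # nothing is attached
```

Under these assumptions the stale delete file costs only a comparison at planning time and is never read for the rewritten file.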


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org