Posted to issues@iceberg.apache.org by "ChristinaTech (via GitHub)" <gi...@apache.org> on 2023/04/21 16:47:26 UTC

[GitHub] [iceberg] ChristinaTech commented on issue #6422: How compaction works along side incremental read

ChristinaTech commented on issue #6422:
URL: https://github.com/apache/iceberg/issues/6422#issuecomment-1518090607

   At present, an incremental read uses the old snapshots and files rather than the compacted ones. The main limitation is that `replace` snapshots, which add and remove data files without changing the underlying data and are what the rewrite procedures produce, do not track which files were used to create which other files, or how.
   
   This means that, even if support were added for interpreting `replace` snapshots as they are, their replacement files could only be used if every file removed by the replace fell within the interval of the incremental read.
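
   To make that condition concrete, here is a minimal sketch in plain Python. It does not use the Iceberg API; the function name `replace_is_usable` and the file paths are hypothetical, and it assumes we already know the set of data files added by snapshots inside the read interval.

```python
from typing import Set


def replace_is_usable(removed_files: Set[str], files_added_in_interval: Set[str]) -> bool:
    """Return True only if every file the replace removed was added inside the interval."""
    # Subset test: a single removed file from outside the interval disqualifies the replace.
    return removed_files <= files_added_in_interval


# Hypothetical files added by snapshots inside the incremental read interval.
interval_files = {"data/b.parquet", "data/c.parquet"}

# A rewrite that only touched files from inside the interval could be used.
print(replace_is_usable({"data/b.parquet", "data/c.parquet"}, interval_files))

# A rewrite that also consumed an older file (a.parquet) could not.
print(replace_is_usable({"data/a.parquet", "data/b.parquet"}, interval_files))
```

   The check is all-or-nothing because, without lineage information, there is no way to tell which replacement file contains rows from the file outside the interval.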
   
   This could be improved somewhat if `replace` snapshots stored a map of which specific files were used to create which other files, but even then it would not help in many cases, because a rewrite will, by default, generally merge files from inside the incremental read interval with files from outside it.
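
   As a rough illustration of what such a map would and would not buy, the sketch below models it as a plain Python dictionary from each rewritten file to the set of files it was built from. This is not an existing Iceberg structure; the name `usable_outputs`, the `lineage` layout, and the file names are all assumptions made for the example.

```python
from typing import Dict, Set


def usable_outputs(lineage: Dict[str, Set[str]], files_added_in_interval: Set[str]) -> Set[str]:
    """Return the rewritten files whose source files all fall inside the interval."""
    return {
        output
        for output, sources in lineage.items()
        if sources <= files_added_in_interval
    }


# Hypothetical lineage recorded by one replace snapshot: output file -> source files.
lineage = {
    "data/merged-1.parquet": {"data/b.parquet", "data/c.parquet"},
    # This output mixes an older file (a.parquet) with one from inside the interval.
    "data/merged-2.parquet": {"data/a.parquet", "data/d.parquet"},
}

# Hypothetical files added by snapshots inside the incremental read interval.
interval_files = {"data/b.parquet", "data/c.parquet", "data/d.parquet"}

print(sorted(usable_outputs(lineage, interval_files)))
```

   Only the first merged file qualifies here. The second one mixes a file from before the interval with one from inside it, which is the common case described above, so the read would still have to fall back to the original file for that data.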
   
   I will note that it would be beneficial if Iceberg could support this behavior, as it would help mitigate the performance impact of micro-batch file ingestion on incremental reads that take place after compaction. I need to spend some time brainstorming technical solutions to the problems that prevent this from happening.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

