Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/16 03:44:50 UTC

[GitHub] [iceberg] jackye1995 commented on issue #3118: Read delete files in parallel.

jackye1995 commented on issue #3118:
URL: https://github.com/apache/iceberg/issues/3118#issuecomment-920552114


   Just to confirm, @Reo-LEI, are you mostly doing this through Flink?
   
   I am asking because I think we currently have the following dilemma: delete files are mostly generated by CDC pipelines in Flink, but the rewrite functionality for delete files is not ready yet, and even when it is ready, it will be in Spark.
   
   #2867 is another PR that tries to tackle the same root issue in a different way.
   
   We definitely need to speed up progress on delete compaction and make it the top priority. On the other hand, I think we should start considering developing some actions in Flink to run compaction natively. The approach of adding a second compactor to the streaming pipeline may be unavoidable, although it is a bit complex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


