Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/16 02:58:01 UTC

[GitHub] [iceberg] Reo-LEI edited a comment on issue #3118: Read delete files in parallel.

Reo-LEI edited a comment on issue #3118:
URL: https://github.com/apache/iceberg/issues/3118#issuecomment-920536151


   > Are you sure that this is the right approach? It seems to me that if you have so many delete files that you need to read them in parallel, that you should rewrite and merge the delete files into data files.
   
   @rdblue I think this optimization is needed. On the one hand, users in streaming scenarios usually need to query the latest data, and that data may not have been rewritten yet. On the other hand, as you said, the best approach is to rewrite and merge the delete files into data files, but the rewrite itself depends on the read path. So we should optimize the `DeleteFilter` to read delete files in parallel, which would speed up both reads and rewrites.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


