Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/31 22:37:19 UTC

[GitHub] [iceberg] amogh-jahagirdar commented on issue #4900: spark action expireSnapshots and removeOrphanFiles block in spark local mode

amogh-jahagirdar commented on issue #4900:
URL: https://github.com/apache/iceberg/issues/4900#issuecomment-1142703277

   I think this is expected, as @RussellSpitzer mentioned in https://github.com/apache/iceberg/issues/4471#issuecomment-1086660475. The Spark procedure has to wait until the computation of the files to be deleted has finished. Currently, the results of that computation are collected to the Spark driver, and the files are deleted from the driver. Even if the procedure deleted files in a distributed fashion (not currently done, because of concerns about hitting rate limits on the underlying FS more easily), it would still need to compute the files to be deleted first. So if the bottleneck is in the collectAsList, that's expected.
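
To make the ordering concrete, here is a rough plain-Python sketch of the two phases. It is only a model, not the Iceberg or Spark API: `compute_files_to_delete`, `expire_files`, and `delete_fn` are hypothetical names, and treating the computation as a set difference between the files referenced before and after expiration is an assumption made for illustration.

```python
from typing import Callable, Iterable, List, Set


def compute_files_to_delete(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Phase 1 (hypothetical model): files referenced before expiration but not after."""
    still_referenced: Set[str] = set(after)
    return sorted(f for f in before if f not in still_referenced)


def expire_files(
    before: Iterable[str],
    after: Iterable[str],
    delete_fn: Callable[[str], None],
) -> int:
    # The complete list must exist before any delete is issued. In the Spark
    # procedure this corresponds to collecting the results to the driver, so
    # the caller blocks here for as long as the computation takes.
    to_delete = compute_files_to_delete(before, after)

    # Phase 2: deletes are issued one at a time from a single process,
    # which stands in for the driver in this model.
    for path in to_delete:
        delete_fn(path)
    return len(to_delete)


if __name__ == "__main__":
    deleted: List[str] = []
    count = expire_files(
        ["a.parquet", "b.parquet", "c.parquet"],  # referenced before expiration
        ["b.parquet"],                            # still referenced afterwards
        deleted.append,                           # stand-in for a real file delete
    )
    print(count, deleted)
```

In this model, making phase 2 parallel would not shorten the wait in phase 1, which is the point made above about distributed deletion.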
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

