Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/30 05:29:19 UTC

[GitHub] [iceberg] kbendick opened a new issue #4007: Document `max_concurrent_deletes` parameter for the Spark procedures `remove_orphan_files` and `expire_snapshots`

kbendick opened a new issue #4007:
URL: https://github.com/apache/iceberg/issues/4007


   As of Iceberg 0.13.0, the Spark stored procedures `remove_orphan_files` and `expire_snapshots` both accept a `max_concurrent_deletes` parameter that controls the parallelism of file deletes.
   
   By default, when this parameter is not set, each action deletes its files serially in the current thread.
   
   When set, the parameter is used to instantiate a thread pool of the given size, which is passed to the method `executeDeleteWith(ExecutorService executorService)` so that the deletes run on the pool's worker threads.
   
   Without it, file deletes can take significantly longer, so we should document the parameter for both procedures.
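
   To illustrate the behavior described above, here is a minimal sketch using only the Java standard library. It is not the Iceberg implementation: the class `ConcurrentDeleteSketch`, the helper `deleteAll`, and the `deleteFunc` callback are hypothetical names for illustration, and the file paths are made up. It only shows the general pattern of deleting serially when no pool size is given and on a fixed-size thread pool otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from Iceberg.
class ConcurrentDeleteSketch {

    // Deletes every path and returns the number of delete calls made.
    // maxConcurrentDeletes == null models "parameter not set": serial deletes
    // in the calling thread. Otherwise a fixed-size pool runs the deletes.
    static int deleteAll(List<String> paths,
                         Integer maxConcurrentDeletes,
                         Consumer<String> deleteFunc) {
        if (maxConcurrentDeletes == null) {
            for (String path : paths) {
                deleteFunc.accept(path);
            }
            return paths.size();
        }

        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrentDeletes);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String path : paths) {
                // Each delete becomes one task on the worker pool.
                pending.add(pool.submit(() -> deleteFunc.accept(path)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // A thread-safe set stands in for the file system in this sketch.
        Set<String> deleted = ConcurrentHashMap.newKeySet();
        List<String> orphans = List.of("data/a.parquet", "data/b.parquet", "data/c.parquet");
        int count = deleteAll(orphans, 4, deleted::add);
        System.out.println("deleted " + count + " files");
    }
}
```

   Because the deletes run on several threads at once, the delete callback in a sketch like this has to be thread-safe, which is why the example records deletions in a concurrent set.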


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed issue #4007: Document `max_concurrent_deletes` parameter for the Spark procedures `remove_orphan_files` and `expire_snapshots`

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #4007:
URL: https://github.com/apache/iceberg/issues/4007


   




[GitHub] [iceberg] kbendick commented on issue #4007: Document `max_concurrent_deletes` parameter for the Spark procedures `remove_orphan_files` and `expire_snapshots`

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4007:
URL: https://github.com/apache/iceberg/issues/4007#issuecomment-1025074087


   Here's a link to the parameters of the RemoveOrphanFilesProcedure action: https://github.com/apache/iceberg/blob/7249d67c5b5215bd04d51d446de12934d2efcbae/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RemoveOrphanFilesProcedure.java#L52-L58




[GitHub] [iceberg] kbendick commented on issue #4007: Document `max_concurrent_deletes` parameter for the Spark procedures `remove_orphan_files` and `expire_snapshots`

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4007:
URL: https://github.com/apache/iceberg/issues/4007#issuecomment-1025073949


   I'm working on a PR for this right now. 🙂 




[GitHub] [iceberg] kbendick removed a comment on issue #4007: Document `max_concurrent_deletes` parameter for the Spark procedures `remove_orphan_files` and `expire_snapshots`

Posted by GitBox <gi...@apache.org>.
kbendick removed a comment on issue #4007:
URL: https://github.com/apache/iceberg/issues/4007#issuecomment-1025073949


   I'm working on a PR for this right now. 🙂 

