You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/27 15:17:15 UTC

[GitHub] [iceberg] bryanck opened a new pull request #3811: [Core] Set the worker pool size to be at least 2 threads

bryanck opened a new pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811


   This small PR sets the minimum size of the worker pool to be 2 threads. Many of the processes that use the pool are doing file I/O and are not necessarily CPU bound, like query planning, and often times people leave the number of Spark driver cores at the default of 1 core. This results in just one thread being used for query planning and other processes by default, which is a bit conservative.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
samredai commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1001802696


   > ...setting the value to be higher than Runtime.getRuntime().availableProcessors() might benefit some systems, but might also cause issues in other systems.
   
   How common is a system where this would cause an issue? If not very common, a default that provides a noticeable performance boost may be worth it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
bryanck commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1001801130


   Sure thanks for taking a look. FWIW I noticed Spark does something similar [here](https://github.com/apache/spark/blob/f4d6a0f27b002c0ca8726cb46530267f04254897/core/src/main/scala/org/apache/spark/rpc/netty/MessageLoop.scala#L114).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1002319751


   I'm actually for making this change.  I can't really see a downside and most systems default to a single core for driver (k8s and YARN).  I feel like any edge case where this would result in worse performance is better to leave to tuning and make the default result in better performance.
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1001701634


   But you can already set that through `iceberg.worker.num-threads` right? I think it's conservative by default in purpose because setting the value to be higher than `Runtime.getRuntime().availableProcessors()` might benefit some systems, but might also cause issues in other systems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks merged pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
danielcweeks merged pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1001716648


   > Yes, you can set it, this PR is meant to set what I feel is a more sensible default for the common case where Spark driver cores is left at 1 core and the system property isn't set
   
   Yes totally agree this looks beneficial, I am just not optimistic if it is a better default for everyone while introducing a change of behavior. I will leave it to other people to comment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
bryanck commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1001707607


   Yes, you can set it, this PR is meant to set what I feel is a more sensible default for the common case where Spark driver cores is left at 1 core and the system property isn't set, for those who maybe aren't as familiar with Iceberg tuning parameters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #3811: Core: Set the worker pool size to be at least 2 threads

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #3811:
URL: https://github.com/apache/iceberg/pull/3811#issuecomment-1002319936


   Thanks @bryanck !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org