Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/09 09:47:36 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue #700: Improve performance polling / task sharing mechanism in Ballista

Dandandan opened a new issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   In https://github.com/apache/arrow-datafusion/pull/698 some improvements were added to avoid sleeping when enough tasks are available.
   
   In order to support quicker query responses (e.g. <100ms) and avoid delays between stages, we need to remove calls like `sleep(Duration::from_millis(100))` from the executor and scheduler and move away from interval-based polling.
   
   This will also reduce CPU and network waste, as fewer calls are made. Currently I am seeing around `1-2%` CPU usage when idling; that is not much, but it grows at higher polling frequencies, and setting the interval to 1ms results in very high CPU usage.
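   To put the polling cost in numbers: with interval-based polling, each idle executor makes one scheduler call per interval, so the call rate grows linearly as the interval shrinks. The helper below is a hypothetical illustration of that arithmetic, not Ballista code; it assumes exactly one call per interval and ignores the time each call takes.

```rust
use std::time::Duration;

/// Scheduler calls per second generated by `executors` idle executors that
/// each poll once per `interval`. Hypothetical helper: assumes one call per
/// interval and ignores the duration of the call itself.
fn idle_polls_per_second(interval: Duration, executors: u32) -> f64 {
    f64::from(executors) / interval.as_secs_f64()
}
```

   With one executor, a 100ms interval gives 10 calls per second while a 1ms interval gives 1000, a 100x increase in idle traffic that scales further with the number of executors.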
   
   **Describe the solution you'd like**
   Wait for tasks to become available in the scheduler, and only send a response once there are tasks for the worker.
   
   Remove calls to `sleep`.
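   One way to wait on tasks instead of sleeping is a long-poll: the scheduler holds the executor's request on a condition variable until a task is queued or a timeout expires. The sketch below uses only the Rust standard library; `TaskQueue`, `push` and `wait_for_task` are hypothetical names, and an async server would likely use an async notification primitive in place of `Condvar`.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical scheduler-side queue of runnable tasks that executors can
/// long-poll instead of polling on a fixed sleep interval.
pub struct TaskQueue<T> {
    tasks: Mutex<VecDeque<T>>,
    available: Condvar,
}

impl<T> TaskQueue<T> {
    pub fn new() -> Self {
        TaskQueue {
            tasks: Mutex::new(VecDeque::new()),
            available: Condvar::new(),
        }
    }

    /// Enqueues a runnable task and wakes one executor blocked in `wait_for_task`.
    pub fn push(&self, task: T) {
        self.tasks.lock().unwrap().push_back(task);
        self.available.notify_one();
    }

    /// Returns a task as soon as one is queued, or `None` if `timeout` elapses
    /// first. `wait_timeout_while` re-checks the queue after every wakeup, so
    /// spurious wakeups cannot return early with an empty queue.
    pub fn wait_for_task(&self, timeout: Duration) -> Option<T> {
        let guard = self.tasks.lock().unwrap();
        let (mut tasks, _timeout_result) = self
            .available
            .wait_timeout_while(guard, timeout, |tasks| tasks.is_empty())
            .unwrap();
        tasks.pop_front()
    }
}
```

   Because the wait returns as soon as `push` is called, task pickup latency no longer depends on a sleep interval, and an idle executor makes one request per timeout rather than one per polling interval.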
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-958685830


   AFAIK, no, @mingmwang. Feel free to work on this; that would be great.



[GitHub] [arrow-datafusion] mingmwang commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-958525993


   Is there any PR related to this? If not, I think I can work on this.



[GitHub] [arrow-datafusion] liukun4515 commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-1008607240


   @mingmwang, can you give an update on the status?



[GitHub] [arrow-datafusion] Dandandan commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-890478988


   I tried to do part of this by moving the polling loop to the scheduler, but an important prerequisite is moving the information about the number of free task slots to the scheduler.
   
   Currently this information is kept in each executor and sent along with each polling call (`can_accept_task`).
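   Moving the free-slot information to the scheduler could look roughly like the bookkeeping below: each executor reports its slot count once at registration, and the scheduler reserves a slot when it assigns a task and releases it when the task completes. `SlotTracker` and its methods are hypothetical and not part of Ballista; choosing the executor with the most free slots is just one possible policy.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Hypothetical scheduler-side record of free task slots per executor.
#[derive(Default)]
pub struct SlotTracker {
    free_slots: BTreeMap<String, usize>,
}

impl SlotTracker {
    /// An executor announces its total number of task slots at registration.
    pub fn register(&mut self, executor_id: &str, slots: usize) {
        self.free_slots.insert(executor_id.to_string(), slots);
    }

    /// Reserves one slot and returns the chosen executor id, or `None` if every
    /// executor is busy. Picks the executor with the most free slots; ties go to
    /// the smallest id, since `min_by_key` keeps the first of equal keys and
    /// `BTreeMap` iterates in key order.
    pub fn try_reserve(&mut self) -> Option<String> {
        let (id, free) = self
            .free_slots
            .iter_mut()
            .filter(|(_, free)| **free > 0)
            .min_by_key(|(_, free)| Reverse(**free))?;
        *free -= 1;
        Some(id.clone())
    }

    /// Called when an executor reports a finished task, returning its slot.
    pub fn release(&mut self, executor_id: &str) {
        if let Some(free) = self.free_slots.get_mut(executor_id) {
            *free += 1;
        }
    }
}
```

   With this state on the scheduler side, executors no longer need to send `can_accept_task` on every call; the scheduler can decide on its own when a task can be handed out.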
   
   



[GitHub] [arrow-datafusion] houqp closed issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
houqp closed issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700


   



[GitHub] [arrow-datafusion] yahoNanJing commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on issue #700:
URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-1010665718


   Hi @Dandandan, we have recently implemented an initial version of push-based task scheduling. Here's the design document; could you help review it?
   https://docs.google.com/document/d/1Z1GO2A3bo7M_N26w_5t-9h3AhIPC2Huoh0j8jgwFETk/edit?usp=sharing
   
   The PR is in progress.
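   The design document is not reproduced here, so the following is only a generic sketch of push-based dispatch, not the proposed design: the scheduler keeps pending tasks and per-executor free-slot counts, and sends tasks over a per-executor channel whenever a task is submitted or a slot frees up. `PushScheduler`, the `u64` task ids and the use of `std::sync::mpsc` channels (standing in for RPC calls) are all assumptions for illustration.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical push-based dispatcher: the scheduler sends task ids to
/// executors over channels instead of waiting for executors to poll.
#[derive(Default)]
pub struct PushScheduler {
    /// Task ids waiting for a free slot, in submission order.
    pending: VecDeque<u64>,
    /// One entry per executor: channel to that executor and its free slots.
    executors: Vec<(Sender<u64>, usize)>,
}

impl PushScheduler {
    /// Registers an executor with `slots` free slots and returns the
    /// receiving end the executor listens on.
    pub fn add_executor(&mut self, slots: usize) -> Receiver<u64> {
        let (tx, rx) = channel();
        self.executors.push((tx, slots));
        self.dispatch(); // new capacity may let pending tasks run
        rx
    }

    /// Queues a task and pushes it out immediately if any slot is free.
    pub fn submit(&mut self, task: u64) {
        self.pending.push_back(task);
        self.dispatch();
    }

    /// Called when executor `idx` reports a finished task, freeing one slot.
    pub fn task_finished(&mut self, idx: usize) {
        self.executors[idx].1 += 1;
        self.dispatch();
    }

    /// Assigns pending tasks to executors with free slots, in executor order.
    fn dispatch(&mut self) {
        for (tx, free) in self.executors.iter_mut() {
            while *free > 0 {
                match self.pending.pop_front() {
                    None => return, // nothing left to assign
                    Some(task) => match tx.send(task) {
                        Ok(()) => *free -= 1,
                        // Executor hung up: requeue the task and skip it
                        // (a real scheduler would also deregister it).
                        Err(err) => {
                            self.pending.push_front(err.0);
                            break;
                        }
                    },
                }
            }
        }
    }
}
```

   The executor side then only needs to block on its `Receiver`, so neither side needs a polling loop or `sleep`.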

