Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/07 06:36:57 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new issue #1936: [Ballista] Introduce TaskSet and TaskSetStore for managing tasks of one stage

yahoNanJing opened a new issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Tasks are better managed per stage than mixed together across the whole system or a whole job. With a TaskSetStore, task status changes can be tracked within a single stage, which should also simplify future task error handling. Stage status changes and stage-level error handling become easier to manage as well.
   
   **Describe the solution you'd like**
   
   - Introduce TaskSet to hold the tasks of one stage.
   - Introduce TaskSetStore to manage task status changes within a TaskSet.
   
   With this change, fetching a task to schedule from a TaskSetStore will be much more efficient. It also makes it possible to schedule tasks by stage priority.
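
To make the proposal concrete, here is a minimal sketch of how a TaskSet and a TaskSetStore might fit together. Only the two type names come from this issue; the fields, the `TaskStatus` variants, the numeric priority, and the method names `add`, `fetch_task`, and `update_status` are assumptions for illustration, not the Ballista API.

```rust
use std::collections::BTreeMap;

/// Hypothetical task states; the real set is defined by the scheduler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

/// The tasks of one stage, indexed by position (assumed layout).
#[derive(Debug)]
pub struct TaskSet {
    pub job_id: String,
    pub stage_id: usize,
    pub tasks: Vec<TaskStatus>,
}

impl TaskSet {
    pub fn new(job_id: &str, stage_id: usize, num_tasks: usize) -> Self {
        TaskSet {
            job_id: job_id.to_string(),
            stage_id,
            tasks: vec![TaskStatus::Pending; num_tasks],
        }
    }

    /// Claim the first pending task and mark it as running.
    pub fn next_pending(&mut self) -> Option<usize> {
        let idx = self.tasks.iter().position(|s| *s == TaskStatus::Pending)?;
        self.tasks[idx] = TaskStatus::Running;
        Some(idx)
    }
}

/// Holds task sets ordered by (priority, job id, stage id).
/// Assumption: a lower priority value is scheduled first.
#[derive(Debug, Default)]
pub struct TaskSetStore {
    sets: BTreeMap<(u32, String, usize), TaskSet>,
}

impl TaskSetStore {
    pub fn add(&mut self, priority: u32, set: TaskSet) {
        self.sets
            .insert((priority, set.job_id.clone(), set.stage_id), set);
    }

    /// Return (job id, stage id, task index) of the next task to schedule,
    /// visiting stages in priority order.
    pub fn fetch_task(&mut self) -> Option<(String, usize, usize)> {
        for set in self.sets.values_mut() {
            if let Some(idx) = set.next_pending() {
                return Some((set.job_id.clone(), set.stage_id, idx));
            }
        }
        None
    }

    /// Record a status change for one task; returns false if it is unknown.
    pub fn update_status(
        &mut self,
        job_id: &str,
        stage_id: usize,
        task: usize,
        status: TaskStatus,
    ) -> bool {
        let found = self
            .sets
            .values_mut()
            .find(|s| s.job_id == job_id && s.stage_id == stage_id);
        if let Some(set) = found {
            if task < set.tasks.len() {
                set.tasks[task] = status;
                return true;
            }
        }
        false
    }
}
```

In this sketch the scheduler only looks at the first stage that still has pending work instead of scanning every task of every job, which is the efficiency argument made above, and the ordered map is what makes priority-based scheduling possible.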
   
   **Additional context**
   #1704 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] yahoNanJing edited a comment on issue #1936: [Ballista] Introduce StageManager for managing tasks stage by stage

Posted by GitBox <gi...@apache.org>.
yahoNanJing edited a comment on issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936#issuecomment-1063121300


   There are three levels of job state: task -> stage -> job. 
   
   For the state machine of task:
   ![task_status_state_machine](https://user-images.githubusercontent.com/90197956/157489437-ccf1a191-b35d-4e50-95a4-9f7a0ccf38b3.png)
   
   
   
   For the state machine of stage:
   ![stage_state_machine](https://user-images.githubusercontent.com/90197956/157489461-bad483dd-b253-49a0-8243-58b90b46aa31.png)
   
   
   
   For the state machine of job:
   ![job_state_machine](https://user-images.githubusercontent.com/90197956/157489487-49812fa4-cd6a-446b-ade5-4f09bfee6145.png)
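
One way to read the task -> stage -> job layering is that each level's state is derived from the level below it. The sketch below illustrates that roll-up only. The state names and the roll-up rules are placeholders chosen for the example; they are assumptions and are not taken from the diagrams above, which define the actual states and transitions.

```rust
/// Placeholder task states (assumed, not taken from the diagrams).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Pending,
    Running,
    Completed,
    Failed,
}

/// Placeholder stage states (assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StageState {
    Pending,
    Running,
    Completed,
    Failed,
}

/// Placeholder job states (assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Queued,
    Running,
    Completed,
    Failed,
}

/// Derive a stage state from the states of its tasks.
/// Assumed rules: any failed task fails the stage; all tasks completed
/// completes it; all tasks pending leaves it pending; otherwise running.
/// Note that an empty task list counts as completed here.
pub fn stage_state(tasks: &[TaskState]) -> StageState {
    if tasks.iter().any(|t| *t == TaskState::Failed) {
        StageState::Failed
    } else if tasks.iter().all(|t| *t == TaskState::Completed) {
        StageState::Completed
    } else if tasks.iter().all(|t| *t == TaskState::Pending) {
        StageState::Pending
    } else {
        StageState::Running
    }
}

/// Derive a job state from the states of its stages, using the same
/// assumed rules one level up.
pub fn job_state(stages: &[StageState]) -> JobState {
    if stages.iter().any(|s| *s == StageState::Failed) {
        JobState::Failed
    } else if stages.iter().all(|s| *s == StageState::Completed) {
        JobState::Completed
    } else if stages.iter().all(|s| *s == StageState::Pending) {
        JobState::Queued
    } else {
        JobState::Running
    }
}
```

A real scheduler would more likely drive these transitions from events (for example, a task status update from an executor) than recompute them from scratch, but the derivation shows how the three levels depend on each other.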
   
   





[GitHub] [arrow-datafusion] yahoNanJing edited a comment on issue #1936: [Ballista] Introduce StageManager for managing tasks stage by stage

Posted by GitBox <gi...@apache.org>.
yahoNanJing edited a comment on issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936#issuecomment-1064399582


   Sequence diagram for pull-based task scheduling:
   ![Pull-based Task Scheduling](https://user-images.githubusercontent.com/90197956/157738495-f9104052-b59c-4af1-9897-392279df9c24.png)
   
   Sequence diagram for push-based task scheduling:
   ![Push-based Task Scheduling](https://user-images.githubusercontent.com/90197956/157737074-ced1ffc1-8f7a-4b24-b182-86b7429afba9.png)
   
   





[GitHub] [arrow-datafusion] yjshen closed issue #1936: [Ballista] Introduce StageManager for managing tasks stage by stage

Posted by GitBox <gi...@apache.org>.
yjshen closed issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936


   





[GitHub] [arrow-datafusion] yahoNanJing commented on issue #1936: [Ballista] Introduce StageManager for managing tasks stage by stage

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936#issuecomment-1064399582


   Sequence diagram for pull-based task scheduling:
   ![Pull-based Task Scheduling](https://user-images.githubusercontent.com/90197956/157736938-360b3e3d-32ed-431e-85f0-d26227369819.png)
   
   Sequence diagram for push-based task scheduling:
   ![Push-based Task Scheduling](https://user-images.githubusercontent.com/90197956/157737074-ced1ffc1-8f7a-4b24-b182-86b7429afba9.png)
   
   





[GitHub] [arrow-datafusion] yahoNanJing commented on issue #1936: [Ballista] Introduce StageManager for managing tasks stage by stage

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on issue #1936:
URL: https://github.com/apache/arrow-datafusion/issues/1936#issuecomment-1063121300


   There are three levels of job state: task -> stage -> job. 
   
   For the state machine of task:
   ![task_status_state_machine](https://user-images.githubusercontent.com/90197956/157484955-e82f76c4-9442-4420-ab9f-d5d3f86bad80.png)
   
   For the state machine of stage:
   ![stage_state_machine](https://user-images.githubusercontent.com/90197956/157485023-f14fecc2-e14d-4048-bfe0-d9d1028aed3c.png)
   
   For the state machine of job:
   ![job_state_machine](https://user-images.githubusercontent.com/90197956/157485082-cebd80cd-d816-4793-8473-f9b5e6371bc6.png)
   
   

