
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2019/12/31 17:31:32 UTC

[GitHub] [incubator-dolphinscheduler] elonlo commented on issue #1658: Refactor WorkerServer

elonlo commented on issue #1658: Refactor WorkerServer
URL: https://github.com/apache/incubator-dolphinscheduler/issues/1658#issuecomment-569963277
 
 
   Regarding the third point, Failover, my thinking is this. Before the MasterServer assigns a task to a WorkerServer, it first inserts the task into the DB to generate an id, then sends the task id to the WorkerServer for execution.
   
   Suppose the WorkerServer dies or the network is disrupted after the WorkerServer receives the task. If the MasterServer does not receive a task execution heartbeat from the WorkerServer within a certain period of time, it treats the task execution as failed and sets the task status in the DB to failed.
   
   After that, if the network recovers and the MasterServer receives a heartbeat for a task that has already been marked as failed, it directly sends a task termination command to the WorkerServer.
   
   If the user has set a number of retries, the MasterServer retries the task; if not, an alert is sent.
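   
   To make the flow above concrete, here is a minimal sketch in Python. It is not DolphinScheduler code: the names Master, Task, submit, check_timeouts and on_heartbeat are all hypothetical, a dict stands in for the DB, and an events list stands in for the commands and alerts the MasterServer would send. It also assumes that a retry is stored as a new task row with a new id and goes back to the same worker, which the proposal does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    FAILED = "failed"


@dataclass
class Task:
    task_id: int
    worker: str
    last_heartbeat: float
    retries_left: int = 0
    state: TaskState = TaskState.RUNNING


@dataclass
class Master:
    """Hypothetical stand-in for the MasterServer side of the failover flow."""
    heartbeat_timeout: float
    tasks: dict = field(default_factory=dict)    # stands in for the task table in the DB
    next_id: int = 1
    events: list = field(default_factory=list)   # records dispatches, kills and alerts

    def submit(self, worker, now, retries=0):
        # Insert the task first to obtain an id, then send only the id to the worker.
        task = Task(self.next_id, worker, now, retries)
        self.tasks[task.task_id] = task
        self.next_id += 1
        self.events.append(("dispatch", task.task_id, worker))
        return task.task_id

    def check_timeouts(self, now):
        # No heartbeat within the timeout: mark the task failed, then retry or alert.
        for task in list(self.tasks.values()):
            overdue = now - task.last_heartbeat > self.heartbeat_timeout
            if task.state is TaskState.RUNNING and overdue:
                task.state = TaskState.FAILED
                if task.retries_left > 0:
                    # Assumption: a retry is a new task row with one fewer retry left.
                    self.submit(task.worker, now, task.retries_left - 1)
                else:
                    self.events.append(("alert", task.task_id))

    def on_heartbeat(self, task_id, now):
        task = self.tasks[task_id]
        if task.state is TaskState.FAILED:
            # A late heartbeat for a task already marked failed: tell the worker to stop it.
            self.events.append(("kill", task_id, task.worker))
        else:
            task.last_heartbeat = now


master = Master(heartbeat_timeout=30.0)
tid = master.submit("worker-1", now=0.0)
master.check_timeouts(now=31.0)     # no heartbeat for 31s, no retries: alert
master.on_heartbeat(tid, now=40.0)  # network recovered: termination command
print(master.events)
```

   In this sketch the task is marked failed before any retry is submitted, so a late heartbeat from the original execution always results in a termination command and cannot run alongside the retried one unnoticed.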
