Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2019/10/26 16:23:34 UTC

[GitHub] [incubator-doris] morningman opened a new issue #2077: Awareness of Backend down when loading data

morningman opened a new issue #2077: Awareness of Backend down when loading data
URL: https://github.com/apache/incubator-doris/issues/2077
 
 
   ## Motivation
   
   In the current implementation, if a BE crashes, a load job that is being executed may be unable to finish. The job is not cancelled automatically and can only wait for a timeout. This is mainly because, once the BE responsible for reporting the load result is down, the FE cannot obtain the result of the load job, and thus the job cannot be processed further.
   
   ## How to resolve
   I added a variable called `lastMissingHeartbeatTime` to the Backend object. This variable is updated every time a heartbeat to that Backend fails.
   
   Also, before a load job is executed, it saves the current `lastMissingHeartbeatTime` of the related BE in its Coordinator.
   
   I also modified the Coordinator's `join()` method. The entire waiting process (join) is divided into multiple rounds, with a maximum of 30 seconds per round. After each round of waiting, the `lastMissingHeartbeatTime` of the BE is checked. If it is larger than the value saved before (which means the BE went down during the load process), the wait ends and an error result is returned. Otherwise, the wait continues with the next round.
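
The round-based wait described above can be sketched roughly as follows. This is a minimal illustration, not the actual Doris code: the class shapes, the `Result` enum, `onHeartbeatFailed()`, `markFinished()`, the `-1` "never failed" sentinel, the configurable round length, and the use of a single BE per Coordinator are all assumptions made for the example. Only the `lastMissingHeartbeatTime` name, the `join()` method, and the 30-second round come from the description.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the FE's Backend object.
class Backend {
    // Updated on every failed heartbeat; -1 means "no failure seen yet" (assumed sentinel).
    private final AtomicLong lastMissingHeartbeatTime = new AtomicLong(-1L);

    long getLastMissingHeartbeatTime() {
        return lastMissingHeartbeatTime.get();
    }

    // Called by the (hypothetical) heartbeat manager when a heartbeat to this BE fails.
    void onHeartbeatFailed(long nowMs) {
        lastMissingHeartbeatTime.set(nowMs);
    }
}

// Hypothetical, simplified Coordinator that tracks a single BE.
class Coordinator {
    enum Result { FINISHED, BACKEND_DOWN, TIMEOUT }

    private final Backend backend;
    private final long savedMissingHeartbeatTime; // snapshot taken before the job runs
    private final long roundMs;                   // e.g. 30_000 for 30-second rounds
    private final CountDownLatch done = new CountDownLatch(1);

    Coordinator(Backend backend, long roundMs) {
        this.backend = backend;
        this.roundMs = roundMs;
        this.savedMissingHeartbeatTime = backend.getLastMissingHeartbeatTime();
    }

    // Called when the BE reports the load result.
    void markFinished() {
        done.countDown();
    }

    // Waits in rounds of at most roundMs; after each round, checks whether
    // the BE has missed a heartbeat since the snapshot was taken.
    Result join(long timeoutMs) {
        long remaining = timeoutMs;
        try {
            while (remaining > 0) {
                long wait = Math.min(remaining, roundMs);
                if (done.await(wait, TimeUnit.MILLISECONDS)) {
                    return Result.FINISHED;
                }
                if (backend.getLastMissingHeartbeatTime() > savedMissingHeartbeatTime) {
                    return Result.BACKEND_DOWN; // stop waiting and report the error
                }
                remaining -= wait;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for load job", e);
        }
        return Result.TIMEOUT;
    }
}
```

With this shape, a caller would construct the Coordinator just before executing the job, for example `new Coordinator(be, 30_000)`, and treat a `BACKEND_DOWN` result as a reason to cancel the job instead of waiting for the full timeout. A heartbeat failure is noticed at the end of the current round, so the detection delay is bounded by the round length.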
   
   ## What's next
   
   This modification only resolves the problem that a failed load job cannot be cancelled and keeps hanging for a long time. Another problem remains: in the current loading framework, a BE's downtime is very likely to cause most load jobs to fail and have to be retried. This is not highly available at all!
   
