Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/28 04:12:46 UTC

[GitHub] [doris] qzsee opened a new issue, #11282: [Proposal] Fault tolerant handling for single be node breakdown

qzsee opened a new issue, #11282:
URL: https://github.com/apache/doris/issues/11282

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   When a query is executed, the generated execution fragments are sent to the BE nodes. If a BE node breaks down at this point, the RPC fails (`rpc failed,host:xxx` or `send fragment timeout. backend id:xxx,host:xxx`), even though the heartbeat of that BE node still looks normal. The RPC failure directly causes the whole query to fail.
   
   Although there is a blacklist mechanism, it does not necessarily solve the problem, because a BE node with a normal heartbeat is not guaranteed to be working.
   
   In my opinion, a distributed system should tolerate this kind of single-node failure instead of failing the query.
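
To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not Doris code: `RpcError`, `Blacklist`, `dispatch_with_failover`, and the `send` callback are hypothetical names used only for illustration. It assumes the fragment can run on any backend in `replicas` (for example, backends holding a replica of the same tablet) and that the failure happens before any result has been returned to the client.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Optional


class RpcError(Exception):
    """Raised by the (hypothetical) send callback when an RPC fails or times out."""


@dataclass
class Blacklist:
    """Backends that recently failed an RPC, tracked independently of heartbeats."""
    ttl_seconds: float = 60.0
    _until: Dict[int, float] = field(default_factory=dict)

    def add(self, backend_id: int, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._until[backend_id] = now + self.ttl_seconds

    def contains(self, backend_id: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self._until.get(backend_id)
        if expiry is None:
            return False
        if now >= expiry:
            # Entry expired: give the backend another chance.
            del self._until[backend_id]
            return False
        return True


def dispatch_with_failover(
    fragment: Any,
    replicas: Iterable[int],
    send: Callable[[int, Any], None],
    blacklist: Blacklist,
) -> int:
    """Try each candidate backend in order and return the id that accepted the fragment."""
    last_error: Optional[RpcError] = None
    for backend_id in replicas:
        if blacklist.contains(backend_id):
            continue  # skip backends that failed recently
        try:
            send(backend_id, fragment)
            return backend_id
        except RpcError as exc:
            # A healthy heartbeat is not enough: record the RPC failure and move on.
            blacklist.add(backend_id)
            last_error = exc
    raise RpcError(f"no available backend for fragment: {last_error}")
```

With a `send` callback that raises `RpcError` for one backend id, a call such as `dispatch_with_failover(fragment, [10001, 10002], send, blacklist)` would fall through to the second backend instead of failing the query, and the first backend would be skipped by later fragments until its blacklist entry expires.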
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org