Posted to dev@asterixdb.apache.org by "Yingyi Bu (JIRA)" <ji...@apache.org> on 2015/08/18 19:55:45 UTC
[jira] [Created] (ASTERIXDB-1076) False failures triggers denying new queries
Yingyi Bu created ASTERIXDB-1076:
------------------------------------
Summary: False failures triggers denying new queries
Key: ASTERIXDB-1076
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1076
Project: Apache AsterixDB
Issue Type: Bug
Components: AsterixDB
Reporter: Yingyi Bu
Priority: Critical
When the CPUs in the cluster are saturated by computation, heartbeats from the slave nodes to the master node may be delayed. In that case, the master node concludes that a node has failed and can no longer add that node back. As a result, the entire cluster becomes unusable and the instance has to be restarted.
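The failure mode described above can be illustrated with a toy model of a fixed-timeout heartbeat monitor. This is a minimal sketch under stated assumptions, not the Hyracks cluster controller: the class `NaiveMaster`, its methods, and the node names `nc1`/`nc2` are hypothetical, and time is passed in explicitly so the behavior is deterministic. Once a delayed heartbeat pushes a node past the timeout, its later heartbeats are ignored, mirroring the reported inability to add the node back.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveMaster:
    """Toy master that declares nodes dead on a fixed heartbeat timeout.

    Hypothetical illustration only; not the actual Hyracks implementation.
    """
    timeout: float  # seconds of silence before a node is declared dead
    last_seen: dict = field(default_factory=dict)  # node -> last heartbeat time
    dead: set = field(default_factory=set)  # nodes declared dead

    def heartbeat(self, node: str, now: float) -> None:
        # Mirrors the reported bug: once a node is marked dead,
        # its later heartbeats are ignored and it never rejoins.
        if node in self.dead:
            return
        self.last_seen[node] = now

    def check(self, now: float) -> set:
        # Declare dead every node whose last heartbeat is older than the timeout.
        for node, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                self.dead.add(node)
                del self.last_seen[node]
        return set(self.last_seen)  # nodes still considered alive


master = NaiveMaster(timeout=5.0)
master.heartbeat("nc1", now=0.0)
master.heartbeat("nc2", now=0.0)
# nc2's CPU is saturated, so its next heartbeat is delayed past the timeout.
master.heartbeat("nc1", now=4.0)
print(master.check(now=6.0))  # {'nc1'}
master.heartbeat("nc2", now=7.0)  # arrives late and is ignored
print(master.check(now=8.0))  # {'nc1'}
```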
Two things need to be fixed:
1. (At least) expose AsterixDB configuration parameters so that users can set a larger heartbeat threshold;
2. Allow a node to leave and rejoin a Hyracks cluster.
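The two fixes listed above can be sketched together in the same kind of toy model: the failure threshold is derived from a configurable heartbeat period and a configurable number of missed periods, and a heartbeat from a node previously declared dead re-registers it instead of being dropped. The names `RejoiningMaster`, `heartbeat_period`, and `max_missed_periods` are hypothetical and do not correspond to actual AsterixDB or Hyracks configuration keys.

```python
from dataclasses import dataclass, field


@dataclass
class RejoiningMaster:
    """Toy master with a configurable threshold and rejoin support.

    Hypothetical sketch; parameter names are illustrative, not real config keys.
    """
    heartbeat_period: float  # seconds between expected heartbeats
    max_missed_periods: int  # periods that may lapse before a node is declared dead
    last_seen: dict = field(default_factory=dict)  # node -> last heartbeat time
    dead: set = field(default_factory=set)  # nodes currently declared dead
    rejoined: list = field(default_factory=list)  # nodes that came back, in order

    @property
    def timeout(self) -> float:
        # Fix 1: threshold = period x allowed lapses, e.g. 10 s x 5 = 50 s.
        return self.heartbeat_period * self.max_missed_periods

    def heartbeat(self, node: str, now: float) -> None:
        if node in self.dead:
            # Fix 2: a node that was (possibly falsely) declared dead may rejoin.
            self.dead.discard(node)
            self.rejoined.append(node)
        self.last_seen[node] = now

    def check(self, now: float) -> set:
        # Declare dead every node whose last heartbeat is older than the timeout.
        for node, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                self.dead.add(node)
                del self.last_seen[node]
        return set(self.last_seen)  # nodes still considered alive


master = RejoiningMaster(heartbeat_period=1.0, max_missed_periods=5)
master.heartbeat("nc1", now=0.0)
master.heartbeat("nc2", now=0.0)
master.heartbeat("nc1", now=5.0)
print(master.check(now=6.0))  # {'nc1'}: nc2 was silent for 6 s > 5 s
master.heartbeat("nc2", now=7.0)  # the late heartbeat re-registers nc2
print(sorted(master.check(now=8.0)))  # ['nc1', 'nc2']
```

This sketch only tracks membership. A real rejoin path would presumably also need the node to re-register its state with the master and to reconcile any jobs that were aborted while it was considered dead.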
In the long term, we might need to investigate better liveness check strategies.
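As one example of an alternative liveness check, the sketch below implements a phi accrual failure detector (Hayashibara et al.), a technique from the failure-detection literature that this issue does not itself propose. Instead of a yes/no timeout, it reports a suspicion level phi computed from the observed distribution of heartbeat inter-arrival times, so a node whose heartbeats have become irregular, for example because its CPU is saturated, is suspected more slowly. The class name, the window size, the `min_std` floor, the threshold of 8, and the normal-distribution model of inter-arrival times are illustrative assumptions.

```python
import math
from collections import deque
from typing import Optional


class PhiAccrualDetector:
    """Minimal phi accrual failure detector, assuming normally distributed
    heartbeat inter-arrival times. Illustrative sketch, not AsterixDB code."""

    def __init__(self, window: int = 100, min_std: float = 0.1) -> None:
        self.intervals: deque = deque(maxlen=window)  # recent inter-arrival times (s)
        self.min_std = min_std  # floor so perfectly regular heartbeats don't give std 0
        self.last: Optional[float] = None  # time of the most recent heartbeat

    def heartbeat(self, now: float) -> None:
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        if self.last is None or not self.intervals:
            return 0.0  # no history yet, so no suspicion
        n = len(self.intervals)
        mean = sum(self.intervals) / n
        var = sum((x - mean) ** 2 for x in self.intervals) / n
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last
        # P(next heartbeat arrives later than `elapsed`) under N(mean, std^2).
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        # phi = -log10(p_later); an underflowed probability means certain suspicion.
        return math.inf if p_later <= 0.0 else -math.log10(p_later)


THRESHOLD = 8.0  # illustrative suspicion threshold

# Node with steady 1 s heartbeats.
steady = PhiAccrualDetector()
for t in range(11):
    steady.heartbeat(float(t))

# Node whose heartbeats became irregular, e.g. because its CPU is saturated.
jittery = PhiAccrualDetector()
for t in [0.0, 0.5, 3.0, 3.5, 6.0, 6.5, 9.0]:
    jittery.heartbeat(t)

# Both nodes have now been silent for 4 s since their last heartbeat.
print(steady.phi(14.0) > THRESHOLD)   # True: suspected as failed
print(jittery.phi(13.0) > THRESHOLD)  # False: delay is within its usual jitter
```

With the same 4 s of silence, the node with steady heartbeats gets a very high phi, while the node with jittery heartbeats stays well below the illustrative threshold; a fixed timeout cannot make that distinction.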
To reproduce the issue, overload the CPUs of the slave nodes; the master will then falsely declare them failed and the cluster will stop accepting new queries.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)