Posted to notifications@asterixdb.apache.org by "Till (JIRA)" <ji...@apache.org> on 2016/08/17 21:13:20 UTC
[jira] [Updated] (ASTERIXDB-1076) False failures cause denying new queries
[ https://issues.apache.org/jira/browse/ASTERIXDB-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till updated ASTERIXDB-1076:
----------------------------
Labels: soon (was: )
> False failures cause denying new queries
> ----------------------------------------
>
> Key: ASTERIXDB-1076
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1076
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: AsterixDB
> Reporter: Yingyi Bu
> Assignee: Michael Blow
> Priority: Critical
> Labels: soon
>
> When the cluster's CPUs are saturated with computation, heartbeats from slave nodes to the master node can be delayed. The master node then concludes that a node has failed and can no longer add the node back. As a result, the entire cluster becomes unusable and an instance restart is needed.
> Two things need to be fixed:
> 1. (at least) expose AsterixDB configuration parameters to allow users to set a large heartbeat threshold;
> 2. allow a node to leave and re-join a Hyracks cluster.
> In the long term, we might need to investigate better liveness check strategies.
> To reproduce the issue, overload the slave nodes' CPUs.
> Subsequent queries will then fail with the exception "Asterix Cluster Global recovery is not yet complete and The system is in ACTIVE state".
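The two fixes proposed in the description (a configurable heartbeat threshold, and letting a node leave and re-join) could be sketched roughly as below. This is only an illustrative toy model of a master-side liveness tracker, not the Hyracks implementation: the class name HeartbeatMonitor, the parameters heartbeat_period_s and max_missed_heartbeats, and the node ids are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeartbeatMonitor:
    """Toy master-side liveness tracker (hypothetical, for illustration only)."""

    heartbeat_period_s: float = 10.0   # assumed interval between node heartbeats
    max_missed_heartbeats: int = 5     # assumed user-configurable threshold (fix 1)
    last_seen: Dict[str, float] = field(default_factory=dict)
    dead: Set[str] = field(default_factory=set)

    def on_heartbeat(self, node_id: str, now: float) -> None:
        # Fix 2: a heartbeat from a node previously declared dead re-admits it
        # instead of leaving the cluster permanently degraded.
        self.dead.discard(node_id)
        self.last_seen[node_id] = now

    def sweep(self, now: float) -> Set[str]:
        # Declare a node dead only after it has been silent for longer than
        # period * threshold; a larger threshold tolerates CPU-induced delays.
        limit = self.heartbeat_period_s * self.max_missed_heartbeats
        for node_id, seen in self.last_seen.items():
            if now - seen > limit:
                self.dead.add(node_id)
        return set(self.dead)

    def is_usable(self) -> bool:
        # The cluster accepts queries only when every known node is alive.
        return bool(self.last_seen) and not self.dead
```

In this sketch, a delayed heartbeat from an overloaded node only marks it dead temporarily; once the next heartbeat arrives, on_heartbeat removes it from the dead set and is_usable returns True again, with no instance restart.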
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)