Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2019/07/30 15:23:00 UTC

[jira] [Resolved] (IMPALA-8339) Coordinator should be more resilient to fragment instances startup failure

     [ https://issues.apache.org/jira/browse/IMPALA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-8339.
--------------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.3.0

> Coordinator should be more resilient to fragment instances startup failure
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-8339
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8339
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Distributed Exec
>            Reporter: Michael Ho
>            Assignee: Thomas Tauber-Marshall
>            Priority: Critical
>              Labels: Availability, resilience
>             Fix For: Impala 3.3.0
>
>
> Impala currently relies on the statestore for cluster membership. When an Impala executor goes offline, it may take a while for the statestore to declare that node unavailable and for that information to propagate to all coordinator nodes. Within this window, some coordinator nodes may still attempt to issue RPCs to the faulty node, resulting in RPC failures that in turn cause query failures. In other words, many queries may fail to start within this window until all coordinator nodes get the latest cluster membership information.
> Going forward, the coordinator may need to fall back to backup executors for each fragment in case some of the executors are not available. Moreover, *the coordinator should treat the cluster membership information from the statestore (or any external source of truth, e.g. etcd) as hints instead of ground truth* and adjust the scheduling of fragment instances based on the availability of the executors from the coordinator's perspective.
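
The fallback idea in the description can be illustrated with a minimal sketch. This is not Impala's code and not necessarily how the fix was implemented. The names `start_fragment_instances`, `start_rpc`, and `ExecutorUnavailable` are hypothetical, and the sketch assumes the scheduler has already produced an ordered list of candidate executors per fragment from the statestore's membership view. The coordinator treats that list as a hint, records executors it cannot reach itself, and moves on to the next candidate.

```python
from typing import Callable, Dict, List, Set


class ExecutorUnavailable(Exception):
    """Hypothetical error raised by the RPC layer when an executor is unreachable."""


def start_fragment_instances(
    candidates: Dict[str, List[str]],
    start_rpc: Callable[[str, str], None],
) -> Dict[str, str]:
    """Start each fragment on the first candidate executor that accepts the RPC.

    `candidates` maps a fragment id to executors in preference order, as
    derived from the (possibly stale) membership information. `start_rpc`
    is a hypothetical callable taking (executor, fragment) that raises
    ExecutorUnavailable on failure.
    """
    # Executors this coordinator has observed failing during this attempt.
    # This local view overrides the membership hint for later fragments.
    locally_failed: Set[str] = set()
    placement: Dict[str, str] = {}

    for fragment, executors in candidates.items():
        for executor in executors:
            if executor in locally_failed:
                continue  # Skip nodes already seen to be down.
            try:
                start_rpc(executor, fragment)
            except ExecutorUnavailable:
                locally_failed.add(executor)
                continue  # Fall back to the next backup executor.
            placement[fragment] = executor
            break
        else:
            # Every candidate was skipped or failed for this fragment.
            raise RuntimeError(f"no reachable executor for fragment {fragment}")

    return placement
```

In this sketch a single failed RPC to an executor is enough to exclude it for the remaining fragments of the same query. A real coordinator would likely also need retry limits, timeouts, and cleanup of instances already started when a later fragment cannot be placed.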



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)