Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/05/14 18:07:00 UTC

[jira] [Resolved] (IMPALA-6907) ImpalaServer::MembershipCallback() may not remove all stale connections to disconnected Impalad nodes

     [ https://issues.apache.org/jira/browse/IMPALA-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Ho resolved IMPALA-6907.
--------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> ImpalaServer::MembershipCallback() may not remove all stale connections to disconnected Impalad nodes
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6907
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6907
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Currently, {{ImpalaServer::MembershipCallback()}} will remove stale connections to hosts which were removed from the cluster membership.
> {noformat}
>       while (loc_entry != query_locations_.end()) {
>         if (current_membership.find(loc_entry->first) == current_membership.end()) {
>           unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
>           // Add failed backend locations to all queries that ran on that backend.
>           for(; query_id != loc_entry->second.end(); ++query_id) {
>             vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
>             failed_hosts.push_back(loc_entry->first);
>           }
>           exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first); <<<-----
> {noformat}
> However, it relies on checking against {{query_locations_}}, which is populated only when the Impalad node acts as a coordinator and is currently running queries on the disconnected backend. So {{ImpalaServer::MembershipCallback()}} will not reliably remove stale connections to hosts removed from the cluster. Stale connections may therefore stay in the connection cache for an extended period of time, leading to query failures when they are reused after the removed hosts rejoin the cluster.
> Instead, we should remove stale connections regardless of whether this node happens to be currently coordinating a query using that backend.
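
The proposal above can be illustrated with a minimal, self-contained sketch. This is not the actual Impala change: {{Address}}, {{FakeClientCache}} and {{CloseStaleConnections()}} are hypothetical names, and the sketch assumes the node keeps the previously seen membership so it can be compared with the current one. The point is only that the set of hosts to clean up is derived from membership alone, not from {{query_locations_}}.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend network address ("host:port").
using Address = std::string;

// Hypothetical stand-in for the client connection cache. It only records
// which addresses were passed to CloseConnections().
struct FakeClientCache {
  std::vector<Address> closed;
  void CloseConnections(const Address& addr) { closed.push_back(addr); }
};

// Closes cached connections to every backend that was present in the
// previous membership but is missing from the current one. The decision
// does not depend on whether this node coordinates a query on that backend.
// Returns the addresses that were removed, in sorted order.
std::vector<Address> CloseStaleConnections(const std::set<Address>& previous,
                                           const std::set<Address>& current,
                                           FakeClientCache* cache) {
  std::vector<Address> removed;
  for (const Address& addr : previous) {
    if (current.find(addr) == current.end()) {
      cache->CloseConnections(addr);
      removed.push_back(addr);
    }
  }
  return removed;
}
```

In this sketch, query cancellation for failed backends would remain a separate step that still consults the per-backend query map; only the connection cleanup is decoupled from it.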


