You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues-all@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/11/14 19:30:00 UTC

[jira] [Commented] (IMPALA-4069) Introduce startup option to create and cache backend connections on startup

    [ https://issues.apache.org/jira/browse/IMPALA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687018#comment-16687018 ] 

Michael Ho commented on IMPALA-4069:
------------------------------------

Now that IMPALA-7213 and IMPALA-4063 are fixed, the number of connections should scale linearly with # of hosts instead of (# of hosts x # of query fragments per host). May still be good to study whether this is still an issue in a Kerberos enabled cluster or should we eagerly do a staggered warm up in the Impala cluster to pre-create the connections ?

> Introduce startup option to create and cache backend connections on startup
> ---------------------------------------------------------------------------
>
>                 Key: IMPALA-4069
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4069
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: scalability
>
> Add impalad startup flag specifying the number of connections per backend to create and cache. 
> After startup impala-server.backends.client-cache.total-clients should reflect number of backends x cached connections per backend. 
> [~jyu@cloudera.com] description of the problem
> {code}
> Internal Impala network connections between nodes for query execution are not multiplexed. This means as the number of queries increase the number of network connections increases between Impala executors. With higher #nodes, the combination of query bursts and number of executors can lead to lots of new connections attempts. For example, a query with 10+joins on a 100-node cluster could require 1000+ connections simultaneously on coordinator.  When the spike is too high or if there is not sufficient CPU available to handle the bursts, this causes connection failures. 
> The total number of connections does not seem to be the issue, but there is currently a practical limit on the number of simultaneous new concurrent connection TCP request spikes at once. 
> Impala caches backend connections and reuse them later. With cache, the simultaneous spikes of new connection request is only those above previous established maximum.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org