Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/04/01 02:33:14 UTC

[jira] [Comment Edited] (TEZ-988) http.maxConnections needs to be configurable in Tez Fetcher & read from errorstream to make the connection reusable

    [ https://issues.apache.org/jira/browse/TEZ-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955871#comment-13955871 ] 

Rajesh Balamohan edited comment on TEZ-988 at 4/1/14 12:31 AM:
---------------------------------------------------------------

>> We read the errorstream mainly for keep-alive reuse.  So it makes sense to wrap the System.setProperty call within the keepAlive check. (Done)
>> Added "&keepAlive=true"

>> http.maxConnections
- This refers to the number of connection entries (protocol + host + port) that can be maintained in the keepAlive cache.  The default value is 5, which is very low for a large cluster (e.g. for a cluster of 20 or 500 nodes, keeping only 5 connections in the pool is far too few).  This is also a per-JVM setting (i.e. all HttpURLConnection instances in the same JVM internally share the JVM's keepAliveCache).  Ideally this value should be set equal to the number of nodes in the cluster.
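
A minimal sketch of the idea above, assuming a hypothetical helper class (KeepAliveConfig and its configure method are illustrative names, not the actual Fetcher.java code from the patch).  It sets http.maxConnections only when keep-alive is enabled.  The JDK is likely to read this property once and cache it, so the call would need to happen before the first HTTP connection is opened in the JVM.  The JDK networking documentation words this property as a limit on idle kept-alive connections, so the exact semantics are worth verifying against the JDK version in use.

```java
// Hypothetical helper, not the actual Tez Fetcher code.
final class KeepAliveConfig {
    // Standard JDK networking system property for the keep-alive cache.
    static final String MAX_CONNECTIONS_KEY = "http.maxConnections";

    private KeepAliveConfig() {}

    /**
     * Sets http.maxConnections for this JVM, but only when keep-alive is
     * enabled and the requested value is positive.
     * Returns true if the property was set.
     */
    static boolean configure(boolean keepAliveEnabled, int maxConnections) {
        if (!keepAliveEnabled || maxConnections <= 0) {
            return false; // leave the JDK default untouched
        }
        System.setProperty(MAX_CONNECTIONS_KEY, Integer.toString(maxConnections));
        return true;
    }
}
```

For example, KeepAliveConfig.configure(true, 500) would request room for 500 entries, in line with the suggestion to match the number of nodes in the cluster.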



was (Author: rajesh.balamohan):
>> We read the errorstream mainly for keep-alive reuse.  So it makes sense to wrap the System.setProperty call within the keepAlive check. (Done)
>> Added "&keepAlive=true"

>> http.maxConnections
- This refers to the number of connection entries (protocol + host + port) that can be maintained in the keepAlive cache.  The default value is 5, which is very low for a large cluster (e.g. for a cluster of 20 or 500 nodes, keeping only 5 connections in the pool is far too few).  This is also a per-JVM setting (i.e. all HttpURLConnection instances in the same JVM internally share the JVM's keepAliveCache).  Ideally this value should be set equal to the number of nodes in the cluster.

- The number of connections per host is determined by the "keep-alive: max" header.  If nothing is specified, this defaults to 5 as per the JDK's implementation.  NodeManager's shuffle handler does not tweak this on the server side.  We do not need to tweak it either, as maintaining more than 1 connection to a host from the same JVM might not be beneficial.

> http.maxConnections needs to be configurable in Tez Fetcher & read from errorstream to make the connection reusable
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-988
>                 URL: https://issues.apache.org/jira/browse/TEZ-988
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-988-v1.patch, TEZ-988-v2.patch, TEZ-988-v3.patch, TEZ-988-v4.patch, TEZ-988-v5.patch
>
>
> 1. Currently http.maxConnections is set to 5 (default).  Make this configurable in Fetcher.java.  This will help in running larger queries.
> 2. ErrorStream has to be read completely in order to make the connection reusable (when keepAlive is enabled).  Currently, we do not read the error stream.
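
Point 2 above can be sketched as follows.  This is a hedged illustration only: ErrorStreamDrainer, drain and cleanup are hypothetical names and do not come from the attached patches.  The sketch reads the error stream to the end and closes it when keep-alive is enabled, so that the JDK can return the underlying connection to its keep-alive cache; otherwise it simply disconnects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper, not the actual Tez Fetcher code.
final class ErrorStreamDrainer {
    private ErrorStreamDrainer() {}

    /** Reads the stream to EOF, closes it, and returns the bytes consumed. */
    static long drain(InputStream in) throws IOException {
        if (in == null) {
            return 0; // getErrorStream() returns null when there is no error body
        }
        byte[] buf = new byte[4096];
        long total = 0;
        try (InputStream s = in) {
            int n;
            while ((n = s.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Cleans up a failed request: drain for reuse, or drop the connection. */
    static void cleanup(HttpURLConnection conn, boolean keepAlive) {
        if (!keepAlive) {
            conn.disconnect();
            return;
        }
        try {
            drain(conn.getErrorStream());
        } catch (IOException e) {
            // The stream could not be consumed; give up on reusing the socket.
            conn.disconnect();
        }
    }
}
```

A caller would typically invoke cleanup(conn, keepAlive) after a request fails with a non-2xx response.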


