Posted to dev@impala.apache.org by Jim Apple <jb...@cloudera.com> on 2017/11/08 17:12:14 UTC

S3 connections

http://impala.apache.org/docs/build/html/topics/impala_s3.html
recommends "Set the safety valve fs.s3a.connection.maximum to 1500 for
impalad." For best performance, should this be increased for nodes
with very high CPU, RAM, or bandwidth? Or decreased for less-beefy
nodes?

Re: S3 connections

Posted by Mostafa Mokhtar <mm...@cloudera.com>.
It should be safe to apply this setting to all machine sizes.
The default value is too low to reliably run single-user queries.

This setting is mostly there to work around S3 connector timeout failures
that look like the one below.

I1227 19:29:41.471863  1490 AmazonHttpClient.java:496] Unable to execute HTTP request: Timeout waiting for connection from pool
Java exception follows:
com.cloudera.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at com.cloudera.org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
    at com.cloudera.org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.cloudera.com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
    at com.cloudera.com.amazonaws.http.conn.$Proxy21.getConnection(Unknown Source)
    at com.cloudera.org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
    at com.cloudera.org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at com.cloudera.org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:728)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1050)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:913)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:394)
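
To illustrate why a small pool produces this kind of "Timeout waiting for
connection from pool" error, here is a rough Python sketch of a bounded pool
with a lease timeout. It is a toy model only, not the S3A or AWS SDK
implementation: BoundedPool, count_timeouts, and all the timing values are
made-up names and numbers for illustration.

```python
import threading
import time


class BoundedPool:
    """Toy stand-in for an HTTP connection pool with a fixed maximum size."""

    def __init__(self, max_connections, lease_timeout_s):
        self._slots = threading.Semaphore(max_connections)
        self._lease_timeout_s = lease_timeout_s

    def run(self, work):
        # Wait for a free slot, but give up once the lease timeout expires.
        if not self._slots.acquire(timeout=self._lease_timeout_s):
            raise TimeoutError("Timeout waiting for connection from pool")
        try:
            return work()
        finally:
            self._slots.release()


def count_timeouts(max_connections, concurrent_requests,
                   request_s=0.5, lease_timeout_s=0.05):
    """Issue concurrent fake requests and count how many fail to get a slot."""
    pool = BoundedPool(max_connections, lease_timeout_s)
    lock = threading.Lock()
    timeouts = 0

    def worker():
        nonlocal timeouts
        try:
            # Each fake request holds its slot for request_s seconds.
            pool.run(lambda: time.sleep(request_s))
        except TimeoutError:
            with lock:
                timeouts += 1

    threads = [threading.Thread(target=worker)
               for _ in range(concurrent_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timeouts


if __name__ == "__main__":
    print("pool of 2, 8 requests:", count_timeouts(2, 8), "timeouts")
    print("pool of 8, 8 requests:", count_timeouts(8, 8), "timeouts")
```

In this model, requests that cannot get a slot before the lease timeout fail
no matter how fast the machine is, which fits the advice that the larger
limit is about avoiding pool exhaustion rather than tuning per node size.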


