Posted to issues@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2022/08/19 23:30:00 UTC

[jira] [Created] (IMPALA-11514) Workaround s3 functionality issue in HADOOP-18410

Joe McDonnell created IMPALA-11514:
--------------------------------------

             Summary: Workaround s3 functionality issue in HADOOP-18410
                 Key: IMPALA-11514
                 URL: https://issues.apache.org/jira/browse/IMPALA-11514
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 4.2.0
            Reporter: Joe McDonnell
            Assignee: Joe McDonnell


When testing on S3, dataload fails while creating the TPC-DS testcase data:
{noformat}
12:00:17 Creating tpcds testcase data (logging to /data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/logs/data_loading/create-tpcds-testcase-data.log)... 
12:00:17     FAILED (Took: 0 min 13 sec)
12:00:30     '/data/jenkins/workspace/impala-asf-master-core-s3/repos/Impala/testdata/bin/create-tpcds-testcase-files.sh' failed. Tail of log:
12:00:30  order by t_s_secyear.customer_id
12:00:30          ,t_s_secyear.customer_first_name
12:00:30          ,t_s_secyear.customer_last_name
12:00:30          ,t_s_secyear.customer_email_address
12:00:30 limit 100
12:00:30 Query submitted at: 2022-08-18 12:00:25 (Coordinator: http://hostname:25000)
12:00:30 ERROR: AnalysisException: getFileStatus on s3a://bucketname/test-warehouse/tpcds-testcase-data: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
12:00:30 CAUSED BY: InterruptedIOException: getFileStatus on s3a://bucketname/test-warehouse/tpcds-testcase-data: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
12:00:30 CAUSED BY: SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
12:00:30 CAUSED BY: ConnectionPoolTimeoutException: Timeout waiting for connection from pool{noformat}
This has been tracked down to https://issues.apache.org/jira/browse/HADOOP-18410

A temporary workaround is to set fs.s3a.input.async.drain.threshold=512G in core-site.xml.
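
For reference, a minimal sketch of how that setting might look as a core-site.xml property. It assumes the entry goes inside the file's existing <configuration> element alongside the other fs.s3a.* settings; the value is the one quoted above. My understanding is that such a large threshold keeps the S3A input stream from ever taking the asynchronous drain path, but that reading should be checked against HADOOP-18410.
{noformat}
<!-- Temporary workaround for HADOOP-18410; remove once the fix is available. -->
<property>
  <name>fs.s3a.input.async.drain.threshold</name>
  <value>512G</value>
</property>
{noformat}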

We should apply this workaround until the fix for HADOOP-18410 arrives.


