Posted to commits@impala.apache.org by jo...@apache.org on 2022/08/23 03:23:20 UTC

[impala] branch master updated: IMPALA-11514: Workaround s3 connection timeout issues

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 8e0482294 IMPALA-11514: Workaround s3 connection timeout issues
8e0482294 is described below

commit 8e0482294975352d3d34d75adb50602d85b3c155
Author: Joe McDonnell <jo...@cloudera.com>
AuthorDate: Fri Aug 19 16:33:17 2022 -0700

    IMPALA-11514: Workaround s3 connection timeout issues
    
    When running on s3, dataload fails with errors like
    "Timeout waiting for connection from pool". The underlying
    cause is a subtle issue in the async draining codepath
    (HADOOP-18410). As a temporary workaround, this adds
    fs.s3a.input.async.drain.threshold=512G to core-site.xml,
    which disables the async drain codepath.
    
    Testing:
     - An s3 job passed with this setting
    
    Change-Id: I08d03eb653fdcb6955340519b0cf5ba97b10d590
    Reviewed-on: http://gerrit.cloudera.org:8080/18872
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
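
For readers unfamiliar with the size suffix, the short sketch below shows what the '512G' value in the commit message amounts to in bytes. It assumes the suffix is a binary one (G = 1024^3), as is usual for Hadoop size options. The helper size_to_bytes is hypothetical and written only for this illustration; it is not Hadoop's parser and is not part of the patch. The presumed reason the setting disables the async path is that no stream in the test workload would have that many bytes left to drain, but that reading is an assumption based on the commit message.

```python
# Hypothetical helper for illustration only; not Hadoop's implementation.
# Assumes binary suffixes: K, M, G, T, P are successive powers of 1024.
_SUFFIXES = {
    "K": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
    "T": 1024 ** 4,
    "P": 1024 ** 5,
}


def size_to_bytes(value):
    """Convert a size such as '512G' or '16000' to an integer byte count."""
    text = str(value).strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


# The value added by this commit for fs.s3a.input.async.drain.threshold.
threshold = size_to_bytes("512G")
print(threshold)  # 549755813888
```

Under that assumption the threshold is 2^39 bytes, far larger than anything the dataload is likely to leave unread in a single stream.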
---
 .../cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py     | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py b/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
index d3777178f..573800232 100644
--- a/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
+++ b/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
@@ -112,6 +112,9 @@ CONFIG = {
 
 if target_filesystem == 's3':
   CONFIG.update({'fs.s3a.connection.maximum': 1500})
+  # As a workaround for HADOOP-18410, set the async drain threshold to an absurdly large
+  # value to turn off the async drain codepath.
+  CONFIG.update({'fs.s3a.input.async.drain.threshold': '512G'})
   s3guard_enabled = os.environ.get("S3GUARD_ENABLED") == 'true'
   if s3guard_enabled:
     CONFIG.update({