Posted to commits@impala.apache.org by jo...@apache.org on 2022/08/23 03:23:20 UTC
[impala] branch master updated: IMPALA-11514: Workaround s3 connection timeout issues
This is an automated email from the ASF dual-hosted git repository.
joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 8e0482294 IMPALA-11514: Workaround s3 connection timeout issues
8e0482294 is described below
commit 8e0482294975352d3d34d75adb50602d85b3c155
Author: Joe McDonnell <jo...@cloudera.com>
AuthorDate: Fri Aug 19 16:33:17 2022 -0700
IMPALA-11514: Workaround s3 connection timeout issues
When running on S3, dataload fails with errors
like "Timeout waiting for connection from pool". The
underlying cause is a subtle bug in the async draining
codepath (HADOOP-18410). As a temporary workaround, this
adds fs.s3a.input.async.drain.threshold=512G to core-site.xml,
which disables the async drain codepath.
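For illustration only, the S3-specific settings described above can be sketched as a small standalone Python snippet. The helper names s3a_overrides and size_to_bytes are hypothetical and do not exist in the Impala tree, and the suffix handling assumes Hadoop reads "G" as a binary multiplier (2**30); treat it as a rough sketch of the intent rather than the actual template code.

```python
# Hypothetical helpers for illustration; not part of the Impala source tree.

# Binary multipliers, assumed to match how Hadoop reads size suffixes.
_SUFFIXES = {'K': 2**10, 'M': 2**20, 'G': 2**30, 'T': 2**40}


def size_to_bytes(value):
    """Convert a size string such as '512G' to a byte count."""
    text = str(value).strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def s3a_overrides(target_filesystem):
    """Return the S3A overrides described above for the given filesystem."""
    overrides = {}
    if target_filesystem == 's3':
        overrides['fs.s3a.connection.maximum'] = 1500
        # Workaround for HADOOP-18410: a threshold this large is meant to
        # keep the async drain codepath from being taken.
        overrides['fs.s3a.input.async.drain.threshold'] = '512G'
    return overrides


if __name__ == '__main__':
    for key, value in sorted(s3a_overrides('s3').items()):
        print(key, '=', value)
    print(size_to_bytes('512G'), 'bytes')
```

Under that assumption the threshold works out to 2**39 bytes, far larger than any stream remainder the dataload is likely to see, which is why the setting is expected to switch the async drain path off in practice.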
Testing:
- An s3 job passed with this setting
Change-Id: I08d03eb653fdcb6955340519b0cf5ba97b10d590
Reviewed-on: http://gerrit.cloudera.org:8080/18872
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
---
.../cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py | 3 +++
1 file changed, 3 insertions(+)
diff --git a/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py b/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
index d3777178f..573800232 100644
--- a/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
+++ b/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
@@ -112,6 +112,9 @@ CONFIG = {
 if target_filesystem == 's3':
   CONFIG.update({'fs.s3a.connection.maximum': 1500})
+  # As a workaround for HADOOP-18410, set the async drain threshold to an absurdly large
+  # value to turn off the async drain codepath.
+  CONFIG.update({'fs.s3a.input.async.drain.threshold': '512G'})
   s3guard_enabled = os.environ.get("S3GUARD_ENABLED") == 'true'
   if s3guard_enabled:
     CONFIG.update({