Posted to commits@impala.apache.org by jo...@apache.org on 2019/02/07 04:47:27 UTC

[impala] 02/02: IMPALA-7265: Enable caching of remote file handles by default

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 255ec4687ebe6195b20e5566394f3692c07e3b7f
Author: Joe McDonnell <jo...@cloudera.com>
AuthorDate: Wed Feb 6 12:41:23 2019 -0800

    IMPALA-7265: Enable caching of remote file handles by default
    
    This changes the default value of cache_remote_file_handles
    from false to true. Testing shows that this setting has a
    major impact on performance for clusters that do remote HDFS
    reads. Hand testing of the cache did not reveal any problems
    with the semantics of caching remote file handles.
    
    Change-Id: I2fc4a69c6bf721017f4adcdc302db9eace5135a4
    Reviewed-on: http://gerrit.cloudera.org:8080/12387
    Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/io/disk-io-mgr.cc | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/be/src/runtime/io/disk-io-mgr.cc b/be/src/runtime/io/disk-io-mgr.cc
index cad2e65..ce56be0 100644
--- a/be/src/runtime/io/disk-io-mgr.cc
+++ b/be/src/runtime/io/disk-io-mgr.cc
@@ -127,10 +127,9 @@ DEFINE_uint64(unused_file_handle_timeout_sec, 21600, "Maximum time, in seconds,
 DEFINE_uint64(num_file_handle_cache_partitions, 16, "Number of partitions used by the "
     "file handle cache.");
 
-// Given the extra complexity of remote accesses and semantics, caching for remote HDFS
-// file handles is currently not enabled by default. This parameter enables caching
-// for remote HDFS file handles. It does not impact S3, ADLS, or ABFS file handles.
-DEFINE_bool(cache_remote_file_handles, false, "Enable the file handle cache for "
+// This parameter controls whether remote HDFS file handles are cached. It does not impact
+// S3, ADLS, or ABFS file handles. This is enabled by default.
+DEFINE_bool(cache_remote_file_handles, true, "Enable the file handle cache for "
     "remote HDFS files.");
 
 AtomicInt32 DiskIoMgr::next_disk_id_;
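
To illustrate what flipping this default means in practice, here is a minimal sketch of a boolean flag gating a handle cache for remote files. It is not Impala's implementation: ToyFileHandleCache, CountRemoteOpens, and the hdfs://nn/path/file path are hypothetical names invented for this example, a plain static bool stands in for the gflags-defined flag, and the treatment of local files is an assumption made only to keep the toy small.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for the gflags variable that DEFINE_bool would generate. A plain
// static bool is used so the sketch compiles without gflags (assumption).
static bool FLAGS_cache_remote_file_handles = true;

// Hypothetical toy cache that counts how many times a file is "opened".
class ToyFileHandleCache {
 public:
  // Returns a handle id for 'path'. In this toy, local files always use the
  // cache; remote files use it only when the flag is set.
  int64_t GetHandle(const std::string& path, bool is_remote) {
    const bool use_cache = !is_remote || FLAGS_cache_remote_file_handles;
    if (use_cache) {
      auto it = cached_.find(path);
      if (it != cached_.end()) return it->second;  // cache hit: no new open
    }
    const int64_t handle = ++num_opens_;  // simulate an expensive open call
    if (use_cache) cached_[path] = handle;
    return handle;
  }

  int64_t num_opens() const { return num_opens_; }

 private:
  std::unordered_map<std::string, int64_t> cached_;
  int64_t num_opens_ = 0;
};

// Requests the same (hypothetical) remote path 'reads' times and returns how
// many simulated opens that took under the given flag value.
int64_t CountRemoteOpens(bool cache_remote, int reads) {
  assert(reads >= 0);
  FLAGS_cache_remote_file_handles = cache_remote;
  ToyFileHandleCache cache;
  for (int i = 0; i < reads; ++i) {
    cache.GetHandle("hdfs://nn/path/file", /*is_remote=*/true);
  }
  return cache.num_opens();
}
```

In this sketch, repeated reads of one remote file cost a single open when the flag is true and one open per read when it is false, which is the kind of saving the commit message attributes to the new default.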