Posted to commits@impala.apache.org by ta...@apache.org on 2017/12/06 01:56:00 UTC

[3/8] impala git commit: IMPALA-6232: Disable file handle cache by default

IMPALA-6232: Disable file handle cache by default

In some scenarios, HDFS file appends or overwrites can cause
HDFS to disable short-circuit reads while file handles are
cached (HDFS-12528). Because losing short-circuit reads can
regress performance, this change sets the default value of
max_cached_file_handles to 0 (previously 20000), which
disables the file handle cache by default. It also lowers the
default value of unused_file_handle_timeout_sec from 21600 to
270. If users enable the file handle cache, the shorter
timeout prevents some of the scenarios that disable
short-circuit reads.

Ran existing file handle cache tests to verify that there
is no impact.

Change-Id: Iea7f943f63b72b42286a9e8b9987308baa79d7b0
Reviewed-on: http://gerrit.cloudera.org:8080/8750
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/1f1bff8e
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/1f1bff8e
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/1f1bff8e

Branch: refs/heads/master
Commit: 1f1bff8e8d35b66308a1e865cdc8bce41ce89873
Parents: e4a2f5d
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Mon Dec 4 10:21:33 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Tue Dec 5 21:03:00 2017 +0000

----------------------------------------------------------------------
 be/src/runtime/io/disk-io-mgr.cc | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/1f1bff8e/be/src/runtime/io/disk-io-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/io/disk-io-mgr.cc b/be/src/runtime/io/disk-io-mgr.cc
index 4f4074c..668fc75 100644
--- a/be/src/runtime/io/disk-io-mgr.cc
+++ b/be/src/runtime/io/disk-io-mgr.cc
@@ -94,7 +94,10 @@ DEFINE_int32(max_free_io_buffers, 128,
 // uses about 6kB of memory. 20k file handles will thus reserve ~120MB of memory.
 // The actual amount of memory that is associated with a file handle can be larger
 // or smaller, depending on the replication factor for this file or the path name.
-DEFINE_uint64(max_cached_file_handles, 20000, "Maximum number of HDFS file handles "
+// TODO: This is currently disabled due to HDFS-12528, which can disable short circuit
+// reads when file handle caching is enabled. This should be reenabled by default
+// when that issue is fixed.
+DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
     "that will be cached. Disabled if set to 0.");
 
 // The unused file handle timeout specifies how long a file handle will remain in the
@@ -106,7 +109,10 @@ DEFINE_uint64(max_cached_file_handles, 20000, "Maximum number of HDFS file handl
 // from being freed. When the metadata sees that a file has been deleted, the file handle
 // will no longer be used by future queries. Aging out this file handle allows the
 // disk space to be freed in an appropriate period of time.
-DEFINE_uint64(unused_file_handle_timeout_sec, 21600, "Maximum time, in seconds, that an "
+// TODO: HDFS-12528 (which can disable short circuit reads) is more likely to happen
+// if file handles are cached for longer than 5 minutes. Use a conservative value for
+// the unused file handle cache timeout until HDFS-12528 is fixed.
+DEFINE_uint64(unused_file_handle_timeout_sec, 270, "Maximum time, in seconds, that an "
     "unused HDFS file handle will remain in the file handle cache. Disabled if set "
     "to 0.");
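
The semantics of the two flags changed above can be illustrated with a small, self-contained sketch. This is a toy model and not Impala's file handle cache: the names CacheConfig and ToyFileHandleCache are hypothetical, time is passed in explicitly as seconds and is assumed to be non-decreasing, and a handle is treated as expired once it has been idle for at least the timeout. It only shows that a maximum of 0 disables caching and that unused handles age out after the configured number of seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical mirror of the two flags in this diff, using the new defaults.
struct CacheConfig {
  uint64_t max_cached_file_handles = 0;           // 0 disables the cache
  uint64_t unused_file_handle_timeout_sec = 270;  // 0 disables aging out
};

// Toy cache of unused file handles, keyed by path. Not Impala's implementation.
class ToyFileHandleCache {
 public:
  explicit ToyFileHandleCache(CacheConfig cfg) : cfg_(cfg) {}

  // Offers an unused handle to the cache at time now_sec.
  // Returns false when caching is disabled; the caller would then close it.
  bool Put(const std::string& path, uint64_t now_sec) {
    if (cfg_.max_cached_file_handles == 0) return false;
    EvictExpired(now_sec);
    Erase(path);  // refresh an existing entry instead of duplicating it
    lru_.push_front(Entry{path, now_sec});
    index_[path] = lru_.begin();
    if (lru_.size() > cfg_.max_cached_file_handles) {
      // Over capacity: drop the least recently used handle.
      index_.erase(lru_.back().path);
      lru_.pop_back();
    }
    assert(lru_.size() == index_.size());
    return true;
  }

  // Reports whether a handle for path is still cached at time now_sec.
  bool Contains(const std::string& path, uint64_t now_sec) {
    EvictExpired(now_sec);
    return index_.count(path) > 0;
  }

  size_t Size() const { return lru_.size(); }

 private:
  struct Entry {
    std::string path;
    uint64_t last_used_sec;
  };

  // Removes handles that have been unused for at least the timeout.
  // Relies on non-decreasing timestamps, so the oldest entry is at the back.
  void EvictExpired(uint64_t now_sec) {
    const uint64_t timeout = cfg_.unused_file_handle_timeout_sec;
    if (timeout == 0) return;
    while (!lru_.empty()) {
      const Entry& oldest = lru_.back();
      if (now_sec < oldest.last_used_sec) break;
      if (now_sec - oldest.last_used_sec < timeout) break;
      index_.erase(oldest.path);
      lru_.pop_back();
    }
  }

  bool Erase(const std::string& path) {
    auto it = index_.find(path);
    if (it == index_.end()) return false;
    lru_.erase(it->second);
    index_.erase(it);
    return true;
  }

  CacheConfig cfg_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With a default-constructed CacheConfig, Put() returns false and nothing is cached, which corresponds to the new default. Setting max_cached_file_handles to a nonzero value in this sketch turns caching on, and handles idle for 270 seconds or more are then dropped on the next access.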