Posted to commits@impala.apache.org by ph...@apache.org on 2018/04/06 16:17:27 UTC
[4/7] impala git commit: IMPALA-6807: [DOCS] Update the known issue for HDFS-12528
IMPALA-6807: [DOCS] Update the known issue for HDFS-12528
Added a new recommendation for the new setting with the fix version
of HDFS, 2.10 and higher.
Change-Id: If51cb111a9ddc67be4a1cf42502a8a021486b7e4
Reviewed-on: http://gerrit.cloudera.org:8080/9929
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/ab7afa7b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/ab7afa7b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/ab7afa7b
Branch: refs/heads/master
Commit: ab7afa7b4e85d0a2c3801950f529ac6d71f9dd03
Parents: 380e17a
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Wed Apr 4 16:22:42 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Thu Apr 5 23:36:00 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_known_issues.xml | 61 +++++++++++++++++++++++---------
1 file changed, 45 insertions(+), 16 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/ab7afa7b/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a8a8451..a09188e 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -409,25 +409,54 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
<title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
<conbody>
<p>
- If a data file used by Impala is being continuously appended or overwritten in place by an
- HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction with the
- file handle caching feature in <keyword keyref="impala210_full"/> and higher could cause
- short-circuit reads to sometimes be disabled on some DataNodes. When a mismatch is detected
- between the cached file handle and a data block that was rewritten because of an append,
- short-circuit reads are turned off on the affected host for a 10-minute period.
+ If a data file used by Impala is being continuously appended or
+ overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
+ -appendToFile</cmdname>, interaction with the file handle caching
+ feature in <keyword keyref="impala210_full"/> and higher could cause
+ short-circuit reads to sometimes be disabled on some DataNodes. When a
+ mismatch is detected between the cached file handle and a data block
+ that was rewritten because of an append, short-circuit reads are
+ turned off on the affected host for a 10-minute period.
</p>
<p>
- The possibility of encountering such an issue is the reason why the file handle caching
- feature is currently turned off by default. See <xref keyref="scalability_file_handle_cache"/>
- for information about this feature and how to enable it.
+ The possibility of encountering such an issue is the reason why the
+ file handle caching feature is currently turned off by default. See
+ <xref keyref="scalability_file_handle_cache"/> for information about
+ this feature and how to enable it.
</p>
- <p><b>Bug:</b> <xref href="https://issues.apache.org/jira/browse/HDFS-12528" scope="external" format="html">HDFS-12528</xref></p>
- <p><b>Severity:</b> High</p>
- <!-- <p><b>Resolution:</b> </p> -->
- <p><b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before enabling the file handle caching feature.
- You can set the <cmdname>impalad</cmdname> configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
- that is shorter than the HDFS setting <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind that
- the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+ <p>
+ <b>Bug:</b>
+ <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
+ scope="external" format="html">HDFS-12528</xref>
+ </p>
+
+ <p>
+ <b>Severity:</b> High
+ </p>
+
+ <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
+ this issue before enabling the file handle caching feature. You can
+ set the <cmdname>impalad</cmdname> configuration option
+ <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+ that is shorter than the HDFS setting
+ <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
+ (Keep in mind that the HDFS setting is in milliseconds while the
+ Impala setting is in seconds.)
+ </p>
+
+ <p>
+ <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
+ parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
+ to specify the amount of time that short circuit reads are disabled on
+ encountering an error. The default value is 10 minutes
+ (<codeph>600</codeph> seconds). It is recommended that you set
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
+ small value, such as <codeph>1</codeph> second, when using the file
+ handle cache. Setting <codeph>
+ dfs.domain.socket.disable.interval.seconds</codeph> to
+ <codeph>0</codeph> is not recommended as a non-zero interval
+ protects the system if there is a persistent problem with short
+ circuit reads.
</p>
</conbody>
</concept>
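
As a rough illustration of the Workaround and Resolution text in the diff above, here is a minimal Python sketch. It is not part of the commit. The function names and the sample values are hypothetical; only the three setting names come from the patch. The first helper checks the unit-sensitive comparison (Impala's setting is in seconds, the HDFS setting is in milliseconds). The second renders a setting as a Hadoop-style <property> element, assuming the value is deployed through hdfs-site.xml.

```python
import xml.etree.ElementTree as ET


def impala_timeout_is_shorter(unused_file_handle_timeout_sec: int,
                              hdfs_cache_expiry_ms: int) -> bool:
    """Hypothetical helper: True if the impalad option
    unused_file_handle_timeout_sec (seconds) is shorter than
    dfs.client.read.shortcircuit.streams.cache.expiry.ms (milliseconds)."""
    # Convert the Impala value to milliseconds before comparing.
    return unused_file_handle_timeout_sec * 1000 < hdfs_cache_expiry_ms


def hdfs_site_property(name: str, value: str) -> str:
    """Hypothetical helper: render one setting as a <property> element
    in the name/value layout used by Hadoop configuration files."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Sample values only; substitute the values from your own cluster.
    print(impala_timeout_is_shorter(270, 300000))
    # The patch recommends a small non-zero interval, such as 1 second.
    print(hdfs_site_property(
        "dfs.domain.socket.disable.interval.seconds", "1"))
```

The comparison is strict because the patch asks for a period that is shorter than the HDFS setting, so an equal value is treated as not satisfying the workaround.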