Posted to commits@impala.apache.org by cs...@apache.org on 2019/06/27 11:52:45 UTC
[impala] 01/07: IMPALA-8341: [DOCS] Describe the setting for remote data caching
This is an automated email from the ASF dual-hosted git repository.
csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit e29b387ea10739e78075bac8170e45722d4b9940
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Jun 25 08:41:23 2019 -0700
IMPALA-8341: [DOCS] Describe the setting for remote data caching
Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Reviewed-on: http://gerrit.cloudera.org:8080/13724
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Michael Ho <kw...@cloudera.com>
---
docs/impala.ditamap | 4 +-
docs/topics/impala_data_cache.xml | 85 +++++++++++++++++++++++++++++++++++++++
2 files changed, 87 insertions(+), 2 deletions(-)
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index ed69762..554430f 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -291,10 +291,10 @@ under the License.
<topicref href="topics/impala_perf_resources.xml"/>
<topicref rev="2.5.0" href="topics/impala_runtime_filtering.xml"/>
<topicref href="topics/impala_perf_hdfs_caching.xml"/>
+ <topicref href="topics/impala_perf_skew.xml"/>
+ <topicref href="topics/impala_data_cache.xml"/>
<topicref href="topics/impala_perf_testing.xml"/>
<topicref href="topics/impala_explain_plan.xml"/>
- <topicref href="topics/impala_perf_skew.xml"/>
- <topicref audience="hidden" href="topics/impala_perf_ddl.xml"/>
</topicref>
<topicref href="topics/impala_scalability.xml">
<topicref href="topics/impala_scaling_limits.xml"/>
diff --git a/docs/topics/impala_data_cache.xml b/docs/topics/impala_data_cache.xml
new file mode 100644
index 0000000..e572041
--- /dev/null
+++ b/docs/topics/impala_data_cache.xml
@@ -0,0 +1,85 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="data_cache">
+
+ <title>Data Cache for Remote Reads</title>
+
+ <conbody>
+
+ <p>
+ When Impala compute nodes and their storage are not co-located, the network bandwidth
+ requirement goes up because the network traffic includes the data fetch as well as the
+ shuffling exchange traffic of intermediate results.
+ </p>
+
+ <p>
+ To mitigate the pressure on the network, you can enable the compute nodes to cache the
+ working set read from remote filesystems, such as remote HDFS data nodes, S3, ABFS, and ADLS.
+ </p>
+
+ <p>
+ To enable the remote data cache, set the <codeph>--data_cache</codeph> Impala Daemon start-up
+ flag as follows:
+ </p>
+
+<codeblock>--data_cache=<varname>dir1</varname>,<varname>dir2</varname>,<varname>dir3</varname>,...:<varname>quota</varname></codeblock>
+
+ <p>
+ The flag is set to a list of directories, separated by <codeph>,</codeph>, followed by a
+ <codeph>:</codeph>, and a capacity <codeph><varname>quota</varname></codeph> per
+ directory.
+ </p>
+
+ <p>
+ If set to an empty string, data caching is disabled.
+ </p>
+
+ <p>
+ Cached data is stored in the specified directories.
+ </p>
+
+ <p>
+ The specified directories must exist in the local filesystem of each Impala Daemon.
+ </p>
+
+ <p>
+ In addition, the filesystem in which each directory resides must support hole punching.
+ </p>
+
+ <p>
+ The cache can consume up to <codeph>quota</codeph> bytes in each of the specified
+ directories.
+ </p>
+
+ <p>
+ The default setting for <codeph>--data_cache</codeph> is an empty string.
+ </p>
+
+ <p>
+ For example, with the following setting, the data cache may use up to 2 TB in total, with a
+ maximum of 1 TB each in <codeph>/data/0</codeph> and <codeph>/data/1</codeph>.
+ </p>
+
+<codeblock>--data_cache=/data/0,/data/1:1TB</codeblock>
+
+ </conbody>
+
+</concept>
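
As a rough illustration of the flag format documented in impala_data_cache.xml above (a comma-separated list of directories, a colon, then a per-directory quota), the Python sketch below splits such a value and computes the total capacity. It is not Impala's own flag-parsing code. The helper names parse_data_cache and total_capacity are hypothetical, and the unit table assumes binary multiples (1 TB = 2**40 bytes), which the topic does not state.

```python
import re

# Assumed unit multipliers (binary). The documentation only shows "1TB" as an
# example, so the accepted suffixes here are an illustration, not Impala's rules.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_data_cache(value):
    """Split a --data_cache value into (directories, quota_bytes_per_directory)."""
    if value == "":
        # An empty string means data caching is disabled.
        return [], 0

    # The quota follows the last colon; everything before it is the directory list.
    dirs_part, sep, quota_part = value.rpartition(":")
    if not sep or not dirs_part:
        raise ValueError("expected 'dir1,dir2,...:quota', got %r" % value)

    dirs = dirs_part.split(",")
    if any(not d for d in dirs):
        raise ValueError("empty directory name in %r" % value)

    match = re.fullmatch(r"(\d+)\s*([KMGT]?B)", quota_part.strip().upper())
    if match is None:
        raise ValueError("unrecognized quota %r" % quota_part)

    quota_bytes = int(match.group(1)) * _UNITS[match.group(2)]
    return dirs, quota_bytes


def total_capacity(value):
    """Upper bound on cache size: the quota applies to each directory separately."""
    dirs, quota_bytes = parse_data_cache(value)
    return len(dirs) * quota_bytes


if __name__ == "__main__":
    dirs, quota = parse_data_cache("/data/0,/data/1:1TB")
    print(dirs, quota, total_capacity("/data/0,/data/1:1TB"))
```

Under the binary-unit assumption, the example value from the topic yields two directories with a 2**40-byte quota each, which matches the "up to 2 TB" total described in the documentation.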