Posted to commits@impala.apache.org by cs...@apache.org on 2019/06/27 11:52:45 UTC

[impala] 01/07: IMPALA-8341: [DOCS] Describe the setting for remote data caching

This is an automated email from the ASF dual-hosted git repository.

csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit e29b387ea10739e78075bac8170e45722d4b9940
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Jun 25 08:41:23 2019 -0700

    IMPALA-8341: [DOCS] Describe the setting for remote data caching
    
    Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
    Reviewed-on: http://gerrit.cloudera.org:8080/13724
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Michael Ho <kw...@cloudera.com>
---
 docs/impala.ditamap               |  4 +-
 docs/topics/impala_data_cache.xml | 85 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index ed69762..554430f 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -291,10 +291,10 @@ under the License.
     <topicref href="topics/impala_perf_resources.xml"/>
     <topicref rev="2.5.0" href="topics/impala_runtime_filtering.xml"/>
     <topicref href="topics/impala_perf_hdfs_caching.xml"/>
+    <topicref href="topics/impala_perf_skew.xml"/>
+    <topicref href="topics/impala_data_cache.xml"/>
     <topicref href="topics/impala_perf_testing.xml"/>
     <topicref href="topics/impala_explain_plan.xml"/>
-    <topicref href="topics/impala_perf_skew.xml"/>
-    <topicref audience="hidden" href="topics/impala_perf_ddl.xml"/>
   </topicref>
   <topicref href="topics/impala_scalability.xml">
     <topicref href="topics/impala_scaling_limits.xml"/>
diff --git a/docs/topics/impala_data_cache.xml b/docs/topics/impala_data_cache.xml
new file mode 100644
index 0000000..e572041
--- /dev/null
+++ b/docs/topics/impala_data_cache.xml
@@ -0,0 +1,85 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="data_cache">
+
+  <title>Data Cache for Remote Reads</title>
+
+  <conbody>
+
+    <p>
+      When Impala compute nodes and their storage are not co-located, network bandwidth
+      requirements go up, because the network carries both the data fetched from storage and
+      the exchange traffic that shuffles intermediate results between nodes.
+    </p>
+
+    <p>
+      To reduce the pressure on the network, you can enable the compute nodes to cache the
+      working set read from remote filesystems, such as remote HDFS DataNodes, S3, ABFS, and ADLS.
+    </p>
+
+    <p>
+      To enable the remote data cache, set the <codeph>--data_cache</codeph> Impala Daemon
+      start-up flag as follows:
+    </p>
+
+<codeblock>--data_cache=<varname>dir1</varname>,<varname>dir2</varname>,<varname>dir3</varname>,...:<varname>quota</varname></codeblock>
+
+    <p>
+      The flag value is a comma-separated list of directories, followed by a
+      <codeph>:</codeph> and a capacity <codeph><varname>quota</varname></codeph> that applies
+      to each directory.
+    </p>
+
+    <p>
+      If the flag is set to an empty string, data caching is disabled.
+    </p>
+
+    <p>
+      Cached data is stored in the specified directories.
+    </p>
+
+    <p>
+      The specified directories must exist in the local filesystem of each Impala Daemon.
+    </p>
+
+    <p>
+      In addition, the filesystem on which each directory resides must support hole punching.
+    </p>
+
+    <p>
+      The cache can consume up to <codeph>quota</codeph> bytes in each of the specified
+      directories.
+    </p>
+
+    <p>
+      The default setting for <codeph>--data_cache</codeph> is an empty string.
+    </p>
+
+    <p>
+      For example, with the following setting, the data cache can use up to 2 TB in total, with
+      at most 1 TB in each of <codeph>/data/0</codeph> and <codeph>/data/1</codeph>.
+    </p>
+
+<codeblock>--data_cache=/data/0,/data/1:1TB</codeblock>
+
+  </conbody>
+
+</concept>
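
The flag format described in the new topic breaks down into a directory list, a colon, and a per-directory quota. The total cache size is therefore bounded by the number of directories times the quota. The Python sketch below illustrates that arithmetic. It is not Impala's implementation, which is C++. It assumes that the value contains exactly one ':' (so directory paths must not contain colons) and that quota suffixes are binary, with an optional trailing B (1TB = 2^40 bytes). The helpers parse_quota, parse_data_cache_flag and total_capacity are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# ASSUMPTION: binary multipliers for quota suffixes (1K = 1024 bytes, ...).
# This is an illustrative convention, not Impala's verified parsing rules.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_quota(spec: str) -> int:
    """Hypothetical helper: convert '1TB', '500G' or '4096' to bytes."""
    s = spec.strip().upper()
    if s.endswith("B"):
        s = s[:-1]  # a trailing 'B' only means "bytes"
    suffix = s[-1] if s and s[-1] in "KMGT" else ""
    number = s[: len(s) - len(suffix)]
    if not number.isdigit():
        raise ValueError(f"invalid quota: {spec!r}")
    return int(number) * _MULTIPLIERS[suffix]


def parse_data_cache_flag(value: str) -> Tuple[List[str], int]:
    """Hypothetical helper: split 'dir1,dir2,...:quota' into
    (directories, per-directory quota in bytes).

    An empty value means the cache is disabled, reported as ([], 0).
    """
    if value == "":
        return [], 0
    # ASSUMPTION: exactly one ':' separates the directory list from the quota.
    parts = value.split(":")
    if len(parts) != 2:
        raise ValueError("expected exactly one ':' between directories and quota")
    dirs = [d for d in parts[0].split(",") if d]
    if not dirs:
        raise ValueError("no cache directories given")
    return dirs, parse_quota(parts[1])


def total_capacity(value: str) -> int:
    """Upper bound on cache size: the quota applies to each directory."""
    dirs, quota = parse_data_cache_flag(value)
    return len(dirs) * quota


if __name__ == "__main__":
    flag = "/data/0,/data/1:1TB"
    dirs, quota = parse_data_cache_flag(flag)
    print(dirs, quota, total_capacity(flag))
```

Under these assumptions, the topic's example value /data/0,/data/1:1TB yields two directories with a 1 TB quota each, for a 2 TB upper bound. An empty value yields no directories and no capacity, matching the disabled default.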