Posted to commits@impala.apache.org by tm...@apache.org on 2019/02/13 03:38:57 UTC

[impala] 01/04: IMPALA-7214: [DOCS] More on decoupling impala and DataNodes

This is an automated email from the ASF dual-hosted git repository.

tmarshall pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5b32a0d60110be7c21184819c2dffbb7cbff750f
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Feb 12 12:40:42 2019 -0800

    IMPALA-7214: [DOCS] More on decoupling impala and DataNodes
    
    Change-Id: I4b6f1c704c1e328af9f0beec73f8b6b61fba992e
    Reviewed-on: http://gerrit.cloudera.org:8080/12457
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
 docs/topics/impala_processes.xml       | 10 +++------
 docs/topics/impala_troubleshooting.xml | 39 +++++++++++++++++-----------------
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/docs/topics/impala_processes.xml b/docs/topics/impala_processes.xml
index 71986d3..70366dd 100644
--- a/docs/topics/impala_processes.xml
+++ b/docs/topics/impala_processes.xml
@@ -55,10 +55,7 @@ under the License.
         Start one instance of the Impala catalog service.
       </li>
 
-      <li>
-        Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
-        processing and avoid network traffic due to remote reads.
-      </li>
+      <li> Start the main Impala daemon services. </li>
     </ol>
 
     <p>
@@ -101,9 +98,8 @@ under the License.
 
 <codeblock rev="1.2">$ sudo service impala-catalog start</codeblock>
 
-      <p>
-        Start the Impala service on each DataNode using a command similar to the following:
-      </p>
+      <p> Start the Impala daemon services using a command similar to the
+        following: </p>
 
       <p>
 <codeblock>$ sudo service impala-server start</codeblock>
diff --git a/docs/topics/impala_troubleshooting.xml b/docs/topics/impala_troubleshooting.xml
index 250c899..80b7363 100644
--- a/docs/topics/impala_troubleshooting.xml
+++ b/docs/topics/impala_troubleshooting.xml
@@ -123,17 +123,17 @@ terminate called after throwing an instance of 'boost::exception_detail::clone_i
   <concept id="trouble_io" rev="">
     <title>Troubleshooting I/O Capacity Problems</title>
     <conbody>
-      <p>
-        Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
-        or with HDFS itself, Impala queries could show slow response times with no obvious cause
-        on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because
-        queries involving clauses such as <codeph>ORDER BY</codeph>, <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph>
-        do not start returning results until all DataNodes have finished their work.
-      </p>
-      <p>
-        To test whether the Linux I/O system itself is performing as expected, run Linux commands like
-        the following on each DataNode:
-      </p>
+      <p> Impala queries are typically I/O-intensive. If there is an I/O problem
+        with storage devices, or with HDFS itself, Impala queries could show
+        slow response times with no obvious cause on the Impala side. Slow I/O
+        on even a single Impala daemon could result in an overall slowdown,
+        because queries involving clauses such as <codeph>ORDER BY</codeph>,
+          <codeph>GROUP BY</codeph>, or <codeph>JOIN</codeph> do not start
+        returning results until all executor Impala daemons have finished their
+        work. </p>
+      <p> To test whether the Linux I/O system itself is performing as expected,
+        run Linux commands like the following on each host where an Impala
+        daemon is running: </p>
 <codeblock>
 $ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
 vm.drop_caches = 3
@@ -265,14 +265,15 @@ $ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
                 </p>
 
                 <p>
-                  <note>
-                    Replace <varname>hostname</varname> and <varname>port</varname> with the hostname and port of
-                    your Impala state store host machine and web server port. The default port is 25010.
-                  </note>
-                  The number of <codeph>impalad</codeph> instances listed should match the expected number of
-                  <codeph>impalad</codeph> instances installed in the cluster. There should also be one
-                  <codeph>impalad</codeph> instance installed on each DataNode
-                </p>
+                  <note> Replace <varname>hostname</varname> and
+                      <varname>port</varname> with the hostname and port of your
+                    Impala state store host machine and web server port. The
+                    default port is 25010. </note> The number of
+                    <codeph>impalad</codeph> instances listed should match the
+                  expected number of <codeph>impalad</codeph> instances
+                  installed in the cluster. There should also be one
+                    <codeph>impalad</codeph> instance installed on each
+                  DataNode.</p>
               </entry>
               <entry>
                 <p>