Posted to commits@impala.apache.org by ar...@apache.org on 2019/07/03 20:24:17 UTC

[impala] branch master updated (e7dde15 -> 1943008)

This is an automated email from the ASF dual-hosted git repository.

arodoni pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from e7dde15  Revert "build: use thin static archives"
     new 7bbf834  IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL
     new 1943008  IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/impala.ditamap                   |  19 +++--
 docs/topics/impala_client.xml         | 142 ++++++++++++++++++++++++++++++++++
 docs/topics/impala_config_options.xml |  47 -----------
 docs/topics/impala_metadata.xml       | 102 ++++++++++++++++--------
 4 files changed, 221 insertions(+), 89 deletions(-)
 create mode 100644 docs/topics/impala_client.xml


[impala] 01/02: IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL

Posted by ar...@apache.org.

arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7bbf8344e174ef0ff948c6aa9e55e1bd91348f79
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Mon Jul 1 18:06:02 2019 -0700

    IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL
    
    - Also made a few formatting changes.
    - Removed the Preview Release note for Invalidation of Metadata cache.
    
    Change-Id: I36cfc7e592ed2588a8c1f8375033d60492b27a4f
    Reviewed-on: http://gerrit.cloudera.org:8080/13777
    Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/topics/impala_metadata.xml | 102 ++++++++++++++++++++++++++--------------
 1 file changed, 68 insertions(+), 34 deletions(-)
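
The two cache invalidation policies documented in the diff below can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the catalogd implementation: the `TableCache` class, its method names, and the round-up rule for the eviction count are hypothetical. Only the timeout in seconds, the 60% heap threshold, and the 10% least-recently-used fraction come from the documentation text.

```python
from collections import OrderedDict


class TableCache:
    """Toy model of the two invalidation policies (hypothetical, not Impala code)."""

    def __init__(self, timeout_s, heap_threshold_pct=60, evict_pct=10):
        self.timeout_s = timeout_s                    # models --invalidate_tables_timeout_s
        self.heap_threshold_pct = heap_threshold_pct  # 60% of JVM heap, per the doc text
        self.evict_pct = evict_pct                    # 10% of tables, per the doc text
        self.last_used = OrderedDict()                # table -> last access time, LRU first

    def touch(self, table, now):
        """Record an access and move the table to the most recently used end."""
        self.last_used[table] = now
        self.last_used.move_to_end(table)

    def invalidate_by_time(self, now):
        """Time-based policy: drop tables not used within the last timeout_s seconds."""
        stale = [t for t, ts in self.last_used.items() if now - ts > self.timeout_s]
        for t in stale:
            del self.last_used[t]
        return stale

    def invalidate_by_memory(self, heap_used, heap_max):
        """Memory-based policy: if heap usage after a GC has reached the threshold,
        drop the least recently used share of tables."""
        if heap_used * 100 < heap_max * self.heap_threshold_pct:
            return []
        # Assumption: round the 10% share up so at least one table is dropped.
        count = (len(self.last_used) * self.evict_pct + 99) // 100
        victims = list(self.last_used)[:count]
        for t in victims:
            del self.last_used[t]
        return victims
```

In this model, a caller would invoke `invalidate_by_time` on a periodic scan and `invalidate_by_memory` after each garbage collection; both return the names of the tables that were dropped.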

diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 4139e80..98cb6fd 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -44,34 +44,54 @@ under the License.
 
   <concept id="auto_invalidate_metadata">
 
-    <title>Startup Options for Automatic Invalidation of Metadata</title>
+    <title>Automatic Invalidation of Metadata Cache</title>
 
     <conbody>
 
       <p>
         To keep the size of metadata bounded, <codeph>catalogd</codeph> periodically scans all
         the tables and invalidates those not recently used. There are two types of
-        configurations in <codeph>catalogd</codeph>.
+        configurations for <codeph>catalogd</codeph> and <codeph>impalad</codeph>.
       </p>
 
-      <ul>
-        <li>
-          Time-based invalidation with the
-          <codeph>&#8209;&#8209;invalidate_tables_timeout_s</codeph> flag:
-          <codeph>Catalogd</codeph> invalidates tables that are not recently used in the
-          specified time period (in seconds). This flag needs to be applied to both
-          <codeph>impalad</codeph> and <codeph>catalogd</codeph>.
-        </li>
+      <dl>
+        <dlentry>
 
-        <li>
-          Memory-based invalidation with the
-          <codeph>&#8209;&#8209;invalidate_tables_on_memory_pressure</codeph> flag: When the
-          memory pressure reaches 60% of JVM heap size after a Java garbage collection in
-          <codeph>catalogd</codeph>, Impala invalidates 10% of the least recently used tables.
-          This flag needs to be applied to both <codeph>impalad</codeph> and
-          <codeph>catalogd</codeph>.
-        </li>
-      </ul>
+          <dt>
+            Time-based cache invalidation
+          </dt>
+
+          <dd>
+            <codeph>Catalogd</codeph> invalidates tables that are not recently used in the
+            specified time period (in seconds).
+          </dd>
+
+          <dd>
+            The <codeph>&#8209;&#8209;invalidate_tables_timeout_s</codeph> flag needs to be
+            applied to both <codeph>impalad</codeph> and <codeph>catalogd</codeph>.
+          </dd>
+
+        </dlentry>
+
+        <dlentry>
+
+          <dt>
+            Memory-based cache invalidation
+          </dt>
+
+          <dd>
+            When the memory pressure reaches 60% of JVM heap size after a Java garbage
+            collection in <codeph>catalogd</codeph>, Impala invalidates 10% of the least
+            recently used tables.
+          </dd>
+
+          <dd>
+            The <codeph>&#8209;&#8209;invalidate_tables_on_memory_pressure</codeph> flag needs
+            to be applied to both <codeph>impalad</codeph> and <codeph>catalogd</codeph>.
+          </dd>
+
+        </dlentry>
+      </dl>
 
       <p>
         Automatic invalidation of metadata provides more stability with lower chances of running
@@ -79,23 +99,28 @@ under the License.
         require tuning.
       </p>
 
-      <note>
-        This is a preview feature in Impala 3.1 and not generally available.
-      </note>
-
     </conbody>
 
   </concept>
 
   <concept id="auto_poll_hms_notification">
 
-    <title>Automatic Metadata Sync using Hive Metastore Notification Events</title>
+    <title>Automatic Invalidation/Refresh of Metadata</title>
 
     <conbody>
 
       <p>
-        When this feature is enabled, <codeph>catalogd</codeph> polls Hive Metastore (HMS)
-        notification events at a configurable interval and processes the following changes:
+        When tools such as Hive and Spark are used to process the raw data ingested into Hive
+        tables, new HMS metadata (databases, tables, partitions) and filesystem metadata (new
+        files in existing partitions/tables) are generated. In previous versions of Impala, in
+        order to pick up this new information, Impala users needed to manually issue an
+        <codeph>INVALIDATE</codeph> or <codeph>REFRESH</codeph> command.
+      </p>
+
+      <p>
+        When automatic invalidate/refresh of metadata is enabled, <codeph>catalogd</codeph>
+        polls Hive Metastore (HMS) notification events at a configurable interval and processes
+        the following changes:
       </p>
 
       <note>
@@ -109,8 +134,8 @@ under the License.
         </li>
 
         <li>
-          Refreshes the table when it receives the <codeph>ALTER</codeph>, <codeph>ADD</codeph>,
-          or <codeph>DROP</codeph> its partitions.
+          Refreshes the partition when it receives <codeph>ALTER</codeph>,
+          <codeph>ADD</codeph>, or <codeph>DROP</codeph> partition events.
         </li>
 
         <li>
@@ -176,11 +201,6 @@ under the License.
 
       <ul>
         <li>
-          The operations that do not generate events in HMS, such as adding new data to existing
-          tables/partitions from Spark, are not supported.
-        </li>
-
-        <li>
           When you bypass HMS and add or remove data into table by adding files directly on the
           filesystem, HMS does not generate the <codeph>INSERT</codeph> event, and the event
           processor will not invalidate the corresponding table or refresh the corresponding
@@ -191,6 +211,12 @@ under the License.
             <codeph>LOAD</codeph> command.
           </p>
         </li>
+
+        <li>
+          The Spark APIs that save data to a specified location do not generate events in
+          HMS and thus are not supported. For example:
+<codeblock>Seq((1, 2)).toDF("i", "j").write.save("/user/hive/warehouse/spark_etl.db/customers/date=01012019")</codeblock>
+        </li>
       </ul>
 
       <p>
@@ -236,7 +262,15 @@ under the License.
           </li>
 
           <li>
-            Restart the HiveServer2 and Hive Metastore services.
+            If applicable, set the <codeph>hive.metastore.dml.events</codeph> configuration key
+            to <codeph>true</codeph> in <codeph>hive-site.xml</codeph> used by the Spark
+            applications (typically, <codeph>/etc/hive/conf/hive-site.xml</codeph>) so that the
+            <codeph>INSERT</codeph> events are generated when the Spark application inserts data
+            into existing tables and partitions.
+          </li>
+
+          <li>
+            Restart the HiveServer2, Hive Metastore, and Spark (if applicable) services.
           </li>
         </ol>
 


[impala] 02/02: IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced

Posted by ar...@apache.org.

arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 19430082f34ae69c86dd3cd6078f8855e9e8646d
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Fri Jun 28 13:24:34 2019 -0700

    IMPALA-8427: [DOCS] Document the new startup flag IMPALA-7800 introduced
    
    - Added a new doc impala_client.xml as the overview of Impala client
      access.
    
    Change-Id: I1a4c1975721c32a78a003d91babc5d2bb90f3949
    Reviewed-on: http://gerrit.cloudera.org:8080/13762
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Michael Ho <kw...@cloudera.com>
---
 docs/impala.ditamap                   |  19 +++--
 docs/topics/impala_client.xml         | 142 ++++++++++++++++++++++++++++++++++
 docs/topics/impala_config_options.xml |  47 -----------
 3 files changed, 153 insertions(+), 55 deletions(-)
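
The interaction between the two startup options documented in this patch can be modeled with a counting semaphore. This is a minimal sketch, not Impala's Thrift server code: the `ClientAdmission` class and its method names are hypothetical. The default of 64 threads, the 5 minute (300 second) default timeout, and the meaning of a zero timeout are taken from the text added in impala_client.xml.

```python
import threading


class ClientAdmission:
    """Toy model of --fe_service_threads and --accepted_client_cnxn_timeout."""

    def __init__(self, fe_service_threads=64, accepted_client_cnxn_timeout=300):
        # One semaphore slot per service thread.
        self._slots = threading.Semaphore(fe_service_threads)
        self._timeout_s = accepted_client_cnxn_timeout

    def try_connect(self):
        """Return True if a service thread was obtained, False if rejected."""
        if self._timeout_s == 0:
            # 0 means no timeout: wait until a thread becomes available.
            return self._slots.acquire()
        # Positive value: reject the request after waiting that many seconds.
        return self._slots.acquire(timeout=self._timeout_s)

    def disconnect(self):
        """Release the service thread held by a closed connection."""
        self._slots.release()
```

With `fe_service_threads=2` and a short timeout, the third concurrent `try_connect()` call is rejected once the timeout elapses, and a later call succeeds after one client disconnects.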

diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 554430f..306263f 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -56,8 +56,6 @@ under the License.
   </topicref>
   <topicref audience="standalone" href="topics/impala_config.xml">
     <topicref href="topics/impala_config_performance.xml"/>
-    <topicref href="topics/impala_odbc.xml"/>
-    <topicref href="topics/impala_jdbc.xml"/>
   </topicref>
   <topicref audience="standalone" href="topics/impala_upgrading.xml"/>
   <topicref audience="standalone" href="topics/impala_processes.xml">
@@ -277,12 +275,7 @@ under the License.
     <topicref href="topics/impala_langref_unsupported.xml"/>
     <topicref href="topics/impala_porting.xml"/>
   </topicref>
-  <topicref href="topics/impala_impala_shell.xml">
-    <topicref href="topics/impala_shell_options.xml"/>
-    <topicref href="topics/impala_connecting.xml"/>
-    <topicref href="topics/impala_shell_running_commands.xml"/>
-    <topicref href="topics/impala_shell_commands.xml"/>
-  </topicref>
+  
   <topicref href="topics/impala_performance.xml">
     <topicref href="topics/impala_perf_cookbook.xml"/>
     <topicref href="topics/impala_perf_joins.xml"/>
@@ -320,6 +313,16 @@ under the License.
   <topicref rev="2.9.0" href="topics/impala_adls.xml"/>
   <topicref href="topics/impala_isilon.xml"/>
   <topicref href="topics/impala_logging.xml"/>
+  <topicref href="topics/impala_client.xml">
+    <topicref href="topics/impala_impala_shell.xml">
+      <topicref href="topics/impala_shell_options.xml"/>
+      <topicref href="topics/impala_connecting.xml"/>
+      <topicref href="topics/impala_shell_running_commands.xml"/>
+      <topicref href="topics/impala_shell_commands.xml"/>
+    </topicref>
+    <topicref href="topics/impala_odbc.xml"/>
+    <topicref href="topics/impala_jdbc.xml"/>
+  </topicref>
   <topicref href="topics/impala_troubleshooting.xml">
     <topicref href="topics/impala_webui.xml"/>
     <topicref href="topics/impala_breakpad.xml"/>
diff --git a/docs/topics/impala_client.xml b/docs/topics/impala_client.xml
new file mode 100644
index 0000000..93987e1
--- /dev/null
+++ b/docs/topics/impala_client.xml
@@ -0,0 +1,142 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="intro_client">
+
+  <title>Impala Client Access</title>
+
+  <conbody>
+
+    <p>
+      Application developers have a number of options to interface with Impala. The core
+      development language with Impala is SQL. You can also use Java or other languages to
+      interact with Impala through the standard JDBC and ODBC interfaces used by many business
+      intelligence tools. For specialized kinds of analysis, you can supplement the Impala
+      built-in functions by writing user-defined functions in C++ or Java.
+    </p>
+
+    <p>
+      You can connect and submit requests to Impala through:
+    </p>
+
+    <ul>
+      <li>
+        The impala-shell interactive command interpreter
+      </li>
+
+      <li>
+        The Hue web-based user interface
+      </li>
+
+      <li>
+        JDBC
+      </li>
+
+      <li>
+        ODBC
+      </li>
+    </ul>
+
+    <p>
+      Each <codeph>impalad</codeph> daemon process, running on separate nodes in a cluster,
+      listens on <xref href="impala_ports.xml#ports">several ports</xref> for incoming requests:
+    </p>
+
+    <ul>
+      <li>
+        Requests from <codeph>impala-shell</codeph> and Hue are routed to the
+        <codeph>impalad</codeph> daemons through the same port.
+      </li>
+
+      <li>
+        The <codeph>impalad</codeph> daemons listen on separate ports for JDBC and ODBC
+        requests.
+      </li>
+    </ul>
+
+    <section id="section_egg_wjt_f3b">
+
+      <title>Impala Startup Options for Client Connections</title>
+
+      <p>
+        The following options control client connections to Impala.
+      </p>
+
+      <dl>
+        <dlentry>
+
+          <dt>
+            --fe_service_threads
+          </dt>
+
+          <dd>
+            Specifies the maximum number of concurrent client connections allowed. The default
+            value is 64, which allows 64 queries to run simultaneously.
+            <p>
+              If you have more clients trying to connect to Impala than the value of this
+              setting, the later arriving clients have to wait for the duration specified by
+              <codeph>--accepted_client_cnxn_timeout</codeph>. You can increase this value to
+              allow more client connections. However, a large value means more threads to
+              maintain even if most of the connections are idle, which could negatively
+              impact query latency. Client applications should use a connection pool to
+              avoid the need for a large number of sessions.
+            </p>
+          </dd>
+
+        </dlentry>
+
+        <dlentry>
+
+          <dt>
+            --accepted_client_cnxn_timeout
+          </dt>
+
+          <dd>
+            Controls how Impala treats new connection requests if it has run out of the number
+            of threads configured by <codeph>--fe_service_threads</codeph>.
+            <p>
+              If <codeph>--accepted_client_cnxn_timeout</codeph> is greater than 0, new
+              connection requests are rejected if Impala cannot get a server thread within
+              the specified timeout (in seconds).
+            </p>
+
+            <p>
+              If <codeph>--accepted_client_cnxn_timeout=0</codeph>, i.e. no timeout, clients
+              wait indefinitely to open the new session until more threads are available.
+            </p>
+
+            <p>
+              The default timeout is 5 minutes (300 seconds).
+            </p>
+
+            <p>
+              The timeout applies only to client-facing Thrift servers, i.e., HS2 and Beeswax
+              servers.
+            </p>
+          </dd>
+
+        </dlentry>
+      </dl>
+
+    </section>
+
+  </conbody>
+
+</concept>
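
The new topic above advises client applications to use a connection pool rather than opening many sessions. A minimal sketch of that idea follows. It is an illustration only: `ConnectionPool` is a hypothetical class, and `connect` stands for whatever function your client library uses to open an Impala session, which is not specified here.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that reuses a small number of connections (hypothetical)."""

    def __init__(self, connect, size):
        # Open all connections up front; `connect` is supplied by the caller.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection and return it to the pool when the block exits."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

A pool sized well below `--fe_service_threads` keeps the number of sessions per application bounded no matter how many requests the application issues.
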
diff --git a/docs/topics/impala_config_options.xml b/docs/topics/impala_config_options.xml
index 469fa62..7ef6612 100644
--- a/docs/topics/impala_config_options.xml
+++ b/docs/topics/impala_config_options.xml
@@ -294,53 +294,6 @@ Starting Impala Catalog Server:                            [  OK  ]</codeblock>
 
   </concept>
 
-  <concept id="config_options_impalad">
-
-    <title>Startup Options for impalad Daemon</title>
-
-    <conbody>
-
-      <p>
-        The <codeph>impalad</codeph> daemon implements the main Impala service, which performs
-        query processing and reads from and writes to the data files. Some of the noteworthy
-        options are:
-        <ul>
-          <li>
-            The <codeph>&#8209;&#8209;fe_service_threads</codeph> option specifies the maximum
-            number of concurrent client connections allowed. The default value is 64 with which
-            64 queries can run simultaneously.
-            <p>
-              If you have more clients trying to connect to Impala than the value of this
-              setting, the later arriving clients have to wait until previous clients
-              disconnect. You can increase this value to allow more client connections. However,
-              a large value means more threads to be maintained even if most of the connections
-              are idle, and it could negatively impact query latency. Client applications should
-              use the connection pool to avoid need for large number of sessions.
-            </p>
-          </li>
-        </ul>
-      </p>
-
-    </conbody>
-
-  </concept>
-
-  <concept id="config_options_statestored">
-
-    <title>Startup Options for statestored Daemon</title>
-
-    <conbody>
-
-      <p>
-        The <cmdname>statestored</cmdname> daemon implements the Impala StateStore service,
-        which monitors the availability of Impala services across the cluster, and handles
-        situations such as nodes becoming unavailable or becoming available again.
-      </p>
-
-    </conbody>
-
-  </concept>
-
   <concept rev="1.2" id="config_options_catalogd">
 
     <title>Startup Options for catalogd Daemon</title>