Posted to commits@impala.apache.org by ar...@apache.org on 2019/05/29 23:28:34 UTC

[impala] 01/02: IMPALA-8447: [DOCS] INSERT event is supported in automatic invalidation

This is an automated email from the ASF dual-hosted git repository.

arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 879357c8a2924a22ca695c7ad609c11d48bc717e
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Thu May 9 17:52:51 2019 -0700

    IMPALA-8447: [DOCS] INSERT event is supported in automatic invalidation
    
    - Added the INSERT events to the supported events.
    - Noted the limitation with inserts from SparkSQL.
    
    Change-Id: I68133b0beeb15cacc73829b8a8b0838fc7f4b7d8
    Reviewed-on: http://gerrit.cloudera.org:8080/13300
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
---
 docs/topics/impala_metadata.xml | 67 +++++++++++++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 16 deletions(-)
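
For reference, a minimal sketch of how the two settings named in this patch
might look in hive-site.xml, assuming each setting gets its own <property>
element, as Hadoop-style configuration files normally expect. The property
names and values are taken from the patch; the enclosing <configuration>
wrapper and the comments are assumptions for illustration only.

```xml
<configuration>
  <!-- Listener that records HMS notification events for catalogd to poll. -->
  <property>
    <name>hive.metastore.transactional.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
  <!-- Generates events for DML such as INSERT. Per the patch, set this to
       true for both HiveServer2 and Hive Metastore. -->
  <property>
    <name>hive.metastore.dml.events</name>
    <value>true</value>
  </property>
</configuration>
```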

diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 061a83c..43ec69b 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -131,7 +131,7 @@ under the License.
 
       <p>
         When this feature is enabled, <codeph>catalogd</codeph> polls Hive Metastore (HMS)
-        notifications events at a configurable interval and processes the following changes:
+        notification events at a configurable interval and processes the following changes:
       </p>
 
       <note>
@@ -141,9 +141,12 @@ under the License.
 
       <ul>
         <li>
-          Invalidates the tables when it receives the <codeph>ALTER TABLE</codeph> events or the
-          <codeph>ALTER</codeph>, <codeph>ADD</codeph>, or <codeph>DROP</codeph> their
-          partitions.
+          Invalidates the tables when it receives the <codeph>ALTER TABLE</codeph> event.
+        </li>
+
+        <li>
+          Refreshes the table when it receives the <codeph>ALTER</codeph>, <codeph>ADD</codeph>,
+          or <codeph>DROP</codeph> events for its partitions.
         </li>
 
         <li>
@@ -157,6 +160,15 @@ under the License.
         </li>
 
         <li>
+          Refreshes the table and partitions when it receives the <codeph>INSERT</codeph>
+          events.
+          <p>
+            If the table is not loaded at the time of processing the <codeph>INSERT</codeph>
+            event, the event processor does not need to refresh the table and skips it.
+          </p>
+        </li>
+
+        <li>
           Changes the database and updates <codeph>catalogd</codeph> when it receives the
           <codeph>ALTER DATABASE</codeph> events. The following changes are supported. This
           event does not invalidate the tables in the database.
@@ -190,7 +202,7 @@ under the License.
         This feature is controlled by the
         <codeph>&#8209;&#8209;hms_event_polling_interval_s</codeph> flag. Start the
         <codeph>catalogd</codeph> with the <codeph>‑‑hms_event_polling_interval_s</codeph>
-        flag set to a non-zero value to enable the feature and set the polling frequency in
+        flag set to a positive integer to enable the feature and set the polling frequency in
         seconds. We recommend the value to be less than 5 seconds.
       </p>
 
@@ -205,10 +217,14 @@ under the License.
         </li>
 
         <li>
-          Adding data from one Impala cluster to existing tables/partitions will not synced to
-          another Impala cluster.
+          When you bypass HMS and add or remove table data by adding or removing files directly on the
+          filesystem, HMS does not generate the <codeph>INSERT</codeph> event, and the event
+          processor will not invalidate the corresponding table or refresh the corresponding
+          partition.
           <p>
-            Only new tables and partitions are synced.
+            It is recommended that you use the <codeph>LOAD DATA</codeph> command to do the data
+            load in such cases, so that the event processor can act on the events generated by the
+            <codeph>LOAD</codeph> command.
           </p>
         </li>
       </ul>
@@ -228,18 +244,37 @@ under the License.
       <conbody>
 
         <p>
-          As the first step to use the HMS event based metadata sync, add the following entry to
-          the <codeph>hive-site.xml</codeph> of Hive metastore service.
+          To use the HMS event based metadata sync:
         </p>
 
+        <ol>
+          <li>
+            Add the following entries to the <codeph>hive-site.xml</codeph> of the Hive
+            Metastore service.
 <codeblock> &lt;property>
     &lt;name>hive.metastore.transactional.event.listeners&lt;/name>
     &lt;value>org.apache.hive.hcatalog.listener.DbNotificationListener&lt;/value>
+
+    &lt;name>hive.metastore.dml.events&lt;/name>
+    &lt;value>true&lt;/value>
   &lt;/property></codeblock>
+          </li>
 
-        <p>
-          Save <codeph>hive-site.xml</codeph> and restart Hive.
-        </p>
+          <li>
+            Save <codeph>hive-site.xml</codeph>.
+          </li>
+
+          <li>
+            Set the <codeph>hive.metastore.dml.events</codeph> configuration key to
+            <codeph>true</codeph> in the HiveServer2 service's <codeph>hive-site.xml</codeph>. This
+            configuration key needs to be set to <codeph>true</codeph> in both Hive services,
+            HiveServer2 and Hive Metastore.
+          </li>
+
+          <li>
+            Restart the HiveServer2 and Hive Metastore services.
+          </li>
+        </ol>
 
       </conbody>
 
@@ -392,7 +427,7 @@ under the License.
                     events-processor.avg-events-process-duration
                   </entry>
                   <entry>
-                    Average time taken to process a batch of events received from metastore.
+                    Average time taken to process a batch of events received from the Metastore.
                   </entry>
                 </row>
                 <row>
@@ -400,7 +435,7 @@ under the License.
                     events-processor.events-received
                   </entry>
                   <entry>
-                    Total number of metastore events received.
+                    Total number of Metastore events received.
                   </entry>
                 </row>
                 <row>
@@ -450,7 +485,7 @@ under the License.
                     events-processor.events-skipped
                   </entry>
                   <entry>
-                    Total number of metastore events skipped.
+                    Total number of Metastore events skipped.
 
                     <p>
                       Events can be skipped based on certain flags at table and database level.