Posted to commits@impala.apache.org by ar...@apache.org on 2019/05/29 23:28:34 UTC
[impala] 01/02: IMPALA-8447: [DOCS] INSERT event is supported in
automatic invalidation
This is an automated email from the ASF dual-hosted git repository.
arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 879357c8a2924a22ca695c7ad609c11d48bc717e
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Thu May 9 17:52:51 2019 -0700
IMPALA-8447: [DOCS] INSERT event is supported in automatic invalidation
- Added the INSERT events to the supported events.
- Noted the limitation with inserts from SparkSQL.
Change-Id: I68133b0beeb15cacc73829b8a8b0838fc7f4b7d8
Reviewed-on: http://gerrit.cloudera.org:8080/13300
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
---
docs/topics/impala_metadata.xml | 67 +++++++++++++++++++++++++++++++----------
1 file changed, 51 insertions(+), 16 deletions(-)
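For reference when reading the codeblock hunk in the diff below: Hadoop-style configuration files such as hive-site.xml conventionally keep each name/value pair in its own property element under a root configuration element. The following is a minimal sketch of the two entries in that layout. The layout is an assumption based on that convention and is not taken from the commit; only the property names and values come from the patch.

```xml
<!-- Sketch only: the enclosing <configuration> element and the
     one-pair-per-<property> layout are assumed from standard Hadoop
     configuration files; the names and values come from this patch. -->
<configuration>
  <property>
    <name>hive.metastore.transactional.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
  <property>
    <name>hive.metastore.dml.events</name>
    <value>true</value>
  </property>
</configuration>
```

Per the steps in the patch, hive.metastore.dml.events would be set to true in the hive-site.xml of both the Hive Metastore and HiveServer2 services.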
diff --git a/docs/topics/impala_metadata.xml b/docs/topics/impala_metadata.xml
index 061a83c..43ec69b 100644
--- a/docs/topics/impala_metadata.xml
+++ b/docs/topics/impala_metadata.xml
@@ -131,7 +131,7 @@ under the License.
<p>
When this feature is enabled, <codeph>catalogd</codeph> polls Hive Metastore (HMS)
- notifications events at a configurable interval and processes the following changes:
+ notification events at a configurable interval and processes the following changes:
</p>
<note>
@@ -141,9 +141,12 @@ under the License.
<ul>
<li>
- Invalidates the tables when it receives the <codeph>ALTER TABLE</codeph> events or the
- <codeph>ALTER</codeph>, <codeph>ADD</codeph>, or <codeph>DROP</codeph> their
- partitions.
+ Invalidates the tables when it receives the <codeph>ALTER TABLE</codeph> event.
+ </li>
+
+ <li>
+ Refreshes the table when it receives <codeph>ALTER</codeph>, <codeph>ADD</codeph>,
+ or <codeph>DROP</codeph> events for its partitions.
</li>
<li>
@@ -157,6 +160,15 @@ under the License.
</li>
<li>
+ Refreshes the table and partitions when it receives the <codeph>INSERT</codeph>
+ events.
+ <p>
+ If the table is not loaded at the time of processing the <codeph>INSERT</codeph>
+ event, the event processor does not need to refresh the table and skips it.
+ </p>
+ </li>
+
+ <li>
Changes the database and updates <codeph>catalogd</codeph> when it receives the
<codeph>ALTER DATABASE</codeph> events. The following changes are supported. This
event does not invalidate the tables in the database.
@@ -190,7 +202,7 @@ under the License.
This feature is controlled by the
<codeph>‑‑hms_event_polling_interval_s</codeph> flag. Start the
<codeph>catalogd</codeph> with the <codeph>‑‑hms_event_polling_interval_s</codeph>
- flag set to a non-zero value to enable the feature and set the polling frequency in
+ flag set to a positive integer to enable the feature and set the polling interval in
seconds. We recommend a value of less than 5 seconds.
</p>
@@ -205,10 +217,14 @@ under the License.
</li>
<li>
- Adding data from one Impala cluster to existing tables/partitions will not synced to
- another Impala cluster.
+ When you bypass HMS and add or remove a table's data by changing files directly on the
+ filesystem, HMS does not generate the <codeph>INSERT</codeph> event, and the event
+ processor does not invalidate the corresponding table or refresh the corresponding
+ partition.
<p>
- Only new tables and partitions are synced.
+ In such cases, use the <codeph>LOAD DATA</codeph> command to load the data
+ so that the event processor can act on the events generated by the
+ <codeph>LOAD</codeph> command.
</p>
</li>
</ul>
@@ -228,18 +244,37 @@ under the License.
<conbody>
<p>
- As the first step to use the HMS event based metadata sync, add the following entry to
- the <codeph>hive-site.xml</codeph> of Hive metastore service.
+ To use the HMS event-based metadata sync:
</p>
+ <ol>
+ <li>
+ Add the following entries to the <codeph>hive-site.xml</codeph> of the Hive
+ Metastore service.
<codeblock> <property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
+
+ <name>hive.metastore.dml.events</name>
+ <value>true</value>
</property></codeblock>
+ </li>
- <p>
- Save <codeph>hive-site.xml</codeph> and restart Hive.
- </p>
+ <li>
+ Save <codeph>hive-site.xml</codeph>.
+ </li>
+
+ <li>
+ Set the <codeph>hive.metastore.dml.events</codeph> configuration key to
+ <codeph>true</codeph> in the HiveServer2 service's <codeph>hive-site.xml</codeph>. This
+ configuration key needs to be set to <codeph>true</codeph> in both Hive services,
+ HiveServer2 and Hive Metastore.
+ </li>
+
+ <li>
+ Restart the HiveServer2 and Hive Metastore services.
+ </li>
+ </ol>
</conbody>
@@ -392,7 +427,7 @@ under the License.
events-processor.avg-events-process-duration
</entry>
<entry>
- Average time taken to process a batch of events received from metastore.
+ Average time taken to process a batch of events received from the Metastore.
</entry>
</row>
<row>
@@ -400,7 +435,7 @@ under the License.
events-processor.events-received
</entry>
<entry>
- Total number of metastore events received.
+ Total number of Metastore events received.
</entry>
</row>
<row>
@@ -450,7 +485,7 @@ under the License.
events-processor.events-skipped
</entry>
<entry>
- Total number of metastore events skipped.
+ Total number of Metastore events skipped.
<p>
Events can be skipped based on certain flags at the table and database level.