Posted to commits@impala.apache.org by ar...@apache.org on 2019/07/22 20:12:17 UTC

[impala] branch master updated: IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration

This is an automated email from the ASF dual-hosted git repository.

arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 5466cf6  IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration
5466cf6 is described below

commit 5466cf6344af893521b093915aba25fde6b13d4e
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Mon Jul 1 17:05:16 2019 -0700

    IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration
    
    Change-Id: Ieec79ac0bbb860c6394a3bf0617b285a7d23ca9e
    Reviewed-on: http://gerrit.cloudera.org:8080/13776
    Reviewed-by: Alexey Serbin <as...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Andrew Wong <aw...@cloudera.com>
    Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
---
 docs/shared/impala_common.xml | 54 ++++++++++++++++++++-----------------
 docs/topics/impala_kudu.xml   | 63 +++++++++++++++++++++++++++----------------
 docs/topics/impala_tables.xml | 32 +++++++++++-----------
 3 files changed, 85 insertions(+), 64 deletions(-)
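
A minimal sketch of the statements these doc changes describe, for readers
skimming the diff below. All table and database names are hypothetical
placeholders, and the sketch assumes a table was created directly through the
Kudu APIs rather than through Impala.

```sql
-- Hypothetical names throughout: some_database, kudu_created_table,
-- impala_name, and different_kudu_table_name are placeholders.

-- Kudu integrated with the HMS: a table created directly in Kudu gets an
-- internal-table entry in the HMS. Reload metadata so Impala picks it up.
INVALIDATE METADATA;

-- Inspect the table properties as Impala now sees them.
DESCRIBE FORMATTED some_database.kudu_created_table;

-- Kudu not integrated with the HMS: repoint an external Impala table
-- at a different Kudu table, which must already exist.
ALTER TABLE impala_name
  SET TBLPROPERTIES('kudu.table_name' = 'different_kudu_table_name');
```

The first two statements apply to the HMS-integrated case. The last applies
only when Kudu is not integrated with the HMS.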

diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 15972b6..92419ee 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -2102,17 +2102,18 @@ SELECT udf_created_by_hive(c1) FROM ...
         <codeph>Location</codeph> field displays the path of the table directory as an HDFS URI.
       </p>
 
-      <p id="switch_internal_external_table">
-        You can switch a table from internal to external, or from external to internal, by using
-        the <codeph>ALTER TABLE</codeph> statement:
-<codeblock xml:space="preserve">
+      <p id="switch_internal_external_table"> You can switch a table from
+        internal to external, or from external to internal, by using the
+          <codeph>ALTER TABLE</codeph> statement:
+        <codeblock xml:space="preserve">
 -- Switch a table from internal to external.
 ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='TRUE');
 
 -- Switch a table from external to internal.
 ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='FALSE');
-</codeblock>
-      </p>
+</codeblock>
+        If the Kudu service is integrated with the Hive Metastore, the
+        operations above are not supported.</p>
 
 <!-- The data to show sensible output from these queries is in the TPC-DS schema 'CUSTOMER' table.
      If you want to show real output, add a LIMIT 5 or similar clause to each query to avoid
@@ -4536,12 +4537,14 @@ sudo pip-python install ssl</codeblock>
         attribute to that column.
       </p>
 
-      <p id="kudu_metadata_intro" rev="kudu">
-        Much of the metadata for Kudu tables is handled by the underlying storage layer. Kudu
-        tables have less reliance on the metastore database, and require less metadata caching
-        on the Impala side. For example, information about partitions in Kudu tables is managed
-        by Kudu, and Impala does not cache any block locality metadata for Kudu tables.
-      </p>
+      <p id="kudu_metadata_intro" rev="kudu">By default, much of the metadata
+        for Kudu tables is handled by the underlying storage layer. Kudu tables
+        rely less on the Metastore database and require less metadata caching
+        on the Impala side. For example, information about partitions in Kudu
+        tables is managed by Kudu, and Impala does not cache any block
+        locality metadata for Kudu tables. If the Kudu service is not integrated
+        with the Hive Metastore, Impala manages Kudu table metadata in the
+        Hive Metastore.</p>
 
       <p id="kudu_metadata_details" rev="kudu">
         The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements are
@@ -4553,22 +4556,25 @@ sudo pip-python install ssl</codeblock>
         the Kudu table schema, such as adding or dropping a column.
       </p>
 
-      <p id="kudu_internal_external_tables">
-        The distinction between internal and external tables has some special details for Kudu
-        tables. Tables created entirely through Impala are internal tables. The table name as
-        represented within Kudu includes notation such as an <codeph>impala::</codeph> prefix
-        and the Impala database name. External Kudu tables are those created by a non-Impala
-        mechanism, such as a user application calling the Kudu APIs. For these tables, the
-        <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets you establish a mapping from Impala
-        to the existing Kudu table:
-<codeblock>
+      <p id="kudu_internal_external_tables"> If the Kudu service is not
+        integrated with the Hive Metastore, the distinction between internal and
+        external tables has some special details for Kudu tables. Tables created
+        entirely through Impala are internal tables. The table name as
+        represented within Kudu includes notation such as an
+          <codeph>impala::</codeph> prefix and the Impala database name.
+        External Kudu tables are those created by a non-Impala mechanism, such
+        as a user application calling the Kudu APIs. For these tables, the
+          <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets you establish a
+        mapping from Impala to the existing Kudu table:
+        <codeblock>
 CREATE EXTERNAL TABLE impala_name STORED AS KUDU
   TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
 </codeblock>
-        External Kudu tables differ in one important way from other external tables: adding or
-        dropping a column or range partition changes the data in the underlying Kudu table, in
-        contrast to an HDFS-backed external table where existing data files are left untouched.
       </p>
+      <p> External Kudu tables differ in one important way from other external
+        tables: adding or dropping a column or range partition changes the data
+        in the underlying Kudu table, in contrast to an HDFS-backed external
+        table where existing data files are left untouched.</p>
 
       <p id="kudu_sentry_limitations" rev="IMPALA-4000">
         Access to Kudu tables must be granted to and revoked from roles with the following
diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml
index 2e0f364..ec4be37 100644
--- a/docs/topics/impala_kudu.xml
+++ b/docs/topics/impala_kudu.xml
@@ -1109,24 +1109,19 @@ select microseconds,
       <title>How Impala Handles Kudu Metadata</title>
 
       <conbody>
-
+        <note>This section applies only to Kudu services that are not
+          integrated with the Hive Metastore (HMS).</note>
         <p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
         <p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
-
-        <p>
-          Because Kudu manages the metadata for its own tables separately from the metastore
-          database, there is a table name stored in the metastore database for Impala to use,
-          and a table name on the Kudu side, and these names can be modified independently
-          through <codeph>ALTER TABLE</codeph> statements.
-        </p>
-
-        <p>
-          To avoid potential name conflicts, the prefix <codeph>impala::</codeph>
-          and the Impala database name are encoded into the underlying Kudu
-          table name:
-        </p>
-
-<codeblock><![CDATA[
+        <p> Because Kudu manages the metadata for its own tables separately from
+          the metastore database, each table has two names: one stored in the
+          metastore database for Impala to use, and one on the Kudu side. These
+          names can be modified independently through
+          <codeph>ALTER TABLE</codeph> statements. </p>
+        <p> To avoid potential name conflicts, the prefix
+            <codeph>impala::</codeph> and the Impala database name are encoded
+          into the underlying Kudu table name: </p>
+        <codeblock><![CDATA[
 create database some_database;
 use some_database;
 
@@ -1138,18 +1133,40 @@ describe formatted table_name_demo;
 kudu.table_name  | impala::some_database.table_name_demo
 ]]>
 </codeblock>
-
-        <p>
-          See <xref keyref="kudu_tables"/> for examples of how to change the name of
-          the Impala table in the metastore database, the name of the underlying Kudu
-          table, or both.
-        </p>
-
+        <p> See <xref keyref="kudu_tables"/> for examples of how to change the
+          name of the Impala table in the metastore database, the name of the
+          underlying Kudu table, or both. </p>
       </conbody>
 
     </concept>
 
   </concept>
+  <concept id="concept_y4k_x1m_j3b">
+    <title>Working with Kudu Integrated with Hive Metastore</title>
+    <conbody>
+      <p>Starting with Kudu 1.10 and Impala 3.3, Impala supports Kudu services
+        integrated with the Hive Metastore (HMS). See <xref
+          href="https://kudu.apache.org/docs/hive_metastore.html#hive_metastore"
+          format="html" scope="external">the HMS integration
+          documentation</xref> for more details on Kudu’s Hive Metastore
+        integration.</p>
+      <p>The following are some of the changes you need to consider when working
+        with Kudu services integrated with the HMS.<ul>
+          <li> When Kudu is integrated with the Hive Metastore, Impala must be
+            configured to use the same HMS as Kudu.</li>
+          <li> Since there may be no one-to-one mapping between Kudu tables and
+            external tables, only internal tables are automatically
+            synchronized. </li>
+          <li>When you create a table in Kudu, Kudu will create an HMS entry for
+            that table with the internal table type.</li>
+          <li>When the Kudu service is integrated with the HMS, internal table
+            entries are created automatically in the HMS when tables are
+            created in Kudu outside of Impala. To access these tables through
+            Impala, run the <codeph>INVALIDATE METADATA</codeph> statement so
+            that Impala picks up the latest metadata.</li>
+        </ul></p>
+    </conbody>
+  </concept>
 
   <concept id="kudu_etl">
 
diff --git a/docs/topics/impala_tables.xml b/docs/topics/impala_tables.xml
index b2b6a71..978dd3c 100644
--- a/docs/topics/impala_tables.xml
+++ b/docs/topics/impala_tables.xml
@@ -284,28 +284,28 @@ under the License.
       </metadata>
     </prolog>
     <conbody>
-      <p>
-        Tables stored in Apache Kudu are treated specially, because Kudu
-        manages its data independently of HDFS files. Some information about the
-        table is stored in the metastore database for use by Impala. Other table
-        metadata is managed internally by Kudu.
-      </p>
-      <p>
-        When you create a Kudu table through Impala, the table is assigned an
-        internal Kudu table name of the form
+      <p> By default, tables stored in Apache Kudu are treated specially,
+        because Kudu manages its data independently of HDFS files. </p>
+      <p>All metadata that Impala needs is stored in the HMS.</p>
+      <p> When Kudu is not integrated with the HMS and you create a Kudu table
+        through Impala, the table is assigned an internal Kudu table name of the
+        form
             <codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>.
         You can see the Kudu-assigned name in the output of <codeph>DESCRIBE
           FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of
-        the table properties.
+        the table properties. </p>
+      <p>
+        For Impala-Kudu managed tables, <codeph>ALTER TABLE ...
+          RENAME</codeph> renames both the Impala and the Kudu table.
       </p>
-      <p> For Impala-Kudu managed tables, <codeph>ALTER TABLE ...
-          RENAME</codeph> renames both the Impala and the Kudu table. </p>
-      <p> For Impala-Kudu external tables, <codeph>ALTER TABLE ...
+      <p>
+        For Impala-Kudu external tables, <codeph>ALTER TABLE ...
           RENAME</codeph> renames just the Impala table. To change the Kudu
         table that an Impala external table points to, use <codeph>ALTER TABLE
             <varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' =
             '<varname>different_kudu_table_name</varname>')</codeph>. The
-        underlying Kudu table must already exist. </p>
+        underlying Kudu table must already exist.
+      </p>
       <p>
         In practice, external tables are typically used to access underlying
         Kudu tables that were created outside of Impala, that is, through the
@@ -323,9 +323,7 @@ under the License.
         distribution of values within each column, is the same for Kudu tables
         as for HDFS-backed tables.
       </p>
-      <p
-        conref="../shared/impala_common.xml#common/kudu_internal_external_tables"
-      />
+      <p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
     </conbody>
   </concept>