Posted to commits@impala.apache.org by ar...@apache.org on 2019/07/22 20:12:17 UTC
[impala] branch master updated: IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration
This is an automated email from the ASF dual-hosted git repository.
arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 5466cf6 IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration
5466cf6 is described below
commit 5466cf6344af893521b093915aba25fde6b13d4e
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Mon Jul 1 17:05:16 2019 -0700
IMPALA-8727: [DOCS] Impala-side changes for Kudu HMS integration
Change-Id: Ieec79ac0bbb860c6394a3bf0617b285a7d23ca9e
Reviewed-on: http://gerrit.cloudera.org:8080/13776
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
---
docs/shared/impala_common.xml | 54 ++++++++++++++++++++-----------------
docs/topics/impala_kudu.xml | 63 +++++++++++++++++++++++++++----------------
docs/topics/impala_tables.xml | 32 +++++++++++-----------
3 files changed, 85 insertions(+), 64 deletions(-)
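A minimal Python sketch of the internal/external switching rule in the `impala_common.xml` change. It only builds the `ALTER TABLE ... SET TBLPROPERTIES('EXTERNAL'=...)` text shown in that codeblock and does not connect to Impala. The function name and the `kudu_hms_integrated` flag are hypothetical. The flag stands in for the new sentence saying these operations are not supported when the Kudu service is integrated with the Hive Metastore, and in that case the sketch refuses to build the statement.

```python
def switch_table_type_sql(table_name: str, to_external: bool,
                          kudu_hms_integrated: bool = False) -> str:
    """Build the statement that switches a table between internal and external.

    kudu_hms_integrated is a hypothetical flag: per the documentation, these
    operations are not supported when the Kudu service is integrated with the
    Hive Metastore, so this sketch raises instead of returning a statement.
    """
    if kudu_hms_integrated:
        raise ValueError(
            "switching internal/external is not supported when Kudu is "
            "integrated with the Hive Metastore")
    # 'EXTERNAL'='TRUE' makes the table external; 'FALSE' makes it internal.
    value = "TRUE" if to_external else "FALSE"
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES('EXTERNAL'='{value}');"


print(switch_table_type_sql("sales", to_external=True))
# ALTER TABLE sales SET TBLPROPERTIES('EXTERNAL'='TRUE');
```

The table name is interpolated without quoting or escaping, so treat this as an illustration of the rule rather than a safe SQL builder.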
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 15972b6..92419ee 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -2102,17 +2102,18 @@ SELECT udf_created_by_hive(c1) FROM ...
<codeph>Location</codeph> field displays the path of the table directory as an HDFS URI.
</p>
- <p id="switch_internal_external_table">
- You can switch a table from internal to external, or from external to internal, by using
- the <codeph>ALTER TABLE</codeph> statement:
-<codeblock xml:space="preserve">
+ <p id="switch_internal_external_table"> You can switch a table from
+ internal to external, or from external to internal, by using the
+ <codeph>ALTER TABLE</codeph> statement:
+ <codeblock xml:space="preserve">
-- Switch a table from internal to external.
ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='TRUE');
-- Switch a table from external to internal.
ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('EXTERNAL'='FALSE');
-</codeblock>
- </p>
+</codeblock>If
+ the Kudu service is integrated with the Hive Metastore, the above
+ operations are not supported.</p>
<!-- The data to show sensible output from these queries is in the TPC-DS schema 'CUSTOMER' table.
If you want to show real output, add a LIMIT 5 or similar clause to each query to avoid
@@ -4536,12 +4537,14 @@ sudo pip-python install ssl</codeblock>
attribute to that column.
</p>
- <p id="kudu_metadata_intro" rev="kudu">
- Much of the metadata for Kudu tables is handled by the underlying storage layer. Kudu
- tables have less reliance on the metastore database, and require less metadata caching
- on the Impala side. For example, information about partitions in Kudu tables is managed
- by Kudu, and Impala does not cache any block locality metadata for Kudu tables.
- </p>
+ <p id="kudu_metadata_intro" rev="kudu">By default, much of the metadata
+ for Kudu tables is handled by the underlying storage layer. Kudu tables
+ have less reliance on the Metastore database, and require less metadata
+ caching on the Impala side. For example, information about partitions in
+ Kudu tables is managed by Kudu, and Impala does not cache any block
+ locality metadata for Kudu tables. If the Kudu service is not integrated
+ with the Hive Metastore, Impala will manage Kudu table metadata in the
+ Hive Metastore.</p>
<p id="kudu_metadata_details" rev="kudu">
The <codeph>REFRESH</codeph> and <codeph>INVALIDATE METADATA</codeph> statements are
@@ -4553,22 +4556,25 @@ sudo pip-python install ssl</codeblock>
the Kudu table schema, such as adding or dropping a column.
</p>
- <p id="kudu_internal_external_tables">
- The distinction between internal and external tables has some special details for Kudu
- tables. Tables created entirely through Impala are internal tables. The table name as
- represented within Kudu includes notation such as an <codeph>impala::</codeph> prefix
- and the Impala database name. External Kudu tables are those created by a non-Impala
- mechanism, such as a user application calling the Kudu APIs. For these tables, the
- <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets you establish a mapping from Impala
- to the existing Kudu table:
-<codeblock>
+ <p id="kudu_internal_external_tables"> If the Kudu service is not
+ integrated with the Hive Metastore, the distinction between internal and
+ external tables has some special details for Kudu tables. Tables created
+ entirely through Impala are internal tables. The table name as
+ represented within Kudu includes notation such as an
+ <codeph>impala::</codeph> prefix and the Impala database name.
+ External Kudu tables are those created by a non-Impala mechanism, such
+ as a user application calling the Kudu APIs. For these tables, the
+ <codeph>CREATE EXTERNAL TABLE</codeph> syntax lets you establish a
+ mapping from Impala to the existing Kudu table:
+ <codeblock>
CREATE EXTERNAL TABLE impala_name STORED AS KUDU
TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
</codeblock>
- External Kudu tables differ in one important way from other external tables: adding or
- dropping a column or range partition changes the data in the underlying Kudu table, in
- contrast to an HDFS-backed external table where existing data files are left untouched.
</p>
+ <p> External Kudu tables differ in one important way from other external
+ tables: adding or dropping a column or range partition changes the data
+ in the underlying Kudu table, in contrast to an HDFS-backed external
+ table where existing data files are left untouched.</p>
<p id="kudu_sentry_limitations" rev="IMPALA-4000">
Access to Kudu tables must be granted to and revoked from roles with the following
diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml
index 2e0f364..ec4be37 100644
--- a/docs/topics/impala_kudu.xml
+++ b/docs/topics/impala_kudu.xml
@@ -1109,24 +1109,19 @@ select microseconds,
<title>How Impala Handles Kudu Metadata</title>
<conbody>
-
+ <note>This section only applies to Kudu services that are not
+ integrated with the Hive Metastore (HMS).</note>
<p conref="../shared/impala_common.xml#common/kudu_metadata_intro"/>
<p conref="../shared/impala_common.xml#common/kudu_metadata_details"/>
-
- <p>
- Because Kudu manages the metadata for its own tables separately from the metastore
- database, there is a table name stored in the metastore database for Impala to use,
- and a table name on the Kudu side, and these names can be modified independently
- through <codeph>ALTER TABLE</codeph> statements.
- </p>
-
- <p>
- To avoid potential name conflicts, the prefix <codeph>impala::</codeph>
- and the Impala database name are encoded into the underlying Kudu
- table name:
- </p>
-
-<codeblock><![CDATA[
+ <p> Because Kudu manages the metadata for its own tables separately from
+ the metastore database, there is a table name stored in the metastore
+ database for Impala to use, and a table name on the Kudu side, and
+ these names can be modified independently through <codeph>ALTER
+ TABLE</codeph> statements. </p>
+ <p> To avoid potential name conflicts, the prefix
+ <codeph>impala::</codeph> and the Impala database name are encoded
+ into the underlying Kudu table name: </p>
+ <codeblock><![CDATA[
create database some_database;
use some_database;
@@ -1138,18 +1133,40 @@ describe formatted table_name_demo;
kudu.table_name | impala::some_database.table_name_demo
]]>
</codeblock>
-
- <p>
- See <xref keyref="kudu_tables"/> for examples of how to change the name of
- the Impala table in the metastore database, the name of the underlying Kudu
- table, or both.
- </p>
-
+ <p> See <xref keyref="kudu_tables"/> for examples of how to change the
+ name of the Impala table in the metastore database, the name of the
+ underlying Kudu table, or both. </p>
</conbody>
</concept>
</concept>
+ <concept id="concept_y4k_x1m_j3b">
+ <title>Working with Kudu Integrated with Hive Metastore</title>
+ <conbody>
+ <p>Starting from Kudu 1.10 and Impala 3.3, Impala supports Kudu services
+ integrated with the Hive Metastore (HMS). See <xref
+ href="https://kudu.apache.org/docs/hive_metastore.html#hive_metastore"
+ format="html" scope="external">the HMS integration
+ documentation</xref> for more details on Kudu’s Hive Metastore
+ integration.</p>
+ <p>The following are some of the changes you need to consider when working
+ with Kudu services integrated with the HMS.<ul>
+ <li> When Kudu is integrated with the Hive Metastore, Impala must be
+ configured to use the same HMS as Kudu.</li>
+ <li> Since there may be no one-to-one mapping between Kudu tables and
+ external tables, only internal tables are automatically
+ synchronized. </li>
+ <li>When you create a table in Kudu, Kudu will create an HMS entry for
+ that table with the internal table type.</li>
+ <li>When the Kudu service is integrated with the HMS, internal table
+ entries will be created automatically in the HMS when tables are
+ created in Kudu without Impala. To access these tables through
+ Impala, run the <codeph>INVALIDATE METADATA</codeph> statement so Impala
+ picks up the latest metadata.</li>
+ </ul></p>
+ </conbody>
+ </concept>
<concept id="kudu_etl">
diff --git a/docs/topics/impala_tables.xml b/docs/topics/impala_tables.xml
index b2b6a71..978dd3c 100644
--- a/docs/topics/impala_tables.xml
+++ b/docs/topics/impala_tables.xml
@@ -284,28 +284,28 @@ under the License.
</metadata>
</prolog>
<conbody>
- <p>
- Tables stored in Apache Kudu are treated specially, because Kudu
- manages its data independently of HDFS files. Some information about the
- table is stored in the metastore database for use by Impala. Other table
- metadata is managed internally by Kudu.
- </p>
- <p>
- When you create a Kudu table through Impala, the table is assigned an
- internal Kudu table name of the form
+ <p> By default, tables stored in Apache Kudu are treated specially,
+ because Kudu manages its data independently of HDFS files. </p>
+ <p>All metadata that Impala needs is stored in the HMS.</p>
+ <p> When Kudu is not integrated with the HMS and you create a Kudu table
+ through Impala, the table is assigned an internal Kudu table name of the
+ form
<codeph>impala::<varname>db_name</varname>.<varname>table_name</varname></codeph>.
You can see the Kudu-assigned name in the output of <codeph>DESCRIBE
FORMATTED</codeph>, in the <codeph>kudu.table_name</codeph> field of
- the table properties.
+ the table properties. </p>
+ <p>
+ For Impala-Kudu managed tables, <codeph>ALTER TABLE ...
+ RENAME</codeph> renames both the Impala and the Kudu table.
</p>
- <p> For Impala-Kudu managed tables, <codeph>ALTER TABLE ...
- RENAME</codeph> renames both the Impala and the Kudu table. </p>
- <p> For Impala-Kudu external tables, <codeph>ALTER TABLE ...
+ <p>
+ For Impala-Kudu external tables, <codeph>ALTER TABLE ...
RENAME</codeph> renames just the Impala table. To change the Kudu
table that an Impala external table points to, use <codeph>ALTER TABLE
<varname>impala_name</varname> SET TBLPROPERTIES('kudu.table_name' =
'<varname>different_kudu_table_name</varname>')</codeph>. The
- underlying Kudu table must already exist. </p>
+ underlying Kudu table must already exist.
+ </p>
<p>
In practice, external tables are typically used to access underlying
Kudu tables that were created outside of Impala, that is, through the
@@ -323,9 +323,7 @@ under the License.
distribution of values within each column, is the same for Kudu tables
as for HDFS-backed tables.
</p>
- <p
- conref="../shared/impala_common.xml#common/kudu_internal_external_tables"
- />
+ <p conref="../shared/impala_common.xml#common/kudu_internal_external_tables"/>
</conbody>
</concept>
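The Kudu naming and re-pointing rules from the `impala_kudu.xml` and `impala_tables.xml` changes can also be sketched in Python. Both helper names are hypothetical and only produce strings. The `impala::<db_name>.<table_name>` naming applies only when Kudu is not integrated with the Hive Metastore. Identifiers are interpolated without quoting or escaping, so this is an illustration under those assumptions, not a safe SQL builder.

```python
def default_kudu_table_name(db_name: str, table_name: str) -> str:
    """Kudu-side name Impala assigns to a table it creates, assuming Kudu
    is NOT integrated with the Hive Metastore."""
    return f"impala::{db_name}.{table_name}"


def repoint_external_table_sql(impala_name: str, kudu_table_name: str) -> str:
    """Point an Impala external table at a different Kudu table.

    The target Kudu table must already exist. For external tables,
    ALTER TABLE ... RENAME would change only the Impala-side name.
    """
    return (f"ALTER TABLE {impala_name} "
            f"SET TBLPROPERTIES('kudu.table_name' = '{kudu_table_name}');")


print(default_kudu_table_name("some_database", "table_name_demo"))
# impala::some_database.table_name_demo
print(repoint_external_table_sql("impala_name", "original_kudu_name"))
```

The first value matches the `kudu.table_name` field that `DESCRIBE FORMATTED` shows in the `impala_kudu.xml` example.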