Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/15 23:51:01 UTC

[jira] [Commented] (IMPALA-9092) Fix "show create table" tests on USE_CDP_HIVE=true to account for HIVE-22158

    [ https://issues.apache.org/jira/browse/IMPALA-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996902#comment-16996902 ] 

ASF subversion and git services commented on IMPALA-9092:
---------------------------------------------------------

Commit 6ebea33a9d249fc0097746c21f04d977fdeaa13c in impala's branch refs/heads/master from Vihang Karajgaonkar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6ebea33 ]

IMPALA-9092: Add support for creating external Kudu table

In HMS-3, the translation layer converts a managed Kudu table into an
external Kudu table and sets the additional table property
'external.table.purge' to 'true'. This means that any installation
using HMS-3 (or a Hive version that includes HIVE-22158) will always
create Kudu tables as external tables. This is problematic because the
output of show create table will now differ and may confuse users.

To improve the user experience of such synchronized tables
(external tables with the external.table.purge property set to true),
this patch adds support in Impala for creating external Kudu tables.
Previous versions of Impala disallowed creating an external Kudu table
if the Kudu table did not exist. After this patch, Impala checks
whether the Kudu table exists and, if it does not, creates a Kudu
table based on the schema provided in the create table statement. The
command errors out if the Kudu table already exists. However, this
applies only to synchronized tables. The previous way of creating a
pure external table behaves as before.
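
The decision described above can be sketched roughly as follows. This is
an illustrative Python model only, not Impala's actual frontend code: the
function names, exception classes, and returned action strings are all
hypothetical, and the case-insensitive comparison of the property value
is an assumption (both 'true' and 'TRUE' appear in this thread).

```python
class KuduTableAlreadyExists(Exception):
    """Hypothetical error: a synchronized table's Kudu table already exists."""


class KuduTableNotFound(Exception):
    """Hypothetical error: a pure external table's Kudu table is missing."""


def is_synchronized(is_external, tbl_properties):
    # A synchronized table is an external table whose
    # 'external.table.purge' property is set to true.
    # Assumption: the value is compared case-insensitively.
    purge = tbl_properties.get("external.table.purge", "false")
    return is_external and purge.lower() == "true"


def plan_create_external_kudu_table(kudu_table_exists, tbl_properties):
    """Return the (hypothetical) action for CREATE EXTERNAL TABLE ... STORED AS KUDU."""
    if is_synchronized(True, tbl_properties):
        # Synchronized: Impala creates the Kudu table from the given schema,
        # and the command errors out if the table is already there.
        if kudu_table_exists:
            raise KuduTableAlreadyExists("Kudu table already exists")
        return "create_kudu_table"
    # Pure external: unchanged behavior, the Kudu table must already exist.
    if not kudu_table_exists:
        raise KuduTableNotFound("Kudu table does not exist")
    return "attach_existing_kudu_table"
```

For example, a call with kudu_table_exists=False and
{'external.table.purge': 'true'} yields "create_kudu_table" in this
model, while the same call without the property raises
KuduTableNotFound.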

The following syntax for creating a synchronized table is now allowed:

CREATE EXTERNAL TABLE foo (
  id int PRIMARY KEY,
  name string)
PARTITION BY HASH PARTITIONS 8
STORED AS KUDU
TBLPROPERTIES ('external.table.purge'='true')

The syntax is very similar to that for creating a managed table, except
for the EXTERNAL keyword and the additional table property. A
synchronized table behaves similarly to a managed Kudu table (drops and
renames are allowed). The output of show create table on a synchronized
table displays the full column and partition spec, as it does for
managed tables.

Testing:
1. After the CDP version bump all of the existing Kudu tables now
create synchronized tables so there is good coverage there.
2. Added additional tests which create synchronized tables and
compare the show create table output.
3. Ran exhaustive tests with both CDP and CDH builds.

Change-Id: I76f81d41db0cf2269ee1b365857164a43677e14d
Reviewed-on: http://gerrit.cloudera.org:8080/14750
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Fix "show create table" tests on USE_CDP_HIVE=true to account for HIVE-22158
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-9092
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9092
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Joe McDonnell
>            Assignee: Vihang Karajgaonkar
>            Priority: Blocker
>             Fix For: Impala 3.4.0
>
>
> Hive changed behavior with HIVE-22158 so that only transactional tables are considered managed and all others are considered external. This means that a regular "create table" will result in an external table with table properties of 'TRANSLATED_TO_EXTERNAL'='TRUE', 'external.table.purge'='TRUE'. This breaks our tests that rely on "show create table", because the table is newly external and has extra table properties. For example:
> {noformat}
> query_test/test_kudu.py:842: in test_primary_key_and_distribution
>     db=cursor.conn.db_name, kudu_addr=KUDU_MASTER_HOSTS))
> query_test/test_kudu.py:824: in assert_show_create_equals
>     assert cursor.fetchall()[0][0] == \
> E   assert "CREATE EXTER...='localhost')" == "CREATE TABLE ...='localhost')"
> E     - CREATE EXTERNAL TABLE testshowcreatetable_15312_ggn1hk.nvbpxfuxze
> E     ?        ---------
> E     + CREATE TABLE testshowcreatetable_15312_ggn1hk.nvbpxfuxze (
> E     ?                                                         ++
> E     +   c INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> E     +   PRIMARY KEY (c)
> E     + )
> E     + PARTITION BY HASH (c) PARTITIONS 3
> E       STORED AS KUDU
> E     - TBLPROPERTIES ('TRANSLATED_TO_EXTERNAL'='TRUE', 'external.table.purge'='TRUE', 'kudu.master_addresses'='localhost')
> E     + TBLPROPERTIES ('kudu.master_addresses'='localhost'){noformat}
> We need to decide on the right behavior for "show create table" and update the tests. 
> For Kudu tables, tables with TRANSLATED_TO_EXTERNAL=true and external.table.purge=TRUE should be equivalent to a non-external Kudu table, and we can just detect this case and generate the same SQL as before.
> Other cases may need new logic. I think it makes sense to also address, with this JIRA, other tests that fail due to the MANAGED vs EXTERNAL distinction or the extra table properties. Here is a list of tests that seem to have this problem:
> {noformat}
> metadata/test_ddl.py TestDdlStatements.test_create_alter_tbl_properties
> metadata/test_show_create_table.py *
> query_test/test_kudu.py TestShowCreateTable*
> org.apache.impala.catalog.CatalogTest.testCreateTableMetadata
> org.apache.impala.catalog.local.LocalCatalogTest.testKuduTable{noformat}
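
The detection proposed in the issue description for Kudu tables could be
sketched as below. This is a hedged Python illustration of the
reporter's suggestion only; it is not the logic the commit implements,
and the function names and the tuple it returns are hypothetical.
Comparing property values case-insensitively is also an assumption.

```python
# Properties added by the HMS translation layer, per the issue description.
TRANSLATION_PROPS = ("TRANSLATED_TO_EXTERNAL", "external.table.purge")


def is_translated_managed(is_external, tbl_properties):
    # True when an external table carries both translation properties set
    # to true, i.e. it should be treated like a non-external Kudu table.
    return is_external and all(
        tbl_properties.get(name, "").lower() == "true"
        for name in TRANSLATION_PROPS
    )


def show_create_header(is_external, tbl_properties):
    """Return (create keyword, properties to print) for show create table."""
    if is_translated_managed(is_external, tbl_properties):
        # Generate the same SQL as before: drop EXTERNAL and hide the
        # properties that the translation layer added.
        shown = {k: v for k, v in tbl_properties.items()
                 if k not in TRANSLATION_PROPS}
        return "CREATE TABLE", shown
    keyword = "CREATE EXTERNAL TABLE" if is_external else "CREATE TABLE"
    return keyword, dict(tbl_properties)
```

Applied to the properties in the failing assertion above, this model
returns "CREATE TABLE" together with only the 'kudu.master_addresses'
property, which matches the SQL the test expected.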


