Posted to issues@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2022/01/04 22:35:00 UTC
[jira] [Created] (IMPALA-11069) Avoid copying Storage Desc Params 'path' on "CREATE TABLE ... LIKE" query.

Riza Suminto created IMPALA-11069:
-------------------------------------

             Summary: Avoid copying Storage Desc Params 'path' on "CREATE TABLE ... LIKE" query.
                 Key: IMPALA-11069
                 URL: https://issues.apache.org/jira/browse/IMPALA-11069
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Riza Suminto
         Attachments: desc_table.txt

When running "CREATE TABLE ... LIKE" in Impala against a source table that was created by Spark, Impala copies the source table's Storage Desc Params in full. As a result, the 'path' value in the new table still points to the location of the original table.
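
The intended behavior can be illustrated with a minimal sketch. This is not Impala's actual frontend code (which is written in Java); the object name, the method name, and the idea of keeping the excluded keys in a set are all hypothetical. Only the 'path' key comes from this report.

```scala
// Hypothetical sketch of the desired copy behavior; names are illustrative only.
object StorageDescParams {
  // Keys that point at the source table's data and should not be inherited.
  // Only "path" is taken from this report; modelling it as a set is an assumption.
  val LocationKeys: Set[String] = Set("path")

  /** Copies the source table's Storage Desc Params, dropping location keys. */
  def copyForCreateTableLike(source: Map[String, String]): Map[String, String] =
    source.filter { case (key, _) => !LocationKeys.contains(key.toLowerCase) }
}
```

For example, with an illustrative second parameter, `StorageDescParams.copyForCreateTableLike(Map("path" -> "/tmp/testtable_ext", "serialization.format" -> "1"))` keeps only the `serialization.format` entry, so the new table no longer carries the source table's 'path'.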

Here are the steps to reproduce the issue:


1) Create a datasource table in Spark and save it as an external table ('testTable_ext'):
{noformat}
scala> val df = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000")).toDF("language", "users_count")
scala> df.write.mode("append").option("path","/tmp/testtable_ext").saveAsTable("testTable_ext")
{noformat}
2) In Impala, create a new table using LIKE. This table is expected to be empty, since only the schema should be copied:
{noformat}
[nightly7x-us-fd-1.nightly7x-us-fd.root.hwx.site:21050] default> create external table testtable_ext1 LIKE testTable_ext;
[nightly7x-us-fd-1.nightly7x-us-fd.root.hwx.site:21050] default> select * from testTable_ext1;
Query: select * from testTable_ext1
Query submitted at: 2021-12-21 08:32:45 (Coordinator: http://nightly7x-us-fd-1.nightly7x-us-fd.root.hwx.site:25000)
Query progress can be monitored at: http://nightly7x-us-fd-1.nightly7x-us-fd.root.hwx.site:25000/query_plan?query_id=734f6a48ff279070:c9907b8600000000
Fetched 0 row(s) in 0.11s
{noformat}
3) Back in Spark, the new table is not empty; it contains the data from the original table:
{noformat}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.8.7.2.13.0-218
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("select * from testtable_ext1").show(false)
21/12/21 08:36:36 WARN conf.HiveConf: HiveConf of name hive.server2.http.exclude.ciphersuites does not exist
21/12/21 08:36:36 WARN conf.HiveConf: HiveConf of name hive.server2.binary.include.ciphersuites does not exist
21/12/21 08:36:36 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = b7e81c13-8ace-4b86-b72d-caa7dc75db09
+--------+-----------+
|language|users_count|
+--------+-----------+
|Python  |100000     |
|Scala   |3000       |
|Java    |20000      |
+--------+-----------+
{noformat}
 

The attached desc_table.txt shows the description of the new table. There, the Storage Desc Params 'path' differs from the table's 'Location'.
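
The mismatch described above could be checked mechanically along these lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, the normalization only strips a `scheme://authority` prefix and a trailing slash, and the example paths other than /tmp/testtable_ext are invented for illustration (the real values are in the attachment).

```scala
// Hypothetical helper for spotting a 'path' param that disagrees with Location.
object PathCheck {
  // Drops a leading "scheme://authority" (e.g. "hdfs://ns1") and a trailing
  // slash so that equivalent locations compare equal. This is a simplification.
  private def normalize(p: String): String =
    p.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "").stripSuffix("/")

  /** True when a 'path' param exists and points somewhere other than location. */
  def hasStalePath(location: String, params: Map[String, String]): Boolean =
    params.get("path").exists(p => normalize(p) != normalize(location))
}
```

With a made-up location such as `hdfs://ns1/warehouse/testtable_ext1` and `Map("path" -> "/tmp/testtable_ext")`, `PathCheck.hasStalePath` returns true, which corresponds to the state this issue describes.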



--
This message was sent by Atlassian Jira
(v8.20.1#820001)