Posted to dev@hive.apache.org by "Kim Jaechang (Jira)" <ji...@apache.org> on 2019/10/18 15:30:00 UTC

[jira] [Created] (HIVE-22371) CTAS not working with non-ACID managed tables

Kim Jaechang created HIVE-22371:
-----------------------------------

             Summary: CTAS not working with non-ACID managed tables
                 Key: HIVE-22371
                 URL: https://issues.apache.org/jira/browse/HIVE-22371
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 4.0.0
            Reporter: Kim Jaechang


I used the master branch at the HIVE-21344 commit (f16509a5c9187f592c48c253ee001fc3a5e0d508), which was committed on 12 Oct.

When I submitted the query below, it finished without any errors.
{code:sql}
create table call_center
stored as orc 
 as select * from tpcds_text_2.call_center;
{code}
However, "select count( * ) from call_center" returned 0, and the data in HDFS looked wrong:
 * Two call_center tables were created, one in the warehouse directory and one in the external warehouse directory.
 * The call_center table in the external warehouse directory is empty.

{noformat}
> hdfs dfs -du -h $WAREHOUSE_PATH
5.0 K  14.9 K  $WAREHOUSE_PATH/call_center
0      0       $WAREHOUSE_PATH/tpcds_text_2.db

> hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
2.1 G  2.1 G  $EXTERNAL_WAREHOUSE_PATH/2
0      0      $EXTERNAL_WAREHOUSE_PATH/call_center
{noformat}
After a few hours of digging, I suspect this bug was introduced by HIVE-22158, which by default creates every non-ACID managed table in the external warehouse directory. In the example above, call_center is meant to be a managed table but is not explicitly declared as ACID. Hence, it should be created in the external warehouse directory.

However, the call_center table created in the external warehouse directory is empty, while a second, non-empty table with the same name is created in the warehouse directory. This happens because, in the current implementation, the (buggy) compiled query plan proceeds as follows:

1. Write the data to a temporary directory.
2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center).
3. Create the table using the data in the warehouse directory.

Without the bug, step 2 would move the data to the external warehouse directory, and step 3 would create the table using the data in the external warehouse directory. The crux of the problem is that the query compiler only checks whether the query includes the keyword "external". The query compiler should therefore also take the changes made in HIVE-22158 into account and be updated accordingly.
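The mismatch can be sketched as a small, self-contained model. It assumes the simplified placement rule stated in this report (a table goes to the external warehouse if it is declared EXTERNAL or is a non-ACID managed table). The class, enum, and method names are hypothetical and are not Hive's actual compiler or metastore code; the model only shows why checking the "external" keyword alone sends the data (step 2) to a different directory than the one the table ends up in (step 3).

{code:java}
// Hypothetical model of the CTAS target-directory decision described in this report.
// None of these names are Hive APIs; they only illustrate the mismatch.
class CtasLocationSketch {

    enum Dir { WAREHOUSE, EXTERNAL_WAREHOUSE }

    // Assumed post-HIVE-22158 rule: EXTERNAL tables and non-ACID managed
    // tables are placed in the external warehouse directory.
    static Dir metastoreLocation(boolean externalKeyword, boolean transactional) {
        return (externalKeyword || !transactional) ? Dir.EXTERNAL_WAREHOUSE : Dir.WAREHOUSE;
    }

    // Buggy compiler check: only looks at the "external" keyword,
    // so a non-ACID managed table's data is moved to the warehouse directory.
    static Dir compilerMoveTargetBuggy(boolean externalKeyword, boolean transactional) {
        return externalKeyword ? Dir.EXTERNAL_WAREHOUSE : Dir.WAREHOUSE;
    }

    // Expected check: mirrors the placement rule, so the move (step 2)
    // targets the same directory the table is created in (step 3).
    static Dir compilerMoveTargetFixed(boolean externalKeyword, boolean transactional) {
        return metastoreLocation(externalKeyword, transactional);
    }

    public static void main(String[] args) {
        // "create table call_center stored as orc as select ...":
        // no EXTERNAL keyword and not declared ACID.
        boolean external = false;
        boolean transactional = false;
        System.out.println("table location:    " + metastoreLocation(external, transactional));
        System.out.println("buggy move target: " + compilerMoveTargetBuggy(external, transactional));
        System.out.println("fixed move target: " + compilerMoveTargetFixed(external, transactional));
    }
}
{code}

For the call_center query, the model reports EXTERNAL_WAREHOUSE as the table location but WAREHOUSE as the buggy move target, matching the split seen in the hdfs dfs -du output; the fixed check agrees with the table location.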



--
This message was sent by Atlassian Jira
(v8.3.4#803005)