Posted to issues@hive.apache.org by "Sankar Hariappan (Jira)" <ji...@apache.org> on 2020/06/02 11:19:00 UTC

[jira] [Assigned] (HIVE-22371) CTAS not working with non-ACID managed tables

     [ https://issues.apache.org/jira/browse/HIVE-22371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan reassigned HIVE-22371:
---------------------------------------

    Assignee: Nishant Goel

> CTAS not working with non-ACID managed tables
> ---------------------------------------------
>
>                 Key: HIVE-22371
>                 URL: https://issues.apache.org/jira/browse/HIVE-22371
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Jaechang Kim
>            Assignee: Nishant Goel
>            Priority: Major
>
> I used Hive commit HIVE-21344 (f16509a5c9187f592c48c253ee001fc3a5e0d508) in the master branch, which was committed on 12 Oct.
> When I submitted the query below, it finished without any errors.
> {code:sql}
> create table call_center
> stored as orc 
>  as select * from tpcds_text_2.call_center;
> {code}
> However, "select count( * ) from call_center" returned 0, and the data in HDFS looks strange:
>  * Two tables were created, one in the warehouse directory and another in the external warehouse directory.
>  * Table `call_center` in the external warehouse is empty.
> {code:java}
>  > hdfs dfs -du -h $WAREHOUSE_PATH
>  5.0 K 14.9 K $WAREHOUSE_PATH/call_center
>  0 0 $WAREHOUSE_PATH/tpcds_text_2.db
>  > hdfs dfs -du -h $EXTERNAL_WAREHOUSE_PATH
>  2.1 G 2.1 G $EXTERNAL_WAREHOUSE_PATH/2
>  0 0 $EXTERNAL_WAREHOUSE_PATH/call_center
> {code}
> After a few hours of digging, I suspect this bug was introduced in HIVE-22158, which creates every non-ACID managed table in the external warehouse directory by default. In the example above, call_center is intended as a managed table, but is not explicitly specified as ACID. Hence, it should be created in the external warehouse directory.
> However, the table call_center created in the external warehouse directory is empty, while another non-empty table of the same name is created in the warehouse directory. This is because in the current implementation, the (buggy) compiled query plan proceeds as follows:
>  1. Write data to a temporary directory
>  2. Move the data to the warehouse directory ($WAREHOUSE_PATH/call_center)
>  3. Create a table using data in the warehouse directory
> Without the bug, step 2 would move the data to the external warehouse directory, and step 3 would create a table using the data in the external warehouse directory. The crux of the problem is that the query compiler checks only whether the query includes the keyword "external". In other words, the query compiler should also be aware of the changes made in HIVE-22158 and be updated accordingly.
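
The mismatch described above can be modelled in a few lines. This is only a sketch of the reporter's explanation, not Hive's actual compiler code: the class `CtasQuery`, the three functions, and the two path constants are hypothetical names invented for illustration, and the placement rule is the one the description attributes to HIVE-22158.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for $WAREHOUSE_PATH and $EXTERNAL_WAREHOUSE_PATH.
WAREHOUSE_PATH = "/warehouse/managed"
EXTERNAL_WAREHOUSE_PATH = "/warehouse/external"


@dataclass
class CtasQuery:
    """Hypothetical model of a CTAS statement; not a Hive class."""
    table_name: str
    has_external_keyword: bool
    is_transactional: bool  # True if the table is explicitly ACID


def table_location(q: CtasQuery) -> str:
    # Rule the description attributes to HIVE-22158: a table that is
    # external or not ACID is created in the external warehouse directory.
    if q.has_external_keyword or not q.is_transactional:
        return f"{EXTERNAL_WAREHOUSE_PATH}/{q.table_name}"
    return f"{WAREHOUSE_PATH}/{q.table_name}"


def move_target_buggy(q: CtasQuery) -> str:
    # Behaviour described in the report: only the "external" keyword is checked.
    if q.has_external_keyword:
        return f"{EXTERNAL_WAREHOUSE_PATH}/{q.table_name}"
    return f"{WAREHOUSE_PATH}/{q.table_name}"


def move_target_expected(q: CtasQuery) -> str:
    # Expected behaviour: the move step uses the same rule as table creation.
    return table_location(q)


if __name__ == "__main__":
    q = CtasQuery("call_center", has_external_keyword=False, is_transactional=False)
    print("table registered at:", table_location(q))
    print("data moved to (buggy):", move_target_buggy(q))
    print("data moved to (expected):", move_target_expected(q))
```

For the query in this report (no "external" keyword, not ACID), the buggy move target and the table location differ, which matches the empty directory in the external warehouse and the populated one in the warehouse directory.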



--
This message was sent by Atlassian Jira
(v8.3.4#803005)