Posted to issues@hive.apache.org by "Marta Kuczora (Jira)" <ji...@apache.org> on 2020/11/10 19:19:00 UTC

[jira] [Commented] (HIVE-24336) Turn off the direct insert for EXPLAIN ANALYZE queries

    [ https://issues.apache.org/jira/browse/HIVE-24336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229456#comment-17229456 ] 

Marta Kuczora commented on HIVE-24336:
--------------------------------------

Pushed to master. Thanks a lot [~szita] for the review!

> Turn off the direct insert for EXPLAIN ANALYZE queries
> ------------------------------------------------------
>
>                 Key: HIVE-24336
>                 URL: https://issues.apache.org/jira/browse/HIVE-24336
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we run EXPLAIN ANALYZE for an INSERT query with direct insert on, the new files are created in the table directory, and they are not cleaned up when the EXPLAIN query finishes.
> Example: 
> {noformat}
> create table analyze_table (id int) stored as orc tblproperties('transactional'='true');
> explain analyze insert into analyze_table values (1),(2),(3),(4);
> select * from analyze_table;
> 1
> 2
> 3
> 4
> Time taken: 0.1 seconds, Fetched: 4 row(s)
> The result should be empty after the explain command.
> {noformat}
> An EXPLAIN ANALYZE query executes the actual query, and the files are created within the staging directory, but the MoveTask does not move them to the final location. So when the EXPLAIN ANALYZE query finishes, the staging directory is deleted, and the table directory is the same as before the EXPLAIN query. With direct insert on, however, the files are written directly into the table directory, so an additional cleanup would be necessary to restore the table directory to its state before the EXPLAIN ANALYZE query. This can be avoided by turning off direct insert for EXPLAIN ANALYZE queries. Direct insert improves performance by eliminating the file movements in the MoveTask, but it has no effect on the query execution plan, so it can safely be turned off for EXPLAIN queries.
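The mechanism described above can be illustrated with a toy Python model. This is a sketch, not Hive code. `Filesystem`, `run_insert`, the `disable_direct_for_explain` switch and the `bucket_00000` file name are all hypothetical stand-ins. The model only mimics the three steps the issue describes: writing to the staging directory (or directly to the table directory), the MoveTask, and the staging cleanup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Filesystem:
    # Hypothetical stand-in for HDFS: directory path -> set of file names.
    dirs: Dict[str, Set[str]] = field(default_factory=dict)

    def write(self, directory: str, name: str) -> None:
        self.dirs.setdefault(directory, set()).add(name)

    def move_all(self, src: str, dst: str) -> None:
        # Move every file from src to dst (what the MoveTask does in this model).
        for name in self.dirs.pop(src, set()):
            self.write(dst, name)

    def delete_dir(self, directory: str) -> None:
        self.dirs.pop(directory, None)

    def ls(self, directory: str) -> List[str]:
        return sorted(self.dirs.get(directory, set()))


def run_insert(fs: Filesystem, table_dir: str, staging_dir: str, *,
               direct_insert: bool, explain_analyze: bool,
               disable_direct_for_explain: bool = True) -> None:
    """Toy model of an INSERT, optionally wrapped in EXPLAIN ANALYZE."""
    # The change discussed in this issue: EXPLAIN ANALYZE always uses staging.
    if explain_analyze and disable_direct_for_explain:
        direct_insert = False
    target = table_dir if direct_insert else staging_dir
    # The query really executes, so a data file is written in either case.
    fs.write(target, "bucket_00000")
    # The MoveTask only runs for a real (non-EXPLAIN) query that used staging.
    if not explain_analyze and not direct_insert:
        fs.move_all(staging_dir, table_dir)
    # The staging directory is deleted when the query finishes.
    fs.delete_dir(staging_dir)


# Old behaviour: direct insert honoured during EXPLAIN ANALYZE -> leaked file.
old = Filesystem()
run_insert(old, "analyze_table", "analyze_table/.staging",
           direct_insert=True, explain_analyze=True,
           disable_direct_for_explain=False)
print(old.ls("analyze_table"))  # ['bucket_00000']

# With direct insert turned off for EXPLAIN ANALYZE, the table stays empty.
new = Filesystem()
run_insert(new, "analyze_table", "analyze_table/.staging",
           direct_insert=True, explain_analyze=True)
print(new.ls("analyze_table"))  # []
```

For a regular INSERT, the model still lands the file in the table directory on both paths: directly with direct insert, or through the MoveTask without it. This matches the point that direct insert only changes where files are written, not what the query computes.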



--
This message was sent by Atlassian Jira
(v8.3.4#803005)