Posted to dev@hive.apache.org by Azuryy Yu <az...@gmail.com> on 2014/07/16 05:36:00 UTC

Hive insert overwrite strange behavior

Hi,

I expected the following two commands to have the same effect:

1) hive -e "insert overwrite local directory 'output' select * from test limit 10;"
2) hive -e "select * from test limit 10;" > output


However, the second one reads from HDFS directly and takes only two seconds,
while the first one submits an MR job with one reducer.

Why is there such a difference? Thanks.

Re: Hive insert overwrite strange behavior

Posted by Lianhui Wang <li...@gmail.com>.
The operator plans of the two statements are different. First one:
TableScanOperator--SelectOperator--ReduceSinkOperator--FileSinkOperator--MoveOperator
Second one: TableScanOperator--SelectOperator--FetchOperator
In the second one, the FetchOperator runs on the client and writes the result
directly on the local machine.
In the first one, the result is first written to a temporary HDFS location and
then moved from there to the local directory.
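
Whether a plain select is served by the client-side fetch path is controlled by the hive.fetch.task.conversion property. As a rough sketch, and assuming a Hive version that accepts the value none for this property, the hive CLI on PATH, and a table named test as in the question, turning the conversion off should make the second statement go through a job as well. The output file name output_no_fetch is only an illustrative choice:

```shell
# Sketch only: assumes the `hive` CLI is installed and a table named `test` exists.
QUERY='select * from test limit 10;'

# Assumption: this Hive version accepts "none", which disables fetch-task conversion.
FETCH_OFF='hive.fetch.task.conversion=none'

if command -v hive >/dev/null 2>&1; then
  # With conversion disabled, the SELECT is expected to be planned as a job
  # instead of being read directly by the client.
  hive --hiveconf "$FETCH_OFF" -e "$QUERY" > output_no_fetch
else
  # No Hive available: only show the command that would be run.
  echo "hive --hiveconf $FETCH_OFF -e \"$QUERY\""
fi
```

If the run time then becomes comparable to the insert overwrite version, that would suggest the fetch path accounts for most of the difference.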
You can prefix a statement with explain to see its operator plan.
Example:
explain insert overwrite local directory 'output' select * from test limit 10;
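
To compare the two plans side by side, something like the following could be used. It is a minimal sketch that assumes the hive CLI is on PATH and that the table test from the question exists; the file names plan_insert.txt and plan_select.txt are hypothetical:

```shell
# Sketch only: assumes the `hive` CLI is installed and a table named `test` exists.
QUERY='select * from test limit 10'

# EXPLAIN form of each statement from the original question.
EXPLAIN_INSERT="explain insert overwrite local directory 'output' $QUERY;"
EXPLAIN_SELECT="explain $QUERY;"

if command -v hive >/dev/null 2>&1; then
  # Save each plan to a file (illustrative names) and show where they differ.
  hive -e "$EXPLAIN_INSERT" > plan_insert.txt
  hive -e "$EXPLAIN_SELECT" > plan_select.txt
  # diff exits non-zero when the files differ, which is the expected case here.
  diff plan_insert.txt plan_select.txt || true
else
  # No Hive available: only print the statements that would be explained.
  printf '%s\n%s\n' "$EXPLAIN_INSERT" "$EXPLAIN_SELECT"
fi
```

The first plan should contain a map-reduce stage followed by a move stage, while the second should contain only a fetch stage.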


-- 
thanks

王联辉(Lianhui Wang)
blog: http://blog.csdn.net/lance_123
Interests: databases, distributed systems, data mining, programming languages, Internet technologies, etc.