Posted to user@hive.apache.org by Jeff Zhang <zj...@gmail.com> on 2015/03/02 11:17:35 UTC

Where does hive do sampling in order by ?

Order by usually involves 2 steps (a sampling job and a repartition job), but
Hive only runs one MR job for order by, so I am wondering when and where Hive
does the sampling. On the client side?


-- 
Best Regards

Jeff Zhang

Re: Where does hive do sampling in order by ?

Posted by Xuefu Zhang <xz...@cloudera.com>.
There is no sampling for order by in Hive. Hive uses a single reducer for
order by (if you're talking about the MR execution engine).

Hive on Spark is different in this respect, though.
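
To make the contrast concrete, here is a minimal Python sketch of the two
approaches discussed in this thread: a single reducer that sorts everything,
and the two-step scheme from the question (sample the keys, then range
partition so each reducer sorts one key range). This is not Hive code. The
function names, the sample size, and the number of reducers are all
assumptions made up for illustration.

```python
import bisect
import random


def single_reducer_sort(rows):
    """One reducer receives every row and sorts them all."""
    return sorted(rows)


def pick_split_points(rows, num_reducers, sample_size, seed=0):
    """Step 1 (sampling): estimate key-range boundaries from a random sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    if not sample:
        return []
    # num_reducers - 1 roughly evenly spaced boundaries taken from the sample.
    return [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]


def sampled_total_order_sort(rows, num_reducers=4, sample_size=100, seed=0):
    """Step 2 (repartition): route rows by key range, sort each range."""
    splits = pick_split_points(rows, num_reducers, sample_size, seed)
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        # bisect_right returns an index between 0 and len(splits),
        # so it always names a valid partition.
        partitions[bisect.bisect_right(splits, row)].append(row)
    # Partitions cover increasing key ranges, so concatenating the
    # individually sorted partitions yields a globally sorted result.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result
```

Both functions return the same ordering. The difference is that the second
one spreads the sorting work over several partitions, at the cost of the
extra sampling pass that the question asks about.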

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang <zj...@gmail.com> wrote:

> Order by usually invoke 2 steps (sampling job and repartition job) but
> hive only run one mr job for order by, so wondering when and where does
> hive do sampling ? client side ?
>
>
> --
> Best Regards
>
> Jeff Zhang
>