
Posted to dev@hive.apache.org by "r7raul1984@163.com" <r7...@163.com> on 2015/04/22 10:45:41 UTC

hive on tez optimize MRR to MR?


The following query produces an MRR plan:

    select userid, count(*) from u_data group by userid order by userid

When the result of userid, count(*) is small (one reducer can process all of it), can this query plan be optimized to MR?




r7raul1984@163.com
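One way to check the plan shape is to run EXPLAIN on the query. On Tez, the output lists the vertices and the edges between them; an MRR plan typically shows a map vertex feeding two chained reducer vertices (for example "Reducer 2 <- Map 1" and "Reducer 3 <- Reducer 2"), though exact vertex names vary by Hive version. This is a minimal sketch that assumes a Hive session where Tez is available and u_data is the table from the question.

```sql
-- Assumes Tez is installed; this only affects the current session.
set hive.execution.engine=tez;

-- Print the plan instead of running the query.
-- Look at the "Edges:" section to count the reducer vertices.
explain
select userid, count(*) from u_data group by userid order by userid;
```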

Re: hive on tez optimize MRR to MR?

Posted by Gopal Vijayaraghavan <go...@apache.org>.

To prevent bad reducer merging, the reducer merging only kicks in when the
optimizer thinks it gets a perf boost.

Collapsing MRR into MR is not a big win when it comes to Tez, due to
container reuse. Going wide on the large cardinality is safer in case
map-side aggregation is missing.

If hive.map.aggr=true and the userid set fits within memory, then smushing
the reducers would be nicer.

To reset the wide-narrow checks, do

set hive.optimize.reducededuplication.min.reducer=1;

But be aware that it will fail (I've seen full disks) as you scale up
to the 10+ TB cases.
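A session-level sketch combining the conditions mentioned above: map-side aggregation on, reduce-sink deduplication on, and the minimum-reducer guard lowered so the group-by and order-by reducers become candidates for merging. Whether the optimizer actually merges them depends on the Hive version and the plan, so re-running EXPLAIN is the way to confirm that the second reducer vertex is gone. The restore value of 4 is an assumption about the default of hive.optimize.reducededuplication.min.reducer; check the HiveConf of your Hive version before relying on it.

```sql
-- Session-scoped settings; hive-site.xml is not changed.

-- Compute partial count(*) per userid inside the mappers.
set hive.map.aggr=true;

-- Allow compatible reduce stages to be merged (normally already true).
set hive.optimize.reducededuplication=true;

-- Lift the wide-narrow guard so merging is considered even for a single reducer.
set hive.optimize.reducededuplication.min.reducer=1;

-- Confirm whether the plan is now MR instead of MRR.
explain
select userid, count(*) from u_data group by userid order by userid;

-- Restore the guard before running large (multi-TB) queries in the same session.
-- 4 is assumed to be the default; verify against your Hive version.
set hive.optimize.reducededuplication.min.reducer=4;
```

Keeping the override scoped to the session, and restoring it afterwards, limits the disk-exhaustion risk described above to the queries where the grouped result is known to be small.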

Cheers,
Gopal
