Posted to user@hive.apache.org by "Raghunath, Ranjith" <Ra...@usaa.com> on 2012/05/23 02:43:07 UTC
Map side aggregations
I have the parameter hive.map.aggr set to true. However, when I look at the counters for the map tasks, I see "Combine input records 0". I am interpreting this as a failure to perform the map-side aggregation. Is that accurate? Is this option not working in Hive 0.7.1?
Thanks,
Ranjith
Re: Map side aggregations
Posted by Ranjith <ra...@gmail.com>.
Thanks, Philip.
Thanks,
Ranjith
Re: Map side aggregations
Posted by Philip Tromans <ph...@gmail.com>.
Hi Ranjith,
I haven't checked the code (so this might not be true), but I think that
map-side aggregation uses its own hash map within the map phase to do
the aggregation, instead of using a combiner, so you wouldn't expect to
see any combine input records. Have a look for parameters like
hive.groupby.mapaggr.checkinterval; the associated documentation
explains how it all works.
Cheers,
Phil.
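As a rough illustration of the behaviour Phil describes, the following Python sketch models a mapper that aggregates in its own hash map and emits partial counts, with no combiner involved. It is a toy model and not Hive's code: map_side_count, reduce_counts and max_entries are hypothetical names, and the flush-when-full rule is an assumption made only for illustration.

```python
from collections import Counter


def map_side_count(rows, max_entries=1000):
    """Toy mapper: count rows per key in an in-memory hash map.

    Yields (key, partial_count) pairs. When the map reaches max_entries
    distinct keys it is flushed (hypothetical rule), so a key may be
    emitted more than once and the reducer still has to sum the partials.
    """
    partial = Counter()
    for key in rows:
        partial[key] += 1
        if len(partial) >= max_entries:
            # Emit what we have and start over with an empty map.
            yield from partial.items()
            partial.clear()
    # Emit whatever is left at the end of the input split.
    yield from partial.items()


def reduce_counts(pairs):
    """Toy reducer: sum the partial counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


rows = ["a", "b", "a", "c", "a", "b"]
emitted = list(map_side_count(rows, max_entries=3))
print(len(rows), "rows in,", len(emitted), "records out of the mapper")
print(reduce_counts(emitted))
```

Because the aggregation happens inside the map function itself, a Hadoop-level combiner never runs in this model. That would be consistent with seeing "Combine input records 0" even when map-side aggregation is working.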
Re: Map side aggregations
Posted by Ranjith <ra...@gmail.com>.
Thanks, Matt. I am not performing a join, so does that matter? What does this local task do?
Thanks,
Ranjith
Re: Map side aggregations
Posted by "Tucker, Matt" <Ma...@disney.com>.
Try setting hive.auto.convert.join to true. The CLI will have a local task before it starts a map-reduce job on the cluster.
Matt