Posted to user@hive.apache.org by Ryan LeCompte <le...@gmail.com> on 2009/11/04 19:13:14 UTC

HIVE-878

Hello all,

Can anyone describe the implications of the following bug? Would most
general queries that use GROUP BY produce erroneous results due to
this? I know it's fixed in 0.4.1-rc2, which I've just upgraded to.

Thanks,
Ryan

    HIVE-878. Update the hash table entry before flushing in Group By
    hash aggregation (Zheng Shao via namit)

Re: HIVE-878

Posted by Zheng Shao <zs...@gmail.com>.
Hi Ryan,

This bug affects GROUP BY results only when all of the following conditions hold:
1. Map-side aggregation is turned on.
2. The number of distinct keys in the map-side aggregation is very large and
exceeds the capacity of the hash table.
3. A new key arrives and requests a new entry in the hash table at the moment
the table exceeds its capacity.
4. The hash table flushes 10% of its entries, and the new entry happens to be
among those flushed.

A given GROUP BY query may or may not satisfy all four conditions, but if it
does, the result will be incorrect.
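
The four conditions can be reproduced in a toy model. The sketch below is not Hive's code: the function name, the `update_before_flush` switch, and the choice to flush the newest entries first are all assumptions made for illustration. It only shows why the order of "update the entry" and "flush" matters for a COUNT(*) ... GROUP BY with a bounded hash table.

```python
from collections import Counter


def map_side_count(rows, capacity, update_before_flush):
    """Toy map-side COUNT(*) GROUP BY key with a bounded hash table.

    Hypothetical model for illustration only, not Hive's implementation.
    """
    table = {}           # key -> one-element list holding the partial count
    emitted = Counter()  # partial counts sent downstream, summed per key

    def flush():
        # Flush about 10% of the entries (at least one). The newest entries
        # go first here, which models the unlucky case in condition 4.
        for _ in range(max(1, len(table) // 10)):
            key, agg = table.popitem()  # removes the most recent insertion
            emitted[key] += agg[0]

    for key in rows:
        agg = table.get(key)
        is_new = agg is None
        if is_new:
            agg = [0]
            table[key] = agg

        if update_before_flush:
            # Fixed ordering: the row is counted before any flush happens.
            agg[0] += 1
            if is_new and len(table) > capacity:
                flush()
        else:
            # Buggy ordering: the new entry can be flushed while still 0,
            # so the update below lands on an object no longer in the table.
            if is_new and len(table) > capacity:
                flush()
            agg[0] += 1

    # End of input: emit whatever is still held in the table.
    for key, agg in table.items():
        emitted[key] += agg[0]
    return emitted


rows = ["a", "b", "c", "a", "d", "c"]
fixed = map_side_count(rows, capacity=2, update_before_flush=True)
buggy = map_side_count(rows, capacity=2, update_before_flush=False)
print("fixed:", dict(fixed))
print("buggy:", dict(buggy))
```

With a capacity of 2, the keys "c" and "d" each arrive when the table is full. In the fixed ordering every row is counted; in the buggy ordering those keys are reported with a count of 0 while "a" and "b" are unaffected, which matches the point that only queries meeting all four conditions return wrong results.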

Zheng

-- 
Yours,
Zheng