Posted to user@hive.apache.org by Ryan LeCompte <le...@gmail.com> on 2009/11/04 19:13:14 UTC
HIVE-878
Hello all,
Can anyone describe the implications of the following bug? Would most
general queries that use GROUP BY produce erroneous results due to
this? I know it's fixed in 0.4.1-rc2, which I've just upgraded to.
Thanks,
Ryan
HIVE-878. Update the hash table entry before flushing in Group By
hash aggregation (Zheng Shao via namit)
Re: HIVE-878
Posted by Zheng Shao <zs...@gmail.com>.
Hi Ryan,
This bug affects GROUP BY results only when all of the following conditions hold:
1. Map-side aggregation is turned on.
2. The number of distinct keys in the map-side aggregation is large enough to
exceed the capacity of the hash table.
3. A new key arrives and requests a new entry in the hash table while the
table is already at capacity.
4. The hash table flushes 10% of its entries, and the new entry happens to be
among those flushed.
Most general GROUP BY queries may or may not satisfy all four conditions, but
when a query does, its result will be incorrect.
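The four conditions above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hive's actual code: the class FlushOrderSketch, the methods countByKey and flushNewest, the capacity of 10, and the choice to always flush the newest entries (to force the unlucky case in condition 4) are all hypothetical. It models a COUNT(*) per key in which the entry for a new key is either updated before the flush (the order named in the HIVE-878 changelog line) or after it. How the lost update shows up in real Hive output may differ from this toy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlushOrderSketch {
    // Hypothetical capacity, kept tiny so the overflow is easy to trigger.
    static final int CAPACITY = 10;

    // Simulates a map-side COUNT(*) ... GROUP BY key, followed by a merge of
    // the flushed partial counts (standing in for the reduce side).
    static Map<String, Long> countByKey(List<String> keys, boolean updateBeforeFlush) {
        Map<String, long[]> table = new LinkedHashMap<>();
        Map<String, Long> merged = new TreeMap<>();
        for (String key : keys) {
            long[] entry = table.get(key);
            boolean isNew = (entry == null);
            if (isNew) {
                entry = new long[1];          // new entry starts at count 0
                table.put(key, entry);
            }
            if (updateBeforeFlush) {
                entry[0]++;                   // update first, then flush
            }
            if (isNew && table.size() > CAPACITY) {
                // Flush roughly 10% of the table (at least one entry).
                flushNewest(table, merged, Math.max(1, table.size() / 10));
            }
            if (!updateBeforeFlush) {
                // If the entry was just flushed, this increment lands on an
                // object that is no longer in the table, so the row is lost.
                entry[0]++;
            }
        }
        flushNewest(table, merged, table.size()); // final flush of everything
        return merged;
    }

    // Assumption: flush the most recently added entries, so that the new
    // entry is always among those flushed (condition 4).
    static void flushNewest(Map<String, long[]> table, Map<String, Long> merged, int n) {
        List<String> keys = new ArrayList<>(table.keySet());
        for (int i = keys.size() - 1; i >= keys.size() - n; i--) {
            String key = keys.get(i);
            merged.merge(key, table.remove(key)[0], Long::sum);
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            keys.add("k" + i);                // 12 distinct keys, one row each
        }
        System.out.println("update after flush:  " + countByKey(keys, false));
        System.out.println("update before flush: " + countByKey(keys, true));
    }
}
```

With 12 distinct keys and a capacity of 10, the two keys that arrive after the table is full lose their row when the update happens after the flush, while updating before the flush counts every row once.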
Zheng
--
Yours,
Zheng