Posted to dev@hive.apache.org by Mehant Baid <ba...@gmail.com> on 2013/10/07 21:04:43 UTC

Bug in map join optimization causing "OutOfMemory" error

Hey Folks,

We are using Hive 0.11 and are hitting java.lang.OutOfMemoryError. The
problem seems to be in CommonJoinResolver.java (processCurrentTask()).
In this function we try to convert a map-reduce join into a map join if
'n-1' of the tables involved in an 'n'-way join have a size below a
certain threshold.
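
Roughly, the decision works like the sketch below. This is illustrative
only, with made-up names; it is not the actual CommonJoinResolver code.

    // Illustrative sketch, not the actual CommonJoinResolver logic.
    // The conversion is only safe if at most one table is at or above
    // the small-table threshold; that table becomes the "big" table that
    // is streamed, while the others are loaded into in-memory hash tables.
    static int pickBigTable(long[] tableSizes, long threshold) {
        int bigTable = -1;
        for (int i = 0; i < tableSizes.length; i++) {
            if (tableSizes[i] >= threshold) {
                if (bigTable != -1) {
                    return -1;  // more than one big table: keep the map-reduce join
                }
                bigTable = i;
            }
        }
        // every table is below the threshold: any one can act as the big table
        return bigTable == -1 ? 0 : bigTable;
    }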

If the tables are managed by Hive then we have accurate sizes for each
table and can apply this optimization. But if the tables are created
using storage handlers, HBaseStorageHandler in our case, then the size
is reported as zero. Because of this we assume that we can apply the
optimization and convert the map-reduce join to a map join. We then
build an in-memory hash table for all the keys; since the table created
using the storage handler is large, it does not fit in memory and we
hit the error.
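
To make the failure concrete, here is the shape of the check with
purely illustrative numbers (the threshold stands in for the
small-table size limit, hive.mapjoin.smalltable.filesize, which I
believe defaults to roughly 25MB):

    // Illustrative only: why a reported size of zero slips through.
    long reportedSize = 0L;              // HBase-backed table: no totalSize stat
    long threshold = 25L * 1000 * 1000;  // small-table threshold (illustrative value)
    boolean treatedAsSmall = reportedSize < threshold;  // true, so the join is converted
    // The "small" table is then loaded into an in-memory hash table; since
    // the HBase table is actually large, this is where we run out of memory.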

Should I open a JIRA for this? One way to fix it is to set the size of
a table created using a storage handler to be equal to the map join
threshold. That way the table would be selected as the big table, and
we could still proceed with the optimization if the other tables in the
join have sizes below the threshold. If there were multiple such big
tables then the optimization would be turned off.
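
A sketch of what I have in mind, again with illustrative names rather
than a real patch:

    // Illustrative sketch of the proposed fix, not an actual patch.
    // A table whose size is unknown because it comes from a storage handler
    // is reported as being exactly at the threshold. It can then never pass
    // the "below threshold" test, so it can only be chosen as the big table,
    // and if two such tables appear the conversion is skipped entirely.
    static long effectiveSize(long reportedSize, boolean fromStorageHandler,
                              long threshold) {
        if (fromStorageHandler && reportedSize == 0L) {
            return threshold;
        }
        return reportedSize;
    }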

Thanks
Mehant

Re: Bug in map join optimization causing "OutOfMemory" error

Posted by Brock Noland <br...@cloudera.com>.
Hi,

Thank you for the report!

Can you open a JIRA for this issue? It sounds like a bug.

Brock


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

Re: Bug in map join optimization causing "OutOfMemory" error

Posted by Mehant Baid <ba...@gmail.com>.
Hey Folks,

Could you please take a look at the below problem. We are hitting 
OutOfMemoryErrors while joining tables that are not managed by Hive.

Would appreciate any feedback.

Thanks
Mehant