Posted to user@pig.apache.org by 李响 <wa...@gmail.com> on 2015/09/15 18:07:10 UTC

Out of memory when Pig LEFT OUTER JOIN using replicated with a large input file

Hi all,

I use the following join in my project:

JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx USING 'replicated'


After loading a large file into a2, I hit an out-of-memory error.

The Pig Latin documentation says that a replicated join loads the
right-hand table into memory on each mapper, so the join can be
computed without a reduce phase.

1. Can I resolve this by increasing the heap size? Is the relevant
property mapred.child.java.opts?
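
For example, is something like the following the right way to do it
from inside the script? (The -Xmx value here is just an illustrative
guess, not a recommendation; I have also read that newer Hadoop
versions use mapreduce.map.java.opts instead.)

-- Sketch: raise the per-task JVM heap for this script.
-- '2048m' is an arbitrary example value.
SET mapred.child.java.opts '-Xmx2048m';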

2. Since the input file keeps growing larger, increasing the heap does
not seem like a long-term solution. Can I refactor that LEFT OUTER
JOIN using other operators? Does anyone have experience with this?
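
For instance, I wonder whether simply dropping USING 'replicated', so
that Pig falls back to a regular reduce-side join, would avoid the
memory limit (the alias "joined" and the xxx/xxxx fields are just
placeholders from my snippet above):

-- Sketch: the same left outer join as a regular (reduce-side) join,
-- which does not require a2 to fit in mapper memory.
joined = JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx;

I also see a USING 'skewed' option in the docs for joins with unevenly
distributed keys, but I am not sure whether it applies here.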

3. Or any other suggestions ?

Thanks @_@

-- 

                                               李响

E-mail: waterlx@gmail.com