Posted to user@pig.apache.org by 李响 <wa...@gmail.com> on 2015/09/15 18:07:10 UTC
Out of memory when Pig LEFT OUTER JOIN using replicated with a large input file
Hi all,
I used the following in my project:
JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx USING 'replicated'
After loading a large file into a2, I hit an out-of-memory error.
The Pig Latin documentation says that a replicated join loads the right-hand
side table into memory on each mapper, allowing the join to be computed
without reducers.
1. Can I resolve this by increasing the heap size? Is the relevant setting
mapred.child.java.opts?
2. As the input file keeps growing, increasing the heap is not a long-term
solution. Can I refactor that LEFT OUTER JOIN using other operators?
Does anyone have experience with this?
3. Or any other suggestions?
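For context, the two options I am considering would look roughly like this
(a1, a2, xxx, and xxxx are placeholders from the snippet above; the heap
size is only an example value, not a recommendation):

```
-- Option 1: raise the task heap from within the Pig script.
-- mapred.child.java.opts is the pre-YARN Hadoop property name;
-- the -Xmx value here is just an example.
SET mapred.child.java.opts '-Xmx2048m';

-- Option 2: drop USING 'replicated' so Pig falls back to a
-- regular reduce-side join, which does not require holding
-- a2 entirely in each mapper's memory:
joined = JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx;
```

I am not sure which of these is the better trade-off in practice.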
Thanks @_@
--
李响
E-mail :waterlx@gmail.com