Posted to user@pig.apache.org by Rohini U <ro...@gmail.com> on 2012/03/21 20:34:00 UTC

Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi,

I have a Pig script which does a simple GROUPing followed by counting and I
get this error. My data is certainly not that big for it to cause this
out-of-memory error. Is there a chance that this is because of some bug?
Did anyone come across this kind of error before?

I am using pig 0.9.1 with hadoop 0.20.205

My script:
rawItems = LOAD 'in' AS (item1, item2, item3, type, count);

grouped = GROUP rawItems BY (item1, item2, item3, type);

counts = FOREACH grouped {
                     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
                     GENERATE
                             FLATTEN(group) AS (item1, item2, item3, type),
                             SUM(selectedFields.count) AS count;
              };

Stack Trace:

2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

Thanks
-Rohini

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks.

On Fri, Mar 23, 2012 at 12:50 PM, Rohini U <ro...@gmail.com> wrote:

> Prashant,
>
> I just added the stack trace as a comment to the opened JIRA.
>
>
> Thanks for the help.
>
> -Rohini
>
> On Fri, Mar 23, 2012 at 12:46 PM, Prashant Kommireddi
> <pr...@gmail.com>wrote:
>
> > Rohini, it's fine even if you could reply with the stacktrace here. I can
> > add it to JIRA.
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > Rohini,
> > >
> > > Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
> > >
> > > Can you please post the stacktrace as a comment to it?
> > >
> > > Thanks,
> > > Prashant
> > >
> > >
> > > On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <jcoveney@gmail.com
> > >wrote:
> > >
> > >> Rohini,
> > >>
> > >> In the meantime, something like the following should work:
> > >>
> > >> raw = LOAD 'input' using MyCustomLoader();
> > >>
> > >> searches = FOREACH raw GENERATE
> > >>               day, searchType,
> > >>               FLATTEN(impBag) AS (adType, clickCount)
> > >>           ;
> > >>
> > >> searches_2 = foreach searches generate *, ( adType == 'type1' ?
> > clickCount
> > >> : 0 ) as type1_clickCount, ( adType == 'type2' ? clickCount : 0 ) as
> > >> type2_clickCount;
> > >>
> > >> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
> > >> counts = FOREACH groupedSearches {
> > >>                GENERATE
> > >>                   FLATTEN(group) AS (day, searchType),
> > >>                   COUNT(searches_2) AS numSearches,
> > >>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
> > >>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
> > >>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
> > >>       };
> > >>
> > >> 2012/3/22 Rohini U <ro...@gmail.com>
> > >>
> > >> > Thanks Prashant,
> > >> > I am using Pig 0.9.1 and hadoop 0.20.205
> > >> >
> > >> > Thanks,
> > >> > Rohini
> > >> >
> > >> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <
> > >> prash1784@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > This makes more sense, grouping and filter are on different
> > columns. I
> > >> > will
> > >> > > open a JIRA soon.
> > >> > >
> > >> > > What version of Pig and Hadoop are you using?
> > >> > >
> > >> > > Thanks,
> > >> > > Prashant
> > >> > >
> > >> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > Hi Prashant,
> > >> > > >
> > >> > > > Here is my script in full.
> > >> > > >
> > >> > > >
> > >> > > > raw = LOAD 'input' using MyCustomLoader();
> > >> > > >
> > >> > > > searches = FOREACH raw GENERATE
> > >> > > >                day, searchType,
> > >> > > >                FLATTEN(impBag) AS (adType, clickCount)
> > >> > > >            ;
> > >> > > >
> > >> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL
> 50;
> > >> > > > counts = FOREACH groupedSearches{
> > >> > > >                type1 = FILTER searches BY adType == 'type1';
> > >> > > >                type2 = FILTER searches BY adType == 'type2';
> > >> > > >                GENERATE
> > >> > > >                    FLATTEN(group) AS (day, searchType),
> > >> > > >                    COUNT(searches) numSearches,
> > >> > > >                    SUM(clickCount) AS clickCountPerSearchType,
> > >> > > >                    SUM(type1.clickCount) AS type1ClickCount,
> > >> > > >                    SUM(type2.clickCount) AS type2ClickCount;
> > >> > > >        }
> > >> > > > ;
> > >> > > >
> > >> > > > As you can see above, I am counting the counts by the day and
> > search
> > >> > type
> > >> > > > in clickCountPerSearchType and for each of them i need the
> counts
> > >> > broken
> > >> > > by
> > >> > > > the ad type.
> > >> > > >
> > >> > > > Thanks for your help!
> > >> > > > Thanks,
> > >> > > > Rohini
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> > >> > > > <pr...@gmail.com>wrote:
> > >> > > >
> > >> > > > > Hi Rohini,
> > >> > > > >
> > >> > > > > From your query it looks like you are already grouping it by
> > >> TYPE, so
> > >> > > not
> > >> > > > > sure why you would want the SUM of, say "EMPLOYER" type in
> > >> "LOCATION"
> > >> > > and
> > >> > > > > vice-versa. Your output is already broken down by TYPE.
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Prashant
> > >> > > > >
> > >> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <rohini.u@gmail.com
> >
> > >> > wrote:
> > >> > > > >
> > >> > > > > > Thanks for the suggestion Prashant. However, that will not
> > work
> > >> in
> > >> > my
> > >> > > > > case.
> > >> > > > > >
> > >> > > > > > If I filter before the group and include the new field in
> > group
> > >> as
> > >> > > you
> > >> > > > > > suggested, I get the individual counts broken by the select
> > >> field
> > >> > > > > > criteria. However, I want the totals also without taking
> the
> > >> > select
> > >> > > > > fields
> > >> > > > > > into account. That is why I took the approach I described in
> > my
> > >> > > earlier
> > >> > > > > > emails.
> > >> > > > > >
> > >> > > > > > Thanks
> > >> > > > > > Rohini
> > >> > > > > >
> > >> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > >> > > > > prash1784@gmail.com
> > >> > > > > > >wrote:
> > >> > > > > >
> > >> > > > > > > Please pull your FILTER out of GROUP BY and do it earlier
> > >> > > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > >> > > > > > >
> > >> > > > > > > In this case, you could use a FILTER followed by a bincond
> > to
> > >> > > > > introduce a
> > >> > > > > > > new field "employerOrLocation", then do a group by and
> > include
> > >> > the
> > >> > > > new
> > >> > > > > > > field in the GROUP BY clause.
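
(For illustration only: a minimal sketch of this suggestion, using the schema from the original post; the wanted, tagged, groupedTagged and taggedCounts aliases are invented here.)

wanted = FILTER rawItems BY (type == 'EMPLOYER') OR (type == 'LOCATION');
tagged = FOREACH wanted GENERATE item1, item2, item3, type, count,
             (type == 'EMPLOYER' ? 'EMPLOYER' : 'LOCATION') AS employerOrLocation;
groupedTagged = GROUP tagged BY (item1, item2, item3, type, employerOrLocation);
taggedCounts = FOREACH groupedTagged GENERATE
                   FLATTEN(group), SUM(tagged.count) AS count;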
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Prashant
> > >> > > > > > >
> > >> > > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <
> > rohini.u@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > My input data size is 9GB and I am using 20 machines.
> > >> > > > > > > >
> > >> > > > > > > > My grouped criteria has two cases so I want 1) counts by
> > the
> > >> > > > > criteria I
> > >> > > > > > > > have grouped 2) counts of the two individual cases in
> each
> > >> of my
> > >> > > > > group.
> > >> > > > > > > >
> > >> > > > > > > > So my script in detail is:
> > >> > > > > > > >
> > >> > > > > > > > counts = FOREACH grouped {
> > >> > > > > > > >                     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
> > >> > > > > > > >                     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
> > >> > > > > > > >                      GENERATE
> > >> > > > > > > >                             FLATTEN(group) AS (item1, item2, item3, type),
> > >> > > > > > > >                             SUM(selectedFields1.count) AS selectFields1Count,
> > >> > > > > > > >                             SUM(selectedFields2.count) AS selectFields2Count,
> > >> > > > > > > >                             COUNT(rawItems) AS groupCriteriaCount;
> > >> > > > > > > >
> > >> > > > > > > >              };
> > >> > > > > > > >
> > >> > > > > > > > Is there a way to do this?
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> > >> > > > dvryaboy@gmail.com>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > you are not doing grouping followed by counting. You
> are
> > >> > doing
> > >> > > > > > grouping
> > >> > > > > > > > > followed by filtering followed by counting.
> > >> > > > > > > > > Try filtering before grouping.
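
(For illustration only: with the schema from the original post, filtering before grouping could look roughly like this; the employerOnly, groupedEmployer and employerCounts aliases are invented here.)

employerOnly = FILTER rawItems BY type == 'EMPLOYER';
groupedEmployer = GROUP employerOnly BY (item1, item2, item3, type);
employerCounts = FOREACH groupedEmployer GENERATE
                     FLATTEN(group) AS (item1, item2, item3, type),
                     SUM(employerOnly.count) AS count;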
> > >> > > > > > > > >
> > >> > > > > > > > > D
> > >> > > > > > > > >
> > >> > > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
> > >> > rohini.u@gmail.com
> > >> > > >
> > >> > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hi,
> > >> > > > > > > > > >
> > >> > > > > > > > > > I have a pig script which does a simple GROUPing
> > >> followed
> > >> > by
> > >> > > > > > counting
> > >> > > > > > > > and
> > >> > > > > > > > > I
> > >> > > > > > > > > > get this error.  My data is certainly not that big
> > for
> > >> it
> > >> > to
> > >> > > > > cause
> > >> > > > > > > > this
> > >> > > > > > > > > > out of memory error. Is there a chance that this is
> > >> because
> > >> > > of
> > >> > > > > some
> > >> > > > > > > > bug ?
> > >> > > > > > > > > > Did any one come across this kind of error before?
> > >> > > > > > > > > >
> > >> > > > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > >> > > > > > > > > >
> > >> > > > > > > > > > My script:
> > >> > > > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type,
> > >> count);
> > >> > > > > > > > > >
> > >> > > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3,
> > type);
> > >> > > > > > > > > >
> > >> > > > > > > > > > counts = FOREACH grouped {
> > >> > > > > > > > > >                     selectedFields = FILTER rawItems BY type == 'EMPLOYER';
> > >> > > > > > > > > >                     GENERATE
> > >> > > > > > > > > >                             FLATTEN(group) AS (item1, item2, item3, type),
> > >> > > > > > > > > >                             SUM(selectedFields.count) AS count;
> > >> > > > > > > > > >              };
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks
> > >> > > > > > > > > > -Rohini
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Regards
> > >> > > > > > > > -Rohini
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > **
> > >> > > > > > > > People of accomplishment rarely sat back & let things
> > >> happen to
> > >> > > > them.
> > >> > > > > > > They
> > >> > > > > > > > went out & happened to things - Leonardo Da Vinci
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Regards
> > >> > -Rohini
> > >> >
> > >> > --
> > >> > **
> > >> > People of accomplishment rarely sat back & let things happen to
> them.
> > >> They
> > >> > went out & happened to things - Leonardo Da Vinci
> > >> >
> > >>
> > >
> > >
> >
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
Prashant,

I just added the stack trace as a comment to the opened JIRA.


Thanks for the help.

-Rohini

On Fri, Mar 23, 2012 at 12:46 PM, Prashant Kommireddi
<pr...@gmail.com>wrote:

> Rohini, it's fine even if you could reply with the stacktrace here. I can
> add it to JIRA.
>
> Thanks,
> Prashant
>
> On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <prash1784@gmail.com
> >wrote:
>
> > Rohini,
> >
> > Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
> >
> > Can you please post the stacktrace as a comment to it?
> >
> > Thanks,
> > Prashant
> >
> >
> > On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <jcoveney@gmail.com
> >wrote:
> >
> >> Rohini,
> >>
> >> In the meantime, something like the following should work:
> >>
> >> raw = LOAD 'input' using MyCustomLoader();
> >>
> >> searches = FOREACH raw GENERATE
> >>               day, searchType,
> >>               FLATTEN(impBag) AS (adType, clickCount)
> >>           ;
> >>
> >> searches_2 = foreach searches generate *, ( adType == 'type1' ?
> clickCount
> >> : 0 ) as type1_clickCount, ( adType == 'type2' ? clickCount : 0 ) as
> >> type2_clickCount;
> >>
> >> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
> >> counts = FOREACH groupedSearches {
> >>                GENERATE
> >>                   FLATTEN(group) AS (day, searchType),
> >>                   COUNT(searches_2) AS numSearches,
> >>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
> >>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
> >>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
> >>       };
> >>
> >> 2012/3/22 Rohini U <ro...@gmail.com>
> >>
> >> > Thanks Prashant,
> >> > I am using Pig 0.9.1 and hadoop 0.20.205
> >> >
> >> > Thanks,
> >> > Rohini
> >> >
> >> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <
> >> prash1784@gmail.com
> >> > >wrote:
> >> >
> >> > > This makes more sense, grouping and filter are on different
> columns. I
> >> > will
> >> > > open a JIRA soon.
> >> > >
> >> > > What version of Pig and Hadoop are you using?
> >> > >
> >> > > Thanks,
> >> > > Prashant
> >> > >
> >> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com>
> wrote:
> >> > >
> >> > > > Hi Prashant,
> >> > > >
> >> > > > Here is my script in full.
> >> > > >
> >> > > >
> >> > > > raw = LOAD 'input' using MyCustomLoader();
> >> > > >
> >> > > > searches = FOREACH raw GENERATE
> >> > > >                day, searchType,
> >> > > >                FLATTEN(impBag) AS (adType, clickCount)
> >> > > >            ;
> >> > > >
> >> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> >> > > > counts = FOREACH groupedSearches{
> >> > > >                type1 = FILTER searches BY adType == 'type1';
> >> > > >                type2 = FILTER searches BY adType == 'type2';
> >> > > >                GENERATE
> >> > > >                    FLATTEN(group) AS (day, searchType),
> >> > > >                    COUNT(searches) numSearches,
> >> > > >                    SUM(clickCount) AS clickCountPerSearchType,
> >> > > >                    SUM(type1.clickCount) AS type1ClickCount,
> >> > > >                    SUM(type2.clickCount) AS type2ClickCount;
> >> > > >        }
> >> > > > ;
> >> > > >
> >> > > > As you can see above, I am counting the counts by the day and
> search
> >> > type
> >> > > > in clickCountPerSearchType and for each of them i need the counts
> >> > broken
> >> > > by
> >> > > > the ad type.
> >> > > >
> >> > > > Thanks for your help!
> >> > > > Thanks,
> >> > > > Rohini
> >> > > >
> >> > > >
> >> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> >> > > > <pr...@gmail.com>wrote:
> >> > > >
> >> > > > > Hi Rohini,
> >> > > > >
> >> > > > > From your query it looks like you are already grouping it by
> >> TYPE, so
> >> > > not
> >> > > > > sure why you would want the SUM of, say "EMPLOYER" type in
> >> "LOCATION"
> >> > > and
> >> > > > > vice-versa. Your output is already broken down by TYPE.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Prashant
> >> > > > >
> >> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Thanks for the suggestion Prashant. However, that will not
> work
> >> in
> >> > my
> >> > > > > case.
> >> > > > > >
> >> > > > > > If I filter before the group and include the new field in
> group
> >> as
> >> > > you
> >> > > > > > suggested, I get the individual counts broken by the select
> >> field
> >> > > > > > criteria. However, I want the totals also without taking the
> >> > select
> >> > > > > fields
> >> > > > > > into account. That is why I took the approach I described in
> my
> >> > > earlier
> >> > > > > > emails.
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > > Rohini
> >> > > > > >
> >> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> >> > > > > prash1784@gmail.com
> >> > > > > > >wrote:
> >> > > > > >
> >> > > > > > > Please pull your FILTER out of GROUP BY and do it earlier
> >> > > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >> > > > > > >
> >> > > > > > > In this case, you could use a FILTER followed by a bincond
> to
> >> > > > > introduce a
> >> > > > > > > new field "employerOrLocation", then do a group by and
> include
> >> > the
> >> > > > new
> >> > > > > > > field in the GROUP BY clause.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Prashant
> >> > > > > > >
> >> > > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <
> rohini.u@gmail.com
> >> >
> >> > > > wrote:
> >> > > > > > >
> >> > > > > > > > My input data size is 9GB and I am using 20 machines.
> >> > > > > > > >
> >> > > > > > > > My grouped criteria has two cases so I want 1) counts by
> the
> >> > > > > criteria I
> >> > > > > > > > have grouped 2) counts of the two individual cases in each
> >> of my
> >> > > > > group.
> >> > > > > > > >
> >> > > > > > > > So my script in detail is:
> >> > > > > > > >
> >> > > > > > > > counts = FOREACH grouped {
> >> > > > > > > >                     selectedFields1 = FILTER rawItems  BY
> >> > > > > > > type="EMPLOYER";
> >> > > > > > > >                   selectedFields2 = FILTER rawItems  BY
> >> > > > > > type="LOCATION";
> >> > > > > > > >                      GENERATE
> >> > > > > > > >                             FLATTEN(group) as (item1,
> item2,
> >> > > item3,
> >> > > > > > > type) ,
> >> > > > > > > >                               SUM(selectedFields1.count)
> as
> >> > > > > > > > selectFields1Count,
> >> > > > > > > >                              SUM(selectedFields2.count) as
> >> > > > > > > > selectFields2Count,
> >> > > > > > > >                             COUNT(rawItems) as
> >> > groupCriteriaCount
> >> > > > > > > >
> >> > > > > > > >              }
> >> > > > > > > >
> >> > > > > > > > Is there a way to do this?
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> >> > > > dvryaboy@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > you are not doing grouping followed by counting. You are
> >> > doing
> >> > > > > > grouping
> >> > > > > > > > > followed by filtering followed by counting.
> >> > > > > > > > > Try filtering before grouping.
> >> > > > > > > > >
> >> > > > > > > > > D
> >> > > > > > > > >
> >> > > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
> >> > rohini.u@gmail.com
> >> > > >
> >> > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi,
> >> > > > > > > > > >
> >> > > > > > > > > > I have a pig script which does a simple GROUPing
> >> followed
> >> > by
> >> > > > > > counting
> >> > > > > > > > and
> >> > > > > > > > > I
> >> > > > > > > > > > get this error.  My data is certainly not that big
> for
> >> it
> >> > to
> >> > > > > cause
> >> > > > > > > > this
> >> > > > > > > > > > out of memory error. Is there a chance that this is
> >> because
> >> > > of
> >> > > > > some
> >> > > > > > > > bug ?
> >> > > > > > > > > > Did any one come across this kind of error before?
> >> > > > > > > > > >
> >> > > > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> >> > > > > > > > > >
> >> > > > > > > > > > My script:
> >> > > > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type,
> >> count);
> >> > > > > > > > > >
> >> > > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3,
> type);
> >> > > > > > > > > >
> >> > > > > > > > > > counts = FOREACH grouped {
> >> > > > > > > > > >                     selectedFields = FILTER rawItems
>  BY
> >> > > > > > > > type="EMPLOYER";
> >> > > > > > > > > >                     GENERATE
> >> > > > > > > > > >                             FLATTEN(group) as (item1,
> >> > item2,
> >> > > > > item3,
> >> > > > > > > > > type) ,
> >> > > > > > > > > >                              SUM(selectedFields.count)
> >> as
> >> > > count
> >> > > > > > > > > >
> >> > > > > > > > > >              }
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks
> >> > > > > > > > > > -Rohini
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Regards
> >> > > > > > > > -Rohini
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > **
> >> > > > > > > > People of accomplishment rarely sat back & let things
> >> happen to
> >> > > > them.
> >> > > > > > > They
> >> > > > > > > > went out & happened to things - Leonardo Da Vinci
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > -Rohini
> >> >
> >> > --
> >> > **
> >> > People of accomplishment rarely sat back & let things happen to them.
> >> They
> >> > went out & happened to things - Leonardo Da Vinci
> >> >
> >>
> >
> >
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Rohini, it's fine even if you could reply with the stacktrace here. I can
add it to JIRA.

Thanks,
Prashant

On Thu, Mar 22, 2012 at 7:10 PM, Prashant Kommireddi <pr...@gmail.com>wrote:

> Rohini,
>
> Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610
>
> Can you please post the stacktrace as a comment to it?
>
> Thanks,
> Prashant
>
>
> On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <jc...@gmail.com>wrote:
>
>> Rohini,
>>
>> In the meantime, something like the following should work:
>>
>> raw = LOAD 'input' using MyCustomLoader();
>>
>> searches = FOREACH raw GENERATE
>>               day, searchType,
>>               FLATTEN(impBag) AS (adType, clickCount)
>>           ;
>>
>> searches_2 = foreach searches generate *, ( adType == 'type1' ? clickCount
>> : 0 ) as type1_clickCount, ( adType == 'type2' ? clickCount : 0 ) as
>> type2_clickCount;
>>
>> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
>> counts = FOREACH groupedSearches {
>>                GENERATE
>>                   FLATTEN(group) AS (day, searchType),
>>                   COUNT(searches_2) AS numSearches,
>>                   SUM(searches_2.clickCount) AS clickCountPerSearchType,
>>                   SUM(searches_2.type1_clickCount) AS type1ClickCount,
>>                   SUM(searches_2.type2_clickCount) AS type2ClickCount;
>>       };
>>
>> 2012/3/22 Rohini U <ro...@gmail.com>
>>
>> > Thanks Prashant,
>> > I am using Pig 0.9.1 and hadoop 0.20.205
>> >
>> > Thanks,
>> > Rohini
>> >
>> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <
>> prash1784@gmail.com
>> > >wrote:
>> >
>> > > This makes more sense, grouping and filter are on different columns. I
>> > will
>> > > open a JIRA soon.
>> > >
>> > > What version of Pig and Hadoop are you using?
>> > >
>> > > Thanks,
>> > > Prashant
>> > >
>> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com> wrote:
>> > >
>> > > > Hi Prashant,
>> > > >
>> > > > Here is my script in full.
>> > > >
>> > > >
>> > > > raw = LOAD 'input' using MyCustomLoader();
>> > > >
>> > > > searches = FOREACH raw GENERATE
>> > > >                day, searchType,
>> > > >                FLATTEN(impBag) AS (adType, clickCount)
>> > > >            ;
>> > > >
>> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
>> > > > counts = FOREACH groupedSearches{
>> > > >                type1 = FILTER searches BY adType == 'type1';
>> > > >                type2 = FILTER searches BY adType == 'type2';
>> > > >                GENERATE
>> > > >                    FLATTEN(group) AS (day, searchType),
>> > > >                    COUNT(searches) numSearches,
>> > > >                    SUM(clickCount) AS clickCountPerSearchType,
>> > > >                    SUM(type1.clickCount) AS type1ClickCount,
>> > > >                    SUM(type2.clickCount) AS type2ClickCount;
>> > > >        }
>> > > > ;
>> > > >
>> > > > As you can see above, I am counting the counts by the day and search
>> > type
>> > > > in clickCountPerSearchType and for each of them i need the counts
>> > broken
>> > > by
>> > > > the ad type.
>> > > >
>> > > > Thanks for your help!
>> > > > Thanks,
>> > > > Rohini
>> > > >
>> > > >
>> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
>> > > > <pr...@gmail.com>wrote:
>> > > >
>> > > > > Hi Rohini,
>> > > > >
>> > > > > From your query it looks like you are already grouping it by
>> TYPE, so
>> > > not
>> > > > > sure why you would want the SUM of, say "EMPLOYER" type in
>> "LOCATION"
>> > > and
>> > > > > vice-versa. Your output is already broken down by TYPE.
>> > > > >
>> > > > > Thanks,
>> > > > > Prashant
>> > > > >
>> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Thanks for the suggestion Prashant. However, that will not work
>> in
>> > my
>> > > > > case.
>> > > > > >
>> > > > > > If I filter before the group and include the new field in group
>> as
>> > > you
>> > > > > > suggested, I get the individual counts broken by the select
>> field
>> > > > > > criteria. However, I want the totals also without taking the
>> > select
>> > > > > fields
>> > > > > > into account. That is why I took the approach I described in my
>> > > earlier
>> > > > > > emails.
>> > > > > >
>> > > > > > Thanks
>> > > > > > Rohini
>> > > > > >
>> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
>> > > > > prash1784@gmail.com
>> > > > > > >wrote:
>> > > > > >
>> > > > > > > Please pull your FILTER out of GROUP BY and do it earlier
>> > > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
>> > > > > > >
>> > > > > > > In this case, you could use a FILTER followed by a bincond to
>> > > > > introduce a
>> > > > > > > new field "employerOrLocation", then do a group by and include
>> > the
>> > > > new
>> > > > > > > field in the GROUP BY clause.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Prashant
>> > > > > > >
>> > > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <rohini.u@gmail.com
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > My input data size is 9GB and I am using 20 machines.
>> > > > > > > >
>> > > > > > > > My grouped criteria has two cases so I want 1) counts by the
>> > > > > criteria I
>> > > > > > > > have grouped 2) counts of the two individual cases in each
>> of my
>> > > > > group.
>> > > > > > > >
>> > > > > > > > So my script in detail is:
>> > > > > > > >
>> > > > > > > > counts = FOREACH grouped {
>> > > > > > > >                     selectedFields1 = FILTER rawItems  BY
>> > > > > > > type="EMPLOYER";
>> > > > > > > >                   selectedFields2 = FILTER rawItems  BY
>> > > > > > type="LOCATION";
>> > > > > > > >                      GENERATE
>> > > > > > > >                             FLATTEN(group) as (item1, item2,
>> > > item3,
>> > > > > > > type) ,
>> > > > > > > >                               SUM(selectedFields1.count) as
>> > > > > > > > selectFields1Count,
>> > > > > > > >                              SUM(selectedFields2.count) as
>> > > > > > > > selectFields2Count,
>> > > > > > > >                             COUNT(rawItems) as
>> > groupCriteriaCount
>> > > > > > > >
>> > > > > > > >              }
>> > > > > > > >
>> > > > > > > > Is there a way to do this?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
>> > > > dvryaboy@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > you are not doing grouping followed by counting. You are
>> > doing
>> > > > > > grouping
>> > > > > > > > > followed by filtering followed by counting.
>> > > > > > > > > Try filtering before grouping.
>> > > > > > > > >
>> > > > > > > > > D
>> > > > > > > > >
>> > > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
>> > rohini.u@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi,
>> > > > > > > > > >
>> > > > > > > > > > I have a pig script which does a simple GROUPing
>> followed
>> > by
>> > > > > > counting
>> > > > > > > > and
>> > > > > > > > > I
>> > > > > > > > > > get this error.  My data is certainly not that big for
>> it
>> > to
>> > > > > cause
>> > > > > > > > this
>> > > > > > > > > > out of memory error. Is there a chance that this is
>> because
>> > > of
>> > > > > some
>> > > > > > > > bug ?
>> > > > > > > > > > Did any one come across this kind of error before?
>> > > > > > > > > >
>> > > > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
>> > > > > > > > > >
>> > > > > > > > > > My script:
>> > > > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type,
>> count);
>> > > > > > > > > >
>> > > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
>> > > > > > > > > >
>> > > > > > > > > > counts = FOREACH grouped {
>> > > > > > > > > >                     selectedFields = FILTER rawItems  BY
>> > > > > > > > type="EMPLOYER";
>> > > > > > > > > >                     GENERATE
>> > > > > > > > > >                             FLATTEN(group) as (item1,
>> > item2,
>> > > > > item3,
>> > > > > > > > > type) ,
>> > > > > > > > > >                              SUM(selectedFields.count)
>> as
>> > > count
>> > > > > > > > > >
>> > > > > > > > > >              }
>> > > > > > > > > >
>> > > > > > > > > > Stack Trace:
>> > > > > > > > > >
>> > > > > > > > > > 2012-03-21 19:19:59,346 FATAL
>> > org.apache.hadoop.mapred.Child
>> > > > > > (main):
>> > > > > > > > > Error
>> > > > > > > > > > running child : java.lang.OutOfMemoryError: GC overhead
>> > limit
>> > > > > > > exceeded
>> > > > > > > > > >        at
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
>> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
>> > > > > > > > > >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>> > > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
>> > > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
>> > > > > > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> > > > > > > > > >        at java.security.AccessController.doPrivileged(Native Method)
>> > > > > > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
>> > > > > > > > > >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> > > > > > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> > > > > > > > > >
>> > > > > > > > > > Thanks
>> > > > > > > > > > -Rohini
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Regards
>> > > > > > > > -Rohini
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > **
>> > > > > > > > People of accomplishment rarely sat back & let things
>> happen to
>> > > > them.
>> > > > > > > They
>> > > > > > > > went out & happened to things - Leonardo Da Vinci
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Regards
>> > -Rohini
>> >
>> > --
>> > **
>> > People of accomplishment rarely sat back & let things happen to them.
>> They
>> > went out & happened to things - Leonardo Da Vinci
>> >
>>
>
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Rohini,

Here is the JIRA. https://issues.apache.org/jira/browse/PIG-2610

Can you please post the stacktrace as a comment to it?

Thanks,
Prashant

On Thu, Mar 22, 2012 at 2:37 PM, Jonathan Coveney <jc...@gmail.com>wrote:

> Rohini,
>
> In the meantime, something like the following should work:
>
> raw = LOAD 'input' using MyCustomLoader();
>
> searches = FOREACH raw GENERATE
>               day, searchType,
>               FLATTEN(impBag) AS (adType, clickCount)
>           ;
>
> searches_2 = foreach searches generate *, ( adType == 'type1' ? clickCount
> : 0 ) as type1_clickCount, ( adType == 'type2' ? clickCount : 0 ) as
> type2_clickCount;
>
> groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                GENERATE
>                   FLATTEN(group) AS (day, searchType),
>                    COUNT(searches_2) AS numSearches,
>                    SUM(searches_2.clickCount) AS clickCountPerSearchType,
>                    SUM(searches_2.type1_clickCount) AS type1ClickCount,
>                    SUM(searches_2.type2_clickCount) AS type2ClickCount;
>       }
> ;
>
> 2012/3/22 Rohini U <ro...@gmail.com>
>
> > Thanks Prashant,
> > I am using Pig 0.9.1 and hadoop 0.20.205
> >
> > Thanks,
> > Rohini
> >
> > On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > This makes more sense, grouping and filter are on different columns. I
> > will
> > > open a JIRA soon.
> > >
> > > What version of Pig and Hadoop are you using?
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > Hi Prashant,
> > > >
> > > > Here is my script in full.
> > > >
> > > >
> > > > raw = LOAD 'input' using MyCustomLoader();
> > > >
> > > > searches = FOREACH raw GENERATE
> > > >                day, searchType,
> > > >                FLATTEN(impBag) AS (adType, clickCount)
> > > >            ;
> > > >
> > > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> > > > counts = FOREACH groupedSearches{
> > > >                type1 = FILTER searches BY adType == 'type1';
> > > >                type2 = FILTER searches BY adType == 'type2';
> > > >                GENERATE
> > > >                    FLATTEN(group) AS (day, searchType),
> > > >                    COUNT(searches) numSearches,
> > > >                    SUM(clickCount) AS clickCountPerSearchType,
> > > >                    SUM(type1.clickCount) AS type1ClickCount,
> > > >                    SUM(type2.clickCount) AS type2ClickCount;
> > > >        }
> > > > ;
> > > >
> > > > As you can see above, I am counting the counts by the day and search
> > type
> > > > in clickCountPerSearchType and for each of them i need the counts
> > broken
> > > by
> > > > the ad type.
> > > >
> > > > Thanks for your help!
> > > > Thanks,
> > > > Rohini
> > > >
> > > >
> > > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> > > > <pr...@gmail.com>wrote:
> > > >
> > > > > Hi Rohini,
> > > > >
> > > > > From your query it looks like you are already grouping it by TYPE,
> so
> > > not
> > > > > sure why you would want the SUM of, say "EMPLOYER" type in
> "LOCATION"
> > > and
> > > > > vice-versa. Your output is already broken down by TYPE.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
> > > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com>
> > wrote:
> > > > >
> > > > > > Thanks for the suggestion Prashant. However, that will not work
> in
> > my
> > > > > case.
> > > > > >
> > > > > > If I filter before the group and include the new field in group
> as
> > > you
> > > > > > suggested, I get the individual counts broken by the select field
> > > > > > critieria. However, I want the totals also without taking the
> > select
> > > > > fields
> > > > > > into account. That is why I took the approach I described in my
> > > earlier
> > > > > > emails.
> > > > > >
> > > > > > Thanks
> > > > > > Rohini
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > > > > prash1784@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > > > > >
> > > > > > > In this case, you could use a FILTER followed by a bincond to
> > > > > introduce a
> > > > > > > new field "employerOrLocation", then do a group by and include
> > the
> > > > new
> > > > > > > field in the GROUP BY clause.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prashant
> > > > > > >
> > > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > My input data size is 9GB and I am using 20 machines.
> > > > > > > >
> > > > > > > > My grouped criteria has two cases so I want 1) counts by the
> > > > > criteria I
> > > > > > > > have grouped 2) counts of the two inviduals cases in each of
> my
> > > > > group.
> > > > > > > >
> > > > > > > > So my script in detail is:
> > > > > > > >
> > > > > > > > counts = FOREACH grouped {
> > > > > > > >                     selectedFields1 = FILTER rawItems  BY
> > > > > > > type="EMPLOYER";
> > > > > > > >                   selectedFields2 = FILTER rawItems  BY
> > > > > > type="LOCATION";
> > > > > > > >                      GENERATE
> > > > > > > >                             FLATTEN(group) as (item1, item2,
> > > item3,
> > > > > > > type) ,
> > > > > > > >                               SUM(selectedFields1.count) as
> > > > > > > > selectFields1Count,
> > > > > > > >                              SUM(selectedFields2.count) as
> > > > > > > > selectFields2Count,
> > > > > > > >                             COUNT(rawItems) as
> > groupCriteriaCount
> > > > > > > >
> > > > > > > >              }
> > > > > > > >
> > > > > > > > Is there a way way to do this?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> > > > dvryaboy@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > you are not doing grouping followed by counting. You are
> > doing
> > > > > > grouping
> > > > > > > > > followed by filtering followed by counting.
> > > > > > > > > Try filtering before grouping.
> > > > > > > > >
> > > > > > > > > D
> > > > > > > > >
> > > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
> > rohini.u@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I have a pig script which does a simple GROUPing followed
> > by
> > > > > > couting
> > > > > > > > and
> > > > > > > > > I
> > > > > > > > > > get this error.  My data is certaining not that big for
> it
> > to
> > > > > cause
> > > > > > > > this
> > > > > > > > > > out of memory error. Is there a chance that this is
> because
> > > of
> > > > > some
> > > > > > > > bug ?
> > > > > > > > > > Did any one come across this kind of error before?
> > > > > > > > > >
> > > > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > > > > > >
> > > > > > > > > > My script:
> > > > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type,
> count);
> > > > > > > > > >
> > > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > > > > > >
> > > > > > > > > > counts = FOREACH grouped {
> > > > > > > > > >                     selectedFields = FILTER rawItems  BY
> > > > > > > > type="EMPLOYER";
> > > > > > > > > >                     GENERATE
> > > > > > > > > >                             FLATTEN(group) as (item1,
> > item2,
> > > > > item3,
> > > > > > > > > type) ,
> > > > > > > > > >                              SUM(selectedFields.count) as
> > > count
> > > > > > > > > >
> > > > > > > > > >              }
> > > > > > > > > >
> > > > > > > > > > Stack Trace:
> > > > > > > > > >
> > > > > > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> > > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> > > > > > > > > >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> > > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> > > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> > > > > > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > > > > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > > > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > > > > > > > > >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > > > > > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > -Rohini
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards
> > > > > > > > -Rohini
> > > > > > > >
> > > > > > > > --
> > > > > > > > **
> > > > > > > > People of accomplishment rarely sat back & let things happen
> to
> > > > them.
> > > > > > > They
> > > > > > > > went out & happened to things - Leonardo Da Vinci
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards
> > -Rohini
> >
> > --
> > **
> > People of accomplishment rarely sat back & let things happen to them.
> They
> > went out & happened to things - Leonardo Da Vinci
> >
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Jonathan Coveney <jc...@gmail.com>.
Rohini,

In the meantime, something like the following should work:

raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount)
           ;

searches_2 = foreach searches generate *, ( adType == 'type1' ? clickCount
: 0 ) as type1_clickCount, ( adType == 'type2' ? clickCount : 0 ) as
type2_clickCount;

groupedSearches = GROUP searches_2 BY (day, searchType) PARALLEL 50;
counts = FOREACH groupedSearches{
               GENERATE
                   FLATTEN(group) AS (day, searchType),
                    COUNT(searches_2) AS numSearches,
                    SUM(searches_2.clickCount) AS clickCountPerSearchType,
                    SUM(searches_2.type1_clickCount) AS type1ClickCount,
                    SUM(searches_2.type2_clickCount) AS type2ClickCount;
       }
;
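
The same rewrite should also cover the earlier rawItems/EMPLOYER/LOCATION case. A rough, untested sketch (the typed schema and the employer_count/location_count names are only illustrative, not from the original script; type is dropped from the grouping key since it is folded into the new columns):

rawItems = LOAD 'in' AS (item1:chararray, item2:chararray, item3:chararray,
                         type:chararray, count:long);

-- precompute one column per type, instead of FILTERing inside the FOREACH
rawItems2 = FOREACH rawItems GENERATE *,
                (type == 'EMPLOYER' ? count : 0L) AS employer_count,
                (type == 'LOCATION' ? count : 0L) AS location_count;

grouped = GROUP rawItems2 BY (item1, item2, item3);
counts = FOREACH grouped GENERATE
                FLATTEN(group) AS (item1, item2, item3),
                COUNT(rawItems2) AS groupCriteriaCount,
                SUM(rawItems2.employer_count) AS selectFields1Count,
                SUM(rawItems2.location_count) AS selectFields2Count;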

2012/3/22 Rohini U <ro...@gmail.com>

> Thanks Prashant,
> I am using Pig 0.9.1 and hadoop 0.20.205
>
> Thanks,
> Rohini
>
> On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <prash1784@gmail.com
> >wrote:
>
> > This makes more sense, grouping and filter are on different columns. I
> will
> > open a JIRA soon.
> >
> > What version of Pig and Hadoop are you using?
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com> wrote:
> >
> > > Hi Prashant,
> > >
> > > Here is my script in full.
> > >
> > >
> > > raw = LOAD 'input' using MyCustomLoader();
> > >
> > > searches = FOREACH raw GENERATE
> > >                day, searchType,
> > >                FLATTEN(impBag) AS (adType, clickCount)
> > >            ;
> > >
> > > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> > > counts = FOREACH groupedSearches{
> > >                type1 = FILTER searches BY adType == 'type1';
> > >                type2 = FILTER searches BY adType == 'type2';
> > >                GENERATE
> > >                    FLATTEN(group) AS (day, searchType),
> > >                    COUNT(searches) numSearches,
> > >                    SUM(clickCount) AS clickCountPerSearchType,
> > >                    SUM(type1.clickCount) AS type1ClickCount,
> > >                    SUM(type2.clickCount) AS type2ClickCount;
> > >        }
> > > ;
> > >
> > > As you can see above, I am counting the counts by the day and search
> type
> > > in clickCountPerSearchType and for each of them i need the counts
> broken
> > by
> > > the ad type.
> > >
> > > Thanks for your help!
> > > Thanks,
> > > Rohini
> > >
> > >
> > > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> > > <pr...@gmail.com>wrote:
> > >
> > > > Hi Rohini,
> > > >
> > > > From your query it looks like you are already grouping it by TYPE, so
> > not
> > > > sure why you would want the SUM of, say "EMPLOYER" type in "LOCATION"
> > and
> > > > vice-versa. Your output is already broken down by TYPE.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com>
> wrote:
> > > >
> > > > > Thanks for the suggestion Prashant. However, that will not work in
> my
> > > > case.
> > > > >
> > > > > If I filter before the group and include the new field in group as
> > you
> > > > > suggested, I get the individual counts broken by the select field
> > > > > critieria. However, I want the totals also without taking the
> select
> > > > fields
> > > > > into account. That is why I took the approach I described in my
> > earlier
> > > > > emails.
> > > > >
> > > > > Thanks
> > > > > Rohini
> > > > >
> > > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > > > prash1784@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > > > >
> > > > > > In this case, you could use a FILTER followed by a bincond to
> > > > introduce a
> > > > > > new field "employerOrLocation", then do a group by and include
> the
> > > new
> > > > > > field in the GROUP BY clause.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > My input data size is 9GB and I am using 20 machines.
> > > > > > >
> > > > > > > My grouped criteria has two cases so I want 1) counts by the
> > > > criteria I
> > > > > > > have grouped 2) counts of the two inviduals cases in each of my
> > > > group.
> > > > > > >
> > > > > > > So my script in detail is:
> > > > > > >
> > > > > > > counts = FOREACH grouped {
> > > > > > >                     selectedFields1 = FILTER rawItems  BY
> > > > > > type="EMPLOYER";
> > > > > > >                   selectedFields2 = FILTER rawItems  BY
> > > > > type="LOCATION";
> > > > > > >                      GENERATE
> > > > > > >                             FLATTEN(group) as (item1, item2,
> > item3,
> > > > > > type) ,
> > > > > > >                               SUM(selectedFields1.count) as
> > > > > > > selectFields1Count,
> > > > > > >                              SUM(selectedFields2.count) as
> > > > > > > selectFields2Count,
> > > > > > >                             COUNT(rawItems) as
> groupCriteriaCount
> > > > > > >
> > > > > > >              }
> > > > > > >
> > > > > > > Is there a way way to do this?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> > > dvryaboy@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > you are not doing grouping followed by counting. You are
> doing
> > > > > grouping
> > > > > > > > followed by filtering followed by counting.
> > > > > > > > Try filtering before grouping.
> > > > > > > >
> > > > > > > > D
> > > > > > > >
> > > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <
> rohini.u@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I have a pig script which does a simple GROUPing followed
> by
> > > > > couting
> > > > > > > and
> > > > > > > > I
> > > > > > > > > get this error.  My data is certaining not that big for it
> to
> > > > cause
> > > > > > > this
> > > > > > > > > out of memory error. Is there a chance that this is because
> > of
> > > > some
> > > > > > > bug ?
> > > > > > > > > Did any one come across this kind of error before?
> > > > > > > > >
> > > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > > > > >
> > > > > > > > > My script:
> > > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > > > > >
> > > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > > > > >
> > > > > > > > > counts = FOREACH grouped {
> > > > > > > > >                     selectedFields = FILTER rawItems  BY
> > > > > > > type="EMPLOYER";
> > > > > > > > >                     GENERATE
> > > > > > > > >                             FLATTEN(group) as (item1,
> item2,
> > > > item3,
> > > > > > > > type) ,
> > > > > > > > >                              SUM(selectedFields.count) as
> > count
> > > > > > > > >
> > > > > > > > >              }
> > > > > > > > >
> > > > > > > > > Stack Trace:
> > > > > > > > >
> > > > > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> > > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> > > > > > > > >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> > > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> > > > > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > > > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > > > > > > > >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > > > > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > -Rohini
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards
> > > > > > > -Rohini
> > > > > > >
> > > > > > > --
> > > > > > > **
> > > > > > > People of accomplishment rarely sat back & let things happen to
> > > them.
> > > > > > They
> > > > > > > went out & happened to things - Leonardo Da Vinci
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards
> -Rohini
>
> --
> **
> People of accomplishment rarely sat back & let things happen to them. They
> went out & happened to things - Leonardo Da Vinci
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
Thanks Prashant,
I am using Pig 0.9.1 and hadoop 0.20.205

Thanks,
Rohini

On Thu, Mar 22, 2012 at 1:27 PM, Prashant Kommireddi <pr...@gmail.com>wrote:

> This makes more sense, grouping and filter are on different columns. I will
> open a JIRA soon.
>
> What version of Pig and Hadoop are you using?
>
> Thanks,
> Prashant
>
> On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com> wrote:
>
> > Hi Prashant,
> >
> > Here is my script in full.
> >
> >
> > raw = LOAD 'input' using MyCustomLoader();
> >
> > searches = FOREACH raw GENERATE
> >                day, searchType,
> >                FLATTEN(impBag) AS (adType, clickCount)
> >            ;
> >
> > groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> > counts = FOREACH groupedSearches{
> >                type1 = FILTER searches BY adType == 'type1';
> >                type2 = FILTER searches BY adType == 'type2';
> >                GENERATE
> >                    FLATTEN(group) AS (day, searchType),
> >                    COUNT(searches) numSearches,
> >                    SUM(clickCount) AS clickCountPerSearchType,
> >                    SUM(type1.clickCount) AS type1ClickCount,
> >                    SUM(type2.clickCount) AS type2ClickCount;
> >        }
> > ;
> >
> > As you can see above, I am counting the counts by the day and search type
> > in clickCountPerSearchType and for each of them i need the counts broken
> by
> > the ad type.
> >
> > Thanks for your help!
> > Thanks,
> > Rohini
> >
> >
> > On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> > <pr...@gmail.com>wrote:
> >
> > > Hi Rohini,
> > >
> > > From your query it looks like you are already grouping it by TYPE, so
> not
> > > sure why you would want the SUM of, say "EMPLOYER" type in "LOCATION"
> and
> > > vice-versa. Your output is already broken down by TYPE.
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > Thanks for the suggestion Prashant. However, that will not work in my
> > > case.
> > > >
> > > > If I filter before the group and include the new field in group as
> you
> > > > suggested, I get the individual counts broken by the select field
> > > > critieria. However, I want the totals also without taking the select
> > > fields
> > > > into account. That is why I took the approach I described in my
> earlier
> > > > emails.
> > > >
> > > > Thanks
> > > > Rohini
> > > >
> > > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > > prash1784@gmail.com
> > > > >wrote:
> > > >
> > > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > > >
> > > > > In this case, you could use a FILTER followed by a bincond to
> > > introduce a
> > > > > new field "employerOrLocation", then do a group by and include the
> > new
> > > > > field in the GROUP BY clause.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
> > > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com>
> > wrote:
> > > > >
> > > > > > My input data size is 9GB and I am using 20 machines.
> > > > > >
> > > > > > My grouped criteria has two cases so I want 1) counts by the
> > > criteria I
> > > > > > have grouped 2) counts of the two inviduals cases in each of my
> > > group.
> > > > > >
> > > > > > So my script in detail is:
> > > > > >
> > > > > > counts = FOREACH grouped {
> > > > > >                     selectedFields1 = FILTER rawItems  BY
> > > > > type="EMPLOYER";
> > > > > >                   selectedFields2 = FILTER rawItems  BY
> > > > type="LOCATION";
> > > > > >                      GENERATE
> > > > > >                             FLATTEN(group) as (item1, item2,
> item3,
> > > > > type) ,
> > > > > >                               SUM(selectedFields1.count) as
> > > > > > selectFields1Count,
> > > > > >                              SUM(selectedFields2.count) as
> > > > > > selectFields2Count,
> > > > > >                             COUNT(rawItems) as groupCriteriaCount
> > > > > >
> > > > > >              }
> > > > > >
> > > > > > Is there a way way to do this?
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> > dvryaboy@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > you are not doing grouping followed by counting. You are doing
> > > > grouping
> > > > > > > followed by filtering followed by counting.
> > > > > > > Try filtering before grouping.
> > > > > > >
> > > > > > > D
> > > > > > >
> > > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <rohini.u@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I have a pig script which does a simple GROUPing followed by
> > > > couting
> > > > > > and
> > > > > > > I
> > > > > > > > get this error.  My data is certaining not that big for it to
> > > cause
> > > > > > this
> > > > > > > > out of memory error. Is there a chance that this is because
> of
> > > some
> > > > > > bug ?
> > > > > > > > Did any one come across this kind of error before?
> > > > > > > >
> > > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > > > >
> > > > > > > > My script:
> > > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > > > >
> > > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > > > >
> > > > > > > > counts = FOREACH grouped {
> > > > > > > >                     selectedFields = FILTER rawItems  BY
> > > > > > type="EMPLOYER";
> > > > > > > >                     GENERATE
> > > > > > > >                             FLATTEN(group) as (item1, item2,
> > > item3,
> > > > > > > type) ,
> > > > > > > >                              SUM(selectedFields.count) as
> count
> > > > > > > >
> > > > > > > >              }
> > > > > > > >
> > > > > > > > Stack Trace:
> > > > > > > >
> > > > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child (main): Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> > > > > > > >        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> > > > > > > >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> > > > > > > >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> > > > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > > > > > > >        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > > > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > -Rohini
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards
> > > > > > -Rohini
> > > > > >
> > > > > > --
> > > > > > **
> > > > > > People of accomplishment rarely sat back & let things happen to
> > them.
> > > > > They
> > > > > > went out & happened to things - Leonardo Da Vinci
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
Regards
-Rohini

--
**
People of accomplishment rarely sat back & let things happen to them. They
went out & happened to things - Leonardo Da Vinci

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
This makes more sense: the grouping and the filter are on different columns. I will
open a JIRA soon.

What version of Pig and Hadoop are you using?

Thanks,
Prashant
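
As a side note, the two shapes look roughly like this (data, keyCol, otherCol and val are placeholder names, and the snippet is an untested sketch):

-- the shape that appears to be hitting the OOM here: a FILTER on a
-- non-grouping column inside the nested FOREACH, run over the whole bag per group
g = GROUP data BY keyCol;
r = FOREACH g {
        x = FILTER data BY otherCol == 'x';
        GENERATE FLATTEN(group), COUNT(data) AS total, SUM(x.val) AS x_total;
    };

-- the flat alternative: derive the per-type value before grouping
data2 = FOREACH data GENERATE *, (otherCol == 'x' ? val : 0) AS x_val;
g2 = GROUP data2 BY keyCol;
r2 = FOREACH g2 GENERATE FLATTEN(group), COUNT(data2) AS total, SUM(data2.x_val) AS x_total;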

On Thu, Mar 22, 2012 at 1:12 PM, Rohini U <ro...@gmail.com> wrote:

> Hi Prashant,
>
> Here is my script in full.
>
>
> raw = LOAD 'input' using MyCustomLoader();
>
> searches = FOREACH raw GENERATE
>                day, searchType,
>                FLATTEN(impBag) AS (adType, clickCount)
>            ;
>
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                type1 = FILTER searches BY adType == 'type1';
>                type2 = FILTER searches BY adType == 'type2';
>                GENERATE
>                    FLATTEN(group) AS (day, searchType),
>                    COUNT(searches) numSearches,
>                    SUM(clickCount) AS clickCountPerSearchType,
>                    SUM(type1.clickCount) AS type1ClickCount,
>                    SUM(type2.clickCount) AS type2ClickCount;
>        }
> ;
>
> As you can see above, I am counting the counts by the day and search type
> in clickCountPerSearchType and for each of them i need the counts broken by
> the ad type.
>
> Thanks for your help!
> Thanks,
> Rohini
>
>
> On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
> <pr...@gmail.com>wrote:
>
> > Hi Rohini,
> >
> > From your query it looks like you are already grouping it by TYPE, so not
> > sure why you would want the SUM of, say "EMPLOYER" type in "LOCATION" and
> > vice-versa. Your output is already broken down by TYPE.
> >
> > Thanks,
> > Prashant
> >
> > On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com> wrote:
> >
> > > Thanks for the suggestion Prashant. However, that will not work in my
> > case.
> > >
> > > If I filter before the group and include the new field in group as you
> > > suggested, I get the individual counts broken by the select field
> > > critieria. However, I want the totals also without taking the select
> > fields
> > > into account. That is why I took the approach I described in my earlier
> > > emails.
> > >
> > > Thanks
> > > Rohini
> > >
> > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > prash1784@gmail.com
> > > >wrote:
> > >
> > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > >
> > > > In this case, you could use a FILTER followed by a bincond to
> > introduce a
> > > > new field "employerOrLocation", then do a group by and include the
> new
> > > > field in the GROUP BY clause.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com>
> wrote:
> > > >
> > > > > My input data size is 9GB and I am using 20 machines.
> > > > >
> > > > > My grouped criteria has two cases so I want 1) counts by the
> > criteria I
> > > > > have grouped 2) counts of the two inviduals cases in each of my
> > group.
> > > > >
> > > > > So my script in detail is:
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >                     selectedFields1 = FILTER rawItems  BY
> > > > type="EMPLOYER";
> > > > >                   selectedFields2 = FILTER rawItems  BY
> > > type="LOCATION";
> > > > >                      GENERATE
> > > > >                             FLATTEN(group) as (item1, item2, item3,
> > > > type) ,
> > > > >                               SUM(selectedFields1.count) as
> > > > > selectFields1Count,
> > > > >                              SUM(selectedFields2.count) as
> > > > > selectFields2Count,
> > > > >                             COUNT(rawItems) as groupCriteriaCount
> > > > >
> > > > >              }
> > > > >
> > > > > Is there a way way to do this?
> > > > >
> > > > >
> > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> dvryaboy@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > you are not doing grouping followed by counting. You are doing
> > > grouping
> > > > > > followed by filtering followed by counting.
> > > > > > Try filtering before grouping.
> > > > > >
> > > > > > D
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have a pig script which does a simple GROUPing followed by
> > > couting
> > > > > and
> > > > > > I
> > > > > > > get this error.  My data is certaining not that big for it to
> > cause
> > > > > this
> > > > > > > out of memory error. Is there a chance that this is because of
> > some
> > > > > bug ?
> > > > > > > Did any one come across this kind of error before?
> > > > > > >
> > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > > >
> > > > > > > My script:
> > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > > >
> > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > > >
> > > > > > > counts = FOREACH grouped {
> > > > > > >                     selectedFields = FILTER rawItems  BY
> > > > > type="EMPLOYER";
> > > > > > >                     GENERATE
> > > > > > >                             FLATTEN(group) as (item1, item2,
> > item3,
> > > > > > type) ,
> > > > > > >                              SUM(selectedFields.count) as count
> > > > > > >
> > > > > > >              }
> > > > > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
Hi Prashant,

Here is my script in full.


raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
                day, searchType,
                FLATTEN(impBag) AS (adType, clickCount)
            ;

groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
counts = FOREACH groupedSearches{
                type1 = FILTER searches BY adType == 'type1';
                type2 = FILTER searches BY adType == 'type2';
                GENERATE
                    FLATTEN(group) AS (day, searchType),
                    COUNT(searches) AS numSearches,
                    SUM(searches.clickCount) AS clickCountPerSearchType,
                    SUM(type1.clickCount) AS type1ClickCount,
                    SUM(type2.clickCount) AS type2ClickCount;
        }
;

As you can see above, I am computing the total click count per day and
search type in clickCountPerSearchType, and for each of those groups I also
need the counts broken down by ad type.
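
A minimal sketch of one way to compute all four numbers without the nested
FILTERs (alias names made up, and assuming adType is a chararray and
clickCount is numeric) is to derive the per-type click columns before
grouping, so every aggregate is algebraic:

    -- tag each record with per-type click columns up front (bincond),
    -- so the later aggregates are plain SUM/COUNT over the grouped bag
    searchesTagged = FOREACH searches GENERATE
                    day, searchType, (long) clickCount AS clickCount,
                    (adType == 'type1' ? (long) clickCount : 0L) AS type1Clicks,
                    (adType == 'type2' ? (long) clickCount : 0L) AS type2Clicks;

    groupedTagged = GROUP searchesTagged BY (day, searchType) PARALLEL 50;
    counts = FOREACH groupedTagged GENERATE
                    FLATTEN(group) AS (day, searchType),
                    COUNT(searchesTagged) AS numSearches,
                    SUM(searchesTagged.clickCount) AS clickCountPerSearchType,
                    SUM(searchesTagged.type1Clicks) AS type1ClickCount,
                    SUM(searchesTagged.type2Clicks) AS type2ClickCount;

Because COUNT and SUM applied directly to the grouped bag are algebraic, Pig
can run them in the combiner and the reducer only adds up partial results
instead of holding the whole bag for each (day, searchType) key.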

Thanks for your help!
Thanks,
Rohini


On Thu, Mar 22, 2012 at 12:44 PM, Prashant Kommireddi
<pr...@gmail.com> wrote:

> Hi Rohini,
>
> From your query it looks like you are already grouping it by TYPE, so not
> sure why you would want the SUM of, say "EMPLOYER" type in "LOCATION" and
> vice-versa. Your output is already broken down by TYPE.
>
> Thanks,
> Prashant
>
> On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com> wrote:
>
> > Thanks for the suggestion Prashant. However, that will not work in my
> case.
> >
> > If I filter before the group and include the new field in group as you
> > suggested, I get the individual counts broken by the select field
> > critieria. However, I want the totals also without taking the select
> fields
> > into account. That is why I took the approach I described in my earlier
> > emails.
> >
> > Thanks
> > Rohini
> >
> > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > Please pull your FILTER out of GROUP BY and do it earlier
> > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > >
> > > In this case, you could use a FILTER followed by a bincond to
> introduce a
> > > new field "employerOrLocation", then do a group by and include the new
> > > field in the GROUP BY clause.
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > My input data size is 9GB and I am using 20 machines.
> > > >
> > > > My grouped criteria has two cases so I want 1) counts by the
> criteria I
> > > > have grouped 2) counts of the two inviduals cases in each of my
> group.
> > > >
> > > > So my script in detail is:
> > > >
> > > > counts = FOREACH grouped {
> > > >                     selectedFields1 = FILTER rawItems  BY
> > > type="EMPLOYER";
> > > >                   selectedFields2 = FILTER rawItems  BY
> > type="LOCATION";
> > > >                      GENERATE
> > > >                             FLATTEN(group) as (item1, item2, item3,
> > > type) ,
> > > >                               SUM(selectedFields1.count) as
> > > > selectFields1Count,
> > > >                              SUM(selectedFields2.count) as
> > > > selectFields2Count,
> > > >                             COUNT(rawItems) as groupCriteriaCount
> > > >
> > > >              }
> > > >
> > > > Is there a way way to do this?
> > > >
> > > >
> > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > > > wrote:
> > > >
> > > > > you are not doing grouping followed by counting. You are doing
> > grouping
> > > > > followed by filtering followed by counting.
> > > > > Try filtering before grouping.
> > > > >
> > > > > D
> > > > >
> > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a pig script which does a simple GROUPing followed by
> > couting
> > > > and
> > > > > I
> > > > > > get this error.  My data is certaining not that big for it to
> cause
> > > > this
> > > > > > out of memory error. Is there a chance that this is because of
> some
> > > > bug ?
> > > > > > Did any one come across this kind of error before?
> > > > > >
> > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > >
> > > > > > My script:
> > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > >
> > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > >
> > > > > > counts = FOREACH grouped {
> > > > > >                     selectedFields = FILTER rawItems  BY
> > > > type="EMPLOYER";
> > > > > >                     GENERATE
> > > > > >                             FLATTEN(group) as (item1, item2,
> item3,
> > > > > type) ,
> > > > > >                              SUM(selectedFields.count) as count
> > > > > >
> > > > > >              }
> > > > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Rohini,

From your query it looks like you are already grouping by TYPE, so I am not
sure why you would want the SUM of, say, the "EMPLOYER" type within the
"LOCATION" rows and vice versa. Your output is already broken down by TYPE.
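
For example, with GROUP rawItems BY (item1, item2, item3, type), every
grouped row already corresponds to exactly one type value, so with made-up
data the reducer sees something like:

    ((acme, ny, 2012, EMPLOYER), {(acme, ny, 2012, EMPLOYER, 5), ...})
    ((acme, ny, 2012, LOCATION), {(acme, ny, 2012, LOCATION, 3), ...})

and a nested FILTER for the other type always produces an empty bag, whose
SUM comes out null.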

Thanks,
Prashant

On Thu, Mar 22, 2012 at 9:03 AM, Rohini U <ro...@gmail.com> wrote:

> Thanks for the suggestion Prashant. However, that will not work in my case.
>
> If I filter before the group and include the new field in group as you
> suggested, I get the individual counts broken by the select field
> critieria. However, I want the totals also without taking the select fields
> into account. That is why I took the approach I described in my earlier
> emails.
>
> Thanks
> Rohini
>
> On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <prash1784@gmail.com
> >wrote:
>
> > Please pull your FILTER out of GROUP BY and do it earlier
> > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> >
> > In this case, you could use a FILTER followed by a bincond to introduce a
> > new field "employerOrLocation", then do a group by and include the new
> > field in the GROUP BY clause.
> >
> > Thanks,
> > Prashant
> >
> > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com> wrote:
> >
> > > My input data size is 9GB and I am using 20 machines.
> > >
> > > My grouped criteria has two cases so I want 1) counts by the criteria I
> > > have grouped 2) counts of the two inviduals cases in each of my group.
> > >
> > > So my script in detail is:
> > >
> > > counts = FOREACH grouped {
> > >                     selectedFields1 = FILTER rawItems  BY
> > type="EMPLOYER";
> > >                   selectedFields2 = FILTER rawItems  BY
> type="LOCATION";
> > >                      GENERATE
> > >                             FLATTEN(group) as (item1, item2, item3,
> > type) ,
> > >                               SUM(selectedFields1.count) as
> > > selectFields1Count,
> > >                              SUM(selectedFields2.count) as
> > > selectFields2Count,
> > >                             COUNT(rawItems) as groupCriteriaCount
> > >
> > >              }
> > >
> > > Is there a way way to do this?
> > >
> > >
> > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > > wrote:
> > >
> > > > you are not doing grouping followed by counting. You are doing
> grouping
> > > > followed by filtering followed by counting.
> > > > Try filtering before grouping.
> > > >
> > > > D
> > > >
> > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a pig script which does a simple GROUPing followed by
> couting
> > > and
> > > > I
> > > > > get this error.  My data is certaining not that big for it to cause
> > > this
> > > > > out of memory error. Is there a chance that this is because of some
> > > bug ?
> > > > > Did any one come across this kind of error before?
> > > > >
> > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > >
> > > > > My script:
> > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > >
> > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >                     selectedFields = FILTER rawItems  BY
> > > type="EMPLOYER";
> > > > >                     GENERATE
> > > > >                             FLATTEN(group) as (item1, item2, item3,
> > > > type) ,
> > > > >                              SUM(selectedFields.count) as count
> > > > >
> > > > >              }
> > > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
Thanks for the suggestion, Prashant. However, that will not work in my case.

If I filter before the group and include the new field in the group as you
suggested, I get the individual counts broken down by the select-field
criteria. However, I also want the totals without taking the select fields
into account. That is why I took the approach I described in my earlier
emails.
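
A rough sketch of one way to get both, with illustrative alias names and
assuming type is a chararray: aggregate per type first with purely algebraic
functions, then roll those small per-type rows back up to the original key.

    -- per-type sums; SUM and COUNT here are algebraic, so the combiner runs
    perType = GROUP rawItems BY (item1, item2, item3, type);
    perTypeCounts = FOREACH perType GENERATE
                    FLATTEN(group) AS (item1, item2, item3, type),
                    SUM(rawItems.count) AS typeCount,
                    COUNT(rawItems) AS typeRows;

    -- regroup the per-type rows on the original key; the nested FILTERs now
    -- see a bag with at most a few tuples instead of the raw records
    rolledUp = GROUP perTypeCounts BY (item1, item2, item3);
    finalCounts = FOREACH rolledUp {
                    employer = FILTER perTypeCounts BY type == 'EMPLOYER';
                    location = FILTER perTypeCounts BY type == 'LOCATION';
                    GENERATE FLATTEN(group) AS (item1, item2, item3),
                             SUM(employer.typeCount) AS selectFields1Count,
                             SUM(location.typeCount) AS selectFields2Count,
                             SUM(perTypeCounts.typeRows) AS groupCriteriaCount;
            };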

Thanks
Rohini

On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Please pull your FILTER out of GROUP BY and do it earlier
> http://pig.apache.org/docs/r0.9.1/perf.html#filter
>
> In this case, you could use a FILTER followed by a bincond to introduce a
> new field "employerOrLocation", then do a group by and include the new
> field in the GROUP BY clause.
>
> Thanks,
> Prashant
>
> On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com> wrote:
>
> > My input data size is 9GB and I am using 20 machines.
> >
> > My grouped criteria has two cases so I want 1) counts by the criteria I
> > have grouped 2) counts of the two inviduals cases in each of my group.
> >
> > So my script in detail is:
> >
> > counts = FOREACH grouped {
> >                     selectedFields1 = FILTER rawItems  BY
> type="EMPLOYER";
> >                   selectedFields2 = FILTER rawItems  BY type="LOCATION";
> >                      GENERATE
> >                             FLATTEN(group) as (item1, item2, item3,
> type) ,
> >                               SUM(selectedFields1.count) as
> > selectFields1Count,
> >                              SUM(selectedFields2.count) as
> > selectFields2Count,
> >                             COUNT(rawItems) as groupCriteriaCount
> >
> >              }
> >
> > Is there a way way to do this?
> >
> >
> > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > wrote:
> >
> > > you are not doing grouping followed by counting. You are doing grouping
> > > followed by filtering followed by counting.
> > > Try filtering before grouping.
> > >
> > > D
> > >
> > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a pig script which does a simple GROUPing followed by couting
> > and
> > > I
> > > > get this error.  My data is certaining not that big for it to cause
> > this
> > > > out of memory error. Is there a chance that this is because of some
> > bug ?
> > > > Did any one come across this kind of error before?
> > > >
> > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > >
> > > > My script:
> > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > >
> > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > >
> > > > counts = FOREACH grouped {
> > > >                     selectedFields = FILTER rawItems  BY
> > type="EMPLOYER";
> > > >                     GENERATE
> > > >                             FLATTEN(group) as (item1, item2, item3,
> > > type) ,
> > > >                              SUM(selectedFields.count) as count
> > > >
> > > >              }
> > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's done for some cases, but this one is different since the group key
needs to change.

D

On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi
<pr...@gmail.com> wrote:

> Sure I can do that. Isn't this something that should be done already? Or
> does it not work if the filter is working on a field that is part of the
> group?
>
> On Wed, Mar 21, 2012 at 11:02 PM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
>
> > Prashant, mind filing a jira with this example? Technically, this is
> > something we could do automatically.
> >
> > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > Please pull your FILTER out of GROUP BY and do it earlier
> > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > >
> > > In this case, you could use a FILTER followed by a bincond to
> introduce a
> > > new field "employerOrLocation", then do a group by and include the new
> > > field in the GROUP BY clause.
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > My input data size is 9GB and I am using 20 machines.
> > > >
> > > > My grouped criteria has two cases so I want 1) counts by the
> criteria I
> > > > have grouped 2) counts of the two inviduals cases in each of my
> group.
> > > >
> > > > So my script in detail is:
> > > >
> > > > counts = FOREACH grouped {
> > > >                     selectedFields1 = FILTER rawItems  BY
> > > type="EMPLOYER";
> > > >                   selectedFields2 = FILTER rawItems  BY
> > type="LOCATION";
> > > >                      GENERATE
> > > >                             FLATTEN(group) as (item1, item2, item3,
> > > type) ,
> > > >                               SUM(selectedFields1.count) as
> > > > selectFields1Count,
> > > >                              SUM(selectedFields2.count) as
> > > > selectFields2Count,
> > > >                             COUNT(rawItems) as groupCriteriaCount
> > > >
> > > >              }
> > > >
> > > > Is there a way way to do this?
> > > >
> > > >
> > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > > > wrote:
> > > >
> > > > > you are not doing grouping followed by counting. You are doing
> > grouping
> > > > > followed by filtering followed by counting.
> > > > > Try filtering before grouping.
> > > > >
> > > > > D
> > > > >
> > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a pig script which does a simple GROUPing followed by
> > couting
> > > > and
> > > > > I
> > > > > > get this error.  My data is certaining not that big for it to
> cause
> > > > this
> > > > > > out of memory error. Is there a chance that this is because of
> some
> > > > bug ?
> > > > > > Did any one come across this kind of error before?
> > > > > >
> > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > >
> > > > > > My script:
> > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > >
> > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > >
> > > > > > counts = FOREACH grouped {
> > > > > >                     selectedFields = FILTER rawItems  BY
> > > > type="EMPLOYER";
> > > > > >                     GENERATE
> > > > > >                             FLATTEN(group) as (item1, item2,
> item3,
> > > > > type) ,
> > > > > >                              SUM(selectedFields.count) as count
> > > > > >
> > > > > >              }
> > > > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
So, as explained earlier, the reason you are running out of memory is that
you are loading all of the records into memory when you do non-algebraic
things to the results of a grouping.

Can you come up with ways to achieve what you need without having to have
the raw records at the reducer?

One way has been suggested. It's reasonably straightforward to work out the
solution to your question from the advice already provided.
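
Concretely, with made-up aliases: a FOREACH over the grouped relation that
only applies algebraic functions to the bag, such as

    ok = FOREACH grouped GENERATE FLATTEN(group), COUNT(rawItems), SUM(rawItems.count);

lets Pig push the work into the combiner, while a nested block that
manipulates the bag itself, for example

    notOk = FOREACH grouped {
                    employer = FILTER rawItems BY type == 'EMPLOYER';
                    GENERATE FLATTEN(group), SUM(employer.count);
            };

disables that optimization, so the whole bag for a key has to be built in
the reducer, which is what appears to be blowing the heap here.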

D

On Thu, Mar 22, 2012 at 9:06 AM, Rohini U <ro...@gmail.com> wrote:

> Has a Jira been filed for this? I can send my example I am trying if that
> helps.
>
> Thanks,
> Rohini
>
> On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi
> <pr...@gmail.com>wrote:
>
> > Sure I can do that. Isn't this something that should be done already? Or
> > does it not work if the filter is working on a field that is part of the
> > group?
> >
> > On Wed, Mar 21, 2012 at 11:02 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > wrote:
> >
> > > Prashant, mind filing a jira with this example? Technically, this is
> > > something we could do automatically.
> > >
> > > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> > prash1784@gmail.com
> > > >wrote:
> > >
> > > > Please pull your FILTER out of GROUP BY and do it earlier
> > > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > > >
> > > > In this case, you could use a FILTER followed by a bincond to
> > introduce a
> > > > new field "employerOrLocation", then do a group by and include the
> new
> > > > field in the GROUP BY clause.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >
> > > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com>
> wrote:
> > > >
> > > > > My input data size is 9GB and I am using 20 machines.
> > > > >
> > > > > My grouped criteria has two cases so I want 1) counts by the
> > criteria I
> > > > > have grouped 2) counts of the two inviduals cases in each of my
> > group.
> > > > >
> > > > > So my script in detail is:
> > > > >
> > > > > counts = FOREACH grouped {
> > > > >                     selectedFields1 = FILTER rawItems  BY
> > > > type="EMPLOYER";
> > > > >                   selectedFields2 = FILTER rawItems  BY
> > > type="LOCATION";
> > > > >                      GENERATE
> > > > >                             FLATTEN(group) as (item1, item2, item3,
> > > > type) ,
> > > > >                               SUM(selectedFields1.count) as
> > > > > selectFields1Count,
> > > > >                              SUM(selectedFields2.count) as
> > > > > selectFields2Count,
> > > > >                             COUNT(rawItems) as groupCriteriaCount
> > > > >
> > > > >              }
> > > > >
> > > > > Is there a way way to do this?
> > > > >
> > > > >
> > > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <
> dvryaboy@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > you are not doing grouping followed by counting. You are doing
> > > grouping
> > > > > > followed by filtering followed by counting.
> > > > > > Try filtering before grouping.
> > > > > >
> > > > > > D
> > > > > >
> > > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have a pig script which does a simple GROUPing followed by
> > > couting
> > > > > and
> > > > > > I
> > > > > > > get this error.  My data is certaining not that big for it to
> > cause
> > > > > this
> > > > > > > out of memory error. Is there a chance that this is because of
> > some
> > > > > bug ?
> > > > > > > Did any one come across this kind of error before?
> > > > > > >
> > > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > > >
> > > > > > > My script:
> > > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > > >
> > > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > > >
> > > > > > > counts = FOREACH grouped {
> > > > > > >                     selectedFields = FILTER rawItems  BY
> > > > > type="EMPLOYER";
> > > > > > >                     GENERATE
> > > > > > >                             FLATTEN(group) as (item1, item2,
> > item3,
> > > > > > type) ,
> > > > > > >                              SUM(selectedFields.count) as count
> > > > > > >
> > > > > > >              }
> > > > > > >

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
Has a Jira been filed for this? I can send the example I am trying, if that
helps.

Thanks,
Rohini

On Wed, Mar 21, 2012 at 11:41 PM, Prashant Kommireddi
<pr...@gmail.com> wrote:

> Sure I can do that. Isn't this something that should be done already? Or
> does it not work if the filter is working on a field that is part of the
> group?
>
> On Wed, Mar 21, 2012 at 11:02 PM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
>
> > Prashant, mind filing a jira with this example? Technically, this is
> > something we could do automatically.
> >
> > On Wed, Mar 21, 2012 at 5:02 PM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > Please pull your FILTER out of GROUP BY and do it earlier
> > > http://pig.apache.org/docs/r0.9.1/perf.html#filter
> > >
> > > In this case, you could use a FILTER followed by a bincond to
> introduce a
> > > new field "employerOrLocation", then do a group by and include the new
> > > field in the GROUP BY clause.
> > >
> > > Thanks,
> > > Prashant
> > >
> > > On Wed, Mar 21, 2012 at 4:45 PM, Rohini U <ro...@gmail.com> wrote:
> > >
> > > > My input data size is 9GB and I am using 20 machines.
> > > >
> > > > My grouped criteria has two cases so I want 1) counts by the
> criteria I
> > > > have grouped 2) counts of the two inviduals cases in each of my
> group.
> > > >
> > > > So my script in detail is:
> > > >
> > > > counts = FOREACH grouped {
> > > >                     selectedFields1 = FILTER rawItems  BY
> > > type="EMPLOYER";
> > > >                   selectedFields2 = FILTER rawItems  BY
> > type="LOCATION";
> > > >                      GENERATE
> > > >                             FLATTEN(group) as (item1, item2, item3,
> > > type) ,
> > > >                               SUM(selectedFields1.count) as
> > > > selectFields1Count,
> > > >                              SUM(selectedFields2.count) as
> > > > selectFields2Count,
> > > >                             COUNT(rawItems) as groupCriteriaCount
> > > >
> > > >              }
> > > >
> > > > Is there a way way to do this?
> > > >
> > > >
> > > > On Wed, Mar 21, 2012 at 4:29 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > > > wrote:
> > > >
> > > > > you are not doing grouping followed by counting. You are doing
> > grouping
> > > > > followed by filtering followed by counting.
> > > > > Try filtering before grouping.
> > > > >
> > > > > D
> > > > >
> > > > > On Wed, Mar 21, 2012 at 12:34 PM, Rohini U <ro...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a pig script which does a simple GROUPing followed by
> > couting
> > > > and
> > > > > I
> > > > > > get this error.  My data is certaining not that big for it to
> cause
> > > > this
> > > > > > out of memory error. Is there a chance that this is because of
> some
> > > > bug ?
> > > > > > Did any one come across this kind of error before?
> > > > > >
> > > > > > I am using pig 0.9.1 with hadoop 0.20.205
> > > > > >
> > > > > > My script:
> > > > > > rawItems = LOAD 'in' as (item1, item2, item3, type, count);
> > > > > >
> > > > > > grouped = GROUP rawItems BY (item1, item2, item3, type);
> > > > > >
> > > > > > counts = FOREACH grouped {
> > > > > >                     selectedFields = FILTER rawItems  BY
> > > > type="EMPLOYER";
> > > > > >                     GENERATE
> > > > > >                             FLATTEN(group) as (item1, item2,
> item3,
> > > > > type) ,
> > > > > >                              SUM(selectedFields.count) as count
> > > > > >
> > > > > >              }
> > > > > >
> > > > > > Stack Trace:
> > > > > >
> > > > > > 2012-03-21 19:19:59,346 FATAL org.apache.hadoop.mapred.Child
> > (main):
> > > > > Error
> > > > > > running child : java.lang.OutOfMemoryError: GC overhead limit
> > > exceeded
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:387)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:406)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:570)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:248)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:316)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:453)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:281)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:324)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:459)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
> > > > > >        at
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> > > > > >        at
> > > > > >
> > > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:662)
> > > > > >        at
> > > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:425)
> > > > > >        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > > >        at java.security.AccessController.doPrivileged(Native
> > Method)
> > > > > >        at javax.security.auth.Subject.doAs(Subject.java:396)
> > > > > >        at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> > > > > >        at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > >
> > > > > > Thanks
> > > > > > -Rohini
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > -Rohini
> > > >
> > > > --
> > > > **
> > > > People of accomplishment rarely sat back & let things happen to them.
> > > They
> > > > went out & happened to things - Leonardo Da Vinci
> > > >
> > >
> >
>

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Sure I can do that. Isn't this something that should be done already? Or
does it not work if the filter is working on a field that is part of the
group?


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Prashant, mind filing a jira with this example? Technically, this is
something we could do automatically.


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Please pull your FILTER out of GROUP BY and do it earlier:
http://pig.apache.org/docs/r0.9.1/perf.html#filter

In this case, you could use a FILTER followed by a bincond to introduce a
new field "employerOrLocation", then do a group by and include the new
field in the GROUP BY clause.
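
For reference, a rough, untested sketch of that approach against the schema in your script (the explicit chararray/long types and the relation names are assumptions on my part):

rawItems = LOAD 'in' AS (item1, item2, item3, type:chararray, count:long);

-- keep only the rows of interest, before the expensive GROUP
interesting = FILTER rawItems BY type == 'EMPLOYER' OR type == 'LOCATION';

-- a bincond adds the flag field instead of filtering inside the FOREACH
flagged = FOREACH interesting GENERATE
              item1, item2, item3, type, count,
              (type == 'EMPLOYER' ? 'EMPLOYER' : 'LOCATION') AS employerOrLocation;

-- include the new field in the group key
grouped = GROUP flagged BY (item1, item2, item3, type, employerOrLocation);

counts = FOREACH grouped GENERATE
             FLATTEN(group) AS (item1, item2, item3, type, employerOrLocation),
             SUM(flagged.count) AS count,
             COUNT(flagged) AS groupCriteriaCount;

Here the flag simply mirrors type, because type is already part of the key, but the same pattern applies when it is not. Since the FOREACH after the GROUP only runs algebraic aggregates (SUM, COUNT) over the whole bag, Pig should be able to use the combiner instead of materializing nested bags in the reducer.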

Thanks,
Prashant


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Rohini U <ro...@gmail.com>.
My input data size is 9GB and I am using 20 machines.

My grouping criteria has two cases, so I want 1) counts by the criteria I
have grouped on and 2) counts of the two individual cases in each of my groups.

So my script in detail is:

counts = FOREACH grouped {
                     selectedFields1 = FILTER rawItems BY type == 'EMPLOYER';
                     selectedFields2 = FILTER rawItems BY type == 'LOCATION';
                     GENERATE
                             FLATTEN(group) as (item1, item2, item3, type),
                             SUM(selectedFields1.count) as selectFields1Count,
                             SUM(selectedFields2.count) as selectFields2Count,
                             COUNT(rawItems) as groupCriteriaCount;
              }

Is there a way to do this?


-- 
Regards
-Rohini

--
**
People of accomplishment rarely sat back & let things happen to them. They
went out & happened to things - Leonardo Da Vinci

Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You are not doing grouping followed by counting. You are doing grouping
followed by filtering followed by counting.
Try filtering before grouping.
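
A minimal, untested sketch of that reordering for the original script (the long type on count is an assumption on my part):

rawItems = LOAD 'in' AS (item1, item2, item3, type, count:long);

-- filter first, so only EMPLOYER rows are shuffled to the reducers
employerItems = FILTER rawItems BY type == 'EMPLOYER';

grouped = GROUP employerItems BY (item1, item2, item3, type);

counts = FOREACH grouped GENERATE
             FLATTEN(group) AS (item1, item2, item3, type),
             SUM(employerItems.count) AS count;

With the FILTER hoisted out of the nested FOREACH, the SUM is a plain algebraic aggregate, so the combiner should be able to do most of the work before the reduce.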

D


Re: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hi Rohini,

Can you provide some details on how big the input dataset is, the data volume
the reducers receive from the mappers, and the number of reducers you are using?

Thanks,
Prashant
