Posted to user@pig.apache.org by Tanton Gibbs <ta...@gmail.com> on 2008/05/23 07:52:06 UTC

Spillable memory manager

I upgraded to Hadoop 0.17 and the latest Pig from svn.

I'm now getting a ton of lines in my log files that say:

2008-05-23 00:49:27,832 INFO
org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
called init = 1441792(1408K) used = 483176072(471851K) committed =
641335296(626304K) max = 954466304(932096K)

In addition, jobs on big files are running very slowly.

Does anyone have any ideas as to what I could have screwed up?

Thanks!
Tanton

Re: Spillable memory manager

Posted by Iván de Prado <iv...@gmail.com>.
I have updated to trunk and Hadoop 0.17.0. The memory limit per task is
400 MB. An OutOfMemoryError is thrown during the first reduce. I have
noticed that this Pig script worked with 1 GB of memory per task. What are
the memory requirements for Pig?

Thanks!
Iván de Prado
www.ivanprado.es

2008-05-30 11:21:29,863 INFO
org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
called init = 5439488(5312K) used = 166885368(162973K) committed =
246087680(240320K) max = 279642112(273088K)
2008-05-30 11:21:33,069 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 225822592(220529K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:21:36,047 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 169349352(165380K) committed = 267780096(261504K) max = 279642112(273088K)
2008-05-30 11:21:39,369 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 267780072(261503K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:21:44,505 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 255668880(249676K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:21:51,019 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 265970168(259736K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:21:58,115 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 266914224(260658K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:22:01,423 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 223674352(218431K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:22:05,163 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 258252264(252199K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-30 11:22:41,457 ERROR org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce: java.lang.OutOfMemoryError: Java heap space

________________________________________________________________________
Explain:

Logical Plan:
|---LOSort ( BY GENERATE {[FLATTEN PROJECT $1]} ) 
      |---LOEval ( GENERATE {[FLATTEN PROJECT $1],[FLATTEN PROJECT $2],[FLATTEN PROJECT $3],[FLATTEN PROJECT $4],[FLATTEN PROJECT $5]} ) 
            |---LOCogroup ( GENERATE {[PROJECT $0],[*]}, GENERATE {[PROJECT $0],[*]}, GENERATE {[PROJECT $0],[*]}, GENERATE {[PROJECT $0],[*]}, GENERATE {[PROJECT $0],[*]} ) 
                  |---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]})]} ) 
                        |---LOCogroup ( GENERATE {[PROJECT $2],[*]} ) 
                              |---LOEval ( [FILTER BY ([PROJECT $6] == ['1'])] ) 
                                    |---LOEval ( GENERATE {[FLATTEN PROJECT $1],[FLATTEN PROJECT $2]} ) 
                                          |---LOCogroup ( GENERATE {[PROJECT $1],[*]}, GENERATE {[PROJECT $0],[*]} ) 
                                                |---LOEval ( [FILTER BY (([PROJECT $3] == ['2']) AND ([PROJECT $4] != ['0']) AND ([PROJECT $6] != ['0']) AND ([PROJECT $6] != ['2']))] ) 
                                                      |---LOLoad ( file = /user/properazzi/mc/mc_20080529000002/input/partition_B.dump AS id,wid,locid,status,proptype,country,sor )
                                                |---LOLoad ( file = /user/properazzi/flm/quotas.txt AS wqid )
                  |---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]})]} ) 
                        |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
                              |---LOEval ( GENERATE {[FLATTEN PROJECT $0]} ) 
                                    |---LOCogroup ( GENERATE {[*],[*]} ) 
                                          |---LOEval ( GENERATE {[PROJECT $2],[PROJECT $1]} ) 
                                                |---LOEval ( [FILTER BY ([PROJECT $6] == ['1'])] ) 
                                                      |---LOEval ( GENERATE {[FLATTEN PROJECT $1],[FLATTEN PROJECT $2]} ) 
                                                            |---LOCogroup ( GENERATE {[PROJECT $1],[*]}, GENERATE {[PROJECT $0],[*]} ) 
                                                                  |---LOEval ( [FILTER BY (([PROJECT $3] == ['2']) AND ([PROJECT $4] != ['0']) AND ([PROJECT $6] != ['0']) AND ([PROJECT $6] != ['2']))] ) 
                                                                        |---LOLoad ( file = /user/properazzi/mc/mc_20080529000002/input/partition_B.dump AS id,wid,locid,status,proptype,country,sor )
                                                                  |---LOLoad ( file = /user/properazzi/flm/quotas.txt AS wqid )
                  |---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]})]} ) 
                        |---LOCogroup ( GENERATE {[PROJECT $2],[*]} ) 
                              |---LOEval ( [FILTER BY (([PROJECT $6] == ['3']) OR ([PROJECT $6] == ['4']) OR ([PROJECT $6] == ['5']))] ) 
                                    |---LOEval ( GENERATE {[FLATTEN PROJECT $1],[FLATTEN PROJECT $2]} ) 
                                          |---LOCogroup ( GENERATE {[PROJECT $1],[*]}, GENERATE {[PROJECT $0],[*]} ) 
                                                |---LOEval ( [FILTER BY (([PROJECT $3] == ['2']) AND ([PROJECT $4] != ['0']) AND ([PROJECT $6] != ['0']) AND ([PROJECT $6] != ['2']))] ) 
                                                      |---LOLoad ( file = /user/properazzi/mc/mc_20080529000002/input/partition_B.dump AS id,wid,locid,status,proptype,country,sor )
                                                |---LOLoad ( file = /user/properazzi/flm/quotas.txt AS wqid )
                  |---LOEval ( GENERATE {[PROJECT $0],[COUNT(GENERATE {[PROJECT $1]})]} ) 
                        |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
                              |---LOEval ( GENERATE {[FLATTEN PROJECT $0]} ) 
                                    |---LOCogroup ( GENERATE {[*],[*]} ) 
                                          |---LOEval ( GENERATE {[PROJECT $2],[PROJECT $1]} ) 
                                                |---LOEval ( [FILTER BY (([PROJECT $6] == ['3']) OR ([PROJECT $6] == ['4']) OR ([PROJECT $6] == ['5']))] ) 
                                                      |---LOEval ( GENERATE {[FLATTEN PROJECT $1],[FLATTEN PROJECT $2]} ) 
                                                            |---LOCogroup ( GENERATE {[PROJECT $1],[*]}, GENERATE {[PROJECT $0],[*]} ) 
                                                                  |---LOEval ( [FILTER BY (([PROJECT $3] == ['2']) AND ([PROJECT $4] != ['0']) AND ([PROJECT $6] != ['0']) AND ([PROJECT $6] != ['2']))] ) 
                                                                        |---LOLoad ( file = /user/properazzi/mc/mc_20080529000002/input/partition_B.dump AS id,wid,locid,status,proptype,country,sor )
                                                                  |---LOLoad ( file = /user/properazzi/flm/quotas.txt AS wqid )
                  |---LOEval ( GENERATE {[FLATTEN PROJECT $0]} ) 
                        |---LOCogroup ( GENERATE {[*],[*]} ) 
                              |---LOEval ( GENERATE {[PROJECT $2],[PROJECT $5]} ) 
                                    |---LOLoad ( file = /user/properazzi/mc/mc_20080529000002/input/partition_B.dump AS id,wid,locid,status,proptype,country,sor )
-----------------------------------------------
Physical Plan:
|---POMapreduce
    Partition Function: org.apache.pig.backend.hadoop.executionengine.mapreduceExec.SortPartitioner

    Map : *
    Reduce : Generate(Project(1))
    Grouping : Generate(Generate(Project(1)),*)
    Input File(s) : /tmp/temp1398936874/tmp-1538794351
    Properties : 
      |---POMapreduce
          Map : Composite(*,Generate(Project(1)))
          Reduce : Generate(FuncEval(org.apache.pig.impl.builtin.FindQuantiles(Generate(Const(1),Composite(Project(1),Sort(*))))))
          Grouping : Generate(Const(all),*)
          Input File(s) : /tmp/temp1398936874/tmp-1538794351
          Properties : 
            |---POMapreduce
                Map : *****
                Reduce : Generate(Project(1),Project(2),Project(3),Project(4),Project(5))
                Grouping : Generate(Project(0),*)Generate(Project(0),*)Generate(Project(0),*)Generate(Project(0),*)Generate(Project(0),*)
                Input File(s) : /tmp/temp1398936874/tmp-585863913, /tmp/temp1398936874/tmp-536934015, /tmp/temp1398936874/tmp23578316, /tmp/temp1398936874/tmp662497645, /tmp/temp1398936874/tmp582570364
                Properties : pig.input.splittable:true
                  |---POMapreduce
                      Map : *
                      Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Project(1)))))
                      Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
                      Grouping : Generate(Project(2),*)
                      Input File(s) : /tmp/temp1398936874/tmp-1880872512
                      Properties : pig.input.splittable:true
                        |---POMapreduce
                            Map : Composite(*,Filter:  AND )*
                            Reduce : Composite(Generate(Project(1),Project(2)),Filter:  COMP )
                            Grouping : Generate(Project(1),*)Generate(Project(0),*)
                            Input File(s) : /user/properazzi/mc/mc_20080529000002/input/partition_B.dump, /user/properazzi/flm/quotas.txt
                            Properties : pig.input.splittable:true
                  |---POMapreduce
                      Map : *
                      Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Project(1)))))
                      Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
                      Grouping : Generate(Project(0),*)
                      Input File(s) : /tmp/temp1398936874/tmp-1242543041
                      Properties : pig.input.splittable:true
                        |---POMapreduce
                            Map : *
                            Reduce : Generate(Project(0))
                            Grouping : Generate(*,*)
                            Input File(s) : /tmp/temp1398936874/tmp2015750396
                            Properties : pig.input.splittable:true
                              |---POMapreduce
                                  Map : Composite(*,Filter:  AND )*
                                  Reduce : Composite(Generate(Project(1),Project(2)),Filter:  COMP ,Generate(Project(2),Project(1)))
                                  Grouping : Generate(Project(1),*)Generate(Project(0),*)
                                  Input File(s) : /user/properazzi/mc/mc_20080529000002/input/partition_B.dump, /user/properazzi/flm/quotas.txt
                                  Properties : pig.input.splittable:true
                  |---POMapreduce
                      Map : *
                      Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Project(1)))))
                      Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
                      Grouping : Generate(Project(2),*)
                      Input File(s) : /tmp/temp1398936874/tmp-1934972255
                      Properties : pig.input.splittable:true
                        |---POMapreduce
                            Map : Composite(*,Filter:  AND )*
                            Reduce : Composite(Generate(Project(1),Project(2)),Filter:  OR )
                            Grouping : Generate(Project(1),*)Generate(Project(0),*)
                            Input File(s) : /user/properazzi/mc/mc_20080529000002/input/partition_B.dump, /user/properazzi/flm/quotas.txt
                            Properties : pig.input.splittable:true
                  |---POMapreduce
                      Map : *
                      Combine : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Initial(Generate(Project(1)))))
                      Reduce : Generate(Project(0),FuncEval(org.apache.pig.builtin.COUNT$Final(Generate(Composite(Project(1),Project(1))))))
                      Grouping : Generate(Project(0),*)
                      Input File(s) : /tmp/temp1398936874/tmp799024189
                      Properties : pig.input.splittable:true
                        |---POMapreduce
                            Map : *
                            Reduce : Generate(Project(0))
                            Grouping : Generate(*,*)
                            Input File(s) : /tmp/temp1398936874/tmp1055965366
                            Properties : pig.input.splittable:true
                              |---POMapreduce
                                  Map : Composite(*,Filter:  AND )*
                                  Reduce : Composite(Generate(Project(1),Project(2)),Filter:  OR ,Generate(Project(2),Project(1)))
                                  Grouping : Generate(Project(1),*)Generate(Project(0),*)
                                  Input File(s) : /user/properazzi/mc/mc_20080529000002/input/partition_B.dump, /user/properazzi/flm/quotas.txt
                                  Properties : pig.input.splittable:true
                  |---POMapreduce
                      Map : Composite(*,Generate(Project(2),Project(5)))
                      Reduce : Generate(Project(0))
                      Grouping : Generate(*,*)
                      Input File(s) : /user/properazzi/mc/mc_20080529000002/input/partition_B.dump
                      Properties : pig.input.splittable:true


On Fri, 30-05-2008 at 20:01 +1000, pi song wrote:
> We've already fixed the memory issue introduced in Pig-85. Could you please
> update to the latest version and try again?
> 
> Pi
> 
> On Wed, May 28, 2008 at 9:18 AM, pi song <pi...@gmail.com> wrote:
> 
> > This might have nothing to do with Hadoop 0.17 but rather with something else that we
> > fixed right after it. I'm investigating. Sorry for the inconvenience.
> >
> > FYI,
> > Pi
> >
> >
> > On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
> >>
> >> I think you need to increase the amount of memory you give to java.
> >>
> >> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
> >> course it depends  on how much ram you have available.
> >>
> >> mapred.child.java.opts is the parameter
> >> mine is currently set to 2048M in my hadoop-site.xml file.
> >>
> >> For performance reasons, I upped the io.sort.mb parameter.  However,
> >> if this is too close to 50% of the total memory, you will get the
> >> Spillable messages.
> >>
> >> HTH,
> >> Tanton
> >>
> >
> >


Re: Spillable memory manager

Posted by Alan Gates <ga...@YAHOO-INC.COM>.
What's currently on top of trunk requires use of hadoop17.jar instead of 
hadoop16.jar.

Alan.

Iván de Prado wrote:
> I did ant clean, and tried to recompile with hadoop16.jar. But I got
> these compilation errors:
>
>     [javac] Compiling 241 source files to /opt/hd1/pig-trunk/build/classes
>     [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:28: cannot find symbol
>     [javac] symbol  : class FileOutputFormat
>     [javac] location: package org.apache.hadoop.mapred
>     [javac] import org.apache.hadoop.mapred.FileOutputFormat;
>     [javac]                                 ^
>     [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/SortPartitioner.java:26: cannot find symbol
>     [javac] symbol  : class RawComparator
>     [javac] location: package org.apache.hadoop.io
>     [javac] import org.apache.hadoop.io.RawComparator;
>     [javac]                             ^
>     [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/SortPartitioner.java:37: cannot find symbol
>     [javac] symbol  : class RawComparator
>     [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.SortPartitioner
>     [javac]     RawComparator comparator;
>     [javac]     ^
>     [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:48: cannot find symbol
>     [javac] symbol  : variable FileOutputFormat
>     [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat
>     [javac]         Path outputDir = FileOutputFormat.getWorkOutputPath(job);
>     [javac]                          ^
>     [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:62: cannot find symbol
>     [javac] symbol  : variable FileOutputFormat
>     [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat
>     [javac]         String parentName = FileOutputFormat.getOutputPath(job).getName();
>
>
> Is Pig no longer compatible with Hadoop 0.16? Did I do something wrong when compiling?
>
> I'm using revision 661633.
>
> Iván
>
> On Fri, 30-05-2008 at 13:06 +0200, Iván de Prado wrote:
>   
>> With the latest version I'm getting an error:
>>
>> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileOutputFormat
>> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:48)
>> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.setupMapPipe(PigMapReduce.java:257)
>> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:111)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.FileOutputFormat
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>> 	... 5 more
>>
>> What did I do wrong? I'm launching Pig using the bin/pig script. Before the update, it worked.
>>
>> Iván de Prado
>> www.ivanprado.es
>>
>>
>> On Fri, 30-05-2008 at 20:01 +1000, pi song wrote:
>>     
>>> We've already fixed the memory issue introduced in Pig-85. Could you please
>>> update to the latest version and try again?
>>>
>>> Pi
>>>
>>> On Wed, May 28, 2008 at 9:18 AM, pi song <pi...@gmail.com> wrote:
>>>
>>>       
>>>> This might have nothing to do with Hadoop 0.17 but rather with something else that we
>>>> fixed right after it. I'm investigating. Sorry for the inconvenience.
>>>>
>>>> FYI,
>>>> Pi
>>>>
>>>>
>>>> On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
>>>>         
>>>>> I think you need to increase the amount of memory you give to java.
>>>>>
>>>>> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
>>>>> course it depends  on how much ram you have available.
>>>>>
>>>>> mapred.child.java.opts is the parameter
>>>>> mine is currently set to 2048M in my hadoop-site.xml file.
>>>>>
>>>>> For performance reasons, I upped the io.sort.mb parameter.  However,
>>>>> if this is too close to 50% of the total memory, you will get the
>>>>> Spillable messages.
>>>>>
>>>>> HTH,
>>>>> Tanton
>>>>>
>>>>>           
>>>>         
>
>   

Re: Spillable memory manager

Posted by Iván de Prado <iv...@properazzi.com>.
I did ant clean, and tried to recompile with hadoop16.jar. But I got
these compilation errors:

    [javac] Compiling 241 source files to /opt/hd1/pig-trunk/build/classes
    [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:28: cannot find symbol
    [javac] symbol  : class FileOutputFormat
    [javac] location: package org.apache.hadoop.mapred
    [javac] import org.apache.hadoop.mapred.FileOutputFormat;
    [javac]                                 ^
    [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/SortPartitioner.java:26: cannot find symbol
    [javac] symbol  : class RawComparator
    [javac] location: package org.apache.hadoop.io
    [javac] import org.apache.hadoop.io.RawComparator;
    [javac]                             ^
    [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/SortPartitioner.java:37: cannot find symbol
    [javac] symbol  : class RawComparator
    [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.SortPartitioner
    [javac]     RawComparator comparator;
    [javac]     ^
    [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:48: cannot find symbol
    [javac] symbol  : variable FileOutputFormat
    [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat
    [javac]         Path outputDir = FileOutputFormat.getWorkOutputPath(job);
    [javac]                          ^
    [javac] /opt/hd1/pig-trunk/src/org/apache/pig/backend/hadoop/executionengine/mapreduceExec/PigOutputFormat.java:62: cannot find symbol
    [javac] symbol  : variable FileOutputFormat
    [javac] location: class org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat
    [javac]         String parentName = FileOutputFormat.getOutputPath(job).getName();


Is Pig no longer compatible with Hadoop 0.16? Did I do something wrong when compiling?

I'm using revision 661633.

Iván

On Fri, 30-05-2008 at 13:06 +0200, Iván de Prado wrote:
> With the latest version I'm getting an error:
> 
> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileOutputFormat
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:48)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.setupMapPipe(PigMapReduce.java:257)
> 	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:111)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.FileOutputFormat
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
> 	... 5 more
> 
> What did I do wrong? I'm launching Pig using the bin/pig script. Before the update, it worked.
> 
> Iván de Prado
> www.ivanprado.es
> 
> 
> On Fri, 30-05-2008 at 20:01 +1000, pi song wrote:
> > We've already fixed the memory issue introduced in Pig-85. Could you please
> > update to the latest version and try again?
> > 
> > Pi
> > 
> > On Wed, May 28, 2008 at 9:18 AM, pi song <pi...@gmail.com> wrote:
> > 
> > > This might have nothing to do with Hadoop 0.17 but rather with something else that we
> > > fixed right after it. I'm investigating. Sorry for the inconvenience.
> > >
> > > FYI,
> > > Pi
> > >
> > >
> > > On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
> > >>
> > >> I think you need to increase the amount of memory you give to java.
> > >>
> > >> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
> > >> course it depends  on how much ram you have available.
> > >>
> > >> mapred.child.java.opts is the parameter
> > >> mine is currently set to 2048M in my hadoop-site.xml file.
> > >>
> > >> For performance reasons, I upped the io.sort.mb parameter.  However,
> > >> if this is too close to 50% of the total memory, you will get the
> > >> Spillable messages.
> > >>
> > >> HTH,
> > >> Tanton
> > >>
> > >
> > >


Re: Spillable memory manager

Posted by Iván de Prado <iv...@gmail.com>.
With the latest version I'm getting an error:

java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileOutputFormat
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.setupMapPipe(PigMapReduce.java:257)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:111)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.FileOutputFormat
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
	... 5 more

What did I do wrong? I'm launching Pig using the bin/pig script. Before the update, it worked.

Iván de Prado
www.ivanprado.es


On Fri, 30-05-2008 at 20:01 +1000, pi song wrote:
> We've already fixed the memory issue introduced in Pig-85. Could you please
> update to the latest version and try again?
> 
> Pi
> 
> On Wed, May 28, 2008 at 9:18 AM, pi song <pi...@gmail.com> wrote:
> 
> > This might have nothing to do with Hadoop 0.17 but rather with something else that we
> > fixed right after it. I'm investigating. Sorry for the inconvenience.
> >
> > FYI,
> > Pi
> >
> >
> > On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
> >>
> >> I think you need to increase the amount of memory you give to java.
> >>
> >> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
> >> course it depends  on how much ram you have available.
> >>
> >> mapred.child.java.opts is the parameter
> >> mine is currently set to 2048M in my hadoop-site.xml file.
> >>
> >> For performance reasons, I upped the io.sort.mb parameter.  However,
> >> if this is too close to 50% of the total memory, you will get the
> >> Spillable messages.
> >>
> >> HTH,
> >> Tanton
> >>
> >
> >


Re: Spillable memory manager

Posted by pi song <pi...@gmail.com>.
We've already fixed the memory issue introduced in Pig-85. Could you please
update to the latest version and try again?

Pi

On Wed, May 28, 2008 at 9:18 AM, pi song <pi...@gmail.com> wrote:

> This might have nothing to do with Hadoop 0.17 but rather with something else that we
> fixed right after it. I'm investigating. Sorry for the inconvenience.
>
> FYI,
> Pi
>
>
> On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
>>
>> I think you need to increase the amount of memory you give to java.
>>
>> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
>> course it depends  on how much ram you have available.
>>
>> mapred.child.java.opts is the parameter
>> mine is currently set to 2048M in my hadoop-site.xml file.
>>
>> For performance reasons, I upped the io.sort.mb parameter.  However,
>> if this is too close to 50% of the total memory, you will get the
>> Spillable messages.
>>
>> HTH,
>> Tanton
>>
>
>

Re: Spillable memory manager

Posted by pi song <pi...@gmail.com>.
This might have nothing to do with Hadoop 0.17 but rather with something else that we
fixed right after it. I'm investigating. Sorry for the inconvenience.

FYI,
Pi


On 5/28/08, Tanton Gibbs <ta...@gmail.com> wrote:
>
> I think you need to increase the amount of memory you give to java.
>
> It looks like it is currently set to 256M.  I upped mine to 2G.  Of
> course it depends  on how much ram you have available.
>
> mapred.child.java.opts is the parameter
> mine is currently set to 2048M in my hadoop-site.xml file.
>
> For performance reasons, I upped the io.sort.mb parameter.  However,
> if this is too close to 50% of the total memory, you will get the
> Spillable messages.
>
> HTH,
> Tanton
>

Re: Spillable memory manager

Posted by Tanton Gibbs <ta...@gmail.com>.
I think you need to increase the amount of memory you give to Java.

It looks like it is currently set to 256M. I upped mine to 2G. Of
course it depends on how much RAM you have available.

mapred.child.java.opts is the parameter; mine is currently set to
2048M in my hadoop-site.xml file.

For performance reasons, I upped the io.sort.mb parameter. However,
if this is too close to 50% of the total memory, you will get the
Spillable messages.
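
For the exact syntax, the two entries in hadoop-site.xml look something
like this (the values below are only an example along the lines of what I
described; pick numbers that fit your own machines):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
  <description>JVM options passed to each map/reduce child task; the -Xmx
               value is the per-task heap limit.
  </description>
</property>

<property>
  <name>io.sort.mb</name>
  <value>400</value>
  <description>Buffer, in MB, used for sorting map output. Keep this well
               below 50% of the child heap, or you will see the Spillable
               messages.
  </description>
</property>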

HTH,
Tanton

Re: Spillable memory manager

Posted by Iván de Prado <iv...@gmail.com>.
Hi Tanton, 

I am having the same problem, but I got an OutOfMemoryError
in the reduce phase. Which Hadoop config parameter did you change? Is it
io.seqfile.compress.blocksize?

My current value for this parameter is:

<property>
  <name>io.seqfile.compress.blocksize</name>
  <value>1000000</value>
  <description>The minimum block size for compression in block compressed
                                SequenceFiles.
  </description>
</property>

The logs I got:

2008-05-27 11:05:38,087 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 273852568(267434K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:05:38,802 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 273860104(267441K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:05:44,893 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 279642088(273087K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:05:52,704 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 217972656(212863K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:05:56,510 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 269271376(262960K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:03,686 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 244418296(238689K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:10,610 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 269740120(263418K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:16,370 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 271831992(265460K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:19,813 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 258029960(251982K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:23,948 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 279642104(273087K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:27,208 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 195321216(190743K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:30,932 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 266489648(260243K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:34,463 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 279642088(273087K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:38,214 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 279642112(273088K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:06:44,571 INFO org.apache.pig.impl.util.SpillableMemoryManager: low memory handler called init = 5439488(5312K) used = 268382184(262091K) committed = 279642112(273088K) max = 279642112(273088K)
2008-05-27 11:11:02,570 INFO org.apache.hadoop.mapred.TaskRunner: Communication exception: java.lang.OutOfMemoryError: Java heap space

2008-05-27 11:11:02,571 ERROR org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce: java.lang.OutOfMemoryError: Java heap space
2008-05-27 11:11:03,234 INFO org.apache.hadoop.ipc.Client: java.net.SocketException: Socket closed
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:190)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:276)

Thanks, 
Iván


On Fri, 23-05-2008 at 10:23 -0500, Tanton Gibbs wrote:
> I upped my maximum memory from 1024M to 2048M and the problem went
> away. I think the problem was that my sort memory (io.sort.mb) was already set
> to 400M, so it was very close to the 50% mark already.
> 
> Is there a way to up the spillable threshold to 80%?
> 
> On Fri, May 23, 2008 at 10:04 AM, Tanton Gibbs <ta...@gmail.com> wrote:
> > It is in a map phase.  I don't think I used a custom chunker.   My
> > splits are set to  be 128M.
> >
> > On Fri, May 23, 2008 at 9:07 AM, pi song <pi...@gmail.com> wrote:
> >> Dear Tanton,
> >>
> >> This means the MemoryManager is not successful at reclaiming memory.  Did
> >> that happen in Map phase or Reduce phase? Did you use a custom chunker? How
> >> big is your split?
> >>
> >> Pi
> >>
> >> On Fri, May 23, 2008 at 3:52 PM, Tanton Gibbs <ta...@gmail.com>
> >> wrote:
> >>
> >>> I upgraded to hadoop 17 and the latest Pig from svn.
> >>>
> >>> I'm now getting a ton of lines in my log files that say:
> >>>
> >>> 2008-05-23 00:49:27,832 INFO
> >>> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> >>> called init = 1441792(1408K) used = 483176072(471851K) committed =
> >>> 641335296(626304K) max = 954466304(932096K)
> >>>
> >>> In addition, jobs on big files are running very slowly.
> >>>
> >>> Does anyone have any ideas as to what I could have screwed up?
> >>>
> >>> Thanks!
> >>> Tanton
> >>>
> >>
> >


Re: Spillable memory manager

Posted by Tanton Gibbs <ta...@gmail.com>.
I upped my maximum memory from 1024M to 2048M and the problem went
away. I think the problem was that my sort memory (io.sort.mb) was already set
to 400M, so it was very close to the 50% mark already.

Is there a way to up the spillable threshold to 80%?

On Fri, May 23, 2008 at 10:04 AM, Tanton Gibbs <ta...@gmail.com> wrote:
> It is in a map phase.  I don't think I used a custom chunker.   My
> splits are set to  be 128M.
>
> On Fri, May 23, 2008 at 9:07 AM, pi song <pi...@gmail.com> wrote:
>> Dear Tanton,
>>
>> This means the MemoryManager is not successful at reclaiming memory.  Did
>> that happen in Map phase or Reduce phase? Did you use a custom chunker? How
>> big is your split?
>>
>> Pi
>>
>> On Fri, May 23, 2008 at 3:52 PM, Tanton Gibbs <ta...@gmail.com>
>> wrote:
>>
>>> I upgraded to hadoop 17 and the latest Pig from svn.
>>>
>>> I'm now getting a ton of lines in my log files that say:
>>>
>>> 2008-05-23 00:49:27,832 INFO
>>> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
>>> called init = 1441792(1408K) used = 483176072(471851K) committed =
>>> 641335296(626304K) max = 954466304(932096K)
>>>
>>> In addition, jobs on big files are running very slowly.
>>>
>>> Does anyone have any ideas as to what I could have screwed up?
>>>
>>> Thanks!
>>> Tanton
>>>
>>
>

Re: Spillable memory manager

Posted by Tanton Gibbs <ta...@gmail.com>.
It is in a map phase. I don't think I used a custom chunker. My
splits are set to be 128M.
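
In case it helps, the usual way that ends up at 128M is via the HDFS block
size; in hadoop-site.xml that would look roughly like this (illustrative
only, and the property choice is my guess rather than something I've
verified for our cluster):

<property>
  <name>dfs.block.size</name>
  <!-- 128 MB; split size normally tracks the block size for splittable input -->
  <value>134217728</value>
</property>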

On Fri, May 23, 2008 at 9:07 AM, pi song <pi...@gmail.com> wrote:
> Dear Tanton,
>
> This means the MemoryManager is not successful at reclaiming memory.  Did
> that happen in Map phase or Reduce phase? Did you use a custom chunker? How
> big is your split?
>
> Pi
>
> On Fri, May 23, 2008 at 3:52 PM, Tanton Gibbs <ta...@gmail.com>
> wrote:
>
>> I upgraded to hadoop 17 and the latest Pig from svn.
>>
>> I'm now getting a ton of lines in my log files that say:
>>
>> 2008-05-23 00:49:27,832 INFO
>> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
>> called init = 1441792(1408K) used = 483176072(471851K) committed =
>> 641335296(626304K) max = 954466304(932096K)
>>
>> In addition, jobs on big files are running very slowly.
>>
>> Does anyone have any ideas as to what I could have screwed up?
>>
>> Thanks!
>> Tanton
>>
>

Re: Spillable memory manager

Posted by pi song <pi...@gmail.com>.
Dear Tanton,

This means the MemoryManager is not successful at reclaiming memory.  Did
that happen in Map phase or Reduce phase? Did you use a custom chunker? How
big is your split?

Pi

On Fri, May 23, 2008 at 3:52 PM, Tanton Gibbs <ta...@gmail.com>
wrote:

> I upgraded to hadoop 17 and the latest Pig from svn.
>
> I'm now getting a ton of lines in my log files that say:
>
> 2008-05-23 00:49:27,832 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 1441792(1408K) used = 483176072(471851K) committed =
> 641335296(626304K) max = 954466304(932096K)
>
> In addition, jobs on big files are running very slowly.
>
> Does anyone have any ideas as to what I could have screwed up?
>
> Thanks!
> Tanton
>
